臺大管理論叢 NTU Management Review VOL.27 NO.2S

were implemented using Microsoft Visual C++ 2010. Section 4.1 introduces the datasets.

Sections 4.2 and 4.3 present the two sets of experiments, respectively.

4.1 The Synthetic and Real Datasets

The procedure for generating the synthetic datasets is as follows. A transaction contains

at most 15 attributes, i.e., the length of a transaction is at most 15. An interval of quantitative

values between 0 and 511 is generated for each attribute. The number of transactions, i.e., the

size of a dataset, is set at 100,000 for each synthetic dataset.
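
The generation procedure above can be illustrated with a minimal sketch. The text fixes only the maximum transaction length (15), the value range (0 to 511), and the dataset size (100,000). How the length of each transaction and the endpoints of each interval are drawn is not stated, so the uniform choices below, together with the names generate_transaction and generate_dataset, are assumptions made purely for illustration.

```python
import random

MAX_ATTRIBUTES = 15              # from the text: at most 15 attributes per transaction
VALUE_MIN, VALUE_MAX = 0, 511    # from the text: quantitative values between 0 and 511
NUM_TRANSACTIONS = 100_000       # from the text: size of each synthetic dataset


def generate_transaction(rng):
    """Return one transaction as a dict {attribute_id: (low, high)}.

    Assumption: the transaction length is uniform on 1..15 and the
    attributes are sampled without replacement.
    """
    length = rng.randint(1, MAX_ATTRIBUTES)
    attributes = sorted(rng.sample(range(MAX_ATTRIBUTES), length))
    transaction = {}
    for attr in attributes:
        # Assumption: both endpoints are uniform integers in [0, 511];
        # the smaller one becomes the lower bound of the interval.
        a = rng.randint(VALUE_MIN, VALUE_MAX)
        b = rng.randint(VALUE_MIN, VALUE_MAX)
        transaction[attr] = (min(a, b), max(a, b))
    return transaction


def generate_dataset(n=NUM_TRANSACTIONS, seed=0):
    """Return a list of n synthetic transactions (seeded for repeatability)."""
    rng = random.Random(seed)
    return [generate_transaction(rng) for _ in range(n)]
```

Calling generate_dataset() with the default arguments yields 100,000 transactions; a smaller n is convenient for quick checks.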

The first real dataset, AirQuality, contains daily air quality indices for Taiwan in

2008, including such measures as the concentrations of suspended particulates, sulfur

dioxide, and nitrogen dioxide. The data was collected by the Taiwan Environmental

Protection Administration and can be downloaded from the Environmental Protection

Administration (2015). We selected five indices from the AirQuality dataset, namely,

suspended particulates, sulfur dioxide, nitrogen dioxide, carbon monoxide, and ozone.

According to Environmental Protection Administration (2015), these are the key indices used

to measure the level of air quality. For every observation station, the original AirQuality

dataset lists hourly readings for each index. However, to mine FU2Ps, we modify the dataset

to obtain five intervals formed by the daily minimum and maximum values of the five

indices at each observation station. There are 26,527 transactions in total.
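
The reduction from hourly readings to daily intervals can be sketched as follows. The record layout (station, date, index name, value) and the function name daily_intervals are hypothetical; the actual file format of the AirQuality dataset is not described here.

```python
from collections import defaultdict


def daily_intervals(readings):
    """Collapse hourly readings into one transaction per station and day.

    readings: iterable of (station, date, index_name, value) tuples
              (an assumed layout, used only for illustration).
    Returns a dict {(station, date): {index_name: (daily_min, daily_max)}}.
    """
    bounds = defaultdict(dict)
    for station, date, index_name, value in readings:
        key = (station, date)
        # Start from the current bounds, or from the value itself
        # if this is the first reading for the index on that day.
        lo, hi = bounds[key].get(index_name, (value, value))
        bounds[key][index_name] = (min(lo, value), max(hi, value))
    return dict(bounds)
```

Each entry of the result corresponds to one transaction: for a given station and day, every selected index is represented by the interval between its daily minimum and maximum.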

The second real dataset, DY2009, contains a variety of data on daily weather conditions

in Taiwan in 2009, including atmospheric pressure, temperature, and relative humidity

readings. The data was collected by the Department of Atmospheric Science, National

Taiwan University, and can be downloaded from the Taiwan Typhoon and Flood Research

Institute (2015). For each transaction, we selected the following five attributes from the

DY2009 dataset: atmospheric pressure at the observation stations, atmospheric pressure at

sea level, temperature, vapor pressure, and relative humidity. The original dataset is reduced

to a set of transactions, each of which contains five intervals that indicate the daily minimum

and maximum values of the five attributes with respect to an observation station. There are

8,187 transactions in the DY2009 dataset. In addition, the probability density function

associated with each interval in the synthetic and real datasets is set as a uniform

distribution.
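
To illustrate what a uniform probability density function over an interval implies, the following sketch computes the probability that the true value of an attribute falls within a query range. It treats the values as continuous and is not taken from the algorithm itself; the function name uniform_probability and the handling of a degenerate interval (minimum equal to maximum) are assumptions.

```python
def uniform_probability(interval, query):
    """Probability that a value uniformly distributed on `interval`
    lies inside `query`. Both arguments are (low, high) pairs.
    """
    lo, hi = interval
    a, b = query
    if lo == hi:
        # Assumption: a degenerate interval is treated as a point mass.
        return 1.0 if a <= lo <= b else 0.0
    # Length of the overlap between the two ranges, floored at zero.
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (hi - lo)
```

For example, under this density an interval of [10, 30] places half of its probability mass in the range [20, 40], because the overlap [20, 30] covers half of the interval's length.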

4.2 The Experiment Results under Various Parameter Settings

The SFC algorithm uses three parameters, i.e.,

, and

, in the clustering process. To