臺大管理論叢 NTU Management Review VOL.27 NO.2S

Generating Summaries for Frequent Univariate Uncertain Patterns

explore the performance of the SFC algorithm under different parameters, we vary its three parameters: one from 0 to 1.0 at an interval of 0.1, another from 0.01 to 0.15 at an interval of 0.01, and the last from 1.0 to 3.0 at an interval of 0.05. These ranges were chosen empirically. The summarization quality and runtime of each setting are reported in Figs. 10 and 11. The summarization quality is affected by two factors. The first factor is the total distance, which is the sum of the distances between the medoid and the FU2Ps in each cluster. The second factor is the number of clusters. We formally define the total distance as equation (7):

$$\text{total distance} = \sum_{i=1}^{k} \sum_{p \in C_i} \mathit{dist}(m_i, p) \qquad (7)$$

In equation (7), $k$ is the number of clusters, $m_i$ is the medoid of cluster $C_i$, and $p$ represents an FU2P belonging to cluster $C_i$. $\mathit{dist}(\cdot)$ returns the distance between two FU2Ps, which is defined in equation (4). A lower total distance implies a better clustering result. However, a lower total distance may simply result from a larger number of clusters. Since we also prefer a small number of clusters, we define a quality index as equation (8), so that a lower quality index indicates better summarization quality.

$$\text{quality index} = \text{total distance} \times \text{number of clusters} \qquad (8)$$
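As a minimal sketch of how equations (7) and (8) could be computed, the following Python fragment sums the medoid-to-member distances over all clusters and multiplies the result by the number of clusters. The function names `total_distance` and `quality_index`, the `(medoid, members)` cluster representation, and the stand-in distance `abs_diff` are illustrative assumptions; in particular, `abs_diff` is not the FU2P distance of equation (4), which would be substituted in practice.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")  # type of a pattern; plain numbers are used in the toy example below

# Hypothetical representation: each cluster is a (medoid, members) pair.
Cluster = Tuple[P, Sequence[P]]


def total_distance(clusters: List[Cluster], dist: Callable[[P, P], float]) -> float:
    """Equation (7): sum of distances between each medoid and its cluster members."""
    return sum(dist(medoid, p) for medoid, members in clusters for p in members)


def quality_index(clusters: List[Cluster], dist: Callable[[P, P], float]) -> float:
    """Equation (8): total distance multiplied by the number of clusters."""
    return total_distance(clusters, dist) * len(clusters)


def abs_diff(a: float, b: float) -> float:
    """Stand-in distance for illustration only (NOT the distance of equation (4))."""
    return abs(a - b)


# Toy data: two tight clusters versus the same points merged into one cluster.
two_clusters = [(2.0, [1.0, 2.0, 4.0]), (10.0, [9.0, 10.0])]
one_cluster = [(4.0, [1.0, 2.0, 4.0, 9.0, 10.0])]

print(quality_index(two_clusters, abs_diff))  # 4.0 * 2 = 8.0
print(quality_index(one_cluster, abs_diff))   # 16.0 * 1 = 16.0
```

In this toy example the two-cluster result has the lower quality index, because its total distance is small enough to offset the penalty for the extra cluster.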

In the experiments, we set the minimum support to 0.013 for the synthetic dataset and 0.03 for the two real datasets. We found that the quality indices obtained with two of the parameters held fixed and the remaining parameter varied do not change significantly. This is because few clusters were merged in the process. In fact, most merge attempts failed the negative silhouette criterion (i.e., the requirement that no FU2P in the merged cluster has a negative silhouette). The runtimes show a similar pattern. Figures 4(a), 4(b), and 4(c) show the quality index under various parameter settings in the three datasets, respectively. In the figures, the x axis and y axis indicate different values of the two fixed parameters, respectively. Because the quality indices obtained with different values of the remaining parameter vary little at a given level of the other two, the z axis presents the average of the quality indices obtained at that level. Figures 5(a), 5(b), and 5(c) show the runtimes required for generating a summary from FU2Ps under various parameter settings in the three datasets, respectively, where the z axis presents the average runtime. In Figs. 6(a), 6(b), and 6(c), we present the number of