Generating Summaries for Frequent Univariate Uncertain Patterns

explore the performance of the SFC algorithm under different parameters, we set w from 0 to 1.0 at an interval of 0.1, ξ from 0.01 to 0.15 at an interval of 0.01, and δ from 1.0 to 3.0 at an interval of 0.05. These ranges were chosen empirically. The summarization quality and runtime of each setting are reported in Figs. 10 and 11. The summarization quality is affected by two factors. The first factor is the total distance, which is the sum of the distances between the medoid and the FU2Ps in each cluster. The second factor is the number of clusters. We formally define the total distance as equation (7):

$$\text{total distance} = \sum_{i=1}^{n} \sum_{f_j \in C_i} \mathrm{dist}(m_i, f_j) \tag{7}$$

In equation (7), $n$ is the number of clusters, $m_i$ is the medoid of cluster $C_i$, and $f_j$ represents the $j$-th FU2P belonging to cluster $C_i$. The function $\mathrm{dist}(\cdot,\cdot)$ returns the distance between two FU2Ps, which is defined in equation (4). A lower total distance implies a better clustering result. However, a lower total distance may simply result from a high number of clusters. Since we also prefer a low number of clusters, we define a quality index as equation (8), so that a lower quality index indicates a better summarization quality.

$$\text{quality index} = \text{total distance} \times \text{number of clusters} \tag{8}$$
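The two measures can be illustrated with a minimal Python sketch. It is not the authors' implementation: the representation of a clustering as a list of (medoid, members) pairs, the function names, and the `toy_dist` function are assumptions made for illustration. In particular, `toy_dist` is only a stand-in for the FU2P distance of equation (4), which is not reproduced on this page.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: each cluster is a (medoid, members) pair.
Cluster = Tuple[float, Sequence[float]]


def total_distance(clusters: List[Cluster],
                   dist: Callable[[float, float], float]) -> float:
    """Equation (7): sum of dist(medoid, member) over all clusters."""
    return sum(dist(medoid, f) for medoid, members in clusters for f in members)


def quality_index(clusters: List[Cluster],
                  dist: Callable[[float, float], float]) -> float:
    """Equation (8): total distance multiplied by the number of clusters."""
    return total_distance(clusters, dist) * len(clusters)


def toy_dist(a: float, b: float) -> float:
    """Placeholder distance (absolute difference), NOT equation (4)."""
    return abs(a - b)


# The same five toy items, summarized with two clusters or with one.
two_clusters: List[Cluster] = [(2.0, [1.0, 2.0, 4.0]), (10.0, [9.0, 10.0])]
one_cluster: List[Cluster] = [(4.0, [1.0, 2.0, 4.0, 9.0, 10.0])]

print(total_distance(two_clusters, toy_dist))  # 4.0
print(quality_index(two_clusters, toy_dist))   # 8.0
print(quality_index(one_cluster, toy_dist))    # 16.0
```

On this toy data the two-cluster summary has the lower quality index (8.0 versus 16.0), because the reduction in total distance outweighs the penalty for the extra cluster.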

In the experiments, we set the minimum support to 0.013 for the synthetic dataset and 0.03 for the two real datasets. We found that the quality indices derived by using a fixed w and a fixed ξ but various δ do not change significantly. This is because not many clusters were merged in the process. In fact, most merge attempts failed the negative silhouette criterion (i.e., the requirement that no FU2P in the merged cluster has a negative silhouette). The runtimes show a similar pattern. Figures 4(a), 4(b), and 4(c) show the quality index under various parameter settings in the three datasets, respectively. In the figures, the x axis and y axis indicate different values of ξ and w, respectively. Because the quality indices derived by using different values of δ do not vary much at the same level of ξ and w, the z axis presents the average quality index, obtained by averaging the quality indices at the same level of ξ and w. Figures 5(a), 5(b), and 5(c) show the runtimes required for generating a summary from FU2Ps under various parameter settings in the three datasets, respectively, where the z axis presents average runtimes. In Figs. 6(a), 6(b), and 6(c), we present the number of