Generating Summaries for Frequent Univariate Uncertain Patterns
members. Therefore, we can derive the quality index of a set of MFU2Ps by using equations
(7) and (8). Tables 9, 10, and 11 list the performance comparisons between the SFC
algorithm and the MFU2Ps derived from the three datasets, respectively. In these tables,
#FU2Ps means the number of FU2Ps and #Clusters means the number of clusters (it is also
the number of MFU2Ps). SFC (best) lists the statistics of the best summarization quality and
SFC (worst) lists the statistics of the worst summarization quality. The total distance, number
of clusters, and quality index listed in Table 9 are derived by setting w to 0.9, ξ to 0.01, and δ to 1.0 for SFC (best) and w to 0.0, ξ to 0.15, and δ to 1.0 for SFC (worst). In Table 10 and Table 11, w is set to 1.0, ξ to 0.01, and δ to 1.0 for SFC (best). For SFC (worst), w is set to 0.0, ξ to 0.07, and δ to 1.0 in Table 10, and w to 0.0, ξ to 0.12, and δ to 1.0 in Table 11. In both real datasets, the
summarization quality of the SFC algorithm is much better than the summarization quality of
the MFU2Ps even in the worst case. The number of MFU2Ps in each real dataset is nearly equal to
the number of FU2Ps. In the synthetic dataset, the best summarization quality is much better
than the summarization quality of MFU2Ps. Although the worst case performs worse than
the MFU2Ps, most of the settings still do better than the MFU2Ps. To generate a summary, the set of FU2Ps has to be retrieved first. In Tables 9, 10, and 11, Runtime 1 is the runtime required for retrieving FU2Ps (for the SFC algorithm) or MFU2Ps; Runtime 2 is the runtime required for generating the summary from the FU2Ps. The runtime required for retrieving FU2Ps and then generating the summary is roughly the same as the runtime required for generating MFU2Ps.
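The runtime claim can be checked against the figures reported for the synthetic dataset in Table 9. The following is a minimal sketch, not part of the original experiments; the dictionary keys and the helper name `total_runtime` are chosen here for illustration only.

```python
# Runtimes in seconds, copied from Table 9 (synthetic dataset).
# "retrieve" is Runtime 1 (retrieving FU2Ps); "summarize" is Runtime 2.
runtime = {
    "SFC (best)": {"retrieve": 121.36, "summarize": 9.81},
    "SFC (worst)": {"retrieve": 121.36, "summarize": 12.56},
}
mfu2p_runtime = 132.15  # Runtime 1 for retrieving MFU2Ps directly


def total_runtime(parts):
    """Total cost of retrieving FU2Ps and then summarizing them."""
    return parts["retrieve"] + parts["summarize"]


for name, parts in runtime.items():
    total = total_runtime(parts)
    ratio = total / mfu2p_runtime
    print(f"{name}: {total:.2f} s ({ratio:.3f} times the MFU2P runtime)")
```

Under both parameter settings the combined runtime stays within about 2% of the MFU2P runtime, which matches the "roughly the same" statement for this dataset.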
In the second-to-last paragraph of the previous section, we introduced a representative FU2P in the DY2009 dataset, [temperature 26°C to 30°C, relative humidity 67% to 76%]. All of its 94 cluster members are MFU2Ps. If we present MFU2Ps to users, they have to check all 94 FU2Ps even though these FU2Ps look very similar. In contrast, users only need to see one representative FU2P when we summarize the FU2Ps.
Table 9. Comparison on the Synthetic Dataset

                  SFC (best)    SFC (worst)    MFU2Ps
#FU2Ps            12409         12409          12409
Total distance    612.91        1859.86        753.46
#Clusters         220           3005           1085
Quality index     134840.20     5588876.69     817504.10
Runtime 1 (s)     121.36        121.36         132.15
Runtime 2 (s)     9.81          12.56          -
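
Equations (7) and (8) are not reproduced in this section, but the values in Table 9 are consistent with the quality index being the product of the total distance and the number of clusters, with lower values indicating better summarization. The sketch below checks that reading against the table; the product form is an assumption inferred from the reported numbers, and the function name `quality_index` is illustrative.

```python
# Rows copied from Table 9:
# (total distance, number of clusters, reported quality index).
table9 = {
    "SFC (best)": (612.91, 220, 134840.20),
    "SFC (worst)": (1859.86, 3005, 5588876.69),
    "MFU2Ps": (753.46, 1085, 817504.10),
}


def quality_index(total_distance, n_clusters):
    # Assumption: quality index = total distance * number of clusters.
    # This form is inferred from the table, not taken from equations (7)-(8).
    return total_distance * n_clusters


for name, (distance, n_clusters, reported) in table9.items():
    computed = quality_index(distance, n_clusters)
    rel_err = abs(computed - reported) / reported
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}, "
          f"relative error {rel_err:.1e}")
```

The small residual for SFC (worst) is what one would expect if the total distance in the table has been rounded to two decimal places.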