

為頻繁單變量不確定樣式產生摘要
36
uncertain data have also been addressed (Liu and Wang, 2013; Liu, 2014); Liu (2013) also
explored mining univariate uncertain data streams. For mining tuple uncertain data, Sun et
al. (2010) proposed two algorithms, the p-Apriori and TODIS, to mine frequent patterns by
utilizing the bottom-up framework and the top-down framework, respectively. Several
studies contributed to the literature on retrieving frequent patterns from itemset uncertain
data. In general, they can be classified as the Apriori-based (Abd-Elmegid, El-Sharkawi,
El-Fangary, and Helmy, 2010; Chui and Kao, 2008; Chui, Kao, and Hung, 2007), the
FP-growth-based (Leung, Carmichael, and Hao, 2007; Leung, Mateo, and Brajczuk, 2008),
and the H-mine-based methods (Aggarwal, Li, Wang, and Wang, 2009). Of these, the
H-mine-based method performed better than the other methods (Aggarwal et al., 2009).
Recently, Muzammal and Raman (2010a, 2010b, 2011) proposed pioneering studies for
mining sequential patterns from uncertain data by utilizing the candidate generate-and-test
methodology. Later, Muzammal (2011) extended their studies by using the pattern growth
methodology. In their studies, the researchers identified two kinds of uncertainty in the
sequence database: source-level uncertainty and event-level uncertainty. The former means
that the source that generates each transaction is uncertain. The latter means that each
transaction in a sequence is uncertain, i.e., it contains tuple uncertain data.
As we state in the end of Section 1, all of the above discussed methods concerning
concise representations handle precise data. None of them can be directly applied to
univariate uncertain data. Furthermore, to the best of our knowledge, construction of a
concise representation of frequent patterns in univariate uncertain data has not been
addressed in the literature. In this study, we propose a novel algorithm for summarizing the
full set of FU2Ps. Our proposed algorithm, the SFC (Summarize FU2Ps by Clustering)
algorithm, is a lossy approximation method. Aside from mining univariate uncertain data, the
SFC algorithm also differs from most of the existing approximation methods that also adopt
the clustering technique in that our method uses real frequent patterns (medoids) as
representative patterns, instead of using virtual patterns (means). The medoid of a cluster is a
frequent pattern in the cluster that has the smallest average distance between this frequent
pattern and other frequent patterns in the cluster. The mean of a cluster is usually the union
of the frequent patterns in the cluster. Using means is prone to bias caused by outliers or
noises, and the means are not real patterns, which may mislead users. In contrast, outliers or
noises do not affect mediods, and medoids are real patterns. In addition, a very important
advantage of using medoids is that the clustering process is greatly accelerated by recording
the distance between each pair of frequent patterns in advance. During the clustering process,