臺大管理論叢 (NTU Management Review), Vol. 27, No. 2S, p. 33
univariate uncertain data.
To decompose FU2Ps into groups, we adopt a hierarchical clustering technique. A
hierarchical clustering technique further decomposes any cluster that is not yet compact in the
current iteration, which is an important advantage over non-hierarchical clustering
techniques. In addition, the proposed method uses medoids, rather than means, as cluster
centers, because true FU2Ps are needed to serve as representative FU2Ps.
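To make the medoid idea concrete, the following is a minimal greedy k-medoids sketch; the function names, the swap heuristic, and the toy distance table are illustrative assumptions, not the paper's actual algorithm. The key property is that every cluster center is an actual data point (here, a stand-in for a true FU2P), unlike a mean:

```python
import itertools

def total_cost(dist, medoids, points):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist[p][m] for m in medoids) for p in points)

def k_medoids(dist, points, k, max_iter=100):
    """Greedy k-medoids: repeatedly swap a medoid with a non-medoid
    point whenever the swap lowers the total clustering cost."""
    medoids = list(points[:k])  # naive initialization
    for _ in range(max_iter):
        improved = False
        for m, p in itertools.product(list(medoids), points):
            if p in medoids:
                continue
            candidate = [p if x == m else x for x in medoids]
            if total_cost(dist, candidate, points) < total_cost(dist, medoids, points):
                medoids, improved = candidate, True
        if not improved:
            break
    # assign every point to its nearest medoid
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: dist[p][m])].append(p)
    return clusters

# toy symmetric distance table with two obvious groups: {0, 1} and {2, 3}
dist = {
    0: {0: 0, 1: 1, 2: 10, 3: 11},
    1: {0: 1, 1: 0, 2: 10, 3: 11},
    2: {0: 10, 1: 10, 2: 0, 3: 1},
    3: {0: 11, 1: 11, 2: 1, 3: 0},
}
clusters = k_medoids(dist, [0, 1, 2, 3], k=2)
```

Because each returned center is one of the input points, it can be reported directly as the representative of its cluster.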
The structure of this article is as follows. Section 1 gives an introduction. We review
related studies in Section 2. Section 3 elaborates on the proposed method. The experiment
results are presented in Section 4. Section 5 concludes the study.
2. Literature Review
In this section, we review studies on concise representations of frequent patterns
(Section 2.1) and studies on mining uncertain data (Section 2.2).
2.1 Concise Representations of Frequent Patterns
In the literature, there are two main directions for deriving a concise
representation of frequent patterns: lossless compression methods and lossy approximation
methods (Pasquier, Bastide, Taouil, and Lakhal, 1999). In the following discussion, both
“pattern” and “itemset” denote a set of items. In lossless compression methods, the
supports of frequent patterns can be fully derived from the concise representation. There are
two main methods used in lossless compression. First, Pasquier et al. (1999) proposed
mining closed frequent patterns. A frequent pattern is a closed frequent pattern if and only if
the support of any superset of this frequent pattern does not equal the support of this frequent
pattern. The full set of frequent patterns and the support of each frequent pattern can be
derived from the set of closed frequent patterns. Calders and Goethals (2007) proposed
mining the non-derivable itemsets of a database. They used deduction rules to derive an
upper bound and a lower bound on the support of each itemset. If the two bounds of an
itemset coincide, its support can be derived from other itemsets, and the itemset is removed.
The remaining frequent itemsets comprise the non-derivable itemsets.
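The definition of a closed frequent pattern can be illustrated with a short sketch; the toy support table and the helper name are illustrative assumptions, not code from the cited works. A pattern survives only if no proper superset has exactly the same support:

```python
def closed_patterns(supports):
    """Keep a frequent pattern only if no proper superset
    of it has the same support."""
    closed = {}
    for pat, sup in supports.items():
        if not any(pat < other and sup == supports[other] for other in supports):
            closed[pat] = sup
    return closed

# toy support table: itemsets as frozensets
supports = {
    frozenset("a"): 3,
    frozenset("b"): 3,
    frozenset("ab"): 3,  # {a} and {b} have the same support as {a, b}
    frozenset("c"): 2,
}
closed = closed_patterns(supports)
```

Here {a} and {b} are absorbed by {a, b}, so the closed set is smaller than the full set of frequent patterns, yet every original support can still be recovered from it.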
In the lossy approximation methods, the supports of frequent patterns cannot be fully
recovered from the concise representation. Many methods take this approach. Bayardo
(1998) proposed mining maximal frequent patterns, which precisely generates the full set of