臺大管理論叢 (NTU Management Review), Vol. 27, No. 2S, p. 33
univariate uncertain data.
To decompose FU2Ps into groups, we adopt a hierarchical clustering technique. A
hierarchical clustering technique further decomposes any cluster that is not yet compact in the
current iteration, which is an important advantage over non-hierarchical clustering
techniques. In addition, the proposed method uses medoids, rather than means, as cluster
centers, because true FU2Ps are needed to serve as representative FU2Ps.
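To make the medoid idea concrete, the following is a minimal greedy k-medoids sketch; the function names, the swap heuristic, and the toy distance table are illustrative assumptions, not the paper's actual algorithm. The key property is that every cluster center is an actual data point (here, a stand-in for a true FU2P), unlike a mean:

```python
import itertools

def total_cost(dist, medoids, points):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist[p][m] for m in medoids) for p in points)

def k_medoids(dist, points, k, max_iter=100):
    """Greedy k-medoids: repeatedly swap a medoid with a non-medoid
    point whenever the swap lowers the total clustering cost."""
    medoids = list(points[:k])  # naive initialization
    for _ in range(max_iter):
        improved = False
        for m, p in itertools.product(list(medoids), points):
            if p in medoids:
                continue
            candidate = [p if x == m else x for x in medoids]
            if total_cost(dist, candidate, points) < total_cost(dist, medoids, points):
                medoids, improved = candidate, True
        if not improved:
            break
    # assign every point to its nearest medoid
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: dist[p][m])].append(p)
    return clusters

# toy symmetric distance table with two obvious groups: {0, 1} and {2, 3}
dist = {
    0: {0: 0, 1: 1, 2: 10, 3: 11},
    1: {0: 1, 1: 0, 2: 10, 3: 11},
    2: {0: 10, 1: 10, 2: 0, 3: 1},
    3: {0: 11, 1: 11, 2: 1, 3: 0},
}
clusters = k_medoids(dist, [0, 1, 2, 3], k=2)
```

Because each returned center is one of the input points, it can be reported directly as the representative of its cluster.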
The structure of this article is as follows. Section 1 gives an introduction. We review
related studies in Section 2. Section 3 elaborates on the proposed method. The experiment
results are presented in Section 4. Section 5 concludes the study.
2. Literature Review
In this section, we review studies on concise representations of frequent patterns
(Section 2.1) and studies on mining uncertain data (Section 2.2).
2.1 Concise Representations of Frequent Patterns
In the literature, there are two main directions for deriving a concise
representation of frequent patterns: lossless compression methods and lossy approximation
methods (Pasquier, Bastide, Taouil, and Lakhal, 1999). In the following discussion, both
“pattern” and “itemset” denote a set of items. In lossless compression methods, the
supports of frequent patterns can be fully derived from the concise representation. There are
two main methods used in lossless compression. First, Pasquier et al. (1999) proposed
mining closed frequent patterns. A frequent pattern is a closed frequent pattern if and only if
the support of any superset of this frequent pattern does not equal the support of this frequent
pattern. The full set of frequent patterns and the support of each frequent pattern can be
derived from the set of closed frequent patterns. Calders and Goethals (2007) proposed
mining the non-derivable itemsets of a database. They used deduction rules to derive an
upper bound and a lower bound on the support of each itemset. If the two bounds of an
itemset coincide, its support can be derived from other itemsets, and the itemset is removed.
The remaining frequent itemsets comprise the non-derivable itemsets.
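The definition of a closed frequent pattern can be illustrated with a short sketch; the toy support table and the helper name are illustrative assumptions, not code from the cited works. A pattern survives only if no proper superset has exactly the same support:

```python
def closed_patterns(supports):
    """Keep a frequent pattern only if no proper superset
    of it has the same support."""
    closed = {}
    for pat, sup in supports.items():
        if not any(pat < other and sup == supports[other] for other in supports):
            closed[pat] = sup
    return closed

# toy support table: itemsets as frozensets
supports = {
    frozenset("a"): 3,
    frozenset("b"): 3,
    frozenset("ab"): 3,  # {a} and {b} have the same support as {a, b}
    frozenset("c"): 2,
}
closed = closed_patterns(supports)
```

Here {a} and {b} are absorbed by {a, b}, so the closed set is smaller than the full set of frequent patterns, yet every original support can still be recovered from it.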
In the lossy approximation methods, the supports of frequent patterns cannot be fully
recovered from the concise representation. Many methods take this approach. Bayardo
(1998) proposed mining maximal frequent patterns, which precisely generates the full set of