

臺大管理論叢
第
27
卷第
2S
期
35
methods have tried to find the most important frequent patterns, such as
statistically
significant frequent itemsets
(Kirsch, Mitzenmacher, Pietracaprina, Pucci, Upfal, and Vandin,
2009),
top-k frequent patterns
(Han, Wang, Lu, and Tzvetkov, 2002),
top-k redundancy-
aware patterns
(Xin, Cheng, Yan, and Han, 2006) and
error-tolerant frequent itemsets
(Yang,
Fayyad, and Bradley, 2001). Kirsch et al. (2009) used a statistical test based on a Poisson
approximation to model the distribution of itemset supports. This technique distinguishes
statistically significant frequent patterns from random fluctuations in data. Han et al. (2002)
used a top-down and bottom-up combined FP-tree mining strategy to accelerate support
counting operations and the discovery of closed frequent patterns. Then, the top-
k
frequent
closed frequent patterns of length no less than a threshold are determined. Xin et al. (2006)
extended the work in (Han et al., 2002) by considering redundancy between frequent
patterns. Discovery of error-tolerant frequent itemsets (Yang et al., 2001) relaxes the support
criteria by allowing some degree of error in counting itemsets' supports. In general, this kind
of method does not provide a good representation for the entire set of frequent patterns.
The representation derived by using one of the lossless compression methods usually
still contains a large number of representative patterns. In contrast, many of the lossy
approximation methods provide a tractable number of representative patterns. Some of the
lossy approximation methods (Afrati et al., 2004; Bayardo, 1998; Wang, 2008; Xin et al.,
2005) do not estimate the supports of frequent patterns. However, the aim is to present the
approximation of frequent patterns' supports. In many applications, it is sufficient to have
approximate supports instead of precise supports. Therefore, the lossy approximation
methods that estimate frequent patterns' supports are preferred in practice; our proposed
algorithm constitutes this kind of method.
2.2 Mining Uncertain Data
In the literature, three types of uncertain data are identified for mining frequent patterns:
univariate uncertain data, itemset uncertain data, and tuple uncertain data (Sun, Cheng,
Cheung, and Cheng, 2010). The univariate uncertain data have been introduced before. For
itemset uncertain data, an item like milk exists in a transaction with an
existential probability
indicating the possibility that the item exists in the transaction; while in tuple uncertain data,
each transaction (tuple) is associated with a probability indicating the possibility of its
existence. Liu (2012) proposed the U2P-Miner algorithm, which adopts the pattern growth
methodology, for mining frequent patterns from univariate uncertain data. Subsequently,
mining constrained frequent patterns and maximal frequent patterns from univariate