臺大管理論叢 NTU Management Review VOL.27 NO.2S

臺大管理論叢

第

卷第

期

methods have tried to find the most important frequent patterns, such as

statistically

significant frequent itemsets

(Kirsch, Mitzenmacher, Pietracaprina, Pucci, Upfal, and Vandin,

2009),

top-k frequent patterns

(Han, Wang, Lu, and Tzvetkov, 2002),

top-k redundancy-

aware patterns

(Xin, Cheng, Yan, and Han, 2006) and

error-tolerant frequent itemsets

(Yang,

Fayyad, and Bradley, 2001). Kirsch et al. (2009) used a statistical test based on a Poisson

approximation to model the distribution of itemset supports. This technique distinguishes

statistically significant frequent patterns from random fluctuations in data. Han et al. (2002)

used a top-down and bottom-up combined FP-tree mining strategy to accelerate support

counting operations and the discovery of closed frequent patterns. Then, the top-

frequent

closed frequent patterns of length no less than a threshold are determined. Xin et al. (2006)

extended the work in (Han et al., 2002) by considering redundancy between frequent

patterns. Discovery of error-tolerant frequent itemsets (Yang et al., 2001) relaxes the support

criteria by allowing some degree of error in counting itemsets' supports. In general, this kind

of method does not provide a good representation for the entire set of frequent patterns.

The representation derived by using one of the lossless compression methods usually

still contains a large number of representative patterns. In contrast, many of the lossy

approximation methods provide a tractable number of representative patterns. Some of the

lossy approximation methods (Afrati et al., 2004; Bayardo, 1998; Wang, 2008; Xin et al.,

2005) do not estimate the supports of frequent patterns. However, the aim is to present the

approximation of frequent patterns' supports. In many applications, it is sufficient to have

approximate supports instead of precise supports. Therefore, the lossy approximation

methods that estimate frequent patterns' supports are preferred in practice; our proposed

algorithm constitutes this kind of method.

2.2 Mining Uncertain Data

In the literature, three types of uncertain data are identified for mining frequent patterns:

univariate uncertain data, itemset uncertain data, and tuple uncertain data (Sun, Cheng,

Cheung, and Cheng, 2010). The univariate uncertain data have been introduced before. For

itemset uncertain data, an item like milk exists in a transaction with an

existential probability

indicating the possibility that the item exists in the transaction; while in tuple uncertain data,

each transaction (tuple) is associated with a probability indicating the possibility of its

existence. Liu (2012) proposed the U2P-Miner algorithm, which adopts the pattern growth

methodology, for mining frequent patterns from univariate uncertain data. Subsequently,

mining constrained frequent patterns and maximal frequent patterns from univariate