Table of Contents Table of Contents
Previous Page  34 /342 Next Page
Information
Show Menu
Previous Page 34 /342 Next Page
Page Background

為頻繁單變量不確定樣式產生摘要

34

frequent patterns. Afrati, Gionis, and Mannila (2004) proposed mining

boundary cover sets

from the frequent patterns, which involves searching for supersets of frequent patterns that

can be used to derive all frequent patterns. Jin, Abu-Ata, Xiang, and Ruan (2008) used

cluster analysis and decision tree techniques to derive groups of frequent patterns. The

regression technique was adopted to minimize the frequency restoration error. Xin, Han, Yan,

and Cheng (2005) adopted clustering analysis to partition frequent patterns into groups. A

representative pattern of a group “

δ

-covers” the frequent patterns in the group, which means

the representative pattern is the superset of each frequent pattern in the group, and the

distance between the representative pattern and each frequent pattern in the group is within a

fraction (

δ

). Distance is defined as one minus the ratio between the supports of two frequent

patterns. Later, Yan, Cheng, Han, and Xin (2005) extended the idea in (Xin et al., 2005) to

mine

pattern profiles

by clustering the frequent patterns. A pattern profile records the union

of the frequent patterns in a cluster and the occurrence probability of each item in the union.

In addition, the number of frequent patterns in the cluster is also recorded. The occurrence

probability of each item is used to estimate the support of the frequent patterns in the cluster.

Wang (2008) applied the concept of “

δ

-cover” to clustering sequential patterns, where the

number of clusters is minimized. Mimaroglu and Simovici (2007) modeled the database as a

matrix of bit sequences, where an item is represented by a bit and a transaction forms a bit

sequence. They applied a hierarchical clustering technique to the matrix, where the Jacquard-

Tanimoto measure is used to compute the distance between two frequent patterns. Al Hasan,

Salem, and Zaki (2011) proposed the SimClus algorithm, which performs cluster analysis

with a lower bound on similarity between frequent patterns to minimize the number of

clusters. There are also some methods that adopted the coding schemes to find the concise

representation of frequent patterns (Siebes, Vreeken, and van Leeuwen, 2006; Tatti and

Vreeken, 2012; Lam, Mörchen, Fradkin, and Calders, 2014; Vreeken, van Leeuwen, and

Siebes, 2011). These studies searched for a code table (a subset of all frequent patterns) that

can efficiently code (represent) the entire database. Jin, Xiang, Hong, and Huang (2010)

proposed the block-interaction model, which is similar to a code table, to summarize the full

set of frequent itemsets. They used a set of

core blocks

, each of which is the Cartesian

product of a frequent itemset and its support transactions. These core blocks interact with

each other by using two operators (horizontal union and vertical union) to form the

complexity of frequent patterns. Pei, Dong, Zou, and Han (2002) proposed mining the

condensed frequent patterns-base

, which is a subset of the full set of frequent patterns and

estimates the support of each frequent pattern with a user-specified error bound. Some