為頻繁單變量不確定樣式產生摘要
34
frequent patterns. Afrati, Gionis, and Mannila (2004) proposed mining
boundary cover sets
from the frequent patterns, which involves searching for supersets of frequent patterns that
can be used to derive all frequent patterns. Jin, Abu-Ata, Xiang, and Ruan (2008) used
cluster analysis and decision tree techniques to derive groups of frequent patterns. The
regression technique was adopted to minimize the frequency restoration error. Xin, Han, Yan,
and Cheng (2005) adopted clustering analysis to partition frequent patterns into groups. A
representative pattern of a group “
δ
-covers” the frequent patterns in the group, which means
the representative pattern is the superset of each frequent pattern in the group, and the
distance between the representative pattern and each frequent pattern in the group is within a
fraction (
δ
). Distance is defined as one minus the ratio between the supports of two frequent
patterns. Later, Yan, Cheng, Han, and Xin (2005) extended the idea in (Xin et al., 2005) to
mine
pattern profiles
by clustering the frequent patterns. A pattern profile records the union
of the frequent patterns in a cluster and the occurrence probability of each item in the union.
In addition, the number of frequent patterns in the cluster is also recorded. The occurrence
probability of each item is used to estimate the support of the frequent patterns in the cluster.
Wang (2008) applied the concept of “
δ
-cover” to clustering sequential patterns, where the
number of clusters is minimized. Mimaroglu and Simovici (2007) modeled the database as a
matrix of bit sequences, where an item is represented by a bit and a transaction forms a bit
sequence. They applied a hierarchical clustering technique to the matrix, where the Jacquard-
Tanimoto measure is used to compute the distance between two frequent patterns. Al Hasan,
Salem, and Zaki (2011) proposed the SimClus algorithm, which performs cluster analysis
with a lower bound on similarity between frequent patterns to minimize the number of
clusters. There are also some methods that adopted the coding schemes to find the concise
representation of frequent patterns (Siebes, Vreeken, and van Leeuwen, 2006; Tatti and
Vreeken, 2012; Lam, Mörchen, Fradkin, and Calders, 2014; Vreeken, van Leeuwen, and
Siebes, 2011). These studies searched for a code table (a subset of all frequent patterns) that
can efficiently code (represent) the entire database. Jin, Xiang, Hong, and Huang (2010)
proposed the block-interaction model, which is similar to a code table, to summarize the full
set of frequent itemsets. They used a set of
core blocks
, each of which is the Cartesian
product of a frequent itemset and its support transactions. These core blocks interact with
each other by using two operators (horizontal union and vertical union) to form the
complexity of frequent patterns. Pei, Dong, Zou, and Han (2002) proposed mining the
condensed frequent patterns-base
, which is a subset of the full set of frequent patterns and
estimates the support of each frequent pattern with a user-specified error bound. Some