臺大管理論叢
第
27
卷第
2S
期
31
uniform distribution.
Mining frequent patterns is a good way to discover the intrinsic properties and hidden
regularity of a univariate uncertain database. In addition, frequent patterns can be utilized to
deal with classification (Thabtah, 2007) and cluster analysis (Wang, Wang, Yang, and Yu,
2002). Hereafter, a
U2 pattern
refers to a pattern composed of univariate uncertain attributes,
and a
frequent U2 pattern
indicates a U2 pattern that frequently appears. We hereafter
abbreviate a frequent U2 pattern as
FU2P
.
Mining FU2Ps has been addressed in (Liu, 2012). However, the number of FU2Ps
derived from univariate uncertain data is usually large, except for cases of mining with a
very high minimum support. Returning the full set of FU2Ps to users may frustrate them
because it is impossible for the users to manually go over such a large collection of patterns.
Therefore, deriving a concise representation of the FU2Ps is required. To the best of our
knowledge, in the literature, only one proposal attempts to get a concise representation of the
FU2Ps: Liu (2014) proposed mining maximal frequent patterns from univariate uncertain
data. A maximal frequent pattern is a frequent pattern which has no frequent supersets. A
maximal frequent pattern of univariate uncertain data will be hereafter referred to as a
MFUP. The left column of Table 2 presents the first set of FU2Ps, which contains six FU2Ps.
For example, [
A
1
:[15, 30],
A
2
:[104, 178],
A
3
:[55, 66]] represents a FU2P, where the intervals
of attributes
A
1
,
A
2
, and
A
3
are [15, 30], [104, 178], and [55, 66], respectively. Only two
FU2Ps are MFU2Ps in the first set, namely, [
A
1
:[15, 30],
A
2
:[104, 178],
A
3
:[55, 66]] and
[
A
1
:[20, 35],
A
2
:[203, 251],
A
3
:[44, 51]]. The second and third FU2Ps are subsets of the first
FU2P; while the fifth and sixth FU2Ps are subsets of the fourth FU2P. In addition, the full set
of FU2Ps can be enumerated from the set of MFU2Ps by using the Apriori property, i.e., a
non-empty subset of a frequent pattern is also a frequent pattern. Therefore, we can return
only the set of MFU2Ps to users instead of the full set of FU2Ps. However, the set of
MFU2Ps may not concisely represent the full set of FU2Ps in some univariate uncertain
datasets. For instance, the right column of Table 2 shows the second set of FU2Ps, where
each FU2P is not a subset of the other five FU2Ps. Therefore, all FU2Ps are MFU2Ps. In
fact, the second and third FU2Ps are only slightly different from the first FU2P in the
intervals of
A
1
and
A
3
respectively. We can treat the three FU2Ps as a group. The same
observation is made for the remaining three FU2Ps, while the fifth and sixth FU2Ps are
slightly different from the fourth FU2P in the intervals of
A
2
and
A
1
respectively. The three
FU2Ps can form another group. For each group, we generate an informative summary to
represent the FU2Ps in the group. We select the first FU2P as the representative FU2P for the