Generating Summaries for Frequent Univariate Uncertain Patterns
1. Introduction
Mining big data, i.e., discovering useful knowledge from big data, has become an
important research topic. One characteristic of big data is the variety of data types (Laney,
2001). While many data types record precise data, i.e., each transaction records an accurate
value for an attribute, more and more uncertain data are being recorded all the time. In
uncertain data, the value of an attribute is not known precisely. In the literature, univariate uncertain
data is one type of uncertain data, and refers to cases where each attribute in a transaction is
associated with a quantitative interval and a probability density function, which assigns a
probability to each value in the interval (Liu, 2012). For example, a low-sensitivity sensor
used to record atmospheric pollution may record a quantitative interval, instead of a precise
value, to indicate the amount of suspended particulates at 06:00 every day. Then, a
probability density function is explicitly or implicitly assigned to the interval to indicate how
likely each value in the quantitative interval is. Table 1 shows a univariate
uncertain database, i.e., a database of univariate uncertain data, concerning air
quality. A quantitative interval is recorded for each of the two attributes in a transaction,
namely, suspended particulates and sulfur dioxide. The probability density function for each
quantitative interval can be assigned according to the observed status, or simply as a uniform
or normal distribution. An attribute whose value is represented in this format is called a
univariate uncertain attribute.
Table 1 A Univariate Uncertain Database
Transaction    suspended particulates    sulfur dioxide
T1             [15, 30]                  [54, 78]
T2             [13, 31]                  [54, 78]
T3             [15, 30]                  [52, 78]
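As an illustration only, the database in Table 1 can be sketched in Python as follows. The class name `UncertainValue`, its `probability` method, and the dictionary layout are hypothetical names introduced for this sketch, and a uniform density over each interval is assumed; a normal or observed distribution could be substituted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class UncertainValue:
    """A quantitative interval [low, high]; a uniform density is ASSUMED here."""
    low: float
    high: float

    def probability(self, a: float, b: float) -> float:
        """Probability that the true value lies in [a, b] under the uniform assumption."""
        if self.high == self.low:
            # Degenerate interval: all probability mass sits on a single point.
            return 1.0 if a <= self.low <= b else 0.0
        # Length of the overlap between [a, b] and [low, high], clipped at zero.
        overlap = min(b, self.high) - max(a, self.low)
        return max(0.0, overlap) / (self.high - self.low)


# The three transactions of Table 1, keyed by transaction and attribute name.
database: Dict[str, Dict[str, UncertainValue]] = {
    "T1": {"suspended particulates": UncertainValue(15, 30),
           "sulfur dioxide": UncertainValue(54, 78)},
    "T2": {"suspended particulates": UncertainValue(13, 31),
           "sulfur dioxide": UncertainValue(54, 78)},
    "T3": {"suspended particulates": UncertainValue(15, 30),
           "sulfur dioxide": UncertainValue(52, 78)},
}

# Probability that the suspended particulates in T1 fall within [20, 25].
p_t1 = database["T1"]["suspended particulates"].probability(20, 25)
print(p_t1)
```

Under the uniform assumption, the probability of a sub-interval is simply the length of its overlap with the recorded interval divided by the interval's length, so the same query gives a slightly smaller probability for T2, whose interval [13, 31] is wider.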
Univariate uncertain data can also be constructed intentionally. For instance, instead of
presenting detailed daily stock prices, we can treat the minimum stock price and the
maximum stock price appearing in a time span, e.g., a week or a month, as the minimum
value and the maximum value of a quantitative interval, respectively. Then, the probability
density function associated with the interval is assigned according to the detailed daily stock
prices. In addition, for privacy's sake, the prices of all houses in an area, e.g., a street or a
block, can likewise form a univariate uncertain attribute. Depending on the level of
disclosure, the probability density function is set either according to the real prices or a