臺大管理論叢 NTU Management Review VOL.27 NO.2S

Generating Summaries for Frequent Univariate Uncertain Patterns

1. Introduction

Mining big data, i.e., discovering useful knowledge from big data, has become an important research topic. One characteristic of big data is the variety of data types (Laney, 2001). While many data types record precise data, i.e., each transaction records an accurate value for an attribute, more and more uncertain data are being recorded. In uncertain data, the value of an attribute is not known exactly. In the literature, univariate uncertain data is one type of uncertain data and refers to cases where each attribute in a transaction is associated with a quantitative interval and a probability density function, which assigns a probability to each value in the interval (Liu, 2012). For example, a low-sensitivity sensor used to record atmospheric pollution may record a quantitative interval, instead of a precise value, to indicate the amount of suspended particulates at 06:00 every day. Then, a probability density function is explicitly or implicitly assigned to the interval to indicate how likely each value in the quantitative interval is. Table 1 shows a univariate uncertain database, i.e., a database of univariate uncertain data, concerning air quality. A quantitative interval is recorded for each of the two attributes in a transaction, namely, suspended particulates and sulfur dioxide. The probability density function for each quantitative interval can be assigned according to the observed status, or simply as a uniform or normal distribution. An attribute whose value is represented in this format is called a univariate uncertain attribute.

Table 1 A Univariate Uncertain Database

Transaction    suspended particulates    sulfur dioxide
               [15, 30]                  [54, 78]
               [13, 31]                  [54, 78]
               [15, 30]                  [52, 78]
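As an illustration of the data model described above, the following Python sketch represents a univariate uncertain attribute as a quantitative interval paired with a probability density function. It is only a sketch under stated assumptions: the class name UncertainValue, its methods, and the variable database are hypothetical names introduced here, and the uniform distribution is chosen for simplicity (the text also allows a normal distribution or one based on the observed status). The three rows are the intervals listed in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainValue:
    """A quantitative interval [low, high] with an ASSUMED uniform pdf.

    Hypothetical helper for illustration; requires low < high.
    """
    low: float
    high: float

    def pdf(self, x: float) -> float:
        # Uniform density: constant inside the interval, zero outside.
        if self.low <= x <= self.high:
            return 1.0 / (self.high - self.low)
        return 0.0

    def prob(self, a: float, b: float) -> float:
        # Probability that the true value lies in [a, b], obtained by
        # clipping [a, b] to the interval and scaling by its width.
        lo, hi = max(a, self.low), min(b, self.high)
        return max(0.0, hi - lo) / (self.high - self.low)


# Rows of Table 1: (suspended particulates, sulfur dioxide).
database = [
    (UncertainValue(15, 30), UncertainValue(54, 78)),
    (UncertainValue(13, 31), UncertainValue(54, 78)),
    (UncertainValue(15, 30), UncertainValue(52, 78)),
]

# Probability that suspended particulates in the first transaction
# fall within [20, 25] under the uniform assumption: 5 / 15.
first_sp = database[0][0]
print(first_sp.prob(20, 25))
```

Under a different density, e.g., a normal distribution truncated to the interval, only the pdf and prob methods would change; the interval representation itself stays the same.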

Univariate uncertain data can also be constructed intentionally. For instance, instead of presenting detailed daily stock prices, we can treat the minimum and maximum stock prices observed in a time span, e.g., a week or a month, as the minimum and maximum values of a quantitative interval, respectively. Then, the probability density function associated with the interval is assigned according to the detailed daily stock prices. In addition, for privacy's sake, the prices of all houses in an area, e.g., a street or a block, can likewise form a univariate uncertain attribute. Depending on the level of disclosure, the probability density function is set either according to the real prices or a