
NTU Management Review, Vol. 27, No. 2S, p. 55

Table 10 The Comparison on the DY2009 Dataset

                  SFC algorithm   SFC (worst)       MFU2Ps
#FU2Ps                     3856          3856         3856
Total distance            60.55        195.70      1201.73
#Clusters                    43          2731         3572
Quality index           2603.65     534444.57   4292579.56
Runtime 1 (s)            480.85        480.85       486.77
Runtime 2 (s)              3.51          1.93            -

Table 11 The Comparison on the AirQuality Dataset

                  SFC algorithm   SFC (worst)       MFU2Ps
#FU2Ps                     5969          5969         5969
Total distance            52.61        265.50      2379.21
#Clusters                    81          3015         4710
Quality index           4261.41     800479.83  11206079.10
Runtime 1 (s)            426.47        426.47       428.67
Runtime 2 (s)              7.46          3.81            -
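
As a reading aid for Tables 10 and 11, the reported Quality index values appear to coincide, up to rounding of the printed distances, with the product of Total distance and #Clusters. This is an observation drawn from the numbers on this page, not a definition stated here. The short check below uses only values copied from the two tables; the helper name `relative_gap` is ours.

```python
# (total distance, #clusters, reported quality index), copied from Tables 10 and 11.
rows = {
    ("DY2009", "SFC algorithm"): (60.55, 43, 2603.65),
    ("DY2009", "SFC (worst)"): (195.70, 2731, 534444.57),
    ("DY2009", "MFU2Ps"): (1201.73, 3572, 4292579.56),
    ("AirQuality", "SFC algorithm"): (52.61, 81, 4261.41),
    ("AirQuality", "SFC (worst)"): (265.50, 3015, 800479.83),
    ("AirQuality", "MFU2Ps"): (2379.21, 4710, 11206079.10),
}


def relative_gap(total_distance, n_clusters, reported):
    """Relative difference between distance * clusters and the reported index."""
    product = total_distance * n_clusters
    return abs(product - reported) / reported


# Assumption being checked: quality index ~ total distance * number of clusters.
gaps = {key: relative_gap(*values) for key, values in rows.items()}

for (dataset, method), gap in gaps.items():
    print(f"{dataset:10s} {method:14s} relative gap = {gap:.2e}")
```

Under this reading, a lower index rewards summaries that are both compact (few clusters) and tight (small total distance).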

5. Conclusion and Future Work

In big data analytics, how mining results are presented is an important issue. For data composed of univariate uncertain attributes, mining frequent univariate uncertain patterns (FU2Ps) is an effective way to understand the nature of the data. However, the number of FU2Ps is usually too large for users to comprehend. In this study, we propose the SFC algorithm for summarizing FU2Ps. The SFC algorithm summarizes a set of FU2Ps by representative FU2Ps, together with statistics of the FU2Ps belonging to each representative FU2P. A hierarchical clustering technique is adopted to derive the summary. Instead of examining a large number of FU2Ps, users only need to check tens or perhaps hundreds of representative FU2Ps, which greatly improves the practicality of FU2Ps.
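
To make the idea of summarizing by representatives concrete, the following is a minimal sketch of bottom-up (agglomerative) hierarchical clustering followed by a per-cluster summary. It is not the SFC algorithm itself. The representation of a pattern as a numeric tuple, the Euclidean distance, the complete-linkage rule, the `threshold` stopping criterion, and the choice of the medoid as representative are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def complete_linkage(a, b):
    """Largest pairwise distance between two clusters (lists of points)."""
    return max(dist(p, q) for p in a for q in b)


def agglomerate(points, threshold):
    """Merge the closest pair of clusters until no pair is within threshold."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        (i, j), d = min(
            (((i, j), complete_linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        # i < j is guaranteed by combinations, so deleting j keeps i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def summarize(cluster):
    """Pick the medoid as the representative and report simple statistics."""
    rep = min(cluster, key=lambda p: sum(dist(p, q) for q in cluster))
    return {
        "representative": rep,
        "size": len(cluster),
        "distance": sum(dist(rep, q) for q in cluster),
    }


# Hypothetical patterns encoded as 2-D points; real FU2Ps would need their own
# encoding and distance measure.
patterns = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
summary = [summarize(c) for c in agglomerate(patterns, threshold=1.0)]

for entry in summary:
    print(entry)
```

In this toy input the five patterns collapse into two groups, so a reader inspects two representatives and their statistics rather than every pattern.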

In the future, we plan to investigate the possibility of further merging, since the merging effect was not very pronounced in the experiments. To provide a more concise yet still meaningful representation, we will develop auxiliary or new merging methods. We will also look for, or develop, more suitable clustering techniques.