臺大管理論叢
第
27
卷第
2S
期
45
(6)
The range of silhouette is between -1 and 1. A positive silhouette means the FU2P
i
is
more similar to the FU2Ps in the same cluster than the FU2Ps in other clusters. In contrast, a
negative silhouette means
i
is more similar to the FU2Ps in other clusters, i.e.,
i
is
inappropriately assigned to the current cluster. If silhouette value is equal to 0,
i
is on the
border of two clusters.
For each cluster derived in Step 2, we compute the silhouette of each FU2P in the
cluster. If one or more FU2Ps in the cluster have a negative silhouette, which often happens
in a large cluster, this cluster is an
unsatisfied cluster
. A large cluster means the number of
FU2Ps in the cluster is large. This implies the medoid of the cluster cannot well represent all
of the FU2Ps in the cluster. Therefore, for each unsatisfied cluster, we apply the same
procedure described in Steps 1 and 2 to decompose the unsatisfied cluster into several
clusters. Note that the number of initial medoids is calculated by multiplying the number of
FU2Ps in the unsatisfied cluster with
ξ
.
If there exists no unsatisfied cluster, the SFC algorithm tries to merge similar clusters. It
is possible that we might decompose too much because the number of initial clusters is too
large. We discuss how to merge clusters in the next section.
3.3.4 Step 5: Merge Similar Clusters
The SFC algorithm uses a recursive process to merge clusters. At each iteration, the
distance between each pair of medoids is computed. The pair of clusters which has the
smallest distance between medoids is considered for merging. For each cluster of the pair,
we compute the average of the distances between the medoid and the other FU2Ps in the
cluster. We denote the weighted average of the two average distances derived from the two
clusters as
AvgDin
, i.e.,
, where
N
1
(or
N
2
) and
AvgD
1
(or
AvgD
2
) is the number of FUPs and the average distance in the first (or second)
cluster, respectively. Then, the medoid of the merged cluster formed by merging two clusters
is identified. The average of the distances between the new medoid and the FU2Ps in the
merged cluster is computed, and is denoted as
AvgDout
. The two clusters are merged if the
following two criteria are met: 1)
AvgDout
<
AvgDin
×
δ
; and 2) there exists no FU2P with a
negative silhouette in the merged cluster.