A Clustering Method to Identify Representative Financial Ratios

Abstract

[7] and K-means [1,12,18,21,23]. Cluster analysis, discriminant analysis, factor analysis and principal compo-nent analysis are often applied in classic statistical problems, such as for a large sample or long-term data. To

* Corresponding author.E-mail address: [email protected] (Y.-J. Wang).

Available online at www.sciencedirect.com

Information Sciences 178 (2008) 10871097

www.elsevier.com/locate/ins0020-0255/$ - see front matter 2007 Elsevier Inc. All rights reserved.In order to evaluate the nancial performance of a company, nancial ratios [22] are commonly used asevaluation criteria. Since some nancial ratios have quite similar patterns, it would be inecient to take allnancial ratios into consideration for evaluation. To avoid repeating the evaluation process on similar nan-cial ratios, we should partition the nancial ratios into clusters. After partitioning, the characteristics of thenancial ratios in the same cluster will be more similar, whereas the inter-cluster characteristics will be lesssimilar. Therefore, we can select a representative indicator in each cluster as the evaluation criterion. For thisreason, the evaluation criteria depend highly on the clustering method adopted.

Numerous clustering methods [25,9,11,13,17,19,20] have been proposed so far. These methods includecluster analysis, discriminant analysis, factor analysis, principal component analysis [10], grey relation analysisWhen companies evaluate their performance, it is impractical to take all of their nancial ratios into consideration. Toevaluate the nancial performance of a company, only a fraction of the available nancial ratios are considered andselected as evaluation criteria. In general, nancial ratios presented as sequences (or called nancial ratio sequences),are rst clustered and then a representative indicator is chosen from each cluster to serve as an evaluation criterion. Tocluster nancial ratios, we propose a clustering method in which the nancial ratios of dierent companies with similarvariations are partitioned into the same cluster. In other words, a fuzzy relation is proposed to represent the similaritybetween the nancial ratios, and a cluster validation index is also provided to determine the number of clusters. Oncethe nancial ratios are clustered, the representative indicator for each cluster will be identied. 2007 Elsevier Inc. All rights reserved.

Keywords: Clustering; Financial ratios; Fuzzy relation; Representative indicator

1. IntroductionA clustering method to identify representative nancial ratios

Yu-Jie Wang a,*, Hsuan-Shih Lee b

a Department of Shipping and Transportation Management, National Penghu University, Penghu 880, Taiwan, ROCb Department of Shipping and Transportation Management, National Taiwan Ocean University, Keelung 202, Taiwan, ROC

Received 4 October 2004; received in revised form 2 September 2007; accepted 3 September 2007doi:10.1016/j.ins.2007.09.016

Sectiostudy

Let Rk be a binary relation on S dened as

1088 Y.-J. Wang, H.-S. Lee / Information Sciences 178 (2008) 10871097Rk fx; yjrx; yP k 8x; y 2 Sg;where 0 6 k 6 1.Lemma 2.1. Let R be a fuzzy equivalence relation on S. Then Rk is an equivalence relation on S.

Proof. Rk is an equivalence relation on S since it satises the following three conditions:

1. Reexive: r(x,x) = 1P k "x 2 S.2. Symmetric: If (x,y) 2 Rk, then (y,x) is also in Rk for r(x,y) = r(y,x)P k "x,y 2 S.3. Transitive: Suppose both (x,y) and (y,z) are in Rk, then (x,z) is also in Rk for

r(x,z) = min{r(x,y), r(y,z)}P k "x,y,z 2 S. h

If R is a fuzzy compatible relation on S, then Rk is a compatible relation on S. Generally speaking, a par-tition of items may root in a compatible relation or an equivalence relation.

A partition or classication of S is a family of disjoint subsets, say {S1,S2, . . . ,Si, . . .}. The partition must1. R is reexive, if r(x,x) = 1 "x 2 S.2. R is symmetric, if r(x,y) = r(y,x) "x,y 2 S.3. R is transitive, if r(x,y)P maxy2S min{r(x,y),r(y,z)} "x,y,z 2 S.

If R is reexive and symmetric, R is said to be a fuzzy compatible relation on S. If R is reexive, symmetricand transitive, R is said to be a fuzzy equivalence relation on S [6,17].satisf2. Mathematical preliminaries

To select the representative indicators of nancial ratios, relevant mathematical theories are stated as fol-lows. First, we review the compatible and equivalence relations [6,17].

A fuzzy binary relation R on X Y is dened as the set of ordered pairs:

R fx; y; rx; yjx; y 2 X Y g;where r is a function that maps X Y! [0, 1].

In particular, the relation R is called a fuzzy binary relation on X when X = Y.Let R be a fuzzy binary relation on S. The following conditions may hold for R:n 3. In Section 4, we introduce our clustering method and two numerical examples. Finally, an empiricaland a comparison of our method and the K-means is provided in Section 5.deal with a small sample or short-term data, grey relation analyses and K-means are preferred. Because mostnancial ratios are short-term data, classic statistical methods are not suitable means to cluster nancial ratios.Although K-means, proposed by MacQueen [18], can partition items into clusters because the clustering num-ber is known, this method is sometimes combined with other approaches, such as a self-organizing map(SOM) or neural network [1416], so that the clustering number is determined automatically. However, weprefer that the nancial ratios presented by sequences be clustered in cases where the clustering number isunknown, hence the original K-means approach is not suitable for the clustering problem.

In this paper, we suggest a new clustering method based on a fuzzy relation between nancial ratiosequences. The fuzzy relation is constructed on a compatible relation which represents the variation betweentwo nancial ratio sequences. A cluster validation index is provided as well in order to objectively determinethe number of clusters. Since clusters based on this fuzzy relation may overlap, additional mechanisms areintroduced to resolve ambiguities. After the disjoint clusters of nancial ratios are identied, a representativeindicator will be drawn from each cluster through comparison of the nancial ratios within a given cluster.

For the sake of clarity, mathematical preliminaries are presented in Section 2. Financial ratios are stated iny the following conditions. That is

S1 [ S2 [ [ Si [ Sand

Si \ Sj 8i 6 j:Given an equivalence relation R dened on S, we can use the equivalence relation to form a partition or clas-sication of S. A partition of S is induced by Rk. However, it is dicult to determine the threshold value kwithout performing a lot of experiments on dierent ks.

3. Financial ratios

In accounting, nancial ratios on a balance sheet or income statement can be initially classied into fourcategories: solvency, protability, asset and debt turnover, and return on investment.

The nancial ratios that fall within these four categories are shown in Table 1.We chose to partition these nancial ratios based on these four categories. Financial ratios in dierent cat-

egories are considered to be unrelated, so that ratios in the same category are clustered together. The repre-sentative indicator for each cluster will be identied by the method presented in Section 4.

4. The clustering method

In this section, we propose a clustering method based on the variations of nancial ratio sequences of dif-ferent companies. This method partitions nancial ratio sequences into clusters and then selects the represen-tative indicators of the nancial ratios from these clusters. The details are as stated below:

Let D be the matrix consisting of n nancial ratios of m companies:

D yijnm; 1where yij is the value of nancial ratio i of company j.

Table 1

Source

Y.-J. Wang, H.-S. Lee / Information Sciences 178 (2008) 10871097 1089Long-term liabilities turnover Operation revenue/long-term liabilitiesStockholders equity turnover Operation revenue/ stockholders equityCurrent liabilities turnover Operation revenue/current liabilitiesTotal assets turnover Operation revenue/total assetsBalance sheet Solvency Current ratio Current assets/current liabilitiesFixed ratio Stockholders equity/xed assetsEquity ratio Stockholders equity/total assetsFixed/long-term ratio Fixed assets/long-term liabilitiesDebt ratio Total liabilities/ total assetsEquity/debt ratio Total liabilities/ stockholders equity

Income statement Protability Operation cost ratio Operation cost/operation revenueGross prot ratio (Operation revenue operation cost)/

operation revenueOperation prot ratio Operation income(loss)/ operation revenueIncome before tax ratio Income(loss) before tax/operation revenueNet income ratio Net income(loss)/operation revenue

Balance sheet and incomestatement

Return oninvestment

Return on current assets Net income(loss)/current assetsReturn on xed assets Net income(loss)/xed assetsReturn on total assets Net income(loss)/total assetsReturn on stockholders equity Net income(loss)/stockholders equityReturn on operation to capital Operation income(loss)/average capitalReturn on income before tax tocapital

Net income(loss)/average capital

Asset and debtturnover

Current assets turnover Operation revenue/current assetsFixed assets turnover Operation revenue/xed assetsCategory Ratio FormulaThe four categories of nancial ratiosTotal liabilities turnover Operation revenue/total liabilities

Then0 6 rhand

Proof

so yikTh

previ

Lemm

Proof

2. Symmetric: r(Yi,Yj) = r(Yj,Yi) "Yi,Yj 2 Y.

Sin

Based on R of Lemma 4.2, we can construct R with a given threshold value k. In Section 2,

1090 Y.-J. Wang, H.-S. Lee / Information Sciences 178 (2008) 10871097Rk = {(Yi,Yj)jr(Yi,Yj)P k "Yi,Yj 2 Y} and 0 6 k 6 1. Since R is a fuzzy compatible relation, Rk is also acompatible relation. In the case where k = 0, Rk partitions n nancial ratios into one cluster. In case thatk = 1, Rk commonly partitions n nancial ratios into n clusters. In what follows, we present a method to deter-

minece R satises both the reexive and symmetric law, R is a fuzzy compatible relation on Y. h

k1. Reexive: r(Yi,Yi) = 1 "Yi 2 Y.ous procedure should simply be reversed. Therefore, this proof has been omitted. h

a 4.2. The fuzzy relation R defined in (3) is a fuzzy compatible relation on Y.i j ik jk

On the other hand, we can also prove r(Yi,Yj) = 1 if yik = tyjk, t > 0 "k. To prove this statement, the= tyjk, t > 0 "k.at is to say, if r(Y ,Y ) = 1, then y = ty , t > 0 "k.rY i; Y j 1m 1Xm1

k1 1jMik; k 1 Mjk; k 1j

kMk 1:

To satisfy the equation, 1 jMik;k1Mjk;k1jkMk 1 8k.That is, Mi(k,k + 1) Mj(k,k + 1) = 0 "k.

yi;k1 yikPmt1yit2

q yj;k1 yjkPmt1yjt2

q 8k;. By (3), Lemma 4.1. Based on the two sequences Yi and Yj, we know that the fuzzy relation r(Yi,Yj) = 1 iff yik = tyjk,

t > 0 "k, where yik 2 Yi and yjk 2 Yj.r(Yi,Yj) measures the similarity between the two nancial ratio sequences Yi and Yj, where(Yi,Yj) 6 1. The nancial ratio Yi is closer to nancial ratio Yj, as r(Yi,Yj) approaches 1. On the other, the nancial ratio Yi is farther from nancial ratio Yj, as r(Yi,Yj) approaches 0.Assume Y is the set composed of all nancial ratios. Let Yi = (yi1,yi2, . . . ,yik, . . . ,yim) 2 Y denote thesequence of nancial ratio i consisting of m entries. Dene Mi(k,k + 1) to indicate the variation of companyk to k + 1 for Yi, where

Mik; k 1 yi;k1 yikPm

t1yit2q : 2

Let

R fY i; Y j; rY i; Y jjY i; Y j 2 Y g 3be a fuzzy relation dened on Y, where

rY i; Y j 1m 1Xm1

k1 1jMik; k 1 Mjk; k 1j

kMk

and

kMk maxi;k

fMik; k 1g mini;k

fMik; k 1g:k objectively.

Leis clok, a v

Thter re

ratio

Y2

Then

Y.-J. Wang, H.-S. Lee / Information Sciences 178 (2008) 10871097 1091rY i; Y j 1 if 1 6 i; j 6 2 or 3 6 i; j 6 4;0 otherwise:

Based, the fuzzy relation dened on Y1, Y2, Y3 and Y4 is presented as follows:Y3 = (40,30,20,10)andY4 = (400,300,200,100).

Observing the four patterns, we nd that Y1 and Y2 are similar in variation, as are Y3 and Y4, although Y2 andY3 are more similar than Y1 and Y2. Given the pattern variation, Y1 and Y2 should be in the same cluster, andY3 and Y4 should be in another cluster. Applying our clustering method to partition Y1, Y2, Y3 and Y4, wehave

Mik; k 1 0:183 if k 1; 2; 3 and i 1; 2;0:183 if k 1; 2; 3 and i 3; 4:

= (100,200,300,400),

Y1 = (1,2,3,4),sequences as shown in the following:relation. That is, the number of clusters increases as the value of the inter-cluster relation rises. Hence, we useCn(k)/n to represent the inter-cluster relation, where Cn(k)/n 6 1. On the other hand, the intra-cluster relationis represented by k. The relation between the nancial ratios within a cluster would be high if k is large. Thusthe validation index k Cn(k)/n would be large, as k is large and Cn(k)/n is small, which proves to be a goodpartition.

If Rk is an equivalence relation, we can use the equivalence relation to form a partition or classication ofY. The partition or classication of Y refers to a family of disjoint subsets, say, {SP1,SP2, . . . ,SPi, . . .}. Thepartition must satisfy the following conditions, i.e.

SP 1 [ SP 2 [ [ SP i [ Y 5and

SP i \ SP j 8i 6 j; 6where SPi indicates the set composed of the nancial ratios in cluster i.

In this paper, Rk merely satises the reexive and symmetric laws, but may not satisfy the transitive law. Itmay be that SPi \ SPj5 B, $i5 j, that is, a nancial ratio may belong to two dierent clusters. In such cases,a transitive closure [8,17] is commonly employed to construct an equivalence relation from the compatiblerelation. Here, we deal with the situation using a dierent approach. To solve the problem, two additionaldenitions are introduced as follows.

Denition 4.1. If r(Yi,Yk) 2 Rk, r(Yi,Yj) 2 Rk and r(Yk,Yj) 2 Rk, then Yi, Yk and Yj belong to the same cluster.

Denition 4.2. If Yt 2 SPi \ SPj and RSPi(Yt)P RSPj(Yt), then SP 0j SP j fY tg and SP 0j substitutes for SPjuntil SP i \ SP 0j , for i5 j, where RSP iY t minY k2SP ifrY t; Y kg and RSP jY t minY l2SP jfrY t; Y lg.A simple example is presented to demonstrate that the method is rationale. Assume there are four nanciale rationale for the validation index is presented as follows. A good partition should have a low inter-clus-lation value and a high intra-cluster relation value. If there is only one cluster, there is no inter-clusterk Cnk=n: 4t Cn(k) be the number of clusters partitioned by R . Cn(k) is closer to n as k approaches 1, whereas Cn(k)ser to 1 as k approaches 0. Once k is determined, Cn(k) is decided as well. To determine a suitable value ofalidation index is dened as follows:kon the above fuzzy relation, we can construct a 4 4 matrix corresponding to the relation:

11 1

0 0 1

0 0 1 1

26664

37775;

where r(Yi,Yj) = r(Yj,Yi), so r(Yj,Yi) is omitted when j < i.The partitions for dierent ks are shown in Table 2.According to the validation index, the sequences are partitioned into two clusters, (Y1,Y2) and (Y3,Y4),

which coincide with our intuition.Another example is also presented to demonstrate the eectiveness of the method. Assume that there are six

nancial ratio sequences, f1, f2, . . . , f6 of four companies A1, A2, A3, A4, shown in Table 3.According to Table 3, the variation for each pair of dierent companies is presented in Table 4.Based on Table 4, the matrix of the fuzzy relation is

1

0:56 1

0:93 0:62 1

0:98 0:57 0:94 1

0:64 0:92 0:67 0:63 1

0:57 0:97 0:62 0:58 0:93 1

2666666664

3777777775:

In the matrix, the entry r(Yi,Yj) = r(Yj,Yi), i, j = 1,2, . . . , 6, so r(Yj,Yi) is omitted when j < i.The partitions of ks with their validation indices are shown in Table 5, where we nd that the maximum

value of the validation index is 0.587 with k = 0.92. By taking k = 0.92, we partition the nancial ratios intotwo clusters.

10

f1f2f3f4f5f6

TableThe n

Ratio

f1

1092 Y.-J. Wang, H.-S. Lee / Information Sciences 178 (2008) 108710970.149 0.014 0.1760.513 0.048 0.5250.133 0.076 0.0190.158 0.039 0.158

0.396 0.010 0.4350.538 0.040 0.474Table 4The pair-wise variation for two dierent companies

Ratio Mi(A1,A2) Mi(A2,A3) Mi(A3,A4)

f2 0.55 0.12 0.16 0.60f3 0.19 0.26 0.30 0.29f4 0.50 0.70 0.75 0.55f5 0.66 0.25 0.26 0.71f6 0.88 0.20 0.25 0.852 (Y1,Y2), (Y3,Y4) 0.51 (Y1,Y2,Y3,Y4) 0.25

3ancial ratio sequences of four companies

A1 A2 A3 A4

0.62 0.84 0.86 0.60Table 2Partitions of dierent ks

k Cn(k) Clustering arrangement Validation index

Finally, we can select a representative indicator from each cluster according to Denitions 4.3 and 4.4. Thetwo denitions are described below.

Denition 4.3. Let cluster SPi be a set composed of several nancial ratios. Then Yt is a candidaterepresentative indicator of SPi as Yt 2 SPi and RSPi(Yt)P RSPi(Yl), "Yl 2 SPi.

Then, we can select a representative indicator by the following denition.

Denition 4.4. Let CSPi be the set consisting of candidate representative indicators in the cluster SPi. Then Ytis a representative indicator of SPi, if Yt 2 CSPi and ERSPi(Yt) 6 ERSPi(Yj), " Yj 2 CSPi, whereERSP iY t maxY k 62SP ifrY t; Y kg and ERSP iY j maxY k 62SP ifrY j; Y kg. Once the number of the represen-tative indicators in SPi is greater than one, we may choose any of them to serve as the representative.

By the above denitions, we can apply the clustering method to partition the nancial ratio sequences andselect the representative indicators for the empirical study as stated below.

Table 5Partitions of dierent ks


1 6 f1, f2, f3, f4, f5, f6 00.98 5 (f1, f4), f2, f3, f5, f6 0.1470.97 4 (f1, f4), (f2, f6), f3, f5 0.3030.93 3 (f1, f3, f4), (f2, f6), f5 0.4300.92 2 (f1, f3, f4), (f2, f5, f6) 0.5870.56 1 (f1, f2, f3, f4, f5, f6) 0.393

Y.-J. Wang, H.-S. Lee / Information Sciences 178 (2008) 10871097 10935. Empirical study

An empirical study is described in this section. The data come from four shipping companies, A1, A2, A3and A4, in Taiwan. Their nancial ratios are shown in Table 6.

Table 6The nancial ratios of four shipping companies

Ratio A1 A2 A3 A4

Current ratio (F1) 1.002 1.573 1.421 1.025Fixed ratio (F2) 2.742 2.409 1.621 1.428Equity ratio (F3) 0.578 0.532 0.527 0.665Fixed/long-term ratio (F4) 1.044 1.834 1.311 4.399Debt ratio (F5) 0.422 0.468 0.473 0.335Equity/debt ratio (F6) 0.729 0.879 0.898 0.503Operation cost ratio (F7) 0.844 0.983 0.979 0.933Gross prot ratio (F8) 0.156 0.017 0.021 0.067Operation prot ratio (F9) 0.054 0.014 0.003 0.032Income before tax ratio (F10) 0.069 0.049 0.019 0.123Net income ratio (F11) 0.059 0.005 0.036 0.079

Return on current assets (F12) 0.071 0.013 0.143 0.377Return on xed assets (F13) 0.072 0.030 0.114 0.189Return on total assets (F14) 0.015 0.007 0.037 0.088Return on stockholders equity (F15) 0.026 0.012 0.071 0.133Return on income before tax to capital (F16) 0.045 0.019 0.091 0.169Current assets turnover (F17) 1.205 2.586 3.929 4.784Fixed assets turnover (F18) 1.212 6.042 3.151 2.401Total assets turnover (F19) 0.256 1.335 1.024 1.119Stockholders equity turnover (F20) 0.442 2.509 1.943 1.682Current liabilities turnover (F21) 1.207 4.067 5.583 4.901Long-term liabilities turnover (F22) 1.265 11.083 4.132 10.562Total liabilities turnover (F23) 0.606 2.855 2.164 3.345

In Table 6, the debt ratio, equity/debt ratio and operation cost ratio are expressed in the form of areciprocal.

The nancial ratios in Table 1 were initially divided into four categories. We employ the fuzzy relation inSection 4 to construct relation matrices for the four categories. Let T1, T2, T3 and T4 represent relation matri-ces for solvency (F1 F6), protability (F7 F11), return on investment (F12 F16), and asset and debt turn-over (F17 F23), respectively.

1

0:78 1

266

377

and

In thSin

resultIn

we pa

TableThe cl

1094 Y.-J. Wang, H.-S. Lee / Information Sciences 178 (2008) 10871097k Cn(k) Clustering arrangement Validation index

1 6 F1,F2,F3,F4,F5,F6 00.97 5 (F3,F5),F1,F2,F4,F6 0.1370.90 4 (F3,F5,F6),F1,F2,F4 0.2330.78 3 (F1,F2), (F3,F5,F6),F4 0.2800.67 2 (F1,F2), (F3,F4,F5,F6) 0.3370.59 1 (F ,F ,F ,F ,F ,F ) 0.4230:85 0:92 0:90 0:97 1

T 4

1

0:61 1

0:77 0:84 1

0:72 0:89 0:95 1

0:89 0:72 0:80 0:82 1

0:60 0:82 0:78 0:76 0:58 1

0:79 0:76 0:92 0:87 0:77 0:81 1

2666666666664

3777777777775

:

e relation matrices, the entry r(Yi,Yj) = r(Yj,Yi), so r(Yj,Yi) is omitted when j < i.ce nancial ratios are initially divided into four categories, Cn(k) can be calculated respectively. Theis presented in Tables 710.Table 7, we nd that the maximum value of the validation index is 0.423 at k = 0.59. By taking k = 0.59,rtition the nancial ratios into one cluster.

7ustering arrangements for dierent ks in Category 13

0:88 0:89 0:93 1664 775T 2

1

0:79 1

0:71 0:61 1

0:64 0:53 0:82 1

0:73 0:85 0:71 0:53 1

6666664

7777775;

T

1

0:77 1

0:96 0:82 1

26666

37777T 1 0:75 0:84 1

0:63 0:59 0:67 1

0:72 0:82 0:97 0:69 1

0:66 0:77 0:90 0:73 0:93 1

6666664

7777775;

2 31 2 3 4 5 6

Y.-J. Wang, H.-S. Lee / Information Sciences 178 (2008) 10871097 1095Table 8The clustering arrangements for dierent ks in Category 2


1 5 F7,F8,F9,F10,F11 00.85 4 (F8,F11),F7,F9,F10 0.0500.82 3 (F8,F11), (F9,F10),F7 0.2200.73 2 (F7,F8,F11), (F9,F10) 0.3300.53 1 (F7,F8,F9,F10,F11) 0.330

Table 9The clustering arrangements for dierent ks in Category 3


1 5 F12,F13,F14,F15,F16 00.97 4 (F15,F16),F12,F13,F14 0.1700.96 3 (F12,F14), (F15,F16),F13 0.3600.89 2 (F12,F14), (F13,F15,F16) 0.4900.77 1 (F12,F13,F14,F15,F16) 0.570

Table 10The clustering arrangements for dierent ks in Category 4


1 7 F17,F18,F19,F20,F21,F22,F23 0In Table 8, we nd that the maximum value of the validation index is 0.330 at k = 0.73 or 0.53. By takingk = 0.73, we partition the nancial ratios into two clusters because 0.73 > 0.53.

In Table 9, we nd that the maximum value of the validation index is 0.570 at k = 0.77. By taking k = 0.77,we partition the nancial ratios into one cluster.

In Table 10, we nd that the maximum value of the validation index is 0.474 at k = 0.76. By takingk = 0.76, we partition the nancial ratios into two clusters.

Clearly, the maximum values of the validation index in the four categories are 0.423, 0.330, 0.570 and 0.474,respectively. The clustering outcomes of the nancial ratios in the four categories are shown in Table 11.

We then select the representative indicator of nancial ratios in each cluster according to Denitions 4.3and 4.4. The result is shown in Table 12.

Finally, we compare our clustering method with others to demonstrate the feasibility of the proposedmethod. Among the existing methods, K-means is commonly applied in clustering research. We intend to uti-lize K-means to prove whether our method is reasonable or not. A simple introduction to K-means is givenbelow.

K-means assigns each item to the cluster with the nearest mean. The process is composed of three steps:

Step 1. Partition items into k clusters.Step 2. Proceed through the list of items, and assign an item to the cluster whose mean is the closest; then

recalculate the mean for the cluster that received a new item and for the cluster which lost the item.Step 3. Repeat Step 2 until no more reassignments take place.

In Step 2, the distance from each item to the mean is expressed by the Euclidean distance. Thus the objec-tive function of K-means dened with the Euclidean norm is stated as follows:

0.95 6 (F19,F20),F17,F18,F21,F22,F23 0.0930.89 5 (F17,F21), (F19,F20),F18,F22,F23 0.1760.87 4 (F17,F21), (F19,F20,F23),F18,F22 0.2990.82 3 (F17,F21), (F18,F22), (F19,F20,F23) 0.3910.76 2 (F17,F21), (F18,F19,F20,F22,F23) 0.4740.58 1 (F17,F18,F19,F20,F21,F22,F23) 0.437

squarAc

ing rethe pminesK-me

Table 11The clustering arrangements for nancial ratios in the four categories

Category Cluster Ratio

Category 1: 1 F1,F2,F3,F4,F5,F6SolvencyCategory 2: 2 F7,F8,F11Protability 3 F9,F10Category 3: 4 F12,F13,F14,F15,F16Return on investmentCategory 4: 5 F17,F21Asset and debt turnover 6 F18,F19,F20,F22,F23

Table 12Representative indicators for the nancial ratios

Cluster Ratio Representative indicator

1 F1,F2,F3,F4,F5,F6 F5

TableThe sq

Cluste

(F17,F(F18,F

TableThe sq

Cluste

(F7,F8(F9,F1

1096 Y.-J. Wang, H.-S. Lee / Information Sciences 178 (2008) 10871097equences have to be normalized to meet the requirements of the K-means method. The calculation of theed distance sums is shown as follows.Min :Xk

i1

Xj2Ii

kxj zik2; 7where zi

Pj2Ii

xj

NuIi ; Ii represents cluster i, zi indicates the mean of cluster i and Nu(Ii) denotes the number ofitems on Ii.

Since K-means assigns any item to the cluster with the nearest mean, we can utilize the condition to deter-mine whether the result shown in Table 11 is reasonable or not. In Table 11, Categories 1 and 3 are partitionedinto one cluster, so we merely calculate the squared distance sums in Categories 2 and 4. In addition, the nan-cial s

2 F7,F8,F11 F83 F9,F10 F104 F12,F13,F14,F15,F16 F155 F17,F21 F176 F18,F19,F20,F22,F23 F19cording to the K-means calculation, the minimized distance sums in Tables 13 and 14 match the cluster-sult in Table 11, so the proposed method yields reasonable cluster outputs. Comparing the two methods,roposed method has advantages over the K-means method. First, the proposed method objectively deter-the number of clusters. Second, our method does not reassign any items to a cluster, in contrast to theans method. In short, our method is more practical than the K-means method.

14uared distance sums of nancial ratios in Category 4: Asset and debt turnover

r Squared distance sums to group means

F17 F18 F19 F20 F21 F22 F23

21) 0.0083 0.2883 0.0733 0.1079 0.0083 0.2055 0.0568

19,F20,F22,F23) 0.1470 0.0636 0.0062 0.0158 0.0958 0.0483 0.0322

13uared distance sums of nancial ratios in Category 2: Protability

r Squared distance sums to group means

F7 F8 F9 F10 F11

,F11) 0.1326 0.1218 2.5157 1.4137 0.1051

0) 1.7918 2.4983 0.1371 0.1371 1.5522

6. Conclusions

In this paper, we have proposed a clustering method based on the fuzzy relation derived from variations inthe nancial ratio sequences of dierent companies. As the number of clusters is unknown, the proposed clus-tering method objectively partitions nancial ratio sequences into clusters. Therefore, the clustering methodcan be applied in conditions where the cluster number is not determined. This is the major dierence betweenthe proposed method and the K-means method. On the other hand, the four denitions mentioned in thispaper can substitute for the transitive closure law; therefore, the representative indicators can be selected

Y.-J. Wang, H.-S. Lee / Information Sciences 178 (2008) 10871097 1097by this mechanism as well. In short, we provide a new approach to resolve tie-breaks in cluster outcomes.Additionally, the partitioned results are proven to be similar to those of the K-means method when comparedin an empirical study. That is to say, the proposed method has the strengths of the K-means method withoutmany of its weaknesses.

References

[1] S. Bandyopadhyay, U. Maulik, An evolutionary technique based on K-means algorithm for optimal clustering in RN, InformationSciences 146 (2002) 221237.

[2] J.S. Deogun, D. Kratsch, G. Steiner, An approximation algorithm for clustering graphs with dominating diametral path, InformationProcessing Letters 61 (1997) 121127.

[3] R. Dubes, A. Jains, Algorithm that Cluster Data, Prentice-hall, Englewood Clis, NJ, 1988.[4] R.O. Duda, P.E. Hart, Pattern Classication and Scene Analysis, Wiley, New York, 1973.[5] K.B. Eom, Fuzzy clustering approach in supervised sea-ice classication, Neurocomputing 25 (1999) 149166.[6] S.S. Epp, Discrete Mathematics with Applications, Wadsworth, Canada, 1990.[7] C.M. Feng, R.T. Wang, Performance evaluation for airlines including the consideration of nancial ratios, Journal of Air Transport

Management 6 (2000) 133142.[8] F. Guoyao, An algorithm for computing the transitive closure of a fuzzy similarity matrix, Fuzzy Sets and Systems 51 (1992) 189194.[9] S. Hirano, X. Sun, S. Tsumoto, Comparison of clustering methods for clinical databases, Information Sciences 159 (2004) 155165.[10] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis, Prentice-Hall, Englewood Clis, NJ, 1992.[11] L. Kaufman, P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York, 1990.[12] S.S. Khan, A. Ahmad, Cluster center initialization algorithm for K-means clustering, Pattern Recognition Letters 25 (2004) 1293

1302.[13] R. Krishnapuram, J.M. Keller, A possibilistic approach to clustering, IEEE Transactions on Fuzzy Systems 1 (1993) 98110.[14] R.J. Kuo, L.M. Ho, C.M. Hu, Integration of self-organizing feature map and K-means algorithm for market segmentation,

Computers and Operations Research 29 (2002) 14751493.[15] R.J. Kuo, K. Chang, S.Y. Chien, Integration of self-organizing feature maps and genetic algorithm based clustering method for

market segmentation, Journal of Organizational Computing and Electronic Commerce 14 (2004) 4360.[16] R.J. Kuo, J.L. Liao, C. Tu, Integration of ART2 neural network and genetic K-means algorithm for analyzing Web browsing paths in

electronic commerce, Decision Support Systems 40 (2005) 355374.[17] H.S. Lee, Automatic clustering of business processes in business systems, European Journal of Operational Research 114 (1999) 354

362.[18] J.B. MacQueen, Some methods for classication and analysis of multivariate observations, Proceedings of 5th Berkeley Symposium

on Mathematical Statistics and Probability, vol. 1, University of California Press, Berkeley, Calif, 1967, pp. 281297.[19] S. Miyamoto, Information clustering based on fuzzy multisets, Information Processing and Management 39 (2003) 195213.[20] W. Pedrycz, G. Vukovich, Logic-oriented fuzzy clustering, Pattern Recognition Letters 23 (2002) 15151527.[21] H. Ralambondrainy, A conceptual version of the K-means algorithm, Pattern Recognition Letters 16 (1995) 11471157.[22] B.M. Walter, F.M. Robert, Accounting: The Basis for Business Decisions, McGraw-Hill, New York, 1988.[23] K.L. Wu, M.S. Yang, Alternative c-means clustering algorithms, Pattern Recognition 35 (2002) 22672278.

A clustering method to identify representative financial ratiosIntroductionMathematical preliminariesFinancial ratiosThe clustering methodEmpirical studyConclusionsReferences

Documents

A Clustering Method to Identify Representative Financial Ratios