

J. Cent. South Univ. (2014) 21: 1935−1945 DOI: 10.1007/s11771-014-2140-z

Automated pattern-directed refactoring for complex conditional statements

LIU Wei(刘伟)1, HU Zhi-gang(胡志刚)1,2, LIU Hong-tao(刘宏韬)2, YANG Liu(杨柳)2

1. School of Information Science and Engineering, Central South University, Changsha 410083, China; 2. School of Software, Central South University, Changsha 410075, China

© Central South University Press and Springer-Verlag Berlin Heidelberg 2014

Abstract: Complex conditional statements are a bad code smell that degrades code quality and software design. In the proposed approach, two design patterns commonly used to handle complex conditional statements are selected: the factory method pattern and the strategy pattern. Two pattern-directed refactoring approaches based on these design patterns are proposed. Each approach contains an algorithm for identifying refactoring opportunities and an automated refactoring algorithm. By parsing the abstract syntax tree generated from source code, refactoring opportunities are identified effectively and automatically. The refactoring algorithms are then executed automatically on the candidate code to simplify or remove the complex conditional statements. Empirical analysis and quality assessment show that the refactored code has better maintainability and extensibility, and that the proposed approach succeeds in reducing code size and class complexity.

Key words: refactoring; abstract syntax tree; complex conditional statements; design patterns; factory method pattern; strategy pattern

1 Introduction

Complex conditional statements are one of the most common sources of complexity in a program. In object-oriented programming, it is recommended that source code contain few or no complex conditional statements. Commonly used conditional statements include complex if-then-else statements and switch-case statements. If program branches depend on conditional expressions, methods tend to become very long and classes very large. The length of a method is an important factor that makes source code harder to read, and conditional statements increase the difficulty further. As a result, bad smells defined by FOWLER and BECK [1], such as long method and large class, emerge. Moreover, the more long and complex conditional statements the source code contains, the harder the program is to test and maintain. Adding a new branch is also difficult, because the original source code must be modified, which violates the open-closed principle (OCP), a fundamental principle of object-oriented design. Reusability suffers as well, because it is difficult to reuse the statements inside a single conditional branch. Therefore, identifying and refactoring complex conditional statements in source code helps to improve code quality.

FOWLER and BECK [1] presented four things that make programs hard to work with, one of which is complex conditional logic, which makes programs hard to modify. Software developers therefore want conditional logic to be expressed as simply as possible. The classic book [1] defines conditional statements as one of the bad smells in code, called switch statements, and provides several refactorings for them, most of which introduce polymorphism. If the conditional statements switch on a type code, extract method is used to extract the switch statement, and move method is then used to move it onto the class where the polymorphism is needed. Replace type code with subclasses or replace type code with state/strategy can be used for further refactoring. After the inheritance structure is set up, replace conditional with polymorphism can be applied. Ref. [1] gave some examples of refactoring complex conditional statements manually, but did not provide solutions for automated refactoring.

Although many developers know the importance of refactoring, refactoring is still underused. VAKILIAN et al [2] conducted a field study of programmers working on their own code in their natural settings, and found that some programmers do not use some automated refactorings because they are unaware of the opportunities to use them. In other words, many programmers do not know when and how to refactor. MURPHY-HILL et al [3] examined the foundations of refactoring from several perspectives, studied refactoring tool usage, and evaluated some of the assumptions made by other researchers. They found three factors that may limit tool usage: awareness, opportunity and trust. Therefore, automated identification of refactoring opportunities is a crucial part of automated refactoring.

Received date: 2012−08−20; Accepted date: 2013−10−23
Corresponding author: HU Zhi-gang, Professor, PhD; Tel: +86−731−82656085; E-mail: [email protected]

A complete automated refactoring approach contains at least two stages: identifying refactoring opportunities automatically and implementing the refactoring automatically. Identifying refactoring opportunities is the first stage of refactoring the code of a software system, and several researchers have studied it. DALLAL [4] used 25 existing size, cohesion, and coupling metrics to predict whether a class needs to be restructured by extracting a subclass from it. He applied univariate logistic regression analysis, as well as models of combined metrics based on multivariate logistic regression analysis, to predict whether a class was in need of extract subclass refactoring (ESR). His results indicated a strong statistical relation between some of the quality metrics and the decision of whether ESR was required, and the models based on combinations of metrics had an outstanding ability to predict classes in need of ESR. TSANTALIS and CHATZIGEORGIOU [5] proposed a methodology for identifying move method refactoring opportunities, which resolve many common feature envy bad smells. They used the notion of distance between system entities (attributes/methods) and classes to extract a list of behavior-preserving refactorings based on the examination of a set of preconditions. TSANTALIS and CHATZIGEORGIOU [6−7] also proposed an approach that identifies opportunities for extract method refactoring based on the union of static slices produced by a block-based slicing technique. In addition, they proposed a technique that extracts refactoring suggestions introducing polymorphism as a solution to state-checking problems [8]. FOKAEFS et al [9] used an agglomerative clustering algorithm, which identifies cohesive sets of class members within the system classes, to recognize extract class opportunities. BAVOTA et al [10] proposed an extract class refactoring method based on graph theory that exploits structural and semantic relationships between methods. LIU et al [11] proposed a tool, Generalization Referee (GenReferee), to identify potential refactoring opportunities according to conceptual relationship, implementation similarity, structural correspondence and inheritance hierarchies. HIGO et al [12] proposed a set of metrics that suggest how code clones can be refactored, and developed a tool called Aries to compute these metrics automatically. KOMONDOOR and HORWITZ [13] identified duplication in source code by using program dependence graphs (PDGs) and program slicing to find isomorphic PDG subgraphs that represent clones.

Few of the aforementioned methods can automatically identify refactoring opportunities for complex conditional statements. In addition, few refactoring methods can automatically refactor complex conditional statements by introducing design patterns. Therefore, automated pattern-directed refactoring for complex conditional statements is significant and valuable work.

2 Overview of our approach

2.1 Pattern-directed refactoring

Pattern-directed refactoring was proposed by KERIEVSKY [14], who described more than 27 pattern-directed refactorings and gave general information and new insights about patterns and refactoring. Many refactorings in Ref. [14] are based on the gang of four (GoF) design patterns described by GAMMA et al [15]. Design patterns are descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context. They are a technique for documenting solutions to recurring design problems and for sharing design expertise in an application-independent fashion.

Pattern-directed refactoring is a hybrid technique that combines refactoring with design patterns. KERIEVSKY [14] considered that bad code smells are not inborn and that design patterns should not be decided up front. He emphasized implementing design patterns through refactoring, so as to improve the design quality of existing code.

How to refactor code toward specific design patterns automatically has become a new research field in software engineering. In Ref. [14], KERIEVSKY proposed a number of pattern-directed refactorings. For example, he presented a refactoring named replace conditional logic with strategy to handle complex conditional statements, and he considered that decreasing or removing conditional logic could clarify algorithms and simplify a class by moving variations on an algorithm to a hierarchy, thus avoiding the long method and large class smells. TSANTALIS and CHATZIGEORGIOU [8] proposed a method to identify refactoring opportunities towards the state/strategy design patterns. CHRISTOPOULOU et al [16] presented an algorithm for the automated identification of refactoring opportunities to the strategy design pattern. They proposed a technique for total replacement of conditional logic with method calls on the appropriate concrete strategy instances, and implemented the identification algorithm and the refactoring procedure in the JDeodorant Eclipse plug-in. JEBELEAN et al [17] proposed a logic approach to the automatic detection of places within object-oriented code where the composite design pattern could have been used. JEON et al [18] proposed an approach to the automated identification of refactoring opportunities to the abstract factory design pattern in a Java code base. JUILLERAT and HIRSBRUNNER [19] presented an implementation of the “form template method” refactoring and specified an algorithm for automated refactoring to the template method design pattern.

However, none of the above-mentioned pattern-directed refactorings is specific to the different types of complex conditional statements or considers the different situations in which they are applied. In this work, more design pattern-directed refactorings are considered and studied. For different contexts, different design patterns can be used to simplify or remove complex conditional statements and to improve the readability, testability, maintainability, and reusability of software.

2.2 Design pattern selection

Among the 23 GoF design patterns [15], some can simplify or remove complex conditional statements. A systematic study was carried out to decide whether each design pattern can be used to handle conditional statements, in which the intent and motivation of each design pattern were analyzed. Ref. [14] provides some information for choosing appropriate design patterns to deal with conditional logic, and Ref. [15] gives several approaches to finding a design pattern for a particular design problem. Finally, two GoF design patterns were selected: the factory method pattern and the strategy pattern.

If the conditional statements decide which subclass in a product inheritance hierarchy is created, the factory method pattern can split the conditional logic into new subclasses of a factory class. If the conditional statements decide which algorithm is used to solve a specific problem, the strategy pattern makes it possible to interchange algorithms or add a new algorithm without modifying the existing code.

The intents and descriptions of the two design patterns are listed in Table 1.

2.3 Overall architecture of refactoring process

Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure [1]. In Ref. [1], FOWLER and BECK proposed seventy-two refactorings in seven categories: composing methods, moving features between objects, organizing data, simplifying conditional expressions, making method calls simpler, dealing with generalization, and big refactorings. They gave manual methods for identifying refactoring opportunities and implementing refactorings. During software development, however, many developers do not know when or how to refactor. Thus, automated refactoring is a useful and valuable research subject in software engineering.

This approach is based on Java source code, but it can also be applied to other languages. In the approach, Java source code is represented as an abstract syntax tree (AST), which maps plain Java source code into a tree form. The tree is more convenient and reliable to analyze and modify programmatically than text-based source [20]. Every Java source file is represented entirely as a tree of AST nodes, all of which are subclasses of ASTNode. Each subclass is specialized for one element of the Java programming language. For example, there are nodes for method declarations (MethodDeclaration), package declarations (PackageDeclaration), variable declarations (VariableDeclarationFragment), assignment statements and so on. The AST serves as an intermediate representation on which the subsequent identification and refactoring are based.
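As a rough illustration of what traversing such a tree looks like, the sketch below parses a small class and lists a few node kinds. It deliberately swaps in the JDK's built-in compiler tree API (`com.sun.source`, shipped with a full JDK) for the Eclipse AST used in this approach, so that it runs without extra libraries; `MethodTree`, `IfTree` and `NewClassTree` play roughly the roles of the MethodDeclaration, IfStatement and ClassInstanceCreation nodes. The class `AstNodeLister`, the helper `StringSource` and the sample `LoggerFactory` source are hypothetical names introduced only for this example.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IfTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.NewClassTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: lists selected AST nodes of a source string.
// The JDK compiler tree API is a stand-in for the Eclipse AST here.
class AstNodeLister {

    /** In-memory source file, so the sketch needs no files on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses the source (syntax only, no type resolution) and records methods, ifs and object creations. */
    static List<String> listNodes(String className, String code) {
        // getSystemJavaCompiler() returns null on a runtime without a compiler.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, code)));
        List<String> found = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree node, Void unused) {
                        found.add("method " + node.getName());
                        return super.visitMethod(node, unused); // keep descending
                    }

                    @Override
                    public Void visitIf(IfTree node, Void unused) {
                        found.add("if");
                        return super.visitIf(node, unused);
                    }

                    @Override
                    public Void visitNewClass(NewClassTree node, Void unused) {
                        found.add("new " + node.getIdentifier());
                        return super.visitNewClass(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "class LoggerFactory {\n"
                + "    Logger createLogger(String type) {\n"
                + "        if (type.equals(\"database\")) { return new DatabaseLogger(); }\n"
                + "        return new FileLogger();\n"
                + "    }\n"
                + "}\n";
        listNodes("LoggerFactory", source).forEach(System.out::println);
    }
}
```

Because only the parser is run, the sample source may refer to types such as `Logger` that are never declared; resolving types, which the ancestor check in the identification algorithm needs, would require a further analysis step.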

Table 1 Two commonly-used design patterns for handling complex conditional statements

| Design pattern | Intent | Description |
| --- | --- | --- |
| Factory method pattern | Define an interface for creating an object, but let subclasses decide which class to instantiate. Factory method lets a class defer instantiation to subclasses. | Split object creation code into several subclasses, avoid forming complex conditional statements for creating different subclasses of a product inheritance hierarchy. |
| Strategy pattern | Define a family of algorithms, encapsulate each one, and make them interchangeable. Strategy lets the algorithm vary independently from clients that use it. | Split different algorithms into a strategy inheritance hierarchy, provide an abstract strategy class to declare an algorithm interface and a series of different concrete strategy subclasses to implement the algorithm. |

Figure 1 shows an overview of the refactoring process in our approach. The approach consists of three main stages: a transformation stage that parses Java source code and transforms it into an AST, an inference stage that identifies refactoring opportunities and selects an appropriate design pattern, and an implementation stage that optimizes and refactors the existing code.

Fig. 1 Overview of refactoring process

In Fig. 1, whichever design pattern is used in the later stages, the first stage, parsing the AST, is the same. Eclipse is used as the integrated development environment (IDE) to analyze Java source code and to implement refactoring [21]. Eclipse provides the Java development tools (JDT) and the Eclipse AST to handle Java source code. Eclipse JDT contains a group of APIs for accessing and manipulating source code, and it offers two different ways to access Java source code: the Java model and the AST. The Eclipse AST is an important part of Eclipse JDT and is defined in the package org.eclipse.jdt.core.dom. It includes classes to create, read, modify, and delete source code. The Eclipse AST is designed around the factory method pattern and the visitor pattern [15], which makes it easy for developers to handle Java source code and gives it good extensibility and flexibility.

The second stage in Fig. 1 identifies refactoring opportunities and selects a suitable design pattern for refactoring. Complex conditional statements are identified in this stage. After the conditional statements in the source code are detected, the context in which they are applied needs to be analyzed. Different identification algorithms are designed to detect different application contexts of complex conditional statements. In the next two sections, two identification algorithms are proposed for pattern-directed refactoring toward the factory method pattern and the strategy pattern, respectively. If an identification algorithm identifies a refactoring opportunity, the corresponding design pattern is selected to refactor the existing code.

After a refactoring opportunity has been identified and an appropriate design pattern selected, the last stage executes automated optimization and refactoring toward the specific design pattern. Automated pattern-directed refactoring algorithms are also proposed in the next two sections. If users are not satisfied with the results, they can modify the refactored code. Because the overall structure of the code has already been refactored, only small modifications are needed.

Automatically identifying refactoring opportunities and refactoring existing code helps software developers to find the right time to refactor and reduces the errors introduced by manual refactoring.

3 Factory method pattern-directed refactoring

3.1 Overview of factory method pattern

The factory method pattern is also known as the virtual constructor pattern. Its intent is to define an interface for creating an object, but let subclasses decide which class to instantiate. The factory method lets a class defer instantiation to subclasses.

In the factory method pattern, there is an existing inheritance hierarchy called product, which includes an abstract product class and several concrete product classes. Before the factory method pattern is applied, the source code may contain a large factory class with a huge factory method that creates the different concrete products. This factory method usually has a type code parameter that decides which concrete product is created; different parameter values yield different kinds of returned products. The return type of this single factory method is the abstract product. Since there is only one factory method in the source code, the kind of concrete product is decided by conditional statements, and existing code must be modified whenever a new product needs to be created. If conditional statements are used to decide which subclass in an inheritance hierarchy is created, the factory method pattern offers a solution: it splits the conditional logic into new subclasses of the creator class.

3.2 Refactoring algorithm

In the factory method pattern, object creation code is split into several factory subclasses, which avoids forming complex conditional statements for creating different subclasses of a product inheritance hierarchy. The long factory method is refactored into a factory hierarchy: a super factory (an abstract class or interface) declares the factory method, and a series of concrete factory subclasses implement it. Each subclass is responsible for creating a single, specific product. If a new concrete product class needs to be added to the system, a new concrete factory class is added to create it. None of the existing code needs to be modified, so the system has better extensibility and flexibility.
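The structure before and after this refactoring can be sketched with the logger example that Section 3.3 discusses. `LoggerFactory` and `createLogger()` are taken from that section; the product classes `DatabaseLogger` and `FileLogger`, the `write` method, the type code strings and the name `ConditionalLoggerFactory` are assumptions made for illustration, since the paper's figure is not reproduced here.

```java
// Hypothetical product hierarchy (names are assumptions for illustration).
interface Logger {
    String write(String message);
}

class DatabaseLogger implements Logger {
    @Override
    public String write(String message) {
        return "db: " + message;
    }
}

class FileLogger implements Logger {
    @Override
    public String write(String message) {
        return "file: " + message;
    }
}

// Before: a single factory method picks the product from a type code,
// so every new product adds a branch to this method.
class ConditionalLoggerFactory {
    Logger createLogger(String type) {
        if (type.equals("database")) {
            return new DatabaseLogger();
        } else if (type.equals("file")) {
            return new FileLogger();
        }
        throw new IllegalArgumentException("unknown logger type: " + type);
    }
}

// After: the factory is abstract and its factory method has no type code.
abstract class LoggerFactory {
    abstract Logger createLogger();
}

// Each concrete factory holds the body of exactly one former branch.
class DatabaseLoggerFactory extends LoggerFactory {
    @Override
    Logger createLogger() {
        return new DatabaseLogger();
    }
}

class FileLoggerFactory extends LoggerFactory {
    @Override
    Logger createLogger() {
        return new FileLogger();
    }
}

class FactoryMethodDemo {
    public static void main(String[] args) {
        // The client chooses a factory instead of passing a type code.
        LoggerFactory factory = new FileLoggerFactory();
        System.out.println(factory.createLogger().write("started")); // prints "file: started"
    }
}
```

Adding a third logger under this design means adding one product class and one factory subclass; neither `LoggerFactory` nor the existing subclasses change.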

Before refactoring is implemented, the refactoring opportunities must be identified. This approach proposes an algorithm for identifying opportunities to refactor toward the factory method pattern. The algorithm is listed in Table 2.

In Table 2, for each method in a Java file, the return type of the method is checked first (Lines 4−8). If the return type is void, the method cannot be a factory method. Furthermore, if the method has no parameter, there is no type code parameter (Lines 9−13), and the method probably does not contain complex conditional statements. If an IfStatement node exists, the refactoring opportunity is examined further (Lines 14−39). For each IfStatement node, all the class instance creation nodes in the IfStatement are detected. For each of them, the name of the instantiated class is obtained and the names of all its ancestors are saved into a list. If the method’s return type equals one of the ancestor types in the list, an object of a subclass of the return type is created in the method body, which is an important indicator for refactoring toward the factory method pattern. If none of the classes instantiated in a branch is a subclass of the method’s return type, the factory method pattern cannot be used for refactoring. Otherwise, if the parameter of the method is also used in the condition expression, an entry is added to a map named StatementBlockMap, with the subclass name of the created product as the key and the branch statements as the value. Finally, the StatementBlockMap is returned (Line 40).

After the refactoring opportunities in the source code have been identified, the pattern-directed refactoring is executed automatically. The factory method pattern-directed refactoring algorithm is listed in Table 3. Its input is the map that stores the subclass names of products and the conditional statement blocks, and its output is the refactored source code. First, the original factory class is changed into an abstract class (Line 3). Then, the parameter list and body of the original factory method are removed, and the method is changed into an abstract method (Lines 4−5). After that, for each key/value pair in the input map, a new source file is created to form the factory inheritance structure (Lines 6−11), as follows: 1) a new concrete factory class file named key (product name) + “Factory” is created automatically, and a factory method is added to the new concrete factory class; 2) the corresponding value (branch statements) of the map is added to the new method; 3) the new Java class file is saved.

Table 2 Factory method pattern-directed refactoring opportunities identification algorithm

Line  Content
1     Input: The source code before refactoring
2     Output: A map saving subclass names of products and conditional statement blocks (null map means that source code does not need to refactor)
3     for each method in Java file
4         if return type of method is void
5             continue
6         else
7             get return type
8         end if
9         if method has a parameter
10            get parameter name
11        else
12            continue
13        end if
14        define a variable named flag (its initial value is false)
15        if an IfStatement node exists in method
16            for each IfStatement node
17                assign false to flag
18                for each ClassInstanceCreation node in IfStatement node
19                    get class name of ClassInstanceCreation node
20                    save all of ancestor classes’ names of this class into a list
21                    for each ancestor class name in list
22                        if return type name equals an ancestor class name
23                            assign true to flag
24                        end if
25                    end for
26                end for
27                if flag is false
28                    break
29                else
30                    if parameter is used in condition expression
31                        get subclass name of product in node of ClassInstanceCreation
32                        get statements in node of ExpressionStatement
33                        add subclass name of Product and statements into a map named StatementBlockMap, subclass name as key and statements as value
34                    else
35                        break
36                    end if
37                end if
38            end for
39        end if
40        return StatementBlockMap
41    end for
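The control flow of Table 2 can be sketched on a toy model in which the facts that the algorithm reads from AST nodes are supplied by hand. This is only an approximation under stated assumptions: `FactoryOpportunityFinder`, `Branch`, `ancestorsOf` and the child-to-parent map `parentOf` are hypothetical names, a branch is reduced to plain strings, and "parameter is used in condition expression" is approximated by a substring test.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model: the facts that Table 2 reads from the AST are passed in by hand.
class FactoryOpportunityFinder {

    /** One branch of an if statement, reduced to text. */
    static final class Branch {
        final String condition;            // condition text, e.g. type.equals("file")
        final List<String> createdClasses; // classes instantiated with `new` in the branch
        final String statements;           // branch body as text

        Branch(String condition, List<String> createdClasses, String statements) {
            this.condition = condition;
            this.createdClasses = createdClasses;
            this.statements = statements;
        }
    }

    /** Collects all ancestors of a class by walking a child -> parent map (assumed acyclic). */
    static List<String> ancestorsOf(String className, Map<String, String> parentOf) {
        List<String> ancestors = new ArrayList<>();
        for (String p = parentOf.get(className); p != null; p = parentOf.get(p)) {
            ancestors.add(p);
        }
        return ancestors;
    }

    /** Returns the StatementBlockMap (product subclass -> branch statements); empty if there is no opportunity. */
    static Map<String, String> identify(String returnType, List<String> parameters,
                                        List<Branch> branches, Map<String, String> parentOf) {
        Map<String, String> statementBlockMap = new LinkedHashMap<>();
        if (returnType.equals("void") || parameters.isEmpty()) {
            return statementBlockMap; // Lines 4-13: not a candidate factory method
        }
        for (Branch branch : branches) {
            String product = null;
            for (String created : branch.createdClasses) {
                // Lines 18-26: does the branch create a subclass of the return type?
                if (ancestorsOf(created, parentOf).contains(returnType)) {
                    product = created;
                }
            }
            // Crude stand-in for "parameter is used in condition expression".
            boolean usesParameter = parameters.stream().anyMatch(branch.condition::contains);
            if (product == null || !usesParameter) {
                break; // Lines 27-28 and 34-35
            }
            statementBlockMap.put(product, branch.statements); // Lines 31-33
        }
        return statementBlockMap;
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = Map.of("DatabaseLogger", "Logger", "FileLogger", "Logger");
        List<Branch> branches = List.of(
                new Branch("type.equals(\"database\")", List.of("DatabaseLogger"), "return new DatabaseLogger();"),
                new Branch("type.equals(\"file\")", List.of("FileLogger"), "return new FileLogger();"));
        // prints [DatabaseLogger, FileLogger]
        System.out.println(identify("Logger", List.of("type"), branches, parentOf).keySet());
    }
}
```

A real implementation would obtain the branches, the instantiated classes and the type hierarchy from the AST and its bindings rather than from hand-written strings.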


Table 3 Factory method pattern-directed refactoring algorithm

Line  Content
1     Input: Map saving subclass names of products and conditional statement blocks
2     Output: Source code after refactoring
3     modify original factory class to an abstract class
4     remove parameter list and method body of method
5     modify original factory method to an abstract method
6     for each key/value in input map
7         create a new concrete Factory class file, class name is key (Product name) + “Factory”, and define new concrete Factory as subclass of original Factory
8         add a method to concrete Factory class, and method has the same signature as method in abstract Factory class
9         copy value (branch statements) of map to new added method
10        save the new class file
11    end for
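The steps of Table 3 can be approximated by a small generator that emits source text. This is a sketch under simplifying assumptions, not the tool's implementation: it builds strings instead of rewriting an AST, it returns the generated files in a map instead of saving them, and the class `FactoryGenerator` and its parameters are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator: produces source text for the factory hierarchy of Table 3.
class FactoryGenerator {

    /**
     * Returns generated source text keyed by file name.
     * statementBlockMap maps a product subclass name to the statements of its branch.
     */
    static Map<String, String> generate(String factoryName, String methodName,
                                        String productType, Map<String, String> statementBlockMap) {
        Map<String, String> files = new LinkedHashMap<>();

        // Lines 3-5: the factory becomes abstract; its method loses parameters and body.
        files.put(factoryName + ".java",
                "abstract class " + factoryName + " {\n"
                + "    abstract " + productType + " " + methodName + "();\n"
                + "}\n");

        // Lines 6-11: one concrete factory per key/value pair.
        for (Map.Entry<String, String> entry : statementBlockMap.entrySet()) {
            String concreteName = entry.getKey() + "Factory"; // Line 7: key + "Factory"
            files.put(concreteName + ".java",
                    "class " + concreteName + " extends " + factoryName + " {\n"
                    + "    @Override\n"
                    + "    " + productType + " " + methodName + "() {\n"   // Line 8: same signature
                    + "        " + entry.getValue() + "\n"                 // Line 9: branch statements
                    + "    }\n"
                    + "}\n");
        }
        return files; // Line 10 would save each entry as a class file
    }

    public static void main(String[] args) {
        Map<String, String> blocks = new LinkedHashMap<>();
        blocks.put("DatabaseLogger", "return new DatabaseLogger();");
        blocks.put("FileLogger", "return new FileLogger();");
        generate("LoggerFactory", "createLogger", "Logger", blocks)
                .forEach((name, text) -> System.out.println("// " + name + "\n" + text));
    }
}
```

Copying the branch statements verbatim only works when they no longer depend on the removed type code parameter, which is one reason users may still need to make small adjustments to the refactored code.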

3.3 Instance

Consider the logging tool shown in Fig. 2, which records the logs of a software system. The tool provides several ways to save logs, such as a database-based logger and a log-file-based logger, and each way is implemented by a logger class. Before the factory method pattern is applied, all kinds of logger objects are created by the LoggerFactory class. LoggerFactory contains a factory method named createLogger() that creates the various logger objects, so conditional statements are unavoidable in this method: in each branch, a specific type code decides which logger object is created. Because createLogger() contains all of the code that judges the logger type and creates every logger object, it is a very long method and gives rise to bad code smells such as long method, large class and switch statements. After refactoring, the complex conditional statements are split among subclasses of LoggerFactory called concrete factories. Each ConcreteFactory class encapsulates the code that creates one specific logger object. If a new kind of logger is needed, only a new concrete factory is added, and no original source code has to be modified.

4 Strategy pattern-directed refactoring

4.1 Overview of strategy pattern

The strategy pattern is also a very commonly used design pattern. It defines a family of algorithms, encapsulates each one, and makes them interchangeable. Strategy lets the algorithm vary independently from the clients that use it. A function can often be implemented by several different algorithms. Before the strategy pattern is applied, conditional statements decide which algorithm is selected. All of the algorithms are implemented in one long method, although they are interchangeable. Such a solution brings bad code smells such as long method, large class and switch statements. In addition, the choice of algorithm is usually determined by a parameter of the method. Adding a new algorithm means adding a new branch, so the original code has to be changed. The system therefore has poor scalability and reusability, and refactoring is imperative. The strategy pattern provides an abstract type that defines the algorithm interface and a series of concrete subclasses, each corresponding to an alternative implementation of the algorithm.

4.2 Refactoring algorithm

In the strategy pattern, the definition and the usage of the algorithms are separated. The context class uses the algorithms, and the strategy classes implement them. The algorithms are detached from the context class and form an independent inheritance structure. The abstract strategy class declares the algorithm interface, and the concrete strategy classes implement the algorithms. To add a new algorithm, a new concrete strategy class is added as a subclass of the abstract strategy class, so the original code does not need to be modified. To replace an algorithm, the client class injects the alternative algorithm into the context class, and none of the existing code in the strategy classes or the context class needs to change, so the system has better extensibility and flexibility.
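A minimal sketch of this structure, borrowing the cinema-ticket scenario of Section 4.3, is given below. MovieTicket and calculate() are named in the paper, and the Strategy and ConcreteStrategy + type code names follow the naming used in Table 5. The method signature, the constructor and all discount rates are invented for illustration.

```java
// Abstract strategy: declares the algorithm interface.
abstract class Strategy {
    public abstract double calculate(double price);
}

// One concrete strategy per former conditional branch (rates are invented).
class ConcreteStrategyStudent extends Strategy {
    @Override
    public double calculate(double price) { return price * 0.8; }
}

class ConcreteStrategyChildren extends Strategy {
    @Override
    public double calculate(double price) { return Math.max(0.0, price - 10.0); }
}

class ConcreteStrategyMember extends Strategy {
    @Override
    public double calculate(double price) { return price * 0.5; }
}

// Context class: uses an algorithm without knowing which one it is.
class MovieTicket {
    private final double price;

    MovieTicket(double price) { this.price = price; }

    // Before refactoring, this method would receive a type code and branch
    // on it with an if-else chain. Here the client injects the algorithm.
    public double calculate(Strategy strategy) {
        return strategy.calculate(price);
    }
}

class MovieTicketDemo {
    public static void main(String[] args) {
        MovieTicket ticket = new MovieTicket(60.0);
        System.out.println(ticket.calculate(new ConcreteStrategyChildren()));
        System.out.println(ticket.calculate(new ConcreteStrategyMember()));
    }
}
```

In this sketch a new discount rule is one more Strategy subclass, and replacing a rule only changes which object the client passes in.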

As with the factory method pattern-directed refactoring, the complete algorithm of the strategy pattern-directed refactoring contains two parts: a refactoring opportunities identification algorithm and a refactoring algorithm. The strategy pattern-directed refactoring opportunities identification algorithm is listed in Table 4.

In Table 4, each method in a Java file is examined. First, if the method has no parameter, there is no type code parameter, so the method probably does not contain complex conditional statements of this kind and is skipped (Lines 4−8). Next, if an IfStatement node exists in the method, the refactoring opportunities are identified further (Lines 9−21). For each IfStatement node, if the parameter of the method is used in the conditional expression, the type code in the conditional expression and the statements in the ExpressionStatement node are obtained, and they are added to a map named StatementBlockMap with the type code as the key and the statements as the value. Finally, the StatementBlockMap is returned (Line 22).
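The identification step for a single method can be sketched as follows. This is not the authors' implementation: instead of a real abstract syntax tree it uses two toy classes, IfBranch and MethodModel, which are hypothetical stand-ins for IfStatement nodes and method declarations. It also assumes that the first parameter is the type code and that an empty map is reported as null.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for one IfStatement node of the form:
//   if (<variable> == <typeCode>) { <statements> }
class IfBranch {
    final String variable;
    final String typeCode;
    final List<String> statements;

    IfBranch(String variable, String typeCode, String... statements) {
        this.variable = variable;
        this.typeCode = typeCode;
        this.statements = Arrays.asList(statements);
    }
}

// Toy stand-in for a method declaration: parameter names plus its if branches.
class MethodModel {
    final List<String> parameters;
    final List<IfBranch> branches;

    MethodModel(List<String> parameters, List<IfBranch> branches) {
        this.parameters = parameters;
        this.branches = branches;
    }
}

class StrategyOpportunityFinder {
    // Returns type code -> branch statements, or null when the method
    // offers no refactoring opportunity.
    static Map<String, List<String>> identify(MethodModel method) {
        if (method.parameters.isEmpty()) {
            return null;                       // no parameter, so no type code
        }
        if (method.branches.isEmpty()) {
            return null;                       // no IfStatement node
        }
        String parameter = method.parameters.get(0);
        Map<String, List<String>> statementBlockMap = new LinkedHashMap<>();
        for (IfBranch branch : method.branches) {
            if (!branch.variable.equals(parameter)) {
                break;                         // condition does not test the parameter
            }
            statementBlockMap.put(branch.typeCode, branch.statements);
        }
        return statementBlockMap.isEmpty() ? null : statementBlockMap;
    }
}
```

A LinkedHashMap is used so that the branches keep their source order, which makes any code generated from the map easier to compare with the original method.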

Fig. 2 A refactoring instance introducing factory method pattern

After the strategy pattern refactoring opportunities in the source code are identified, the pattern-directed refactoring is executed automatically. The strategy pattern-directed refactoring algorithm is listed in Table 5. The inputs of the algorithm are the original method name and the map that saves the type codes and conditional statement blocks, and the output is the source code after refactoring. First, a new abstract class named Strategy is created as the abstract strategy class (Line 3), and a method with the same name as the first input parameter is added to it (Line 4). Next, the original class (the context class) is modified (Line 5): the parameter of the original long method is deleted, a new parameter of type Strategy named strategy is added, and a line of code is added to the method to call the method declared in the Strategy class. After that, for each key/value pair in the input map, new source files are created to form the strategy inheritance structure (Lines 6−11). Specifically: 1) a new concrete strategy class file named ConcreteStrategy + code type is created automatically as a subclass of Strategy; 2) a method with the same signature as the method in the abstract strategy class is added to the concrete strategy class; 3) the corresponding value (branch statements) of the map is copied to the new method; 4) the new Java class file is saved.

4.3 Instance

Consider the cinema ticketing system shown in Fig. 3. The system offers different discounts for different types of customers, such as a student discount, a children discount and a member discount. Before the strategy pattern is applied, the type decision and the ticket price calculation are both implemented in the calculate() method of the MovieTicket class. This causes bad code smells such as long method, large class and switch statements. After refactoring, the conditional statements are split into a series of subclasses called concrete strategies. Each ConcreteStrategy class encapsulates one way of calculating the discount. If a new way is needed, only a new ConcreteStrategy is added, and no existing source code has to be modified.

Table 4 Strategy pattern-directed refactoring opportunities identification algorithm

Line  Content
1     Input: Source code before refactoring
2     Output: A map saving type code and conditional statement blocks (Map is null means that source code does not need to refactor)
3     for each method in Java file
4         if method has a parameter
5             get parameter name
6         else
7             continue
8         end if
9         if an IfStatement node exists in method
10            for each IfStatement node
11                if parameter is used in conditional expression
12                    get type code in conditional expression
13                    get statements in node of ExpressionStatement
14                    add type code and statements into a map named StatementBlockMap, with type code as key and statements as value
15                else
16                    break
17                end if
18            end for
19        else
20            continue
21        end if
22        return StatementBlockMap
23    end for

Table 5 Strategy pattern-directed refactoring algorithm

Line  Content
1     Input: Method name, map saving type codes and conditional statement blocks
2     Output: Source code after refactoring
3     create an abstract strategy class named Strategy
4     add a method into abstract strategy class, and method name is the first input parameter
5     modify original class, delete original parameter, change method parameter to a strategy object named strategy, and add a code line in this new method to call another method which is declared in Strategy class
6     for each key/value in input map
7         create a new concrete strategy class file, class name is “ConcreteStrategy” + code type, and define new concrete strategy as subclass of Strategy
8         add a method to concrete strategy class, and method has same signature as method in abstract strategy class
9         copy value (branch statements) of map to new added method
10        save new class file
11    end for

5 Experiments and results

The evaluation of the refactoring algorithms is based on the manual refactoring of the candidate cases. In order to evaluate the correctness and effectiveness of the algorithms, a series of test cases are designed. All of the test cases are provided by testers who did not participate in the construction and implementation of the above algorithms. The quality assessment of the refactoring algorithms is based on three metrics, namely precision, recall and accuracy. In order to calculate these metrics, four values are defined as follows.

1) True positive (TP): The number of test cases that are correctly refactored by the algorithms. The refactoring opportunities are valid.

2) False positive (FP): The number of test cases that are incorrectly refactored by the algorithms. The refactoring opportunities are invalid.

3) True negative (TN): The number of test cases that are correctly rejected by the algorithms. The refactoring opportunities are invalid.

4) False negative (FN): The number of test cases that are falsely rejected by the algorithms. The refactoring opportunities are valid.

Precision (P), recall (R) and accuracy (A) are calculated from the values of TP, FP, TN and FN by the following formulas.

$$P = \frac{T_P}{T_P + F_P} \qquad (1)$$

$$R = \frac{T_P}{T_P + F_N} \qquad (2)$$

$$A = \frac{T_P + T_N}{T_P + F_P + T_N + F_N} \qquad (3)$$
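The three formulas can be checked with a few lines of Java. The sketch below is only a worked calculation: the class and method names are hypothetical, and the counts passed in main() are the TP, FP, TN and FN values reported in Table 6. It assumes that no denominator is zero.

```java
import java.util.Locale;

class RefactoringMetrics {
    // Precision, Eq. (1), as a percentage.
    static double precision(int tp, int fp) {
        return 100.0 * tp / (tp + fp);
    }

    // Recall, Eq. (2), as a percentage.
    static double recall(int tp, int fn) {
        return 100.0 * tp / (tp + fn);
    }

    // Accuracy, Eq. (3), as a percentage.
    static double accuracy(int tp, int fp, int tn, int fn) {
        return 100.0 * (tp + tn) / (tp + fp + tn + fn);
    }

    // Formats the three metrics with two decimals, independent of the system locale.
    static String report(int tp, int fp, int tn, int fn) {
        return String.format(Locale.ROOT, "P=%.2f R=%.2f A=%.2f",
                precision(tp, fp), recall(tp, fn), accuracy(tp, fp, tn, fn));
    }

    public static void main(String[] args) {
        System.out.println(report(13, 0, 2, 5)); // factory method: P=100.00 R=72.22 A=75.00
        System.out.println(report(12, 2, 2, 4)); // strategy:       P=85.71 R=75.00 A=70.00
    }
}
```

Both output lines agree with the percentages listed in Table 6.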

For each refactoring algorithm, 20 test cases are designed and selected by 5 different testers. Some of the test cases are extracted from several well-known Java class libraries and open-source Java projects, such as AWT, JUnit, JRefactory and JEdit. After evaluation and calculation, the results are listed in Table 6.

Fig. 3 Refactoring instance introducing strategy pattern

Table 6 Evaluation of refactoring algorithms

Design pattern   TP   FP   TN   FN   P/%      R/%     A/%
Factory method   13   0    2    5    100.00   72.22   75.00
Strategy         12   2    2    4    85.71    75.00   70.00

The experimental results show that the refactoring algorithms for both design patterns are imperfect: there are some false positive and false negative test cases.

The reasons for the false positives (FP) in the evaluation of the strategy pattern are as follows:

1) Some conditional statements are refactored to the strategy pattern although there is no replaceable algorithm in the conditional branches. For example, when each branch contains object creation statements selected by a type code, the code should be refactored to the factory method pattern. However, in the proposed approach it also satisfies the conditions for refactoring to the strategy pattern, so the conditional statements are refactored to the strategy pattern. This is a false positive instance.

2) In the proposed approach, some simple instances are refactored to the strategy pattern. For example, every conditional branch contains only a few variable assignment statements, such as “if (i==1) {j=1;} else if (i==2) {j=2;} ...”. In such cases, refactoring is not an appropriate choice.

The reasons for the false negatives (FN) are as follows:

1) The proposed approach relies on a single parameter of the candidate method. However, a method may have more than one parameter. The current approach cannot deal with multiple parameters and therefore cannot identify such refactoring opportunities.

2) If there are nested conditional statements in the code before refactoring, the proposed approach cannot handle the nesting effectively, so some refactoring opportunities are lost.

3) The type code is not passed as a parameter but is an attribute of the class. The attribute has a pair of getter and setter methods, and its value is assigned through the setter method. In the conditional expressions of the candidate method, the attribute is compared to decide which algorithm is used or which object is created. In its current version, the proposed approach cannot deal with an attribute that acts as the type code.

In future work, the refactoring opportunities identification algorithms and the automated refactoring algorithms will be improved to obtain better precision and accuracy, and more complex situations will be considered.

Since pattern-directed refactoring aims at improving code quality, some software quality metrics are calculated to evaluate the refactoring effects. Cyclomatic complexity and method lines of code (MLOC) are used to assess the improvement after the design patterns are introduced. McCabe’s cyclomatic complexity (MCC) metric is used as a quantitative indicator of conditional complexity, and its values before and after introducing the design patterns are calculated. To evaluate the impact of the pattern-directed refactoring on software quality metrics, two new metrics named Mcc(avg) and Mloc(avg) are introduced. Mcc(avg) is the average value of McCabe’s cyclomatic complexity over all methods that declare a refactoring candidate identified by the proposed approach. Mloc(avg) is the average size in lines of code of the methods that declare a refactoring candidate. Mcc(avg) and Mloc(avg) are calculated by the following formulas:

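$$M_{\mathrm{cc(avg)}} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{N_{\mathrm{om}}(i)} M_{\mathrm{cc}}(i, j)}{\sum_{i=1}^{n} N_{\mathrm{om}}(i)} \qquad (4)$$

$$M_{\mathrm{loc(avg)}} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{N_{\mathrm{om}}(i)} M_{\mathrm{loc}}(i, j)}{\sum_{i=1}^{n} N_{\mathrm{om}}(i)} \qquad (5)$$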

where n is the number of classes, Nom(i) is the number of candidate methods in the i-th class, and Mcc(i, j) and Mloc(i, j) are the McCabe’s cyclomatic complexity and the number of lines of code of the j-th candidate method in the i-th class, respectively.
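Both averages have the same shape: the sum of a per-method metric over all candidate methods, divided by the total number of candidate methods. A small sketch under that reading follows. The class name AverageMetric is hypothetical, and the numbers in main() are invented sample values, not measurements from the experiments.

```java
class AverageMetric {
    // values[i][j] holds the metric (cyclomatic complexity or lines of code)
    // of the j-th candidate method in the i-th class, so values[i].length
    // plays the role of Nom(i).
    static double average(double[][] values) {
        double sum = 0.0;
        int candidateMethods = 0;
        for (double[] perClass : values) {
            candidateMethods += perClass.length;
            for (double v : perClass) {
                sum += v;
            }
        }
        if (candidateMethods == 0) {
            throw new IllegalArgumentException("no candidate methods");
        }
        return sum / candidateMethods;
    }

    public static void main(String[] args) {
        // Invented sample: class 1 has two candidate methods, class 2 has one.
        double[][] complexity = { {4, 6}, {5} };
        System.out.println(average(complexity)); // (4 + 6 + 5) / 3 = 5.0
    }
}
```

Note that this is an average over methods rather than an average of per-class averages, so a class with many candidate methods weighs more than a class with one.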

Figure 4 shows the impact of the factory method pattern-directed refactoring on software quality metrics. Over the 13 true positive test cases for refactoring to the factory method pattern, the Mcc(avg) of the candidate factory methods is 4.6 before refactoring and 1.05 after refactoring, and the Mloc(avg) is 18.85 before refactoring and 7.6 after refactoring.

Fig. 4 Impact of factory method pattern-directed refactoring on software quality metrics

Figure 5 shows the impact of the strategy pattern-directed refactoring on software quality metrics. Over the 12 true positive test cases for refactoring to the strategy pattern, the Mcc(avg) is 6.15 before refactoring and 1.45 after refactoring, and the Mloc(avg) is 32.75 before refactoring and 14.2 after refactoring.

Fig. 5 Impact of strategy pattern-directed refactoring on software quality metrics

The metrics show a considerable improvement in McCabe’s cyclomatic complexity and method lines of code. This means that the proposed approach for pattern-directed refactoring succeeds in reducing the code size and complexity of classes.

Finally, the execution time of the proposed algorithms is evaluated. The refactoring programs are run on a 2.67 GHz dual-core Intel processor with 4 GB of DDR2 RAM. Table 7 lists the execution time of all 20 test cases for refactoring to the factory method pattern and to the strategy pattern, respectively. The result shows that the factory method pattern-directed refactoring algorithm needs more execution time than the strategy pattern-directed one, since the identification algorithm of refactoring opportunities for the factory method pattern is more complex and more conditions need to be considered. In general, the execution time has a positive correlation with the system’s scale.

Table 7 Execution time of refactoring algorithms

Design pattern   Number of classes   Number of methods   Source lines of code   CPU time/ms
Factory method   122                 371                 6964                   12410
Strategy         20                  28                  917                    836

6 Conclusions

1) Two algorithms for identifying refactoring opportunities for two different design patterns, the factory method pattern and the strategy pattern, are presented. By analyzing the AST generated from source code, the proposed approach can detect the opportunities for refactoring effectively.

2) Two automated refactoring algorithms that simplify or remove complex conditional statements by introducing design patterns are proposed. After the refactoring opportunities are identified, the original code is transformed into new structures based on design patterns. The processes are carried out automatically without human intervention. The refactoring can improve the quality of the design and structure of existing code.

3) Some instances are analyzed and refactored. By comparing the instance code before and after refactoring, the advantages of the two design patterns are verified. The empirical analysis shows that the code after refactoring has better maintainability and extensibility.

4) Quality assessment and evaluation results are calculated to support the proposed approach. The results show a considerable improvement in McCabe’s cyclomatic complexity and method lines of code. This means that the proposed approach for automated pattern-directed refactoring succeeds in reducing the code size and complexity of classes.

References

[1] FOWLER M, BECK K. Refactoring: Improving the design of existing code [M]. Massachusetts: Addison-Wesley, 1999: 63−71.

[2] VAKILIAN M, CHEN N, NEGARA S, RAJKUMAR A B, BAILEY P B, JOHNSON E R. Use, disuse, and misuse of automated refactorings [C]// Proceedings of the 34th International Conference on Software Engineering (ICSE). Zurich, 2012: 233−243.
[3] MURPHY-HILL E, PARNIN C, BLACK A P. How we refactor, and how we know it [J]. IEEE Transactions on Software Engineering, 2012, 38(1): 5−18.
[4] DALLAL J A. Constructing models for predicting extract subclass refactoring opportunities using object-oriented quality metrics [J]. Information and Software Technology, 2012, 54(10): 1125−1141.
[5] TSANTALIS N, CHATZIGEORGIOU A. Identification of move method refactoring opportunities [J]. IEEE Transactions on Software Engineering, 2009, 35(3): 347−367.
[6] TSANTALIS N, CHATZIGEORGIOU A. Identification of extract method refactoring opportunities [C]// Proceedings of the 13th European Conference on Software Maintenance and Reengineering (CSMR’09). Kaiserslautern, 2009: 119−128.
[7] TSANTALIS N, CHATZIGEORGIOU A. Identification of extract method refactoring opportunities for the decomposition of methods [J]. Journal of Systems and Software, 2011, 84(10): 1757−1782.
[8] TSANTALIS N, CHATZIGEORGIOU A. Identification of refactoring opportunities introducing polymorphism [J]. Journal of Systems and Software, 2010, 83(3): 391−404.
[9] FOKAEFS M, TSANTALIS N, STROULIA E, CHATZIGEORGIOU A. Identification and application of extract class refactorings in object-oriented systems [J]. Journal of Systems and Software, 2012, 85(10): 2241−2260.
[10] BAVOTA G, De LUCIA A, OLIVETO R. Identifying extract class refactoring opportunities using structural and semantic cohesion measures [J]. Journal of Systems and Software, 2011, 84(3): 397−414.
[11] LIU Hui, NIU Zhen-dong, MA Zhi-yi, SHAO Wei-zhong. Identification of generalization refactoring opportunities [J]. Automated Software Engineering, 2013, 20(1): 81−110.
[12] HIGO Y, KUSUMOTO S, INOUE K. A metric-based approach to identifying refactoring opportunities for merging code clones in a Java software system [J]. Journal of Software Maintenance and Evolution: Research and Practice, 2008, 20(6): 435−461.
[13] KOMONDOOR R, HORWITZ S. Using slicing to identify duplication in source code [C]// Proceedings of the 8th International Symposium on Static Analysis. Paris, 2001: 40−56.
[14] KERIEVSKY J. Refactoring to patterns [M]. Massachusetts: Addison-Wesley, 2004: 52−54.
[15] GAMMA E, HELM R, JOHNSON R, VLISSIDES J. Design patterns: Elements of reusable object-oriented software [M]. Massachusetts: Addison-Wesley, 1995: 8−9.
[16] CHRISTOPOULOU A, GIAKOUMAKIS E A, ZAFEIRIS V E, SOUKARA V. Automated refactoring to the Strategy design pattern [J]. Information and Software Technology, 2012, 54(11): 1202−1214.
[17] JEBELEAN C, CHIRILA C B, CRETU V. A logic based approach to locate composite refactoring opportunities in object-oriented code [C]// Proceedings of the 2010 IEEE International Conference on Automation Quality and Testing Robotics (AQTR). Cluj-Napoca, 2010: 1−6.
[18] JEON S U, LEE J S, BAE D H. An automated refactoring approach to design pattern-based program transformations in Java programs [C]// Proceedings of the 9th Asia-Pacific Software Engineering Conference (APSEC). Queensland, 2002: 337−345.
[19] JUILLERAT N, HIRSBRUNNER B. Toward an Implementation of the “Form Template Method” [C]// Proceedings of the 7th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2007). Paris, 2007: 81−90.
[20] KUHN T, THOMANN O. Eclipse corner article abstract syntax tree [EB/OL]. [2006−11−20]. http://www.eclipse.org/articles/Article-JavaCodeManipulation_AST/index.html.
[21] Eclipse-The Eclipse Foundation open source community website. [EB/OL]. [2013−04−16]. http://www.eclipse.org/.

(Edited by FANG Jing-hua)