Upload
alma
View
32
Download
0
Embed Size (px)
DESCRIPTION
A clone detection approach for a collection of similar large-scale software products. Eunjong Choi† , Norihiro Yoshida‡, Yoshiki Higo†, Katsuro Inoue† †Osaka University ‡Nara Institute of Science and Technology. Software Development for Mobile Device (1/2). - PowerPoint PPT Presentation
Citation preview
Department of Computer Science, Graduate School of Information Science & Technology,Osaka University
A clone detection approach for a collection of similar large-scale software products
Eunjong Choi†, Norihiro Yoshida‡, Yoshiki Higo†, Katsuro Inoue†
†Osaka University‡Nara Institute of Science and Technology
1
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Software Development for Mobile Device (1/2)
Releases a new model in regular and rapid rushed intervals
Adapts to various country constraints and needse.g. Oshaifu-Keitai for Japan
2
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Software Development for Mobile Device (2/2)
Develop software by reusing common pieces and implement unique pieces for each feature.
3
+ + +
Reusedpieces
Unique features
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Reused Source Code Pieces
Source code is reused in code fragment level (code clones) and file level
.
Detecting and managing reused pieces is necessarye.g. Inconsistency management, plagiarism detection
4
A code clone : a code fragment that has lexically, syntactically, or semantically similar code fragments in source code
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Code Clone
Generated by:Code reuse by copy & pasteStereotyped functions or tool generated code
5
Code Clone Code Clone
Code Clone
A clone set: A set of code clones that are similar or identical to each other
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Code Clone Detection
Various detection techniques and tools have been proposed e.g. Text-based and line-based(dup)[Baker1995],
token-base(CP-minder)[Li2006]
CCFinder [Kamiya2002] A token-base clone detection toolMulti language support (C, C++, COBOL, Java, ...)Good speed (5MLOC/20m)
6
[Baker1995] B. S. Baker. On nding duplication and near-duplication in large software systems. In Proc. of WCRE, pages 86, July 1995.[Li 2006] Z. Li, S. Lu, S. Myagmar and Y. Zhou. CP-Miner: Finding Copy-Paste and Related Bugs in Large-Scale Software Code. IEEE Transactions on Software Engineering, 32: pages 176-192, 2006[Kamiya2002] T. Kamiya, S. Kusumoto and K. Inoue. CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code. IEEE TSE, 28: 654-670, 2002
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Example of Clone Detection Technique: CCFinder
Source files
Lexical analysis
Transformation
Token sequence
Match detection
Transformed token sequence
Clones on transformed sequence
Formatting
Code clones
1. static void foo() throws RESyntaxException { 2. String a[] = new String [] { "123,400", "abc", "orange 100" }; 3. org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+"); 4. int sum = 0; 5. for (int i = 0; i < a.length; ++i) 6. if (pat.match(a[i])) 7. sum += Sample.parseNumber(pat.getParen(0)); 8. System.out.println("sum = " + sum); 9. }10. static void goo(String [] a) throws RESyntaxException {11. RE exp = new RE("[0-9,]+");12. int sum = 0;13. for (int i = 0; i < a.length; ++i)14. if (exp.match(a[i]))15. sum += parseNumber(exp.getParen(0));16. System.out.println("sum = " + sum);17. }
static void foo ( ) { String a
[ ] = new String [ ] { "123,400" ,
"abc" , "orange 100" } ;
int sum = 0
; for ( int i = 0 ; i <
a . length ; ++ i )
sum
+= pat . getParen 0
; System . out . println ( "sum = "
+ sum ) ; }
throws RESyntaxException
Sample . parseNumber (
) )
if pat
. match a [ i ]( ) )
org . apache . regexp
. RE pat = new org . apache . regexp
. RE ( "[0-9,]+" ) ;
static void goo (
) {
String
a [ ]
int sum = 0
; for ( int i = 0 ; i <
a . length ; ++ i )
System . out . println ( "sum = " + sum
) ; }
throws RESyntaxException
if exp
. match a [ i ]( ) )
exp =
new RE ( "[0-9,]+" ) ;
(
RE
sum
+= exp . getParen 0
;
parseNumber ( ) )(
(
(
[ ] = new String [ ] {
} ;
int sum = 0
; for ( int i = 0 ; i <
a . length ; ++ i )
sum
+= pat . getParen 0
; System . out . println ( "sum = "
+ sum ) ; }
Sample . parseNumber (
) )
if pat
. match a [ i ]( ) )
pat = new
RE ( "[0-9,]+" ) ;
static void goo (
) {
String
a [ ]
int sum = 0
; for ( int i = 0 ; i <
a . length ; ++ i )
System . out . println ( "sum = " + sum
) ; }
throws RESyntaxException
if exp
. match a [ i ]( ) )
exp =
new RE ( "[0-9,]+" ) ;
(
RE
sum
+= exp . getParen 0
;
parseNumber ( (
(
(
static void foo ( ) { String athrows RESyntaxException
$
RE
$ . ) )
Lexical analysis
Transformation
Token sequence
Match detection
Transformed token sequence
Clones on transformed sequence
Formatting
[ ] = new String [ ] {
} ;
int sum = 0
; for ( int i = 0 ; i <
a . length ; ++ i )
sum
+= pat . getParen 0
; System . out . println ( "sum = "
+ sum ) ; }
Sample . parseNumber (
) )
if pat
. match a [ i ]( ) )
pat = new
RE ( "[0-9,]+" ) ;
static void goo (
) {
String
a [ ]
int sum = 0
; for ( int i = 0 ; i <
a . length ; ++ i )
System . out . println ( "sum = " + sum
) ; }
throws RESyntaxException
if exp
. match a [ i ]( ) )
exp =
new RE ( "[0-9,]+" ) ;
(
RE
sum
+= exp . getParen 0
;
parseNumber ( ) )(
(
(
static void foo ( ) { String athrows RESyntaxException
$
RE
$ .
[ ] = [ ] {
} ;
=
; for ( = ; <
. ; ++ )
+= .
; . . (
+ ) ; }
. (
) )
if
. [ ]( ) )
=
( ) ;
static (
) {[ ]
=
; ( = ; <
. ; ++ )
. . ( +
) ; }
throws
if
. [ ]( ) )
=
new ( ) ;
(
+= .
;
( ) )(
(
(
static $ ( ) {throws
$
$ .
$ $ $ $
$ $
$ $
$ $ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $ $
new
for for
new
[ ] = [ ] {
} ;
=
; for ( = ; <
. ; ++ )
+= .
; . . (
+ ) ; }
. (
) )
if
. [ ]( ) )
=
( ) ;
static (
) {[ ]
=
; ( = ; <
. ; ++ )
. . ( +
) ; }
throws
if
. [ ]( ) )
=
new ( ) ;
(
+= .
;
( ) )(
(
(
static $ ( ) {throws
$
$ .
$ $ $ $
$ $
$ $
$ $ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $
$ $ $ $ $
Lexical analysis
Transformation
Token sequence
Match detection
Transformed token sequence
Clones on transformed sequence
Formatting
7
1. static void foo() throws RESyntaxException { 2. String a[] = new String [] { "123,400", "abc", "orange 100" }; 3. org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+"); 4. int sum = 0; 5. for (int i = 0; i < a.length; ++i) 6. if (pat.match(a[i])) 7. sum += Sample.parseNumber(pat.getParen(0)); 8. System.out.println("sum = " + sum); 9. }10. static void goo(String [] a) throws RESyntaxException {11. RE exp = new RE("[0-9,]+");12. int sum = 0;13. for (int i = 0; i < a.length; ++i)14. if (exp.match(a[i]))15. sum += parseNumber(exp.getParen(0));16. System.out.println("sum = " + sum);17. }
Lexical analysis
Transformation
Token sequence
Match detection
Transformed token sequence
Clones on transformed sequence
Formatting
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Problem of Code Clone Detection Tools
Take enormous time for existing tools to detect code clones on large-scale software
Suggest an approach for detecting code clone for a collection of similar large-scale software productsExcluding detecting code clones among each
set of files that are identical each other
8
A identical file set : A set of files that are identical each other
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Overview Of Our Approach
Step1. Calculate MD5 hash. Step2. Prepare Input Files for CCFinder Step3. Detect code clones using CCFinder Step4. Generate all clone sets
9
(1) Calculate MD5 Hash
(2) Prepare Input Files
for CCFinder
(3) Detect Code Clones Using
CCFinder
(4) Generate All Clone
Sets
Hashed files
Input Files for
CCFinder
Identical File Sets
Clone Sets
All Clone Sets
Source Files
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Step1. Calculate MD5 Hash
Creates MD5 hash value of input filesMD5 hash does not require any large
substitution tables
10
Ce9e187434e35746abf2C9ad2A77bdd27ed90608d1262244897ccd1164dc3
------------------------------------------------------
Calculate
MD5 Hash
Source Files Source Files
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Step2. Prepare Input Files for CCFinder (1/2)
Detect identical file sets
11
0cc0cc 0cc
A1 A2 A3 C1 D1
175 A9
. . . .be05be05
. . . .
B2B1
MD5 Hash
Identical File Set A
Identical File Set B
Identical File SetsInput Files for CCFinder
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University 12
C1 D1
175 A9
. . . . be05be05
B2B1
. . . .
Identical File Set A
Identical File Set B
Identical File SetsInput Files for CCFinder
0cc0cc 0cc
A1 A2 A3
Step2. Prepare Input Files for CCFinder (2/2)
Prepare Input Files for CCFinder
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Step3. Detect Code Clones
Use CCFinder to detect code clones
13
C1 D1
. . . .
B2B1
. . . .
Identical File Set A
Identical File Set B
Identical File Sets
A1 A2 A3
Detected Code Clones
Code Clones Detected by CCFinder
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Step3. Detect Code Clones
Use CCFinder to detect code clones
14
C1 D1
. . . .. . . .
Identical File Set A
Identical File Set B
Identical File Sets
A1 A2 A3
Detected Code Clones
B2B1 D1
Clone Sets Detected by CCFinder
Clone Set 1
Clone Set 2
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Step4. Generate all clone sets
Generate all clone sets
15
A1 A2 A3
. . . .
B2B1
. . . .
Identical File Set A
Identical File Set B
A2 A3
C1 D1 A1
D1 B2B1
All Clone Sets
Clone Set 1
Clone Set 2. . . .
Identical File Sets
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Overview of Case Study (2/2)
ApproachCompare detection time between our method and
using only CCFinderConfirm that the detection result of our method is
the same as the one of using only CCFinder
Detection Environment64 bits Windows 7 Professional workstation
equipped with 2 processors and 24 gigabytes of main memory.
16
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Results of Case Study (1/2)
Detection TimeOur approach detects code clones faster than
using only CCFinder.
17
Project Name Only CCFinder Our Approach#Clone Sets
Time (sec.)
#Clone Sets
#Identical File Sets
Time (sec.)
Apache ant 11,169 241 10,692 3,083 89Linux kernel 24,235 1,119 21,343 1,967 168Galxy y pro(GT-B5510 model
325,274 113,445
148,761 28,181 11,902
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Results of Case Study (2/2)
Accuracy of Results : manually checked outputsArbitrary selected 30 clone sets that are
detected by our approach from each OSSSelected 30 identical file sets from each OSS
project.
18
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Summary
Suggest an approach for detecting code clones for a collection of similar large-scale software products. MD5 hash to identify identical file setsCCFinder to detect code clones.
Apply our approach to three OSS projects and compared code clone detection time between using only CCFinder and our approach.Our approach takes shorter time to detect code clones.
19
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Future Work
Improve for detecting files with slightly modification as identical file setsOur current approach detects file that are
identical each as a identical file set Apply to various size of software projects in
different domains Introduce other code clones detection tools
and compare results from them in the case study
20
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Thank you for your attentions
21