Extracting a Unified Directory Tree to Compare Similar Software Products

Yusuke Sakaguchi, Takashi Ishio, Tetsuya Kanda, Katsuro Inoue

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University


Background: Similar software products

• Clone-and-own approach
– Copying and modifying an existing product
– Importing libraries

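[Figure: timeline (t) of products A (Ver.1.0, Ver.1.1), B (Ver.1.0, Ver.1.1), and C (Ver.1.0) related by clone-and-own. When a bug is fixed in one product, the products cloned from it must fix the bug as well.]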


Software Product Line Engineering

• Promising for managing derived software products

• To apply the approach to existing products, developers need to understand their commonalities and variabilities.
– Duszynski et al. [1] proposed comparing source code in corresponding directories among similar products.
• Developers must know the corresponding directories among products (i.e., which directories should be compared).

[1] S. Duszynski, J. Knodel, and M. Becker, “Analyzing the source code of multiple software variants for reuse potential,” Reverse Engineering, 2011


The correspondence of directories

• Some existing techniques [2], [3] can extract corresponding directories between two similar products.

• They cannot analyze more than two products at a time.

[2] K. Yoshimura, D. Ganesan, and D. Muthig, “Assessing merge potential of existing engine control systems into a product line,” Software Engineering for Automotive Systems, 2006
[3] D. Holten and J. J. van Wijk, “Visual comparison of hierarchically organized data,” Joint Eurographics / IEEE - VGTC Conference on Visualization, 2008


The proposed method

• Automatically extract a unified directory tree representing corresponding directories among multiple products

• Key idea

– Corresponding directories have similar source files


Our prototype

• Input
– Source code of several products
• Output
– A unified directory tree

[Figure: three stages. Directory trees: products S1, S2, and S3 with directories src, a, b, and c. Directory graph: nodes for src, a, b, and c with arcs weighted 3, 3, 2, 2, and 1. Spanning tree: src, a, b, and c.]


Node of directory graph

• A node is a set of directories that contain similar source files
• Two directories d1 and d2 are represented by a single node if sim(d1, d2) ≥ th
– sim(d1, d2): a content similarity metric for two directories in different products
– th: a predetermined threshold
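The node definition can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that the merge condition is sim(d1, d2) >= th and that merging is transitive (implemented here with a small union-find). The function name group_into_nodes, the (product, directory) pair representation, and the sim callback are hypothetical.

```python
from itertools import combinations


def group_into_nodes(dirs, sim, th):
    """Group (product, directory) pairs into nodes.

    Assumption: two directories from different products share a node
    when sim(x, y) >= th, and this relation is closed transitively.
    """
    parent = {d: d for d in dirs}

    def find(d):
        # Union-find lookup with path halving.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for x, y in combinations(dirs, 2):
        # Only directories in different products are compared.
        if x[0] != y[0] and sim(x, y) >= th:
            parent[find(x)] = find(y)

    nodes = {}
    for d in dirs:
        nodes.setdefault(find(d), set()).add(d)
    return list(nodes.values())
```

With a threshold such as th = 0.8, directories whose contents are almost identical across products end up in one node, while a heavily modified copy stays in a node of its own.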


Similarity

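• sim(d1, d2) is the Jaccard similarity coefficient of the line sets of d1 and d2: the size of their intersection divided by the size of their union
– The line set of a directory d is the set of lines extracted from all non-binary files directly contained in d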
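A minimal Python sketch of this similarity follows, under stated assumptions: a file is treated as binary if it contains a NUL byte, text is decoded as UTF-8 with replacement characters, and two empty line sets get similarity 0.0. None of these details come from the slides, and the function names are hypothetical.

```python
from pathlib import Path


def directory_lines(directory: Path) -> set[str]:
    """Collect the lines of all non-binary files directly inside `directory`."""
    lines: set[str] = set()
    for path in directory.iterdir():
        if not path.is_file():
            continue  # subdirectories are not descended into
        data = path.read_bytes()
        if b"\0" in data:
            continue  # assumption: a NUL byte marks a binary file
        lines.update(data.decode("utf-8", errors="replace").splitlines())
    return lines


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity coefficient |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sim(d1: Path, d2: Path) -> float:
    """Content similarity of two directories based on their line sets."""
    return jaccard(directory_lines(d1), directory_lines(d2))
```

Because only files directly contained in a directory contribute lines, a directory that holds nothing but subdirectories has an empty line set and is never similar to anything under this sketch.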


Example

[Figure: directory trees of S1, S2, and S3, made up of directories named src, a, b, and c, with nodes colored by similarity]

• Nodes with the same color are similar.
• White nodes have no files.


Example

[Figure: similar directories grouped into nodes: (a, a, a), (b, b, b), and (c, c)]


Special Treatment

• Directories without files
– A single node represents such directories if their subdirectories are represented by the same node.
• Root directories
– A root node represents all the root directories of the products, irrespective of product similarity.


Example

[Figure: directory trees of S1, S2, and S3, made up of directories named src, a, b, and c]


Example

[Figure: directories of S1, S2, and S3 grouped into nodes: (src, src, src), (a, a, a), (b, b, b), and (c, c)]


Connecting nodes

• Weighted arcs
– The weight of an arc is the number of parent-subdirectory relationships between two nodes.

[Figure: directory trees of S1, S2, and S3 and the resulting directory graph, whose arcs carry weights 3, 3, 2, 2, and 1, each counting parent-subdirectory relationships]
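The arc weights can be computed by counting, for every directory, the arc from its parent's node to its own node. The sketch below assumes that each directory is identified by a (product, POSIX-style relative path) pair, that each product's root directory has the path ".", and that arcs are directed from parent to child. The node_of mapping and the function name are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def arc_weights(node_of):
    """Count parent-subdirectory relationships between nodes.

    node_of maps (product, directory path) to a node id.
    Returns a Counter mapping (parent node, child node) to the arc weight.
    """
    weights = Counter()
    for (product, path), child in node_of.items():
        parent_path = str(PurePosixPath(path).parent)
        parent = node_of.get((product, parent_path))
        # The root path "." is its own parent, so it adds no arc.
        if parent is not None and parent != child:
            weights[(parent, child)] += 1
    return weights
```

For example, if a directory sits under src in two products but directly under the root in a third, the arc from the src node gets weight 2 and the arc from the root node gets weight 1.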


Greedy tree extraction

1. Initialize a spanning tree, which consists of a set of vertices and a set of selected arcs.
2. Select the arc having the maximum weight among the arcs that can extend the tree.
– If the weights are identical, select the one that is closest to the root.
3. Update the spanning tree with the selected arc.
4. Repeat Steps 2 and 3 until the spanning tree includes all nodes.
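The four steps can be sketched in Python as a Prim-style loop that always takes the heaviest arc. Several details here are assumptions rather than facts from the slides: the tree starts at the root node, candidate arcs lead from a vertex already in the tree to a node not yet in it, and "closest" is measured by the depth of the arc's parent endpoint, which is taken to be relative to the root. The function name and the weights mapping are hypothetical.

```python
def greedy_tree(root, weights):
    """Greedily extract a spanning tree from a weighted directory graph.

    weights maps (parent node, child node) to an arc weight.
    Returns the selected arcs as a dict mapping child to parent.
    """
    depth = {root: 0}  # vertices in the tree, with their distance from the root
    tree = {}          # selected arcs, stored as child -> parent
    nodes = {root} | {n for arc in weights for n in arc}

    while len(depth) < len(nodes):
        # Step 2: arcs from a tree vertex to a node outside the tree.
        candidates = [
            (w, -depth[p], p, c)
            for (p, c), w in weights.items()
            if p in depth and c not in depth
        ]
        if not candidates:
            break  # the remaining nodes are unreachable from the root
        # Maximum weight first; on a tie, the arc whose parent is closest to the root.
        _, _, p, c = max(candidates, key=lambda t: (t[0], t[1]))
        # Step 3: update the tree.
        tree[c] = p
        depth[c] = depth[p] + 1
    return tree
```

Each iteration rescans all arcs, which keeps the sketch short; a priority queue would be the usual choice for large graphs.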


Greedy loop

[Figure: the greedy loop applied to the directory graph of S1, S2, and S3, whose arcs carry weights 3, 3, 2, 2, and 1]


Directory Viewer

Tree View
• Shows the unified directory tree.
• If a node contains multiple directories having different source code, the node is colored blue.

[Screenshot callout: Different code included]


Directory Viewer

File List View
• Shows a table of the file names contained in a selected node and their MD5 hash values.
• A black hash value indicates that the file content is different from another product.

[Screenshot callout: Different contents]
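A table like the one in the File List View could be built along the following lines. This is a hedged sketch, not the viewer's code: the function names and the dirs mapping (product name to the directory that product contributes to the selected node) are hypothetical, and only files directly inside each directory are listed.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the MD5 hex digest of a file's content."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def file_table(dirs: dict[str, Path]) -> dict[str, dict[str, str]]:
    """Map each file name to {product: MD5 hex digest}."""
    table: dict[str, dict[str, str]] = {}
    for product, directory in dirs.items():
        for path in sorted(directory.iterdir()):
            if path.is_file():
                table.setdefault(path.name, {})[product] = md5_of(path)
    return table


def differing_files(table: dict[str, dict[str, str]]) -> list[str]:
    """Names of files whose content differs between at least two products."""
    return sorted(
        name for name, hashes in table.items() if len(set(hashes.values())) > 1
    )
```

Comparing digests only tells whether two files are byte-identical; a file that exists in a single product is not reported as differing by this sketch.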


Directory Viewer

File Matrix View
• Shows the similarity of files that share the file name selected in the File List View.
• A darker (yellow/red) color indicates a lower similarity.


Case study

• Input products
– Android OS: 4.2
– CPU: Qualcomm MSM8974

Product  Vendor   Mobile Network Operator  Release  #Dirs
FJL22    Fujitsu  au                       2013/11  7,683
301F     Fujitsu  SoftBank                 2013/12  7,708
F-01F    Fujitsu  NTT DOCOMO               2013/10  7,582
SO-01F   Sony     NTT DOCOMO               2013/10  5,840


Output

• A predetermined similarity threshold th = 0.8

• Execution time: 42 minutes
– Two Intel Xeon (2.27 GHz, 4 cores)
– 24 GB RAM

• The resultant unified directory tree comprises 9,037 nodes.

– 673 nodes contain different contents from other products.


Example: kernel node

[Figure: the files Makefile and AndroidKernel.mk in the kernel node, with columns for the Fujitsu products and the Sony product]

Callout: Developer-specific options for the kernel build are mainly written in this file.


Example: Product-specific directory

[Figure: Input: (Fujitsu) external and (Sony) external, each containing chronium. Output: a single external node containing (Fujitsu) chronium and (Sony) chronium as different nodes, (Fujitsu/Sony) common subdirectories, and (Sony) unique subdirectories. Labels: Different, Common, Unique to Sony.]


Conclusion and Future Work

• Our tool visualizes a unified directory tree.
• We conducted a case study using four Android products.
– The tool enabled us to quickly focus on directories having different contents.
• Future work: a controlled experiment to evaluate
– The effectiveness for source code comparison tasks.
