Extracting a Unified Directory Tree to Compare Similar Software Products Yusuke Sakaguchi, Takashi...

Extracting a Unified Directory Tree to Compare Similar Software Products

Yusuke Sakaguchi, Takashi Ishio,

Tetsuya Kanda, Katsuro Inoue

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Background: Similar software products

• Clone-and-own approach
– Copying and modifying an existing product
– Importing libraries

[Figure: clone-and-own among products A (Ver. 1.0, Ver. 1.1), B (Ver. 1.0, Ver. 1.1), and C (Ver. 1.0); labels: "Fix a bug", "Clone-and-own", "Must fix the bug"]

Software Product Line Engineering

• Promising for managing derived software products

• To apply the approach to existing products, developers need to understand their commonalities and variabilities.
– Duszynski et al. [1] proposed comparing source code in corresponding directories among similar products.
• Developers must know the corresponding directories among products (which directories should be compared).

[1] S. Duszynski, J. Knodel, and M. Becker, "Analyzing the source code of multiple software variants for reuse potential," Reverse Engineering, 2011

The correspondence of directories

• Some existing techniques [2], [3] can extract corresponding directories between two similar products.

• These techniques cannot analyze more than two products at a time.

[2] K. Yoshimura, D. Ganesan, and D. Muthig, "Assessing merge potential of existing engine control systems into a product line," Software Engineering for Automotive Systems, 2006

[3] D. Holten and J. J. van Wijk, "Visual comparison of hierarchically organized data," Joint Eurographics / IEEE - VGTC Conference on Visualization, 2008

The proposed method

• Automatically extract a unified directory tree representing corresponding directories among multiple products

• Key idea

– Corresponding directories have similar source files

[Figure: src directories of products S1, S2, and S3]

Our prototype

• Input
– Source code of some products

• Output
– A unified directory tree

[Figure: products S1, S2, S3 (directories src, a, b, c); panels: Directory trees, Directory graph, Spanning tree]

Node of directory graph

• A node is a set of directories that contain similar source files

• Two directories d1 and d2 are represented by a single node if sim(d1, d2) ≥ th
– sim(d1, d2) : a content similarity metric for two directories in different products
– th : a predetermined threshold
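
The grouping rule can be sketched as follows. This is a minimal illustration, not the prototype's implementation: the function name `group_directories`, the `(product, path)` tuples, the `>=` comparison, and the transitive merging through union-find are all assumptions made for the example.

```python
from itertools import combinations


def group_directories(dirs, sim, th):
    """Group (product, path) directories into nodes.

    Assumption: similarity is applied transitively, so a chain of
    similar directories ends up in one node (union-find).
    """
    parent = {d: d for d in dirs}

    def find(d):
        # Follow parent links to the representative, halving the path.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for d1, d2 in combinations(dirs, 2):
        # Only directories in different products are compared.
        if d1[0] != d2[0] and sim(d1, d2) >= th:
            parent[find(d1)] = find(d2)

    nodes = {}
    for d in dirs:
        nodes.setdefault(find(d), set()).add(d)
    return list(nodes.values())
```

Here `sim` is passed in as a parameter, so any content similarity metric with values in [0, 1] can be plugged in.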

Similarity

• sim(d1, d2) is the Jaccard similarity coefficient of the line sets of d1 and d2

– The line set of a directory d is the set of lines extracted from all non-binary files directly contained in d
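
A minimal sketch of this metric, under stated assumptions: the helper names are hypothetical, a file is treated as binary when it contains a NUL byte (a heuristic, not necessarily the prototype's test), and two empty line sets are given similarity 0.

```python
from pathlib import Path


def directory_lines(directory):
    """Set of lines from non-binary files directly inside `directory`."""
    lines = set()
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue  # subdirectories are not descended into
        data = path.read_bytes()
        if b"\0" in data:
            continue  # assumption: a NUL byte marks a binary file
        lines.update(data.decode("utf-8", errors="replace").splitlines())
    return lines


def jaccard(a, b):
    """Jaccard similarity coefficient |a & b| / |a | b| of two sets."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty line sets are not similar
    return len(a & b) / len(union)


def sim(d1, d2):
    """Content similarity of two directories given as file-system paths."""
    return jaccard(directory_lines(d1), directory_lines(d2))
```

For instance, line sets {a, b, c} and {b, c, d} share two of four distinct lines, so their similarity is 0.5.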

Example

[Figure: directory trees of S1, S2, S3 (directories src, a, b, c)]

• Nodes of the same color are similar.
• White nodes have no files.

Special Treatment

• Directories without files
– A single node represents such directories if their subdirectories are represented by the same node.

• Root directories
– A root node represents all the root directories of products, irrespective of product similarity.


Connecting nodes

• Weighted arcs
– The weight of an arc is the number of parent-subdirectory relationships between two nodes.

[Figure: parent-subdirectory relationships among the src directories of S1, S2, S3]
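
The arc weights can be computed by counting, for every directory, the pair (node of its parent directory, node of the directory itself). The sketch below assumes a hypothetical mapping `node_of` from `(product, path)` to a node identifier, with the product root written as the empty path; neither is taken from the prototype.

```python
from collections import Counter
from posixpath import dirname


def arc_weights(node_of):
    """Count parent-subdirectory relationships between nodes.

    node_of: {(product, path): node id}; the root of a product is path "".
    Returns a Counter {(parent node, child node): weight}.
    """
    weights = Counter()
    for (product, path), child in node_of.items():
        if path == "":
            continue  # a root directory has no parent
        parent = node_of.get((product, dirname(path)))
        if parent is not None and parent != child:
            weights[(parent, child)] += 1
    return weights
```

If the src directories of S1 and S2 share one node and both sit under the root node, the arc from the root node to that node gets weight 2.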

Greedy tree extraction

1. Initialize a spanning tree
– a set of vertices
– a set of selected arcs

2. Select the arc having the maximum weight among the candidate arcs
– If the weights are identical, select the one closest to the root

3. Update the spanning tree

4. Repeat Steps 2 and 3 until the spanning tree includes all nodes.
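
A minimal sketch of such a greedy extraction, under assumptions not confirmed by the slides: candidate arcs are those leading from a node already in the tree to a node outside it, and ties are broken by the depth of the parent node (smaller depth is taken as closer to the root). The function name `extract_tree` and the dictionary-based representation are hypothetical.

```python
def extract_tree(root, nodes, weights):
    """Greedily grow a spanning tree from `root`.

    weights: {(parent node, child node): weight}.
    Returns {child node: parent node} for every node added to the tree.
    """
    depth = {root: 0}  # nodes already in the tree, with their depth
    tree = {}          # selected arcs, stored as child -> parent
    while len(depth) < len(nodes):
        # Assumption: an arc is a candidate if it leaves the current tree.
        candidates = [
            (w, -depth[p], p, c)
            for (p, c), w in weights.items()
            if p in depth and c not in depth
        ]
        if not candidates:
            break  # the remaining nodes are unreachable from the root
        # Maximum weight first; on a tie, the shallowest parent wins.
        _, _, p, c = max(candidates, key=lambda t: (t[0], t[1]))
        tree[c] = p
        depth[c] = depth[p] + 1
    return tree
```

The loop rescans all arcs at every step, which keeps the sketch short; a priority queue would be the usual choice for larger graphs.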

Greedy loop

[Figure: greedy loop on the src directories of S1, S2, S3]

Directory Viewer

Tree View
• shows the unified directory tree.
• If a node contains multiple directories having different source code, the node is colored in blue.

[Figure: Tree View screenshot; callout: "Different code included"]

Directory Viewer

File List View
• shows a table of file names and their MD5 hash values contained in a selected node.
• A black hash value indicates that the file content is different from another product.

[Figure: File List View screenshot; callout: "Different contents"]
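
The table behind such a view can be sketched with MD5 hashes of the files in each directory of a node. The function names and the `{product: directory}` input are assumptions for illustration, not the viewer's actual code.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """MD5 hex digest of a file's content."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def file_list(node_dirs):
    """Build {file name: {product: md5}} for the directories of one node.

    node_dirs: {product: directory path} (hypothetical input format).
    """
    table = {}
    for product, directory in node_dirs.items():
        for path in sorted(Path(directory).iterdir()):
            if path.is_file():
                table.setdefault(path.name, {})[product] = md5_of(path)
    return table


def differing_files(table):
    """File names whose hash values are not identical across products.

    A file present in only one product has a single hash and is not listed.
    """
    return sorted(
        name for name, hashes in table.items()
        if len(set(hashes.values())) > 1
    )
```

Comparing hashes only answers whether two files are identical; it says nothing about how much they differ.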

Directory Viewer

File Matrix View
• shows the similarity of files having the same file name selected in a file list view.
• A darker (yellow/red) color indicates a lower similarity.

Case study

• Input products
– Android OS : 4.2
– CPU : Qualcomm MSM8974

Product   Vendor    Mobile Network Operator   Release   #Dirs
FJL22     Fujitsu   au                        2013/11   7,683
301F      Fujitsu   SoftBank                  2013/12   7,708
F-01F     Fujitsu   NTT DOCOMO                2013/10   7,582
SO-01F    Sony      NTT DOCOMO                2013/10   5,840

Output

• A predetermined similarity threshold th = 0.8

• Execution time: 42 minutes
– Two Intel Xeon (2.27 GHz, 4 cores)
– 24 GB RAM

• The resultant unified directory tree comprises 9,037 nodes.
– 673 nodes contain different contents from other products.

Example: kernel node

[Figure: file list of the kernel node with columns grouped under Fujitsu and Sony; files shown: Makefile, AndroidKernel.mk]

Developer-specific options for the kernel build are mainly written in this file.

Example: Product-specific directory

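[Figure: Input and Output panels. Input: (Fujitsu) external/chronium and (Sony) external/chronium. Output: one external node with separate (Fujitsu) chronium and (Sony) chronium nodes, (Fujitsu/Sony) common subdirectories, and (Sony) unique subdirectories; labels: "Different", "Common", "Unique to Sony"]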

Conclusion and Future Work

• Our tool visualizes a unified directory tree.
• We conducted a case study using four Android products.
– The tool enabled us to quickly focus on directories having different contents.
• Future work: a controlled experiment to evaluate
– The effectiveness for source code comparison tasks.
