1 Overview of Component Search System SPARS-J Tetsuo Yamamoto*,Makoto Matsushita**, Katsuro Inoue**...

Preview:

DESCRIPTION

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 3 Motivation Reuse of Software Components is a technique of developing new software components by using the components developed in the past. Example of reusable components: source code, document ….. improves productivity and quality, and cuts down development cost as a result. However, reuse of components is not utilized effectively. A developer doesn’t know existence of desirable components. Although there are a lot of components, these components are not organized. In order to take advantage of reuse, it is required to manage components and search suitable component easily

Citation preview

1

Overview of Component Search System SPARS-J

Tetsuo Yamamoto*,Makoto Matsushita**, Katsuro Inoue**

*Japan Science and Technology Agency**Osaka University

2Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

OutlineMotivation and research aimSPARS-J

OutlineSystem architectureRanking methodEach part

Analysis partRetrieval partUser Interface

ExperimentConclusion and Future work

3Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

MotivationReuse of Software Components

is a technique of developing new software components by using the components developed in the past.

Example of reusable components: source code, document …..improves productivity and quality, and cuts down development cost as a result.

However, reuse of components is not utilized effectively.A developer doesn’t know existence of desirable components.Although there are a lot of components, these components are not organized.

In order to take advantage of reuse, it is required to manage components and search suitable component easily

4Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Research aimWe have built the system which have functions as follows

Collects software components eagerly without preserving their inherent structuresManages the component information automaticallyProvides component be suitable for User’s request

TargetsIntranet

closed software development inside a companyInternet

Large open source software development web site– SourceForge, Jakarta Project. etc.

5Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

OutlineMotivation and research aimSPARS-J

OutlineSystem architectureRanking methodEach part

Analysis partRetrieval partUser Interface

ExperimentConclusion and Future work

6Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

SPARS-J(Software Product Archive , analysis and Retrieval System for Java)

Java Software Product Archiving, analyzing and Retrieving System

Many components are analyzed automatically. A search engine is built based on the analysis information.Component: a source code of class or interface

FeaturesKeyword searchTwo ranking methods

Frequency in use of a wordUse relation

Analyzed informationComponents using/used by a componentPackage hierarchy

7Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Structure of SPARS-J

Component analysis part・ extract components from a file・ store analyzed information to DB・ clustering and rank components using DB

Database

File

Analyzedinformation

・ store analyzed information and component

Component retrieval part・ search components in correspondence with query from DB・ rank components based on frequency in use of a keyword・ aggregate two rankings

User

User interface partQuery

Result

・ deliver query to component retrieval part・ show search results  

QueryHit components

Library(Java source files)

Componentinformation

8Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Ranking search resultsRanking method

1. Component suited to a user request– Ranking based on frequency in use of a word

2. Component used mostly– Ranking based on component use relation

We make it high ranking that the component both 1 and 2 are high

Search results are shown to aggregate two ranks

Keyword Rank (KR)

Component Rank (CR)

9Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

OutlineMotivation and research aimSPARS-J

OutlineSystem architectureRanking methodEach part

Analysis partRetrieval partUser Interface

ExperimentConclusion and Future work

10Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Component analysis part

Extract component and its information from a Java source fileThe process

Extract a componentIndex the componentExtract use relationsClustering similar componentsRank components based on use relations (CR method)

11Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Extract and index a component

Extracting componentFind class or interface block in a java source file

Location information in the file (start line number, end line number)

IndexingExtract index key from the component

Index key : a word and the kind of itNo reserved words are extracted

Count frequency in use of the word

word kindSort Class namequicksort Commentquicksort Method

namepivot Variable

namequicksort Method call

: :Index key

public final class Sort { /* quicksort */ private static void quicksort(…) { int pivot; : quicksort(…); quicksort(…); }}

11112:

frequency

12Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Extract use relationsExtract use relations among components using semantic analysisMake component graph from use relations

Node: componentEdge: use relation Inheritance

Interfaceimplementati

onVariable type

Instance creation

Field accessMethod callThe kind of use relation

public class Test extend Data{ : public static void main(…) { : Sort.quicksort(super.array); : }}

Sort

Data

TestComponent graph

InheritanceField access

Method call

13Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Similar componentSimilar component is copied component or minor modified componentWe merge similar components into single componentMerged component have use relations that all component before merging have

C

B F

A D

G

EComponent graph

BF

AD E

C G

Clustered component graph

C

B F

A D

G

E

14Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Clustering componentsWe measure characteristics metrics to merge componentsThe difference ratio of each component metrics

Metricscomplexity

– The number of methods, cyclomatic, etc. – represent a structural characteristic

Token-composition– The number of appearances of each token– represent a surface characteristic

15Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Ranking based on use relation

Component Rank (CR)Reusable component have many use relation

The example of use is muchGeneral purpose componentSophisticated component

We measure use relation quantitatively, and rank components

The component used by many components is importantThe component used by important component is also important

Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Component Rank: Relative Significance Rank for Software Component Search", ICSE, Portland, OR, May 6, 2003.

16Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Propagating weights

A B

C

0.34 0.33

0.33

0.17

0.17

0.330.33

Ad-hoc weights are assigned to each node

17Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Propagating weights

A B

C

0.33 0.17

0.5

0.175

0.175

0.170.5

The node weights are re-defined by the incoming edge weights

18Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Propagating weights

A B

C

0.5 0.175

0.345

0.25

0.25

0.1750.345

We get new node weights

19Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Propagating weights

A B

C

0.4 0.2

0.4

0.2

0.2

0.20.4

• We get stable weight assignment next-step weights are the same as previous ones

• Component Rank : order of nodes sorted by the weight

20Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

OutlineMotivation and research aimSPARS-J

OutlineSystem architectureRanking methodEach part

Analysis partRetrieval partUser Interface

ExperimentConclusion and Future work

21Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Component retrieval part

Search components from database, rank components The process

Search componentsRanking suited to a user requestAggregate two ranks (CR and KR)

22Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Search componentsSearch query

Words a user inputThe kind of an index word, package name

Components contain given query are searched from Database

23Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Ranking suited to a user request

Keyword Rank (KR)Components which contain words given by a user are searchedRank components using the value calculated from index word weight Index word weight

– Many frequency in use of a component– A word contained particular components– A word represent the component function such as Class

nameSort the sum of all given word weightTF-IDF weighting using full-text search engine

24Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Calculation of KR valueCalculate weight Wct with component c word tTFi: The frequency with which a kind i of word t occurs in component c IDF: the total number of components / the number of components containing word tkwi: Weight of a kind i

KR value is the sum of all word Wct

kindall

iict IDFTFkww ) (

the kind of a word

weight

Class name 200Interface name 50Method name 200Package name 50

Import 30Method call 10Field access 10Variable type 10

Instance creation

10

Local var access 1Comment 30

Doc comment 50Line comment 10

String 1

25Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Aggregate two ranksAggregate two ranks KR and CRAggregation method

Borda Count method known a voting systemUse for single or multiple-seat electionsThis form of voting is extremely popular in determining awards

SPARS-JRank components both KR and CRUsing KR and CR, the component that be suitable user’s request, reusable and sophisticated

26Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Borda Count methodThere are 10 voters and 5 candidates (from A to E) Each voter rank candidates1 point for last place, 2 points for second from last place …, and N points for first place1st=5points , 2nd=4points ,…

A : 15+3+6+4=28pointsB : 38pointsC : 38pointsD : 22pointsE : 26points

1st

2nd

3rd

4th

5th

3 A B C D E

3 E B C D A

2 C B A E D

2 C D B A E

1st

1st

3rd

4th

5th

B C A D E

Aggregation

27Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

OutlineMotivation and research aimSPARS-J

OutlineSystem architectureRanking methodEach part

Analysis partRetrieval partUser Interface

ExperimentConclusion and Future work

28Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

User interfaceReceive a user’s query and provide the search results through Web browser

Microsoft Internet Explore, Mozilla, etc.

The processParse query word and the search conditionShow rank ordered resultsShow analyzed information of the component

Used by/Using the componentMetrics

29Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Analyzed informationA component information are as follows

MetricsThe number of method, variableLOC, cyclomaticEtc. (measurable metrics in the component itself)

Components used by/using the componentShow lists of nodes followed use relation

Components that are similar to the component

Show lists of similar components

30Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Package browsingThe naming structure for Java packages is hierarchical

A user can search lists of components in same package of a component easily

31Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (top page)

32Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (search results)

33Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (source code)

34Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (similar components)

35Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (using the component)

36Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (used by the component)

37Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (package browsing)

38Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

OutlineMotivation and research aimSPARS-J

OutlineSystem architectureRanking methodEach part

Analysis partRetrieval partUser Interface

ExperimentConclusion and Future work

39Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Experiment(1/2)Comparison with Google

Register about 130,000 components get from InternetQuery words ‘calculator applet’ and ‘chat server client’

Calculate relevance ratio of 10 rank higherRelevance: The component is reusable source code

Google is a web search engine…Add ‘java source’ term to the query wordsFollow one link from the result web page

40Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Experiment(2/2)Example 1 :

”calculator applet”SPARS-J

9 hits7 suited components

Example 2 :”chat server client”SPARS-J

69 hits57 suited components

Using SPARS-J, suited component is high order

orderrank componentrelevant ofnumber Theratio relevance

SAPRS-J Google SPARS-J Google

order

Relevance

Ratio Relevance

Ratio Relevance

Ratio Relevance

ratio

1 ○ 1 ○ 1 ○ 1 × 0

2 ○ 1 × 0.5 ○ 1 × 0

3 ○ 1 ○ 0.67

○ 1 × 0

4 ○ 1 × 0.5 ○ 1 × 0

5 ○ 1 ○ 0.6 ○ 1 × 0

6 × 0.83

○ 0.67

○ 1 × 0

7 ○ 0.86

× 0.57

○ 1 ○ 0.14

8 × 0.75

○ 0.63

○ 1 × 0.13

9 ○ 0.78

× 0.56

○ 1 ○ 0.22

10 - - × 0.5 ○ 1 ○ 0.3

Example1 Example2

41Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Conclusion and Future work

We developed component search engine SPARS-JUsing SPARS-J, retrieval of components used well is enabled easily.

Future workMorphological analysis of Index keywordCollaborative filteringInvestigate best ranking method

The value of weightAggregation ranks

Evaluation of SPARS-JUsability

42Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

End

43Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Component graph

A B

C

ED

F

G

IH

System X System Y

componentuse relation

44Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Weight of nodes

A B

C

ED

F

G

IH

System X System Y

sum of all node weights = 1 ... (1)weight of node represents significance of node

0.10.1

0.2

0.1 0.1

0.1

0.2

0.050.05

1 w(x) 0

45Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Weights of edges

A0.2

0.05

0.05

0.05

0.05

B

0.2

0.05

0.15

0.4

d=1/4

d=1/4d=1/4d=1/4

d: distribution ratio• Node weight is distributed to each outgoing edge• Edge weights are collected at the destination node

sum of all outgoing edge weights = origin node weight ... (2)sum of all incoming edge weights = destination node weight ... (3)

46Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Definition of weightsUnder constraints (1)~(3), we have a simultaneous equation

)(

)()(

2

1

nvw

vwvw

)(

)()(

2

1

nvw

vwvw

t

ddd

dddddd

nnnn

n

n

21

22221

11211

= .

Dt: transposed matrix of distribution ratios

W: node weight vectorThis simultaneous equation can be solved by propagating node weight through edges in the graph

47Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Pseudo use relation

A B C

• Weight computation does not always converge

• Add a pseudo edge from a node to another, if there is no 'real' edge

• Distribution ratios: pseudo edges << real edges

48Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Markov model

• Component rank model can be considered as a Markov Chain of user's focus

• User's focus moves from one component to another along a use relation at a fixed time duration

• Node weight represents the existence probability of the user's focus at infinite future

0.01

0.02 0.01

0.030.05

0.001 0.1

49Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Related WorksMarkov models of documentation traversal

Influence Weight: impact factor of journal publication thought incoming referencesPage Rank: weight of HTML in the Internet through incoming web links

Explicit use relationsNo clustering (important for software products)

Measurement reusability of components or interfaces

Use various characteristic metrics Indirect indicator of reusability Our approach directly reflects usage of components

50Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

部品群グラフをもとにした繰り返し計算計算手順1. 各頂点に適当な重みを与える

– 重みの総和は 1 2. 各有向辺の重みを求める

– 頂点の重みを,出ていく辺で分配する3. 各頂点の重みを再計算

– 頂点に入ってくる辺の重みの総和を,その頂点の重みとして再定義する4. 頂点の重みが収束するまで, 2.3. を繰り返し計算する5. 収束した頂点の重みを,その頂点に対応する部品群の CR 値とする

– 部品の評価値は属する部品群の CR 値とする

C10.334

C20.333

C30.333

C10.334

C20.333

C30.333

v1×50%

v1×50%v2×100%v3×100%

C1 C2

C3

0.167

0.1670.3330.333

C10.333

C20.167

C30.500

C1 C2

C3

0.1665

0.16650.1670.500

C10.500

C20.1665

C30.3335

C10.400

C20.200

C30.400

0.200

0.2000.2000.400

CR 値の計算

Recommended