A Framework for Examining Topical Locality in Object-Oriented Software
2012 IEEE International Conference on Computer Software and Applications
p76004546 江江江, P76004685 江江江


OUTLINE
- Introduction
- Background & Related Work
- Framework
- Dataset and Experimental Procedure
- Static Analysis Results
- Conclusions

INTRODUCTION
- Program comprehension is a key developer activity during software maintenance.
- Topic models rely on lexical information to identify topics that are semantically related to high-level domain concepts. Two examples:
  - LSI (latent semantic indexing): identifies patterns in the relationships between the terms and concepts contained in an unstructured collection of text.
  - LDA (latent Dirichlet allocation)

- While topics reflect semantic relatedness, it is believed that humans evolve spatial cognition strategies to navigate the code base.
- For object-oriented (OO) systems built on the principle of encapsulation, the entities should be spatially organized in a way that reflects the topics of the software.

- The tenet of topical locality: spatial relatedness entails semantic relatedness.
- The tenet is so basic that in many cases it is not mentioned.
- When the tenet is mentioned, its validity is not measured explicitly.
- Our goal is to measure the extent to which this key tenet holds for OO systems.
- We propose a framework to examine to what extent three relationships of topical locality hold in large-scale open-source projects.

BACKGROUND and Related Work
- A. Way-finding in Code Base
- B. Relating Spatial and Semantic Cues
- C. Topical Locality Applied in Software Engineering Tools

A. Way-finding in Code Base
- A developer comprehending a code base can be thought of as continually trying to answer way-finding questions.
- Moonen has examined way-finding in software and extended the concept of legibility to software.

B. Relating Spatial and Semantic Cues
- We are interested in the interplay of different cues so that they can be effectively synthesized.
- We focus on the relationship between two types of cues: spatial and semantic.
- Spatial + semantic = topical locality.
- Software entities should be neither randomly named nor randomly placed.
- Source code entities should be spatially organized to reflect the semantics of the software.
- Semantic cues: class names, variable names.

C. Topical Locality Applied in Software Engineering Tools
- The idea of topical locality plays an important role in building a number of software engineering tools.
- We survey three kinds of tools: code indexers, code visualizers, and code summarizers.

Code Indexers
- An indexer takes source code and generates profiles of the code for later searching.
- Should header comments be indexed?
- We want to address how well the name and header comments represent the target code entity's topic.

Code Visualizers
- Once a relevant code line is located, its surroundings provide valuable contextual information for the developer.
- Examining the topical locality of a contiguous fragment allows us to assess to what extent the code line indicates the topic of its surroundings.

Code Summarizers
- A summarizer generates a snapshot of the source code in order to reduce the cost for developers of reading and understanding the staggering amount of software repository information.
- Our contribution is to measure the degree of topical locality of the snapshot.

FRAMEWORK
Framework Overview

Research Questions

- RQ1: Which better conveys the class body's topic: the class name, the header comments, or a combination of both?
- RQ2: Can a code line indicate the topic of its surroundings?
- RQ3: Can a contiguous code fragment serve as a snapshot of the entire class?

Method
- The independent variables are concerned with identifying spatial relationships.
- The dependent variable is the semantic relatedness.
- Three measures: TFIDF cosine similarity, query term probability, and document overlap.
- We treat source code as documents.
- Each measure outputs a score in the range [0, 1].

Three measures (1/3): TFIDF cosine similarity
- The TFIDF scheme is a text mining model.

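- The weight of a term $t$ is $w_t = tf_t \cdot idf_t$, where $tf_t$ refers to the term frequency of $t$ and $idf_t$ is the inverse document frequency, $idf_t = \log_2\left((n+1)/n_t\right)$, where $n$ is the total number of documents in the corpus and $n_t$ is the number of documents in which $t$ occurs.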
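A minimal sketch of the TFIDF cosine measure may help make the definition concrete. It assumes documents are already tokenized into lists of terms, and it reads the inverse document frequency as log2((n + 1) / n_t), which is one interpretation of the formula on the slide. The function names and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def idf(term, corpus):
    """Inverse document frequency; the log2((n + 1) / n_t) form is an assumption."""
    n = len(corpus)
    n_t = sum(1 for doc in corpus if term in doc)
    return math.log2((n + 1) / n_t) if n_t else 0.0

def tfidf_vector(doc, corpus):
    """Map each term of doc to tf * idf."""
    tf = Counter(doc)
    return {t: tf[t] * idf(t, corpus) for t in tf}

def cosine(u, v):
    """Cosine of two sparse vectors stored as dicts; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus: each "document" is a bag of terms from one code entity.
corpus = [
    ["parse", "token", "stream"],
    ["parse", "tree", "node"],
    ["render", "pixel", "buffer"],
]
vectors = [tfidf_vector(doc, corpus) for doc in corpus]
sim_same = cosine(vectors[0], vectors[0])
sim_related = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
```

Because all weights are non-negative under this reading, the score stays in [0, 1], matching the range stated for the three measures.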

Three measures (2/3): Query term probability
- It measures the likelihood of a term in the query/source being present in the target document.
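The exact formula for this measure is not recoverable from the slide, so the following is only a sketch under a simple assumption: the probability is estimated as the fraction of query terms that also occur in the target document. The function name and the example terms are hypothetical.

```python
def query_term_probability(query_terms, target_terms):
    """Fraction of query terms found in the target (assumed estimator)."""
    if not query_terms:
        return 0.0
    target = set(target_terms)
    hits = sum(1 for term in query_terms if term in target)
    return hits / len(query_terms)

# Hypothetical example: terms of a class name as the query, class body as target.
query = ["account", "balance", "update"]
target = ["update", "account", "ledger", "entry"]
prob = query_term_probability(query, target)
```

Unlike a set-overlap score, this estimator is asymmetric: swapping query and target generally changes the result.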

Three measures (3/3): Document overlap
- A set-based measure that quantifies the amount of overlap between two documents Q and W.
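The slide does not preserve the overlap formula. As an illustration only, the sketch below uses the Jaccard coefficient |Q ∩ W| / |Q ∪ W|, a common set-based overlap measure; the paper may normalize differently. The function name and sample terms are hypothetical.

```python
def document_overlap(q_terms, w_terms):
    """Jaccard coefficient of the two term sets (assumed normalization)."""
    q, w = set(q_terms), set(w_terms)
    union = q | w
    return len(q & w) / len(union) if union else 0.0

# Hypothetical term sets for two documents Q and W.
Q = ["open", "file", "read"]
W = ["file", "read", "close", "flush"]
overlap = document_overlap(Q, W)
```

Two shared terms out of five distinct terms give a score of 0.4 here, and the score stays within [0, 1] for any input.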

Dataset and Experimental Procedure
- LOC: the lines of code
- COM: the lines of comments
- CCs: the number of classes

- Use a source code indexer to process the code base of the selected projects.
- The indexing process results in profiles that store partial but important information from the source code.
- We calculate the three semantic relatedness measures (TFIDF-Cos, Prob, and Overlap) based on the profiles.

RQ1
- Can the class name (N) and/or header comment (H) convey the topic of the class body (B)?
- Calculate the lexical similarity for (N, B), (H, B), and (NH, B).

RQ2
- Can a code line indicate the topic of its surroundings?
- For a randomly selected code line (L), we take a contiguous code fragment of 30 lines as its surroundings (S) and select from the same file another 30-line contiguous code fragment (R).
- Compare the lexical similarity of (L, S) with that of (L, R).
- Only classes with at least 70 LOC are considered.
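The RQ2 sampling step can be sketched as follows. The 30-line fragment length and the 70-LOC threshold come from the slide; everything else is an assumption made for illustration: S is taken as a window roughly centered on L, and R is chosen to be disjoint from S when the file is long enough, otherwise merely to exclude L. The function name `sample_rq2` is hypothetical.

```python
import random

FRAGMENT = 30  # fragment length stated on the slide
MIN_LOC = 70   # minimum class size stated on the slide

def sample_rq2(lines, rng):
    """Return (L, S, R) for one class, or None if the class is too small."""
    n = len(lines)
    if n < MIN_LOC:
        return None
    i = rng.randrange(n)  # index of the randomly selected line L
    # Assumption: S is a 30-line window around L, clamped to the file bounds.
    s_start = min(max(i - FRAGMENT // 2, 0), n - FRAGMENT)
    s_end = s_start + FRAGMENT
    candidates = range(n - FRAGMENT + 1)
    # Assumption: prefer an R that does not overlap S at all ...
    disjoint = [k for k in candidates if k + FRAGMENT <= s_start or k >= s_end]
    # ... and otherwise fall back to any fragment that does not contain L.
    excluding_l = [k for k in candidates if k + FRAGMENT <= i or k > i]
    r_start = rng.choice(disjoint or excluding_l)
    return lines[i], lines[s_start:s_end], lines[r_start:r_start + FRAGMENT]

# Hypothetical 70-line class body with distinct lines.
lines = [f"line {k}" for k in range(70)]
rng = random.Random(0)
samples = [sample_rq2(lines, rng) for _ in range(200)]
```

Each sampled triple can then be tokenized and scored, comparing the similarity of (L, S) against that of (L, R) with any of the three measures.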

RQ3
- Can a contiguous code fragment serve as a snapshot of the entire class?
- From a code search perspective, the lexical similarity of the snapshot should indicate the topical closeness of the classes.
- Randomly select a term w (data in Fig. 4) to act as the query keyword. The snapshot is extracted as a 30-line contiguous code fragment.
- Only classes with at least 60 LOC are considered.

Static Analysis Results
- RQ1: Name vs. Header
- RQ2: Code Line and Surroundings
- RQ3: Contiguous Fragment as a Snapshot
- Threats to Validity

RQ1: Name vs. Header
- NH is the closest to B in most cases, except for MegaMek when measured by TFIDF, where N-B is larger than H-B and NH-B. This suggests that MegaMek classes do not have useful header comments.

- Least Significant Difference (LSD) multiple comparison test: the test places combinations that differ significantly from the others in separate groups and allocates the best combination to group A.
- The result classifies NH-B into group A, indicating that the similarity score of NH-B is significantly higher than those of N-B and H-B.
- We conclude that if the class contains useful header comments, then it is important to combine the header comments with the class name in order to convey the topic of the class body.

RQ2: Code Line and Surroundings
- A code line indicates the topic of its surroundings more than it indicates the topic of a random code fragment.

RQ3: Contiguous Fragment as a Snapshot
- We calculate the Pearson correlation coefficient, a parametric statistic that shows the correlation between two variables.
- From the viewpoint of distinguishing the topics of different classes, a contiguous code fragment can serve as a snapshot of the entire class.
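For reference, the Pearson correlation coefficient can be computed as in the sketch below. The two score lists are made-up placeholders for paired snapshot-level and class-level similarity scores; they are not results from the study, and the pairing itself is an assumption about how the statistic was applied.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired scores (not data from the paper).
snapshot_scores = [0.1, 0.4, 0.5, 0.9]
class_scores = [0.2, 0.5, 0.6, 1.0]
r = pearson(snapshot_scores, class_scores)
```

A value of r close to 1 would mean that snapshot-level similarity tracks class-level similarity closely; the sketch does not handle constant inputs, for which the coefficient is undefined.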


Threats to Validity
- Construct validity: the selection of a 30-line contiguous, non-empty, comments-inclusive code fragment for addressing RQ2 and RQ3. Empty lines contribute little spatial or semantic information. Including all comments is a choice influenced by RQ1.
- Internal validity: using three measures derived from different mathematical models diminishes the measuring bias.
- External validity: this analysis may not generalize to other software projects.

Conclusions
- In this paper, we contributed a novel experimental framework for testing the tenet of topical locality and applied the framework to provide empirical evidence of topical locality in large-scale OO systems.
- Our future work includes carrying out more empirical studies to examine other instances of topical locality.
- It is important to integrate theoretical understanding and empirical findings to enhance practical tool support for software developers.