7/27/2019 Ieee Heroes Oss Sw
http://slidepdf.com/reader/full/ieee-heroes-oss-sw 1/5
Heroes in FLOSS projects: an explorative study
Filippo Ricca
DISI, Università di Genova
16146 Genova, Italy [email protected]
Alessandro Marchetto
Fondazione Bruno Kessler–IRST
38123, Trento, [email protected]
Abstract: It is well recognized that the presence of Heroes, i.e., tireless developers who are the only ones who know certain critical parts of a system, can increase the risk of project failure, especially if these developers decide to leave the project. In contrast, the relationship between Heroes and maintenance tasks is unknown because it has been little investigated so far.
In this paper, we first implemented a tool to identify Heroes. Then, we conducted an explorative study with 37 randomly selected open source projects to discover existing relationships between the presence of Heroes and the time required to implement change requests. Preliminary results show that: (i) Heroes are common in FLOSS projects and (ii) their presence seems to be beneficial because it reduces the time to implement change requests.
Keywords-Heroes, Experimentation, Code Ownership
I. INTRODUCTION
Human factors (e.g., geographical distribution of teams
and people skills) strongly affect the overall quality
of software systems and their maintenance and
evolution. Hence, emerging studies (e.g., [4], [7]) evaluate
their relevance and discuss how to manage them
successfully. One such factor that plays an important role in
system maintenance and evolution is the intentional
management of code ownership [5]. Closely connected with
code ownership is the concept of Truck factor.
The Truck factor (TF) of a project is defined as “the
number of developers on a team who have to be hit with
a truck before the project is in serious trouble” [1]. Clearly,
being “hit by a truck” is a deliberate exaggeration to convey the idea.
It can be replaced with more realistic events,
e.g., falling ill, leaving the company for another or taking a vacation.
Ideally, to avoid potential problems, as advocated by the XP
principle of “collective code ownership” [2], the Truck factor
of a project should be high. Indeed, if all the knowledge of
a system is in the hands of only a few developers and they
decide to leave the project, then the project could get
into trouble. Such indefatigable/tireless developers are often
called “Heroes” given that they manage large and critical
portions of a system [1].
Despite their potential relevance for managers, no
tools exist, to the best of our knowledge, that implement
algorithms devoted to discovering Heroes. In this paper, we
start filling this gap by implementing a tool that analyzes the
code repository of a project to identify Heroes. Then, we
used our tool to conduct an explorative study analyzing a
large set of free and open source applications (i.e., FLOSS)
with the purpose of studying the presence of Heroes and
their connection with the time required for maintenance tasks
(e.g., time required to fix bugs). Results indicate that Heroes
are quite common in FLOSS projects and that their presence
strongly impacts the time required to implement change
requests (CRs). In the future, we will further investigate
these aspects to propose guidelines to managers and team
leaders, thus improving the quality of software projects.
The paper is organized as follows. Section II introduces
relevant notions and summarizes the context of this work.
Section III presents the experiment we conducted while
Section IV summarizes and discusses the relevant results
we obtained. Finally, Section V concludes the paper.
II. CODE OWNERSHIP AND HEROES
Non-ownership [5] for the code of a software system
exists when the system and its sub-parts have no specific
owners (i.e., developers responsible for implementing and
maintaining the code), and thus every developer can potentially
change each part of the system. Even if several projects
apply such a non-ownership model, it is well recognized
(e.g., [4] and [5]) that non-ownership can lead to: (i) poor
or missing documentation, (ii) unreadable source code due
to mixtures of styles and inconsistencies, and (iii) long
cycles of bug fixing and code maintenance. To address such
problems, managers often try to intentionally control
code ownership. Nordberg III [5] identifies and discusses
four typical models of code ownership frequently applied
in software development: (1) product specialist, (2)
subsystem ownership, (3) chief architect and (4) collective
ownership. In (1) and (2), a developer is respectively the
owner of the whole system or of a sub-part of it. In (3)
there is a developer in charge of the system who is helped
by several assistants. In (4) the code is owned by several
developers and a developer that owns a piece of code can
also be a contributor to another piece by collaborating with
another developer. The choice of the ownership model to use
is currently a source of debate in the scientific community.
To monitor, analyze and thus change the ownership policy
during the software life-cycle, adequate information and
tools are needed. The major source of information for
2010 17th Working Conference on Reverse Engineering
1095-1350/10 $26.00 © 2010 IEEE
DOI 10.1109/WCRE.2010.25
measuring and evaluating code ownership is, often, the code
repository and the activities performed by developers on
the code that have been tracked by the repository in log
files (e.g., SVN, CVS). Different approaches have been
proposed to measure the code ownership by analyzing these
repositories. For instance, Weyuker et al. [7] propose to
compute the ownership at file level while Girba et al. [3] propose a measure based on the percentage of source code
lines modified by a specific developer.
A developer can play different roles in a software project
with respect to the activities performed on the code, e.g.,
he/she can be an owner (the person responsible for the code), a
Hero (a developer that exclusively manages a wide portion
of code), a contributor (a developer that gives some
contribution to code implementation and maintenance) or a
simple committer (an occasional contributor). For example,
let us consider Jfreechart¹, a software library composed
of 1063 files. Analyzing the code repository one can
discover that Jfreechart is maintained by three developers (id:
mungady, taqua and nenry). Furthermore, by analyzing
the activities performed on the Jfreechart code one can
also observe that: (i) mungady modified 1063 files, nenry
17 and taqua only 2; and (ii) mungady exclusively modified
1046 files (i.e., 98% of the Jfreechart code). This clearly
implies that mungady is a Hero for Jfreechart while the
other developers are just contributors to the project.
In this work, we propose the first empirical investigation
that studies the distribution of Heroes in a large set of
FLOSS systems and analyzes the connection between such
a distribution and the maintenance time. To perform our
experiment: (i) we considered committers to the code repository
as contributors, assuming that they tend to be reasonably
indistinguishable; (ii) we assumed that code ownership can
be measured by considering code changes at file level, and
(iii) we focused our analysis on developers that are Heroes
of projects.
III. EMPIRICAL STUDY
After developing our tool (it uses SVN client commands
such as “svn list” and “svn log” to extract information
from code repositories), we conducted an experimental study
using 37 randomly selected open source projects taken from
SourceForge² and GoogleCode³. The goal of this study is to
analyze a set of FLOSS projects (the objects of study) with
the purpose of discovering Heroes and of understanding whether
and to what extent their presence is related to the time to close CRs.
The perspective is that of researchers who want to understand
whether Heroes are common in FLOSS projects and whether they are
beneficial during the maintenance process.
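
As an illustration of the kind of extraction described above, the following sketch maps each changed path to the set of its committers, starting from XML shaped like the output of `svn log --verbose --xml`. It is not the authors' tool: the sample log, the paths and the function name `committers_per_file` are hypothetical, and only the committer ids are borrowed from the Jfreechart example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample shaped like the output of `svn log --verbose --xml`.
SAMPLE_LOG = """<log>
  <logentry revision="2">
    <author>nenry</author>
    <date>2009-03-02T10:15:00.000000Z</date>
    <paths>
      <path action="M">/trunk/src/Chart.java</path>
    </paths>
    <msg>fix axis label</msg>
  </logentry>
  <logentry revision="1">
    <author>mungady</author>
    <date>2009-03-01T09:00:00.000000Z</date>
    <paths>
      <path action="A">/trunk/src/Chart.java</path>
      <path action="A">/trunk/src/Plot.java</path>
    </paths>
    <msg>initial import</msg>
  </logentry>
</log>
"""


def committers_per_file(log_xml):
    """Map each changed path to the set of authors who committed to it."""
    owners = defaultdict(set)
    for entry in ET.fromstring(log_xml).iter("logentry"):
        author = entry.findtext("author", default="(unknown)")
        for path in entry.iter("path"):
            owners[path.text].add(author)
    return dict(owners)


owners = committers_per_file(SAMPLE_LOG)
```

Here `Plot.java` has a single committer, so it would count as exclusively managed, while `Chart.java` is shared between two committers.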
¹ http://www.jfree.org/jfreechart
² http://sourceforge.net/
³ http://code.google.com/intl/it-IT/
A. Research questions
This experimentation, conducted and described according
to the guidelines proposed by Wohlin et al. [8], aims at
answering the following two research questions.
RQ1: Are Heroes common in FLOSS projects and how
much is the percentage of files they exclusively manage?
This research question deals with the ability of the approach
to recover Heroes in FLOSS projects and to determine the
percentage of files and lines of code (LOCs) they handle. To
answer this question, we have to precisely define the concept
of Hero. For us, a Hero is a committer that exclusively
manages (in the implementation of our tool this translates
to exclusively edits/commits) a number of files ≥ α% of the
total in the code repository. In this preliminary work we
decided to set α = 10 and α = 20. The rationale behind
this choice is that we intend to factor out committers that
exclusively own a small portion of the system (for example
1%). These two values were chosen because it seems
reasonable to us, as a first approximation, to consider as a Hero
a committer that owns at least 60 (α = 10) or 120 (α = 20) files
in a medium project of 600 files. We consider two thresholds (α = 10
and α = 20) to vary our analysis with respect
to the definition of Hero. Finally, the percentage of files
(and LOCs) managed by all the Heroes is simply computed by
summing the individual percentages of owned files (and
LOCs) for each Hero.
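
The definition above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `find_heroes` and the input format (a mapping from file to the set of its committers) are hypothetical. The toy data mirrors the Jfreechart counts quoted in Section II; how the 17 non-exclusive files are shared between nenry and taqua is an assumption chosen to be consistent with those counts.

```python
from collections import Counter


def find_heroes(committers_by_file, alpha=10.0):
    """Return {committer: percentage of files exclusively edited} for Heroes.

    A committer counts as a Hero when the files that only he/she has
    edited make up at least `alpha` percent of all files.
    """
    total = len(committers_by_file)
    if total == 0:
        return {}
    # Count, per committer, the files whose committer set has exactly one member.
    exclusive = Counter(
        next(iter(authors))
        for authors in committers_by_file.values()
        if len(authors) == 1
    )
    return {
        dev: 100.0 * count / total
        for dev, count in exclusive.items()
        if 100.0 * count / total >= alpha
    }


# Toy data: 1063 files, 1046 of them touched only by mungady.
# The split of the remaining 17 files is an assumption.
repo = {f"file{i}": {"mungady"} for i in range(1046)}
repo.update({f"file{i}": {"mungady", "nenry"} for i in range(1046, 1061)})
repo.update({f"file{i}": {"mungady", "nenry", "taqua"} for i in range(1061, 1063)})

heroes = find_heroes(repo, alpha=10.0)
total_share = sum(heroes.values())  # percentage of files managed by all Heroes
```

With this data only mungady passes the threshold, for both α = 10 and α = 20, since neither nenry nor taqua exclusively edits any file.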
RQ2: Are Heroes faster (or slower) than non-Heroes
to close CRs? This research question investigates whether the
presence of Heroes is beneficial (or not) with respect to the
time to close CRs. In other words, we are interested in knowing
whether Heroes are faster (or not) than non-Heroes at closing CRs.
If they are faster, we can deduce that their presence in a
project is beneficial because they reduce the time to close
CRs. On the one hand, given the large portion of code they
handle, Heroes could be very busy and overloaded with CRs
and for this reason always behind schedule. On the other hand,
Heroes know the code they own very well, having often written it
themselves, and they are able to modify it quickly. Moreover,
given their intrinsic nature, they are often very active and
motivated. For these reasons, it could happen that even if
they are burdened with requests they are able to close CRs
well and on time.
B. Analysis method
To answer RQ1, we first computed for each project the
number of Heroes and the percentage of files handled by
Heroes (for both α = 10 and α = 20). Second, we applied
the proportion test to verify whether the obtained proportion
of projects with at least one Hero vs. projects without Heroes
is due to chance or not. Third, we computed the mean of
the files exclusively managed by Heroes for the considered
projects and the corresponding 95% confidence interval (still
using the proportion test). With the 95% confidence interval,
we are 95 percent certain that the mean of the population of
Repository   Software                  Files  LOCs     Committers  Project size  Revisions
Google       choscommerce              69     11016    4           small         224
Google       closure-compiler          536    123660   4           small         200
Sourceforge  cppunit                   378    22646    6           small         582
Google       cspoker                   208    144111   4           small         1382
Google       easycooking-fantastici4   141    15460    4           small         203
Sourceforge  erlide                    820    134753   5           small         3155
Google       gdata                     344    46805    19          big           982
Google       gmapcatcher               77     7164     6           small         842
Google       googlemock                55     56410    4           small         300
Sourceforge  gtk-gnutella              723    248697   15          big           17424
Google       h2database                722    125944   6           small         2628
Sourceforge  htmlunit                  3404   548666   10          small         5743
Sourceforge  httpunit                  354    39232    3           small         1062
Sourceforge  jfreechart                1063   149727   3           small         2272
Google       jpcsp                     672    355728   20          big           1504
Google       jstestdriver              438    32082    9           small         603
Sourceforge  jtidy                     174    30487    5           small         115
Google       keycar                    669    83936    4           small         465
Sourceforge  mantisbt                  641    3399759  38          big           5752
Google       mobileterminal            43     8095     12          big           364
Google       moofloor                  39     10152    4           small         169
Google       nettiers                  267    108159   13          big           837
Google       pagespeed                 297    39069    6           small         845
Sourceforge  phpwiki                   537    118815   15          big           7370
Sourceforge  remotes                   289    35646    3           small         812
Google       sqlitepersistentobjects   170    17727    8           small         138
Google       testability               126    7282     4           small         151
Google       toolbox                   309    70805    4           small         344
Sourceforge  tora                      703    198512   17          big           3523
Google       torrentpier               366    72781    8           small         447
Google       unladenswallow            2815   863242   13          big           1158
Google       v8                        982    324042   32          big           4556
Sourceforge  winmerge                  862    172366   16          big           7149
Sourceforge  xcat                      156    145106   8           small         6257
Sourceforge  zk1                       1552   217337   17          big           14151
Google       zscreen                   627    98181    10          small         1685
Google       zxing                     1119   81501    19          big           1393
Table I. Randomly selected applications. From left to right: repository, selected application, number of files, lines of code, number of committers, dimension of the project (big if > 10 committers) and number of commits.
projects (i.e., the mean of all the FLOSS projects) lies in some
range of values. Finally, we computed the 2x2 contingency
table (rows: small and big projects, columns: at least one
Hero and without Heroes) and applied Fisher's test to
see whether there is a difference in terms of presence of Heroes
between small and big projects. We considered projects with
number of developers ≤ 10 as small and > 10 as big⁴.
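
To make the last step concrete, here is a small pure-Python sketch of a two-sided Fisher exact test on a 2x2 table. The function name is hypothetical, and the counts are not printed as a contingency table in the paper: they are derived here from Table I (24 small and 13 big projects) and from the statement in Section IV that, for α = 10, only mantisbt and tora (both big) have no Heroes.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows: small, big. Columns: at least one Hero, without Heroes (alpha = 10).
p_value = fisher_exact_two_sided(24, 0, 11, 2)
```

Under these derived counts the p-value is about 0.117, in line with the non-significant value of 0.11 reported in Section IV for α = 10.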
To answer RQ2, we first restricted the data set to the
projects belonging to Sourceforge. We did so because the
projects in Google are too recent and thus do not
contain a sufficient number of CRs for our analysis. Second,
for each project and for each closed CR contained in the
project bug tracker, we computed the time needed to close
it. We did so by observing the opening and the closing
time reported in the bug tracker for each bug. Third, we
treated the CRs with times > 4 years as outliers and removed them
(overall, only 7 data points were deleted). Finally, we
tried to understand if Heroes are faster than non-Heroes in
⁴ http://siddhi.blogspot.com/2005/06/truck-factor.html
two different ways: globally and by project. In the first way,
we divided all the collected times to close CRs in two sets
depending on who closed the CR (Heroes and non-Heroes),
without further dividing them by project. Then, we applied
the non-parametric Mann-Whitney test to the two sets. In
the second way, we did the same but considering the times
to close CRs divided by project. We used the Mann-Whitney
test because it is very robust and sensitive. Moreover, given
the research question, we used a two-tailed statistical test.
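
The steps of this analysis can be sketched as follows. This is only an illustration under stated assumptions: the change requests are invented, a year is counted as 365 days when applying the 4-year cut-off, and the Mann-Whitney p-value uses the plain normal approximation (average ranks for ties, no tie or continuity correction), which may differ from what a statistical package reports, especially on small samples.

```python
from datetime import datetime
from math import erfc, sqrt

FOUR_YEARS_H = 4 * 365 * 24  # assumption: a year counted as 365 days


def hours_to_close(opened, closed):
    """Elapsed time between two ISO timestamps, in hours."""
    delta = datetime.fromisoformat(closed) - datetime.fromisoformat(opened)
    return delta.total_seconds() / 3600.0


def mann_whitney_u(xs, ys):
    """Return (U statistic of xs, two-sided p-value by normal approximation)."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j for tied values
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(xs), len(ys)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return u, erfc(abs(z) / sqrt(2.0))


# Hypothetical change requests: (opened, closed, closed_by_hero)
crs = [
    ("2009-01-01T00:00:00", "2009-01-03T00:00:00", True),
    ("2009-01-01T00:00:00", "2009-01-02T12:00:00", True),
    ("2009-01-01T00:00:00", "2009-01-11T00:00:00", False),
    ("2009-01-01T00:00:00", "2009-01-06T00:00:00", False),
    ("2003-01-01T00:00:00", "2009-01-01T00:00:00", False),  # over 4 years: outlier
]
times = [(hours_to_close(o, c), hero) for o, c, hero in crs]
times = [(t, hero) for t, hero in times if t <= FOUR_YEARS_H]
hero_times = [t for t, hero in times if hero]
other_times = [t for t, hero in times if not hero]
u, p = mann_whitney_u(hero_times, other_times)
```

The global analysis corresponds to one call on the pooled times of all projects, and the per-project analysis to one call per project.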
IV. RESULTS
Table I details some information about the 37 analyzed
FLOSS projects. The table reports: code repository
(SourceForge or GoogleCode), application name, number of files,
lines of code⁵, number of committers of the project,
estimated dimension (small or big) and total number of analyzed
system revisions (i.e., commits). In the analyzed sample 24
projects are classified as small and 13 as big.
⁵ Documentation and external libraries are not considered.
Software                  Heroes  %Files  %LOCs
choscommerce              1       43.48   33.85
closure-compiler          1       28.92   13.76
cppunit                   1       66.40   66.83
cspoker                   1       54.81   27.64
easycooking-fantastici4   1       26.95   4.97
erlide                    1       34.15   22.68
gdata                     3       61.63   44.56
gmapcatcher               1       76.62   76.47
googlemock                2       30.91   3.01
gtk-gnutella              2       27.66   9.25
h2database                1       91.97   78.15
htmlunit                  3       57.17   51.37
httpunit                  1       13.84   8.51
jfreechart                1       98.40   96.96
jpcsp                     2       67.11   82.03
jstestdriver              2       30.59   38.29
jtidy                     2       77.01   41.19
keycar                    1       84.60   91.25
mobileterminal            3       44.19   5.3
moofloor                  1       74.36   77.03
nettiers                  1       97.00   96.87
pagespeed                 1       51.85   61.21
phpwiki                   1       17.13   11.24
remotes                   1       88.93   77.27
sqlitepersistentobjects   1       77.06   84.26
testability               1       19.84   9.39
toolbox                   2       49.84   63.79
torrentpier               1       43.72   24.71
unladenswallow            1       88.03   77.04
V8                        1       11.41   2.15
winmerge                  1       32.95   31.19
xcat                      3       68.59   7.56
zk1                       1       50.26   26.36
zscreen                   1       51.67   47.88
zxing                     3       62.56   48.57
Table II. Heroes found setting α = 10.
A. RQ1: Are Heroes common in FLOSS projects and how
much is the percentage of files they exclusively manage?
Table II lists the number of discovered Heroes setting
α = 10 and the percentage of files and LOCs exclusively
owned/managed by Heroes.
From the data, Heroes seem to be common in the
considered projects. In 35 projects out of 37 we have at least
one Hero and in some cases we have two or more Heroes
(e.g., htmlunit). Only the projects mantisbt and tora
have zero Heroes (they are not shown in Table II). The
mean is 1.46, the median is 1, and the proportion test⁶
is significant (p-value < 0.001). Moreover, we can see
from Table II that the discovered Heroes exclusively manage
a considerable percentage of files: on average, considering
also mantisbt and tora, 51.39% (confidence interval at
95% ranging from 44.57% to 60.54%). Given that the trend
observed with α = 10 is substantially confirmed with
α = 20, we can answer the first part of RQ1 in the
affirmative. Precisely, with α = 20 we have that 27 projects
out of 37 have at least one Hero and the proportion test is
⁶ In R: prop.test(35, 37)
still significant (p-value = 0.008). Instead, the percentage
of files managed by Heroes decreases to 43.04% (confidence
interval at 95% ranging from 32.02% to 54.06%).
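
The paper runs the proportion test in R with prop.test. As a rough cross-check, the sketch below computes the same kind of p-value in plain Python, assuming R's defaults for a single proportion: a null proportion of 0.5 and Yates' continuity correction. The function name is hypothetical.

```python
from math import erfc, sqrt


def prop_test_p_value(successes, n, p0=0.5):
    """Two-sided p-value of a one-sample proportion test.

    Uses the chi-square statistic with Yates' continuity correction
    (1 degree of freedom).
    """
    expected = n * p0
    diff = max(abs(successes - expected) - 0.5, 0.0)
    chi2 = diff ** 2 / (n * p0 * (1.0 - p0))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2.0))


p_alpha_10 = prop_test_p_value(35, 37)  # projects with at least one Hero, alpha = 10
p_alpha_20 = prop_test_p_value(27, 37)  # same, alpha = 20
```

Under these assumptions the first value is far below 0.001 and the second is roughly 0.0085, which agrees with the p-values reported above.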
Given these results, we can conclude that Heroes are
common in FLOSS projects and that they exclusively manage
a conspicuous proportion of files. Note that this holds true
for both big and small projects (Fisher test: p-value = 0.11
with α = 10 and p-value = 0.12 with α = 20).
B. RQ2: Are Heroes faster (or slower) than non-Heroes to
close CRs?
To get an idea about RQ2 we divided all the collected
times to close CRs into two sets according to the type of
committer (Hero or non-Hero) who closed them, without
further dividing them by project. In this way, we obtained
4727 times for the group of Heroes (median = 144.5 h)
and 3812 for the group of non-Heroes (median = 236.2 h).
Applying the Mann-Whitney test we found a significant
difference between the two sets (p-value < 0.001). This
means that, considering all the data together: (i) Heroes are
faster (+63.46%⁷) to close change requests than non-Heroes
and (ii) the observed difference between the two sets is not
due to chance.
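
The 63.46% figure follows directly from the two medians. The short sketch below solves the footnote's equation for x; the function name is hypothetical.

```python
def relative_increase_percent(faster_median_h, slower_median_h):
    """Solve faster + faster * x% = slower for x (in percent)."""
    return (slower_median_h - faster_median_h) / faster_median_h * 100.0


x = relative_increase_percent(144.5, 236.2)
print(f"{x:.2f}%")  # prints 63.46%
```

Read this way, the percentage says that the median closing time of non-Heroes is about 63% longer than that of Heroes. Expressed relative to the non-Hero median instead, the Hero median is about 39% shorter (1 - 144.5/236.2).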
The boxplots shown in Figure 1 present the times to
close CRs, divided by project, for Heroes in red (light gray
in b/w screen) and non-Heroes in white. Each data point
corresponds to the time, in hours, spent by a developer to
close a CR. Project names and p-values, computed
with the Mann-Whitney test, are reported at the bottom of the
figure.
In Figure 1 we have that in 8 cases out of 12 the median
in the Heroes set is less than the median in the non-Heroes
set; this means that, in these 8 projects, Heroes are faster
than non-Heroes to close CRs. For 5 projects (out of these 8)
this result is statistically significant (erlide, htmlunit,
httpunit, phpwiki and zk1). On the contrary, we have
2 cases where the difference is significant in the opposite
direction (jtidy and winmerge). With these data, although
not definitively, we can answer RQ2 in the affirmative.
C. Threats to validity
The explorative nature of this study implies several threats
to validity that will be considered only in the future. First
of all, the generalization of the results is clearly problematic
because the dataset is small. However, the projects have been
randomly chosen and they are real-world projects taken from
two source repositories. Second, the simplified approach used
to infer file ownership and the level of granularity chosen
(i.e., the file level) could have influenced the obtained
results. The same can be said for our measurements and
chosen thresholds (e.g., α in the definition of Hero). Finally,
⁷ This percentage comes from the equation: 144.5 h + 144.5 h × x% = 236.2 h
Figure 1. Boxplots of time required to close CRs divided by project
regarding the possibility of replicating the study, we have:
(i) explained the algorithm used to discover Heroes, (ii)
detailed the process and (iii) listed the considered open
source systems.
V. CONCLUSION
We have introduced a tool for finding Heroes in software
projects. Our tool has been applied to 37 randomly chosen
FLOSS projects with the aim of answering two research
questions. Although still preliminary, the results are encouraging.
We discovered that:
- Heroes are common in FLOSS projects and they
exclusively manage a conspicuous proportion of files and
code. This is positive because, with knowledge concentrated
in the hands of a few well-motivated developers,
the quality of the code should be better (e.g., the code should
be free of a mixture of different styles) [6]. On the
other hand, this could lead to poor or missing documentation
(given that it is not essential for exchanging information)
and could put the project in trouble if Heroes decide to
leave.
- Heroes are faster than non-Heroes to close CRs. This
outcome, if confirmed also in industrial contexts, could be
an important message for managers and project leaders. The
presence of Heroes in a project is beneficial because it reduces
the time to close CRs. Indeed, in our experiment, Heroes
are faster than non-Heroes by 63.46%.
Future work will be devoted to continuing the empirical
validation presented in this paper.
REFERENCES
[1] Truck factor definition, http://www.agileadvice.com/archives/2005-/05/truck factor.html. Technical report, 2010.
[2] K. Beck. Extreme Programming Explained: Embrace Change. Addison-Wesley, 1999.
[3] T. Girba, A. Kuhn, M. Seeberger, and S. Ducasse. How developers drive software evolution. In International Workshop on Principles of Software Evolution, pages 113–122, 2005.
[4] N. Nagappan, B. Murphy, and V. Basili. The influence of organizational structure on software quality: an empirical study. In International Conference on Software Engineering (ICSE), pages 521–530. IEEE, 2008.
[5] M. Nordberg III. Managing code ownership. IEEE Software, pages 26–33, 2003.
[6] M. Pinzger, N. Nagappan, and B. Murphy. Can developer-module networks predict failures? In International Symposium on Foundations of Software Engineering (FSE), pages 2–12. ACM, 2008.
[7] E. J. Weyuker, T. J. Ostrand, and R. M. Bell. Do too many cooks spoil the broth? Using the number of developers to enhance defect prediction models. Empirical Software Engineering, 13(5):539–559, 2008.
[8] C. Wohlin, P. Runeson, M. Host, M. Ohlsson, B. Regnell, and A. Wesslen. Experimentation in Software Engineering - An Introduction. Kluwer, 2000.