
Heroes in FLOSS projects: an explorative study

Filippo Ricca

DISI, Università di Genova

16146 Genova, Italy [email protected]

Alessandro Marchetto

Fondazione Bruno Kessler–IRST

38123, Trento, [email protected]

Abstract: It is well recognized that the presence of Heroes, i.e., tireless developers who are the only ones who know certain critical parts of a system, can increase the risk of project failure, especially if these developers decide to leave the project. In contrast, the relationship between Heroes and maintenance tasks is unknown because it has been little investigated so far.

In this paper, we first implemented a tool to identify Heroes. Then, we conducted an explorative study with 37 randomly selected open source projects to discover existing relationships between the presence of Heroes and the time required to implement change requests. Preliminary results show that: (i) Heroes are common in FLOSS projects and (ii) their presence seems to be beneficial because it reduces the time to implement change requests.

Keywords-Heroes, Experimentation, Code Ownership

I. INTRODUCTION

Human factors (e.g., geographical distribution of teams and people skills) strongly affect the overall quality of software systems and their maintenance and evolution. Hence, emerging studies (e.g., [4], [7]) evaluate their relevance and discuss how to manage them successfully. One such factor, which plays an important role in system maintenance and evolution, is the intentional management of code ownership [5]. Connected with code ownership is the concept of the Truck factor. The Truck factor (TF) of a project is defined as “the number of developers on a team who have to be hit with a truck before the project is in serious trouble” [1]. Clearly, “being hit by a truck” is a deliberate exaggeration meant to convey the idea. It can be replaced with more realistic events, e.g., falling ill, leaving the company for another or taking a vacation. Ideally, to avoid potential problems, the Truck factor of a project should be high, as advocated by the XP principle of “collective code ownership” [2]. Indeed, if all the knowledge of a system is in the hands of only a few developers and they decide to leave the project, then the project could find itself in trouble. Such tireless developers are often called “Heroes”, given that they manage large and critical portions of a system [1].

Despite the potential relevance for managers, to the best of our knowledge no tools exist that implement algorithms devoted to discovering Heroes. In this paper, we start filling this gap by implementing a tool that analyzes the code repository of a project to identify Heroes. Then, we use our tool to conduct an explorative study analyzing a large set of free and open source applications (i.e., FLOSS), with the purpose of studying the presence of Heroes and their connection with the time required for maintenance tasks (e.g., the time required to fix bugs). Results indicate that Heroes are quite common in FLOSS projects and that their presence strongly impacts the time required to implement change requests (CRs). In the future, we will further investigate these aspects to propose guidelines to managers and team leaders, thus improving the quality of software projects.

The paper is organized as follows. Section II introduces

relevant notions and summarizes the context of this work.

Section III presents the experiment we conducted while

Section IV summarizes and discusses the relevant results

we obtained. Finally, Section V concludes the paper.

II. CODE OWNERSHIP AND HEROES

Non-ownership [5] of the code of a software system exists when the system and its sub-parts have no specific owners (i.e., developers responsible for implementing and maintaining the code), and thus every developer can potentially change each part of the system. Even though several projects apply such a non-ownership model, it is well recognized (e.g., [4] and [5]) that non-ownership can lead to: (i) poor or missing documentation, (ii) unreadable source code due to mixtures of styles and inconsistencies, and (iii) long cycles of bug fixing and code maintenance. To address such problems, managers often try to control code ownership intentionally. Nordberg III [5] identifies and discusses four typical models of code ownership frequently applied in software development: (1) product specialist, (2) subsystem ownership, (3) chief architect and (4) collective ownership. In (1) and (2), a developer is the owner of the whole system or of a sub-part of it, respectively. In (3), one developer is in charge of the system and is helped by several assistants. In (4), the code is owned by several developers, and a developer who owns a piece of code can also contribute to another piece by collaborating with another developer. The choice of the ownership model to use is currently a source of debate in the scientific community.

To monitor, analyze and thus change the ownership policy

during the software life-cycle, adequate information and

tools are needed. The major source of information for

2010 17th Working Conference on Reverse Engineering

1095-1350/10 $26.00 © 2010 IEEE

DOI 10.1109/WCRE.2010.25




measuring and evaluating code ownership is, often, the code

repository and the activities performed by developers on

the code that have been tracked by the repository in log

files (e.g., SVN, CVS). Different approaches have been

proposed to measure the code ownership by analyzing these

repositories. For instance, Weyuker et al. [7] propose to

compute the ownership at file level while Girba et al. [3] propose a measure based on the percentage of source code

lines modified by a specific developer.

A developer can play different roles in a software project with respect to the activities performed on the code, e.g., he/she can be an owner (the person responsible for the code), a Hero (a developer who exclusively manages a wide portion of the code), a contributor (a developer who contributes to code implementation and maintenance) or a simple committer (an occasional contributor). For example, consider Jfreechart¹, a software library composed of 1063 files. Analyzing the code repository, one can discover that Jfreechart is maintained by three developers (ids: mungady, taqua and nenry). Furthermore, by analyzing the activities performed on the Jfreechart code, one can also observe that: (i) mungady modified 1063 files, nenry 17 and taqua only 2; and (ii) mungady exclusively modified 1046 files (i.e., 98% of the Jfreechart files). This clearly implies that mungady is a Hero for Jfreechart, while the other developers are just contributors to the project.

In this work, we propose the first empirical investigation that studies the distribution of Heroes in a large set of FLOSS systems and analyzes the connection between such a distribution and maintenance time. To perform our experiment: (i) we considered committers on the code repository as contributors, assuming that they tend to be reasonably indistinguishable; (ii) we assumed that code ownership can be measured by considering code changes at file level; and (iii) we focused our analysis on developers who are Heroes of projects.

III. EMPIRICAL STUDY

After developing our tool (which uses SVN client commands such as “svn list” and “svn log” to extract information from code repositories), we conducted an experimental study using 37 randomly selected open source projects taken from SourceForge² and GoogleCode³. The goal of this study is to analyze a set of FLOSS projects (the objects of study) with the purpose of discovering Heroes and of understanding whether, and to what extent, their presence is related to the time to close CRs. The perspective is that of researchers who want to understand whether Heroes are common in FLOSS projects and whether they are beneficial during the maintenance process.
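The paper does not describe the internals of the tool beyond its use of “svn list” and “svn log”. As a rough illustration only, the sketch below shows one way the output of `svn log --xml --verbose` could be turned into a map from each file path to the set of committers who touched it. The function name `committers_by_file`, the sample authors and the sample paths are all hypothetical and are not taken from the original tool.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample in the general shape produced by
# `svn log --xml --verbose`; authors and paths are invented.
SAMPLE_LOG = """<log>
  <logentry revision="1">
    <author>alice</author>
    <paths>
      <path action="A">/trunk/Chart.java</path>
      <path action="A">/trunk/Axis.java</path>
    </paths>
  </logentry>
  <logentry revision="2">
    <author>bob</author>
    <paths>
      <path action="M">/trunk/Axis.java</path>
    </paths>
  </logentry>
</log>"""


def committers_by_file(log_xml):
    """Map each changed path to the set of authors who committed to it."""
    owners = defaultdict(set)
    for entry in ET.fromstring(log_xml).iter("logentry"):
        author = entry.findtext("author", default="(unknown)")
        for path in entry.iter("path"):
            owners[path.text].add(author)
    return dict(owners)


file_committers = committers_by_file(SAMPLE_LOG)
```

A real analysis would presumably also need to restrict the paths to files that still exist in the repository (e.g., those reported by “svn list”) and to exclude documentation and external libraries; this sketch does neither.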

¹ http://www.jfree.org/jfreechart
² http://sourceforge.net/
³ http://code.google.com/intl/it-IT/

A. Research questions

This experimentation, conducted and described according

to the guidelines proposed by Wohlin et al. [8], aims at

answering the following two research questions.

RQ1: Are Heroes common in FLOSS projects and how

much is the percentage of files they exclusively manage?

This research question deals with the ability of the approach to recover Heroes in FLOSS projects and to determine the percentage of files and lines of code (LOCs) they handle. To answer this question, we have to define the concept of Hero precisely. For us, a Hero is a committer who exclusively manages (in the implementation of our tool, this translates to exclusively edits/commits) a number of files ≥ α% of the total in the code repository. In this preliminary work, we decided to set α = 10 and α = 20. The rationale behind this choice is that we intend to factor out committers who exclusively own only a small portion of the system (for example, 1%). These two values were chosen because it seems reasonable to us, as a first approximation, to consider as a Hero a committer who owns at least 60 (α = 10) or 120 (α = 20) files in a medium-sized project of 600 files. We consider two thresholds to vary our analysis with respect to the definition of Hero. Finally, the percentage of files (and LOCs) managed by all the Heroes is computed simply by summing the individual percentages of owned files (and LOCs) for each Hero.
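The definition above can be sketched in a few lines. The code below is a minimal illustration, not the authors' tool: it assumes the input is a map from file path to the set of committers who edited that file, and the function name `find_heroes` and the file names are hypothetical. The example data reproduce the Jfreechart figures quoted in Section II (1063 files, 1046 of them edited only by mungady).

```python
from collections import Counter


def find_heroes(file_committers, alpha):
    """Return {committer: % of all files edited only by them}, keeping
    only committers whose exclusive share is >= alpha percent."""
    total = len(file_committers)
    if total == 0:
        return {}
    exclusive = Counter()
    for committers in file_committers.values():
        if len(committers) == 1:  # file touched by a single committer
            exclusive[next(iter(committers))] += 1
    shares = {dev: 100.0 * n / total for dev, n in exclusive.items()}
    return {dev: pct for dev, pct in shares.items() if pct >= alpha}


# Synthetic repository with hypothetical file names, shaped after the
# Jfreechart example: mungady touches all files, nenry 17, taqua 2.
repo = {f"file{i}.java": {"mungady"} for i in range(1046)}
for i in range(1046, 1063):
    repo[f"file{i}.java"] = {"mungady", "nenry"}
for i in range(1046, 1048):
    repo[f"file{i}.java"].add("taqua")

heroes_10 = find_heroes(repo, alpha=10)
heroes_20 = find_heroes(repo, alpha=20)
```

Summing the values of the returned dictionary gives the overall percentage of files managed by all the Heroes of a project, as described in the text.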

RQ2: Are Heroes faster (or slower) than non-Heroes to close CRs? This research question investigates whether the presence of Heroes is beneficial with respect to the time to close CRs. In other words, we are interested in knowing whether Heroes are faster than non-Heroes at closing CRs. If they are faster, we can deduce that their presence in a project is beneficial because they reduce the time to close CRs. On the one hand, given the large portion of code they handle, Heroes could be very busy, overloaded with CRs and, for this reason, always behind schedule. On the other hand, Heroes know the code they own very well, having often written it themselves, and they are able to modify it quickly. Moreover, by their very nature, they are often very active and motivated. For these reasons, it may happen that, even when burdened with requests, they are able to close CRs well and on time.

B. Analysis method

To answer RQ1, we first computed for each project the number of Heroes and the percentage of files handled by Heroes (for both α = 10 and α = 20). Second, we applied the proportion test to verify whether the observed proportion of projects with at least one Hero vs. projects without Heroes is due to chance. Third, we computed the mean percentage of files exclusively managed by Heroes for the considered projects and the corresponding 95% confidence interval (still using the proportion test). With the 95% confidence interval, we are 95 percent certain that the mean of the population of projects (i.e., the mean of all the FLOSS projects) lies in a given range of values. Finally, we computed the 2x2 contingency table (rows: small and big projects; columns: at least one Hero and without Heroes) and applied Fisher’s test to see whether there is a difference in the presence of Heroes between small and big projects. We considered projects with a number of developers ≤ 10 as small and > 10 as big⁴.

Repository   Software                 Files     LOCs  Committers  Project Size  Revisions
Google       choscommerce                69    11016           4  small               224
Google       closure-compiler           536   123660           4  small               200
Sourceforge  cppunit                    378    22646           6  small               582
Google       cspoker                    208   144111           4  small              1382
Google       easycooking-fantastici4    141    15460           4  small               203
Sourceforge  erlide                     820   134753           5  small              3155
Google       gdata                      344    46805          19  big                 982
Google       gmapcatcher                 77     7164           6  small               842
Google       googlemock                  55    56410           4  small               300
Sourceforge  gtk-gnutella               723   248697          15  big               17424
Google       h2database                 722   125944           6  small              2628
Sourceforge  htmlunit                  3404   548666          10  small              5743
Sourceforge  httpunit                   354    39232           3  small              1062
Sourceforge  jfreechart                1063   149727           3  small              2272
Google       jpcsp                      672   355728          20  big                1504
Google       jstestdriver               438    32082           9  small               603
Sourceforge  jtidy                      174    30487           5  small               115
Google       keycar                     669    83936           4  small               465
Sourceforge  mantisbt                   641  3399759          38  big                5752
Google       mobileterminal              43     8095          12  big                 364
Google       moofloor                    39    10152           4  small               169
Google       nettiers                   267   108159          13  big                 837
Google       pagespeed                  297    39069           6  small               845
Sourceforge  phpwiki                    537   118815          15  big                7370
Sourceforge  remotes                    289    35646           3  small               812
Google       sqlitepersistentobjects    170    17727           8  small               138
Google       testability                126     7282           4  small               151
Google       toolbox                    309    70805           4  small               344
Sourceforge  tora                       703   198512          17  big                3523
Google       torrentpier                366    72781           8  small               447
Google       unladenswallow            2815   863242          13  big                1158
Google       v8                         982   324042          32  big                4556
Sourceforge  winmerge                   862   172366          16  big                7149
Sourceforge  xcat                       156   145106           8  small              6257
Sourceforge  zk1                       1552   217337          17  big               14151
Google       zscreen                    627    98181          10  small              1685
Google       zxing                     1119    81501          19  big                1393

Table I. Randomly selected applications. From left to right: repository, selected application, number of files, lines of code, number of committers, dimension of the project (big if > 10 committers) and number of commits.

⁴ http://siddhi.blogspot.com/2005/06/truck-factor.html

To answer RQ2, we first restricted the data set to the projects hosted on Sourceforge. We did so because the projects on Google are too recent and thus do not contain a sufficient number of CRs for our analysis. Second, for each project and for each closed CR contained in the project bug tracker, we computed the time needed to close it, using the opening and closing times reported in the bug tracker for each bug. Third, we treated CRs with times > 4 years as outliers and removed them (overall, only 7 data points were deleted). Finally, we tried to understand whether Heroes are faster than non-Heroes in two different ways: globally and by project. In the first way, we divided all the collected times to close CRs into two sets depending on who closed the CR (Heroes and non-Heroes), without further dividing them by project. Then, we applied the non-parametric Mann-Whitney test to the two sets. In the second way, we did the same but with the times to close CRs divided by project. We used the Mann-Whitney test because it is very robust and sensitive. Moreover, given the research question, we used a two-tailed statistical test.
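The RQ2 pipeline described above (time to close, outlier removal, two-tailed Mann-Whitney test) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' analysis script: the timestamp format, the 365-day year used for the 4-year cut-off, the function names and the sample values are all hypothetical, and the p-value uses the normal approximation with tie correction and no continuity correction, which a statistics package may refine.

```python
import math
from datetime import datetime

MAX_HOURS = 4 * 365 * 24  # 4-year cut-off, assuming 365-day years


def hours_to_close(opened, closed, fmt="%Y-%m-%d %H:%M"):
    """Elapsed hours between the opening and closing timestamps of a CR."""
    delta = datetime.strptime(closed, fmt) - datetime.strptime(opened, fmt)
    return delta.total_seconds() / 3600.0


def drop_outliers(times):
    """Remove CRs that took longer than the cut-off to close."""
    return [t for t in times if t <= MAX_HOURS]


def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney test; returns (U of xs, approximate p)."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    n1, n2 = len(xs), len(ys)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        t = j - i                     # size of this group of ties
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Invented closing times in hours, for illustration only.
hero_times = drop_outliers([12.0, 30.5, 48.0, 100.0, 50000.0])
other_times = drop_outliers([60.0, 150.0, 200.0, 310.0])
u_stat, p_value = mann_whitney_u(hero_times, other_times)
```

The "by project" variant described in the text would simply call `mann_whitney_u` once per project on that project's two groups of times.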

IV. RESULTS

Table I details some information about the 37 analyzed FLOSS projects. The table reports: code repository (SourceForge or GoogleCode), application name, number of files, lines of code⁵, number of committers of the project, estimated dimension (small or big) and total number of analyzed system revisions (i.e., commits). In the analyzed sample, 24 projects are classified as small and 13 as big.

⁵ Documentation and external libraries are not considered.




Software                 Heroes  %Files   %LOCs
choscommerce                  1   43.48   33.85
closure-compiler              1   28.92   13.76
cppunit                       1   66.40   66.83
cspoker                       1   54.81   27.64
easycooking-fantastici4       1   26.95    4.97
erlide                        1   34.15   22.68
gdata                         3   61.63   44.56
gmapcatcher                   1   76.62   76.47
googlemock                    2   30.91    3.01
gtk-gnutella                  2   27.66    9.25
h2database                    1   91.97   78.15
htmlunit                      3   57.17   51.37
httpunit                      1   13.84    8.51
jfreechart                    1   98.40   96.96
jpcsp                         2   67.11   82.03
jstestdriver                  2   30.59   38.29
jtidy                         2   77.01   41.19
keycar                        1   84.60   91.25
mobileterminal                3   44.19     5.3
moofloor                      1   74.36   77.03
nettiers                      1   97.00   96.87
pagespeed                     1   51.85   61.21
phpwiki                       1   17.13   11.24
remotes                       1   88.93   77.27
sqlitepersistentobjects       1   77.06   84.26
testability                   1   19.84    9.39
toolbox                       2   49.84   63.79
torrentpier                   1   43.72   24.71
unladenswallow                1   88.03   77.04
v8                            1   11.41    2.15
winmerge                      1   32.95   31.19
xcat                          3   68.59    7.56
zk1                           1   50.26   26.36
zscreen                       1   51.67   47.88
zxing                         3   62.56   48.57

Table II. Heroes found setting α = 10.

A. RQ1: Are Heroes common in FLOSS projects and how

much is the percentage of files they exclusively manage?

Table II lists the number of discovered Heroes setting

α = 10 and the percentage of files and LOCs exclusively

owned/managed by Heroes.

From the data, Heroes seem to be common in the considered projects. In 35 projects out of 37 there is at least one Hero, and in some cases there are two or more Heroes (e.g., htmlunit). Only the projects mantisbt and tora have zero Heroes (they are not shown in Table II). The mean is 1.46, the median is 1, and the proportion test⁶ is significant (p-value < 0.001). Moreover, we can see from Table II that the discovered Heroes exclusively manage a considerable percentage of files: on average, also considering mantisbt and tora, 51.39% (95% confidence interval ranging from 44.57% to 60.54%). Given that the trend observed with α = 10 is substantially confirmed with α = 20, we can answer the first part of RQ1 in the affirmative. Precisely, with α = 20, 27 projects out of 37 have at least one Hero and the proportion test is still significant (p-value = 0.008). The percentage of files managed by Heroes, instead, decreases to 43.04% (95% confidence interval ranging from 32.02% to 54.06%).

⁶ In R: prop.test(35, 37)
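For readers without R at hand, the proportion test reported above can be approximated with the standard library alone. The sketch below is our reading of what a one-sample `prop.test(x, n)` computes by default (null proportion 0.5, chi-squared statistic with one degree of freedom and Yates continuity correction); the function name `prop_test` is hypothetical, and the exact R behaviour should be checked against its documentation.

```python
import math


def prop_test(successes, trials, p0=0.5):
    """One-sample proportion test with Yates continuity correction.

    Returns (chi-squared statistic with 1 d.o.f., p-value)."""
    expected_yes = trials * p0
    expected_no = trials * (1 - p0)
    diff = abs(successes - expected_yes)
    diff -= min(0.5, diff)  # continuity correction, never below zero
    chi2 = diff ** 2 / expected_yes + diff ** 2 / expected_no
    # Upper tail of chi-squared with 1 d.o.f.: erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


chi2_a10, p_a10 = prop_test(35, 37)  # alpha = 10: 35 of 37 projects
chi2_a20, p_a20 = prop_test(27, 37)  # alpha = 20: 27 of 37 projects
```

Under these assumptions the two p-values fall below 0.001 and between 0.008 and 0.009 respectively, which is in line with the values reported in the text.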

Given these results, we can conclude that Heroes are common in FLOSS projects and that they exclusively manage a conspicuous proportion of files. Note that this holds true for both big and small projects (Fisher test: p-value = 0.11 with α = 10 and p-value = 0.12 with α = 20).
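The paper does not print the 2x2 contingency table, but for α = 10 it can be derived from statements in the text: 24 projects are small and 13 big, and the only two projects without Heroes (mantisbt and tora) are both classified as big in Table I. The sketch below applies a two-sided Fisher exact test to that derived table; the derivation is our reading of the data and the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Rows: small, big. Columns: at least one Hero, no Hero (alpha = 10).
p_fisher = fisher_exact_two_sided(24, 0, 11, 2)
```

With these counts the test gives a p-value of about 0.117, consistent with the 0.11 reported above. The table for α = 20 cannot be reconstructed from the text, so it is not shown.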

B. RQ2: Are Heroes faster (or slower) than non-Heroes to

close CRs?

To get an initial answer to RQ2, we divided all the collected times to close CRs into two sets according to the type of committer (Hero or non-Hero) who closed them, without further dividing them by project. In this way, we obtained 4727 times for the group of Heroes (median = 144.5 h) and 3812 for the group of non-Heroes (median = 236.2 h). Applying the Mann-Whitney test, we found a significant difference between the two sets (p-value < 0.001). This means that, considering all the data together: (i) Heroes are faster (+63.46%⁷) than non-Heroes at closing change requests and (ii) the observed difference between the two sets is unlikely to be due to chance.
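The quoted percentage follows directly from the two medians. Solving 144.5 + 144.5 · x/100 = 236.2 for x gives the first line below; the second line is an equivalent reading that the paper does not state, expressing the same gap as the reduction in median time relative to non-Heroes.

```latex
\[
x = \frac{236.2 - 144.5}{144.5} \times 100 \approx 63.46
\]
\[
\left(1 - \frac{144.5}{236.2}\right) \times 100 \approx 38.82
\]
```

In other words, the median non-Hero closing time is about 63% longer than the median Hero closing time, or equivalently the median Hero closing time is about 39% shorter.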

The boxplots shown in Figure 1 present the times to close CRs, divided by project, for Heroes in red (light gray on a black-and-white screen) and non-Heroes in white. Each data point corresponds to the time, in hours, spent by a developer to close a CR. Project names and p-values, computed with the Mann-Whitney test, are reported at the bottom of the figure.

In Figure 1, in 8 cases out of 12 the median of the Heroes set is lower than the median of the non-Heroes set; this means that, in these 8 projects, Heroes are faster than non-Heroes at closing CRs. For 5 of these 8 projects, this result is statistically significant (erlide, htmlunit, httpunit, phpwiki and zk1). On the contrary, there are 2 cases where the difference is significant in the opposite direction (jtidy and winmerge). With these data, although not definitively, we can answer RQ2 in the affirmative.

C. Threats to validity

The explorative nature of this study implies several threats to validity, which will be addressed only in future work. First of all, generalizing the results is clearly problematic because the dataset is small. However, the projects were randomly chosen and are real-world projects taken from two source repositories. Second, the simplified way used to infer file ownership and the chosen level of granularity (i.e., the file level) may have influenced the obtained results. The same can be said for our measurements and chosen thresholds (e.g., α in the definition of Hero). Finally, regarding the possibility of replicating the study, we have: (i) explained the algorithm used to discover Heroes, (ii) detailed the process and (iii) listed the considered open source systems.

⁷ This percentage comes from the equation: 144.5 h + 144.5 h × x% = 236.2 h

Figure 1. Boxplots of time required to close CRs divided by project

V. CONCLUSION

We have introduced a tool for finding Heroes in software projects. Our tool has been applied to 37 randomly chosen FLOSS projects with the aim of answering two research questions. Although still preliminary, the results are encouraging. We discovered that:

- Heroes are common in FLOSS projects and they exclusively manage a conspicuous proportion of files and code. This is positive because, with knowledge concentrated in the hands of a few well-motivated developers, the quality of the code should be better (e.g., the code should be free of a mixture of different styles) [6]. On the other hand, this could lead to poor or missing documentation (given that it is not essential for exchanging information) and could put the project in trouble if Heroes decide to leave.

- Heroes are faster than non-Heroes at closing CRs. This outcome, if also confirmed in industrial contexts, could be an important message for managers and project leaders: the presence of Heroes in a project is beneficial because it reduces the time to close CRs. Indeed, in our experiment, Heroes are faster than non-Heroes by 63.46%.

Future work will be devoted to continuing the empirical validation presented in this work.

REFERENCES

[1] Truck factor definition, http://www.agileadvice.com/archives/2005-/05/truck factor.html. Technical report, 2010.

[2] K. Beck. Extreme Programming Explained: Embrace Change. Addison-Wesley, 1999.

[3] T. Girba, A. Kuhn, M. Seeberger, and S. Ducasse. How developers drive software evolution. In International Workshop on Principles of Software Evolution, pages 113–122, 2005.

[4] N. Nagappan, B. Murphy, and V. Basili. The influence of organizational structure on software quality: an empirical study. In International Conference on Software Engineering (ICSE), pages 521–530. IEEE, 2008.

[5] M. Nordberg III. Managing code ownership. IEEE Software, pages 26–33, 2003.

[6] M. Pinzger, N. Nagappan, and B. Murphy. Can developer-module networks predict failures? In International Symposium on Foundations of Software Engineering (FSE), pages 2–12. ACM, 2008.

[7] E. J. Weyuker, T. J. Ostrand, and R. M. Bell. Do too many cooks spoil the broth? Using the number of developers to enhance defect prediction models. Empirical Software Engineering, 13(5):539–559, 2008.

[8] C. Wohlin, P. Runeson, M. Host, M. Ohlsson, B. Regnell, and A. Wesslen. Experimentation in Software Engineering - An Introduction. Kluwer, 2000.
