Code Ownership and Software Quality: A Replication Study

Michaela Greiler; Kim Herzig; Jacek Czerwonka

Code Ownership and Software Quality: A Replication Study

Michaela Greiler ,
Kim Herzig ,
Jacek Czerwonka

Proceedings of the 12th Working Conference on Mining Software Repositories | May 2015

Published by IEEE

Download BibTex

In a traditional sense, ownership determines rights and duties in regard to an object, for example a property. The owner of source code usually refers to the person that invented the code. However, larger code artifacts, such as files, are usually composed by multiple engineers contributing to the entity over time through a series of changes. Frequently, the person with the highest contribution, e.g. the most number of code changes, is defined as the code owner and takes responsibility for it. Thus, code ownership relates to the knowledge engineers have about code. Lacking responsibility and knowledge about code can reduce code quality. In an earlier study, Bird et al. [1] showed that Windows binaries that lacked clear code ownership were more likely to be defect prone. However recommendations for large artifacts such as binaries are usually not actionable. E.g. changing the concept of binaries and refactoring them to ensure strong ownership would violate system architecture principles. A recent replication study by Foucault et al. [2] on open source software replicate the original results and lead to doubts about the general concept of ownership impacting code quality. In this paper, we replicated and extended the previous two ownership studies [1, 2] and reflect on their findings. Further, we define several new ownership metrics to investigate the dependency between ownership and code quality on file and directory level for 4 major Microsoft products. The results confirm the original findings by Bird et al. [1] that code ownership correlates with code quality. Using new and refined code ownership metrics we were able to classify source files that contained at least one bug with a median precision of 0.74 and a median recall of 0.38. On directory level, we achieve a precision of 0.76 and a recall of 0.60.