An Empirical Study of License Violations in Open Source Projects
- Arunesh Mathur
- Harshal Choudhary
- Priyank Vashist
- Bill Thies
- Santhi Thilagam
IEEE Software Engineering Workshop (SEW 2012)
Building applications from Open Source Software (OSS) components raises the challenge of integrating them so that the licenses of the individual components conflict neither with each other nor, where applicable, with the overall license of the application. Such conflicts lead to license violations, many of which have far-reaching legal consequences. While proprietary software firms often face the risk of failing to satisfy the clauses of OSS licenses, we hypothesize that the large degree of code reuse within the OSS community itself poses similar threats. Through an analysis of 1,423 projects from Google Code project hosting, comprising approximately 69 million non-blank lines of code, we validate instances of code reuse between projects by comparing their licenses. We find four violations, identified by searching for files that share similar content. Additionally, we present statistics on code reuse within the set of projects.
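The idea of finding files with shared content and then comparing project licenses can be illustrated with a minimal sketch. This is not the authors' tooling: the abstract does not specify how similarity is measured, so the sketch substitutes a much simpler technique, an exact hash match after dropping blank lines and surrounding whitespace. The names `normalized_digest` and `flag_shared_files`, the input layout, and the use of plain license strings are all assumptions made for illustration. A flagged pair is only a candidate for manual review, not a confirmed violation.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def normalized_digest(text):
    """Hash file content after dropping blank lines and edge whitespace.

    Returns None for files with no non-blank lines, so that empty files
    are never treated as shared content.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def flag_shared_files(projects):
    """Report project pairs that share file content under different licenses.

    `projects` is a hypothetical layout chosen for this sketch:
        {project_name: {"license": str, "files": {path: content}}}

    Returns a sorted list of tuples
        (project_a, project_b, license_a, license_b, shared_file_count).
    """
    # Map each content digest to the set of projects that contain it.
    owners = defaultdict(set)
    for name, info in projects.items():
        for content in info["files"].values():
            digest = normalized_digest(content)
            if digest is not None:
                owners[digest].add(name)

    # Count distinct shared contents for every pair of projects.
    shared = defaultdict(int)
    for names in owners.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    # Keep only pairs whose declared licenses differ.
    return sorted(
        (a, b, projects[a]["license"], projects[b]["license"], count)
        for (a, b), count in shared.items()
        if projects[a]["license"] != projects[b]["license"]
    )
```

Calling `flag_shared_files` on a dictionary of projects returns the pairs worth inspecting by hand. Deciding whether a given pair of licenses actually conflicts requires reading the license terms and is deliberately left out of the sketch.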