[arch-dev-public] O'Reilly book: Making Software

Aaron Griffin aaronmgriffin at gmail.com
Mon Nov 22 19:39:57 CET 2010


My boss just brought this book over to my desk. It's one of those
theory books that's all about measuring lines of code and whatnot.

But the reason he brought it over is Chapter 8: "Beyond Lines of
Code: Do We Need More Complex Metrics?". Two pages in, it begins with:

**Measuring the Source Code**
We have selected for our case study the ArchLinux software
distribution (http://archlinux.org), which contains thousands of
packages, all open source. ArchLinux is a lightweight GNU/Linux
distribution whose maintainers refuse to modify the source code
packaged for the distribution, in order to meet the goal of
drastically reducing the time that elapses between the official
release of a package and its integration into the distribution.
...
Because of the size of ArchLinux, using it as a case study gives us
access to the original source code of thousands of open source
projects, through the build scripts used by ABS (see Example 8-1)

The chapter goes on with statistics and all that junk. They're not
studying Arch itself, but using it as a launch pad for getting large
amounts of open source code for analysis. The numbers are
interesting:

The ArchLinux repositories contained 4096 packages (as of April 2010),
with some of the packages being different versions of the same
upstream project. After removing different versions, we obtained a
sample of 4015 packages, containing 1272748 source code files. Among
all those files, 576511 were written in C. However, there were
repeated files. In the overall sample, only 776573 were unique files;
in the C subsample, only 338831 were unique files. From these unique C
files, 212167 were nonheader files and 126664 were header files.
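
The quoted figures are at least internally consistent: 212167 nonheader
files plus 126664 header files gives the 338831 unique C files. The
excerpt doesn't say how the authors detected repeated files, so the
following is only a rough sketch of one way to get counts like these,
assuming that "unique" means "byte-identical contents" and that C files
are recognized by their .c and .h extensions. The function name
count_unique_files and the directory layout are made up for
illustration and are not from the book.

```python
import hashlib
from pathlib import Path


def count_unique_files(root):
    """Count files under root, deduplicating by SHA-256 of their contents.

    Assumption: two files are "the same" only if they are byte-identical.
    """
    seen = set()  # content digests encountered so far
    counts = {
        "total": 0,            # every regular file found
        "unique": 0,           # files whose contents were not seen before
        "unique_c_source": 0,  # unique files ending in .c
        "unique_c_header": 0,  # unique files ending in .h
    }
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        counts["total"] += 1
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            continue  # repeated file, already counted once
        seen.add(digest)
        counts["unique"] += 1
        if path.suffix == ".c":
            counts["unique_c_source"] += 1
        elif path.suffix == ".h":
            counts["unique_c_header"] += 1
    return counts
```

Pointing this at a directory of unpacked source trees would give the
same kind of breakdown as the paragraph above, though reading every
file fully into memory is a simplification that a sample of over a
million files would probably want to avoid.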

