[arch-dev-public] O'Reilly book: Making Software

Dieter Plaetinck dieter at plaetinck.be
Mon Nov 22 22:09:09 CET 2010


On Mon, 22 Nov 2010 12:39:57 -0600
Aaron Griffin <aaronmgriffin at gmail.com> wrote:

> My boss just brought this book over to my desk. It's one of those
> theory books that's all about measuring lines of code and whatnot.
> 
> But the reason he brought it over, is Chapter 8: "Beyond Lines of
> Code: Do We Need More Complex Metrics?". Two pages in, it begins with:
> 
> **Measuring the Source Code**
> We have selected for our case study the ArchLinux software
> distribution (http://archlinux.org), which contains thousands of
> packages, all open source. ArchLinux is a lightweight GNU/Linux
> distribution whose maintainers refuse to modify the source code
> packaged for the distribution, in order to meet the goal of
> drastically reducing the time that elapses between the official
> release of a package and its integration into the distribution.
> ...
> Because of the size of ArchLinux, using it as a case study gives us
> access to the original source code of thousands of open source
> projects, through the build scripts used by ABS (see Example 8-1)
> 
> The chapter goes on with statistics and all that junk. They're not
> studying Arch, but using Arch as a launch pad for getting large
> amounts of open source source code for analysis. The numbers are
> interesting:
> 
> The ArchLinux repositories contained 4096 packages (as of April 2010),
> with some of the packages being different versions of the same
> upstream project. After removing different versions, we obtained a
> sample of 4015 packages, containing 1272748 source code files. Among
> all those files, 576511 were written in C. However, there were
> repeated files. In the overall sample, only 776573 were unique files;
> in the C subsample, only 338831 were unique files. From these unique C
> files, 212167 were nonheader files and 126664 were header files.


Hmm, 39% of all files in our packages are duplicates. That's
interesting. I wonder where they come from.
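
For anyone who wants to check the figure, here is a quick sketch of the
arithmetic in Python. The counts are copied from the excerpt quoted above;
the variable names and the duplicate_share helper are made up for
illustration. It assumes "duplicate" means any file beyond the unique ones,
i.e. (total - unique) / total.

```python
# Counts as quoted from the book excerpt (April 2010 snapshot).
total_files = 1272748   # all source code files in the sample
unique_files = 776573   # unique files in the overall sample
total_c = 576511        # files written in C
unique_c = 338831       # unique C files


def duplicate_share(total, unique):
    """Fraction of files that are repeats of some other file."""
    return (total - unique) / total


print(f"all files: {duplicate_share(total_files, unique_files):.1%}")
print(f"C files:   {duplicate_share(total_c, unique_c):.1%}")
```

By the same calculation the C subsample comes out slightly higher than the
overall sample, at roughly 41%.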

Dieter

