[arch-dev-public] O'Reilly book: Making Software

Allan McRae allan at archlinux.org
Mon Nov 22 22:20:41 CET 2010


On 23/11/10 07:09, Dieter Plaetinck wrote:
> On Mon, 22 Nov 2010 12:39:57 -0600
> Aaron Griffin <aaronmgriffin at gmail.com> wrote:
>
>> My boss just brought this book over to my desk. It's one of those
>> theory books that's all about measuring lines of code and whatnot.
>>
>> But the reason he brought it over is Chapter 8: "Beyond Lines of
>> Code: Do We Need More Complex Metrics?". Two pages in, it begins with:
>>
>> **Measuring the Source Code**
>> We have selected for our case study the ArchLinux software
>> distribution (http://archlinux.org), which contains thousands of
>> packages, all open source. ArchLinux is a lightweight GNU/Linux
>> distribution whose maintainers refuse to modify the source code
>> packaged for the distribution, in order to meet the goal of
>> drastically reducing the time that elapses between the official
>> release of a package and its integration into the distribution.
>> ...
>> Because of the size of ArchLinux, using it as a case study gives us
>> access to the original source code of thousands of open source
>> projects, through the build scripts used by ABS (see Example 8-1).
>>
>> The chapter goes on with statistics and all that junk. They're not
>> studying Arch, but using Arch as a launch pad for getting large
>> amounts of open source source code for analysis. The numbers are
>> interesting:
>>
>> The ArchLinux repositories contained 4096 packages (as of April 2010),
>> with some of the packages being different versions of the same
>> upstream project. After removing different versions, we obtained a
>> sample of 4015 packages, containing 1272748 source code files. Among
>> all those files, 576511 were written in C. However, there were
>> repeated files. In the overall sample, only 776573 were unique files;
>> in the C subsample, only 338831 were unique files. From these unique C
>> files, 212167 were nonheader files and 126664 were header files.
>
>
> hmmm.. 39% of all files in our packages are duplicates. That's
> interesting.  Wonder where they come from.
>

A lot of projects include sources from their dependencies inside their 
tarball.

Allan
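Dieter's 39% figure can be checked against the numbers quoted from the book. The Python sketch below assumes that "duplicate" means every copy of a file beyond its first occurrence; the quoted excerpt does not say exactly how the authors decided two files were the same. The variable names and the helper `duplicate_share` are hypothetical.

```python
# Figures quoted from Chapter 8 (April 2010 snapshot of the Arch repositories).
total_files = 1272748
unique_files = 776573
c_files = 576511
unique_c_files = 338831
unique_c_nonheader = 212167
unique_c_header = 126664


def duplicate_share(total, unique):
    """Fraction of files that repeat an earlier file.

    Assumption: every copy beyond the first occurrence counts as a duplicate.
    """
    return (total - unique) / total


overall = duplicate_share(total_files, unique_files)
c_only = duplicate_share(c_files, unique_c_files)

print(f"overall duplicates: {total_files - unique_files} ({overall:.1%})")
print(f"C duplicates: {c_files - unique_c_files} ({c_only:.1%})")

# The header/non-header split should account for every unique C file.
assert unique_c_nonheader + unique_c_header == unique_c_files
```

Under that assumption, 496175 of the 1272748 files are repeats (39.0%). The C subsample is slightly more redundant, at 237680 of 576511 (41.2%). This is consistent with bundled dependency sources being largely C code.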

