On 5/6/07, Simo Leone <simo@archlinux.org> wrote:
On Sun, May 06, 2007 at 08:28:53PM -0400, Dan McGee wrote:
I know we weren't supposed to rush into the objectives here before settling on the goals, but I think this is an important one to examine, especially with a change in the repository structure happening in the coming months.
How many of the devs are using archstats, let alone users? After looking at it a bit more today, I realized this could be a great asset for finding areas where devs should be spending their time with regard to package maintenance. It would also help us out greatly when it comes to determining which packages we should stop maintaining and drop back to the TU level and/or unsupported.
However, a few things need work:

1. The current website <http://www.archlinux.org/~simo/archstats/index.php> is in dire need of an overhaul. Using the old website theme is the least of our worries; things like the package listing are at this point rather unusable and only suck 223 MB of memory in Firefox once fully loaded. I have a lot of ideas here: enable breakdown by pkgname only (so it actually looks like people are running kernel26), limit the number of results on the page unless someone actually selects to see them all, etc.

2. The archstats database. It contains several one-time system updates, and several systems that haven't updated since 2005 or earlier. This is clearly junk data, and to make archstats useful we should probably just start fresh and find a way to cut down on spurious commits, which leads into...

3. The archstats program itself. In the last week I've had no problems with it, but I have had problems before (some of the spurious commits above were definitely my fault). Configuration should probably be editable in a conf file (the current /etc file has a big fat warning saying do not edit by hand, which does not seem like the Arch Way). Setting it up as a cron job is straightforward, but if we want people to use it we should probably think of a way to make it even easier.
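To make points 1 and 2 a bit more concrete, here is a minimal Python sketch of the filtering and grouping described there. It is not taken from the archstats code: the record layout (`last_update`, `updates`, `packages`), both function names, and the thresholds are assumptions made purely for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def active_systems(systems, today, max_age_days=180, min_updates=2):
    """Drop systems that reported only once or have gone stale.

    Each system is a dict in an ASSUMED (hypothetical) layout:
    {"last_update": date, "updates": int, "packages": [(pkgname, pkgver), ...]}
    """
    cutoff = today - timedelta(days=max_age_days)
    return [
        s for s in systems
        if s["last_update"] >= cutoff and s["updates"] >= min_updates
    ]


def top_packages(systems, limit=50):
    """Count systems per pkgname, ignoring versions, and cap the result list."""
    counts = Counter()
    for s in systems:
        # A set makes each system count once per pkgname, so kernel26 at
        # different versions is tallied as one package.
        counts.update({name for name, _version in s["packages"]})
    return counts.most_common(limit)
```

Grouping on pkgname alone is what keeps every kernel26 version in one row, and the `limit` argument is the "don't render everything unless asked" idea from point 1.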
Comments on any of this? I know it's yet another project idea to be thrown out there, but this one could prove very helpful and in the long run vastly reduce dev maintenance of packages, once we realize very few users are actually using a package and it should be maintained elsewhere.
Actually, I have an entirely rewritten archstats lying on my hard drive; all it needs is someone to slap a pretty web interface on it. It's probably not very well done, since it was me playing with django a bit, but nonetheless it didn't take very long to put together.
As for ideas, what I've been wanting to do is have a way for pacman to have some form of "hooks": something where a command gets run after an Syu or an S or an R. In this case, that command would be whatever is required for archstats to update the package list. This would make it really easy for people to just "set and forget" archstats, and we could get very good and up-to-date stats that way.
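Since pacman has no such hook mechanism here, the idea can only be sketched from the outside. The wrapper below is a rough Python illustration, not a proposed pacman feature: it runs pacman, and if the operation was one of those named above and it succeeded, it runs a follow-up command. The function names and the `archstats-update` command are hypothetical placeholders.

```python
import subprocess

# Operations named in the message above; treated here as an ASSUMED
# trigger list, not a complete set of pacman operations.
TRIGGER_OPS = {"-Syu", "-S", "-R"}


def should_run_hook(pacman_args):
    """True if any argument is one of the trigger operations."""
    return any(arg in TRIGGER_OPS for arg in pacman_args)


def run_with_hook(pacman_args, hook_cmd, runner=subprocess.call):
    """Run pacman, then hook_cmd if a trigger operation succeeded.

    `runner` takes an argument list and returns an exit code; it is a
    parameter only so the function can be exercised without calling pacman.
    """
    rc = runner(["pacman", *pacman_args])
    if rc == 0 and should_run_hook(pacman_args):
        # Hypothetical follow-up, e.g. ["archstats-update"].
        runner(hook_cmd)
    return rc
```

Running the follow-up only on a zero exit code avoids reporting a package list after a failed or aborted transaction.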
Also, I sort of inherited the archstats project from Eric when he left, and I haven't really touched it at all, besides culling some old data once in a while (haven't done that in a few months though).
It's another one of those projects I said I'd work on but haven't gotten around to... I won't bother making promises I might not keep but school does end soon, I've got nothing but free time in a few weeks... hopefully I can get my butt in gear and at least get the ball rolling or something.
Simo already knows this, but here is what I started last night while procrastinating on my exam studying:

<http://code.toofishes.net/gitweb.cgi?p=archstats.git;a=summary>

I've managed to remove about 500 lines of code by moving repetition into functions (still more to be done, but that was 1/6 of the code). I also completely bypassed the MD5sum checking stuff, which shows how worthless that check is. Simo and I were trying to think of a better way to do client verification (Jürgen, any ideas?), and we came up with nothing. Obviously we could get poisoned data using archstats, but at the same time, some data seems better than no data, especially if we can patrol it.

-Dan