[arch-dev-public] Problem with web dashboard: massive orphaning of packages

Fri Sep 12 16:55:39 EDT 2008

On Fri, Sep 12, 2008 at 3:23 PM, Dusty Phillips <buchuki at gmail.com> wrote:
> 2008/9/12 Eric Belanger <belanger at astro.umontreal.ca>:
>> Hi,
>>
>> I don't know if you remember but a while ago a huge part of extra i686 (IIRC
>> it was all packages from L to Z) were orphaned and erroneouly showing up as
>> recently updated on the web site.  This just happened again with packages in
>> extra x86_64. I don't know what could caused that but it's very annoying as
>> we has to readopt all our packages back.
>
> Fuck.
>
> I remember Judd telling me not to swear at users but its ok to swear
> at scripts right?
>
> This has to be happening in reporead.py.  Fucking reporead.py. To the
> best of my knowledge, no other script updates the web database in
> anyway, am I wrong?
>
>
> The actual db_update script splits the packages into those that are in
> the database and those that are not and processes them separately.
> Packages that are not currently in the database get added as orphans
> because apparently its hard to interrogate the maintainer from the
> db.tar.gz. At first, I assumed that it is doing an add when it should
> be doing an update, which would add new packages with orphan
> maintainer. But this doesn't appear to be the case because there are
> not currently any duplicate x86_64 packages (that aren't in testing).
>
> My second more likely hypothesis is race conditions. I don't know how
> the db scripts update exactly, but I suspect reporead is reading a
> db.tar.gz file that is either broken or not yet fully uploaded. It
> sees this broken db file and drops all the packages in the web
> interface that are not in that file. Then x minutes later (crontab),
> it runs again on a proper db and sees the missing packages again. It
> adds them to the database and sets the maintainer to orphan.
>
> Are such broken dbs possible/likely/happening? If its a race
> condition, we need to put a lock on the database (maybe dbtools does
> this already) so that reporead isn't accessing it at the same time as
> dbtools. If its just that when the database gets updated it sometimes
> breaks the database well.. that just needs to be fixed.

Hmmm, the DBs are constructed in /tmp and then moved live to
/home/ftp/whatever it's possible that reporead may be opening it
mid-move, but that doesn't seem right. It's gzipped. Wouldn't that
balk if you took half of a DB file, and tried to gunzip it?