[arch-dev-public] Problem with web dashboard: massive orphaning of packages

Fri Sep 12 17:19:33 EDT 2008

On Fri, Sep 12, 2008 at 3:57 PM, Dan McGee <dpmcgee at gmail.com> wrote:
> On Fri, Sep 12, 2008 at 3:23 PM, Dusty Phillips <buchuki at gmail.com> wrote:
>> 2008/9/12 Eric Belanger <belanger at astro.umontreal.ca>:
>>> Hi,
>>>
>>> I don't know if you remember but a while ago a huge part of extra i686 (IIRC
>>> it was all packages from L to Z) was orphaned and erroneously showing up as
>>> recently updated on the web site.  This just happened again with packages in
>>> extra x86_64. I don't know what could have caused that but it's very annoying
>>> as we have to readopt all our packages.
>>
>> Fuck.
>>
>> I remember Judd telling me not to swear at users but it's OK to swear
>> at scripts, right?
>>
>> This has to be happening in reporead.py.  Fucking reporead.py. To the
>> best of my knowledge, no other script updates the web database in
>> any way, am I wrong?
>>
>>
>> The actual db_update script splits the packages into those that are in
>> the database and those that are not and processes them separately.
>> Packages that are not currently in the database get added as orphans
>> because apparently it's hard to interrogate the maintainer from the
>> db.tar.gz. At first, I assumed that it is doing an add when it should
>> be doing an update, which would add new packages with orphan
>> maintainer. But this doesn't appear to be the case because there are
>> not currently any duplicate x86_64 packages (that aren't in testing).
>>
>> My second, more likely hypothesis is a race condition. I don't know how
>> the db scripts update exactly, but I suspect reporead is reading a
>> db.tar.gz file that is either broken or not yet fully uploaded. It
>> sees this broken db file and drops all the packages in the web
>> interface that are not in that file. Then x minutes later (crontab),
>> it runs again on a proper db and sees the missing packages again. It
>> adds them to the database and sets the maintainer to orphan.
>>
>> Are such broken dbs possible/likely/happening? If it's a race
>> condition, we need to put a lock on the database (maybe dbtools does
>> this already) so that reporead isn't accessing it at the same time as
>> dbtools. If it's just that when the database gets updated it sometimes
>> breaks the database, well... that just needs to be fixed.
>
> This would be a hell of a race condition: to make a database, we first
> unzip it to a temp location, make our changes and updates, and then
> rezip it. Thus reporead.py would have to open the db while it is being
> zipped, which is a very short period of time, but I guess
> theoretically possible.
>
> Without looking at the repo-add code, I don't know if we do this now,
> but we probably should:
> 1. unzip the db to a temp location
> 2. make changes
> 3. rezip it to db.tar.gz.new
> 4. move old db to db.tar.gz.old
> 5. move new db to db.tar.gz
>
> This would make the "db replacement" portion atomic in the sense that
> we would never have a partial DB; we would only have a short period of
> time where no db existed in that location. If really necessary we
> could avoid even this by copying the old db to one with the old
> extension instead of moving it.
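
For reference, the five steps above (using the copy variant from the last paragraph, so there is never a moment with no db at the final path) could look roughly like the sketch below. This is only an illustration in Python, not what repo-add actually does; `rebuild_db` and the `update` callback are hypothetical names, and it assumes the `.new` file is written in the same directory as the final db.

```python
import os
import shutil
import tarfile
import tempfile


def rebuild_db(db_path, update):
    """Rebuild db_path so a reader never sees a half-written archive.

    `update` is a hypothetical callback that edits the unpacked tree in place.
    """
    new_path = db_path + ".new"
    old_path = db_path + ".old"

    with tempfile.TemporaryDirectory() as work:
        # Step 1: unpack the current db to a temp location, if there is one.
        if os.path.exists(db_path):
            with tarfile.open(db_path, "r:gz") as tar:
                tar.extractall(work)

        # Step 2: make changes.
        update(work)

        # Step 3: repack as db.tar.gz.new, next to the final path, so the
        # rename in step 5 stays on a single filesystem.
        with tarfile.open(new_path, "w:gz") as tar:
            for name in sorted(os.listdir(work)):
                tar.add(os.path.join(work, name), arcname=name)

    # Step 4 (copy variant): keep the old db as db.tar.gz.old, but leave
    # the original in place so the path never goes missing.
    if os.path.exists(db_path):
        shutil.copy2(db_path, old_path)

    # Step 5: swap the new db in. os.replace() is a single rename, so a
    # reader sees either the complete old file or the complete new one.
    os.replace(new_path, db_path)
```

A reader that opens the db at any point during this sequence gets a complete archive, because the only write to the final path is the rename in step 5.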

Well, all the repo-add stuff is done in a subdir of /tmp too, then
it's simply 'mv'ed to /home/ftp, so it *should* be fairly atomic...
well, it would be if /tmp was on the same filesystem - it's just a
matter of moving inodes.
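
To make the same-filesystem caveat concrete, here is a small Python sketch. It assumes the usual behaviour that a move across filesystems falls back to copy-then-delete, during which a reader can see a partial file. `same_filesystem` and `install_db` are hypothetical helpers, not part of repo-add or dbtools, and comparing `st_dev` is only a rough check.

```python
import os
import shutil
import tempfile


def same_filesystem(path_a, path_b):
    """Rough check: True if both existing paths report the same device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def install_db(built_path, final_path):
    """Publish built_path at final_path without exposing a partial file."""
    final_dir = os.path.dirname(os.path.abspath(final_path))

    if same_filesystem(built_path, final_dir):
        # Same filesystem: a plain rename, which replaces the target in
        # one step.
        os.replace(built_path, final_path)
        return "renamed"

    # Different filesystems: copy into the final directory under a
    # temporary name first, then rename within that directory.
    fd, staged = tempfile.mkstemp(dir=final_dir, prefix=".db-staging-")
    os.close(fd)
    try:
        # copy2 also copies the permission bits, so the published file
        # does not keep mkstemp's private 0600 mode.
        shutil.copy2(built_path, staged)
        os.replace(staged, final_path)
    except BaseException:
        if os.path.exists(staged):
            os.unlink(staged)
        raise
    os.unlink(built_path)
    return "copied"
```

Either way the final path only ever changes through a rename inside its own directory, so it does not matter where the db was built.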