Automating cleanup of AUR
The requests queue is filling up with requests for packages that have quite clearly been abandoned, with no chance of them ever becoming useful. It can be pretty exhausting to keep up with that queue! In particular, there are a lot of "low-hanging fruit" packages that can be pruned from the AUR: packages that are orphaned, have 0 votes, and were last updated years ago are hardly worth keeping around, are they?

Has such a conversation popped up before? I didn't find anything in the archives beyond the patch by Lukas enabling automatic orphaning of an OOD package after 180 days. [1]

If automatically deleting packages on a schedule isn't welcome, would it make sense to at least explore a patch wherein a deletion request for e.g. an orphaned package/<10 votes/last updated >=2 years ago is automatically accepted? That would take a lot of load off both the dedicated requesters helping out and those working the queue.

[1] https://lists.archlinux.org/pipermail/aur-dev/2014-July/002876.html
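For illustration, the auto-accept criteria proposed above could be expressed as a single predicate. This is only a minimal sketch, not aurweb code: the `Package` record and its field names are hypothetical, and it assumes all three conditions must hold together, which the message does not state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Package:
    # Hypothetical record; field names are illustrative, not aurweb's schema.
    name: str
    maintainer: Optional[str]  # None means the package is orphaned
    votes: int
    last_updated: datetime


MAX_VOTES = 10                     # "<10 votes" from the proposal
MIN_AGE = timedelta(days=2 * 365)  # "last updated >=2 years ago"


def auto_accept_deletion(pkg: Package, now: datetime) -> bool:
    """Return True if a deletion request for pkg could skip manual review."""
    orphaned = pkg.maintainer is None
    few_votes = pkg.votes < MAX_VOTES
    stale = now - pkg.last_updated >= MIN_AGE
    # Assumption: the three criteria are combined with AND.
    return orphaned and few_votes and stale
```

Anything that fails the predicate would simply stay in the queue for a human to look at.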
Hi Brett,

Thanks a lot for bringing this up. While I am developing the "next-gen" aurweb, I am largely unaware of issues that TUs face on live unless they're found in a bug report or communicated to the ML.

We do already have some cron scripts that are used for user and package maintenance; looks like we need to produce one here for requests as well. However, before we do so, I would like to iron out a concrete spec for them in terms of timing. Also, "next-gen" aurweb is getting pretty close to release now, so I'd like to put this off until that's released.

For the current, more immediate queue issue, I'd like to look into performing a manual cleanup based on our spec as soon as possible, to get these massive request queues out of your guys' hair.

We will still need to send notifications out about the requests cleaned up by such a cron script, so that we can keep track of these actions as much as possible.

A few arguments against your current spec:

1. There can be packages which have not been updated for 2 years which are still relevant.
2. Deciding on a number of votes is, I think, unnecessary.

I think we should model these removals based on the state of the package in question + the state of the request (how long the request has existed for). Of course, these ideas are not a tyrannical enforcement of what the spec must be. Please do share any input you guys have.

The most important thing here would be ensuring that requests which have been left in an "on-hold" state are not removed sporadically. Perhaps we should introduce a new DB column for requests that allows a TU to mark one as "on-hold" to safeguard against this pruning.

We could perhaps prune packages based on how long they have been flagged out of date + how long it's been since they've been modified, with some sort of reasonable lower bound.

I'll produce an issue for this to keep track of things in the gitlab repository soon. For now, just tackling immediate concerns.

You guys can always drop by #archlinux-aurweb on Libera to chat on IRC about things along these lines. Not that you have to; just letting folks know it's a thing.

Alright. Now, please let me know what you guys think are some good ideas for pruning out requests and ancient packages that would not interrupt expected workflows or "accidentally" remove packages.

Regards,
Kevin

--
Kevin Morris
Software Developer

Identities:
 - kevr @ Libera
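To make the idea of combining package state, request age and an "on-hold" safeguard concrete, here is a rough sketch. It is not aurweb's implementation: the `Request` shape, the `on_hold` field (standing in for the proposed DB column) and all three thresholds are placeholders, since the thread leaves the actual numbers open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Request:
    # Hypothetical shape; "on_hold" stands in for the proposed new DB column.
    pkgname: str
    opened: datetime
    on_hold: bool
    out_of_date_since: Optional[datetime]  # None if the package is not flagged
    last_modified: datetime


# Placeholder lower bounds; the thread does not settle on actual values.
MIN_REQUEST_AGE = timedelta(days=14)
MIN_FLAGGED_AGE = timedelta(days=180)
MIN_UNMODIFIED_AGE = timedelta(days=365)


def can_auto_close(req: Request, now: datetime) -> bool:
    """True if a cron job could act on this request without a reviewer."""
    if req.on_hold:
        return False  # on-hold requests are never touched automatically
    if req.out_of_date_since is None:
        return False  # the package is not flagged out of date
    return (
        now - req.opened >= MIN_REQUEST_AGE
        and now - req.out_of_date_since >= MIN_FLAGGED_AGE
        and now - req.last_modified >= MIN_UNMODIFIED_AGE
    )
```

Checking `on_hold` first means a TU's hold wins over every other condition, which matches the stated priority of not pruning held requests.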
On 21-12-10 19:42, Kevin Morris via aur-dev wrote:
Alright. Now, please let me know what you guys think are some good ideas for pruning out requests and ancient packages that would not interrupt expected workflows or "accidentally" remove packages.
There are a lot of things that can be automated. Here's my wish list:

1. Cron - Automatically delete packages that have been orphaned for longer than x months. I think 3-6 months is reasonable. Popular packages tend to be maintained very quickly after they have been orphaned so I don't think this is too aggressive. Not to mention that the git repositories are still intact after deletion so if anyone wishes to re-add a previously deleted package, the history is there.

2. Cron - Automatically orphan packages that have been flagged out of date for longer than x months. I would think a month or two is a reasonable length of time for maintainers to either unflag the package or actually maintain the package.

3. Cron - Related to #1. Before removing the packages from the AUR, an automated email can be sent to aur-general with a list of packages about to be deleted and if anyone wants to maintain these, they're up for grabs. I think weekly would be a bit much, so maybe monthly?

4. I want a pony.

There are currently over 72,000 packages, and only 60-odd Trusted Users ^W ^W Package Maintainers. Anything that eliminates time spent pruning the AUR is a welcome addition.

--
George Rawlinson
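The first three wish-list items could fit together in one cron pass roughly as follows. This is a sketch under stated assumptions rather than a proposal for the real scripts: the `Pkg` record is hypothetical, the two thresholds are just picks from within the ranges suggested above (about two months flagged, about six months orphaned), and actually sending the mail is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Pkg:
    # Hypothetical record, not aurweb's schema.
    name: str
    flagged_since: Optional[datetime]   # None if not flagged out of date
    orphaned_since: Optional[datetime]  # None if the package has a maintainer


ORPHAN_AFTER_FLAGGED = timedelta(days=60)    # "a month or two"
DELETE_AFTER_ORPHANED = timedelta(days=180)  # "3-6 months"


def plan(pkgs: List[Pkg], now: datetime) -> Tuple[List[str], List[str]]:
    """Return (names to orphan, names to delete) for one cron run."""
    to_orphan, to_delete = [], []
    for p in pkgs:
        if p.orphaned_since is not None:
            if now - p.orphaned_since >= DELETE_AFTER_ORPHANED:
                to_delete.append(p.name)
        elif (p.flagged_since is not None
              and now - p.flagged_since >= ORPHAN_AFTER_FLAGGED):
            to_orphan.append(p.name)
    return to_orphan, to_delete


def announcement(to_delete: List[str]) -> str:
    """Build the body of the last-call mail listing deletion candidates."""
    lines = [
        "The following orphaned packages are scheduled for deletion.",
        "Adopt any you would like to keep:",
        "",
    ]
    lines += [f"  - {name}" for name in sorted(to_delete)]
    return "\n".join(lines)
```

Running `plan` monthly and mailing `announcement(to_delete)` before the deletions take effect would give the "up for grabs" window described in item 3.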
On 2021-12-11 07:10 +0000 George Rawlinson via aur-dev wrote:
On 21-12-10 19:42, Kevin Morris via aur-dev wrote:
Alright. Now, please let me know what you guys think are some good ideas for pruning out requests and ancient packages that would not interrupt expected workflows or "accidentally" remove packages.
There are a lot of things that can be automated. Here's my wish list:
1. Cron - Automatically delete packages that have been orphaned for longer than x months. I think 3-6 months is reasonable. Popular packages tend to be maintained very quickly after they have been orphaned so I don't think this is too aggressive. Not to mention that the git repositories are still intact after deletion so if anyone wishes to re-add a previously deleted package, the history is there.
2. Cron - Automatically orphan packages that have been flagged out of date for longer than x months. I would think a month or two is a reasonable length of time for maintainers to either unflag the package or actually maintain the package.
3. Cron - Related to #1. Before removing the packages from the AUR, an automated email can be sent to aur-general with a list of packages about to be deleted and if anyone wants to maintain these, they're up for grabs. I think weekly would be a bit much, so maybe monthly?
4. I want a pony.
There are currently over 72,000 packages, and only 60-odd Trusted Users ^W ^W Package Maintainers. Anything that eliminates time spent pruning the AUR is a welcome addition.
-- George Rawlinson
I suggest that any automatic removal of orphans after x months be combined with 2 additional conditions: last download y months ago and upstream is gone. If it's orphaned, unused and upstream is gone, it's clearly dead. We should not automatically remove packages that are still in use, and if upstream still exists then the package may be picked up again. Those packages can still be flagged manually by users and either purged automatically after some fixed time in the request queue, or tagged there to indicate that they can be purged without further investigation.

Expanding on that, it would be very useful if the request dashboard provided relevant package details directly, such as last update, orphan age, and download statistics. Maybe even a timestamp of the maintainer's last activity on the AUR.

Automatically orphaning packages after x months of being out-of-date is a good idea. It may even be a good idea to mark a maintainer's account as inactive whenever a package is orphaned so that orphan requests for other packages solely owned by the same maintainer can be accepted immediately. The maintainer would be marked as active again following any AUR activity. The package page should then indicate if the maintainer is considered inactive and that an orphan request would be accepted automatically. Inactive maintainers can be automatically removed from all packages after x months of inactivity, but x should be 6 to 12 months imo.

Any rules for automatically accepting orphan requests should also apply to co-maintainer requests, perhaps with reduced time limits.

I like the idea of announcing automatic removals as a last-chance call for maintainers. A notification to aur-general might be useful, but I would prefer a dedicated dashboard on the AUR itself, visible to everyone, where users could quickly adopt the packages directly. If you want to have a little fun with it, add a big purge countdown timer at the top and a pacman pill-eating animation to clear them when the time runs out.

I'm tempted to go off on a tangent exploring the possibilities of federated package management for the AUR but I've probably bikeshedded enough already so I'll stop here.

Regards,
Xyne
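The maintainer-inactivity idea above amounts to a small state machine, which might look something like the following. This is purely illustrative: `ActivityTracker` is a hypothetical in-memory stand-in for whatever flag the accounts table would carry, and "solely owned" is read here as "has no co-maintainers".

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ActivityTracker:
    """Hypothetical in-memory stand-in for an 'inactive' flag on accounts."""
    inactive: Set[str] = field(default_factory=set)

    def package_orphaned(self, former_maintainer: str) -> None:
        # Having a package orphaned marks the maintainer as inactive.
        self.inactive.add(former_maintainer)

    def record_activity(self, user: str) -> None:
        # Any AUR activity marks the maintainer as active again.
        self.inactive.discard(user)

    def auto_accept_orphan_request(
        self, maintainer: str, comaintainers: List[str]
    ) -> bool:
        # Assumption: "solely owned" means the package has no co-maintainers.
        return maintainer in self.inactive and not comaintainers
```

The package page could then query the same flag to tell visitors that an orphan request would be accepted automatically.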
On 2021-12-11 13:37, Xyne via aur-dev wrote:
I suggest that any automatic removal of orphans after x months be combined with 2 additional conditions: last download y months ago
I would agree with this rule, but it also shows why the idea is infeasible. I can't think of any way you're going to be able to tell the difference between bots "downloading" packages and actual users. It's possible that pkgstats.de could be leveraged here, but even a count of zero there may not be the definitive answer that is needed.

Caleb
On 2021-12-11 17:06 +0300 Caleb Maclennan via aur-dev wrote:
On 2021-12-11 13:37, Xyne via aur-dev wrote:
I suggest that any automatic removal of orphans after x months be combined with 2 additional conditions: last download y months ago
I would agree with this rule, but it also shows why the idea is infeasible. I can't think of any way you're going to be able to tell the difference between bots "downloading" packages and actual users. It's possible that pkgstats.de could be leveraged here, but even a count of zero there may not be the definitive answer that is needed.
Caleb
What about ignoring all downloads from a given address based on bot-like behavior such as downloading hundreds of packages in a short interval, or repeated download checks at fixed intervals? Any bot sweeping the AUR for info should be easy to spot server-side, and any bot downloading specific packages should probably be considered an indication of the package's continued usage when deciding to purge orphans.

The automatic removal should remain conservative so I don't see it as an issue if some packages are accidentally flagged as still in use. They will simply require a human to review and manually delete them in that case. The automatic removal should still handle the bulk of it.

Regards,
Xyne
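As a rough illustration of the first heuristic (many packages in a short interval), the sketch below flags sweeping addresses with a sliding window and then computes each package's most recent download from everyone else. The log format, the function names and both thresholds are assumptions made for the example; the fixed-interval check is deliberately left out, since bots fetching specific packages are meant to count as usage.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple

Download = Tuple[str, str, datetime]  # (address, package, timestamp); assumed format

BURST_WINDOW = timedelta(minutes=10)  # placeholder for "a short interval"
BURST_PACKAGES = 100                  # placeholder for "hundreds of packages"


def sweeping_addresses(log: List[Download]) -> Set[str]:
    """Addresses that fetched BURST_PACKAGES distinct packages within BURST_WINDOW."""
    by_addr = defaultdict(list)
    for addr, pkg, ts in log:
        by_addr[addr].append((ts, pkg))
    bots = set()
    for addr, events in by_addr.items():
        events.sort()
        start = 0
        counts = defaultdict(int)  # package -> downloads inside the window
        for ts, pkg in events:
            counts[pkg] += 1
            # Drop events that have fallen out of the window ending at ts.
            while ts - events[start][0] > BURST_WINDOW:
                old = events[start][1]
                counts[old] -= 1
                if counts[old] == 0:
                    del counts[old]
                start += 1
            if len(counts) >= BURST_PACKAGES:
                bots.add(addr)
                break
    return bots


def last_human_download(log: List[Download], bots: Set[str]) -> Dict[str, datetime]:
    """Most recent download per package, ignoring flagged addresses."""
    latest: Dict[str, datetime] = {}
    for addr, pkg, ts in log:
        if addr in bots:
            continue
        if pkg not in latest or ts > latest[pkg]:
            latest[pkg] = ts
    return latest
```

A package missing from the result, or whose timestamp is older than y months, would count as unused. Erring toward "still in use" keeps the rule conservative, in line with the point above.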
participants (5)
- Brett Cornwall
- Caleb Maclennan
- George Rawlinson
- Kevin Morris
- Xyne