On 01/03/13 06:02 AM, Martti Kühne wrote:
On Fri, Mar 1, 2013 at 5:07 AM, Connor Behan <connor.behan@gmail.com> wrote: [...]
INSERT INTO `PackageComments` VALUES (17,46,68,'ruby bindings for fastcgi',1113164127,68),(28,69,65,'A countdown timer applet for the GNOME panel.',1113178883,0);
Except that the line there is 161 characters long and contains two comments (one about Ruby that was deleted by its poster, and one non-deleted comment about GNOME). The line in the real file is a million characters long and contains ~20k comments. And there are 28 such lines. Reading this would be like reading War and Peace 10 times, but it would teach you a lot about the history of the AUR.
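Splitting one of those lines back into individual comments is mechanical, though. Below is a minimal Python sketch, assuming the dump uses MySQL's usual string quoting (backslash escapes and doubled quotes) and that every unquoted value is an integer or NULL. The six fields are read positionally; my guess from the sample is ID, package ID, poster's user ID, text, timestamp and deleting user's ID (0 meaning "not deleted"), but those meanings are inferred, not taken from the real schema.

```python
# Backslash escapes produced by mysqldump; any other escaped character
# maps to itself (e.g. \' -> ', \\ -> \).
MYSQL_ESCAPES = {"0": "\0", "n": "\n", "r": "\r", "t": "\t", "b": "\b", "Z": "\x1a"}


def _finish(field, was_string):
    """Turn the collected characters of one field into a Python value."""
    text = "".join(field)
    if was_string:
        return text
    text = text.strip()
    if text == "NULL":
        return None
    return int(text)  # assumption: all unquoted values are integers


def parse_insert_values(line: str) -> list[tuple]:
    """Split one MySQL extended INSERT statement into row tuples."""
    rows, row, field = [], [], []
    in_tuple = in_string = was_string = False
    i = line.index(" VALUES ") + len(" VALUES ")
    while i < len(line):
        c = line[i]
        if in_string:
            if c == "\\":
                nxt = line[i + 1]
                field.append(MYSQL_ESCAPES.get(nxt, nxt))
                i += 1  # skip the escaped character too
            elif c == "'" and line[i + 1 : i + 2] == "'":
                field.append("'")  # '' is an escaped quote
                i += 1
            elif c == "'":
                in_string = False  # closing quote
            else:
                field.append(c)
        elif not in_tuple:
            if c == "(":  # start of a new row; commas/semicolon between rows are skipped
                in_tuple, row, field, was_string = True, [], [], False
        elif c == "'":
            in_string = was_string = True
        elif c == ",":
            row.append(_finish(field, was_string))
            field, was_string = [], False
        elif c == ")":
            row.append(_finish(field, was_string))
            rows.append(tuple(row))
            in_tuple = False
        else:
            field.append(c)
        i += 1
    return rows


LINE = ("INSERT INTO `PackageComments` VALUES "
        "(17,46,68,'ruby bindings for fastcgi',1113164127,68),"
        "(28,69,65,'A countdown timer applet for the GNOME panel.',1113178883,0);")
rows = parse_insert_values(LINE)
deleted = [r for r in rows if r[-1] != 0]  # assumes last field = deleting user's ID
print(len(rows), len(deleted))  # 2 1
```

Scanning character by character instead of splitting on "),(" matters here, because comment text can itself contain those characters.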
That's why we use machines to do this kind of work for us. Also, a lovely idea to restore comments that are older than two years, that'll be extremely beneficial to the quality of the AUR.
Right, inserting this data into a db can be automated. It would just require minor syntax changes to account for the newer MySQL version. This hasn't been done, I gather, because the devs hold themselves to a high standard and don't want corrupted text littering the AUR comments. Fixing the encoding of the text is what might require "reading". Loui Chang seemed to think there was a way to automate this as well, but it would be nontrivial, so the project got put on the back burner. I should ask him.
Other thoughts on this: we don't need comments on packages that don't exist any more, comments that were already deleted, or comments made by users who aren't in the db any more. If I understand correctly, none of that data is currently in the AUR's comments? Or all of it?
Whether a comment is a "deleted comment" is stored in the AUR database. Whether it belongs to a deleted package or a deleted user, I believe, is not. If you delete an AUR package, the PHP file only deletes the record for that package; the comments that were part of it stay in the db as "orphan data". In fact, package tarballs don't even get deleted by the PHP file. That is done by a helper script that periodically runs a cleanup.
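I don't know what the corruption in the backup actually looks like, so this is only a guess at one common case: UTF-8 bytes that were decoded as Latin-1/cp1252 and stored again, so "Kühne" shows up as "KÃ¼hne". A minimal Python heuristic for that case tries to reverse one such round trip and leaves the text alone if the reversal isn't valid UTF-8:

```python
def fix_double_encoding(text: str) -> str:
    """Heuristically undo one round of UTF-8 misread as cp1252/Latin-1.

    MySQL's "latin1" is really cp1252, so that codec is tried first;
    plain Latin-1 covers the code points cp1252 leaves undefined.
    Text that doesn't survive the round trip is returned unchanged.
    """
    for codec in ("cp1252", "latin-1"):
        try:
            return text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return text


print(fix_double_encoding("K\u00c3\u00bchne"))  # Kühne
```

Already-correct non-ASCII text like "Kühne" or "café" fails the round trip and is returned as-is. But a heuristic like this can still misfire on text that happens to be valid both ways, which is presumably why it's "nontrivial" to trust without some reading.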
However, if this 2010 backup does get imported into the AUR, I agree that we can take the liberty of removing such orphan data so there is less to import.
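Pruning that orphan data before the import could be a single DELETE. Here is a sketch using Python's built-in sqlite3 as a stand-in for MySQL; the table and column names (Packages, Users, PackageComments.PackageID, UsersID, DelUsersID) are hypothetical, and the Packages/Users contents are toy data, not the real AUR tables:

```python
import sqlite3


def drop_orphan_comments(conn: sqlite3.Connection) -> int:
    """Delete comments that are marked deleted, or whose package or user is gone.

    Schema names are hypothetical stand-ins. Returns the number of rows removed.
    """
    cur = conn.execute(
        """
        DELETE FROM PackageComments
        WHERE DelUsersID != 0
           OR PackageID NOT IN (SELECT ID FROM Packages)
           OR UsersID NOT IN (SELECT ID FROM Users)
        """
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Packages (ID INTEGER PRIMARY KEY);
    CREATE TABLE Users (ID INTEGER PRIMARY KEY);
    CREATE TABLE PackageComments (
        ID INTEGER PRIMARY KEY, PackageID INTEGER, UsersID INTEGER,
        Comments TEXT, CommentTS INTEGER, DelUsersID INTEGER);
    -- toy data: package 69 and users 65, 68 still exist, package 46 does not
    INSERT INTO Packages VALUES (69);
    INSERT INTO Users VALUES (65), (68);
    INSERT INTO PackageComments VALUES
        (17, 46, 68, 'ruby bindings for fastcgi', 1113164127, 68),
        (28, 69, 65, 'A countdown timer applet for the GNOME panel.', 1113178883, 0);
    """
)
removed = drop_orphan_comments(conn)
remaining = [r[0] for r in conn.execute("SELECT ID FROM PackageComments")]
print(removed, remaining)  # 1 [28]
```

Doing this before the import rather than after means the encoding cleanup only has to run over comments that will actually be kept.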
cheers! mar77i