Re: [aur-general] AUR Maintenance
On 28/02/13 01:54 PM, Phillip Smith wrote:
I'd be willing to try and assist with this too. What is the format of that backup file?
It is a 46MB text file of SQL commands; the kind you would get by running mysqldump. It only has 462 lines, but some of them are very long. The important lines are 97-117 that specify the "PackageComments" table: CREATE TABLE `PackageComments` ( `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `PackageID` int(10) unsigned NOT NULL DEFAULT '0', `UsersID` int(10) unsigned NOT NULL DEFAULT '0', `Comments` text NOT NULL, `CommentTS` bigint(20) unsigned NOT NULL DEFAULT '0', `DelUsersID` int(10) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`ID`), KEY `UsersID` (`UsersID`), KEY `PackageID` (`PackageID`), KEY `DelUsersID` (`DelUsersID`) ) ENGINE=MyISAM AUTO_INCREMENT=154508 DEFAULT CHARSET=latin1; /*!40101 SET character_set_client = @saved_cs_client */; -- -- Dumping data for table `PackageComments` -- LOCK TABLES `PackageComments` WRITE; /*!40000 ALTER TABLE `PackageComments` DISABLE KEYS */; ID is a number identifying the comment, PackageID is the package to which it belongs, UsersID is the one who posted it, Comments is the actual text of it, CommentTS is the timestamp of when it was posted, DelUsersID is equal to the ID of the user who deleted the comment and 0 if it has not been deleted. The next important lines are 118-146 which state the actual comment data. An example of it is: INSERT INTO `PackageComments` VALUES (17,46,68,'ruby bindings for fastcgi',1113164127,68),(28,69,65,'A countdown timer applet for the GNOME panel.',1113178883,0); Except that line there is 161 characters and contains two comments (one comment deleted by its poster about Ruby and one non-deleted comment about GNOME). The line in the real file is a million characters and contains ~20k comments. And there are 28 such lines. Reading this would be like reading War And Peace 10 times but it would teach you a lot about the history of the AUR.
On Fri, Mar 1, 2013 at 5:07 AM, Connor Behan <connor.behan@gmail.com> wrote: [...]
INSERT INTO `PackageComments` VALUES (17,46,68,'ruby bindings for fastcgi',1113164127,68),(28,69,65,'A countdown timer applet for the GNOME panel.',1113178883,0);
Except that line there is 161 characters and contains two comments (one comment deleted by its poster about Ruby and one non-deleted comment about GNOME). The line in the real file is a million characters and contains ~20k comments. And there are 28 such lines. Reading this would be like reading War And Peace 10 times but it would teach you a lot about the history of the AUR.
That's why we use machines to do this kind of work for us. Also, a lovely idea to restore comments that are older than two years, that'll be extremely beneficial to the quality of the aur. Other thoughts on this, we don't need comments on packages that don't exist any more, that were deleted already or are made by users which aren't in the db any more. If I understood correctly, none of that data is currently in the aur's comments? or all? cheers! mar77i
On 01/03/13 06:02 AM, Martti Kühne wrote:
On Fri, Mar 1, 2013 at 5:07 AM, Connor Behan <connor.behan@gmail.com> wrote: [...]
INSERT INTO `PackageComments` VALUES (17,46,68,'ruby bindings for fastcgi',1113164127,68),(28,69,65,'A countdown timer applet for the GNOME panel.',1113178883,0);
Except that line there is 161 characters and contains two comments (one comment deleted by its poster about Ruby and one non-deleted comment about GNOME). The line in the real file is a million characters and contains ~20k comments. And there are 28 such lines. Reading this would be like reading War And Peace 10 times but it would teach you a lot about the history of the AUR.
That's why we use machines to do this kind of work for us. Also, a lovely idea to restore comments that are older than two years, that'll be extremely beneficial to the quality of the aur. Right, inserting this data into a db can be automated. It would just require minor syntax changes to account for the newer MySQL version. This hasn't been done, I gather, because the devs hold themselves to a high standard and don't want corrupted text littering the AUR comments. Fixing the encoding of the text is what might require "reading". Loui Chang seemed to think there was a way to automate this as well but it would be nontrivial so the project got put on the back burner. I should ask him. Other thoughts on this, we don't need comments on packages that don't exist any more, that were deleted already or are made by users which aren't in the db any more. If I understood correctly, none of that data is currently in the aur's comments? or all? Whether a comment is a "deleted comment" is stored in the AUR database. Whether it belongs to a deleted package or a deleted user, I believe, is not. If you delete an AUR package, the PHP file will only delete the record for that package. Comments that were part of it stay in the db as "orphan data". In fact, package tarballs don't even get deleted by the PHP file. This is done by a helper script that periodically runs a cleanup.
However, if this 2010 backup does get imported into the AUR, I agree that we can take the liberty of removing such orphan data so there is less to import.
cheers! mar77i
On 1 March 2013 15:07, Connor Behan <connor.behan@gmail.com> wrote:
Except that line there is 161 characters and contains two comments (one comment deleted by its poster about Ruby and one non-deleted comment about GNOME). The line in the real file is a million characters and contains ~20k comments. And there are 28 such lines. Reading this would be like reading War And Peace 10 times but it would teach you a lot about the history of the AUR.
While it is a lot of data, I agree that it shouldn't be that difficult to recover. What am I missing? Any chance those of us who aren't TU's can get access to the file?
On 3/3/13, Phillip Smith <lists@fukawi2.nl> wrote:
While it is a lot of data, I agree that it shouldn't be that difficult to recover. What am I missing? Any chance those of us who aren't TU's can get access to the file?
I also came close to that question, which indeed is kind of obvious. cheers! mar77i
participants (3)
-
Connor Behan
-
Martti Kühne
-
Phillip Smith