[aur-general] AUR Maintenance

Firmicus Firmicus at gmx.net
Mon Mar 29 10:26:02 CEST 2010


On 29/03/2010 09:00, Pierre Schmitz wrote:
> On Monday, 29 March 2010 01:21:09, Allan McRae wrote:
>    
>> Is there any progress on fixing this?  There are a lot of packaging
>> notes on those pages that would be a shame to lose.
>>      

I did it last Thursday. I've done my best to repair the MySQL backup 
Loui pointed me at. I'd say it's 95% fixed now, but the procedure left a 
few isolated illegal characters in its wake (like this: �), especially 
within Cyrillic and CJK text. The text should be legible, however. You 
can compare the original on sigurd:
/srv/http/aur.archlinux.org/backup/aur-20100205-1859.sql.gz
with my repaired version:
/home/francois/aur-20100205-1859.sql.fixed2.xz
and judge whether any further effort is needed or justified.

> It's very likely the same issue I had updating the wiki. This is caused by a
> mysql packaging change which switched the default encoding from latin1 to
> utf8. Here are some tips:
> http://en.gentoo-wiki.com/wiki/Convert_latin1_to_UTF-8_in_MySQL
>
> But I guess we lost the chance to fix this more or less easily because the AUR
> content has changed since the last backup.

Indeed. Believe me, the encoding of the strings was a terrible mess 
(mostly the comments, but also the user names), so it was no longer 
simply a matter of converting from one charset to another. 
Basically what I did was to convert from windows-1252 (!) to UTF-8, and 
then repair all "doubly-encoded" UTF-8 characters using the Perl module 
Encode::DoubleEncodedUTF8 (on CPAN). But as I said above, there is no 
way to automatically recover everything from that one backup alone.
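
To illustrate what "doubly-encoded" means here, a minimal sketch in 
Python. This is not the Perl script I actually ran, and not how 
Encode::DoubleEncodedUTF8 is implemented; the function names are made up 
for the example, and it assumes the damage came from UTF-8 bytes being 
misread as windows-1252 and then stored as UTF-8 again:

```python
def double_encode(text: str) -> str:
    """Simulate the damage: UTF-8 bytes misread as windows-1252.

    Hypothetical helper, for illustration only. Python's cp1252 codec
    rejects the five undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D), so
    this raises UnicodeDecodeError for text whose UTF-8 form contains them.
    """
    return text.encode("utf-8").decode("cp1252")


def repair_double_encoded(text: str) -> str:
    """Undo one round of double encoding, if the string shows it.

    Works on the whole string at once: if it cannot be mapped back to
    bytes via cp1252, or those bytes are not valid UTF-8, the input is
    assumed to be fine (or unrecoverable) and returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    original = "M\u00fcller"             # "Müller"
    mangled = double_encode(original)    # "MÃ¼ller"
    print(mangled, "->", repair_double_encoded(mangled))
```

A whole-string check like this is all-or-nothing: one stray byte in a 
comment makes the repair give up on that entire string, which is one 
reason a character-by-character tool is preferable on real data.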

> This requires some kind of script
> that imports and merges the old and new comments.
>
>    
The problem with that "import and merge" operation – unless it is done 
with a reliable and well-tested tool – is that it risks damaging the 
data even more than it already is ;) I'll leave it to Loui to decide 
whether it's worth the trouble.

F

