[aur-general] AUR Maintenance
Firmicus at gmx.net
Mon Mar 29 10:26:02 CEST 2010
On 29/03/2010 09:00, Pierre Schmitz wrote:
> On Monday, 29 March 2010 01:21:09, Allan McRae wrote:
>> Is there any progress on fixing this? There are a lot of packaging
>> notes on those pages that would be a shame to lose.
I did it last Thursday. I've done my best to repair the mysql backup
Loui pointed me at. I'd say it's 95% fixed now, but the procedure left a
few isolated illegal characters in its wake (like this: �), especially
within Cyrillic and CJK text. The text should be legible, however. You can
compare the original on sigurd
with my repaired version:
and judge whether any further effort is needed or justified.
> It's very likely the same issue I had updating the wiki. This is caused by a
> mysql packaging change which switched the default encoding from latin1 to
> utf8. Here are some tips:
> But I guess we lost the chance to fix this more or less easily because the AUR
> content has changed since the last backup.
Indeed. Believe me, the encoding of the strings was in a terrible mess
(mostly the comments, but also the names of users), so it was no longer
simply a matter of doing a conversion from one charset to another.
Basically what I did was to convert from windows-1252 (!) to UTF-8, and
then repair all "doubly-encoded" UTF-8 characters using the perl module
Encode::DoubleEncodedUTF8 (on CPAN). But as I said above, there is no
way to automatically recover everything from that one backup alone.
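
For illustration, the two steps above can be sketched roughly as follows.
This is a minimal Python sketch, not the Perl code that was actually run:
the function names decode_backup and repair_double_encoded are hypothetical,
and it assumes the dump can be handled one string (for example, one comment)
at a time.

```python
def decode_backup(raw: bytes) -> str:
    """Step 1: interpret the raw backup bytes as windows-1252.

    Bytes with no windows-1252 mapping (for example 0x9D) cannot be
    decoded; errors="replace" turns each of them into U+FFFD.
    """
    return raw.decode("cp1252", errors="replace")


def repair_double_encoded(text: str) -> str:
    """Step 2: undo one round of UTF-8 -> windows-1252 -> UTF-8 damage.

    If the string is not cleanly double-encoded, it is returned unchanged.
    """
    try:
        # Reverse the wrong windows-1252 decode to get the original bytes
        # back, then decode them as the UTF-8 they were all along.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text holds characters outside windows-1252 (already
        # proper Unicode, or U+FFFD), or the bytes are not valid UTF-8.
        return text


if __name__ == "__main__":
    # Simulate a doubly-encoded comment, then repair it.
    damaged = "Привет".encode("utf-8").decode("cp1252")
    print(repair_double_encoded(damaged))
```

Note that this sketch repairs a string only as a whole: a single
unrecoverable character makes it leave the entire string untouched, so a
real tool would have to work on smaller pieces, for instance by matching
individual doubly-encoded sequences.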
> This requires some kind of script
> that imports and merges the old and new comments.
The problem with that "import and merge" operation – unless it is done
with a reliable and well-tested tool – is that it risks damaging the
data more than it currently is ;) I'll leave it to Loui to decide
whether it's worth the trouble.