[aur-general] AUR Maintenance
Hello everyone! I'm going to look into fixing some issues with the AUR right now. Please don't be alarmed if the site isn't working for a little while.
On Tue 23 Mar 2010 15:09 -0400, Loui Chang wrote:
Hello everyone! I'm going to look into fixing some issues with the AUR right now. Please don't be alarmed if the site isn't working for a little while.
I've deleted existing comments from the AUR. I ran into a problem juggling the encodings, which was the problem I was trying to fix. The AUR should properly display UTF-8 in comments now, though. It may be possible to restore most of the old comments, but that's something that we'd have to look into later. Cheers!
On Tue, Mar 23, 2010 at 9:43 PM, Loui Chang <louipc.ist@gmail.com> wrote:
I've deleted existing comments from the AUR.
I ran into a problem juggling the encodings, which was the problem I was trying to fix. The AUR should properly display UTF-8 in comments now, though.
It may be possible to restore most of the old comments, but that's something that we'd have to look into later.
Probably a stupid question, but just to be sure: does being able to look into it later mean there is an easy way to restore the old comments while keeping the new ones (i.e. merging both)? Or will the new ones be lost when the old ones are restored?
On Tue 23 Mar 2010 21:51 +0100, Xavier Chantry wrote:
Probably a stupid question, but just to be sure: does being able to look into it later mean there is an easy way to restore the old comments while keeping the new ones (i.e. merging both)? Or will the new ones be lost when the old ones are restored?
Old comments are mostly backed up but suffer from some encoding issues; that's the first hurdle. There should be a way to merge old and new comments, though I'm not exactly sure how easy that would be. Probably pretty easy for a real sysadmin, which, unfortunately, I am not. I'm not sure how much value is in the old comments, but it's not worth keeping the AUR locked down while I try to figure it out.
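The merge Loui mentions can be sketched roughly as follows. This is a minimal Python sketch, not the AUR's code: it assumes (hypothetically) that each comment reduces to a package name, a user name, a Unix timestamp and the text, and that the first three identify a comment. The `Comment` type and the key choice are illustrative assumptions. Live comments always win on a collision, and backed-up comments only fill gaps, so nothing posted after the wipe is overwritten.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Comment:
    # Hypothetical fields; the real AUR schema may differ.
    package: str
    user: str
    timestamp: int  # Unix time the comment was posted
    text: str


def key(c: Comment) -> Tuple[str, str, int]:
    # Assumed natural key: a user does not post two comments on the
    # same package within the same second.
    return (c.package, c.user, c.timestamp)


def merge_comments(old: Iterable[Comment], new: Iterable[Comment]) -> List[Comment]:
    """Return the live comments plus any backed-up comments they lack."""
    merged = {key(c): c for c in new}  # live comments take precedence
    for c in old:
        merged.setdefault(key(c), c)  # backup only fills missing keys
    # Group by package, then chronological order within each package.
    return sorted(merged.values(), key=lambda c: (c.package, c.timestamp))
```

Keying on content-derived fields rather than database row IDs is a deliberate choice here: IDs in a wiped and refilled table cannot safely be assumed to line up with those in an older dump.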
On Tue, Mar 23, 2010 at 10:39 PM, Loui Chang <louipc.ist@gmail.com> wrote:
I'm not sure how much value is in the old comments, but it's not worth keeping the AUR locked down while I try to figure it out.
I would say 90% is crap and 10% would be a shame to lose :) I hope someone more knowledgeable about mysql/encoding/sysadmin can help.
On Tue, Mar 23, 2010 at 17:41, Xavier Chantry <chantry.xavier@gmail.com> wrote:
I would say 90% is crap and 10% would be a shame to lose :)
I hope someone more knowledgeable about mysql/encoding/sysadmin can help.
I might throw up an announcement on the forums that they've been removed and a call for help on fixing it.
On Tue, Mar 23, 2010 at 16:43, Loui Chang <louipc.ist@gmail.com> wrote:
It may be possible to restore most of the old comments, but that's something that we'd have to look into later.
What's needed for this, and what ways could someone contribute?
On Tue 23 Mar 2010 16:51 -0400, Daenyth Blank wrote:
What's needed for this, and what ways could someone contribute?
We need someone with a keen knowledge of mysql and encodings to be able to restore the backed-up comments properly in utf8.
On 23/03/2010 22:24, Loui Chang wrote:
We need someone with a keen knowledge of mysql and encodings to be able to restore the backed-up comments properly in utf8.
I've done encoding conversions and repairs countless times (mostly using Perl). So perhaps I could help on this... (Not today though, but probably tomorrow). Contact me off-list and give me more detailed instructions of what the issue is. I do have access to sigurd but I can't look at the data right now as I am not in the mysql group. F
On 25/03/10 02:06, Firmicus wrote:
I've done encoding conversions and repairs countless times (mostly using Perl). So perhaps I could help on this... (Not today though, but probably tomorrow). Contact me off-list and give me more detailed instructions of what the issue is. I do have access to sigurd but I can't look at the data right now as I am not in the mysql group.
Is there any progress on fixing this? There are a lot of packaging notes on those pages that would be a shame to lose. Allan
On Monday, 29 March 2010 01:21:09, Allan McRae wrote:
Is there any progress on fixing this? There are a lot of packaging notes on those pages that would be a shame to lose.
It's very likely the same issue I had updating the wiki. This is caused by a mysql packaging change which switched the default encoding from latin1 to utf8. Here are some tips: http://en.gentoo-wiki.com/wiki/Convert_latin1_to_UTF-8_in_MySQL
But I guess we lost the chance to fix this more or less easily because the AUR content has changed since the last backup. This requires some kind of script that imports and merges the old and new comments.
-- Pierre Schmitz, https://users.archlinux.de/~pierre
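To make the failure mode concrete, the Python sketch below reproduces the classic symptom under one stated assumption: the column really held UTF-8 bytes but was labelled latin1, and a later conversion to utf8 decoded every byte as a single character. The MySQL manual describes its latin1 as the Windows cp1252 character set, so the sketch uses Python's cp1252 codec; bytes that cp1252 leaves undefined are turned into U+FFFD here. The function name is illustrative only.

```python
def simulate_latin1_misread(text: str) -> str:
    """Simulate UTF-8 bytes being reinterpreted as latin1 (cp1252) text.

    Assumption: the stored bytes were UTF-8, but a latin1 -> utf8
    conversion treated each byte as one cp1252 character. Bytes that
    cp1252 leaves undefined become U+FFFD.
    """
    stored_bytes = text.encode("utf-8")  # what was actually on disk
    return stored_bytes.decode("cp1252", errors="replace")


if __name__ == "__main__":
    for original in ["café", "naïve", "Grüße"]:
        print(original, "->", simulate_latin1_misread(original))
```

Each accented Latin letter becomes two characters (for example "é" turns into "Ã©"), which is the "doubly-encoded" pattern that a repair step has to undo.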
On 29/03/2010 09:00, Pierre Schmitz wrote:
It's very likely the same issue I had updating the wiki. This is caused by a mysql packaging change which switched the default encoding from latin1 to utf8. Here are some tips: http://en.gentoo-wiki.com/wiki/Convert_latin1_to_UTF-8_in_MySQL
I did it last Thursday. I've done my best to repair the mysql backup Loui pointed me at. I'd say it's 95% fixed now, but the procedure left a few isolated illegal characters in its trail (like this: �), especially within Cyrillic and CJK. The text should be legible, however. You can compare the original on sigurd, /srv/http/aur.archlinux.org/backup/aur-20100205-1859.sql.gz, with my repaired version, /home/francois/aur-20100205-1859.sql.fixed2.xz, and judge whether any further effort is needed or justified.
But I guess we lost the chance to fix this more or less easily because the AUR content has changed since the last backup.
Indeed. Believe me, the encoding of the strings was in a terrible mess (mostly the comments, but also the names of users), so it was no longer simply a matter of doing a conversion from one charset to another. Basically what I did was to convert from windows-1252 (!) to UTF-8, and then repair all "doubly-encoded" UTF-8 characters using the Perl module Encode::DoubleEncodedUTF8 (on CPAN). But as I said above, there is no way to automatically recover everything from that one backup alone.
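For readers who want to try the idea, here is a rough Python sketch of undoing one level of double encoding. It swaps in a plain whole-string technique rather than the Perl module Firmicus used: re-encode the text as windows-1252 to recover the original UTF-8 bytes, then decode them as UTF-8. It assumes the damage is exactly one round of UTF-8 misread as windows-1252. Windows-1252 leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so strings that passed through those bytes cannot be round-tripped this way. That may be one reason leftover � characters cluster in Cyrillic and CJK text, though the thread does not confirm it.

```python
def repair_double_encoded(text: str) -> str:
    """Undo one level of UTF-8 -> cp1252 -> UTF-8 double encoding.

    Re-encoding with cp1252 recovers the original UTF-8 bytes, which are
    then decoded properly. If the round trip fails (the string was never
    double-encoded, or holds characters cp1252 cannot represent, such as
    U+FFFD), the string is returned unchanged rather than guessed at.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_double_encoded("cafÃ©"))  # café
```

Working on whole strings keeps the sketch short, but it leaves a comment untouched if it mixes correct and double-encoded characters. A per-sequence approach such as Encode::DoubleEncodedUTF8 handles that case better. A rare, legitimately typed sequence like "Ã©" would also be "repaired" by mistake.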
This requires some kind of script that imports and merges the old and new comments.
The problem with that "import and merge" operation – unless it is done with a reliable and well-tested tool – is that it risks damaging the data even more than it already is ;) I'll leave it to Loui to decide whether it's worth the trouble. F
So... is there an official word on the final decision about restoring these? Is a fix still being investigated, or is it just being left? Allan
On Wed 31 Mar 2010 13:08 +1000, Allan McRae wrote:
So... is there an official word on the final decision about restoring these? Is a fix still being investigated, or is it just being left?
I may investigate more, but I'm not putting any deadline on it. The encodings are mostly fixed, but they could be fixed further. Before that, though, I am considering patching some of the code dealing with comments and launching a new release of the AUR. Not really 'official', but neither is the AUR really. ;) Cheers.
participants (6)
- Allan McRae
- Daenyth Blank
- Firmicus
- Loui Chang
- Pierre Schmitz
- Xavier Chantry