[arch-general] Troubleshooting random crash

Andre Goree andre at drenet.info
Wed Feb 6 22:49:19 EST 2013

On 02/06/13 22:45, Curtis Shimamoto wrote:
> Andre Goree <andre at drenet.info> wrote:
>> On 02/06/13 20:14, Gaetan Bisson wrote:
>>> [2013-02-06 19:06:45 -0500] Andre Goree:
>>>> Not really too keen on downgrading a bunch of packages that might break
>>>> dependencies and provide a REAL mess.  If I have to go through that long
>>>> process, I'd rather just reinstall -- which at this point I'm planning
>>>> to do anyway.
>>> Well, there is little point in posting to this list if you have no
>>> motivation to actually investigate the problem.
>>> For starters, you've upgraded Linux from 3.6.11 to 3.7.4 in the window
>>> when you report the issue appeared; from the symptoms you described,
>>> it's a likely suspect. Downgrading it is far from being a "REAL mess":
>>> you only need to downgrade/rebuild the external modules you really need
>>> (probably none).
>> Indeed there isn't, and surely even less point in replying to said post
>> if in fact I had no motivation.  Given that I'm replying, I'd probably
>> like to avoid reinstalling if at all possible.  I like the idea of
>> downgrading just the kernel -- obviously, what I meant is that downgrading
>> every package I've upgraded since 1/21 was not something I wanted to
>> undertake.  I'll try this tomorrow.
>>>> In fact it seems
>>>> all system processes hang, because no logs are produced after the issue
>>>> rears its ugly head.
>>> Ah. So that would mean your issue is I/O related, then?
>> It would seem so, yes.  I hinted at this at the end of my last reply as
>> well.
>>>>> So you produce nothing at work?
>>>> Not sure if you're just being an ass or not, however if you aren't:
>>>> that has nothing at all to do with the issue and I merely wanted to
>>>> establish _why_ I was using btrfs on a machine that I have running at my
>>>> job -- which is _also_ inconsequential in the context of my email.  If
>>>> you indeed were being an ass, congrats, you succeeded.
>>> Once you were done being offended, you could have looked for the meaning
>>> behind the words I used: that your "main work desktop" really qualifies
>>> as "a production server".
>>> But, of course, as you have so unequivocally declared, btrfs has
>>> absolutely "nothing at all to do with the issue". And your statement
>>> above implying that the problem is I/O related is just a coincidence.
>> I think you misunderstood my reply.  Following the context, I merely
>> meant that distinguishing my system from a production server and
>> explaining why I was running btrfs on this system was inconsequential to
>> the issue at hand.  Which is still true.  I never said, nor meant it to
>> be understood, that I believed btrfs was not the problem.  In fact, the
>> opposite is true.
>> So, for the sake of clarity, I never declared (and certainly not
>> unequivocally) "btrfs has absolutely nothing at all to do with the
>> issue", but rather, my distinctions and reasons for running btrfs have
>> nothing to do with the issue.  Not sure how you got that mixed up,
>> especially given the later part of my reply.
>>> Reporting issues is worthless when speculation is substituted for hard
>>> data. For example, a good report would have gone: "I believe this issue
>>> is unrelated to btrfs being my root filesystem since, on another Arch
>>> machine running ext3, I observe the following identical symptoms: first,
>>> `ssh -vvv` hangs at exactly the same point; second..."
>> I'll be sure to raise my reporting standards the next time I'd like help
>> from an Arch list, my apologies.
>>>>> How about looking at the system logs to see what your system was up to
>>>>> just before a crash?
>>>> I've done that, with no real hints.  That's the first thing any Linux
>>>> admin does when confronted with an issue such as this, no?
>>> Sure. But your first post gave no indication that you did that.
>> Indeed, I need to raise my reporting standards; I figured a lot of stuff
>> was implied, but I now know I must be much clearer.  Again, my
>> apologies.
>>>> Is there
>>>> perhaps a way to build Thunderbird with debug symbols or some kind of
>>>> logging?  I seem to recall opening Thunderbird each time this issue has
>>>> shown up.
>>> Well, it would be nice to confirm that it is indeed at fault; downgrading
>>> it is certainly not a "REAL mess" either. You can certainly also build
>>> it with debug symbols: in the PKGBUILD (or makepkg.conf), set
>>> CXXFLAGS='' LDFLAGS='' CFLAGS='-g' and remove the strip option.
>> Given that Thunderbird wasn't upgraded during the period when this issue
>> began, I'm not sure a downgrade would help, but it may be worth a shot.
>> Thanks for the pointers on building with debug symbols.
>>>> I'm ready
>>>> to blame btrfs b/c that's the only issue I see with my setup -- I also
>>>> have a tough time running a virtual machine on this box which I believe
>>>> is also due to btrfs.
>>> Didn't you write just a few lines ago that btrfs "has nothing at all to
>>> do with the issue"?
>> I most certainly did not, there's an obvious misunderstanding here.
>>> Wild guess: your Thunderbird mail database is huge (just like the disk
>>> image of your virtual machine - although I cannot really know what you
>>> mean by "tough time") and your btrfs has problems dealing with such big
>>> files (for instance, because your filesystem is nearly full). To confirm,
>>> start Thunderbird with an empty profile (such as by renaming
>>> ~/.thunderbird to ~/.thunderbird.old) and see what happens.
>> The thing is, this doesn't happen every time I start Thunderbird --
>> rather, seemingly, after the system has been up for a long period (>20
>> hrs or so).  The filesystem is not nearly full either, though it does
>> contain a lot of data.  I'm thinking downgrading to 3.6.x will help a
>> bit.  I'm going to look for btrfs bugs in 3.7.x and see if anyone else
>> has been having a similar issue as well.  Thanks for the assistance.
> For downgrading, I have found the Arch Rollback Machine quite handy. You can choose a date to roll back to, sync, and then have pacman reinstall all packages that it finds are newer than its database. This of course assumes there was no significant change in between, such as the recent filesystem update.
> I would certainly try doing just the kernel first though, as that is even easier. Just thought I would mention that amazing tool we have in our debugging arsenal.
> Regards,

Awesome, thanks for the suggestion!

Andre Goree
andre at drenet.info
