[arch-general] Troubleshooting random crash

Andre Goree andre at drenet.info
Wed Feb 6 21:24:41 EST 2013


On 02/06/13 20:14, Gaetan Bisson wrote:
> [2013-02-06 19:06:45 -0500] Andre Goree:
>> Not really too keen on downgrading a bunch of packages that might break
>> dependencies and provide a REAL mess.  If I have to go through that long
>> process, I'd rather just reinstall -- which at this point I'm planning
>> to do anyways.
> 
> Well, there is little point in posting to this list if you have no
> motivation to actually investigate the problem.
> 
> For starters, you've upgraded Linux from 3.6.11 to 3.7.4 in the window
> when you report the issue appeared; from the symptoms you described,
> it's a likely suspect. Downgrading it is far from being a "REAL mess":
> you only need to downgrade/rebuild the external modules you really need
> (probably none).

Indeed there isn't, and surely even less point in replying to said post
if in fact I had no motivation.  Given that I'm replying, I'd probably
like to avoid reinstalling if at all possible.  I like the idea of
downgrading just the kernel.  What I meant was that downgrading every
package I've upgraded since 1/21 was not something I wanted to
undertake.  I'll try this tomorrow.
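
For reference, a minimal sketch of the kernel-only downgrade, assuming
the old package is still in pacman's default cache directory,
/var/cache/pacman/pkg.  The helper name list_cached is made up for
illustration, and the loop only prints the command instead of running
it; the exact package file name will differ per machine.

```shell
# List cached versions of a package, oldest first by modification time.
# $1 = package name, $2 = cache directory (defaults to pacman's cache).
list_cached() {
    pkg=$1
    cache=${2:-/var/cache/pacman/pkg}
    # "<name>-<digit>" keeps "linux" from also matching "linux-headers";
    # detached signature files (*.sig) are filtered out.
    ls -1tr "$cache"/"$pkg"-[0-9]*.pkg.tar.* 2>/dev/null | grep -v '\.sig$'
}

# Print (do not run) a downgrade command for each cached kernel package.
list_cached linux | while read -r f; do
    echo "pacman -U $f"
done
```

Running the printed command for the 3.6.11 file as root, rebooting, and
checking `uname -r` would confirm that the older kernel is in use.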

>> In fact it seems
>> all system processes hang because no logs are produced after the issue
>> rears its ugly head.
> 
> Ah. So that would mean your issue is I/O related, then?

It would seem so, yes.  I hinted at this at the end of my last reply as
well.
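
One way to back the I/O theory with data, sketched under the assumption
that a shell is still usable when the hang starts: tasks blocked on
disk I/O sit in uninterruptible sleep, which ps reports as state D.
The function name blocked_tasks is made up for illustration.

```shell
# Print PID and command name of tasks in uninterruptible sleep (state D).
# Reads "STATE PID COMMAND" lines on stdin; only the first word of the
# command name is printed.
blocked_tasks() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# The trailing "=" on each column suppresses the header line.
ps -eo state=,pid=,comm= | blocked_tasks
```

If the same tasks stay in the list across several runs, the kernel log
may also contain "blocked for more than 120 seconds" hung-task
messages, which `dmesg | grep -i 'blocked for more than'` would show.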

> 
>>> So you produce nothing at work?
>>
>> Not sure if you're just being an ass or not, however if you aren't:
>> that has nothing at all to do with the issue and I merely wanted to
>> establish _why_ I was using btrfs on a machine that I have running at my
>> job -- which is _also_ inconsequential in the context of my email.  If
>> you indeed were being an ass, congrats, you succeeded.
> 
> Once you were done being offended, you could have looked for the meaning
> behind the words I used: that your "main work desktop" really qualifies
> as "a production server".
> 
> But, of course, as you have so unequivocally declared, btrfs has
> absolutely "nothing at all to do with the issue". And your statement
> above implying that the problem is I/O related is just a coincidence.
> 

I think you misunderstood my reply.  Following the context, I merely
meant that distinguishing my system from a production server and
explaining why I was running btrfs on this system was inconsequential to
the issue at hand.  Which is still true.  I never said nor meant it to
be understood that I believed btrfs not to be the problem.  In fact, the
opposite is true.

So, for the sake of clarity, I never declared (and certainly not
unequivocally) "btrfs has absolutely nothing at all to do with the
issue", but rather, my distinctions and reasons for running btrfs have
nothing to do with the issue.  Not sure how you got that mixed up,
especially given the later part of my reply.


> Reporting issues is worthless when speculation is substituted for hard
> data. For example, a good report would have gone: "I believe this issue
> is unrelated to btrfs being my root filesystem since, on another Arch
> machine running ext3, I observe the following identical symptoms: first,
> `ssh -vvv` hangs at exactly the same point; second..."


I'll be sure to raise my reporting standards the next time I'd like help
from an Arch list, my apologies.

>>> How about looking at the system logs to see what your system was up to
>>> just before a crash? 
>>
>> I've done that, with no real hints.  That's the first thing any Linux
>> admin does when confronted with an issue such as this, no?
> 
> Sure. But your first post gave no indication that you did that.

Indeed, I need to raise my reporting standards.  I figured a lot of
stuff was implied, but I now know I must be much clearer.  Again, my
apologies.


>> Is there
>> perhaps a way to build Thunderbird with debug symbols or some kind of
>> logging?  I seem to recall opening Thunderbird each time this issue has
>> shown up.
> 
> Well it would be nice to confirm that it is indeed at fault; downgrading
> it is certainly not a "REAL mess" either. You can certainly also build
> it with debug symbols: in the PKGBUILD (or makepkg.conf), set
> CXXFLAGS='' LDFLAGS='' CFLAGS='-g' and remove the strip option.


Given that Thunderbird wasn't upgraded around the time this issue
began, I'm not sure a downgrade would help, but it may be worth a shot.
Thanks for the pointers on building with debug symbols.
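
The quoted suggestion translates roughly into the PKGBUILD fragment
below.  It is a sketch of that advice, not a tested Thunderbird build.
The values are the ones from the quoted reply; the comment about
CXXFLAGS is an added assumption, since much of Thunderbird is C++.

```shell
# Fragment for a PKGBUILD (the flags can also be set in /etc/makepkg.conf).
CFLAGS='-g'      # emit debug information for C code
CXXFLAGS=''      # as suggested; CXXFLAGS='-g' is an untested assumption for C++ code
LDFLAGS=''

# Keep symbols in the built binaries: '!strip' disables makepkg's strip step.
options=('!strip')
```

In makepkg.conf the equivalent change is to turn `strip` into `!strip`
in the OPTIONS array.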


>> I'm ready
>> to blame btrfs b/c that's the only issue I see with my setup -- I also
>> have a tough time running a virtual machine on this box which I believe
>> is also due to btrfs.
> 
> Didn't you write just a few lines ago that btrfs "has nothing at all to
> do with the issue"?

I most certainly did not; there's an obvious misunderstanding here.

> 
> Wild guess: your thunderbird mail database is huge (just like the disk
> image of your virtual machine - although I cannot really know what you
> mean by "tough time") and your btrfs has problems dealing with such big
> files (for instance, because your filesystem is nearly full). To confirm,
> start thunderbird with an empty profile (such as by renaming
> ~/.thunderbird to ~/.thunderbird.old) and see what happens.
> 

The thing is, this doesn't happen every time I start thunderbird --
rather, seemingly, after the system has been up for a long period (>20
hrs or so).  The filesystem is not nearly full either, though it does
contain a lot of data.  I'm thinking downgrading to 3.6.x will help a
bit.  I'm going to look for btrfs bugs in 3.7.x and see if anyone else
has been having a similar issue as well.  Thanks for the assistance.
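
To test the large-file guess with data rather than speculation, a
sketch along these lines could list the biggest files in the mail
profile and show how btrfs has allocated its space.  Assumptions: GNU
find (for -printf), Thunderbird's default Linux profile directory
~/.thunderbird, and the made-up helper name largest_files.

```shell
# Print the N largest regular files under a directory as
# "<bytes><TAB><path>", biggest first.  Requires GNU find for -printf.
# $1 = directory, $2 = how many files to show (default 5).
largest_files() {
    find "$1" -type f -printf '%s\t%p\n' 2>/dev/null | sort -rn | head -n "${2:-5}"
}

# Assumed profile location; adjust if the profile lives elsewhere.
largest_files "$HOME/.thunderbird"

# Space as btrfs itself accounts for it (data vs. metadata), if the
# btrfs tool is installed.  Failure on a non-btrfs root is ignored.
if command -v btrfs >/dev/null 2>&1; then
    btrfs filesystem df / 2>/dev/null || true
fi
```

The btrfs output is worth a look even when plain df shows free space,
because btrfs can exhaust its metadata allocation before the data
allocation is full.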

-- 
Andre Goree
andre at drenet.info
