On Fri, Sep 11, 2009 at 7:09 AM, Dan McGee <dpmcgee@gmail.com> wrote:
On Thu, Sep 10, 2009 at 6:33 PM, Dan McGee <dpmcgee@gmail.com> wrote:
On Thu, Sep 10, 2009 at 10:38 AM, Tobias Powalowski <t.powa@gmx.de> wrote:
Hi guys, kernel 2.6.31 first test run ...
Looking decent here. Noticed a few things:
* new dmesg messages, not sure if they are of concern or not:
ACPI: CPU0 (power states: C1[C1] C2[C2])
processor LNXCPU:00: registered as cooling_device0
ACPI: Processor [CPU0] (supports 8 throttling states)
ACPI: SSDT 00000000cfee8a00 00152 (v01 PmRef Cpu1Ist 00003000 INTL 20040311)
ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU1._PDC] (Node ffff88022f81e120), AE_ALREADY_EXISTS
ACPI: Marking method _PDC as Serialized because of AE_ALREADY_EXISTS error
ACPI: CPU1 (power states: C1[C1] C2[C2])
processor LNXCPU:01: registered as cooling_device1
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: SSDT 00000000cfee8b60 00152 (v01 PmRef Cpu2Ist 00003000 INTL 20040311)
ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU2._PDC] (Node ffff88022f81e1a0), AE_ALREADY_EXISTS
ACPI: Marking method _PDC as Serialized because of AE_ALREADY_EXISTS error
ACPI: CPU2 (power states: C1[C1] C2[C2])
processor LNXCPU:02: registered as cooling_device2
ACPI: Processor [CPU2] (supports 8 throttling states)
ACPI: SSDT 00000000cfee8cc0 00152 (v01 PmRef Cpu3Ist 00003000 INTL 20040311)
ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU3._PDC] (Node ffff88022f81e220), AE_ALREADY_EXISTS
ACPI: Marking method _PDC as Serialized because of AE_ALREADY_EXISTS error
ACPI: CPU3 (power states: C1[C1] C2[C2])
processor LNXCPU:03: registered as cooling_device3
ACPI: Processor [CPU3] (supports 8 throttling states)
* When /etc/rc.d/microcode ran from my daemons, it spat out a "/etc/rc.d/microcode: /dev/cpu/microcode not a character device" message. Interestingly enough, it still looks like it ran the microcode update, as there were messages in dmesg. However, if I run it now it is just fine (and that device does exist). Race condition somewhere?
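If the cause really is a race between the microcode module loading and the device node being created, one way to test that theory would be to poll for the node briefly instead of checking once. The following is only a minimal sketch of that idea in Python; the function name wait_for_char_device and the timeout values are made up for illustration, and nothing here reflects how the actual rc.d script performs its check.

```python
import os
import stat
import time


def wait_for_char_device(path, timeout=5.0, interval=0.1):
    """Poll until `path` exists as a character device or `timeout` seconds pass.

    Hypothetical helper: the name and default timings are illustrative only.
    Returns True if the node showed up as a character device, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISCHR(os.stat(path).st_mode):
                return True
        except OSError:
            pass  # node missing or not accessible yet; keep polling
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # /dev/cpu/microcode is the path named in the error message above.
    print(wait_for_char_device("/dev/cpu/microcode"))
```

If the call returns True only after a noticeable delay at boot, that would point toward the node simply not existing yet when the script ran.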
Failboat when I woke up this morning. Machine (X) was completely unresponsive, and I ssh-ed in and a bunch of things were all jacked up. Grabbed something useful out of dmesg though:
[drm] wait for fifo failed status : 0xE57004A4 0x00FF0F02
<Above message was in there 240 times>
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa0569131>] radeon_read_ring_rptr+0x31/0x70 [radeon]
PGD 2112ea067 PUD 2112a6067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/scsi_level
CPU 0
Modules linked in: radeon drm nfs lockd fscache nfs_acl auth_rpcgss sunrpc coretemp cpufreq_ondemand it87 hwmon_vid ipv6 ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables microcode ext3 jbd usbhid hid snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_hda_codec_atihdmi snd_hda_codec_realtek uhci_hcd snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc ohci1394 ieee1394 ehci_hcd usbcore i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support r8169 mii intel_agp evdev thermal fan button battery ac acpi_cpufreq freq_table processor rtc_cmos rtc_core rtc_lib ext4 mbcache jbd2 crc16 raid1 md_mod sr_mod cdrom sd_mod ata_generic ahci pata_jmicron pata_acpi libata scsi_mod
Pid: 3694, comm: X Not tainted 2.6.31-ARCH #1 EP45-DS3R
RIP: 0010:[<ffffffffa0569131>] [<ffffffffa0569131>] radeon_read_ring_rptr+0x31/0x70 [radeon]
RSP: 0018:ffff88022b597b98 EFLAGS: 00010246
RAX: ffff88022b7ba180 RBX: ffff88022e681800 RCX: 000000000000002c
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88022e681800
RBP: 0000000000000010 R08: 00000000ffffffff R09: 000014f4e884a645
R10: 0000000000000001 R11: ffff880028047958 R12: 0000000000000008
R13: ffff88022c08aa30 R14: ffff88022ea14900 R15: 0000000000000000
FS: 00007fda900666f0(0000) GS:ffff880028034000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000211188000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process X (pid: 3694, threadinfo ffff88022b596000, task ffff88022e96dcc0)
Stack:
 ffff88022c08aa30 00000000643c49d9 ffff88022e0a6f00 ffffffffa0569c43
<0> 00000000ffffffff 00000000643c49d9 ffff88022e681800 ffffffffa057e470
<0> ffff88022e685000 00000000643c49d9 ffff88022c08aa30 ffff88022e681800
Call Trace:
 [<ffffffffa0569c43>] ? radeon_commit_ring+0x63/0xe0 [radeon]
 [<ffffffffa057e470>] ? r600_do_cp_idle+0xd0/0x140 [radeon]
 [<ffffffffa056d6a6>] ? radeon_do_release+0x76/0x240 [radeon]
 [<ffffffffa053d4b1>] ? drm_lastclose+0x51/0x330 [drm]
 [<ffffffff81120fc5>] ? __fput+0xe5/0x240
 [<ffffffff8111caa7>] ? filp_close+0x67/0xb0
 [<ffffffff8105bd75>] ? put_files_struct+0x85/0x120
 [<ffffffff8105da9c>] ? do_exit+0x16c/0x7d0
 [<ffffffff8104d4b0>] ? finish_task_switch+0x180/0x190
 [<ffffffff8105e156>] ? do_group_exit+0x56/0xd0
 [<ffffffff8106d641>] ? get_signal_to_deliver+0x2a1/0x470
 [<ffffffff8100b793>] ? do_notify_resume+0x123/0x830
 [<ffffffff811316f9>] ? vfs_ioctl+0xa9/0xd0
 [<ffffffff81131880>] ? do_vfs_ioctl+0xa0/0x5a0
 [<ffffffff8101843e>] ? restore_i387_xstate+0x18e/0x1f0
 [<ffffffff8100c47b>] ? sysret_signal+0x7e/0xcf
Code: 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 f6 87 d6 03 00 00 08 75 33 48 8b 87 10 01 00 00 c1 ee 02 89 f6 48 c1 e6 02 48 03 70 18 <8b> 06 48 8b 54 24 08 65 48 33 14 25 28 00 00 00 75 1e 48 83 c4
RIP [<ffffffffa0569131>] radeon_read_ring_rptr+0x31/0x70 [radeon]
 RSP <ffff88022b597b98>
CR2: 0000000000000000
---[ end trace 1cad1c27957ccafb ]---
Fixing recursive fault but reboot is needed!
No binary modules, no taint. Haven't searched around yet to see if anyone else is seeing this.
-Dan
Found my oops at kerneloops, but no idea where to go with it: http://www.kerneloops.org/guilty.php?guilty=radeon_read_ring_rptr&version=2.6.31-release&start=2064384&end=2097151&class=oops