[arch-general] kernel oopses when using modules
I posted[1] to the forums about this when I thought it was an nvidia problem, but now it seems to be more general. I recently upgraded to kernel26-2.6.23.9-1 from 2.6.23.8-1, and now I get oopses when certain modules are accessed. For example:

BUG: unable to handle kernel NULL pointer dereference at virtual address 0000000b
 printing eip:
c016c4da
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: ext2 w83627ehf hwmon_vid ipv6 ohci1394 ieee1394 firewire_ohci firewire_core crc_itu_t tsdev usbhid hid ff_memless usb_storage ide_core intel_agp agpgart ppp_generic sky2 sg evdev thermal processor fan button battery ac kqemu i2c_i801 i2c_dev i2c_core coretemp snd_hda_intel snd_pcm snd_timer snd_page_alloc snd_hwdep snd soundcore slhc skge rtc ext3 jbd mbcache sd_mod sr_mod cdrom ehci_hcd uhci_hcd usbcore ahci ata_generic pata_jmicron libata
CPU: 1
EIP: 0060:[<c016c4da>] Not tainted VLI
EFLAGS: 00210206 (2.6.23-ARCH #1)
EIP is at find_vma+0xa/0x70
eax: 00000003 ebx: af09d000 ecx: af09d000 edx: af09d000
esi: 00000003 edi: af09d000 ebp: f5d94000 esp: f490de1c
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process qemu (pid: 7434, ti=f490c000 task=f5d94000 task.ti=f490c000)
Stack: 00000114 af09d000 c016cc1d 00000114 f9590000 af09d000 c016b084 f5d94000
       00000003 00000003 00000000 00000022 f9590000 f9590000 f9590000 00000114
       f9590000 f9590000 00000002 f935345f 00000001 00000001 00000000 f490de80
Call Trace:
 [<c016cc1d>] find_extend_vma+0x1d/0x70
 [<c016b084>] get_user_pages+0x44/0x2d0
 [<f935345f>] kqemu_lock_user_page+0x3f/0x80 [kqemu]
 [<f93549d7>] mon_user_map+0xe7/0x110 [kqemu]
 [<f93552cb>] kqemu_init+0x7eb/0xe20 [kqemu]
 [<c016ade1>] handle_mm_fault+0x501/0x760
 [<c016e051>] mmap_region+0x311/0x440
 [<f93531b9>] kqemu_ioctl+0x109/0x120 [kqemu]
 [<c018ae58>] do_ioctl+0x78/0x90
 [<c018b09e>] vfs_ioctl+0x22e/0x2b0
 [<c018b17d>] sys_ioctl+0x5d/0x70
 [<c0104482>] sysenter_past_esp+0x6b/0xa1
 [<c0360000>] wait_for_completion+0x30/0xa0
 =======================
Code: 00 89 d1 8b 50 20 39 ca 73 05 89 48 20 89 ca 8b 48 14 39 d1 73 03 89 48 20 f3 c3 8d b6 00 00 00 00 56 85 c0 53 89 c6 89 d3 74 51 <8b> 50 08 85 d2 74 05 39 5a 08 77 35 8b 4e 04 85 c9 74 3e 31 d2
EIP: [<c016c4da>] find_vma+0xa/0x70 SS:ESP 0068:f490de1c

I don't want to waste the bandwidth, but I have another one very much like it for nvidia trying to run opengl stuff. The dmesg above is the kqemu module and qemu. I'd suspect hardware, but I don't have any other reason to, and everything was working fine before the upgrade. Also, it's only these two modules (so far) and the BUG always happens at the same place, which seems a little too deterministic to be heat issues. I'm not familiar enough with the kernel changelog to be able to narrow things down that way. Any help?

-- Ryan W Sims
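Editorial aside: the observation that the BUG "always happens at the same place" and only under certain modules can be checked mechanically when comparing several oopses. The following is a minimal sketch, not a kernel tool: the function names and the regular expression are assumptions made for illustration, and they only cover the 2.6-era i386 oops layout shown in this post.

```python
import re

# Matches call-trace entries such as
# "[<f935345f>] kqemu_lock_user_page+0x3f/0x80 [kqemu]".
# The trailing "[module]" part is optional (core kernel frames lack it).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def faulting_symbol(oops_text):
    """Return the symbol named on the "EIP is at" line, or None."""
    match = re.search(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+", oops_text)
    return match.group(1) if match else None


def modules_in_trace(oops_text):
    """Return modules that own call-trace frames, in order, without repeats."""
    trace = oops_text.split("Call Trace:", 1)[-1]
    seen = []
    for frame in FRAME_RE.finditer(trace):
        module = frame.group("module")
        if module and module not in seen:
            seen.append(module)
    return seen


# A shortened excerpt of the oops quoted in this post.
sample = (
    "EIP is at find_vma+0xa/0x70\n"
    "Call Trace:\n"
    " [<c016cc1d>] find_extend_vma+0x1d/0x70\n"
    " [<f935345f>] kqemu_lock_user_page+0x3f/0x80 [kqemu]\n"
    " [<c018b17d>] sys_ioctl+0x5d/0x70\n"
)
print(faulting_symbol(sample))
print(modules_in_trace(sample))
```

Running the same two functions over each saved oops shows at a glance whether the faulting symbol is identical and which out-of-tree modules were on the call path.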
2007/12/11, Ryan Sims <rwsims@gmail.com>:
I don't want to waste the bandwidth, but I have another one very much like it for nvidia trying to run opengl stuff. The dmesg above is the kqemu module and qemu. I'd suspect hardware, but I don't have any other reason to, and everything was working fine before the upgrade. Also, it's only these two modules (so far) and the BUG always happens at the same place, which seems a little too deterministic to be heat issues. I'm not familiar enough with the kernel changelog to be able to narrow things down that way. Any help?
This was exactly the problem with virtualbox-modules (vboxdrv): all modules need to be rebuilt against the .9 kernel. Make sure you have the latest drivers. It seems the problem is in kqemu (wasn't it rebuilt in our repos?).

-- Roman Kyrylych (Роман Кирилич)
On Dec 11, 2007 5:07 AM, Roman Kyrylych <roman.kyrylych@gmail.com> wrote:
This was exactly the problem with virtualbox-modules (vboxdrv): all modules need to be rebuilt against the .9 kernel. Make sure you have the latest drivers. It seems the problem is in kqemu (wasn't it rebuilt in our repos?).
-- Roman Kyrylych (Роман Кирилич)
Thanks for the response, but I found the problem. It's much simpler than that: PEBKAC. While tearing my hair out last night and getting ready to hack away at PKGBUILDs until 3, I found this in my pacman log:

WARNING: /boot appears to be a seperate partition but is not mounted
This is most likely not what you want. Please mount your /boot partition and reinstall the kernel unless you are sure this is OK

And that's when I remembered marking /boot as "noauto" in fstab, and of *course* I didn't remember to remount before upgrading my kernel. Well, a couple of rescue CD boots and some mkinitcpio hackery later, all is well again. Mea culpa, sorry for the static.

-- Ryan W Sims
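Editorial aside: the situation described above (a /boot entry in fstab that is not currently mounted) can be detected before a kernel upgrade. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the sample device names are made up, and the check simply compares fstab-style text against /proc/mounts-style text. It is not what pacman or the kernel package's install script actually runs.

```python
def fstab_entry(fstab_text, mount_point):
    """Return the fields of the fstab line for mount_point, or None."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields
    return None


def is_mounted(mounts_text, mount_point):
    """True if mount_point is a mount target in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


def boot_needs_mounting(fstab_text, mounts_text, mount_point="/boot"):
    """True if fstab lists mount_point separately but it is not mounted."""
    return (fstab_entry(fstab_text, mount_point) is not None
            and not is_mounted(mounts_text, mount_point))


# Hypothetical sample data; on a real system the two strings would be
# read from /etc/fstab and /proc/mounts.
sample_fstab = (
    "# <device> <dir> <type> <options> <dump> <pass>\n"
    "/dev/sda2 / ext3 defaults 0 1\n"
    "/dev/sda1 /boot ext2 noauto 0 0\n"
)
sample_mounts = "/dev/sda2 / ext3 rw 0 0\n"

if boot_needs_mounting(sample_fstab, sample_mounts):
    print("/boot is in fstab but not mounted; mount it before upgrading")
```

Keeping the parsing in pure functions makes the check easy to try on pasted text; wiring it to the real files is a two-line change.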