Bugzilla – Bug 609338
cpqphp: NULL ptr deref in cpqhpc_probe+0x951/0x1120
Last modified: 2010-06-25 08:45:57 UTC
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-GB; rv:1.9.2) Gecko/20100115 Firefox/3.6

I'm not really sure how to describe this - it's really just a bunch of odd udevd-related error messages from /var/log/boot.msg. This is a raw copy&paste from boot.msg:

-----------
Waiting for device /dev/ida/c0d0p3 to appear: ok
fsck from util-linux-ng 2.17.2
[/sbin/fsck.jfs (1) -- /] fsck.jfs -a /dev/ida/c0d0p3
fsck.jfs version 1.1.14, 06-Apr-2009
processing started: 5/24/2010 12.55.18
The current device is: /dev/ida/c0d0p3
Block size in bytes: 4096
Filesystem size in blocks: 6783446
**Phase 0 - Replay Journal Log
Filesystem is clean.
fsck succeeded. Mounting root device read-write.
Mounting root /dev/ida/c0d0p3
mount -o rw,defaults -t jfs /dev/ida/c0d0p3 /root
Boot logging started on /dev/char/../tty1(/dev/console) at Mon May 24 12:55:20 2010
<notice -- May 24 12:55:20.285299000> service boot.startpreload start
done
<notice -- May 24 12:55:20.829120000> service boot.startpreload done
<notice -- May 24 12:55:20.855788000> service boot.udev start
ok
Starting udevd: udevd[397]: can not read '/etc/udev/rules.d/79-yast2-drivers.rules'
done
Loading drivers, configuring devices: udevd-work[405]: '/sbin/modprobe -b pci:v00000E11d0000A0F7sv00000E11sd0000A2F8bc08sc04i00' unexpected exit with status 0x0009
udevadm settle - timeout of 180 seconds reached, the event queue contains:
  /sys/devices/pci0000:00/0000:00:01.0 (943)
  /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 (944)
  /sys/devices/pci0000:00/0000:00:01.0/pci_bus/0000:01 (945)
  /sys/devices/pci0000:06/0000:06:0b.0 (971)
  /sys/devices/pci0000:0d/0000:0d:0b.0 (975)
failed
udevd[398]: worker [400] unexpectedly returned with status 0x0100
udevd[398]: worker [400] failed while handling '/devices/pci0000:00/0000:00:01.0'
<notice -- May 24 12:58:22.244630000> service boot.udev done
<notice -- May 24 12:58:22.245009000> service boot.loadmodules start
<notice -- May 24 12:58:22.246576000> service boot.rootfsck start
Loading required kernel modules
done
<notice -- May 24 12:58:22.289765000> service boot.loadmodules done
udevd[398]: worker [418] unexpectedly returned with status 0x0100
udevd[398]: worker [418] failed while handling '/devices/pci0000:06/0000:06:0b.0'
udevd[398]: worker [407] unexpectedly returned with status 0x0100
udevd[398]: worker [407] failed while handling '/devices/pci0000:0d/0000:0d:0b.0'
Activating swap-devices in /etc/fstab...
done

Reproducible: Always
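The two status words in the log (0x0009 for modprobe, 0x0100 for the udevd workers) can be read as raw wait(2) status words. The sketch below decodes them under the assumption that udevd prints the unmodified status it got from waitpid(); the helper name describe_wait_status is made up for illustration and is not part of udev.

```python
import os


def describe_wait_status(status: int) -> str:
    """Decode a raw wait(2) status word (assumed to be what udevd logs)."""
    if os.WIFSIGNALED(status):
        # Low 7 bits carry the terminating signal number.
        return f"killed by signal {os.WTERMSIG(status)}"
    if os.WIFEXITED(status):
        # Bits 8..15 carry the exit code of a normally exited process.
        return f"exited with code {os.WEXITSTATUS(status)}"
    return "stopped or unknown"


# Status words copied from the boot.msg excerpt above.
for raw in (0x0009, 0x0100):
    print(f"0x{raw:04x}: {describe_wait_status(raw)}")
```

Read this way, the modprobe process did not exit on its own but was killed by a signal, while the udevd workers exited with a non-zero code.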
Looks like a failure loading a kernel module. Maybe a driver for the graphics card? What card are you using? What does: /sbin/modprobe --first-time -n -v pci:v00000E11d0000A0F7sv00000E11sd0000A2F8bc08sc04i00 print?
This system has two cards, both 00:0d.0 ATI 3D Rage IIC 215IIC [Mach64 GT IIC]. One is separate, the other is part of HP Integrated Lights-Out card. I tried the modprobe, but it just hangs - I checked and found this:
# ps axfl | grep modprobe
4 0   425    1 18 -2 2044 544 - D< ?     0:00 /sbin/modprobe -b pci:v00008086d00000960sv00000000sd00000000bc06sc04i00
0 0   474    1 18 -2 1996 544 - S< ?     3:47 /sbin/modprobe -b pci:v00000E11d0000A0F7sv00000E11sd0000A2F8bc08sc04i00
0 0   481    1 18 -2 1996 540 - S< ?     3:50 /sbin/modprobe -b pci:v00000E11d0000A0F7sv00000E11sd0000A2F9bc08sc04i00
0 0   829    1 18 -2 1996 544 - S< ?     3:43 /sbin/modprobe -b pci:v00008086d00000960sv00000000sd00000000bc06sc04i00
0 0   830    1 18 -2 1996 540 - S< ?     3:49 /sbin/modprobe -b pci:v00000E11d0000A0F7sv00000E11sd0000A2F8bc08sc04i00
0 0   831    1 18 -2 1996 536 - S< ?     3:48 /sbin/modprobe -b pci:v00000E11d0000A0F7sv00000E11sd0000A2F9bc08sc04i00
0 0 18800 3138 20  0 1996 580 - S+ pts/0 0:00 | \_ /sbin/modprobe --first-time -n -v pci:v00000E11d0000A0F7sv00000E11sd0000A2F8bc08sc04i00
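The alias strings passed to modprobe encode the PCI IDs of the device udev is trying to find a driver for. A minimal sketch of splitting such a string into its fields follows; the function name parse_pci_modalias and the returned dictionary keys are illustrative choices, not an existing tool.

```python
import re

# Layout of a PCI modalias: vendor, device, subsystem vendor, subsystem
# device (8 hex digits each), then base class, subclass and programming
# interface (2 hex digits each).
MODALIAS_RE = re.compile(
    r"pci:v(?P<vendor>[0-9A-F]{8})d(?P<device>[0-9A-F]{8})"
    r"sv(?P<subvendor>[0-9A-F]{8})sd(?P<subdevice>[0-9A-F]{8})"
    r"bc(?P<base_class>[0-9A-F]{2})sc(?P<sub_class>[0-9A-F]{2})"
    r"i(?P<prog_if>[0-9A-F]{2})$"
)


def parse_pci_modalias(alias: str) -> dict:
    """Return the numeric ID fields of a 'pci:v...' modalias string."""
    match = MODALIAS_RE.match(alias)
    if match is None:
        raise ValueError(f"not a PCI modalias: {alias!r}")
    return {name: int(value, 16) for name, value in match.groupdict().items()}


# Alias copied from the ps output above.
fields = parse_pci_modalias("pci:v00000E11d0000A0F7sv00000E11sd0000A2F8bc08sc04i00")
for name, value in fields.items():
    print(f"{name:10s} 0x{value:04x}")
```

For the hanging alias this gives vendor 0x0e11 (Compaq), subsystem device 0xa2f8, and class 0x08 / subclass 0x04, which is the class code range PCI hot-plug controllers use.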
Created attachment 365138 [details] output from psaxfl | grep modprobe
This looks very broken; modprobe -v should never hang, and I have no idea how that is even possible. What does: strace -p 18800 print? (If you rebooted, replace 18800 with the actual pid of the "modprobe -n" that hangs.) The issue sounds like a crashed kernel, kernel module, or driver. Care to check what "dmesg" prints, and attach it if there is something suspicious?
Created attachment 365172 [details]
output of psaxfl right after boot up
Shows several modprobes apparently hanging.
Created attachment 365173 [details]
strace of modprobe
Seems to indicate some interaction with a module 'cpqphp' - I think this is for Compaq PCI hotplug, and I remember having seen an oops from that module at an earlier point. I'll attach some output later.
Created attachment 365174 [details]
boot.msg which shows problems regarding cpqphp
Just browse the text and look for cpqphp.
You might be able to boot up cleanly if you blacklist the module in a file in /etc/modprobe.d/.

cpqphp crashes:
[ 0.000000] Linux version 2.6.34-8-desktop (geeko@buildhost) ...
[ 45.288057] cpqphp: Hot Plug Subsystem Device ID: a2f8
[ 45.288370] BUG: unable to handle kernel NULL pointer dereference at 00000050
[ 45.288764] IP: [<f82e3c41>] cpqhpc_probe+0x951/0x1120 [cpqphp]
[ 45.289115] *pdpt = 0000000033779001 *pde = 0000000000000000

This leaves the kernel in an unreliable state and makes later, unrelated modprobe calls hang.

Passing bug to the kernel.
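The IP line of the oops packs several facts into one string: the faulting address, the function, the offset into it, the function size, and the module. A short sketch of pulling these apart is shown below; parse_oops_ip is a hypothetical helper written for this note, and the regular expression only targets the line format seen in this report.

```python
import re

# Matches e.g. "IP: [<f82e3c41>] cpqhpc_probe+0x951/0x1120 [cpqphp]"
OOPS_IP_RE = re.compile(
    r"IP: \[<(?P<addr>[0-9a-f]+)>\] "
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_oops_ip(line: str) -> dict:
    """Split an oops 'IP:' line into address, symbol, offset, size, module."""
    match = OOPS_IP_RE.search(line)
    if match is None:
        raise ValueError("no IP record found in line")
    addr = int(match["addr"], 16)
    offset = int(match["offset"], 16)
    return {
        "addr": addr,
        "symbol": match["symbol"],
        "offset": offset,
        "size": int(match["size"], 16),
        "module": match["module"],
        # Address at which the function itself was loaded.
        "function_start": addr - offset,
    }


info = parse_oops_ip("[ 45.288764] IP: [<f82e3c41>] cpqhpc_probe+0x951/0x1120 [cpqphp]")
print(f"{info['module']}:{info['symbol']} +0x{info['offset']:x} of 0x{info['size']:x}")
```

The symbol plus offset is what a debugger needs to map the crash back to a source line, provided the debuginfo matches the exact kernel build that crashed.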
Have blacklisted cpqphp - this solves the problem for me. This is just a test-system, I don't actually need the PCI hotplug facility, but I can help with debugging of course.
0x11d1 is in cpqhpc_probe (drivers/pci/hotplug/cpqphp_core.c:695).
690         hotplug_slot->release = &release_slot;
691         hotplug_slot->private = slot;
692         snprintf(name, SLOT_NAME_SIZE, "%u", slot->number);
693         hotplug_slot->ops = &cpqphp_hotplug_slot_ops;
694
695         hotplug_slot_info->power_status = get_slot_enabled(ctrl, slot);
696         hotplug_slot_info->attention_status =
697                 cpq_get_attention_status(ctrl, slot);
698         hotplug_slot_info->latch_status =
699                 cpq_get_latch_status(ctrl, slot);
... or not. Strange that I'm running Factory and it didn't produce the same code. Using the actual debuginfo points to drivers/pci/hotplug/cpqphp_core.c:946
942         case PCI_SUB_HPC_ID2:
943                 /* First Pushbutton implementation */
944                 ctrl->push_flag = 1;
945                 ctrl->slot_switch_type = 1;
946                 bus->max_bus_speed = PCI_SPEED_33MHz;
947                 ctrl->push_button = 1;
948                 ctrl->pci_config_space = 1;
949                 ctrl->defeature_PHP = 1;
... which makes a lot more sense, as &((struct pci_bus *)0)->max_bus_speed = 0x50 and PCI_SPEED_33MHz is 0x0. In other words, the store goes through a NULL bus pointer and faults at address 0x50, which matches the address in the oops.
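The reasoning here is that a write to a struct member through a NULL pointer faults at an address equal to the member's offset. The sketch below illustrates that with ctypes. FakePciBus is a hypothetical stand-in: the real struct pci_bus layout is not reproduced, and the 0x50 bytes of padding are simply taken from the offset quoted in the comment above.

```python
import ctypes


class FakePciBus(ctypes.Structure):
    # Hypothetical layout: 0x50 opaque bytes standing in for whatever
    # members precede max_bus_speed in the real struct pci_bus.
    _fields_ = [
        ("preceding_members", ctypes.c_ubyte * 0x50),
        ("max_bus_speed", ctypes.c_ubyte),
    ]


def fault_address(base_pointer: int, field) -> int:
    """Address touched when accessing `field` through `base_pointer`."""
    return base_pointer + field.offset


# With bus == NULL the access lands exactly at the member offset.
addr = fault_address(0, FakePciBus.max_bus_speed)
print(f"{addr:08x}")
```

This is why a "NULL pointer dereference at 00000050" points at a member 0x50 bytes into some structure rather than at the first member.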
Could you attach lspci -vv and lspci -tv outputs?
Created attachment 366597 [details] output from lspci -vv
Created attachment 366598 [details] output from lspci -tv
Aha, the device is not a bridge, but the driver expects it to be a bridge. Could you attach lspci -nnvvxxxs 0000:00:0b.0? I'll escalate this to upstream.
Created attachment 366652 [details] lspci -nnvvxxxs 0000:00:0b.0
Could you try this kernel: http://labs.suse.cz/jslaby/bug-609338/kernel-default-2.6.34-10.i586.rpm ?
I installed it with rpm --upgrade, updated my bootloader and rebooted - on boot up, this gave me a console with lots of flashing colours, but not a running system.
(In reply to comment #19)
> I installed it with rpm --upgrade, updated my bootloader and rebooted - on boot
> up, this gave me a console with lots of flashing colours, but not a running
> system.
Hmm, weird. Could you try booting with vga=0 parameter?
Created attachment 369181 [details]
serial console capture
I booted with vga=0 and got a kernel panic.
(In reply to comment #21)
> I booted with vga=0 and got a kernel panic.
But this is the old kernel (it oopses in cpqphp). Could you try it with the one from labs.suse?
(In reply to comment #22)
> But this is the old kernel (it oopses in cpqphp). Could you try it with the one
> from labs.suse?
(Note that you can have both installed and decide at boot which one to run. Just use rpm -i instead of rpm -U.)
Okay, something went really wrong here - that kernel is even older than the previous one; I was running on -9 before. I'll get back to you.
Created attachment 369198 [details]
serial console capture
My mistake - the minicom capture file is appended to, so I sent you everything last time. This time it should be really just the boot-up with the -10 kernel.
Hm, I don't understand that. Could you try the -desktop kernel from kotd, in case desktop vs. default matters? ftp://ftp.suse.com/pub/projects/kernel/kotd/master/i586/kernel-desktop.rpm
Created attachment 369283 [details]
serial console capture
Okay, have installed that kernel - it boots up to a point, but then appears to halt.
Cool:
request_module: runaway loop modprobe binfmt-0000
It tries to load some non-ELF binary needed for modprobe itself or the like. Maybe a corrupted initrd? Could you make the initrd public somewhere? If that fails, I will build a kernel which prints the name of the module it is trying to load.
(In reply to comment #28)
> request_module: runaway loop modprobe binfmt-0000
Is this a 32bit kernel trying to run a 64bit binary?
I rebuilt the initrd before booting, but I'll put it here: http://public.jessen.ch/files/kavanagh-initrd-10
(In reply to comment #29)
> (In reply to comment #28)
> > request_module: runaway loop modprobe binfmt-0000
>
> Is this a 32bit kernel trying to run a 64bit binary?
I wouldn't say so. In that case it wouldn't be binfmt-0000, but a combination of 0x4c='L' and 0x46='F' for ELF. Anyway, Per, could you test with this kernel: http://labs.suse.cz/jslaby/bug-609338/kernel-desktop-2.6.34-10.i586.rpm ?
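The point about 'L' and 'F' can be made concrete. The sketch below assumes the kernel builds the module name from bytes 2 and 3 of the file header, read as a little-endian 16-bit value and printed with four hex digits; that matches the 0x4c/0x46 remark above, but the exact kernel code is not quoted in this report, so treat it as an assumption. binfmt_module_name is a made-up helper name.

```python
import struct


def binfmt_module_name(header: bytes) -> str:
    """Module name the kernel would request for an unrecognised binary.

    Assumption: the name is "binfmt-" plus bytes 2..3 of the header read
    as an unsigned 16-bit little-endian value (x86 byte order).
    """
    (magic,) = struct.unpack_from("<H", header, 2)
    return f"binfmt-{magic:04x}"


# A real ELF file starts with 0x7f 'E' 'L' 'F'.
elf_header = b"\x7fELF" + bytes(12)
# A file whose first bytes read back as zeros.
zeroed_header = bytes(16)

print(binfmt_module_name(elf_header))
print(binfmt_module_name(zeroed_header))
```

So an unsupported ELF flavour would show up with the L/F bytes in the name, while binfmt-0000 suggests the start of the file was read as zeros.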
Created attachment 369487 [details]
serial console capture 5
Booting that kernel appears to make the system hang, but it still responded to a Ctrl-Alt-Delete and could do a normal shutdown. This is the console output.
(In reply to comment #32)
> Created an attachment (id=369487) [details]
> serial console capture 5
Weird, all the binaries (/sbin/modprobe I guess) -- their first 256 bytes more accurately -- are zeros. The initrd:
1) isn't properly loaded by bootloader -- what loader you use and with what configuration
2) is wrongly copied from highmem -- could you check the new kernel at labs.suse?
3) is overwritten by somebody in the lowmem after it is copied. It gets copied to 16M, I'm building another test kernel with changed ramdisk recommended position in the header
(In reply to comment #33)
> 1) isn't properly loaded by bootloader -- what loader you use and with what
> configuration
> 2) is wrongly copied from highmem -- could you check the new kernel at
> labs.suse?
> 3) is overwritten by somebody in the lowmem after it is copied. It gets copied
> to 16M, I'm building another test kernel with changed ramdisk recommended
> position in the header
Actually no, none of those makes sense to me. 1) and 2) can't be true, otherwise it wouldn't be successfully decompressed. 3) can't be it, since after the RD is successfully decompressed, it doesn't matter what's there anymore.

Now I see only 2 reasons why it may fail:
1) decompress method failed to unpack the files properly and stored zeros without noticing anything is wrong
2) read of the files from the initramfs returns zeros, no idea why
(In reply to comment #34)
> 2) read of the files from the initramfs returns zeros, no idea why
Jeff is this it?
commit b636984aae0ee7599bdd82fef68b4c097bb3d3b7
Author: Jeff Mahoney <jeffm@suse.de>
Date: Sun Jun 20 19:28:28 2010 -0400

    - patches.suse/add-initramfs-file_read_write: Fix missing kmap calls
      while loading initramfs files.
In the meantime, could you Per try the kotd again?
It could be. I haven't seen it manifest in corruption before, only crashes.
(In reply to comment #37)
> It could be.
Ok, let's see, thanks.
(In reply to comment #36)
> In the meantime, could you Per try the kotd again?
I tried 2.6.34-10-default dated 2010-06-21 21:12:08 - that just gave me a kernel panic.
(In reply to comment #39)
> (In reply to comment #36)
> > In the meantime, could you Per try the kotd again?
>
> I tried 2.6.34-10-default dated 2010-06-21 21:12:08 - that just gave me a
> kernel panic.
Could you tell me SHA and branch of the commit from which this kernel was built?
rpm -qi kernel-default|grep GIT
should do the trick.
(In reply to comment #39)
> that just gave me a kernel panic.
And also if the panic differs from the previous, it would be helpful to have it here.
Created attachment 371504 [details] serial console capture 6
There was a typo in the kmap fix that appears to be causing a different problem now. That's also been fixed in the repo.
> > I tried 2.6.34-10-default dated 2010-06-21 21:12:08 - that just gave me a > > kernel panic. > > Could you tell me SHA and branch of the commit from which this kernel was > built? > rpm -qi kernel-default|grep GIT > should do the trick. I got it wrong - what I installed was: ftp://ftp.suse.com/pub/projects/kernel/kotd/master/i586/kernel-desktop.rpm rpm -qi kernel-desktop | grep GIT GIT Revision: 5d1bde19645c8ade295b7e2c67b7af0e19e452c8 GIT Branch: master GIT Revision: c7c26a87c90d2f34d54dcfceebb2df19f326e9a1 GIT Branch: master GIT Revision: c7c26a87c90d2f34d54dcfceebb2df19f326e9a1 GIT Branch: master Looks like it is 2.6.35-0
Created attachment 371518 [details]
dmesg output after a boot-up with 2.6.34-9
I noticed a few more backtraces when I booted up with 2.6.34-9 again. Also messages such as:
BUG: soft lockup - CPU#5 stuck for 122s!
I don't know if these are related to this issue, but ...
(In reply to comment #44)
> rpm -qi kernel-desktop | grep GIT
> GIT Revision: 5d1bde19645c8ade295b7e2c67b7af0e19e452c8
> GIT Branch: master
> GIT Revision: c7c26a87c90d2f34d54dcfceebb2df19f326e9a1
> GIT Branch: master
> GIT Revision: c7c26a87c90d2f34d54dcfceebb2df19f326e9a1
> GIT Branch: master
I'm quite confused now. How can you have 2 kernels with the same rev id? Anyway, those are too old and don't contain all the fixes. You should test a kernel which has the following line in its changelog (rpm -q --changelog):
patches.suse/add-initramfs-file_read_write: Fixed typo.
from jeffm@suse.de.
What kernels do you have installed right now? Please append the output of:
rpm -q `rpmqpack|grep kernel`
rpmqpack|grep kernel
kernel-xen-devel
kernel-desktop
kernel-devel
kernel-default-devel
kernel-pae-devel
kernel-source
kernel-default
kernel-default-base
kernel-syms
kernel-desktop-devel
patterns-openSUSE-devel_kernel
I'll try to get hold of a kernel with the typo fixed.
This is the same initramfs DSDT bug that's been reported several times.
*** This bug has been marked as a duplicate of bug 616940 ***
Update: This morning, I installed kotd 2.6.35-rc3, which failed. I then proceeded to install 2.6.34-11-default which worked, including the cpqphp module.
(In reply to comment #49)
> Update: This morning, I installed kotd 2.6.35-rc3, which failed. I then
> proceeded to install 2.6.34-11-default which worked, including the cpqphp
> module.
Thanks, good to know. I believe that the master kernel (2.6.35) will be fixed as soon as it is built with the fix.