Bug 609338 - cpqphp: NULL ptr deref in cpqhpc_probe+0x951/0x1120
Status: RESOLVED DUPLICATE of bug 616940
Classification: openSUSE
Product: openSUSE 11.3
Component: Basesystem
Version: Milestone 7
Hardware: Other, OS: Other
Priority: P3 - Medium
Severity: Normal
Target Milestone: ---
Assigned To: Jiri Slaby
QA Contact: E-mail List
Reported: 2010-05-27 07:19 UTC by Per Jessen
Modified: 2010-06-25 08:45 UTC
CC List: 1 user



Attachments
output from ps axfl | grep modprobe (1.12 KB, text/plain)
2010-05-27 11:34 UTC, Per Jessen
output of ps axfl right after boot up (14.52 KB, text/plain)
2010-05-27 13:24 UTC, Per Jessen
strace of modprobe (1.20 MB, text/plain)
2010-05-27 13:28 UTC, Per Jessen
boot.msg which shows problems regarding cpqphp (34.50 KB, application/zip)
2010-05-27 13:32 UTC, Per Jessen
output from lspci -vv (10.97 KB, text/plain)
2010-06-03 06:26 UTC, Per Jessen
output from lspci -tv (1.29 KB, text/plain)
2010-06-03 06:27 UTC, Per Jessen
lspci -nnvvxxxs 0000:00:0b.0 (1.35 KB, text/plain)
2010-06-03 09:26 UTC, Per Jessen
serial console capture (304.68 KB, text/plain)
2010-06-15 08:40 UTC, Per Jessen
serial console capture (14.87 KB, text/plain)
2010-06-15 09:52 UTC, Per Jessen
serial console capture (21.88 KB, text/plain)
2010-06-15 15:05 UTC, Per Jessen
serial console capture 5 (819.24 KB, text/plain)
2010-06-16 10:10 UTC, Per Jessen
serial console capture 6 (14.73 KB, text/plain)
2010-06-24 13:47 UTC, Per Jessen
dmesg output after a boot-up with 2.6.34-9 (245.55 KB, text/plain)
2010-06-24 14:18 UTC, Per Jessen

Description Per Jessen 2010-05-27 07:19:58 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-GB; rv:1.9.2) Gecko/20100115 Firefox/3.6

I'm not really sure how to describe this; it's really just a bunch of odd udevd-related error messages from /var/log/boot.msg. This is a raw copy & paste from boot.msg:

-----------
Waiting for device /dev/ida/c0d0p3 to appear:  ok
fsck from util-linux-ng 2.17.2
[/sbin/fsck.jfs (1) -- /] fsck.jfs -a /dev/ida/c0d0p3 
fsck.jfs version 1.1.14, 06-Apr-2009
processing started: 5/24/2010 12.55.18
The current device is:  /dev/ida/c0d0p3
Block size in bytes:  4096
Filesystem size in blocks:  6783446
**Phase 0 - Replay Journal Log
Filesystem is clean.
fsck succeeded. Mounting root device read-write.
Mounting root /dev/ida/c0d0p3
mount -o rw,defaults -t jfs /dev/ida/c0d0p3 /root

Boot logging started on /dev/char/../tty1(/dev/console) at Mon May 24 12:55:20 2010

<notice -- May 24 12:55:20.285299000> service boot.startpreload startdone
<notice -- May 24 12:55:20.829120000> service boot.startpreload done<notice -- May 24 12:55:20.855788000> service boot.udev startok
Starting udevd: udevd[397]: can not read '/etc/udev/rules.d/79-yast2-drivers.rules'


done
Loading drivers, configuring devices: udevd-work[405]: '/sbin/modprobe -b pci:v00000E11d0000A0F7sv00000E11sd0000A2F8bc08sc04i00' unexpected exit with status 0x0009



udevadm settle - timeout of 180 seconds reached, the event queue contains:
  /sys/devices/pci0000:00/0000:00:01.0 (943)
  /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 (944)
  /sys/devices/pci0000:00/0000:00:01.0/pci_bus/0000:01 (945)
  /sys/devices/pci0000:06/0000:06:0b.0 (971)
  /sys/devices/pci0000:0d/0000:0d:0b.0 (975)
failed
udevd[398]: worker [400] unexpectedly returned with status 0x0100


udevd[398]: worker [400] failed while handling '/devices/pci0000:00/0000:00:01.0'


<notice -- May 24 12:58:22.244630000> service boot.udev done<notice -- May 24 12:58:22.245009000> service boot.loadmodules start<notice -- May 24 12:58:22.246576000> service boot.rootfsck startLoading required kernel modules
done
<notice -- May 24 12:58:22.289765000> service boot.loadmodules doneudevd[398]: worker [418] unexpectedly returned with status 0x0100
udevd[398]: worker [418] failed while handling '/devices/pci0000:06/0000:06:0b.0'


udevd[398]: worker [407] unexpectedly returned with status 0x0100


udevd[398]: worker [407] failed while handling '/devices/pci0000:0d/0000:0d:0b.0'


Activating swap-devices in /etc/fstab...
done


Reproducible: Always
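udevd appears to print the raw wait() status of a failed child in hex. Decoded with the standard <sys/wait.h> macros, 0x0009 means the child was terminated by signal 9 (SIGKILL on Linux), while 0x0100 means a normal exit with code 1. A minimal C sketch of that decoding; decode_wait_status and struct wait_result are hypothetical names, and feeding a hex literal to the macros assumes the Linux/glibc status layout, which POSIX itself does not fix:

```c
#include <stdbool.h>
#include <sys/wait.h>

/* Hypothetical result type for a decoded wait() status. */
struct wait_result {
	bool exited;   /* child called exit() */
	int exit_code; /* valid only when exited is true */
	bool signaled; /* child was terminated by a signal */
	int signo;     /* valid only when signaled is true */
};

/* Decode a raw status as printed by udevd, e.g. 0x0009 or 0x0100.
 * Assumes the Linux/glibc encoding behind the W* macros. */
struct wait_result decode_wait_status(int status)
{
	struct wait_result r = { false, 0, false, 0 };

	if (WIFEXITED(status)) {
		r.exited = true;
		r.exit_code = WEXITSTATUS(status);
	} else if (WIFSIGNALED(status)) {
		r.signaled = true;
		r.signo = WTERMSIG(status);
	}
	return r;
}
```

For example, decode_wait_status(0x0009) reports signaled with signo 9, and decode_wait_status(0x0100) reports exited with exit_code 1.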
Comment 1 Kay Sievers 2010-05-27 10:49:12 UTC
Looks like a failure loading a kernel module.

Maybe a driver for the graphics card? What card are you using?

What does:
  /sbin/modprobe --first-time -n -v pci:v00000E11d0000A0F7sv00000E11sd0000A2F8bc08sc04i00

print?
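The argument passed to modprobe is the device's PCI modalias, and splitting it up shows which device udev was handling: vendor 0x0E11 (Compaq), device 0xA0F7, subsystem 0x0E11:0xA2F8 (the same subsystem device ID cpqphp prints in the oops), and class 0x08, subclass 0x04, which is the PCI class code for a hot-plug controller rather than a display device. A minimal C sketch, assuming the usual layout of 8 hex digits per ID and 2 per class field; parse_pci_modalias and struct pci_modalias are hypothetical helpers:

```c
#include <stdio.h>

/* Fields encoded in a PCI modalias such as
 * pci:v00000E11d0000A0F7sv00000E11sd0000A2F8bc08sc04i00 */
struct pci_modalias {
	unsigned int vendor, device, subvendor, subdevice;
	unsigned int base_class, sub_class, prog_if;
};

/* Returns 0 on success, -1 if the string does not have the expected
 * shape. The explicit widths stop each %X at the end of its field. */
int parse_pci_modalias(const char *s, struct pci_modalias *m)
{
	int n = sscanf(s, "pci:v%8Xd%8Xsv%8Xsd%8Xbc%2Xsc%2Xi%2X",
		       &m->vendor, &m->device, &m->subvendor, &m->subdevice,
		       &m->base_class, &m->sub_class, &m->prog_if);

	return n == 7 ? 0 : -1;
}
```

The field widths matter: b, c and d are also hex digits, so an unbounded %X would swallow the tag letters that follow each ID.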
Comment 2 Per Jessen 2010-05-27 11:31:17 UTC
This system has two graphics cards, both 00:0d.0 ATI 3D Rage IIC 215IIC [Mach64 GT IIC]: one is a separate card, the other is part of the HP Integrated Lights-Out card.
I tried the modprobe, but it just hangs. I checked and found this:

# ps axfl | grep modprobe
4     0   425     1  18  -2   2044   544 -      D<   ?          0:00 /sbin/modprobe -b pci:v00008086d00000960sv00000000sd00000000bc06sc04i00
0     0   474     1  18  -2   1996   544 -      S<   ?          3:47 /sbin/modprobe -b pci:v00000E11d0000A0F7sv00000E11sd0000A2F8bc08sc04i00
0     0   481     1  18  -2   1996   540 -      S<   ?          3:50 /sbin/modprobe -b pci:v00000E11d0000A0F7sv00000E11sd0000A2F9bc08sc04i00
0     0   829     1  18  -2   1996   544 -      S<   ?          3:43 /sbin/modprobe -b pci:v00008086d00000960sv00000000sd00000000bc06sc04i00
0     0   830     1  18  -2   1996   540 -      S<   ?          3:49 /sbin/modprobe -b pci:v00000E11d0000A0F7sv00000E11sd0000A2F8bc08sc04i00
0     0   831     1  18  -2   1996   536 -      S<   ?          3:48 /sbin/modprobe -b pci:v00000E11d0000A0F7sv00000E11sd0000A2F9bc08sc04i00
0     0 18800  3138  20   0   1996   580 -      S+   pts/0      0:00  |           \_ /sbin/modprobe --first-time -n -v pci:v00000E11d0000A0F7sv00000E11sd0000A2F8bc08sc04i00
Comment 3 Per Jessen 2010-05-27 11:34:32 UTC
Created attachment 365138 [details]
output from ps axfl | grep modprobe
Comment 4 Kay Sievers 2010-05-27 12:11:27 UTC
This looks very broken: modprobe -v should never hang, and I have no idea how that is even possible. What does:
  strace -p 18800
print? (If you rebooted, replace 18800 with the actual PID of the "modprobe -n" that hangs.)

The issue sounds like a crashed kernel, kernel module or driver. Could you check what "dmesg" prints, and attach it if there is something suspicious?
Comment 5 Per Jessen 2010-05-27 13:24:16 UTC
Created attachment 365172 [details]
output of ps axfl right after boot up

Shows several modprobes apparently hanging.
Comment 6 Per Jessen 2010-05-27 13:28:05 UTC
Created attachment 365173 [details]
strace of modprobe 

Seems to indicate some interaction with a module 'cpqphp'. I think this is for Compaq PCI hotplug, and I remember having seen an oops from that module at an earlier point. I'll attach some output later.
Comment 7 Per Jessen 2010-05-27 13:32:04 UTC
Created attachment 365174 [details]
boot.msg which shows problems regarding cpqphp

Just browse the text and look for cpqphp.
Comment 8 Kay Sievers 2010-05-27 14:23:19 UTC
You might be able to boot up cleanly if you blacklist the module in a file in /etc/modprobe.d/.

cpqphp crashes:

[    0.000000] Linux version 2.6.34-8-desktop (geeko@buildhost)
...
[   45.288057] cpqphp: Hot Plug Subsystem Device ID: a2f8
[   45.288370] BUG: unable to handle kernel NULL pointer dereference at 00000050
[   45.288764] IP: [<f82e3c41>] cpqhpc_probe+0x951/0x1120 [cpqphp]
[   45.289115] *pdpt = 0000000033779001 *pde = 0000000000000000

and leaves the kernel in an unreliable state, which makes later, random modprobe calls hang.

Passing bug to the kernel.
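The IP line in the oops uses the kernel's symbol+offset/size notation: the faulting instruction lies 0x951 bytes into cpqhpc_probe, a function 0x1120 bytes long, and it is this offset that debuginfo maps back to a source line. A small C sketch that splits such a string; parse_oops_ip and struct oops_ip are hypothetical names, not part of any kernel tool:

```c
#include <stdio.h>

/* One "symbol+offset/size" triple as printed on an oops IP line. */
struct oops_ip {
	char symbol[64];
	unsigned long offset; /* bytes into the function */
	unsigned long size;   /* total length of the function */
};

/* Parses e.g. "cpqhpc_probe+0x951/0x1120"; returns 0 on success, -1 otherwise. */
int parse_oops_ip(const char *s, struct oops_ip *ip)
{
	if (sscanf(s, "%63[^+]+0x%lx/0x%lx",
		   ip->symbol, &ip->offset, &ip->size) != 3)
		return -1;
	return 0;
}
```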
Comment 9 Per Jessen 2010-05-27 16:22:05 UTC
I have blacklisted cpqphp, and this solves the problem for me. This is just a test system and I don't actually need the PCI hotplug facility, but I can of course help with debugging.
Comment 10 Jeff Mahoney 2010-05-28 13:53:38 UTC
0x11d1 is in cpqhpc_probe (drivers/pci/hotplug/cpqphp_core.c:695).
690			hotplug_slot->release = &release_slot;
691			hotplug_slot->private = slot;
692			snprintf(name, SLOT_NAME_SIZE, "%u", slot->number);
693			hotplug_slot->ops = &cpqphp_hotplug_slot_ops;
694	
695			hotplug_slot_info->power_status = get_slot_enabled(ctrl, slot);
696			hotplug_slot_info->attention_status =
697				cpq_get_attention_status(ctrl, slot);
698			hotplug_slot_info->latch_status =
699				cpq_get_latch_status(ctrl, slot);
Comment 11 Jeff Mahoney 2010-05-28 14:18:41 UTC
... or not. Strange that I'm running Factory and it didn't produce the same code.

Using the actual debuginfo points to 
drivers/pci/hotplug/cpqphp_core.c:946
 942                 case PCI_SUB_HPC_ID2:
 943                         /* First Pushbutton implementation */
 944                         ctrl->push_flag = 1;
 945                         ctrl->slot_switch_type = 1;
 946                         bus->max_bus_speed = PCI_SPEED_33MHz;
 947                         ctrl->push_button = 1;
 948                         ctrl->pci_config_space = 1;
 949                         ctrl->defeature_PHP = 1;

... which makes a lot more sense, since &((struct pci_bus *)0)->max_bus_speed is 0x50 and PCI_SPEED_33MHz is 0x0.
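The arithmetic in comment 11 can be reproduced in user space: a store through a NULL struct pointer touches address 0 plus the member's offset, which is why the oops reports a NULL pointer dereference at 00000050. The struct below is a hypothetical stand-in padded so that max_bus_speed lands at 0x50, matching the value computed for struct pci_bus in this build; it does not reproduce the real kernel layout, and offsetof is used so that no actual NULL dereference happens:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_bus: only the 0x50 offset of
 * max_bus_speed is meant to match; the padding is illustrative. */
struct fake_pci_bus {
	unsigned char other_fields[0x50];
	unsigned char max_bus_speed;
};

/* Address that "bus->max_bus_speed = ..." would write to when bus is
 * NULL: base 0 plus the member offset, computed without dereferencing. */
uintptr_t null_member_address(void)
{
	return (uintptr_t)0 + offsetof(struct fake_pci_bus, max_bus_speed);
}
```

With PCI_SPEED_33MHz being 0, the faulting instruction would be a store of 0 to address 0x50.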
Comment 13 Jiri Slaby 2010-06-02 17:48:49 UTC
Could you attach lspci -vv and lspci -tv outputs?
Comment 14 Per Jessen 2010-06-03 06:26:47 UTC
Created attachment 366597 [details]
output from lspci -vv
Comment 15 Per Jessen 2010-06-03 06:27:15 UTC
Created attachment 366598 [details]
output from lspci -tv
Comment 16 Jiri Slaby 2010-06-03 09:18:22 UTC
Aha, the device is not a bridge, but the driver expects it to be a bridge.

Could you attach lspci -nnvvxxxs 0000:00:0b.0?

I'll escalate this to upstream.
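Comment 16's diagnosis points at the kind of check the probe path lacks. In the kernel, struct pci_dev's subordinate member is the bus behind a bridge and is NULL for a device that is not a bridge, so a bus pointer taken from it has to be checked before bus->max_bus_speed is written. A user-space sketch with hypothetical mock types (mock_pci_dev, mock_pci_bus, mock_probe); it assumes the driver takes its bus pointer from the subordinate bus, and it is not the actual cpqphp code or the eventual upstream fix:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified models of the kernel types. */
struct mock_pci_bus {
	int max_bus_speed;
};

struct mock_pci_dev {
	struct mock_pci_bus *subordinate; /* NULL if the device is not a bridge */
};

#define MOCK_PCI_SPEED_33MHz 0 /* value quoted in comment 11 */

/* Probe step that refuses non-bridge devices instead of writing
 * through a NULL bus pointer. */
int mock_probe(struct mock_pci_dev *pdev)
{
	struct mock_pci_bus *bus = pdev->subordinate;

	if (bus == NULL)
		return -ENODEV; /* not a bridge: bail out instead of oopsing */

	bus->max_bus_speed = MOCK_PCI_SPEED_33MHz;
	return 0;
}
```

Returning -ENODEV, so that the driver simply does not bind, is one conventional choice for a probe function that meets a device it cannot handle.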
Comment 17 Per Jessen 2010-06-03 09:26:52 UTC
Created attachment 366652 [details]
lspci -nnvvxxxs 0000:00:0b.0
Comment 18 Jiri Slaby 2010-06-14 14:46:28 UTC
Could you try this kernel:
http://labs.suse.cz/jslaby/bug-609338/kernel-default-2.6.34-10.i586.rpm
?
Comment 19 Per Jessen 2010-06-14 19:21:32 UTC
I installed it with rpm --upgrade, updated my bootloader and rebooted. On bootup it gave me a console with lots of flashing colours, but not a running system.
Comment 20 Jiri Slaby 2010-06-15 08:05:24 UTC
(In reply to comment #19)
> I installed it with rpm --upgrade, updated my bootloader and rebooted - on boot
> up, this gave me a console with lots of flashing colours, but not a running
> system.

Hmm, weird. Could you try booting with vga=0 parameter?
Comment 21 Per Jessen 2010-06-15 08:40:15 UTC
Created attachment 369181 [details]
serial console capture

I booted with vga=0 and got a kernel panic.
Comment 22 Jiri Slaby 2010-06-15 08:55:49 UTC
(In reply to comment #21)
> I booted with vga=0 and got a kernel panic.

But this is the old kernel (it oopses in cpqphp). Could you try it with the one from labs.suse?
Comment 23 Jiri Slaby 2010-06-15 08:56:47 UTC
(In reply to comment #22)
> But this is the old kernel (it oopses in cpqphp). Could you try it with the one
> from labs.suse?

(Note that you can have both installed and decide at boot which one to run. Just don't use rpm -U; use rpm -i instead.)
Comment 24 Per Jessen 2010-06-15 09:11:20 UTC
Okay, something went really wrong here: that kernel is even older than the previous one; I was running -9 before. I'll get back to you.
Comment 25 Per Jessen 2010-06-15 09:52:16 UTC
Created attachment 369198 [details]
serial console capture

My mistake: minicom appends to its capture file, so last time I sent you everything. This time it should really be just the boot-up with the -10 kernel.
Comment 26 Jiri Slaby 2010-06-15 10:29:36 UTC
Hm, I don't understand that. Could you try the -desktop kernel from KOTD:
ftp://ftp.suse.com/pub/projects/kernel/kotd/master/i586/kernel-desktop.rpm
to see whether that (desktop vs. default) matters?
Comment 27 Per Jessen 2010-06-15 15:05:02 UTC
Created attachment 369283 [details]
serial console capture

Okay, I have installed that kernel. It boots up to a point, but then appears to halt.
Comment 28 Jiri Slaby 2010-06-15 15:48:26 UTC
Cool:
request_module: runaway loop modprobe binfmt-0000

It tries to load some non-ELF binary, needed by modprobe itself or similar. Maybe a corrupted initrd? Could you make the initrd available publicly somewhere? If that fails, I will build a kernel that prints the name of the module it tries to load.
Comment 29 Kay Sievers 2010-06-15 16:09:21 UTC
(In reply to comment #28)
> request_module: runaway loop modprobe binfmt-0000

Is this a 32-bit kernel trying to run a 64-bit binary?
Comment 30 Per Jessen 2010-06-15 16:11:39 UTC
I rebuilt the initrd before booting, but I'll put it here:

http://public.jessen.ch/files/kavanagh-initrd-10
Comment 31 Jiri Slaby 2010-06-15 19:18:32 UTC
(In reply to comment #29)
> (In reply to comment #28)
> > request_module: runaway loop modprobe binfmt-0000
> 
> Is this a 32bit kernel trying to run a 64bit binary?

I wouldn't say so: in that case it wouldn't be binfmt-0000, but a combination of 0x4c='L' and 0x46='F' from the ELF header. Anyway, Per, could you test with this kernel:
http://labs.suse.cz/jslaby/bug-609338/kernel-desktop-2.6.34-10.i586.rpm
?
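Jiri's argument follows from how the 2.6-era exec path (search_binary_handler() in fs/exec.c) treats a file no registered binary format accepts: unless the first four header bytes are printable, it calls request_module("binfmt-%04x") with header bytes 2 and 3 read as a 16-bit value. An all-zero header therefore yields binfmt-0000, as in the log, while an unrecognised ELF file such as a 64-bit binary on a 32-bit kernel would yield binfmt-464c. A user-space sketch of that computation, assuming a little-endian machine as on this i586 system; the function names are hypothetical:

```c
#include <stddef.h>
#include <stdio.h>

/* Same test the kernel applies to each of the first four header bytes:
 * tab, newline, or printable ASCII 0x20..0x7e. */
static int printable(unsigned char c)
{
	return c == '\t' || c == '\n' || (c >= 0x20 && c <= 0x7e);
}

/* Header bytes 2 and 3 as a little-endian 16-bit value, which is what
 * an unsigned short load gives on x86. */
unsigned int binfmt_id(const unsigned char *hdr)
{
	return (unsigned int)hdr[2] | ((unsigned int)hdr[3] << 8);
}

/* 1 if such a header would trigger a binfmt-XXXX module request. */
int wants_binfmt_module(const unsigned char *hdr)
{
	return !(printable(hdr[0]) && printable(hdr[1]) &&
		 printable(hdr[2]) && printable(hdr[3]));
}

/* Formats the module name; returns the snprintf() result. */
int binfmt_module_name(const unsigned char *hdr, char *out, size_t len)
{
	return snprintf(out, len, "binfmt-%04x", binfmt_id(hdr));
}
```

For the all-zero header reported here, binfmt_module_name() produces "binfmt-0000"; for the ELF magic 0x7f 'E' 'L' 'F' it produces "binfmt-464c".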
Comment 32 Per Jessen 2010-06-16 10:10:05 UTC
Created attachment 369487 [details]
serial console capture 5

Booting that kernel appears to make the system hang, but it still responded to Ctrl-Alt-Delete and did a normal shutdown. This is the console output.
Comment 33 Jiri Slaby 2010-06-16 11:51:26 UTC
(In reply to comment #32)
> Created an attachment (id=369487) [details]
> serial console capture 5

Weird: all the binaries (/sbin/modprobe, I guess), or more accurately their first 256 bytes, are zeros. The initrd:
1) isn't properly loaded by the bootloader (which loader do you use, and with what configuration?)
2) is wrongly copied from highmem (could you check the new kernel at labs.suse?)
3) is overwritten by something in lowmem after it is copied. It gets copied to 16M; I'm building another test kernel with a different recommended ramdisk position in the header.
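One way to tell a damaged initrd image from a problem on the kernel side is to unpack the initrd in user space and inspect the same files: if their headers are intact there, the zeros are introduced during boot. A minimal C sketch of such a check over the first 256 bytes mentioned above; header_is_zero and file_header_is_zero are hypothetical helpers, and which extracted file to inspect (for example the unpacked copy of /sbin/modprobe) is up to the reader:

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if the first n bytes of buf are all zero, 0 otherwise. */
int header_is_zero(const unsigned char *buf, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}

/* Reads up to n (at most 256) bytes from path and applies
 * header_is_zero(); returns -1 if the file can't be opened or is empty. */
int file_header_is_zero(const char *path, size_t n)
{
	unsigned char buf[256];
	size_t got;
	FILE *f;

	if (n > sizeof(buf))
		n = sizeof(buf);
	f = fopen(path, "rb");
	if (f == NULL)
		return -1;
	got = fread(buf, 1, n, f);
	fclose(f);
	if (got == 0)
		return -1;
	return header_is_zero(buf, got);
}
```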
Comment 34 Jiri Slaby 2010-06-16 12:27:57 UTC
(In reply to comment #33)
> 1) isn't properly loaded by bootloader -- what loader you use and with what
> configuration
> 2) is wrongly copied from highmem -- could you check the new kernel at
> labs.suse?
> 3) is overwritten by somebody in the lowmem after it is copied. It gets copied
> to 16M, I'm building another test kernel with changed ramdisk recommended
> position in the header

Actually no, none of those makes sense to me. 1) and 2) can't be true, otherwise it wouldn't be decompressed successfully. 3) can't be it either, since once the RD has been successfully decompressed, it doesn't matter what's there anymore.

Now I see only two reasons why it may fail:
1) the decompressor failed to unpack the files properly and stored zeros without noticing anything was wrong
2) reading the files from the initramfs returns zeros, no idea why
Comment 35 Jiri Slaby 2010-06-21 12:59:23 UTC
(In reply to comment #34)
> 2) read of the files from the initramfs returns zeros, no idea why

Jeff, is this it?
commit b636984aae0ee7599bdd82fef68b4c097bb3d3b7
Author: Jeff Mahoney <jeffm@suse.de>
Date:   Sun Jun 20 19:28:28 2010 -0400

    - patches.suse/add-initramfs-file_read_write: Fix missing kmap calls
      while loading initramfs files.
Comment 36 Jiri Slaby 2010-06-22 12:58:54 UTC
In the meantime, Per, could you try the KOTD again?
Comment 37 Jeff Mahoney 2010-06-22 23:31:17 UTC
It could be. I haven't seen it manifest in corruption before, only crashes.
Comment 38 Jiri Slaby 2010-06-23 06:59:27 UTC
(In reply to comment #37)
> It could be.

Ok, let's see, thanks.
Comment 39 Per Jessen 2010-06-24 12:30:50 UTC
(In reply to comment #36)
> In the meantime, could you Per try the kotd again?

I tried 2.6.34-10-default dated 2010-06-21 21:12:08 - that just gave me a kernel panic.
Comment 40 Jiri Slaby 2010-06-24 12:59:15 UTC
(In reply to comment #39)
> (In reply to comment #36)
> > In the meantime, could you Per try the kotd again?
> 
> I tried 2.6.34-10-default dated 2010-06-21 21:12:08 - that just gave me a
> kernel panic.

Could you tell me the SHA and branch of the commit this kernel was built from?
rpm -qi kernel-default|grep GIT
should do the trick.
Comment 41 Jiri Slaby 2010-06-24 13:15:09 UTC
(In reply to comment #39)
> that just gave me a kernel panic.

Also, if the panic differs from the previous one, it would be helpful to have it here.
Comment 42 Per Jessen 2010-06-24 13:47:11 UTC
Created attachment 371504 [details]
serial console capture 6
Comment 43 Jeff Mahoney 2010-06-24 14:05:00 UTC
There was a typo in the kmap fix that appears to be causing a different problem now. That's also been fixed in the repo.
Comment 44 Per Jessen 2010-06-24 14:12:52 UTC
> > I tried 2.6.34-10-default dated 2010-06-21 21:12:08 - that just gave me a
> > kernel panic.
> 
> Could you tell me SHA and branch of the commit from which this kernel was
> built?
> rpm -qi kernel-default|grep GIT
> should do the trick.

I got it wrong - what I installed was:

ftp://ftp.suse.com/pub/projects/kernel/kotd/master/i586/kernel-desktop.rpm

rpm -qi kernel-desktop | grep GIT
GIT Revision: 5d1bde19645c8ade295b7e2c67b7af0e19e452c8
GIT Branch: master
GIT Revision: c7c26a87c90d2f34d54dcfceebb2df19f326e9a1
GIT Branch: master
GIT Revision: c7c26a87c90d2f34d54dcfceebb2df19f326e9a1
GIT Branch: master

Looks like it is 2.6.35-0
Comment 45 Per Jessen 2010-06-24 14:18:34 UTC
Created attachment 371518 [details]
dmesg output after a boot-up with 2.6.34-9

I noticed a few more backtraces when I booted up with 2.6.34-9 again, and also messages such as: BUG: soft lockup - CPU#5 stuck for 122s!

I don't know if these are related to this issue, but ...
Comment 46 Jiri Slaby 2010-06-24 14:43:24 UTC
(In reply to comment #44)
> rpm -qi kernel-desktop | grep GIT
> GIT Revision: 5d1bde19645c8ade295b7e2c67b7af0e19e452c8
> GIT Branch: master
> GIT Revision: c7c26a87c90d2f34d54dcfceebb2df19f326e9a1
> GIT Branch: master
> GIT Revision: c7c26a87c90d2f34d54dcfceebb2df19f326e9a1
> GIT Branch: master

I'm quite confused now. How can you have two kernels with the same rev id? Anyway, those are too old and don't contain all the fixes. You should test a kernel that has the following line in its changelog (rpm -q --changelog):
patches.suse/add-initramfs-file_read_write: Fixed typo.
from jeffm@suse.de.

Which kernels do you have installed right now? Please append the output of:
rpm -q `rpmqpack|grep kernel`
Comment 47 Per Jessen 2010-06-24 19:12:21 UTC
rpmqpack|grep kernel
kernel-xen-devel
kernel-desktop
kernel-devel
kernel-default-devel
kernel-pae-devel
kernel-source
kernel-default
kernel-default-base
kernel-syms
kernel-desktop-devel
patterns-openSUSE-devel_kernel

I'll try to get hold of a kernel with the typo fixed.
Comment 48 Jeff Mahoney 2010-06-24 19:20:28 UTC
This is the same initramfs DSDT bug that's been reported several times.

*** This bug has been marked as a duplicate of bug 616940 ***
Comment 49 Per Jessen 2010-06-25 08:28:46 UTC
Update: This morning I installed KOTD 2.6.35-rc3, which failed. I then installed 2.6.34-11-default, which worked, including the cpqphp module.
Comment 50 Jiri Slaby 2010-06-25 08:45:57 UTC
(In reply to comment #49)
> Update:  This morning, I installed kotd 2.6.35-rc3, which failed. I then
> proceeded to install 2.6.34-11-default which worked, including the cpqphp
> module.

Thanks, good to know. I believe the master kernel (2.6.35) will be fixed as soon as it is built with the fix.