cciss module does not identify resources

Bug #684304 reported by C de-Avillez
This bug affects 2 people
Affects         Status        Importance  Assigned to     Milestone
linux (Ubuntu)  Fix Released  High        Andy Whitcroft
Natty           Fix Released  High        Andy Whitcroft

Bug Description

Natty alpha1

When installing on a machine with CCISS resources, the cciss module is loaded, but no resources are identified.

I will upload the logs in a few.

This is a failed remote install via serial console, so there is no option for apport-collect data.

Revision history for this message
C de-Avillez (hggdh2) wrote :

Setting importance to high -- this blocks UEC QA testing.

tags: added: iso-testing natty regression-release
Changed in linux (Ubuntu):
importance: Undecided → High
Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
Stefan Bader (smb) wrote :
Revision history for this message
Stefan Bader (smb) wrote :
Revision history for this message
Stefan Bader (smb) wrote :

Developers don't like tarballs. ;) You have to download them, extract stuff and then lose everything on your disk drive.

Revision history for this message
Stefan Bader (smb) wrote :

Would we happen to have a boot log that has the resources found (with a previous release) from the same machine?

Revision history for this message
Stefan Bader (smb) wrote :

One more thing: I am not sure whether I am missing something or whether there is simply no indication of the cciss module actually being loaded. Does it show up in lsmod?

Revision history for this message
C de-Avillez (hggdh2) wrote :

(1) As it happens, I do have the install logs from Maverick (which I ran today after the failure, for verification). This is the complete /var/log/installer contents, with 9 files. Do you want them one after the other, or as a tarball? Your choice.

(2) yes, it showed up on lsmod.

Additionally, this is on a system in the DC, so I guess you could access it.

Revision history for this message
C de-Avillez (hggdh2) wrote :

No, I was wrong: the cciss module is not automatically loaded. I ran the d-i install until I got the error, and then opened a shell:

~ # lsmod
Module Size Used by
iscsi_tcp 9910 0
libiscsi_tcp 16070 1 iscsi_tcp
libiscsi 46060 2 iscsi_tcp,libiscsi_tcp
scsi_transport_iscsi 37628 3 iscsi_tcp,libiscsi
xfs 776382 0
exportfs 4283 1 xfs
reiserfs 245469 0
jfs 181857 0
btrfs 527739 0
zlib_deflate 21890 1 btrfs
crc32c 3023 1
libcrc32c 1284 1 btrfs
ntfs 97973 0
vfat 11366 0
fat 55023 1 vfat
usb_storage 50402 0
usbhid 40920 0
hid 84505 1 usbhid
bnx2 78004 0
~ #

Revision history for this message
C de-Avillez (hggdh2) wrote :

I then modprobed it:

~ # modprobe cciss
~ # lsmod
Module Size Used by
cciss 104356 0
iscsi_tcp 9910 0
libiscsi_tcp 16070 1 iscsi_tcp
libiscsi 46060 2 iscsi_tcp,libiscsi_tcp
scsi_transport_iscsi 37628 3 iscsi_tcp,libiscsi
xfs 776382 0
exportfs 4283 1 xfs
reiserfs 245469 0
jfs 181857 0
btrfs 527739 0
zlib_deflate 21890 1 btrfs
crc32c 3023 1
libcrc32c 1284 1 btrfs
ntfs 97973 0
vfat 11366 0
fat 55023 1 vfat
usb_storage 50402 0
usbhid 40920 0
hid 84505 1 usbhid
bnx2 78004 0
~ # dmesg|tail
[ 38.395929] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 124.498472] NTFS driver 2.1.29 [Flags: R/O MODULE].
[ 124.515765] Btrfs loaded
[ 124.532751] JFS: nTxBlock = 8192, nTxLock = 65536
[ 124.557501] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
[ 124.558369] SGI XFS Quota Management subsystem
[ 126.200287] Loading iSCSI transport class v2.0-870.
[ 126.211212] iscsi: registered transport (tcp)
[ 126.212772] iscsid (7734): /proc/7734/oom_adj is deprecated, please use /proc/7734/oom_score_adj instead.
[ 580.944529] HP CISS Driver (v 3.6.26)
~ #

So the HP hardware is not being recognised.
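
For anyone repeating this check, the before/after lsmod comparison above can be scripted. This is only a minimal sketch, assuming the lsmod text is available as a string; the helper name loaded_modules is made up for illustration, and the two samples are trimmed from the output above.

```python
def loaded_modules(lsmod_text):
    """Return {module_name: size} parsed from lsmod-style output."""
    modules = {}
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Module Size Used by" header and shell prompts.
        if len(fields) < 3 or fields[0] == "Module" or not fields[1].isdigit():
            continue
        modules[fields[0]] = int(fields[1])
    return modules


# Trimmed samples taken from the two lsmod listings in this report.
BEFORE = """~ # lsmod
Module Size Used by
iscsi_tcp 9910 0
bnx2 78004 0
"""
AFTER = """~ # lsmod
Module Size Used by
cciss 104356 0
iscsi_tcp 9910 0
bnx2 78004 0
"""

before = loaded_modules(BEFORE)
after = loaded_modules(AFTER)
newly_loaded = sorted(set(after) - set(before))
print(newly_loaded)
```

Note that a module showing up here only means it was loaded. It says nothing about whether it bound to a device, which is exactly the situation in the dmesg output above.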

Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
C de-Avillez (hggdh2) wrote :

Note, at the end of the output, that the cciss was loaded, but did not find any resources.

Changed in linux (Ubuntu Natty):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
Revision history for this message
C de-Avillez (hggdh2) wrote :

lspci output:

00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 13)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13)
00:02.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 2 (rev 13)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13)
00:04.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 4 (rev 13)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 13)
00:06.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 6 (rev 13)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13)
00:08.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 8 (rev 13)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 13)
00:0a.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 10 (rev 13)
00:0d.0 Host bridge: Intel Corporation Device 343a (rev 13)
00:0d.1 Host bridge: Intel Corporation Device 343b (rev 13)
00:0d.2 Host bridge: Intel Corporation Device 343c (rev 13)
00:0d.3 Host bridge: Intel Corporation Device 343d (rev 13)
00:0d.4 Host bridge: Intel Corporation 5520/5500/X58 Physical Layer Port 0 (rev 13)
00:0d.5 Host bridge: Intel Corporation 5520/5500 Physical Layer Port 1 (rev 13)
00:0d.6 Host bridge: Intel Corporation Device 341a (rev 13)
00:0e.0 Host bridge: Intel Corporation Device 341c (rev 13)
00:0e.1 Host bridge: Intel Corporation Device 341d (rev 13)
00:0e.2 Host bridge: Intel Corporation Device 341e (rev 13)
00:0e.3 Host bridge: Intel Corporation Device 341f (rev 13)
00:0e.4 Host bridge: Intel Corporation Device 3439 (rev 13)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 13)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 13)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 13)
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1c.2 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 3
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.3 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIB (ICH10) LPC Interface Controller
00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller #1
01:03.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
01:04.0 System peripheral: Compaq Computer Corporation Integrated Lights Out Controller (rev 03)
01:04.2 System peripheral: Compaq Computer Corporation Integrated Lights Out Processor (rev 03)
01:04.4 USB Controller: Hewlett-Packard...


tags: added: kernel-series-unknown
Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
Stefan Bader (smb) wrote :

Sorry for not responding. Would it be possible to get the full dmesg and "sudo lspci -vvvnn" from Maverick so I can compare them? In comment #17, you title the file "lsmod after error". Do I understand correctly that this means after the installation failed? In that output the cciss module is not loaded (anymore). Just to see whether it maybe is something about timing, what happens if you modprobe the module when you are dropped to busybox?

Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
C de-Avillez (hggdh2) wrote :

After the installation stopped on failing to recognise a disk, I went to a shell and manually ran 'modprobe cciss'. The output is shown in comment 10. So, manually modprobing the module does not do anything.

I wrongly stated, initially, that cciss was auto-loaded. It is *not*.

Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
Stefan Bader (smb) wrote :

Right, actually it seems the cciss module is the least of the problems. Something odd is going on with all the PCIe bridges. All of the root port devices have MSI disabled, some windows are different (though that might be just the new layout) and none of them seems to have a driver associated.

That most of the other devices end up on the same interrupts (7 and 10) is just fallout but maybe does not help either.
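
The comparison described above (which devices have no driver bound, and which have MSI disabled) can be sketched as a small parser over verbose lspci output. This assumes the usual layout of "lspci -v" style text: one unindented header line per device, indented detail lines, a "Kernel driver in use:" line when a driver is bound, and "MSI: Enable+" or "MSI: Enable-" in the capability line. The sample text is made up for illustration and is not the attached output; the function name summarize_lspci is hypothetical.

```python
def summarize_lspci(verbose_text):
    """Map each device address to its bound driver and MSI enable state."""
    summary = {}
    current = None
    for line in verbose_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line starts a new device; the address is the first field.
            current = line.split(" ", 1)[0]
            summary[current] = {"driver": None, "msi": None}
        elif current is not None:
            text = line.strip()
            if text.startswith("Kernel driver in use:"):
                summary[current]["driver"] = text.split(":", 1)[1].strip()
            elif " MSI: Enable" in text:
                # "Enable+" means MSI is switched on, "Enable-" means it is off.
                summary[current]["msi"] = "MSI: Enable+" in text
    return summary


# Illustrative sample only (capability offset and flags are invented).
SAMPLE = (
    "00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13)\n"
    "\tCapabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit-\n"
    "\n"
    "00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1\n"
    "\tKernel driver in use: uhci_hcd\n"
)

summary = summarize_lspci(SAMPLE)
unbound = sorted(addr for addr, info in summary.items() if info["driver"] is None)
print(unbound)
```

Running the same function over the Maverick and Natty captures and comparing the two dictionaries would show at a glance which devices lost their driver between releases.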

Revision history for this message
Stefan Bader (smb) wrote :

Could you give Natty a try with "resource_alloc_from_bottom" in the kernel command line, please?

Revision history for this message
Stefan Bader (smb) wrote :

Digging around further in other bug reports seems to indicate "pci=nocrs" may be another option to try if the above option does not help.
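
When testing boot options like these it is easy to lose track of what the running kernel was actually booted with. A minimal sketch for checking that, assuming the command line is read from /proc/cmdline on the booted system; the function names and the sample string (including the console= setting) are invented for illustration. Options that take comma-separated lists, as pci= can, are kept as one raw string here.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {option: value}; bare flags map to True."""
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else True
    return options


def read_running_cmdline(path="/proc/cmdline"):
    """Return the command line of the currently running kernel."""
    with open(path) as handle:
        return handle.read()


# Hypothetical sample; on a real system use read_running_cmdline() instead.
sample = "console=ttyS0,115200 resource_alloc_from_bottom pci=nocrs"
opts = parse_cmdline(sample)
print(opts.get("resource_alloc_from_bottom", False), opts.get("pci"))
```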

Revision history for this message
C de-Avillez (hggdh2) wrote :

Tried resource_alloc_from_bottom, got a kernel OOPS (but it kept on), and no disc resources identified. I have attached the installer syslog.

Output of 'lsmod' after the failure:

~ # lsmod
Module Size Used by
iscsi_tcp 18382 0
libiscsi_tcp 20966 1 iscsi_tcp
libiscsi 57124 2 iscsi_tcp,libiscsi_tcp
scsi_transport_iscsi 41852 3 iscsi_tcp,libiscsi
xfs 782676 0
exportfs 13027 1 xfs
reiserfs 252405 0
jfs 186201 0
btrfs 540045 0
zlib_deflate 27074 1 btrfs
libcrc32c 12644 1 btrfs
ntfs 101813 0
vfat 21678 0
fat 61335 1 vfat
usb_storage 57634 0
usbhid 47152 0
bnx2 85940 0
hid 90905 1 usbhid
~ #

Revision history for this message
C de-Avillez (hggdh2) wrote :

Tried 'pci=nocrs', similar results. The kernel OOPS is related to the bnx2, BTW. Installation syslog is attached.

Stefan Bader (smb)
tags: added: kernel-server
removed: kernel-series-unknown
Revision history for this message
Stefan Bader (smb) wrote :

It seems that, despite the claim, alloc_from_bottom does not completely restore the old allocation scheme (the assigned addresses are still top-down, though lower than without). There is currently some discussion going on about this on the mailing list. I picked two patches floating there and I am preparing kernels. As soon as the machine is free from other tests I would like to try this (I also need to gain some other access rights, though).

Revision history for this message
Stefan Bader (smb) wrote :

Still have no way to test the provided kernels myself. Besides this, there seem to be moves upstream to get the things I suspect are causing the problems here reverted:

http://marc.info/?l=linux-kernel&m=129243714212019&w=4

but that might be delaying verification more. So it would be good to hear whether the provided changes would be a temporary solution.

Revision history for this message
C de-Avillez (hggdh2) wrote :

Applied the new kernel on a Maverick install -- it boots OK. Interestingly, I do not see the cciss module in use anymore...

Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
Stefan Bader (smb) wrote :

Ah, did not actually check for it but apparently there is a new driver hpsa which is implemented as a scsi driver and is claimed to be the way to go for newer smart arrays. The docs say that cciss is a block driver that is used for disks and tapes which seemed to give people headaches.

Another point of interest: this boot was not done with "pci=nocrs", so the top-down instead of bottom-up assignment is actually still in use. However, there was a second patch used to build the kernel which specifically addresses some problems with subtractive decode bridges (whatever that exactly is). So as things seem to work, the major issue was there (iirc the fact the patch was needed also came from the changes about the resource assignment). The other thing will likely change as well, but at least this seems to hint that only the smaller of the two patches needs to be carried temporarily to fix the issue.

[This should get verified probably by building another test kernel with only that change]

Revision history for this message
Andy Whitcroft (apw) wrote :

As per comment #34 I have put together a further set of kernel images for test. These are from the same code base Stefan used with only the 'subtractive bridge' change applied. Could those with the hardware test and confirm whether this is enough to fix the issue? Please report back here. The kernels are at the URL below:

    http://people.canonical.com/~apw/uectest-natty/

Revision history for this message
C de-Avillez (hggdh2) wrote :

Boots successfully with the server image.

Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
C de-Avillez (hggdh2) wrote :

Another point -- may, or may not, help: 'hpsa.ko' is not included in the standard ISO -- it would be under ./modules/<kernelversion>/kernel/drivers/scsi/.

So we need to include it in the standard ISO.
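
A quick way to confirm whether a given module ships in an image is to search the unpacked tree for the .ko file. A minimal sketch, assuming the image has already been unpacked or mounted somewhere; the helper name find_module is hypothetical, and the demo builds a throwaway directory tree rather than touching a real ISO.

```python
import tempfile
from pathlib import Path


def find_module(root, module_name):
    """Return sorted paths of <module_name>.ko found anywhere under root."""
    return sorted(Path(root).rglob(module_name + ".ko"))


# Demo on a temporary tree that mimics the layout mentioned above.
with tempfile.TemporaryDirectory() as root:
    scsi_dir = Path(root) / "modules" / "2.6.37-9-server" / "kernel" / "drivers" / "scsi"
    scsi_dir.mkdir(parents=True)
    (scsi_dir / "hpsa.ko").touch()

    found = find_module(root, "hpsa")
    missing = find_module(root, "cciss")
    print(len(found), len(missing))
```

Against a real image, root would be wherever the ISO or udeb contents were unpacked, and an empty result for "hpsa" would match what is reported here.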

Revision history for this message
Andy Whitcroft (apw) wrote :

@C de-Avillez -- indeed, good point; will look into the udebs to ensure it is included.

Could we test the uectest3 kernel at the URL below and see if this one works for you? This test kernel carries the mainline rejig of the device layout code and is slated to fix your issue. If not I need to know asap. Thanks. Kernels are here:

    http://people.canonical.com/~apw/uectest-natty/

Changed in linux (Ubuntu Natty):
assignee: Stefan Bader (stefan-bader-canonical) → Andy Whitcroft (apw)
status: New → Incomplete
Revision history for this message
Andy Whitcroft (apw) wrote :

@C de-Avillez -- ok, I've confirmed that hpsa is not included by default and added it to the d-i configuration. This d-i change is now Fix Committed and will be included in the next upload to Natty.

Revision history for this message
C de-Avillez (hggdh2) wrote :

kernel uectst3 boots on Maverick on the test rig. I see a kernel trace on hpsa -- similar to bug 690190:

[ 4.722045] WARNING: at /home/apw/build/natty/ubuntu-natty/kernel/trace/ftrace.c:1014 ftrace_bug+0x289/0x2c0()
[ 4.722048] Hardware name: ProLiant DL380 G6
[ 4.722049] Modules linked in: hpsa(+)
[ 4.722052] Pid: 192, comm: modprobe Not tainted 2.6.37-9-server #22+uectst3
[ 4.722054] Call Trace:
[ 4.722061] [<ffffffff810662cf>] warn_slowpath_common+0x7f/0xc0
[ 4.722066] [<ffffffffa0005d44>] ? hpsa_compat_ioctl+0x4/0xd4 [hpsa]
[ 4.722068] [<ffffffff8106632a>] warn_slowpath_null+0x1a/0x20
[ 4.722070] [<ffffffff810e0f19>] ftrace_bug+0x289/0x2c0
[ 4.722074] [<ffffffffa0005d44>] ? hpsa_compat_ioctl+0x4/0xd4 [hpsa]
[ 4.722076] [<ffffffff810e119a>] ftrace_update_code+0x11a/0x150
[ 4.722079] [<ffffffffa0005d44>] ? hpsa_compat_ioctl+0x4/0xd4 [hpsa]
[ 4.722082] [<ffffffff810e15d1>] ftrace_process_locs+0x91/0xc0
[ 4.722085] [<ffffffff810e2f25>] ftrace_module_notify+0x45/0x50
[ 4.722090] [<ffffffff815d3abd>] notifier_call_chain+0x4d/0x70
[ 4.722094] [<ffffffff8108d988>] __blocking_notifier_call_chain+0x58/0x80
[ 4.722096] [<ffffffff8108d9c6>] blocking_notifier_call_chain+0x16/0x20
[ 4.722102] [<ffffffff810a4cd7>] sys_init_module+0x87/0x220
[ 4.722106] [<ffffffff8100c082>] system_call_fastpath+0x16/0x1b
[ 4.722108] ---[ end trace 68482689a5568d46 ]---
[ 4.722109] ftrace faulted on writing [<ffffffffa0005d44>] hpsa_compat_ioctl+0x4/0xd4 [hpsa]
[ 4.722141] HP HPSA Driver (v 2.0.2-1)
[ 4.722165] hpsa 0000:04:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
[ 4.722180] hpsa 0000:04:00.0: MSIX
[ 4.722228] hpsa 0000:04:00.0: irq 65 for MSI/MSI-X
[ 4.722235] hpsa 0000:04:00.0: irq 66 for MSI/MSI-X
[ 4.722249] hpsa 0000:04:00.0: irq 67 for MSI/MSI-X
[ 4.722254] hpsa 0000:04:00.0: irq 68 for MSI/MSI-X
[ 4.752433] hpsa 0000:04:00.0: hpsa0: <0x323a> at IRQ 65 using DAC
[ 4.782927] scsi2 : hpsa
[ 4.785456] hpsa 0000:04:00.0: Direct-Access device c2b0t0l0 added.
[ 4.785462] hpsa 0000:04:00.0: RAID device c2b3t0l0 added.
[ 4.785652] scsi 2:0:0:0: Direct-Access HP LOGICAL VOLUME 1.66 PQ: 0 ANSI: 5
[ 4.785801] scsi 2:3:0:0: RAID HP P410i 1.66 PQ: 0 ANSI: 0
[ 4.785951] sd 2:0:0:0: Attached scsi generic sg1 type 0
[ 4.786081] scsi 2:3:0:0: Attached scsi generic sg2 type 12
[ 4.786109] sd 2:0:0:0: [sda] 143305920 512-byte logical blocks: (73.3 GB/68.3 GiB)
[ 4.786242] sd 2:0:0:0: [sda] Write Protect is off
[ 4.786244] sd 2:0:0:0: [sda] Mode Sense: 6b 00 00 08
[ 4.786326] sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4.787036] sda: sda1 sda2 < sda5 >
[ 4.787871] sd 2:0:0:0: [sda] Attached SCSI disk

Revision history for this message
C de-Avillez (hggdh2) wrote :
Changed in linux (Ubuntu Natty):
status: Incomplete → Confirmed
Revision history for this message
Andy Whitcroft (apw) wrote :

@C de-Avillez -- ok, final test kernel, this is the latest natty kernel with the upstream updates to the layout algorithm. This should sort out the ftrace issue. Could you test the uectst4 kernels from the link below and report back, thanks:

    http://people.canonical.com/~apw/uectest-natty/

Revision history for this message
C de-Avillez (hggdh2) wrote :

Andy, boots nicely :-)

Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
Andy Whitcroft (apw) wrote :

Ok, the fixes tested above have now been released as v2.6.37-rc7, which will be the base for the next Natty upload. This upload will contain both the fixes and the d-i changes and should close this bug.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.37-11.25

---------------
linux (2.6.37-11.25) natty; urgency=low

  [ Andy Whitcroft ]

  * [Config] d-i -- add hpsa to the list of block devices
    - LP: #684304
  * [Config] add vmw-balloon driver to -virtual flavour
    - LP: #592039
  * rebase to v2.6.37-rc7

  [ Upstream Kernel Changes ]

  * rebase to v2.6.37-rc7
 -- Andy Whitcroft <email address hidden> Tue, 21 Dec 2010 13:35:28 +0000

Changed in linux (Ubuntu Natty):
status: Confirmed → Fix Released