On Sep 26th 2022 I updated my CU169 installation on a Dell PowerEdge R240 to CU170. After the reboot the system did not start anymore. The last error messages I received were from megaraid_sas:

- Ignore DCMD timeout: megasas_get_ctrl_info 5390
- Could not get controller info. Fail from megasas_init_adapter_fusion 1904
- Failed from megasas_init_fw 6548

I had to reinstall from the backup ISO and am now back up and running on CU169.
Is this reproducible by applying the update again or did the RAID controller malfunction?
(In reply to Michael Tremer from comment #1)
> Is this reproducible by applying the update again or did the RAID controller
> malfunction?

I did not try the update again because this is my production system and about 30 people rely on it working. I tried to start the system twice, both times without success. Starting with CU169 worked without any problems. The RAID function of the controller is not used; both disks form a software RAID1 as offered by IPFire (https://bugzilla.ipfire.org/show_bug.cgi?id=12862). On Friday evening I can give it another try.
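For reference, the health of the IPFire software RAID1 can be verified with the standard md tools before retrying the update. This is a minimal sketch; /dev/md0 is a placeholder for the actual array device:

# Overview of all md arrays and their sync state
cat /proc/mdstat
# Detailed status of one array (replace /dev/md0 with the real device)
mdadm --detail /dev/md0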
We did not get any other reports about problems with this update. So I do not know what else we can do here apart from trying again... Sorry.
(In reply to Michael Tremer from comment #3)
> We did not get any other reports about problems with this update. So I do
> not know what else we can do here apart from trying again... Sorry.

No problem. Is there anything I should check or look for before and after applying the update?
Not really. We do not do anything else apart from extracting files and running a few scripts. A screenshot would be helpful.
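If you want to capture some state anyway, a minimal sketch of checks that could be run before and after the update (all standard Linux tools, nothing IPFire-specific is assumed):

# Running kernel version
uname -r
# Block devices and filesystem UUIDs
lsblk
blkid
# Any messages from the RAID controller driver
dmesg | grep -i megaraid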
Created attachment 1092 [details]
Screenshot when booting stopped
I am afraid I did not have a chance to look into this until today. Unfortunately the problem is reproducible. CU169 works fine, but upgrading to 170, or now to 171, leads to a system that does not boot. The reason seems to be that no devices are found. I cannot understand why it works with CU169 but not with the newer releases. I would be happy if someone could give me a hint on how to track down this problem. The first lines of rdsosreport.txt look like this:

dracut:/# less /run/initramfs/rdsosreport.txt
+ cat /lib/dracut/dracut-056
dracut-056
+ echo /proc/cmdline
/proc/cmdline
+ sed -e 's/\(ftp:\/\/.*\):.*@/\1:*******@/g;s/\(cifs:\/\/.*\):.*@/\1:*******@/g;s/cifspass=[^ ]*/cifspass=*******/g;s/iscsi:.*@/iscsi:******@/g;s/rd.iscsi.password=[^ ]*/rd.iscsi.password=******/g;s/rd.iscsi.in.password=[^ ]*/rd.iscsi.in.password=******/g' /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.15.71-ipfire root=UUID=a16c8d30-a521-4f54-b342-4601b54d802f ro rd.auto console=ttyS0,115200n8 rd.debug panic=10
+ '[' -f /etc/cmdline ']'
+ for _i in /etc/cmdline.d/*.conf
+ '[' -f '/etc/cmdline.d/*.conf' ']'
+ break
+ cat /proc/self/mountinfo
1 1 0:2 / / rw - rootfs none rw
17 1 0:16 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
18 1 0:17 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
19 1 0:5 / /dev rw,nosuid,noexec - devtmpfs devtmpfs rw,size=7970008k,nr_inodes=1992502,mode=755
20 19 0:18 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
21 19 0:19 / /dev/shm rw,nosuid,nodev,noexec - tmpfs tmpfs rw
22 1 0:20 / /run rw,nosuid,nodev,noexec - tmpfs tmpfs rw,mode=755
+ cat /proc/mounts
none / rootfs rw 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,noexec,size=7970008k,nr_inodes=1992502,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /run tmpfs rw,nosuid,nodev,noexec,mode=755 0 0
+ blkid
+ blkid -o udev
+ ls -l '/dev/disk/by*'
ls: cannot access '/dev/disk/by*': No such file or directory
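For reference, when the boot drops into the dracut emergency shell like this, a few quick probes can show whether the kernel registered any disks at all. A minimal sketch, using only tools that are normally present in the initramfs:

dracut:/# ls /sys/block            # block devices the kernel registered
dracut:/# cat /proc/partitions     # detected partitions, if any
dracut:/# lsmod | grep megaraid    # is the megaraid_sas module loaded?
dracut:/# dmesg | grep -i megasas  # controller errors such as the DCMD timeout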
I found the following ticket https://bugzilla.kernel.org/show_bug.cgi?id=214311 about problems with my PERC H330 controller when booting in BIOS mode. I switched to UEFI and the disks were found. Problem solved as far as I am concerned. I am just surprised that this issue showed up between kernel versions 5.15.49 and 5.15.71.
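For anyone verifying the same workaround: whether a running Linux system actually came up in UEFI mode can be checked from a shell. A small sketch using the standard /sys/firmware/efi indicator:

# The efi directory only exists when the kernel was booted via UEFI
if [ -d /sys/firmware/efi ]; then
    echo "booted in UEFI mode"
else
    echo "booted in legacy BIOS mode"
fi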
(In reply to Dirk Sihling from comment #8)
> Problem solved as far as I am concerned. I am just surprised that this issue
> showed up between kernel versions 5.15.49 and 5.15.71.

That is indeed a very interesting problem.