Bug 12941 - IPFire does not boot after Update from CU169 to CU170
Status: CLOSED FIXED
Alias: None
Product: IPFire
Classification: Unclassified
Component: ---
Version: 2
Hardware: x86_64 Linux
Importance: Will affect almost no one, Crash
Assignee: Michael Tremer
Reported: 2022-09-27 08:31 UTC by Dirk Sihling
Modified: 2022-11-08 17:18 UTC
Attachments
Screenshot when booting stopped (3.43 MB, image/jpeg)
2022-09-27 10:21 UTC, Dirk Sihling

Description Dirk Sihling 2022-09-27 08:31:09 UTC
On Sep 26th 2022 I updated my CU169 installation on a Dell PowerEdge R240 to CU170. After reboot the system did not start anymore. The last error messages I received were from megaraid_sas:
- Ignore DCMD timeout: megasas_get_ctrl_info 5390
- Could not get controller info. Fail from megasas_init_adapter_fusion 1904
- Failed from megasas_init_fw 6548

I had to reinstall from the backup iso and am now back up and running at CU169.
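
To confirm that a failed boot shows the same driver errors, the three messages above can be filtered out of a saved kernel log. This is only a minimal sketch: the function name and the log file are hypothetical, and only the matched strings come from this report.

```shell
#!/bin/sh
# Hypothetical helper: print the megaraid_sas initialisation failures
# quoted in this report from a saved kernel log (for example, captured
# over the serial console).
# Usage: megaraid_failures /path/to/saved-kernel-log.txt
megaraid_failures() {
    grep -E 'megasas_get_ctrl_info|megasas_init_adapter_fusion|megasas_init_fw' "$1"
}
```

If this prints nothing for a boot that still hangs, the cause is probably a different one from the one reported here.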
Comment 1 Michael Tremer 2022-09-27 09:11:36 UTC
Is this reproducible by applying the update again or did the RAID controller malfunction?
Comment 2 Dirk Sihling 2022-09-27 09:44:22 UTC
(In reply to Michael Tremer from comment #1)
> Is this reproducible by applying the update again or did the RAID controller
> malfunction?

I did not try the update again because this is my production system and about 30 people rely on it working.
I tried to start the system twice, both times with no success. Starting with CU169 worked without any problems.
The RAID function of the controller is not used, both disks form a software RAID1 as offered by IPFire (https://bugzilla.ipfire.org/show_bug.cgi?id=12862).

Friday evening I could give it another try.
Comment 3 Michael Tremer 2022-09-27 10:02:38 UTC
We did not get any other reports about problems with this update. So I do not know what else we can do here apart from trying again... Sorry.
Comment 4 Dirk Sihling 2022-09-27 10:04:06 UTC
(In reply to Michael Tremer from comment #3)
> We did not get any other reports about problems with this update. So I do
> not know what else we can do here apart from trying again... Sorry.

No problem. Is there anything I should check or look for before and after applying the update?
Comment 5 Michael Tremer 2022-09-27 10:05:10 UTC
Not really. We do not do anything else apart from extracting files and running a few scripts.

A screenshot would be helpful.
Comment 6 Dirk Sihling 2022-09-27 10:21:13 UTC
Created attachment 1092
Screenshot when booting stopped
Comment 7 Dirk Sihling 2022-10-31 17:53:03 UTC
I am afraid I haven't had a chance to look into this until today.
Unfortunately the problem is reproducible. CU169 works fine, but upgrading to 170 or now to 171 leads to a system that doesn't boot.
The reason seems to be that no devices are found. I can't understand why it works with CU169 but not with newer systems.

I would be happy if someone could give me a hint on how to track down this problem.

The first lines of rdsosreport.txt look like this:
dracut:/# less /run/initramfs/rdsosreport.txt
+ cat /lib/dracut/dracut-056
dracut-056
+ echo /proc/cmdline
/proc/cmdline
+ sed -e 's/\(ftp:\/\/.*\):.*@/\1:*******@/g;s/\(cifs:\/\/.*\):.*@/\1:*******@/g;s/cifspass=[^ ]*/cifspass=*******/g;s/iscsi:.*@/iscsi:******@/g;s/rd.iscsi.password=[^ ]*/rd.iscsi.password=******/g;s/rd.iscsi.in.password=[^ ]*/rd.iscsi.in.password=******/g' /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.15.71-ipfire root=UUID=a16c8d30-a521-4f54-b342-4601b54d802f ro rd.auto console=ttyS0,115200n8 rd.debug panic=10
+ '[' -f /etc/cmdline ']'
+ for _i in /etc/cmdline.d/*.conf
+ '[' -f '/etc/cmdline.d/*.conf' ']'
+ break
+ cat /proc/self/mountinfo
1 1 0:2 / / rw - rootfs none rw
17 1 0:16 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
18 1 0:17 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
19 1 0:5 / /dev rw,nosuid,noexec - devtmpfs devtmpfs rw,size=7970008k,nr_inodes=1992502,mode=755
20 19 0:18 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
21 19 0:19 / /dev/shm rw,nosuid,nodev,noexec - tmpfs tmpfs rw
22 1 0:20 / /run rw,nosuid,nodev,noexec - tmpfs tmpfs rw,mode=755
+ cat /proc/mounts
none / rootfs rw 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,noexec,size=7970008k,nr_inodes=1992502,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /run tmpfs rw,nosuid,nodev,noexec,mode=755 0 0
+ blkid
+ blkid -o udev
+ ls -l '/dev/disk/by*'
ls: cannot access '/dev/disk/by*': No such file or directory
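
The empty blkid output and the missing /dev/disk/by* directories suggest that the kernel registered no disks at all. A minimal sketch of two checks that could be run from the dracut emergency shell follows. The function names are hypothetical, and the path arguments exist only so the functions can be tried against test directories. A driver built into the kernel does not appear in /proc/modules, so a negative result from the second check is not conclusive on its own.

```shell
#!/bin/sh
# Count the block devices the kernel has registered.
# $1: directory to inspect (defaults to /sys/block).
count_block_devices() {
    n=0
    for d in "${1:-/sys/block}"/*; do
        # If the glob matched nothing, "$d" is the literal pattern and fails -e.
        if [ -e "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Return 0 if the named module is listed as loaded.
# $1: module name, $2: module list (defaults to /proc/modules).
module_loaded() {
    while read -r name _rest; do
        if [ "$name" = "$1" ]; then
            return 0
        fi
    done < "${2:-/proc/modules}"
    return 1
}
```

For example, `count_block_devices` printing 0 while `module_loaded megaraid_sas` succeeds would point at the controller failing to initialise rather than at a missing driver.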
Comment 8 Dirk Sihling 2022-11-04 12:13:17 UTC
I found the following ticket https://bugzilla.kernel.org/show_bug.cgi?id=214311 about problems with my PERC H330 controller when booting in BIOS mode. I switched to UEFI and the disks were found.
Problem solved as far as I am concerned. I am just surprised that this issue showed up between kernel versions 5.15.49 and 5.15.71.
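
For anyone comparing their own setup, a running Linux system shows how it was booted: /sys/firmware/efi exists only after a UEFI boot. A minimal sketch (the function name is hypothetical, and the optional argument is there only so the check can be tried against a test directory):

```shell
#!/bin/sh
# Print "UEFI" if the kernel exposes EFI firmware data, otherwise "BIOS".
# $1: sysfs root (defaults to /sys).
boot_mode() {
    if [ -d "${1:-/sys}/firmware/efi" ]; then
        echo UEFI
    else
        echo BIOS
    fi
}
```

Running `boot_mode` on the working CU169 installation before the upgrade shows whether the machine is still in the legacy BIOS mode described in the kernel ticket.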
Comment 9 Michael Tremer 2022-11-08 17:18:56 UTC
(In reply to Dirk Sihling from comment #8)
> Problem solved as far as I am concerned. I am just surprised that this issue
> showed up between kernel versions 5.15.49 and 5.15.71.

That is indeed a very interesting problem.