12941 – IPFire does not boot after Update from CU169 to CU170

Status:	CLOSED FIXED

Alias:	None

Product:	IPFire
Classification:	Unclassified
Component:	---
Version:	2
Hardware:	x86_64 Linux

Importance:	Will affect almost no one Crash
Assignee:	Michael Tremer

Reported:	2022-09-27 08:31 UTC by Dirk Sihling
Modified:	2022-11-08 17:18 UTC
CC List:	0 users

Attachments
Screenshot when booting stopped (3.43 MB, image/jpeg) 2022-09-27 10:21 UTC, Dirk Sihling

Description Dirk Sihling 2022-09-27 08:31:09 UTC

On Sep 26th 2022 I updated my CU169 installation on a Dell PowerEdge R240 to CU170. After reboot the system did not start anymore. The last error messages I received were from megaraid_sas:
- Ignore DCMD timeout: megasas_get_ctrl_info 5390
- Could not get controller info. Fail from megasas_init_adapter_fusion 1904
- Failed from megasas_init_fw 6548

I had to reinstall from the backup ISO and am now back up and running on CU169.

Comment 1 Michael Tremer 2022-09-27 09:11:36 UTC

Is this reproducible by applying the update again or did the RAID controller malfunction?

Comment 2 Dirk Sihling 2022-09-27 09:44:22 UTC

(In reply to Michael Tremer from comment #1)
> Is this reproducible by applying the update again or did the RAID controller
> malfunction?

I did not try the update again because this is my production system and about 30 people rely on it working.
I tried to start the system twice, both times with no success. Starting with CU169 worked without any problems.
The RAID function of the controller is not used; both disks form a software RAID1 as offered by IPFire (https://bugzilla.ipfire.org/show_bug.cgi?id=12862).

Friday evening I could give it another try.

Comment 3 Michael Tremer 2022-09-27 10:02:38 UTC

We did not get any other reports about problems with this update. So I do not know what else we can do here apart from trying again... Sorry.

Comment 4 Dirk Sihling 2022-09-27 10:04:06 UTC

(In reply to Michael Tremer from comment #3)
> We did not get any other reports about problems with this update. So I do
> not know what else we can do here apart from trying again... Sorry.

No problem. Is there anything I should check or look for before and after applying the update?

Comment 5 Michael Tremer 2022-09-27 10:05:10 UTC

Not really. We do not do anything else apart from extracting files and running a few scripts.

A screenshot would be helpful.

Comment 6 Dirk Sihling 2022-09-27 10:21:13 UTC

Created attachment 1092
Screenshot when booting stopped

Comment 7 Dirk Sihling 2022-10-31 17:53:03 UTC

I am afraid I haven't had a chance to look into this until today.
Unfortunately the problem is reproducible. CU169 works fine, but upgrading to 170 or now to 171 leads to a system that doesn't boot.
The reason seems to be that no devices are found. I can't understand why it works with CU169 but not with newer systems.

I would be happy if someone could give me a hint on how to track down this problem.

The first lines of rdsosreport.txt look like this:
dracut:/# less /run/initramfs/rdsosreport.txt
+ cat /lib/dracut/dracut-056
dracut-056
+ echo /proc/cmdline
/proc/cmdline
+ sed -e 's/\(ftp:\/\/.*\):.*@/\1:*******@/g;s/\(cifs:\/\/.*\):.*@/\1:*******@/g;s/cifspass=[^ ]*/cifspass=*******/g;s/iscsi:.*@/iscsi:******@/g;s/rd.iscsi.password=[^ ]*/rd.iscsi.password=******/g;s/rd.iscsi.in.password=[^ ]*/rd.iscsi.in.password=******/g' /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.15.71-ipfire root=UUID=a16c8d30-a521-4f54-b342-4601b54d802f ro rd.auto console=ttyS0,115200n8 rd.debug panic=10
+ '[' -f /etc/cmdline ']'
+ for _i in /etc/cmdline.d/*.conf
+ '[' -f '/etc/cmdline.d/*.conf' ']'
+ break
+ cat /proc/self/mountinfo
1 1 0:2 / / rw - rootfs none rw
17 1 0:16 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
18 1 0:17 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
19 1 0:5 / /dev rw,nosuid,noexec - devtmpfs devtmpfs rw,size=7970008k,nr_inodes=1992502,mode=755
20 19 0:18 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
21 19 0:19 / /dev/shm rw,nosuid,nodev,noexec - tmpfs tmpfs rw
22 1 0:20 / /run rw,nosuid,nodev,noexec - tmpfs tmpfs rw,mode=755
+ cat /proc/mounts
none / rootfs rw 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,noexec,size=7970008k,nr_inodes=1992502,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /run tmpfs rw,nosuid,nodev,noexec,mode=755 0 0
+ blkid
+ blkid -o udev
+ ls -l '/dev/disk/by*'
ls: cannot access '/dev/disk/by*': No such file or directory
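
The excerpt above already contains the key symptom: both `blkid` calls print nothing, and `/dev/disk/by*` does not exist, so the initramfs saw no block devices at all. The following is a minimal Python sketch, not part of dracut or IPFire, of how such a trace could be scanned for that pattern. The function names `split_trace` and `silent_commands` are hypothetical, and the sketch assumes that traced commands start with one or more `+` characters followed by a space, as in the `rd.debug` output shown here.

```python
import re


def split_trace(report: str) -> list[tuple[str, list[str]]]:
    """Group an `sh -x` style trace into (command, output lines) pairs."""
    steps: list[tuple[str, list[str]]] = []
    for line in report.splitlines():
        match = re.match(r"\++ (.*)", line)
        if match:
            # A new traced command starts here.
            steps.append((match.group(1), []))
        elif steps and line.strip():
            # Any other non-empty line is output of the last command.
            steps[-1][1].append(line)
    return steps


def silent_commands(report: str, names=("blkid",)) -> list[str]:
    """Return traced commands listed in `names` that printed nothing."""
    silent = []
    for cmd, out in split_trace(report):
        parts = cmd.split()
        if parts and parts[0] in names and not out:
            silent.append(cmd)
    return silent


# Shortened sample taken from the excerpt in this comment.
sample = """\
+ cat /proc/mounts
none / rootfs rw 0 0
+ blkid
+ blkid -o udev
+ ls -l '/dev/disk/by*'
ls: cannot access '/dev/disk/by*': No such file or directory
"""

print(silent_commands(sample))  # -> ['blkid', 'blkid -o udev']
```

The check is restricted to named commands because many traced steps (such as the `'[' -f ... ']'` tests) are legitimately silent. An output line that itself begins with `+ ` would be misread as a command, so this is only a rough aid for reading the report.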

Comment 8 Dirk Sihling 2022-11-04 12:13:17 UTC

I found the following ticket, https://bugzilla.kernel.org/show_bug.cgi?id=214311, about problems with my PERC H330 controller when booting in BIOS mode. I switched to UEFI and the disks were found.
Problem solved as far as I am concerned. I am just surprised that this issue showed up between kernel versions 5.15.49 and 5.15.71.
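
Since the fix here was switching from BIOS to UEFI boot, it can help to confirm which mode a running system actually used. A minimal Python sketch follows, relying on the fact that Linux exposes `/sys/firmware/efi` only when it was booted through EFI firmware. The function name `firmware_boot_mode` and its `sysfs_root` parameter are hypothetical and exist only to keep the example testable. The "BIOS" result is inferred from the directory being absent, so treat it as a hint rather than proof.

```python
from pathlib import Path


def firmware_boot_mode(sysfs_root: str = "/sys") -> str:
    """Return "UEFI" if the kernel exposes EFI firmware data, else "BIOS"."""
    efi_dir = Path(sysfs_root) / "firmware" / "efi"
    return "UEFI" if efi_dir.is_dir() else "BIOS"


if __name__ == "__main__":
    print(firmware_boot_mode())
```

Running this on CU169 before an upgrade would show whether the machine is affected by the BIOS-mode problem described in the linked kernel ticket.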

Comment 9 Michael Tremer 2022-11-08 17:18:56 UTC

(In reply to Dirk Sihling from comment #8)
> Problem solved as far as I am concerned. I am just surprised that this issue
> showed up between kernel versions 5.15.49 and 5.15.71.

That is indeed a very interesting problem.