Bug 13509 - Grub Boot failure after update 182
Summary: Grub Boot failure after update 182
Status: CLOSED FIXED
Alias: None
Product: IPFire
Classification: Unclassified
Component: ---
Version: 2
Hardware: x86_64 Linux
Importance: - Unknown - Crash
Assignee: Arne.F
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-07 23:59 UTC by dnl
Modified: 2024-02-14 12:15 UTC
CC List: 3 users

See Also:


Description dnl 2024-01-07 23:59:06 UTC
After cleanly updating an existing IPFire installation to update 182, the system no longer boots. The GRUB rescue prompt is displayed straight after the BIOS prompt, with "error: file `/grub/x86_64-efi/boot.mod` not found".

The system was previously configured to boot in legacy BIOS mode, but it now appears to me that every boot option tries to use EFI.

The system is Intel J1900 based and has been running IPFire for years: https://fireinfo.ipfire.org/profile/12a8c1b37699b320895097608705c5cc374254a9

I was eventually able to boot the system from various recovery ISO images on USB sticks, but finding one that worked was largely trial and error.

After booting the system using a recovery image I ran `/usr/bin/install-bootloader` and `grub-mkconfig -o /boot/grub/grub.cfg` but the problem happened again on next reboot.

It is possible this is not a bug. However, after I posted at length in the forums, it was suggested that I raise this as a bug, since all the obvious troubleshooting steps have been exhausted. Other people have mentioned various GRUB-related problems in the forum thread I started, but so far the evidence they provide does not match the problem I see.

Forum thread, including two screenshots of mine: https://community.ipfire.org/t/core-update-182-caused-grub-boot-failure/10826/2 and post 19.


It would be difficult for me to rebuild this system from scratch. Although I have backups, I run a KVM image on a dedicated partition for performance reasons, and I'm not confident I could recover everything exactly as it is. (I've learned the lesson that I should document this better!)


Thank you for your time!
dnl
Comment 1 dnl 2024-01-08 00:10:20 UTC
PS: The boot.mod file itself does not appear to be the problem. It has the same checksum as one I have on an IPFire VM.
Comment 2 dnl 2024-01-10 02:01:11 UTC
The system seems unable to boot via EFI, yet since I performed the update and rebooted, GRUB no longer attempts a legacy BIOS boot.

There is no evidence of disk or filesystem corruption.  /boot is fine and all the /boot/grub/i386-pc/ file checksums match a test IPFire VM.
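The checksum comparison against a known-good VM described above can be scripted. Below is a minimal Python sketch. It assumes a copy of the reference VM's `/boot/grub/i386-pc` directory is available on the affected machine. `sha256_manifest` and `compare_manifests` are hypothetical helper names, not IPFire tools.

```python
import hashlib
from pathlib import Path


def sha256_manifest(root: Path) -> dict[str, str]:
    """Map each regular file under root (as a relative path) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(local: dict[str, str], reference: dict[str, str]) -> dict[str, list[str]]:
    """Report files missing on either side and files whose digests differ."""
    return {
        "missing_locally": sorted(reference.keys() - local.keys()),
        "extra_locally": sorted(local.keys() - reference.keys()),
        "mismatched": sorted(
            name
            for name in local.keys() & reference.keys()
            if local[name] != reference[name]
        ),
    }
```

For example, `compare_manifests(sha256_manifest(Path("/boot/grub/i386-pc")), sha256_manifest(Path("/tmp/vm-i386-pc")))` would list any missing, extra or differing module files. `/tmp/vm-i386-pc` is an assumed location for the copied reference files.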

If I boot from a recovery image such as SuperGrub, it also detects only EFI boot options, but it lets me extract entries from grub.cfg. When I boot one of the entries from grub.cfg, the system starts without problems.

I have tried to run the following:
```
grub-install --no-floppy --recheck --force --target=i386-pc /dev/sda
```
I still end up at `grub rescue >` though.
Comment 3 dnl 2024-01-10 10:03:03 UTC
Arne.F has stated in the forum: "on systems with xfs filesystem on the boot partition grub seems not correct installed (always?)."

My system uses XFS for /boot (and root).
Comment 4 Michael Tremer 2024-01-11 13:16:08 UTC
> https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=7b40a1c6e275d097c89a8411db1d1e37b75c8e6d
> https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=c903d67c7e602a0bbc7c3ea4afefe66cccd6a344

The GRUB update has been reverted in the updater so that we won't break any systems. The following update will ship the final version, which seems to fix most of the known problems.
Comment 5 dnl 2024-01-12 03:10:31 UTC
Thank you very much!

I'll test this as soon as the updated Core Update 182 is available on my system.
Comment 6 Deyan Bektchiev 2024-01-14 20:31:56 UTC
I also had the bootloader issue. After booting manually, I re-applied Core Update 182 (I manually set the version back to 181 and ran pakfire upgrade) and then reinstalled the bootloader to fix it. The last step might not have been necessary, but it felt safer and I didn't have time to experiment.

Thank you for providing an updated 182 package!
Comment 7 dnl 2024-01-14 21:36:11 UTC
Thanks to Deyan in the forum I've been able to re-apply 182 on my system, but unfortunately it has not resolved the problem.  The symptoms appear identical.

I've put two screenshots in the forum post:
https://community.ipfire.org/t/core-update-182-caused-grub-boot-failure/10826/40?u=dnl

The latter shows a problem I have seen once before, after manually running a GRUB install or the /usr/bin/install-bootloader script: the console font becomes corrupted. Updating a second time appeared to resolve this back then, so I'll try re-applying 182 a second time later today to see if anything changes.
Comment 8 dnl 2024-01-15 21:36:00 UTC
I re-applied 182 a second time and rebooted.  Unfortunately my system still won't boot (same problem) and the graphical corruption in the console is still present.
Comment 9 Michael Tremer 2024-01-16 15:29:34 UTC
(In reply to dnl from comment #8)
> I re-applied 182 a second time and rebooted.  Unfortunately my system still
> won't boot (same problem) and the graphical corruption in the console is
> still present.

Is there any chance that you can give Core Update 183 a go? That should hopefully contain a proper fix for the XFS boot issue.

I am not sure where we stand with this one regarding BIOS bugs.
Comment 10 dnl 2024-01-16 21:30:20 UTC
Thanks Michael

It's now working!!

This morning I forced the system to re-update to 182 and ran install-bootloader and it is working.  The graphical corruption on the console is also gone.

I had not realised that I needed to rebuild GRUB (with the install-bootloader script) after updating to 182 again. I had assumed it would run automatically when GRUB was upgraded.

Do I need to prepare for when you update grub again please?
Comment 11 Michael Tremer 2024-01-17 11:00:08 UTC
(In reply to dnl from comment #10)
> I had not realised that I needed to rebuild GRUB (with the
> install-bootloader script) after updating to 182 again.  I had assumed it
> would run automatically when GRUB was upgraded.

We did not run that script, because people usually upgrade rather than downgrade and on upgraded systems we would not touch the boot loader at all.

> Do I need to prepare for when you update grub again please?

The update to GRUB 2.12 is now in Core Update 183. It would be great to know whether that works for you, or if we are being put into the same position again...

Can you maybe just clone your machine so that you don't break production?
Comment 12 dnl 2024-01-18 09:07:00 UTC
(In reply to Michael Tremer from comment #11)
> The update to GRUB 2.12 is now in Core Update 183. It would be great to know
> whether that works for you, or if we are being put into the same position
> again...
> 
> Can you maybe just clone your machine so that you don't break production?

Unfortunately this is my home IPFire system and I don't have a spare SSD/disk to put in it.  My family tolerates outages worse than my clients at work!  Still, I'll try to arrange a few hours sometime.  Hopefully I can always just boot it with a USB stick again if it fails.

Am I correct in saying that you believe the problem is related to:
* Legacy BIOS boot
* with XFS filesystems
* on Intel Bay Trail hardware?

Note that when the problem happened Grub was always trying to boot EFI, not BIOS.

Thanks again!
Comment 13 dnl 2024-01-18 09:07:36 UTC
PS: If I test update 183 now, is it possible to later change back to the 'stable' release when that release version becomes the stable version?
Comment 14 Michael Tremer 2024-01-23 10:17:45 UTC
(In reply to dnl from comment #13)
> PS: If I test update 183 now, is it possible to later change back to the
> 'stable' release when that release version becomes the stable version?

Sorry for the late reply. Yes, it is the dropdown at the bottom of the Pakfire page where you can just go back to stable.
Comment 15 dnl 2024-01-25 06:22:31 UTC
Thank you Michael

I recognise that if there's a problem I'll be affected anyway, but I am nervous about testing this.  Last time it took me a number of hours to stumble upon a workaround.

I appreciate that you said in the test release announcement that you're confident the problem is fixed.  I'll aim to try this early one weekend day, probably this weekend.  

Hopefully if there is a problem I can work around it the same way again.
Comment 16 dnl 2024-01-27 00:51:03 UTC
I'm very happy to report that after I:
* Changed the Pakfire repository to "Testing"
* Installed all updates (strangely it installed testing 182 and then testing 183 even though the system was already on production 182)
* Ran install-bootloader
* Rebooted
It all rebooted without problem!

During boot, after the IPFire full screen logo was displayed, some text was overlaid which had me worried:
```
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Measured initrd data in to PCR 9.
```
but it booted fine after that.


Is my system now using EFI boot?  I have found it difficult to tell conclusively.

Thanks again!
Comment 17 Michael Tremer 2024-01-27 11:33:23 UTC
Thanks for the confirmation. And to me this looks like you have an EFI system there, yes.
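One common way to answer "am I booted via EFI?" from the running system relies on the fact that the Linux kernel only creates `/sys/firmware/efi` when it was started by EFI firmware. The Python sketch below checks for that directory. It then maps the result to the GRUB module directory for each mode: `x86_64-efi` or `i386-pc`, the two directories that appear in this report. It assumes 64-bit UEFI firmware, and the function names are hypothetical.

```python
from pathlib import Path


def firmware_mode(sysfs_root: Path = Path("/sys")) -> str:
    """Return "uefi" if the running kernel was booted via EFI, else "bios".

    The kernel only exposes <sysfs>/firmware/efi when started by EFI firmware.
    """
    return "uefi" if (sysfs_root / "firmware" / "efi").is_dir() else "bios"


def grub_platform_dir(mode: str) -> str:
    """GRUB module directory for each x86_64 boot mode (assumes 64-bit UEFI)."""
    return {"uefi": "x86_64-efi", "bios": "i386-pc"}[mode]


if __name__ == "__main__":
    mode = firmware_mode()
    # Path as seen from the running system, with the boot partition mounted on /boot.
    print(f"Booted in {mode} mode; GRUB modules live in /boot/grub/{grub_platform_dir(mode)}")
```

Since the installer sets up both modes (see comment 19), both directories can exist on disk. Only the firmware mode decides which one GRUB actually loads from.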
Comment 18 dnl 2024-01-28 04:15:31 UTC
Thank you.

This has been confusing, as I'm fairly sure my system was built with legacy BIOS boot.  Anyway it's working now, so I can't complain!


If it helps, the new grub.cfg has set root='hd0,msdos5' while a backup copy from before had that using 'msdos4'.

My system has 6 partitions:

```
Device     Boot    Start       End   Sectors   Size Id Type
/dev/sda1  *        2048    264191    262144   128M 83 Linux
/dev/sda2         264192    329727     65536    32M ef EFI (FAT-12/16/32)
/dev/sda3         329728   2292233   1962506 958.3M 82 Linux swap / Solaris
/dev/sda4        2293760 125045423 122751664  58.5G  5 Extended
/dev/sda5        2295808  96667647  94371840    45G 83 Linux
/dev/sda6       96669696 125045423  28375728  13.5G 83 Linux
```

So msdos4 is the extended partition, a container for the logical partitions with no filesystem of its own, while msdos5 is the actual root filesystem (sda6 holds a Linux VM in its own partition).
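The link between the fdisk listing above and GRUB's `root='hd0,msdosN'` names can be made explicit. Below is a minimal Python sketch that parses rows in that listing format, translates each device to its GRUB name, and flags extended-partition containers. It assumes the firmware enumerates disks in the same order as Linux (sda is hd0), which is not guaranteed on every machine. `grub_name` and `describe` are hypothetical helpers.

```python
import re

# Extended-partition type IDs on an MS-DOS (MBR) label: DOS extended,
# W95 extended (LBA) and Linux extended. They hold the logical partitions
# (numbered 5 and up) but carry no filesystem of their own.
EXTENDED_IDS = {"5", "f", "85"}

ROW = re.compile(
    r"^/dev/(?P<disk>sd[a-z]+)(?P<num>\d+)\s+(?:\*\s+)?"
    r"(?P<start>\d+)\s+(?P<end>\d+)\s+(?P<sectors>\d+)\s+(?P<size>\S+)\s+"
    r"(?P<id>[0-9a-fA-F]+)\s+(?P<type>.+)$"
)


def grub_name(disk: str, num: int) -> str:
    """Translate e.g. ("sda", 5) to "hd0,msdos5".

    GRUB numbers disks from 0 but MS-DOS partitions from 1, like Linux.
    Assumes BIOS disk order matches Linux order (sda -> hd0, sdb -> hd1, ...).
    """
    index = 0
    for ch in disk[2:]:  # "sda" -> "a", "sdaa" -> "aa"
        index = index * 26 + (ord(ch) - ord("a") + 1)
    return f"hd{index - 1},msdos{num}"


def describe(fdisk_listing: str) -> list[tuple[str, str, str]]:
    """Return (device, GRUB name, note) for each partition row in the listing."""
    rows = []
    for line in fdisk_listing.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header or blank line
        device = f"/dev/{m['disk']}{m['num']}"
        note = "extended container" if m["id"].lower() in EXTENDED_IDS else m["type"]
        rows.append((device, grub_name(m["disk"], int(m["num"])), note))
    return rows
```

For the listing in this comment, `/dev/sda4` maps to `hd0,msdos4` and is flagged as the extended container, while `/dev/sda5` maps to `hd0,msdos5`, the root filesystem.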
Comment 19 Michael Tremer 2024-01-29 16:14:21 UTC
(In reply to dnl from comment #18)
> This has been confusing, as I'm fairly sure my system was built with legacy
> BIOS boot.  Anyway it's working now, so I can't complain!

Our installer will install everything so that the device can boot in EFI mode and legacy mode.

> If it helps, the new grub.cfg has set root='hd0,msdos5' while a backup copy
> from before had that using 'msdos4'.

We default to the classic MS-DOS partition table unless the disk is larger than 2TB, in which case we need to use GPT.
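The 2TB figure comes from the MBR format itself. Each MS-DOS partition entry stores its starting sector and length as 32-bit values, so with 512-byte sectors the label can address at most 2^32 × 512 bytes. The Python sketch below works through that arithmetic. `pick_label` is a hypothetical rule mirroring the behaviour described above; the thread does not state which exact threshold the installer compares against.

```python
SECTOR_SIZE = 512          # bytes; assumes a disk with 512-byte logical sectors
MBR_MAX_SECTORS = 2 ** 32  # MBR partition entries store 32-bit LBA values


def mbr_limit_bytes(sector_size: int = SECTOR_SIZE) -> int:
    """Largest disk size an MS-DOS (MBR) label can address with 32-bit LBAs."""
    return MBR_MAX_SECTORS * sector_size


def pick_label(disk_bytes: int, sector_size: int = SECTOR_SIZE) -> str:
    """Hypothetical: keep the MS-DOS label unless the disk exceeds the MBR limit."""
    return "gpt" if disk_bytes > mbr_limit_bytes(sector_size) else "msdos"
```

With 512-byte sectors the limit is 2,199,023,255,552 bytes, i.e. 2 TiB (about 2.2 TB). A 64 GB SSD like the one in this report therefore stays well inside MS-DOS territory.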
Comment 20 dnl 2024-02-13 06:24:45 UTC
Thank you again Michael.

Should I close this ticket now that it is resolved?
I'm not sure of the etiquette here.