Bug 12760

Summary: kernel error from CU 162 & 163, PC Engines apu4
Product: IPFire Reporter: Jon <jon.murphy>
Component: --- Assignee: Michael Tremer <michael.tremer>
Status: CLOSED FIXED QA Contact: Arne.F <arne.fitzenreiter>
Severity: - Unknown -    
Priority: - Unknown - CC: peter.mueller
Version: 2   
Hardware: x86_64   
OS: Unspecified   

Description Jon 2022-01-11 04:18:44 UTC
Kernel error


System version

IPFire version	IPFire 2.27 (x86_64) - core162
Pakfire version	2.27-x86_64
Kernel version	Linux ipfire.localdomain 5.15.6-ipfire #1 SMP Sun Dec 19 10:54:26 GMT 2021 x86_64 AMD GX-412TC SOC AuthenticAMD GNU/Linux


==============

Jan  7 22:09:40 ipfire kernel: ------------[ cut here ]------------
Jan  7 22:09:40 ipfire kernel: refcount_t: underflow; use-after-free.
Jan  7 22:09:40 ipfire kernel: WARNING: CPU: 1 PID: 17992 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
Jan  7 22:09:40 ipfire kernel: Modules linked in: ledtrig_netdev ledtrig_heartbeat tun nfnetlink_queue xt_NFQUEUE xt_MASQUERADE cfg80211 rfkill 8021q garp xt_ipp2p(O) compat_xtables(O) xt_geoip(O) xt_multiport xt_mac xt_REDIRECT xt_hashlimit xt_policy xt_TCPMSS xt_conntrack xt_comment ipt_REJECT nf_reject_ipv4 xt_LOG xt_limit xt_mark xt_connmark nf_log_syslog iptable_raw iptable_mangle iptable_filter vfat fat amd64_edac edac_mce_amd kvm_amd sch_fq_codel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr sp5100_tco fam15h_power k10temp i2c_piix4 igb ptp pps_core dca ccp i2c_algo_bit i2c_core leds_gpio pinctrl_amd gpio_keys acpi_cpufreq lp parport_pc parport video sdhci_pci cqhci sdhci mmc_core
Jan  7 22:09:40 ipfire kernel: CPU: 1 PID: 17992 Comm: W-NFQ#1 Tainted: G           O      5.15.6-ipfire #1
Jan  7 22:09:40 ipfire kernel: Hardware name: PC Engines apu4/apu4, BIOS v4.15.0.1 11/23/2021
Jan  7 22:09:40 ipfire kernel: RIP: 0010:refcount_warn_saturate+0xba/0x110
Jan  7 22:09:40 ipfire kernel: Code: 01 01 e8 32 91 59 00 0f 0b 31 f6 89 f7 c3 80 3d 14 8d 39 01 00 75 85 48 c7 c7 98 f6 f3 8b c6 05 04 8d 39 01 01 e8 0f 91 59 00 <0f> 0b 31 f6 89 f7 c3 80 3d ef 8c 39 01 00 0f 85 5e ff ff ff 48 c7
Jan  7 22:09:40 ipfire kernel: RSP: 0018:ffffb583c8c4f868 EFLAGS: 00010246
Jan  7 22:09:40 ipfire kernel: RAX: 0000000000000000 RBX: ffffa0f9c1bf8200 RCX: 0000000000000000
Jan  7 22:09:40 ipfire kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jan  7 22:09:40 ipfire kernel: RBP: ffffa0f9c1bf8200 R08: 0000000000000000 R09: 0000000000000000
Jan  7 22:09:40 ipfire kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0f9c1bf8200
Jan  7 22:09:40 ipfire kernel: R13: 0000000000000000 R14: 0000000000000005 R15: ffffa0f9c3b12100
Jan  7 22:09:40 ipfire kernel: FS:  000079dc0ba4e640(0000) GS:ffffa0f9eac80000(0000) knlGS:0000000000000000
Jan  7 22:09:40 ipfire kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan  7 22:09:40 ipfire kernel: CR2: 0000716fa42b7bc8 CR3: 0000000103486000 CR4: 00000000000406e0
Jan  7 22:09:40 ipfire kernel: Call Trace:
Jan  7 22:09:40 ipfire kernel:  <TASK>
Jan  7 22:09:40 ipfire kernel:  nf_queue_entry_release_refs+0x8b/0xa0
Jan  7 22:09:40 ipfire kernel:  nf_reinject+0x7a/0x1e0
Jan  7 22:09:40 ipfire kernel:  nfqnl_recv_verdict+0x303/0x4f0 [nfnetlink_queue]
Jan  7 22:09:40 ipfire kernel:  nfnetlink_rcv_msg+0x251/0x310
Jan  7 22:09:40 ipfire kernel:  ? nfnetlink_net_init+0xa0/0xa0
Jan  7 22:09:40 ipfire kernel:  netlink_rcv_skb+0x5b/0x110
Jan  7 22:09:40 ipfire kernel:  netlink_unicast+0x215/0x2e0
Jan  7 22:09:40 ipfire kernel:  netlink_sendmsg+0x233/0x480
Jan  7 22:09:40 ipfire kernel:  ? netlink_unicast+0x2e0/0x2e0
Jan  7 22:09:40 ipfire kernel:  ____sys_sendmsg+0x2a6/0x2e0
Jan  7 22:09:40 ipfire kernel:  ___sys_sendmsg+0xa3/0x100
Jan  7 22:09:40 ipfire kernel:  __sys_sendmsg+0x81/0xe0
Jan  7 22:09:40 ipfire kernel:  do_syscall_64+0x5c/0x90
Jan  7 22:09:40 ipfire kernel:  ? syscall_exit_to_user_mode+0x23/0x50
Jan  7 22:09:40 ipfire kernel:  ? __x64_sys_recvfrom+0x20/0x40
Jan  7 22:09:40 ipfire kernel:  ? do_syscall_64+0x69/0x90
Jan  7 22:09:40 ipfire kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Jan  7 22:09:40 ipfire kernel: RIP: 0033:0x79dc0e2eb62d
Jan  7 22:09:40 ipfire kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 fa ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 4e ef ff ff 48
Jan  7 22:09:40 ipfire kernel: RSP: 002b:000079dc0ba4bf40 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
Jan  7 22:09:40 ipfire kernel: RAX: ffffffffffffffda RBX: 000079dbfc268dd0 RCX: 000079dc0e2eb62d
Jan  7 22:09:40 ipfire kernel: RDX: 0000000000000000 RSI: 000079dc0ba4bf80 RDI: 0000000000000006
Jan  7 22:09:40 ipfire kernel: RBP: 000079dc0ba4bfe0 R08: 0000000000000000 R09: 000079dc0d9c9de0
Jan  7 22:09:40 ipfire kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Jan  7 22:09:40 ipfire kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000080
Jan  7 22:09:40 ipfire kernel:  </TASK>
Jan  7 22:09:40 ipfire kernel: ---[ end trace 9676210543ac4447 ]---
Comment 1 Jon 2022-02-15 02:50:34 UTC
Another kernel error.  This time on Core Update 163.

System versions

IPFire version	IPFire 2.27 (x86_64) - core163
Pakfire version	2.27-x86_64
Kernel version	Linux ipfire.localdomain 5.15.6-ipfire #1 SMP Sun Dec 19 10:54:26 GMT 2021 x86_64 AMD GX-412TC SOC AuthenticAMD GNU/Linux

======

Feb 14 18:17:19 ipfire kernel: ------------[ cut here ]------------
Feb 14 18:17:19 ipfire kernel: refcount_t: underflow; use-after-free.
Feb 14 18:17:19 ipfire kernel: WARNING: CPU: 2 PID: 12784 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
Feb 14 18:17:19 ipfire kernel: Modules linked in: ledtrig_netdev ledtrig_heartbeat tun xt_NFQUEUE nfnetlink_queue xt_MASQUERADE cfg80211 rfkill 8021q garp xt_ipp2p(O) compat_xtables(O) xt_geoip(O) xt_multiport xt_mac xt_REDIRECT xt_hashlimit xt_policy xt_TCPMSS xt_conntrack xt_comment ipt_REJECT nf_reject_ipv4 xt_LOG xt_limit xt_mark xt_connmark nf_log_syslog iptable_raw iptable_mangle iptable_filter vfat fat amd64_edac edac_mce_amd kvm_amd sch_cake kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel igb sp5100_tco pcspkr fam15h_power i2c_piix4 k10temp ptp pps_core dca i2c_algo_bit i2c_core ccp leds_gpio gpio_keys pinctrl_amd acpi_cpufreq lp parport_pc parport video sdhci_pci cqhci sdhci mmc_core
Feb 14 18:17:19 ipfire kernel: CPU: 2 PID: 12784 Comm: W-NFQ#1 Tainted: G           O      5.15.6-ipfire #1
Feb 14 18:17:19 ipfire kernel: Hardware name: PC Engines apu4/apu4, BIOS v4.15.0.1 11/23/2021
Feb 14 18:17:19 ipfire kernel: RIP: 0010:refcount_warn_saturate+0xba/0x110
Feb 14 18:17:19 ipfire kernel: Code: 01 01 e8 32 91 59 00 0f 0b 31 f6 89 f7 c3 80 3d 14 8d 39 01 00 75 85 48 c7 c7 98 f6 73 90 c6 05 04 8d 39 01 01 e8 0f 91 59 00 <0f> 0b 31 f6 89 f7 c3 80 3d ef 8c 39 01 00 0f 85 5e ff ff ff 48 c7
Feb 14 18:17:19 ipfire kernel: RSP: 0018:ffffac12001df888 EFLAGS: 00010246
Feb 14 18:17:19 ipfire kernel: RAX: 0000000000000000 RBX: ffffa1c08c564d80 RCX: 0000000000000000
Feb 14 18:17:19 ipfire kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Feb 14 18:17:19 ipfire kernel: RBP: ffffa1c08c564d80 R08: 0000000000000000 R09: 0000000000000000
Feb 14 18:17:19 ipfire kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffa1c08c564d80
Feb 14 18:17:19 ipfire kernel: R13: ffffa1c08c564db0 R14: 0000000000000006 R15: ffffa1c083385000
Feb 14 18:17:19 ipfire kernel: FS:  000078cd4e048640(0000) GS:ffffa1c0aad00000(0000) knlGS:0000000000000000
Feb 14 18:17:19 ipfire kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 14 18:17:19 ipfire kernel: CR2: 0000000000b15ca8 CR3: 000000010c2d6000 CR4: 00000000000406e0
Feb 14 18:17:19 ipfire kernel: Call Trace:
Feb 14 18:17:19 ipfire kernel:  <TASK>
Feb 14 18:17:19 ipfire kernel:  nf_queue_entry_release_refs+0x8b/0xa0
Feb 14 18:17:19 ipfire kernel:  nf_reinject+0x7a/0x1e0
Feb 14 18:17:19 ipfire kernel:  nfqnl_recv_verdict+0x303/0x4f0 [nfnetlink_queue]
Feb 14 18:17:19 ipfire kernel:  nfnetlink_rcv_msg+0x251/0x310
Feb 14 18:17:19 ipfire kernel:  ? nfnetlink_net_init+0xa0/0xa0
Feb 14 18:17:19 ipfire kernel:  netlink_rcv_skb+0x5b/0x110
Feb 14 18:17:19 ipfire kernel:  netlink_unicast+0x215/0x2e0
Feb 14 18:17:19 ipfire kernel:  netlink_sendmsg+0x233/0x480
Feb 14 18:17:19 ipfire kernel:  ? netlink_unicast+0x2e0/0x2e0
Feb 14 18:17:19 ipfire kernel:  ____sys_sendmsg+0x2a6/0x2e0
Feb 14 18:17:19 ipfire kernel:  ___sys_sendmsg+0xa3/0x100
Feb 14 18:17:19 ipfire kernel:  __sys_sendmsg+0x81/0xe0
Feb 14 18:17:19 ipfire kernel:  do_syscall_64+0x5c/0x90
Feb 14 18:17:19 ipfire kernel:  ? __do_softirq+0xc6/0x27e
Feb 14 18:17:19 ipfire kernel:  ? irq_exit_rcu+0x3e/0xb0
Feb 14 18:17:19 ipfire kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Feb 14 18:17:19 ipfire kernel: RIP: 0033:0x78cd508e562d
Feb 14 18:17:19 ipfire kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 fa ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 4e ef ff ff 48
Feb 14 18:17:19 ipfire kernel: RSP: 002b:000078cd4e045f40 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
Feb 14 18:17:19 ipfire kernel: RAX: ffffffffffffffda RBX: 000078cd40268dd0 RCX: 000078cd508e562d
Feb 14 18:17:19 ipfire kernel: RDX: 0000000000000000 RSI: 000078cd4e045f80 RDI: 0000000000000006
Feb 14 18:17:19 ipfire kernel: RBP: 000078cd4e045fe0 R08: 0000000000000000 R09: 000078cd4ffc3de0
Feb 14 18:17:19 ipfire kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Feb 14 18:17:19 ipfire kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000080
Feb 14 18:17:19 ipfire kernel:  </TASK>
Feb 14 18:17:19 ipfire kernel: ---[ end trace 451aae9999ce3052 ]---
Comment 2 Michael Tremer 2022-02-15 10:49:16 UTC
This is exactly the same problem as before. Just to confirm :)
Comment 3 Jon 2022-02-15 16:53:59 UTC
I believe you!  I just wish I understood how you know that!!

Do you want me to keep adding kernel errors?  (these are the only two I have seen this year)
Comment 4 Michael Tremer 2022-02-15 17:27:06 UTC
(In reply to Jon from comment #3)
> I believe you!  I just wish I understood how you know that!!

The functions in the call trace are the same :)

> Do you want me to keep adding kernel errors?  (these are the only two I have
> seen this year)

Yes, I would like to know if this problem still happens with every kernel update. These warnings are not a high priority for me since they do not cause any crashes. The worst case is that a packet gets dropped and the system returns to normal operation.
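
For anyone who wants to check "the functions in the call trace are the same" without reading the logs by eye, here is a minimal Python sketch. It assumes the syslog format shown in this report; the names FRAME_RE and trace_functions are made up for illustration. It keeps only the function names between <TASK> and </TASK>, ignores the offsets (which can shift between kernel builds) and skips frames marked with "?", which the kernel flags as unreliable.

```python
import re

# Matches a call-trace frame such as "  nf_reinject+0x7a/0x1e0" or
# "  ? irq_exit_rcu+0x3e/0xb0" following the "kernel:" syslog prefix.
FRAME_RE = re.compile(
    r"kernel:\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_functions(log_text):
    """Return the reliable (non-'?') function names between <TASK> and </TASK>."""
    names = []
    in_trace = False
    for line in log_text.splitlines():
        if "<TASK>" in line:
            in_trace = True
            continue
        if "</TASK>" in line:
            break
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        # group(1) is set only for "?" frames, which are skipped.
        if match and match.group(1) is None:
            names.append(match.group(2))
    return names
```

Two reports describe the same code path if trace_functions() returns the same list for both log excerpts. Register dumps and "Code:" lines inside the trace do not match the pattern and are ignored.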
Comment 5 Jon 2022-03-22 17:54:48 UTC
Another similar kernel error...

System versions

IPFire version	IPFire 2.27 (x86_64) - core164
Pakfire version	2.27-x86_64
Kernel version	Linux ipfire.localdomain 5.15.23-ipfire #1 SMP Wed Mar 9 16:56:16 GMT 2022 x86_64 AMD GX-412TC SOC AuthenticAMD GNU/Linux

==============

Mar 21 19:22:46 ipfire kernel: ------------[ cut here ]------------
Mar 21 19:22:46 ipfire kernel: refcount_t: underflow; use-after-free.
Mar 21 19:22:46 ipfire kernel: WARNING: CPU: 3 PID: 13171 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
Mar 21 19:22:46 ipfire kernel: Modules linked in: xt_NFQUEUE nfnetlink_queue nf_conntrack_netlink ledtrig_netdev ledtrig_heartbeat tun act_mirred act_connmark em_ipt act_gact cls_basic ifb sch_ingress xt_layer7 cls_u32 sch_htb xt_MASQUERADE cfg80211 rfkill 8021q garp xt_ipp2p(O) compat_xtables(O) xt_multiport xt_mac xt_REDIRECT xt_hashlimit xt_policy xt_geoip(O) xt_TCPMSS xt_conntrack xt_comment ipt_REJECT nf_reject_ipv4 xt_LOG xt_limit xt_mark xt_connmark nf_log_syslog iptable_raw iptable_mangle iptable_filter vfat fat amd64_edac edac_mce_amd kvm_amd sch_cake kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sp5100_tco pcspkr igb k10temp fam15h_power ptp i2c_piix4 pps_core dca i2c_algo_bit ccp i2c_core gpio_keys pinctrl_amd leds_gpio acpi_cpufreq lp parport_pc parport video sdhci_pci cqhci sdhci mmc_core
Mar 21 19:22:46 ipfire kernel: CPU: 3 PID: 13171 Comm: W-NFQ#1 Tainted: G           O      5.15.23-ipfire #1
Mar 21 19:22:46 ipfire kernel: Hardware name: PC Engines apu4/apu4, BIOS v4.15.0.1 11/23/2021
Mar 21 19:22:46 ipfire kernel: RIP: 0010:refcount_warn_saturate+0xba/0x110
Mar 21 19:22:46 ipfire kernel: Code: 01 01 e8 71 9d 59 00 0f 0b 31 f6 89 f7 c3 80 3d 04 92 39 01 00 75 85 48 c7 c7 d0 04 34 b5 c6 05 f4 91 39 01 01 e8 4e 9d 59 00 <0f> 0b 31 f6 89 f7 c3 80 3d df 91 39 01 00 0f 85 5e ff ff ff 48 c7
Mar 21 19:22:46 ipfire kernel: RSP: 0018:ffffb1390c0df8d8 EFLAGS: 00010246
Mar 21 19:22:46 ipfire kernel: RAX: 0000000000000000 RBX: ffff9f3084afca80 RCX: 0000000000000000
Mar 21 19:22:46 ipfire kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Mar 21 19:22:46 ipfire kernel: RBP: ffff9f3084afca80 R08: 0000000000000000 R09: 0000000000000000
Mar 21 19:22:46 ipfire kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f3084afca80
Mar 21 19:22:46 ipfire kernel: R13: ffff9f3084afcab0 R14: 0000000000000006 R15: ffff9f3091f55f00
Mar 21 19:22:46 ipfire kernel: FS:  00007c6e3991a640(0000) GS:ffff9f30aad80000(0000) knlGS:0000000000000000
Mar 21 19:22:46 ipfire kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 21 19:22:46 ipfire kernel: CR2: 0000716e97369080 CR3: 0000000101a24000 CR4: 00000000000406e0
Mar 21 19:22:46 ipfire kernel: Call Trace:
Mar 21 19:22:46 ipfire kernel:  <TASK>
Mar 21 19:22:46 ipfire kernel:  nf_queue_entry_release_refs+0x8b/0xa0
Mar 21 19:22:46 ipfire kernel:  nf_reinject+0x7a/0x1e0
Mar 21 19:22:46 ipfire kernel:  nfqnl_recv_verdict+0x303/0x4f0 [nfnetlink_queue]
Mar 21 19:22:46 ipfire kernel:  nfnetlink_rcv_msg+0x251/0x310
Mar 21 19:22:46 ipfire kernel:  ? nfnetlink_net_init+0xa0/0xa0
Mar 21 19:22:46 ipfire kernel:  netlink_rcv_skb+0x5b/0x110
Mar 21 19:22:46 ipfire kernel:  netlink_unicast+0x215/0x2e0
Mar 21 19:22:46 ipfire kernel:  netlink_sendmsg+0x240/0x4b0
Mar 21 19:22:46 ipfire kernel:  ? netlink_unicast+0x2e0/0x2e0
Mar 21 19:22:46 ipfire kernel:  ____sys_sendmsg+0x2a6/0x2e0
Mar 21 19:22:46 ipfire kernel:  ___sys_sendmsg+0xa3/0x100
Mar 21 19:22:46 ipfire kernel:  __sys_sendmsg+0x81/0xe0
Mar 21 19:22:46 ipfire kernel:  do_syscall_64+0x5c/0x90
Mar 21 19:22:46 ipfire kernel:  ? irq_exit_rcu+0x3e/0xb0
Mar 21 19:22:46 ipfire kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Mar 21 19:22:46 ipfire kernel: RIP: 0033:0x7c6e3c1b762d
Mar 21 19:22:46 ipfire kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 fa ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 4e ef ff ff 48
Mar 21 19:22:46 ipfire kernel: RSP: 002b:00007c6e39917f40 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
Mar 21 19:22:46 ipfire kernel: RAX: ffffffffffffffda RBX: 00007c6e2c269020 RCX: 00007c6e3c1b762d
Mar 21 19:22:46 ipfire kernel: RDX: 0000000000000000 RSI: 00007c6e39917f80 RDI: 0000000000000006
Mar 21 19:22:46 ipfire kernel: RBP: 00007c6e39917fe0 R08: 0000000000000000 R09: 00007c6e3b895de0
Mar 21 19:22:46 ipfire kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Mar 21 19:22:46 ipfire kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000080
Mar 21 19:22:46 ipfire kernel:  </TASK>
Mar 21 19:22:46 ipfire kernel: ---[ end trace fb242a6769a3455c ]---
Comment 6 Michael Tremer 2022-03-23 11:16:01 UTC
They are all the same and I believe that I have a patch here that fixes it:

> https://git.ipfire.org/?p=people/ms/linux.git;a=commitdiff;h=4ecd5474b7a19aa84158f8e727fa6dbfc9464191

It looks like the RCU lock is not being held while nf_reinject is called, which is then causing this problem.

@Arne: Could you merge this into the kernel so that we can test this and see if it resolves the problem? If so, I will submit it upstream as soon as possible.
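
As background on the "refcount_t: underflow; use-after-free" line in the logs above: the warning is raised when a reference is dropped on a counter that has no references left. The following is only a toy Python model of that idea, with hypothetical names (ToyRefcount, release_twice). It is not the kernel's refcount_t implementation and it does not model RCU or the patch linked above.

```python
class ToyRefcount:
    """Toy counter that mimics the warn-on-underflow idea, not the kernel's code."""

    def __init__(self, initial=1):
        self.count = initial
        self.warnings = []

    def get(self):
        """Take one more reference."""
        self.count += 1

    def put(self):
        """Drop one reference; return True when the last reference is gone."""
        if self.count <= 0:
            # Nothing left to drop: the object may already have been freed,
            # so record a warning instead of going negative.
            self.warnings.append("underflow; use-after-free")
            return False
        self.count -= 1
        return self.count == 0


def release_twice():
    """Drop the only reference twice to provoke the toy warning."""
    ref = ToyRefcount(initial=1)
    first = ref.put()   # last reference dropped, object would be freed here
    second = ref.put()  # one drop too many, triggers the warning
    return first, second, ref.warnings
```

In this model the second put() is the analogue of the release that shows up under nf_queue_entry_release_refs in the traces: the counter is already at zero, so the drop is reported rather than applied.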