12703 – e1000e Network card hang driver issue

Status:	CLOSED NOTABUG

Alias:	None

Product:	IPFire
Classification:	Unclassified
Component:	---
Version:	2
Hardware:	x86_64 Linux

Importance:	- Unknown - Crash
Assignee:	Arne.F
QA Contact:

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-10-08 23:53 UTC by Stefan
Modified:	2024-07-24 11:26 UTC
CC List:	4 users

See Also:

Description Stefan 2021-10-08 23:53:27 UTC

For the last couple of IPFire versions my e1000e network card has been hanging sporadically. Only a restart helps,

followed by entering the command ethtool -K green0 gso off gro off tso off

After that it runs stably until the next restart.
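
For anyone who has to re-enter this workaround after every boot, the command can be assembled and run from a small script. This is only a minimal sketch, assuming Python 3 and an `ethtool` binary on the PATH; the helper names `offload_off_command` and `apply_workaround` are hypothetical and not part of IPFire or ethtool.

```python
import shutil
import subprocess


def offload_off_command(iface="green0", features=("gso", "gro", "tso")):
    """Build the ethtool argv that switches the given offloads off."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def apply_workaround(iface="green0"):
    """Run the command if ethtool is installed (needs root). Returns the exit code or None."""
    if shutil.which("ethtool") is None:
        return None
    return subprocess.run(offload_off_command(iface), check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call apply_workaround() to actually run it.
    print(" ".join(offload_off_command()))
```

Where such a script would be hooked into the boot sequence depends on the system and is not covered here.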

Here is the kernel log from when the network card hangs:

e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:48	kernel:	TDH <c8>
00:03:48	kernel:	TDT <e9>
00:03:48	kernel:	next_to_use <e9>
00:03:48	kernel:	next_to_clean <c8>
00:03:48	kernel:	buffer_info[next_to_clean]:
00:03:48	kernel:	time_stamp <1017eeb1c>
00:03:48	kernel:	next_to_watch <c8>
00:03:48	kernel:	jiffies <1017eebc0>
00:03:48	kernel:	next_to_watch.status <0>
00:03:48	kernel:	MAC Status <40080083>
00:03:48	kernel:	PHY Status <796d>
00:03:48	kernel:	PHY 1000BASE-T Status <3800>
00:03:48	kernel:	PHY Extended Status <3000>
00:03:48	kernel:	PCI Status <10>
00:03:50	kernel:	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:50	kernel:	TDH <c8>
00:03:50	kernel:	TDT <e9>
00:03:50	kernel:	next_to_use <e9>
00:03:50	kernel:	next_to_clean <c8>
00:03:50	kernel:	buffer_info[next_to_clean]:
00:03:50	kernel:	time_stamp <1017eeb1c>
00:03:50	kernel:	next_to_watch <c8>
00:03:50	kernel:	jiffies <1017eec88>
00:03:50	kernel:	next_to_watch.status <0>
00:03:50	kernel:	MAC Status <40080083>
00:03:50	kernel:	PHY Status <796d>
00:03:50	kernel:	PHY 1000BASE-T Status <3800>
00:03:50	kernel:	PHY Extended Status <3000>
00:03:50	kernel:	PCI Status <10>
00:03:52	kernel:	e1000e 0000:00:19.0 green0: Reset adapter unexpectedly
00:03:56	kernel:	e1000e 0000:00:19.0 green0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
00:03:58	kernel:	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:58	kernel:	TDH <0>
00:03:58	kernel:	TDT <9>
00:03:58	kernel:	next_to_use <9>
00:03:58	kernel:	next_to_clean <0>
00:03:58	kernel:	buffer_info[next_to_clean]:
00:03:58	kernel:	time_stamp <1017eeee9>
00:03:58	kernel:	next_to_watch <0>
00:03:58	kernel:	jiffies <1017eefa8>
00:03:58	kernel:	next_to_watch.status <0>
00:03:58	kernel:	MAC Status <40080083>
00:03:58	kernel:	PHY Status <796d>
00:03:58	kernel:	PHY 1000BASE-T Status <3800>
00:03:58	kernel:	PHY Extended Status <3000>
00:03:58	kernel:	PCI Status <10>
00:04:00	kernel:	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
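
The values in these dumps are hexadecimal, so it can help to decode them when comparing reports. The following is a rough sketch, not a diagnostic tool: it assumes TDH/TDT are the transmit ring head and tail indices, and it assumes a ring of 256 descriptors (`ring_size` is a guess, the log does not state it). The function names are made up for illustration.

```python
import re

# Matches only the four fields of interest; values are printed as hex in <...>.
FIELD_RE = re.compile(r"\b(TDH|TDT|time_stamp|jiffies)\s+<([0-9a-fA-F]+)>")


def parse_hang(lines):
    """Collect TDH, TDT, time_stamp and jiffies from one hang dump as integers."""
    fields = {}
    for line in lines:
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def summarize(fields, ring_size=256):
    """Return (descriptors between head and tail, jiffies since time_stamp)."""
    pending = (fields["TDT"] - fields["TDH"]) % ring_size  # ring_size is assumed
    age = fields["jiffies"] - fields["time_stamp"]
    return pending, age


if __name__ == "__main__":
    first_dump = [
        "00:03:48\tkernel:\tTDH <c8>",
        "00:03:48\tkernel:\tTDT <e9>",
        "00:03:48\tkernel:\ttime_stamp <1017eeb1c>",
        "00:03:48\tkernel:\tjiffies <1017eebc0>",
    ]
    print(summarize(parse_hang(first_dump)))  # prints (33, 164)
```

For the first dump this gives 33 descriptors sitting between head and tail, with the oldest one stamped 164 jiffies before the message was logged. In the second dump TDH is still c8, which fits a transmit queue that has stopped making progress.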

Comment 1 Michael Tremer 2021-10-09 12:10:18 UTC

This is quite a popular NIC which is running fine for probably most people.

Could you try a firmware update of the board? What is the rest of the hardware?

Comment 2 Stefan 2021-10-09 21:26:17 UTC

(In reply to Michael Tremer from comment #1)
> This is quite a popular NIC which is running fine for probably most people.
> 
> Could you try a firmware update of the board? What is the rest of the
> hardware?

First of all I will do a BIOS update.

Until then I will keep working around it by entering the command after each start, so that the card runs without problems until the next restart.

If the error is not resolved, I will buy a new Intel network card (4x 10 Gbit/s).

Comment 3 Stefan 2021-10-09 21:27:52 UTC

This is my hardware:

https://www.supermicro.com/products/motherboard/Xeon/C202_C204/X9SCL.cfm

https://fireinfo.ipfire.org/profile/17d94e763f9ebbd470a3e57866faffd5b7caf12f

Comment 4 Man Grove 2021-11-01 18:16:13 UTC

(In reply to Michael Tremer from comment #1)
> This is quite a popular NIC which is running fine for probably most people.
> 
> Could you try a firmware update of the board? What is the rest of the
> hardware?

This is old as sin for the e1000 and later e1000e. Check my comments and links in the thread, especially this:
https://community.ipfire.org/t/e1000e-green0-detected-hardware-unit-hang/6324/18?u=mangrove
Just google for the message and you will get hundreds of reports of this error from other projects.
I was also suffering from this (in ipfire) a couple of years ago and then I came to the conclusion that it's probably physical chip revision related or something like that, so all instances aren't suffering (I run lots of virtualized e1000 machines and not a single one has had this bug). My solution was to swap my NICs in the affected physical machine (a Dell Optiplex).
But it's a (driver|hardware|firmware) bug and a really old one at that.

Comment 5 ChrisK 2022-10-27 12:05:29 UTC

Could be related to the IOMMU-problems like in Bug #12943.

Comment 6 Adolf Belka 2024-07-22 20:44:17 UTC

Is this bug still valid with Core Update 186?

Comment 7 Stefan 2024-07-22 20:52:47 UTC

I haven't had this problem for over a year.

It suddenly went away with some Core Update.