Bug 12703 - e1000e Network card hang driver issue
Summary: e1000e Network card hang driver issue
Status: NEW
Alias: None
Product: IPFire
Classification: Unclassified
Component: ---
Version: 2
Hardware: x86_64 Linux
Importance: - Unknown - Crash
Assignee: Arne.F
Reported: 2021-10-08 23:53 UTC by Stefan
Modified: 2022-10-27 12:05 UTC

Description Stefan 2021-10-08 23:53:27 UTC
For the last couple of IPFire versions, my e1000e network card has been hanging sporadically. Only a restart helps,

or entering the command ethtool -K green0 gso off gro off tso off

After that it runs stably until the next restart.
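
Since the ethtool workaround has to be re-entered after every restart, it could be wrapped in a small script and called from a boot-time hook. This is only a minimal sketch, assuming a POSIX shell: the function names and the DRY_RUN variable are hypothetical, and where to hook it in depends on the system.

```shell
#!/bin/sh
# Sketch only: offload_cmd, apply_offload_workaround and DRY_RUN are
# hypothetical names, not part of IPFire or ethtool.

# Print the ethtool invocation from the report for a given interface.
offload_cmd() {
    printf 'ethtool -K %s gso off gro off tso off\n' "$1"
}

# Disable GSO, GRO and TSO on the interface.
# With DRY_RUN=1 the command is only printed, not executed.
apply_offload_workaround() {
    iface="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        offload_cmd "$iface"
    else
        ethtool -K "$iface" gso off gro off tso off
    fi
}

# Example call from a startup script (needs root):
#   apply_offload_workaround green0
```

The current offload settings can be checked afterwards with ethtool -k green0 (lowercase k).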

Here is the kernel log from when the network card hangs:

e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:48	kernel:	TDH <c8>
00:03:48	kernel:	TDT <e9>
00:03:48	kernel:	next_to_use <e9>
00:03:48	kernel:	next_to_clean <c8>
00:03:48	kernel:	buffer_info[next_to_clean]:
00:03:48	kernel:	time_stamp <1017eeb1c>
00:03:48	kernel:	next_to_watch <c8>
00:03:48	kernel:	jiffies <1017eebc0>
00:03:48	kernel:	next_to_watch.status <0>
00:03:48	kernel:	MAC Status <40080083>
00:03:48	kernel:	PHY Status <796d>
00:03:48	kernel:	PHY 1000BASE-T Status <3800>
00:03:48	kernel:	PHY Extended Status <3000>
00:03:48	kernel:	PCI Status <10>
00:03:50	kernel:	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:50	kernel:	TDH <c8>
00:03:50	kernel:	TDT <e9>
00:03:50	kernel:	next_to_use <e9>
00:03:50	kernel:	next_to_clean <c8>
00:03:50	kernel:	buffer_info[next_to_clean]:
00:03:50	kernel:	time_stamp <1017eeb1c>
00:03:50	kernel:	next_to_watch <c8>
00:03:50	kernel:	jiffies <1017eec88>
00:03:50	kernel:	next_to_watch.status <0>
00:03:50	kernel:	MAC Status <40080083>
00:03:50	kernel:	PHY Status <796d>
00:03:50	kernel:	PHY 1000BASE-T Status <3800>
00:03:50	kernel:	PHY Extended Status <3000>
00:03:50	kernel:	PCI Status <10>
00:03:52	kernel:	e1000e 0000:00:19.0 green0: Reset adapter unexpectedly
00:03:56	kernel:	e1000e 0000:00:19.0 green0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
00:03:58	kernel:	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:58	kernel:	TDH <0>
00:03:58	kernel:	TDT <9>
00:03:58	kernel:	next_to_use <9>
00:03:58	kernel:	next_to_clean <0>
00:03:58	kernel:	buffer_info[next_to_clean]:
00:03:58	kernel:	time_stamp <1017eeee9>
00:03:58	kernel:	next_to_watch <0>
00:03:58	kernel:	jiffies <1017eefa8>
00:03:58	kernel:	next_to_watch.status <0>
00:03:58	kernel:	MAC Status <40080083>
00:03:58	kernel:	PHY Status <796d>
00:03:58	kernel:	PHY 1000BASE-T Status <3800>
00:03:58	kernel:	PHY Extended Status <3000>
00:03:58	kernel:	PCI Status <10>
00:04:00	kernel:	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
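
The hexadecimal values in the hang reports above can be compared directly. TDH and TDT are the transmit descriptor head and tail, and jiffies minus time_stamp shows how long the oldest pending buffer has been waiting. This is a minimal sketch, assuming a shell with 64-bit arithmetic; hex_diff is a hypothetical helper. It does not handle ring wrap-around, and converting jiffies to seconds would need the kernel's HZ value, which the report does not state.

```shell
#!/bin/sh
# hex_diff A B: print A - B in decimal.
# A and B are hex strings without the 0x prefix, as they appear in the log.
hex_diff() {
    echo $(( 0x$1 - 0x$2 ))
}

# First report (00:03:48): descriptors between head (TDH) and tail (TDT)
hex_diff e9 c8                  # prints 33

# First report: jiffies since the stuck buffer was time-stamped
hex_diff 1017eebc0 1017eeb1c    # prints 164

# Second report (00:03:50): same time_stamp, later jiffies
hex_diff 1017eec88 1017eeb1c    # prints 364
```

TDH and time_stamp are identical in the first two reports while jiffies advances, which fits a transmit ring that is not being cleaned.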
Comment 1 Michael Tremer 2021-10-09 12:10:18 UTC
This is quite a popular NIC which is running fine for probably most people.

Could you try a firmware update of the board? What is the rest of the hardware?
Comment 2 Stefan 2021-10-09 21:26:17 UTC
(In reply to Michael Tremer from comment #1)
> This is quite a popular NIC which is running fine for probably most people.
> 
> Could you try a firmware update of the board? What is the rest of the
> hardware?

First of all, I will do a BIOS update.

Then I will keep looking. In the meantime I work around it by entering the command after each new start, so that the card runs without any problems until the next restart.

If the error is not resolved, I will buy a new Intel network card (4x 10 Gbit/s).
Comment 4 Man Grove 2021-11-01 18:16:13 UTC
(In reply to Michael Tremer from comment #1)
> This is quite a popular NIC which is running fine for probably most people.
> 
> Could you try a firmware update of the board? What is the rest of the
> hardware?

This is old as sin for the e1000 and later e1000e. Check my comments and links in the thread, especially this:
https://community.ipfire.org/t/e1000e-green0-detected-hardware-unit-hang/6324/18?u=mangrove
Just google for the message and you will get hundreds of reports of this error from other projects.
I was also suffering from this (in IPFire) a couple of years ago, and I came to the conclusion that it's probably related to the physical chip revision or something like that, so not all instances are affected (I run lots of virtualized e1000 machines and not a single one has had this bug). My solution was to swap the NICs in the affected physical machine (a Dell Optiplex).
But it's a (driver|hardware|firmware) bug and a really old one at that.
Comment 5 ChrisK 2022-10-27 12:05:29 UTC
Could be related to the IOMMU problems like in Bug #12943.