Hey #Linux friends out there - I could use some opinions / input on something I’ve been brooding over for a few days!

I have a small Intel N100-based server running various services / automations at my parents’ house. The box has two NICs running as a transparent bridge with some filtering and other network management applied.

Both NICs are identical Realtek on-board chips (10ec:8168 / sub: 10ec:0123) normally running on the in-tree #r8169 driver on kernel 6.12.6:

r8169 0000:01:00.0 eth0: RTL8168h/8111h, XX:XX:XX:XX:XX:XX, XID 541, IRQ 142
r8169 0000:01:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]

r8169 0000:03:00.0 eth1: jumbo features [frames: 9194 bytes, tx checksumming: ko]


One of them (eth1 / enp3s0) is regularly tossing me these errors:

r8169 0000:03:00.0 enp3s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
r8169 0000:03:00.0 enp3s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
r8169 0000:03:00.0 enp3s0: NETDEV WATCHDOG: CPU: 2: transmit queue 0 timed out 5317 ms


As far as I can tell, when this happens half of the bridge silently stops working. I can reach the PC from “my side”, which is connected to a router on eth0 / enp1s0, but devices on “the other side” are unreachable until I reboot.
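
For anyone who wants to track how often this happens per interface, here is a minimal Python sketch. It assumes the kernel messages look exactly like the lines quoted above; the regex and the function name `count_r8169_errors` are only illustrative, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Pattern modelled on the r8169 log lines quoted in this post (an assumption:
# other kernels or drivers may format these messages differently).
R8169_ERROR = re.compile(
    r"r8169 (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (?P<iface>\S+): "
    r"(?P<event>rtl_\w+_cond == 0|NETDEV WATCHDOG)"
)


def count_r8169_errors(lines):
    """Count r8169 error events per (interface, event) pair."""
    counts = Counter()
    for line in lines:
        match = R8169_ERROR.search(line)
        if match:
            counts[(match["iface"], match["event"])] += 1
    return counts
```

It takes any iterable of lines, so something like `count_r8169_errors(sys.stdin)` fed from `journalctl -k` should work for a quick tally.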

Searching online wasn't very helpful, as the main solution other users with this issue get is “replace the NIC with something not Realtek!” - yeah, no, I can’t.

There are also bug reports on kernel.org going as far back as 2020, but no clear solutions.

I turned off ASPM on the system and eventually switched to the r8168-DKMS drivers. On #r8168 the link will go down for 1-3 seconds but then fully recover. Not a great solution but a workaround I can live with for now.
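
For double-checking the ASPM part, a small Python sketch, assuming the kernel exposes its global policy at `/sys/module/pcie_aspm/parameters/policy` with the active entry in square brackets. The helper names are made up for illustration. This only reports the kernel-wide policy, not what the BIOS or an individual device has negotiated; for that, the `LnkCtl` line in `lspci -vv` is probably the better place to look.

```python
from pathlib import Path

# Assumed sysfs location of the kernel's global ASPM policy.
ASPM_POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(policy_text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    for token in policy_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_active_aspm_policy(path=ASPM_POLICY_PATH):
    """Read the policy file; return None if it is missing or unreadable."""
    try:
        return active_aspm_policy(path.read_text())
    except OSError:
        return None
```

Calling `read_active_aspm_policy()` on the box returns the currently selected policy name, or `None` if the file is not there.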

Anyone got any ideas / similar experiences that could help shed some light on the problem?

in reply to Heals

Update: even running a constant ping on the affected NIC doesn't prevent the sporadic errors, which still show up every 2-5 hours. If anything, I almost want to say it causes more.

The only good news for now is that r8168 simply triggers an interface link down/up within a few seconds and everything keeps running normally, whereas r8169 left the NIC stuck in an unusable up state, constantly logging errors.

Seems I'll have to take the house offline sometime tonight and check the BIOS settings.

#Linux #r8169 #r8168