[If you don't use the forcedeth driver, you can stop reading; this
is just for people who might benefit from the research we did on this.]
For all of you with NVIDIA integrated ethernet adapters which use
the forcedeth driver, I thought I'd share the results of some
troubleshooting and benchmarking we did this week.
As you may know, the forcedeth driver has a provision to limit the
amount of work done by the driver per invocation of the irq handler.
As far as I can tell, this was implemented specifically because
some broken hardware out there can get a stuck irq, in which case
the handler would never exit and would hang your machine.
In any case, in doing some nfs benchmarks (using bonnie++), I was
able to reliably crash my forcedeth-equipped nfs client machine VERY
HARD, even using 18.104.22.168 which includes a forcedeth-related patch
to partially address this behavior.
To make a long story short, we discovered that this problem goes
away if you increase the limit by setting the following option in
your module configuration:
    options forcedeth max_interrupt_work=20
My theory is that this is still in the spirit of the initial
implementation: if you have broken hardware with a stuck irq, one
imagines it's likely to go through the loop FAR more than 20 times,
so the mechanism will still prevent your whole system from hanging.
With working hardware under heavy load, though, the driver will
still perform fine. You could probably even go higher than 20 with
no ill effects, but I've never hit the limit of 20 even under heavy
load, so it seemed like a good place to stop.
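To illustrate the reasoning above, here is a toy Python model of a
work-limited handler loop. This is only a sketch, not the forcedeth
code: the function names, the "12 rounds of pending work" figure and
the smaller limit of 5 are all made up for the example. It just shows
why a limit of 20 can still catch a stuck irq while leaving a busy but
healthy device alone.

```python
def run_irq_handler(has_pending_events, max_interrupt_work):
    """Toy model of an interrupt handler with a per-invocation work limit.

    has_pending_events: callable returning True while the simulated
    device still reports work to do (hypothetical stand-in for reading
    the device's event status).
    Returns (iterations_done, bailed_out).
    """
    iterations = 0
    while has_pending_events():
        if iterations >= max_interrupt_work:
            # Limit reached: give up instead of looping forever.
            return iterations, True
        iterations += 1  # one round of work handled
    return iterations, False


def busy_device(pending_rounds):
    """Healthy device: reports events for a finite number of rounds."""
    remaining = [pending_rounds]

    def has_pending_events():
        if remaining[0] > 0:
            remaining[0] -= 1
            return True
        return False

    return has_pending_events


def stuck_device():
    """Broken device: the irq never clears, so events are always pending."""
    return lambda: True


# Assumed numbers, for illustration only.
print(run_irq_handler(busy_device(12), 20))  # healthy, generous limit
print(run_irq_handler(busy_device(12), 5))   # healthy, limit too low
print(run_irq_handler(stuck_device(), 20))   # stuck irq, limit still trips
```

In this model the healthy device finishes its 12 rounds under a limit
of 20, gets cut off early under a limit of 5, and the stuck device is
stopped at 20 either way.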
*Maybe even more interestingly: Increasing max_interrupt_work seemed
to boost performance significantly on the adapter; where before it
was pushing about 30-40Mbit out the adapter, it is now 50-60Mbit.*
So if you've got one of these motherboards with forcedeth ethernet,
especially in a server, try increasing your max_interrupt_work
parameter for MUCH better performance and stability.