[If you don't use the forcedeth driver, you can stop reading; this is just for people who might benefit from the research we did on this.]

For all of you with NVIDIA integrated ethernet adapters that use the forcedeth driver, I thought I'd share the results of some troubleshooting and benchmarking we did this week.

As you may know, the forcedeth driver has a provision to limit the amount of work done by the driver per invocation of the IRQ handler. As far as I can tell, this was implemented specifically because there is some broken hardware out there that can get a stuck IRQ, in which case the handler would never exit and would hang your machine.

In any case, while running some NFS benchmarks (using bonnie++), I was able to reliably crash my forcedeth-equipped NFS client machine VERY HARD, even on 2.6.22.5, which includes a forcedeth-related patch that partially addresses this behavior. To make a long story short, we discovered that the problem goes away if you raise the limit by adding the following to /etc/modprobe.conf:

    options forcedeth max_interrupt_work=20

My theory is that this is still in the spirit of the original implementation: if you have broken hardware with a stuck IRQ, one imagines it's likely to go through the loop FAR more than 20 times, so the mechanism will still prevent your whole system from hanging, while under heavy legitimate load the driver can now keep up. You could probably go even higher than 20 with no ill effects, but I've never hit the 20 limit even under heavy load, so it seemed like a good place to stop.

*Maybe even more interestingly: increasing max_interrupt_work boosted throughput significantly on this adapter; where before it was pushing about 30-40 Mbit/s, it now does 50-60 Mbit/s.*

So if you've got one of these motherboards with forcedeth ethernet, especially in a server, try increasing your max_interrupt_work parameter for MUCH better performance and stability.

- P
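To make the mechanism concrete, here's a minimal, runnable C sketch of the bounded-work pattern described above. This is NOT the actual forcedeth source; all the names (pending_events, handle_one_event, irq_handler) are made up for illustration. The idea is simply that the handler drains at most max_interrupt_work events per invocation, so a stuck IRQ line can't spin forever inside the handler, but a low limit makes a busy NIC take many invocations to drain its queue:

    #include <stdio.h>

    #define MAX_INTERRUPT_WORK 20  /* analogous to the module parameter */

    /* Pretend "hardware": number of events still pending on the NIC. */
    static int pending_events = 50;

    static int events_pending(void)    { return pending_events > 0; }
    static void handle_one_event(void) { pending_events--; }

    /* Returns 1 if the handler bailed out at the work limit. */
    static int irq_handler(void)
    {
        int work = 0;

        while (events_pending()) {
            handle_one_event();
            if (++work >= MAX_INTERRUPT_WORK)
                return 1;  /* cap reached: bail so a stuck IRQ can't hang the box */
        }
        return 0;  /* drained everything within the budget */
    }

    int main(void)
    {
        int invocation = 0;

        /* Keep re-invoking the handler until the queue is empty, the way
         * repeated interrupts would on real hardware. */
        while (events_pending()) {
            int capped = irq_handler();
            printf("invocation %d: %s, %d event(s) left\n",
                   ++invocation,
                   capped ? "hit work limit" : "queue drained",
                   pending_events);
        }
        return 0;
    }

With 50 pending events and a limit of 20, this takes three invocations to drain the queue; shrink the limit and the invocation count climbs, which is the overhead that raising max_interrupt_work avoids under load. If you want to check which parameters your particular forcedeth build accepts, running "modinfo forcedeth" should list them.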