[arch-general] Unreachable system and general ARP weirdness

swh root at pwnly.com
Sat Sep 3 16:39:23 EDT 2011

Beginning a few months ago, two of the systems on my LAN, a desktop
and a laptop, have periodically become mutually uncommunicative. I
believe it started occurring when I did a massive -Syu on the laptop,
installing six months' worth of updates at once and effectively
upgrading half of the installed packages... but I'm not
positive. The issue still occurs regularly with both systems running
3.0.x kernels.

The issue typically occurs after the desktop has been up for a few
days, but on rare occasion it's present when the desktop boots. When
the issue does occur, other systems on the LAN are able to reach both
systems as per normal. It's a simple switched network, and both
systems have manually-assigned static IPs.

I've done a fair bit of testing with tshark and have determined that
the issue presents itself as such: The laptop is able to reach the
desktop, with ICMP echo requests and nmap scans going through just
fine. However, a response is never received by the laptop.

When running tshark on the laptop and sending traffic from the
desktop, an oddity occurs. Although the laptop's static IP is
explicitly given as an argument to ping, nmap, etc., the desktop is
actually sending out an ARP broadcast, attempting to get the MAC of
an IP in the Philippines - far outside the LAN's range.
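
For anyone wanting to check the same thing in a raw capture, here is a
minimal sketch of decoding the ARP body to see which IP the request is
really asking for. The addresses and MAC below are documentation
placeholders, not the real ones from my LAN, and parse_arp is just an
illustrative helper name.

```python
import socket
import struct

def parse_arp(payload: bytes) -> dict:
    """Decode a 28-byte IPv4-over-Ethernet ARP body (Ethernet header removed)."""
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", payload[:28]
    )
    return {
        "op": {1: "request", 2: "reply"}.get(oper, str(oper)),
        "sender_mac": sha.hex(":"),
        "sender_ip": socket.inet_ntoa(spa),
        "target_ip": socket.inet_ntoa(tpa),  # the "who-has" address
    }

# Placeholder frame: the sender asks for 203.0.113.77 even though the
# address given on the command line was (hypothetically) 192.0.2.20.
sample = struct.pack(
    "!HHBBH6s4s6s4s",
    1, 0x0800, 6, 4, 1,                      # Ethernet, IPv4, request
    bytes.fromhex("020000000001"),           # sender MAC (placeholder)
    socket.inet_aton("192.0.2.10"),          # sender IP (placeholder)
    b"\x00" * 6,                             # target MAC unknown in a request
    socket.inet_aton("203.0.113.77"),        # target IP actually requested
)

info = parse_arp(sample)
expected_target = "192.0.2.20"
if info["target_ip"] != expected_target:
    print("mismatch: who-has", info["target_ip"], "but expected", expected_target)
```

The target_ip field is the one that shows the bogus address in my
captures; the sender fields look normal.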

What makes this doubly odd is that I've manually added a correct entry
into the desktop's ARP cache for the laptop's IP (via arp -s), yet
it's apparently ignored and the bogus ARP broadcasts continue.
Interestingly, the arping utility is able to broadcast for the correct
IP's MAC, and does receive a reply from the laptop - but the issue
persists for everything except arping.
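
To double-check that a static entry really is in the cache, the
kernel's table can be read from /proc/net/arp. The following is a rough
sketch that parses that format and reports whether an entry carries the
permanent flag (0x4, which is what arp -s should set). The sample text
uses placeholder addresses and an assumed interface name of eth0.

```python
ATF_COM = 0x02   # entry is complete (has a MAC)
ATF_PERM = 0x04  # entry is permanent (added statically)

def parse_arp_table(text: str) -> dict:
    """Parse the contents of /proc/net/arp into a dict keyed by IP."""
    entries = {}
    for line in text.splitlines()[1:]:       # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, dev = fields[:6]
        flag_bits = int(flags, 16)
        entries[ip] = {
            "mac": mac,
            "dev": dev,
            "complete": bool(flag_bits & ATF_COM),
            "permanent": bool(flag_bits & ATF_PERM),
        }
    return entries

# Placeholder table in the /proc/net/arp layout.
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.0.2.20       0x1         0x6         02:00:00:00:00:02     *        eth0
192.0.2.1        0x1         0x2         02:00:00:00:00:fe     *        eth0
"""

table = parse_arp_table(SAMPLE)
for ip, entry in table.items():
    kind = "static" if entry["permanent"] else "dynamic"
    print(ip, entry["mac"], entry["dev"], kind)
```

On a live system the same function can be fed the real table with
parse_arp_table(open("/proc/net/arp").read()).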

If I put an unused IP, along with the laptop's MAC, into the
desktop's ARP cache and ping that new address, predictably
the ICMP echo requests do get to the laptop, but of course aren't
acknowledged as the laptop isn't using that IP. Actually moving the
laptop to another IP results in the systems being able to reach each
other normally, but the bizarre ARP broadcasts persist when attempting
to reach the laptop's old IP.

A further observation I made (which simply added to the confusion): An
Arch VM, running with bridged networking on the desktop, is perfectly
capable of reaching the laptop - I can SSH in without issue. Yet
nothing (aside from arping) is even able to get a reply from the
laptop when run from a shell on the desktop. Because the VM works,
I've essentially ruled out any sort of hardware or driver issue.

The part where I'm stuck is... what's below the ARP cache? I have no
idea why "ping" doesn't work when there is an
explicitly-placed entry for the IP in the ARP cache, and how an
explicitly-specified IP can turn into an ARP broadcast for a
completely different IP (in the Philippines, no less) just baffles me.
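
One layer that does sit in front of the ARP lookup is the routing
decision: the kernel resolves the MAC of the next hop it picked, which
is the destination itself only when the matching route has no gateway.
The sketch below is a simplified model of that (longest-prefix match,
then ARP for the gateway if there is one). I have not confirmed that
this is what is happening here; the route table, addresses, and
function names are all made up for illustration.

```python
import ipaddress

def lookup(routes, dest):
    """Return the (network, gateway) pair with the longest matching prefix."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, gw) for net, gw in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda route: route[0].prefixlen)

def arp_target(routes, dest):
    """Address the host would ARP for when sending to dest (simplified)."""
    route = lookup(routes, dest)
    if route is None:
        return None            # no route at all: nothing is sent
    _net, gateway = route
    # On-link route (no gateway): ARP for the destination itself.
    # Gatewayed route: ARP for the gateway instead.
    return gateway if gateway is not None else dest

# Hypothetical table: a normal LAN plus one stray host route.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),
    (ipaddress.ip_network("192.0.2.0/24"), None),
    (ipaddress.ip_network("192.0.2.20/32"), "203.0.113.77"),  # stray entry
]

print(arp_target(routes, "192.0.2.30"))    # ordinary LAN host
print(arp_target(routes, "192.0.2.20"))    # host with the stray route
print(arp_target(routes, "198.51.100.5"))  # off-LAN destination
```

In this model a single stale host route is enough to make an explicitly
given IP turn into an ARP request for an unrelated address, regardless
of what is in the ARP cache for the original IP. On a real system,
"ip route get" followed by the destination IP shows which route and
next hop the kernel would actually choose.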

Any help is appreciated.

