[arch-general] Boot SuperMicro H8QM8-2 w/4 Opteron 8360 - hangs on boot of install media
All,
This may be a stupid question. If so I apologize. Do I have to do anything different to boot the arch install media on a board with 4 Opteron 8360 processors (16 core total), or should the kernel just handle however many cores there are?
The reason I ask is it always hangs after it says Booting Kernel (immediately after decompress). When I memtest the memory (I have two sets and I've gone stick by stick with a minimum memory config), I still get errors at the exact same percentage of every test. I have rotated each stick in/out with a replacement, and replaced the whole set, but same issue.
So I either have to do something different to boot with 16 cores, which will in turn also address the memory issue, or I have lots of bad RAM or a bad memory controller in one of the processors. Hardware Info shows all cores and memory, it's just using it that is the problem :)
What say the experts? Do I need to pass a kernel flag, or something similar for the 4 Opteron boot, or does it just smell like multiple failed sticks in each set?
-- David C. Rankin, J.D.,P.E.
On 29/02/2016 at 6:44 p.m., David C. Rankin wrote:
All,
This may be a stupid question. If so I apologize. Do I have to do anything different to boot the arch install media on a board with 4 Opteron 8360 processors (16 core total), or should the kernel just handle however many cores there are?
There is no problem there. Arch's default kernel configuration supports up to and including 32 cores.
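The build-time limit on CPU count is the kernel option CONFIG_NR_CPUS, so it can be checked on a given kernel rather than taken on trust. Below is a minimal Python sketch, assuming the running kernel exposes its configuration at /proc/config.gz (that needs CONFIG_IKCONFIG_PROC, which I am assuming is enabled on the install media). The function names are made up for illustration.

```python
import gzip
import re
from typing import Optional


def parse_nr_cpus(config_text: str) -> Optional[int]:
    """Return the CONFIG_NR_CPUS value found in kernel config text, or None."""
    # Anchor on "CONFIG_NR_CPUS=" so related options such as
    # CONFIG_NR_CPUS_DEFAULT are not matched by accident.
    match = re.search(r"^CONFIG_NR_CPUS=(\d+)$", config_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def read_running_kernel_config(path: str = "/proc/config.gz") -> str:
    """Read the gzip-compressed config of the running kernel (path is an assumption)."""
    with gzip.open(path, "rt") as handle:
        return handle.read()


if __name__ == "__main__":
    try:
        limit = parse_nr_cpus(read_running_kernel_config())
    except OSError as exc:
        # Raised when the file is missing or is not valid gzip data.
        print(f"could not read kernel config: {exc}")
    else:
        print(f"CONFIG_NR_CPUS = {limit}")
```

If the reported value is at least 16, the core count on this board is within what the kernel was built to handle.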
The reason I ask is it always hangs after it says Booting Kernel (immediately after decompress). When I memtest the memory (I have two sets and I've gone stick by stick with a minimum memory config), I still get errors at the exact same percentage of every test. I have rotated each stick in/out with a replacement, and replaced the whole set, but same issue.
If you are sure the RAM sticks are working fine by, say, burning them in on a different rig, then you may have a problem with the memory controller, or something more insidious in the firmware controller. In any case, it sounds to me like you need to RMA the box. I've heard SuperMicro has above-average tech support... [snip]
What say the experts? Do I need to pass a kernel flag, or something similar for the 4 Opteron boot, or does it just smell like multiple failed sticks in each set?
I've never had problems with AMD-based servers. But the usual stuff in these parts is HP (now HPE), what used to be IBM (don't recall which Chinese outfit scavenged that part of the business), Dell and MSI-based white boxes. -- Pedro A. López-Valencia http://about.me/palopezv/ Recession is when a neighbor loses his job. Depression is when you lose yours. -Ronald Reagan
On 02/29/2016 07:30 PM, P. A. López-Valencia wrote:
There is no problem there. Arch's default kernel configuration supports up to and including 32 cores.
The reason I ask is it always hangs after it says Booting Kernel (immediately after decompress). When I memtest the memory (I have two sets and I've gone stick by stick with a minimum memory config), I still get errors at the exact same percentage of every test. I have rotated each stick in/out with a replacement, and replaced the whole set, but same issue.
If you are sure the RAM sticks are working fine by, say, burning them in on a different rig, then you may have a problem with the memory controller, or something more insidious in the firmware controller. In any case, it sounds to me like you need to RMA the box. I've heard SuperMicro has above-average tech support...
Thank you. I wish tech support were an option, but this is an older box I was given by an ISP, so it is more or less a project/fun box at this point. The RAM is HP (not yet HPE when this box was minted). Like I said, I have 2 sets of RAM (36 2-Gig sticks total, all HP Part No. 405476-051). I have attempted to run with as little as 8-Gig (4 sticks, non-interleaved, all sticks in DIMM slot 1A of each bank per the manual) and it will not memtest in that config. Adding a stick to slot 1B of each bank allows the test to go to 32%, precisely where the error begins, regardless of which sticks are used.
I'll keep playing with it. I've been through the BIOS (there are about 50 settings for RAM/ECC alone, most have an 'Auto' setting). I'll double-check there as well. The box was apparently working fine before it came to me (I don't know how long ago the 'working fine' was, though).
I'll keep fiddling with this. It is truly one impressive piece of hardware that comes in a 100 lb. box and sounds like a 747 at takeoff (I think it has equivalent thrust too, given all the fans: 17 at last count, 9 of which are the 80mm 6000 rpm jobs).
Thank you again for your help.
-- David C. Rankin, J.D.,P.E.
On 02/29/2016 07:59 PM, David C. Rankin wrote:
I'll keep playing with it. I've been through the BIOS (there are about 50 settings for RAM/ECC alone, most have an 'Auto' setting). I'll double-check there as well. The box was apparently working fine before it came to me (I don't know how long ago the 'working fine' was, though).
Well, it was 2 things, noted here in case others run into this.
1) The 2nd set of 20 sticks of RAM was just junk. Same HP part no., but Puerto Rico made; the machine will not boot on this RAM. The original 8 sticks work fine.
2) I am an idiot. The Advanced -> Boot -> OS Type settings on this server are the reverse of a desktop BIOS. Generally, the OS Type setting is [Windows | Other], so I left the setting as "Other". However, on closer inspection the setting is [Other | Linux], and flipping it to "Linux" solved the issue here. The "Other" is for a hypervisor, such as VMware ESXi.
All 16 cores found fine. Thanks for the help.
-- David C. Rankin, J.D.,P.E.
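For anyone wanting to confirm the same result after changing the OS Type setting, here is a minimal Python sketch that counts logical CPUs and sockets. It assumes an x86 Linux system where /proc/cpuinfo carries "processor" and "physical id" fields; the function name is hypothetical.

```python
from typing import Tuple


def summarize_cpuinfo(cpuinfo_text: str) -> Tuple[int, int]:
    """Return (logical_cpus, sockets) parsed from /proc/cpuinfo-style text."""
    logical = 0
    sockets = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU entries
        key = key.strip()
        if key == "processor":
            logical += 1  # one "processor" entry per logical CPU
        elif key == "physical id":
            sockets.add(value.strip())  # distinct ids correspond to sockets
    return logical, len(sockets)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            logical, sockets = summarize_cpuinfo(handle.read())
    except OSError as exc:
        print(f"could not read /proc/cpuinfo: {exc}")
    else:
        print(f"{logical} logical CPUs across {sockets} socket(s)")
```

On the board discussed in this thread, one would expect this to report 16 logical CPUs across 4 sockets once everything is detected.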
participants (2)
- David C. Rankin
- P. A. López-Valencia