I benchmarked it on my mkinitcpio image, and zstd with mkinitcpio's […] You have benchmarked the wrong thing, though. It's decompression time that matters here, not compression time. The image is compressed so that it loads faster during boot, and that is the metric that counts.
I did my own benchmarks, though. On my decade-old system lz4 is still faster than zstd at decompression: 0.117415s (s=0.005948s) vs 0.229915s (s=0.027075s). However, looking deeper, the time spent in syscalls tells a different story. lz4, having the bigger file, spent 0.049630s (s=0.007560s) there, while zstd spent 0.038125s (s=0.011196s). That is not enough to claim that they differ or that zstd is better than lz4. But you can see which direction it goes in, particularly if one assumes a faster CPU and RAM.

Methodology: the image was compressed with lz4 and with zstd (same options as Geert Hendrickx used). Each compressed image was copied to 50 files to create a batch, and the batch was flushed to disk. Decompression of each file into /dev/null was timed separately, with caches dropped between batches. Each batch was run 4 times, for a total sample size of 200 per group.
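For anyone who wants to repeat this, the methodology above can be sketched roughly as follows. This is not the script I used, only a minimal Python approximation: the function names are made up for illustration, "time spent in syscalls" is approximated by the child's kernel CPU time from `getrusage`, and dropping caches assumes Linux and root.

```python
import resource
import statistics
import subprocess
import time


def time_decompression(argv):
    """Run one decompression command, discarding its output.

    Returns (wall_seconds, kernel_seconds) for the child process.
    Kernel time is an approximation of "time spent in syscalls".
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    # Equivalent to redirecting the decompressed stream to /dev/null.
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return wall, after.ru_stime - before.ru_stime


def drop_caches():
    """Flush dirty pages and drop the page cache (Linux only, needs root)."""
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def run_batch(decompress_argv, files, drop=False):
    """Time each file separately; optionally drop caches before the batch."""
    if drop:
        drop_caches()
    return [time_decompression(decompress_argv + [path]) for path in files]


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of timings."""
    return statistics.mean(samples), statistics.stdev(samples)
```

With the 50 copies listed in `files`, something like `run_batch(["zstd", "-dc"], files, drop=True)` (or `["lz4", "-dc"]` for the other group), repeated 4 times, gives the 200 samples per group. `summarize` is then applied to the wall times and to the kernel times separately.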