On 6/23/20 3:02 PM, Daan De Meyer via arch-general wrote:
This is not about locale-gen. locale-gen (and /etc/locale.gen) are Arch-specific custom scripts which IIRC were copied from Debian once upon a time, which just run localedef. I actually use a much simpler locale-gen program which uses flag files e.g. /etc/locales/en_US (file contents can contain a charset but are otherwise assumed to be UTF-8). It's not hard to hack your own.
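For illustration, a flag-file scheme like the one described above could be sketched roughly as follows. This is not the actual program mentioned in this message; the function name `plan_localedef_commands` and the exact file handling are assumptions. It only builds the `localedef` command lines (using the `-i` and `-f` options) and leaves running them to the caller.

```python
from pathlib import Path


def plan_localedef_commands(flag_dir):
    """Return one localedef argv list per flag file in flag_dir.

    Hypothetical layout: a file named after the locale (e.g. "en_US");
    its first word, if any, names the charset, otherwise UTF-8 is assumed.
    """
    commands = []
    for flag in sorted(Path(flag_dir).iterdir()):
        if not flag.is_file():
            continue
        words = flag.read_text().split()
        charset = words[0] if words else "UTF-8"
        # localedef -i <input locale> -f <charmap> <output locale name>
        commands.append(
            ["localedef", "-i", flag.name, "-f", charset, f"{flag.name}.{charset}"]
        )
    return commands
```

Each returned list could then be passed to `subprocess.run(argv, check=True)`; with an empty `/etc/locales/en_US` the sketch yields `localedef -i en_US -f UTF-8 en_US.UTF-8`.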
Running localedef directly doesn't really solve any of the issues I mentioned either though.
It would:
- avoid the *additional* issue "what to do if locale-gen doesn't exist",
- solve the issue "locale-gen does not have a --root option"

It wouldn't:
- solve the issue "host/guest glibc version mismatches"
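As a rough sketch of the "--root" point: the thread does not name an option, but localedef does have `--prefix`, which is prepended to the locale archive path, so the assumption here is that this is how an image root would be targeted. The helper name `localedef_argv` and its parameters are hypothetical.

```python
def localedef_argv(locale, charset="UTF-8", root=None):
    """Build a localedef command line, optionally targeting an image root.

    Assumption: `root` is passed via localedef's --prefix option. The host's
    localedef still does the compiling, so a host/guest glibc version
    mismatch is not addressed by this.
    """
    argv = ["localedef", "-i", locale, "-f", charset]
    if root is not None:
        argv.append(f"--prefix={root}")
    argv.append(f"{locale}.{charset}")
    return argv
```

For example, `localedef_argv("en_US", root="/mnt/image")` produces a command that does not depend on locale-gen being installed at all.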
What if we make do with a single locale package? I just found out there's some progress on upstream support for the C.UTF-8 locale in glibc (https://sourceware.org/pipermail/libc-alpha/2020-June/115224.html). It doesn't look like it will be built-in though unless they manage to get the size down significantly. If it isn't built-in, maybe we could add a single package just for the C.UTF-8 locale? That should be sufficient for 95% of the "I'm building an Arch container/vm image for development/server/any other development stuff" use cases, which generally will be using an English locale, and it avoids all the problems I mentioned earlier without requiring the addition of 300+ packages. It'll have to wait until we have C.UTF-8 in glibc though. I guess we could add a package for en_US.UTF-8 as a stopgap, but that doesn't seem worth the effort assuming C.UTF-8 gets merged in a reasonable timeframe.
The ultimate goal is to ensure C.UTF-8 always exists no matter what. If it gets merged upstream in glibc as a non-builtin localedef generated locale, then the probable best solution is to make locale-gen always include C.UTF-8 regardless of which other locales are requested by the user's system. Or include its compiled form in the glibc package directly, if it isn't too bloated.
As an example of why one would need a UTF-8 locale specifically in a container/vm image, meson (actually python) does not like running under a non UTF-8 locale at all.
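A quick way to see what Python thinks of the current locale inside an image is to look at the encoding it derives from it. The snippet below is only a sketch; the helper name `is_utf8_encoding` is made up for illustration, and it does not reproduce whatever check meson itself performs.

```python
import codecs
import locale


def is_utf8_encoding(name):
    """Return True if `name` is an encoding name that Python resolves to UTF-8."""
    try:
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        # Unknown encoding names are treated as "not UTF-8".
        return False


if __name__ == "__main__":
    # Encoding implied by the environment's locale settings (LANG, LC_ALL, ...).
    encoding = locale.getpreferredencoding(False)
    print(encoding, "UTF-8" if is_utf8_encoding(encoding) else "not UTF-8")
```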
You're preaching to the choir, here. ;) I thoroughly agree there must be a UTF-8 locale. The question is at what stage should this be selected and generated.
(I don't use mailing lists very often, I hope I didn't mess up the reply etiquette)
Generally people tend to delete the sections they are not replying to, but reply inline, rather than including everything at the bottom as a second copy of the sections you quoted and replied to inline. Still, replying inline is the main thing, and you did that. :)

--
Eli Schwartz
Bug Wrangler and Trusted User