Very firm -1 to any approach that involves creating hundreds of new packages which each provide a tiny file.
You're right, this would be overkill. Even when limiting to only UTF-8 we'd still have 313 packages.
This is not about locale-gen. locale-gen (and /etc/locale.gen) are Arch-specific custom scripts which IIRC were copied from Debian once upon a time, which just run localedef. I actually use a much simpler locale-gen program which uses flag files e.g. /etc/locales/en_US (file contents can contain a charset but are otherwise assumed to be UTF-8). It's not hard to hack your own.
Running localedef directly doesn't really solve any of the issues I mentioned either, though. What if we make do with a single locale package?

I just found out there's some progress on upstream support for the C.UTF-8 locale in glibc (https://sourceware.org/pipermail/libc-alpha/2020-June/115224.html). It doesn't look like it will be built-in, though, unless they manage to get the size down significantly. If it isn't built-in, maybe we could add a single package just for the C.UTF-8 locale? That should be sufficient for 95% of the "I'm building an Arch container/VM image for development/server/any other development stuff" use cases, which will generally be using an English locale, and it avoids all the problems I mentioned earlier without requiring the addition of 300+ packages.

It'll have to wait until we have C.UTF-8 in glibc, though. I guess we could add a package for en_US.UTF-8 as a stopgap, but that doesn't seem worth the effort assuming C.UTF-8 gets merged in a reasonable timeframe.

As an example of why one would need a UTF-8 locale specifically in a container/VM image: meson (actually Python) does not like running under a non-UTF-8 locale at all.

(I don't use mailing lists very often, I hope I didn't mess up the reply etiquette)

Daan

On Mon, 22 Jun 2020 at 22:31, Eli Schwartz via arch-general <arch-general@archlinux.org> wrote:
On 6/22/20 3:11 PM, Daan De Meyer via arch-general wrote:
Hi,
While working on locale-gen support for systemd-firstboot (https://github.com/systemd/systemd/pull/15994), I started wondering if it wouldn't be simpler to delegate the installation of locales to pacman instead. I haven't been following the mailing lists for very long so I don't know if this has ever been discussed. I'd imagine Arch could provide a package for each locale supported by glibc and users would install the ones they need.
Very firm -1 to any approach that involves creating hundreds of new packages which each provide a tiny file.
The PKGBUILD would use localedef to generate separate folders of compiled locale files for each locale that would be stored in /usr/lib/locale. This approach is already implemented by distros such as Fedora (and co) and Ubuntu.
The main advantage of this approach is that there's no need to set up an entire chroot to run locale-gen when pacstrapping a new Arch system image. This might seem easy but becomes trickier when the image uses a different architecture than the host system since emulation of that architecture has to be set up first. Even if locale-gen had a --root option so using the host's locale-gen would be an option, I'm not sure if there's any guarantee that compiled locale definitions generated by the host system's locale-gen would work with the glibc version used by the image (less of a problem with Arch but the glibc on the host could still potentially be out-of-date compared to the one installed in the image). Being able to install locales with pacman would solve all these problems.
Any interest in something like this from the Arch developers? I'd be willing to try my hand at a PKGBUILD for this but I'm not a TU so I'd need some support to get this implemented (if there is any interest at all).
(This also doesn't imply that locale-gen wouldn't work anymore, locale-gen stores everything in /usr/lib/locale/locale-archive which would be independent from the files installed by the locale packages, so both approaches should work side-by-side)
This is not about locale-gen. locale-gen (and /etc/locale.gen) are Arch-specific custom scripts which IIRC were copied from Debian once upon a time, which just run localedef. I actually use a much simpler locale-gen program which uses flag files e.g. /etc/locales/en_US (file contents can contain a charset but are otherwise assumed to be UTF-8). It's not hard to hack your own.
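For illustration only, a flag-file scheme like the one described above could be sketched roughly as follows. This is not the actual program mentioned in the mail; the function name `localedef_commands` and the `<locale>.<charset>` output naming are assumptions made for the example. It only builds the localedef argument lists (using the `-i` and `-f` options) and does not run anything.

```python
from pathlib import Path


def localedef_commands(flag_dir):
    """Return one localedef argument list per flag file in flag_dir.

    Hypothetical helper: the file name is taken as the locale (e.g. en_US),
    and non-empty file contents are taken as the charset, else UTF-8.
    """
    commands = []
    for flag in sorted(Path(flag_dir).iterdir()):
        if not flag.is_file():
            continue
        # Empty flag file means "assume UTF-8".
        charset = flag.read_text().strip() or "UTF-8"
        commands.append(
            ["localedef", "-i", flag.name, "-f", charset, f"{flag.name}.{charset}"]
        )
    return commands
```

Each returned list could then be handed to something like `subprocess.run(cmd, check=True)` for a directory such as /etc/locales.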
IIRC Fedora follows the "hundreds of packages which each provide a small file" approach, that being the localedef --no-archive intersection of a locale and a charmap. The combination of all possibilities will result in significant size bloat, so it is not feasible to provide them all in the glibc package itself. (e.g. try uncommenting all 487 locales in /etc/locale.gen and it is a 500MB locale-archive, "only" 100MB if you stick to UTF-8 locales)
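As a rough sketch of what "one compiled directory per locale/charmap pair" would mean for a packaging script, the helper below builds a `localedef --no-archive` invocation that writes into a per-locale directory under a staging root. The helper name `no_archive_command`, the staging root, and the `<locale>.<charmap>` directory naming are assumptions for the example, not what Fedora or any existing PKGBUILD actually does.

```python
import os


def no_archive_command(locale, charmap, dest_root):
    """Build a localedef call that compiles one locale into its own directory.

    Hypothetical helper: dest_root stands in for a package staging directory,
    and the output directory name is an assumed convention.
    """
    # An output path containing a slash is treated by localedef as a directory.
    out_dir = os.path.join(dest_root, "usr/lib/locale", f"{locale}.{charmap}")
    return ["localedef", "--no-archive", "-i", locale, "-f", charmap, out_dir]
```

Calling this once per locale/charmap combination is what multiplies into hundreds of small outputs, which is the size and package-count concern raised above.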
-- 
Eli Schwartz
Bug Wrangler and Trusted User