On Wed, 2017-02-01 at 00:21 -0700, Leonid Isaev wrote:
On Wed, Feb 01, 2017 at 01:20:41AM -0500, Daniel Micay via arch-general wrote:
On Wed, 2017-02-01 at 00:18 +0100, sivmu wrote:
Summary:
Arch Linux is one of the few distributions, if not the only one, that still disables or restricts the use of unprivileged user namespaces, a feature that is used by many applications and containers to provide secure sandboxing. There have been requests to turn this feature on since Linux 3.13 (in 2013) but they are still being denied. While there may have been some reason for doing so a few years ago, leading many distributions like Debian and Red Hat to restrict its use to privileged users via a kernel patch (they never disabled it completely), today Arch seems to be the only distribution to block this feature. Even conservative distros like Debian 8 and 9 have this feature fully enabled.
There are still endless unprivileged user namespace vulnerabilities and it's a nearly useless feature. The uid/gid mapping is poorly thought out and immature without the necessary environment (filesystem support, etc.) built around it, but no one really wants it for that reason. They want it because it started pretending that it can offer something that it can't actually deliver safely. There are much better ways to do unprivileged sandboxes with significantly less risk than CLONE_NEWUSER or setuid executables where the user controls the environment. Anything depending on this mechanism instead of properly designed plumbing for it is simply lazy garbage. Lack of a proper layer on top of the kernel providing infrastructure (systemd is so far from that) on desktop/server Linux is not going to be fixed by delegating everything to the kernel even when it massively increases attack surface.
BTW, why can't one simply create a *privileged* lxc container on a host filesystem mounted with nosuid, then create an unprivileged user inside that container for browsing, viewing untrusted PDFs, etc.?
Application containers don't have a use for the user namespace quasi root and no one really needs the half-baked uid/gid mapping feature. There's no real reason for stuff being done that way beyond desktop Linux having the disease of inability to do plumbing in userspace, and instead putting everything in the kernel simply to have it universally available rather than for technical reasons. It would make sense to simply have a service spawning on-demand unprivileged users from a range of uid/gid pairs. That's exactly how this works on Android for both apps and isolatedProcess services (they each get a unique uid/gid pair assigned), although they also layer SELinux and mount namespaces on top. The only real use case for user namespaces is unprivileged, contained usage of OS containers since they actually need the quasi root. For application containers / sandboxes, it's just laziness and bad design.
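To make the uid/gid range idea concrete, here is a rough sketch of what the core of such a service could look like. It is only an illustration under stated assumptions, not how Android or any existing daemon implements it: the names UidPool and spawn_as are hypothetical, the id range is arbitrary, and the service is assumed to run as root so that it can drop to the allocated ids. The SELinux and mount namespace layers mentioned above are left out entirely.

```python
import os


class UidPool:
    """Hands out unique uid/gid pairs from a fixed range (hypothetical helper)."""

    def __init__(self, first, last):
        if first <= 0 or last < first:
            raise ValueError("range must be non-empty and must exclude uid 0")
        # Stored in descending order so that pop() yields the lowest free id.
        self._free = list(range(last, first - 1, -1))
        self._in_use = set()

    def acquire(self):
        """Reserve one id and return it as a (uid, gid) pair."""
        if not self._free:
            raise RuntimeError("uid/gid range exhausted")
        ident = self._free.pop()
        self._in_use.add(ident)
        return ident, ident  # assumption: uid and gid share the same number

    def release(self, ident):
        """Return an id to the pool once its process has exited."""
        if ident not in self._in_use:
            raise KeyError(ident)
        self._in_use.remove(ident)
        self._free.append(ident)


def spawn_as(pool, argv):
    """Fork, drop to a fresh uid/gid pair in the child, then exec argv.

    Must be called as root. The caller is expected to waitpid() on the
    returned pid and then release the uid back to the pool.
    """
    uid, gid = pool.acquire()
    pid = os.fork()
    if pid == 0:
        try:
            os.setgroups([])  # drop supplementary groups first
            os.setgid(gid)    # gid before uid, while we are still allowed to
            os.setuid(uid)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)     # only reached if dropping privileges or exec failed
    return pid, uid
```

A caller would do something like `pid, uid = spawn_as(pool, ["some-viewer", "file.pdf"])`, wait for the child, and then call `pool.release(uid)`. A real service would also have to make sure no files or processes owned by a released id survive before reusing it, which this sketch does not attempt.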