On Thu, 2017-02-02 at 02:40 +0100, sivmu wrote:
On 01.02.2017 at 21:21, Daniel Micay via arch-general wrote:
it's a nearly useless feature.
That's a baseless claim, that was already proved wrong in my first post by the many applications that use this feature.
That doesn't demonstrate that it's useful relative to the alternatives. It enables unprivileged OS containers but isn't really any use for app containers.
Pretty much all famous container programs use this. I wonder why, if there is no use for it.
Also I would still like to see a simple alternative to unprivileged namespaces for sandboxing apps. How do you provide something like bubblewrap without user namespaces? And no, that Android example below is not the same as long as there is no simple way to use it (I am not aware of one).
Doing things properly is not easy.
but no one really wants it for that reason. They want it because it started pretending that it can offer something that it can't actually deliver safely.
Again a claim without proof.
The proof is easy to find. You're the one making a proposal but you clearly haven't done your research. It's not my job to spoon feed you.
I do know some of the discussions about this feature on the kernel mailing list. But the opinions even there are not as clear as you want to make us believe.
The kernel configuration disables it by default. It enables UTS, IPC, PID and NET namespaces by default. That's the opinion from upstream on the sane default for a general purpose build: disabled.
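For anyone who wants to check how a particular kernel was built, here is a minimal Python sketch. It assumes the usual option names (CONFIG_UTS_NS, CONFIG_IPC_NS, CONFIG_PID_NS, CONFIG_NET_NS, CONFIG_USER_NS) and the standard `.config` text format. The helper name `namespace_options` and the sample text are made up for illustration and are not taken from any real build.

```python
NAMESPACE_OPTIONS = ("UTS_NS", "IPC_NS", "PID_NS", "NET_NS", "USER_NS")


def namespace_options(config_text):
    """Return {option: bool} for the namespace options in .config text."""
    enabled = {name: False for name in NAMESPACE_OPTIONS}
    for line in config_text.splitlines():
        line = line.strip()
        # Enabled options look like "CONFIG_USER_NS=y". Disabled ones show
        # up as "# CONFIG_USER_NS is not set" or are missing entirely.
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line[len("CONFIG_"):].split("=", 1)
            if key in enabled:
                enabled[key] = value == "y"
    return enabled


# Hypothetical sample, written by hand to mirror the defaults described above.
SAMPLE = """\
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
CONFIG_NET_NS=y
"""

if __name__ == "__main__":
    for name, on in namespace_options(SAMPLE).items():
        print(f"CONFIG_{name}: {'enabled' if on else 'disabled'}")
```

On a running system the text would have to come from wherever the build config is kept, for example /proc/config.gz when the kernel was built to expose it; that part is left out here because it varies between distributions.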
It is quite clear that it's a major security risk. It exposes an endless stream of privesc vulnerabilities from all of the attack surface it adds. That attack surface was never exposed like that before and the code is not at all robust against attackers, since it was only exposed to root users before. It is going to take years for it to settle down and become more like core kernel code that was already exposed, and it's always going to be a ton of extra attack surface.
There are much better ways to do unprivileged sandboxes with significantly less risk than CLONE_NEWUSER or setuid executables where the user controls the environment.
And yet you fail to name even one alternative. Please do.
Uh, yeah, I did. M
Sorry but 'M' ? I don't get it.
Anything depending on this mechanism instead of properly designed plumbing for it is simply lazy garbage.
Another baseless and arrogant claim
Not baseless and it's not arrogant to point out that this is a bad feature for app containers. It's the truth.
even if that is correct, it is a pretty weird/funny argument to say it's the truth ... :)
There's still an unrelenting torrent of security issues from it.
Look at the discussion on the issue report or do basic research on the topic. It's your proposal, if you haven't done even basic research that's your problem.
I did, but we differ about the interpretations (see below)
Maybe wait until that stops before proposing this.
Vulnerabilities in kernel features will never cease to exist. If we disabled everything with potential vulnerabilities, we would not have a kernel anymore.
It's a very niche feature with better alternatives for sandboxes and app containers. It exposes all of the netfilter administration code and tons of other networking and mount code as new attack surface.
Android uses minijail (default app sandbox in Android 7), which relies on user namespaces… I just opened a terminal on my Android device and checked it. It's inside a user namespace.
No, that's incorrect and you're just further demonstrating how far out of your depth you are here. Google doesn't even enable user namespaces in the kernel in AOSP / stock Android for Nexus/Pixel. Doubt that any other vendors are enabling it. It doesn't use any namespaces other than mount namespaces as part of the multi-user emulation for backwards compatibility. It certainly doesn't use minijail as the 'default app sandbox'. It uses minijail as a library to factor out common patterns involved in privilege dropping, like dropping capabilities. The app sandbox is done with uid/gid pairs (AIDs) and the full system SELinux policy (untrusted_app domain for regular non-platform apps and isolated_app for isolatedProcess services). Permissions are generally done with IPC checks but some are done with secondary groups. Before it had SELinux, it was just using the POSIX user/group/permission model to implement the app sandbox and that's still the base. It has no use case at all for user namespaces, and process namespaces would not really have much use either due to hidepid=2 since 7.x combined with uid isolation. It would just be a mess since they turn a process into a subreaper / secondary init.
Trying to explain to me how Android works from skimming and misinterpreting news / documentation and making incorrect assumptions is not going to get you far.
Considering what you do for a living I believe you here.
However that also means that A LOT of documentation about how chromium, android and minijail work is completely wrong. Which is kinda disturbing...
The documentation isn't wrong. Chromium never claims to have a mandatory dependency on user namespaces since they've kept the setuid sandbox and Android's documentation *definitely* doesn't claim to use them. Android has never enabled user namespaces and has no use for them.
Again no real life example for an alternative
Android, which was given as an example. You are going out of the way to ignore all of the information that's right in front of you.
I am talking about alternatives that provide the same functionality as the full set of namespaces, like bubblewrap does.
I can point to 30+ kernel bugs from the past couple years that are privesc via user namespaces. Also those kernel vulnerabilities impact *everyone*.
Please do point out some from the last 6 months.
CVE-2016-8655 is a simple one that comes to mind. Not accessible attack surface to unprivileged users without user namespaces. There are a bunch more though!
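To make the attack-surface point concrete: CVE-2016-8655 lives in the AF_PACKET socket code, and creating such a socket requires CAP_NET_RAW. A rough Python sketch of probing that from an ordinary process follows. The function name `can_open_packet_socket` is hypothetical, and the check only reports what the current process may do; it does not attempt anything related to the bug itself.

```python
import socket


def can_open_packet_socket():
    """Return True if this process may create an AF_PACKET raw socket."""
    family = getattr(socket, "AF_PACKET", None)  # Linux-only constant
    if family is None:
        return False
    try:
        sock = socket.socket(family, socket.SOCK_RAW)
    except OSError:
        # Typically PermissionError (EPERM) when CAP_NET_RAW is missing.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    print("AF_PACKET reachable:", can_open_packet_socket())
```

Run as a normal user in the initial namespace this is expected to report that the socket is not reachable. A process that is root inside its own user and network namespace holds CAP_NET_RAW there, which is how the code becomes reachable without real privileges.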
Now I get this. Your risk assessment includes all vulnerabilities in all parts of the kernel that are available to unprivileged users because of user namespaces. That does make sense but:
There are A LOT of features that provide similar access to these kernel parts and would make those vulnerabilities exploitable for normal users. That's why I do not share this assessment, although I have to admit that the attack surface provided by userns is by itself way larger than that of other vectors.
There was an interesting presentation somewhere that talked about this, but I cannot find it right now, so I concede this point for now and agree with your assessment of the risks involved.
There's no other kernel feature exposing all of that attack surface.
Solutions to change user namespaces inside the kernel? This isn’t the kernel mailing list and arch won’t patch the kernel, so I do not get what you are proposing.
The kernel change that's required is already upstream
Please provide a link, I would very much like to see this but could not find it so far.
user.max_cgroup_namespaces = 257166
user.max_ipc_namespaces = 257166
user.max_mnt_namespaces = 257166
user.max_net_namespaces = 257166
user.max_pid_namespaces = 257166
user.max_user_namespaces = 257166
user.max_uts_namespaces = 257166
A starting point is always setting max_user_namespaces to 0 by default, and enabling the feature at compile-time. The proper way to do this is not forcing people to toggle it on globally to use container software depending on it though. It's scoped per userns and it's meant to be only exposed where it's needed. The way the kernel implemented it makes this painful, but it's doable. It would make sense to enable it, disabled by default via the sysctl, with a policy to not automatically enable it in packages, with the goal of a proper scoped implementation. Or just take a sane approach to sandboxing / app containers...
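As a small illustration of the sysctl side of this, the following Python sketch parses `key = value` output like the listing above and reports whether user namespace creation is switched off (limit of 0). The helper names `parse_limits`, `userns_allowed` and `read_current_limit` are invented for this example, and the /proc/sys/user/max_user_namespaces path only exists on kernels that ship this sysctl, so a missing file is reported as unknown rather than guessed at.

```python
import re
from pathlib import Path

SYSCTL_PATH = Path("/proc/sys/user/max_user_namespaces")


def parse_limits(text):
    """Parse 'key = value' pairs from sysctl-style output into a dict."""
    return {k: int(v) for k, v in re.findall(r"([\w.]+)\s*=\s*(\d+)", text)}


def userns_allowed(limits):
    """True/False from the limit, or None if the key is not present."""
    value = limits.get("user.max_user_namespaces")
    if value is None:
        return None
    # A limit of 0 means no user namespace can be created in this scope.
    return value > 0


def read_current_limit(path=SYSCTL_PATH):
    """Read the live limit, or return None if the sysctl does not exist."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("current limit:", read_current_limit())
```

This only inspects the value visible to the calling process. Because the limit is scoped per user namespace, the number seen inside a container can differ from the one on the host.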
The people responsible for linux distributions like debian, red hat and pretty much all other distros, as well as many developers of sandboxing applications including the tails and chromium people all believe this feature is a useful tool to provide unprivileged sandbox applications worth the risk.
I haven't seen any such assessment by them about the risk vs. reward and comparing it to alternative solutions from a security perspective. The Chromium change has a lot more to do with them only really caring about ChromeOS (where they can disable userns everywhere but the spawning process) and Android (where it's not needed due to a better alternative and user namespaces aren't available).
An argument from authority is worth nothing particularly when those people are not actually saying what you claim they are, and here is someone that works full time on infosec that's telling you otherwise.
You are right there is no assessment of these people I can point to, but that was not what I was trying to say anyway.
The point is: All those distros, everyone except arch has decided at some point to no longer restrict the use of unprivileged user namespaces. That's the result we have today, that cannot be denied.
So in enabling this feature I do see a decision that takes the risks into account. You can of course claim they do not know what they are doing, but I think that would be pretty arrogant.
The majority of people working on desktop Linux security definitely have no clue. It's a disaster and container tech is a terrible approach to addressing it that brings many drawbacks vs. better solutions elsewhere.
I think it's laughable really. You can escape from all of these app container 'sandbox' implementations via pulseaudio / dbus. The only proper sandbox you named is the Chromium one, and it doesn't have a hard dependency on this. It's only one of the options to make it work, and they choose to do things differently elsewhere. It's all about the expedience of using an available feature for a platform that's not exactly first tier (desktop Linux) like ChromeOS, Android and Windows.
In any case: arch is the last distribution to disable this feature and I doubt this will go away anytime soon, plus more programs will rely on it.
It's not the last distribution to not have this enabled at all, and some of the distributions enabling it are constraining it to be disabled by default or only accessible to privileged users. The grsecurity patch set restricts user namespaces to privileged users too.
So even assuming that I am in no position to assess the risks involved, I think it would be obvious to question this decision when everyone else seems to think otherwise. Not that majorities are anything to go by but the maintainers of other distros are not stupid either...