[arch-general] user namespaces

Daniel Micay danielmicay at gmail.com
Thu Feb 2 10:28:09 UTC 2017


On Thu, 2017-02-02 at 02:40 +0100, sivmu wrote:
> 
> On 01.02.2017 at 21:21, Daniel Micay via arch-general wrote:
> > > > it's a nearly useless feature. 
> > > 
> > > That's a baseless claim, that was already proved wrong in my first
> > > post
> > > by the many applications that use this feature.
> > 
> > That doesn't demonstrate that it's useful relative to the
> > alternatives.
> > It enables unprivileged OS containers but isn't really any use for
> > app
> > containers.
> > 
> 
> Pretty much all famous container programs use this. I wonder why if
> there is no use for it.
> 
> Also I would still like to see a simple alternative for unprivileged
> namespaces to sandbox apps.
> How do you provide something like bubblewrap without user namespaces?
> And no that android example below is not the same as long as there is
> no
> simple way to use this (which I am not aware of)

Doing things properly is not easy.

> > > > but no one really wants it for that reason. They
> > > > want it because it started pretending that it can offer
> > > > something
> > > > that
> > > > it can't actually deliver safely.
> > > 
> > > Again a claim without proof
> > 
> > The proof is easy to find. You're the one making a proposal but you
> > clearly haven't done your research. It's not my job to spoon feed
> > you.
> > 
> 
> I do know some of the discussions about this feature on the kernel
> mailing list. But the opinions even there are not as clear as you want
> to make us believe.

The kernel configuration disables it by default. It enables UTS, IPC,
PID and NET namespaces by default. That's the opinion from upstream on
the sane default for a general purpose build: disabled.
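For anyone who wants to check what their own kernel was built with, here is
a rough sketch. It assumes a config file in the usual `CONFIG_FOO=y` /
`# CONFIG_FOO is not set` format. The function name and the sample text are
made up for illustration; the sample only mirrors the defaults described
above and is not copied from any real kernel tree.

```python
import re

# Kconfig symbols for the namespace types discussed in this thread.
NAMESPACE_OPTIONS = (
    "CONFIG_UTS_NS",
    "CONFIG_IPC_NS",
    "CONFIG_PID_NS",
    "CONFIG_NET_NS",
    "CONFIG_USER_NS",
)


def namespace_options(config_text):
    """Map each namespace option to True (set to y) or False (not set)."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return {opt: opt in enabled for opt in NAMESPACE_OPTIONS}


# Hypothetical sample config, for illustration only.
SAMPLE = """\
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
CONFIG_NET_NS=y
"""

if __name__ == "__main__":
    for opt, on in namespace_options(SAMPLE).items():
        print(f"{opt}: {'enabled' if on else 'disabled'}")
```

To look at a real system, pass in the decompressed text of whatever config
file the distribution ships instead of SAMPLE.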

It is quite clear that it's a major security risk. It exposes an endless
stream of privesc vulnerabilities from all of the attack surface it
adds. That attack surface was never exposed like that before and the
code is not at all robust against attackers, since it was only exposed
to root users before. It is going to take years for it to settle down
and become more like core kernel code that was already exposed, and it's
always going to be a ton of extra attack surface.

> > > > There are much better ways to do
> > > > unprivileged sandboxes with significantly less risk than
> > > > CLONE_NEWUSER
> > > > or setuid executables where the user controls the environment.
> > > 
> > > And yet you fail to name even one alternative. Please do
> > 
> > Uh, yeah, I did. M
> > 
> 
> Sorry but 'M' ? I don't get it.
> 
> > > > Anything
> > > > depending on this mechanism instead of properly designed
> > > > plumbing
> > > > for it
> > > > is simply lazy garbage.
> > > 
> > > Another baseless and arrogant claim
> > 
> > Not baseless and it's not arrogant to point out that this is a bad
> > feature for app containers. It's the truth.
> 
> even if that is correct, it is a pretty weird/funny argument to say
> it's
> the truth ... :)
> 
> 
> > > > There's still an unrelenting torrent of security issues from
> > > > it. 
> > > 
> > > Name one
> > 
> > Look at the discussion on the issue report or do basic research on
> > the
> > topic. It's your proposal, if you haven't done even basic research
> > that's your problem.
> 
> I did, but we differ about the interpretations (see below)
> 
> 
> > 
> > > > Maybe wait until that stops before proposing this. 
> > > 
> > > Vulnerabilities in kernel features will never stop existing. If we
> > > disabled everything with potential vulnerabilities, we would not
> > > have a
> > > kernel anymore.
> > 
> > It's a very niche feature with better alternatives for sandboxes and
> > app
> > containers. It exposes all of the netfilter administration code and
> > tons
> > of other networking and mount code as new attack surface.
> 
> Point taken
> 
> 
> > > 
> > > Android uses minijail (default app sandbox in android 7), which
> > > relies
> > > on user namespaces…
> > > Just opened a terminal on my android and checked it. It's inside a
> > > user
> > > namespace.
> > 
> > No, that's incorrect and you're just further demonstrating how far
> > out
> > of your depth you are here. Google doesn't even enable user
> > namespaces
> > in the kernel in AOSP / stock Android for Nexus/Pixel. Doubt that
> > any
> > other vendors are enabling it. It doesn't use any namespaces other
> > than
> > mount namespaces as part of the multi-user emulation for backwards
> > compatibility. It certainly doesn't use minijail as the 'default app
> > sandbox'. It uses minijail as a library to factor out common
> > patterns
> > involved in privilege dropping, like dropping capabilities. The app
> > sandbox is done with uid/gid pairs (AIDs) and the full system
> > SELinux
> > policy (untrusted_app domain for regular non-platform apps and
> > isolated_app for isolatedProcess services). Permissions are
> > generally
> > done with IPC checks but some are done with secondary groups. Before
> > it
> > had SELinux, it was just using the POSIX user/group/permission model
> > to
> > implement the app sandbox and that's still the base. It has no use
> > case
> > at all for user namespaces, and process namespaces would not really
> > have
> > much use either due to hidepid=2 since 7.x combined with uid
> > isolation.
> > It would just be a mess since they turn a process into a subreaper /
> > secondary init.
> > 
> > Trying to explain to me how Android works from skimming and
> > misinterpreting news / documentation and making incorrect
> > assumptions is
> > not going to get you far.
> > 
> 
> Considering what you do for a living I believe you here.
> 
> However that also means that A LOT of documentation about how
> chromium,
> android and minijail work is completely wrong. Which is kinda
> disturbing...

The documentation isn't wrong. Chromium never claims to have a mandatory
dependency on user namespaces since they've kept the setuid sandbox and
Android's documentation *definitely* doesn't claim to use them. Android
has never enabled user namespaces and has no use for them.


> > > > 
> > > 
> > > Again no real life example for an alternative
> > 
> > Android, which was given as an example. You are going out of the way
> > to
> > ignore all of the information that's right in front of you.
> > 
> 
> I am talking about alternatives that provide the same functionality as
> the full set of namespaces like bubblewrap does.
> 
> 
> 
> > >  I can point to 30+ kernel bugs from the
> > > > past couple years that are privesc via user namespaces. Also
> > > > those
> > > > kernel vulnerabilities impact *everyone*.
> > > > 
> > > 
> > > Please do point out some from the last 6 months.
> > 
> > CVE-2016-8655 is a simple one that comes to mind. Not accessible
> > attack
> > surface to unprivileged users without user namespaces. There are a
> > bunch
> > more though!
> > 
> 
> Now I get this.
> Your risk assessment includes all vulnerabilities in all parts of the
> kernel that are available to unprivileged users because of user
> namespaces. That does make sense but:
> 
> There are A LOT of features that provide similar access to these
> kernel
> parts and would make those vulnerabilities exploitable for normal
> users.
> That's why I do not share this assessment, although I have to admit
> that
> the attack surface provided by userns is by itself way larger than
> what other vectors expose.
> 
> There was an interesting presentation somewhere that talked about
> this,
> but I cannot find it right now, so I concede this point for now and
> agree to your assessment of the risks involved

There's no other kernel feature exposing all of that attack surface.

> > > Solutions to change user namespaces inside the kernel? This isn’t
> > > the
> > > kernel mailing list and arch won’t patch the kernel, so I do not
> > > get
> > > what you are proposing.
> > 
> > The kernel change that's required is already upstream
> > 
> 
> Please provide a link, I would very much like to see this but could
> not
> find it so far.

sysctl:

user.max_cgroup_namespaces = 257166
user.max_ipc_namespaces = 257166
user.max_mnt_namespaces = 257166
user.max_net_namespaces = 257166
user.max_pid_namespaces = 257166
user.max_user_namespaces = 257166
user.max_uts_namespaces = 257166

A starting point is always setting max_user_namespaces to 0 by default,
and enabling the feature at compile-time. The proper way to do this is
not forcing people to toggle it on globally to use container software
depending on it though. It's scoped per userns and it's meant to be only
exposed where it's needed. The way the kernel implemented it makes this
painful, but it's doable. It would make sense to enable it, disabled by
default via the sysctl, with a policy to not automatically enable it in
packages, with the goal of a proper scoped implementation. Or just take
a sane approach to sandboxing / app containers...
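As a small illustration of the sysctl above, here is a sketch that reads the
current limit and reports whether user namespace creation is switched off.
It assumes the value is exposed at /proc/sys/user/max_user_namespaces (the
usual procfs path for that sysctl name); the helper names are hypothetical.

```python
from pathlib import Path

# procfs location of the user.max_user_namespaces sysctl shown above.
SYSCTL_PATH = "/proc/sys/user/max_user_namespaces"


def user_namespace_limit(path=SYSCTL_PATH):
    """Return the limit as an int, or None if it cannot be read
    (for example an older kernel, or a system that is not Linux)."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe(limit):
    """Turn the raw limit into a short human-readable status."""
    if limit is None:
        return "sysctl not available"
    if limit == 0:
        return "user namespace creation disabled"
    return f"up to {limit} user namespaces allowed"


if __name__ == "__main__":
    print(describe(user_namespace_limit()))
```

Making 0 the persistent default would be a single line,
`user.max_user_namespaces = 0`, in a sysctl.d configuration file.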

> > > The people responsible for linux distributions like debian, red
> > > hat
> > > and
> > > pretty much all other distros, as well as many developers of
> > > sandboxing
> > > applications including the tails and chromium people all believe
> > > this
> > > feature is a useful tool to provide unprivileged sandbox
> > > applications
> > > worth the risk.
> > 
> > I haven't seen any such assessment by them about the risk vs. reward
> > and
> > comparing it to alternative solutions from a security perspective.
> > The
> > Chromium change has a lot more to do with them only really caring
> > about
> > ChromeOS (where they can disable userns everywhere but the spawning
> > process) and Android (where it's not needed due to a better
> > alternative
> > and user namespaces aren't available).
> > 
> > An argument from authority is worth nothing particularly when those
> > people are not actually saying what you claim they are, and here is
> > someone that works full time on infosec that's telling you
> > otherwise.
> > 
> 
> You are right there is no assessment of these people I can point to,
> but
> that was not what I was trying to say anyway.
> 
> The point is:
> All those distros, everyone except arch has decided at some point to
> no
> longer restrict the use of unprivileged user namespaces. That's the
> result we have today, that cannot be denied.
>
> So in their enabling of this feature I do see a decision that weighed
> the risks. You can of course claim they do not know what they are doing
> but
> I think that would be pretty arrogant to do.

The majority of people working on desktop Linux security definitely have
no clue. It's a disaster and container tech is a terrible approach to
addressing it that brings many drawbacks vs. better solutions elsewhere.

I think it's laughable really. You can escape from all of these app
container 'sandbox' implementations via pulseaudio / dbus. The only
proper sandbox you named is the Chromium one, and it doesn't have a hard
dependency on this. It's only one of the options to make it work, and
they choose to do things differently elsewhere. It's all about the
expedience of using an available feature for a platform (desktop Linux)
that's not exactly first tier the way ChromeOS, Android and Windows are.

> In any case: arch is the last distribution to disable this feature and
> I
> doubt this will go away anytime soon, plus more programs will rely on
> it.

It's not the last distribution to not have this enabled at all, and some
of the distributions enabling it are constraining it to be disabled by
default or only accessible to privileged users. The grsecurity patch set
restricts user namespaces to privileged users too.

> So even assuming that I am in no position to assess the risks
> involved,
> I think it would be obvious to question this decision when everyone
> else
> seems to think otherwise. Not that majorities are anything to go by
> but
> the maintainers of other distros are not stupid either...