[arch-dev-public] GitLab switchover and SSO migration
Hi everyone,

As some of you know, we've been toying with two ideas for a while: Arch-wide centralized user management and using GitLab to consolidate some of our current services. The overall goal is manifold. In no particular order, the goals are to

- make Arch more contributor friendly
- provide more modern tools for ourselves
- enable more automation
- make Arch services more secure
- make team management activities less error-prone and more streamlined

These two topics (SSO + GitLab) are a bit intertwined because we wanted to have SSO on GitLab before starting off with that so we'd have a properly validated user base to work with going forward. Also, GitLab seemed like a good first service for SSO because it has good support for that.

After looking at various solutions, we eventually settled on Keycloak since it seemed like a modern, well-maintained, and secure piece of software. It allows us to enable logins for services via OpenID Connect and SAML, which is likely the best coverage we could hope for. It also allows us to connect other social login providers such as github.com and gitlab.com, and it supports reCAPTCHA, 2FA, and WebAuthn out of the box. The idea is to eventually transition all our online services as well as SSH keys to Keycloak to ease on-/offboarding and make it less error-prone.

As for GitLab, a few months back, we applied for a GitLab Ultimate license in their open-source program [0] and we received one [1]. It's an official program that many other open-source projects benefit from as well and we think it's safe to assume that it'll continue being a thing for the foreseeable future. We have to renew our license yearly. The current license we have has support for 1000 seats but we can likely get more seats if we need them.

Our general path is:

1) Transition as many staff-only services to Keycloak as possible. We looked at our current services and put up a table on the wiki that shows support per service [2]. Some of the services that we operate are deprecated or have functionality that is also provided by GitLab. In our current understanding, this concerns Flyspray, Kanboard, and Patchwork. Of those, Kanboard and some of the Flyspray projects will be our first targets to transition to GitLab. We'll continue using Flyspray for the time being for package bugs but will discontinue its use for all non-packaging bugs. The reason is that how we manage package bugs is somewhat intertwined with the svn2git migration, which is yet to be done and might dictate a different repo structure than what we would come up with now, and we don't want to block on this. This was also discussed in a recent DevOps meeting [3].

2) We'd like to get rid of our own cgit instance at git.archlinux.org and transition our git hosting into GitLab. AUR git access will stay as it is due to its special shell magic.

3) Eventually, after an internal testing phase of at least a few weeks, we'll want to open Keycloak and GitLab up for outside contributors. We know of the abuse potential and the potential moderation problems and have to make sure to set proper limits and set up monitoring before opening this up.

4) Connect remaining services like BBS and wiki to SSO. In 1) we only mentioned staff-only services because those are less problematic. However, in the future, we'd also like to enable our remaining services to connect to Keycloak.

We tried hard to come up with a good source of users to import into Keycloak so that we could seed that database with a solid user base, but sadly it appears that there is no trusted source of users that we can rely on. Potential candidates were the wiki, BBS, and AUR, but we ruled them all out in their current state as none of them have always had email verification and so we can't trust those emails to be the sole source of truth. In order to still allow users to keep their old contributions in cases where they can prove their identity via email, we'll build a new small web application that allows them to connect their new Keycloak identity to their other identities. For now, we seeded the Keycloak database with the only known-good source of trusted emails: Staff from Archweb.

We'd like to make heavy use of GitLab CI for running automatic tests and release automation. We're aware that eventually allowing non-staff users in will result in untrusted code being run on our CI. This is fine by itself but security-wise would prevent us from creating trusted releases on the same CI runners. We currently have two sponsored bare-metal GitLab CI runners that we plan on using for running untrusted code. We'll get a new bare-metal box from Hetzner for trusted releases; it will only run hand-picked pipelines that only a select few of us can push to. Bare-metal runners also allow us to test and build VM images and such, which isn't usually possible on most VPS.

One more goal we had is automatic github.com mirroring in some fashion. We looked at creating a two-way github.com <-> GitLab mirror but that setup can break easily in the case of force pushes and race conditions and also would have us looking at both places for pull requests. It seems simpler to us for the time being to have one-way mirroring from our GitLab to github.com only and then allow github.com users to easily collaborate on our GitLab via github.com social login. It's a little bit more hassle for the users than collaborating directly on github.com but it's a lot less hassle for us so it's perhaps the best compromise.

These changes will affect all of you in some way but we'll try to make it as painless as possible. As we progress, we'll send concise emails with instructions on what to do. I'm sure that we missed a few details here and there but I'm still convinced that this is a change for the better across the board, even if we don't immediately get everything right.

Sorry for the long mail. All of this has been a long time in the making and has been the subject of at least four hackathons with many hours spent in-between. It was a ton of work but I'm happy that we're finally at a stage where we can present something tangible with a plan of attack.

Cheers,
Sven and the DevOps team

[0] https://gitlab.com/gitlab-com/marketing/community-relations/gitlab-oss
[1] https://gitlab.com/gitlab-com/marketing/community-relations/gitlab-oss/-/com...
[2] https://wiki.archlinux.org/index.php/DeveloperWiki:SSOMigration
[3] https://wiki.archlinux.org/index.php/DeveloperWiki:DevopsMeetings/2020-05-06
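The split between untrusted and trusted CI runners described in the announcement can be sketched as a simple routing rule. This is only an illustration under stated assumptions: the pool names, the `Pipeline` fields, and the `pick_runner_pool` helper are hypothetical and do not describe how GitLab or the Arch infrastructure actually assigns jobs.

```python
from dataclasses import dataclass

# Hypothetical pool names, invented for this sketch.
UNTRUSTED_POOL = "sponsored-bare-metal"
TRUSTED_POOL = "release-bare-metal"


@dataclass(frozen=True)
class Pipeline:
    project: str      # project the pipeline belongs to
    pushed_by: str    # user who triggered the pipeline
    is_release: bool  # whether this pipeline builds a release


def pick_runner_pool(pipeline, release_projects, release_maintainers):
    """Return the pool a pipeline may run on.

    Only release pipelines of hand-picked projects, pushed by a
    hand-picked maintainer, reach the trusted pool. Everything else
    is treated as untrusted code.
    """
    trusted = (
        pipeline.is_release
        and pipeline.project in release_projects
        and pipeline.pushed_by in release_maintainers
    )
    return TRUSTED_POOL if trusted else UNTRUSTED_POOL
```

The point of the rule is that the default is the untrusted pool: a pipeline has to satisfy every condition to reach the release box, so a mistake in classification fails towards the less sensitive runners.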
On Sun, 17 May 2020 at 14:39:25, Sven-Hendrik Haase via arch-dev-public wrote:
As some of you know, we've been toying with two ideas for a while: Arch-wide centralized user management as well as using GitLab to consolidate some of our current services. The overall goal is manifold. In no particular order, the goals are to - make Arch more contributor friendly - provide more modern tools for ourselves - enable more automation - make Arch services more secure - make team management activities less error-prone and more streamlined
Great! Thanks a lot for working on this!
After looking at various solutions, we eventually settled on Keycloak since it seemed like a modern, well-maintained, and secure piece of software. It allows us to enable logins for services via OpenID Connect and SAML which is likely the best coverage we could hope for. It also allows us to connect other social login providers such as github.com and gitlab.com and it supports Recaptcha, 2FA, and WebAuthn out of the box. The idea is to eventually transition all our online services as well as SSH keys to Keycloak to ease on-/offboarding and make it less error-prone.
Is 2FA going to be opt-in? Will it be mandatory for members of the staff? You mentioned increased security before but having a single password also increases the attack surface in case a password is stolen.
1) Transition as many staff-only services to Keycloak as possible. We looked at our current services and put up a table on the wiki that shows support per service [2]. Some of the services that we operate are deprecated or have functionality that is also provided by GitLab. In our current understanding, this concerns Flyspray, Kanboard, and Patchwork. Of those, Kanboard and some of the Flyspray projects will be our first targets to transition to GitLab. We'll continue using Flyspray for the time being for package bugs but will discontinue its use for all non-packaging bugs. The reason for this is that how we manage our bugs for packages is somewhat intertwined with the svn2git migration which is yet to be done and might dictate a different repo structure than what we would come up with currently and we don't want to block on this. This was also discussed in a recent DevOps meeting [3].
When saying "svn2git migration", are you referring to the already existing svntogit repositories on git.archlinux.org or to the migration of our main package VCS to Git? Could you elaborate on how svn2git migration and Flyspray are intertwined? I could not find any details in [3].
2) We'd like to get rid of our own cgit instance at git.archlinux.org and transition our git hosting into GitLab. AUR git access will stay as it is due to its special shell magic.
You may already be aware of this but I'd like to clarify, especially since the DevOps meeting notes [3] mention that "git.archlinux.org likely needs to be kept for svn2git and the AUR": the cgit instance for AUR packages is entirely separate from the cgit instance running git.archlinux.org --- shutting down git.archlinux.org should not impact aur.archlinux.org in any way.
We tried hard to come up with a good source of users to import into Keycloak so that we could seed that database with a solid user base but sadly it appears that there is no trusted source of users that we can rely on. Potential candidates were the wiki, BBS, AUR but we ruled them all out in their current state as none of them have always had email verification and so we can't trust those emails to be the sole source of truth. In order to still allow users to keep their old contributions in cases where they can prove their identity via email, we'll build a new small web application that allows them to connect their new Keycloak identity to their other identities. For now, we seeded the Keycloak database with the only known-good source of trusted emails: Staff from Archweb.
Email addresses of aurweb accounts have to be confirmed (and accounts without verified email addresses are not usable and can be filtered out by a simple SQL query). Old accounts have been purged in ~2014 and, to the best of my knowledge, there should not be any active accounts left that did not go through the email verification process.
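The kind of "simple SQL query" mentioned above might look like the following sketch. The `users` table and its `email_verified` column are hypothetical and chosen only for illustration; aurweb's real schema may be entirely different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: aurweb's real tables and columns may differ.
conn.execute(
    "CREATE TABLE users (name TEXT, email TEXT, email_verified INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("alice", "alice@example.org", 1),
        ("bob", "bob@example.org", 0),
        ("carol", "carol@example.org", 1),
    ],
)


def verified_users(connection):
    """Return (name, email) for accounts whose address was confirmed."""
    rows = connection.execute(
        "SELECT name, email FROM users WHERE email_verified = 1 ORDER BY name"
    )
    return rows.fetchall()


print(verified_users(conn))
```

Note that a filter like this only says the address was confirmed at some point; it cannot tell whether the stored address is still the one that was confirmed.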
One more goal we had is automatic github.com mirroring in some fashion. We looked at creating a two-way github.com <-> GitLab mirror but that setup can break easily in the case of force pushes and race conditions and also would have us looking at both places for pull requests. It seems simpler to us for the time being to have one-way mirroring from our GitLab to github.com only and then allow github.com users to easily collaborate on our GitLab via github.com social login. It's a little bit more hassle for the users than collaborating directly on github.com but it's a lot less hassle for us so it's perhaps the best compromise.
One argument for users to prefer GitHub is that many already have an account there. Ceasing GitHub support might have an impact on engagement and make it less likely for one-time contributors to submit a simple patch. Maybe that's a sacrifice we're willing to make, though.

On a related note, will this impact projects that prefer email patch submissions in any way (except that they can now opt-in to GitLab too)?

Best,
Lukas
On 5/17/20 4:37 PM, Lukas Fleischer via arch-dev-public wrote:
Email addresses of aurweb accounts have to be confirmed (and accounts without verified email addresses are not usable and can be filtered out by a simple SQL query). Old accounts have been purged in ~2014 and, to the best of my knowledge, there should not be any active accounts left that did not go through the email verification process.
I think the other concern here was that after you've confirmed your email address, you can modify it and it won't be re-verified.
The same rule applies to Flyspray since it will email you a code to complete registration, but doesn't use a verification link when you change it. The forum and the wiki, in their current state, do however require verifying changes of email address. :)
--
Eli Schwartz
Bug Wrangler and Trusted User
On Sun, 17 May 2020 at 17:38:33, Eli Schwartz via arch-dev-public wrote:
On 5/17/20 4:37 PM, Lukas Fleischer via arch-dev-public wrote:
Email addresses of aurweb accounts have to be confirmed (and accounts without verified email addresses are not usable and can be filtered out by a simple SQL query). Old accounts have been purged in ~2014 and, to the best of my knowledge, there should not be any active accounts left that did not go through the email verification process.
I think the other concern here was that after you've confirmed your email address, you can modify it and it won't be re-verified.
Good point. I would say that this should be considered a bug and should be fixed as soon as possible...
Lukas
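One way to close the gap discussed in this sub-thread is to keep the old address active until the new one has been confirmed. The following is a minimal sketch of that idea only; the `Account` class and both helper functions are hypothetical and are not taken from aurweb or Flyspray.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    email: str                           # last address that was verified
    pending_email: Optional[str] = None  # requested but unconfirmed address
    pending_token: Optional[str] = None  # token mailed to pending_email


def request_email_change(account, new_email):
    """Stage the new address and return the token to mail to it."""
    account.pending_email = new_email
    account.pending_token = secrets.token_urlsafe(32)
    return account.pending_token


def confirm_email_change(account, token):
    """Switch addresses only if the mailed token is presented."""
    if account.pending_token is None:
        return False
    # Constant-time comparison; tokens here are ASCII strings.
    if not secrets.compare_digest(account.pending_token, token):
        return False
    account.email = account.pending_email
    account.pending_email = None
    account.pending_token = None
    return True
```

With this shape, `account.email` is always an address that was verified at some point, which is the property an import into a central user database would want to rely on.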
On 17.05.20 22:37, Lukas Fleischer via arch-dev-public wrote:
On Sun, 17 May 2020 at 14:39:25, Sven-Hendrik Haase via arch-dev-public wrote:
As some of you know, we've been toying with two ideas for a while: Arch-wide centralized user management as well as using GitLab to consolidate some of our current services. The overall goal is manifold. In no particular order, the goals are to - make Arch more contributor friendly - provide more modern tools for ourselves - enable more automation - make Arch services more secure - make team management activities less error-prone and more streamlined
Great! Thanks a lot for working on this!
After looking at various solutions, we eventually settled on Keycloak since it seemed like a modern, well-maintained, and secure piece of software. It allows us to enable logins for services via OpenID Connect and SAML which is likely the best coverage we could hope for. It also allows us to connect other social login providers such as github.com and gitlab.com and it supports Recaptcha, 2FA, and WebAuthn out of the box. The idea is to eventually transition all our online services as well as SSH keys to Keycloak to ease on-/offboarding and make it less error-prone.
Is 2FA going to be opt-in? Will it be mandatory for members of the staff? You mentioned increased security before but having a single password also increases the attack surface in case a password is stolen.
The current idea is to make 2FA mandatory for all staff (because let's face it, we have a lot of staff and a single hacked staff account could cause A LOT of trouble). As of right now, only DevOps people are forced to use 2FA but this requirement will be expanded soon. For ease of collaboration, we're not planning to force outside contributors to use 2FA for the time being but we'll monitor that situation.
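The staged 2FA requirement described in this reply can be written down as a small policy function. This is a sketch only: the `Role` names and the `staff_rollout_done` flag are hypothetical, and in practice such a rule would live in the identity provider's configuration rather than in application code.

```python
from enum import Enum


class Role(Enum):
    DEVOPS = "devops"
    STAFF = "staff"
    CONTRIBUTOR = "contributor"


def requires_2fa(role, staff_rollout_done=False):
    """Return True if an account with this role must use 2FA.

    DevOps members are always required to use 2FA. Other staff are
    required once the rollout is expanded. Outside contributors are
    not required for the time being.
    """
    if role is Role.DEVOPS:
        return True
    if role is Role.STAFF:
        return staff_rollout_done
    return False
```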
1) Transition as many staff-only services to Keycloak as possible. We looked at our current services and put up a table on the wiki that shows support per service [2]. Some of the services that we operate are deprecated or have functionality that is also provided by GitLab. In our current understanding, this concerns Flyspray, Kanboard, and Patchwork. Of those, Kanboard and some of the Flyspray projects will be our first targets to transition to GitLab. We'll continue using Flyspray for the time being for package bugs but will discontinue its use for all non-packaging bugs. The reason for this is that how we manage our bugs for packages is somewhat intertwined with the svn2git migration which is yet to be done and might dictate a different repo structure than what we would come up with currently and we don't want to block on this. This was also discussed in a recent DevOps meeting [3].
When saying "svn2git migration", are you referring to the already existing svntogit repositories on git.archlinux.org or to the migration of our main package VCS to Git?
I was talking about the latter point you mentioned: the permanent conversion of our repos from svn to git. The svntogit repos, on the other hand, can likely be easily migrated to GitLab. To those who don't know: This is a git-accessible read-only mirror for our svn package repos currently hosted at git.archlinux.org [0][1]. Maybe there's a complication I don't know about right now but I don't see why not.
Could you elaborate on how svn2git migration and Flyspray are intertwined? I could not find any details in [3].
We currently don't have a perfect grasp on what needs doing for svn2git seeing as that project was begun a few times and stopped just as often. I think the last idea was to have a git repo per package and then have a metadata repo gather a bunch of package metadata but I'm not sure this was ever perfectly fleshed out and I don't want to block this migration on that decision. If we _do_ go with one repo per package then we could track bugs for each package in its repo right there on GitLab and personally, I think this is what we should go with. In this way, the Flyspray migration of package bugs and the GitLab migration are intertwined since we have to know _where_ the bugs will go in the end. It does make sense to have package sources and package bugs together in the same repo. However, doing this also requires some more tooling from our side (you're not going to create 10000 repos in GitLab by hand and keep them in sync if you decide to change something). Again, we don't want to block on that. Keeping Flyspray running for only package bugs but migrating the project issues will already improve the status quo. freswa had some opinions on that as well and a plan of attack.
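The tooling concern (nobody creates thousands of repos by hand and keeps them in sync) usually comes down to comparing a desired state with the actual one. Below is a minimal sketch of that comparison, assuming the list of packages and the list of existing projects are already available as plain names; `plan_projects` is a hypothetical helper and makes no calls to GitLab.

```python
def plan_projects(package_names, existing_projects):
    """Compare wanted per-package repos with the ones that exist.

    Returns which projects would need to be created, which exist
    without a matching package, and which are already in place.
    """
    wanted = set(package_names)
    existing = set(existing_projects)
    return {
        "create": sorted(wanted - existing),
        "orphaned": sorted(existing - wanted),
        "keep": sorted(wanted & existing),
    }


plan = plan_projects(
    ["pacman", "namcap", "archiso"],
    ["archiso", "oldpkg"],
)
print(plan)
```

Running the plan repeatedly is harmless, which is the property that matters when the same script is also used to roll out later changes to every repo.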
2) We'd like to get rid of our own cgit instance at git.archlinux.org and transition our git hosting into GitLab. AUR git access will stay as it is due to its special shell magic.
You may already be aware of this but I'd like to clarify, especially since the DevOps meeting notes [3] mention that "git.archlinux.org likely needs to be kept for svn2git and the AUR": the cgit instance for AUR packages is entirely separate from the cgit instance running git.archlinux.org --- shutting down git.archlinux.org should not impact aur.archlinux.org in any way.
Indeed, thanks for clearing that up. To be clear: The only way we currently plan on touching AUR is by connecting it with Keycloak.
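Connecting a service to Keycloak via OpenID Connect starts with redirecting the user to the provider's authorization endpoint. The sketch below builds such a redirect URL for the standard authorization code flow. The base URL, realm, client ID, and callback are hypothetical placeholders, and the `/realms/<realm>/protocol/openid-connect/auth` path is an assumption about the deployment (some Keycloak releases serve it under an additional `/auth` prefix), so the realm's discovery document should be treated as the source of truth.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def build_auth_url(base_url, realm, client_id, redirect_uri):
    """Build an OIDC authorization request URL and its state value."""
    # Random state ties the callback to this login attempt.
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid email",
        "state": state,
    })
    # Assumed endpoint layout; verify against the realm's discovery document.
    endpoint = (
        f"{base_url.rstrip('/')}/realms/{realm}"
        "/protocol/openid-connect/auth"
    )
    return f"{endpoint}?{query}", state


# All values below are placeholders.
url, state = build_auth_url(
    "https://sso.example.org/",
    "example",
    "aurweb",
    "https://aur.example.org/callback",
)
print(urlsplit(url).path)
print(parse_qs(urlsplit(url).query)["client_id"])
```

The service would store `state` in the user's session and compare it with the value returned on the callback before exchanging the code for tokens.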
We tried hard to come up with a good source of users to import into Keycloak so that we could seed that database with a solid user base but sadly it appears that there is no trusted source of users that we can rely on. Potential candidates were the wiki, BBS, AUR but we ruled them all out in their current state as none of them have always had email verification and so we can't trust those emails to be the sole source of truth. In order to still allow users to keep their old contributions in cases where they can prove their identity via email, we'll build a new small web application that allows them to connect their new Keycloak identity to their other identities. For now, we seeded the Keycloak database with the only known-good source of trusted emails: Staff from Archweb.
Email addresses of aurweb accounts have to be confirmed (and accounts without verified email addresses are not usable and can be filtered out by a simple SQL query). Old accounts have been purged in ~2014 and, to the best of my knowledge, there should not be any active accounts left that did not go through the email verification process.
According to the side thread, this seems to only be partially dependable. I defer to that other thread.
One more goal we had is automatic github.com mirroring in some fashion. We looked at creating a two-way github.com <-> GitLab mirror but that setup can break easily in the case of force pushes and race conditions and also would have us looking at both places for pull requests. It seems simpler to us for the time being to have one-way mirroring from our GitLab to github.com only and then allow github.com users to easily collaborate on our GitLab via github.com social login. It's a little bit more hassle for the users than collaborating directly on github.com but it's a lot less hassle for us so it's perhaps the best compromise.
One argument for users to prefer GitHub is that many already have an account there. Ceasing GitHub support might have an impact on engagement and make it less likely for one-time contributors to submit a simple patch. Maybe that's a sacrifice we're willing to make, though.
I do understand and I actually was strongly in favour of two-way mirroring originally before trying it out. :) It can be done but has some extra complexity associated with it (hooks to ensure atomicity, GitHub API usage for CI in GitHub-based PRs). The compromise we agreed on was to add github.com as a social account provider for our Keycloak to allow users with GitHub accounts to quickly log in to our GitLab via that. We'll see how that goes I guess and if it turns out to be too much of a bother, I guess we'll work something out. I don't want to block the transition on that, though.
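Why force pushes are awkward for a two-way mirror can be seen with a toy commit graph: an update is only a plain fast-forward if the mirror's current tip is an ancestor of the new tip. The sketch below models history as a mapping from commit to parents; the commit names and both helper functions are invented for illustration and do not call git.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit`."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def classify_update(parents, mirror_tip, source_tip):
    """Classify what syncing the mirror to the source tip would mean."""
    if mirror_tip == source_tip:
        return "up-to-date"
    if is_ancestor(parents, mirror_tip, source_tip):
        return "fast-forward"
    # History was rewritten or both sides moved independently.
    return "diverged"


# Toy history: a <- b <- c on one line, and x branching off a.
history = {"b": ["a"], "c": ["b"], "x": ["a"]}
print(classify_update(history, "b", "c"))
print(classify_update(history, "c", "x"))
```

In a one-way mirror the "diverged" case can simply be resolved in favour of the source, whereas a two-way mirror has to decide which side wins, and that decision is where the extra complexity mentioned above comes from.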
On a related note, will this impact projects that prefer email patch submissions in any way (except that they can now opt-in to GitLab too)?
Yes and no: The current idea is to stop operating patchwork, the current primary way for namcap, pacman, AUR, archiso to accept patches via email. However, GitLab has some support for email-based collaboration [2]. David, who recently became archiso maintainer, is fine with taking archiso to GitLab and Allan is fine taking pacman dev over there as well. It is my hope that the other projects who are currently primarily relying on email patches will follow suit. It would be nice to consolidate everything onto the same platform. We're in no hurry to shut down patchwork right now though so we'll give everyone some time to evaluate GitLab and if it turns out that it can't support some much-desired development workflows, we'll re-evaluate.

Cheers,
Sven

[0] https://git.archlinux.org/svntogit/community.git/
[1] https://git.archlinux.org/svntogit/packages.git/
[2] https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_reques...
On 5/17/20 7:22 PM, Sven-Hendrik Haase via arch-dev-public wrote:
On a related note, will this impact projects that prefer email patch submissions in any way (except that they can now opt-in to GitLab too)?
Yes and no: The current idea is to stop operating patchwork, the current primary way for namcap, pacman, AUR, archiso to accept patches via email. However, GitLab has some support for email-based collaboration [2]. David, who recently became archiso maintainer, is fine with taking archiso to GitLab and Allan is fine taking pacman dev over there as well. It is my hope that the other projects who are currently primarily relying on email patches will follow suit. It would be nice to consolidate everything onto the same platform. We're in no hurry to shut down patchwork right now though so we'll give everyone some time to evaluate GitLab and if it turns out that it can't support some much-desired development workflows, we'll re-evaluate.
ISTR some discussion long in the past about how gitlab could "do it all", but I poked into it recently out of curiosity and the status seems to be more confusing than that.

This email submission thing is imperfect, since as far as I've been able to tell it creates a per-user email address for users who already have a gitlab account, so no anonymous or ad-hoc submissions. And it is strictly one way -- i.e. it enters the gitlab silo and emails don't leave. You could, I guess, email a mailing list or the project maintainers manually, and bcc your secret email submission endpoint with its user API token address. That would, however, simply end up with the two discussions being completely siloed away from each other.

I haven't been able to find discussion about gitlab cc'ing a mailing list for issue or patch discussion, or even archive purposes. There is of course email notification that doesn't invoke a mailing list, but that doesn't provide patch diffs for inline comments so it's not quite the same experience and I think you'll inevitably end up interacting with merge requests mainly via the gitlab website (leaving aside the question of "what if someone just really likes mailing lists").
--
Eli Schwartz
Bug Wrangler and Trusted User
participants (3)
- Eli Schwartz
- Lukas Fleischer
- Sven-Hendrik Haase