[aur-general] Using git as a backend for the AUR
All,

So in my spare time I was thinking about the AUR and how it could be better. Back in January I commented on a bug[1] about integrating the AUR and git to have a powerful, robust backend for the AUR. I think that the original idea of creating one massive repository was inherently flawed for most users, as it requires that the entire repository be cloned, and that is no trivial task for people with only moderate internet speeds. If it were set up the way the official repositories' git setup is, fetching would be fine, but using a single repository for each project would make access control much less of a nightmare.

At the moment I have no experience with PHP or messing with the AUR, but I have been playing around with git and using it to track packages and the like, and I have some minimal experience with access control and git.

Currently, this is roughly the roadmap I see in my head:

* Set up access controls so that AUR users can push over ssh using keys. While I could see this being somewhat arduous, it shouldn't be terribly hard to automate or set up. Something like gitolite should make it simpler.

* Each repository on the server would contain a single package (if someone decides to do a split package on the AUR, it would contain the whole set of packages), allowing multiple users to have push access and maintain the packages.

* The repositories would be limited to 5M unless given special permission (certain kernel packages with lots of massive configuration files), which should leave enough room for the needed files while helping to enforce the idea of not putting retrievable sources in the source tarball. I did some quick math using the number of packages on the AUR right now, and if every repo used the full 5M limit it would require a bit over 200G. Granted, I doubt that this limit would be reached by most packages.

* With the repositories set up, give maintainers a week or so to upload history for their packages (in case they already keep their repos in this sort of setup); new packages would go directly into new repos.

* Once the transition week is over, begin moving all of the current packages in the AUR over to the git format.

This basically concludes setting up the git side, which I have done on a local machine on a very small scale (5 packages) in the past. One thing I was thinking about is keeping `makepkg -S` source tarball uploads available. cryptocrack came up with a good point that simply copying the tarball's contents over could overwrite the .git directory in the destination directory. I don't think that would be too hard to get around, but I'm no expert. I wrote a quick and dirty script that extracts a tarball and commits it to the respective repository. Currently, almost no packages use a .changelog file, but some minimal parsing of a package's .changelog file could be done to craft a git commit message from a tarball. Obviously there needs to be some kind of security check, but that could be done by the frontend using the system we have now for source tarballs.
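Roughly, that kind of tarball import could look something like the sketch below. This is not the actual script; the paths, repository layout, and commit-message handling are only placeholders for illustration:

    #!/bin/sh
    # Illustrative sketch only: import an uploaded `makepkg -S` tarball into a
    # package's git repository without clobbering the .git directory.
    set -e

    pkg=$1                          # package (and repository) name
    tarball=$2                      # path to the uploaded source tarball
    repo=/srv/aur/repos/$pkg.git    # hypothetical bare repository location
    work=$(mktemp -d)

    # Check the current state of the package out into a scratch work tree.
    git clone "$repo" "$work/$pkg"

    # Unpack the tarball elsewhere, then sync everything except .git over the
    # work tree, removing files that were dropped from the package.
    mkdir "$work/src"
    tar -xzf "$tarball" -C "$work/src"
    rsync -a --delete --exclude=.git "$work/src/$pkg/" "$work/$pkg/"

    cd "$work/$pkg"
    git add -A

    # Minimal .changelog parsing: use its first line as the commit message.
    if [ -f .changelog ]; then
        msg=$(head -n 1 .changelog)
    else
        msg="Update $pkg from uploaded tarball"
    fi

    # Only commit and push if the upload actually changed something.
    if ! git diff --cached --quiet; then
        git commit -m "$msg"
        git push origin master
    fi

    rm -rf "$work"

The --exclude=.git is what keeps a plain tarball upload from touching the repository metadata, which is the concern mentioned above.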
At the moment, I am unsure exactly how the AUR parses the PKGBUILDs it receives, but I'm sure that this could easily be adapted to the git system.

One major advantage of having the AUR managed this way is that it would allow users to check for updates on the AUR without the need for complex helpers outside of git. Secondly, it would make mirroring the AUR, or just parts of it, extremely easy. In case something happens to the AUR, people can still get their packages and maintainers can still update them very easily using git. It would also still allow people to grab tarballs of the current master branch, so git is not a requirement to use the AUR, it just makes things a butt ton easier. Also, it would allow users to see the entire source of a package, not just the PKGBUILD, similar to how the official repositories work, except that things would live in distinct git repositories rather than branches. (A few example commands for this kind of workflow are sketched at the end of this message.)

I would really like to see this kind of thing come to fruition, and if anyone would be willing to help, then please say so.

Pardon my scattered thoughts,
--
William Giokas | KaiSforza
GnuPG Key: 0x73CD09CF
Fingerprint: F73F 50EF BBE2 9846 8306 E6B8 6902 06D8 73CD 09CF

[1]: https://bugs.archlinux.org/task/23010
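To make that workflow concrete, day-to-day use could look something like the following; the host name and repository names are placeholders, not a decided layout:

    # Placeholder URL: the real host and repository layout are not decided.
    AUR_GIT=ssh://aur@aur.example.org

    # Grab a package, with its full history and all of its files:
    git clone "$AUR_GIT/somepackage.git"
    cd somepackage

    # Later, check for updates with nothing but git:
    git fetch origin
    git log --oneline HEAD..origin/master

    # A tarball of the current master branch is still one command away:
    git archive --format=tar.gz --prefix=somepackage/ -o somepackage.tar.gz master

    # Mirroring just the packages you care about is a plain mirror clone:
    git clone --mirror "$AUR_GIT/anotherpackage.git"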
1. 5M is probably overkill. I think 1-2M is usually enough; there are simply patches and a PKGBUILD.
2. https://github.com/libgit2/php-git
3. I can help if anyone needs code on the PHP side.

Side note: why don't we have package deletion for maintainers??
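On the size limit (5M in the roadmap above, 1-2M suggested here): whichever number is picked, it could be enforced with a small server-side hook installed in each repository, something like this sketch (the limit and messages are placeholders):

    #!/bin/sh
    # pre-receive hook sketch: refuse pushes once a package repository grows
    # past its quota. Hooks run from inside the bare repository, so `du` on
    # the current directory measures the whole object store.
    LIMIT_KB=$((5 * 1024))    # placeholder quota

    size_kb=$(du -sk . | cut -f1)
    if [ "$size_kb" -gt "$LIMIT_KB" ]; then
        echo "error: repository is ${size_kb}K, over the ${LIMIT_KB}K limit" >&2
        echo "hint: downloadable sources belong in the PKGBUILD source array, not in git" >&2
        exit 1
    fi

    # Read the "<old-sha> <new-sha> <refname>" lines git passes on stdin;
    # this sketch accepts them all once the size check has passed.
    cat >/dev/null
    exit 0

gitolite should also be able to propagate a common hook like this to every repository it manages, so the check would not need to be set up by hand for each package.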
On 16/03/13 05:18 PM, Tai-Lin Chu wrote:
> 1. 5M is probably overkill. I think 1-2M is usually enough; there are simply patches and a PKGBUILD.
> 2. https://github.com/libgit2/php-git
> 3. I can help if anyone needs code on the PHP side.
>
> Side note: why don't we have package deletion for maintainers??
Because then packages that people find useful would mysteriously go missing. When things go missing under the current system, there is a publicly archived message stating when and why that was.
Mysteriously? There is no uncertainty when a maintainer decides to remove the package. I feel any package should be fully controlled by the maintainer. To make things less "mysterious", it is possible to keep a removal history in the database.

On Sat, Mar 16, 2013 at 6:20 PM, Connor Behan <connor.behan@gmail.com> wrote:
> Because then packages that people find useful would mysteriously go missing. When things go missing under the current system, there is a publicly archived message stating when and why that was.
On Sun, Mar 17, 2013 at 2:38 AM, Tai-Lin Chu <tailinchu@gmail.com> wrote:
> Mysteriously? There is no uncertainty when a maintainer decides to remove the package. I feel any package should be fully controlled by the maintainer. To make things less "mysterious", it is possible to keep a removal history in the database.
A few days ago we had a case of a package that 'mysteriously went missing' for not being compliant with the Arch PKGBUILD standards - deleted by a TU. I think this is wrong, wrong, wrong! There are only a few reasons to delete a package: if the sources aren't available anymore (e.g. deleted, or a move from svn to git), or if it doesn't build due to lack of maintenance. And I think it's a good idea to do it in public, e.g. on the ML, and not by the (current) maintainer!
On Sun, Mar 17, 2013 at 2:38 AM, Tai-Lin Chu <tailinchu@gmail.com> wrote:
> I feel any package should be fully controlled by the maintainer.
It shouldn’t. Long story short:
https://mailman.archlinux.org/pipermail/aur-general/2013-February/021970.htm...
--
Kwpolska <http://kwpolska.tk> | GPG KEY: 5EAAEA16
stop html mail | always bottom-post
http://asciiribbon.org | http://caliburn.nl/topposting.html
participants (5)
- Connor Behan
- Kwpolska
- Rob Til Freedmen
- Tai-Lin Chu
- William Giokas