[arch-general] svn packaging, abs => git ?
Couldn't find any discussion about this, but what if we maintained our packages in git instead of svn?

pros:
1) git is awesome
2) we don't need abs/rsync anymore. users can just read from git.
3) git network communication is more efficient than rsync (afaik)
4) users can check out older versions of packages easily, with limited storage overhead.
5) makes it easier to maintain forks of packages (have your own git repository with some changes, then merge in upstream changes to keep them up to date. upstream == arch linux here)

cons:
1) using git for abs will use more disk space because you need the checkout + the repo (a 60% increase or so? my abs tree is now 57MB, so even if this becomes 100MB that's still ok imho)
2) svn->git migration is not trivial, since tools, the website, etc. will need to be adapted.

Dieter
On Sunday, 7 March 2010 12:03:08, Dieter Plaetinck wrote:
2) svn->git migration is not trivial, since tools, the website, .. will need to be adapted.
The problem is that our current repo layout and workflow cannot be mapped to git. E.g. we use svn's ability to check out specific dirs/files, etc. But I guess if someone came up with a superior solution using git we would be quite happy.
-- Pierre Schmitz, https://users.archlinux.de/~pierre
On 07/03/2010, Pierre Schmitz <pierre@archlinux.de> wrote:
On Sunday, 7 March 2010 12:03:08, Dieter Plaetinck wrote:
2) svn->git migration is not trivial, since tools, the website, .. will need to be adapted.
The problem is that our current repo layout and workflow is not mappable to git. E.g. we use svn's feature to checkout specific dirs/files etc..
But I guess if someone would come up with a superior solution using git we would be quite happy.
Sparseness is a complicated matter in git: http://www.kernel.org/pub/software/scm/git/docs/git-read-tree.html#_sparse_c... And we need that the most. -- GPG/PGP ID: B42DDCAD
On Sun, 7 Mar 2010 21:18:26 +0800 Ray Rashif <schivmeister@gmail.com> wrote:
On 07/03/2010, Pierre Schmitz <pierre@archlinux.de> wrote:
On Sunday, 7 March 2010 12:03:08, Dieter Plaetinck wrote:
2) svn->git migration is not trivial, since tools, the website, .. will need to be adapted.
The problem is that our current repo layout and workflow is not mappable to git. E.g. we use svn's feature to checkout specific dirs/files etc..
But I guess if someone would come up with a superior solution using git we would be quite happy.
Sparseness is a complicated matter in git: http://www.kernel.org/pub/software/scm/git/docs/git-read-tree.html#_sparse_c...
And we need that the most.
-- GPG/PGP ID: B42DDCAD
hmm.. it doesn't look _that_ hard: http://vmiklos.hu/blog/sparse-checkout-example-in-git-1-7 Btw, re: my con 1 (disk space needed for history), one could use git clone --depth if one only wants recent history. (though i don't think this will be an issue at all in practice) Dieter
On 07.03.2010 12:03, Dieter Plaetinck wrote:
Couldn't find any discussion about this, but what about we maintain our packages in git instead of svn?
pros:
1) git is awesome
2) we don't need abs/rsync anymore. users can just read from git.
3) git network communication is more efficient than rsync (afaik)
4) users can check out older versions of packages easily, with limited storage overhead.
5) makes it easier to maintain forks of packages (have your own git repository with some changes, then merge in upstream changes to keep them up to date. upstream == arch linux here)
This comes up every month at least (not necessarily on the mailing list, but somewhere) and people always say "use git" without even thinking how that would work - so far, nobody has ever presented a workflow that would match our packaging requirements and was based on git. We don't use SVN for fun - using SVN is anything but fun.

1) We want to be able to see which PKGBUILD matches the package in the repository. In SVN, we use copy - which is subversion's equivalent to branching: By copying, you create a reference and all history of the copied file is still there. In git, copying means that the copy has no history, it is entirely unrelated to the original. The only equivalent in git would be branching - but you cannot branch a single file or path, you can only branch the entire tree.

2) Partial checkouts and commits: We check out single directories and most importantly we commit to single directories without updating the rest of the repository. These operations come naturally to SVN, but they are against the very concept of git.

The only viable solution I could think of is using one git repository per package - and that is just crazy.
On Sun, 07 Mar 2010 14:49:01 +0100 Thomas Bächler <thomas@archlinux.org> wrote:
1) We want to be able to see which PKGBUILD matches the package in the repository. In SVN, we use copy - which is subversion's equivalent to branching: By copying, you create a reference and all history of the copied file is still there. In git, copying means that the copy has no history, it is entirely unrelated to the original. The only equivalent in git would be branching - but you cannot branch a single file or path, you can only branch the entire tree.
Note that with "package in repository" Thomas means the <package>/repos/i686 and such directories in svn.

There are some approaches we could take:

* git diff has a -C flag to detect copies. From the man page:

    --find-copies-harder
        For performance reasons, by default, -C option finds copies only if the original file of the copy was modified in the same changeset. This flag makes the command inspect unmodified files as candidates for the source of copy. This is a very expensive operation for large projects, so use it with caution. Giving more than one -C option has the same effect.

  so, we tell all packagers to do the add/update/test/add-to-repositories in one commit. (or use the slow -C flag, I don't know how often you want to do this)
* git branches. 3 branches or so for each package. that's a lot of branches, but maybe that's not really a problem, depends on how many times you want to merge branches i guess (i.e. how related packages are to each other)
* we could also get rid of these branch directories. what's the point of them anyway? the tools that build the packages (tarballs) must know the latest version for the particular architecture? maybe we can put tags in the commit messages, or keep a textfile in the package directory to know which "state" of the directory is usable to build packages for.
* just do normal copies and don't care about the histories.

I'm not really a packager so I don't know how feasible all approaches are, but some of them seem pretty feasible.
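The copy-detection behaviour quoted from the man page can be tried on a throwaway repository. This is only a minimal sketch under assumptions: the package name `foo`, the `trunk/` and `repos/extra-i686/` directory names and the committer identity are made up for illustration and are not the real Arch layout or tooling.

```shell
set -eu
# Placeholder identity for the throwaway commits
export GIT_AUTHOR_NAME="Example Packager" GIT_AUTHOR_EMAIL="packager@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Commit 1: the working copy of a hypothetical package "foo"
mkdir -p foo/trunk
printf 'pkgname=foo\npkgver=1.0\npkgrel=1\narch=(i686)\n' > foo/trunk/PKGBUILD
git add foo
git commit -q -m "foo: add 1.0-1"

# Commit 2: "release" by plain copy, leaving the original untouched
mkdir -p foo/repos/extra-i686
cp foo/trunk/PKGBUILD foo/repos/extra-i686/PKGBUILD
git add foo
git commit -q -m "foo: copy 1.0-1 to extra-i686"

# -C alone only considers files modified in the same commit as copy sources
plain=$(git diff --name-status -C HEAD~1 HEAD)
# --find-copies-harder also considers unmodified files as copy sources
harder=$(git diff --name-status -C --find-copies-harder HEAD~1 HEAD)

echo "with -C:                   $plain"
echo "with --find-copies-harder: $harder"
```

In this setup the plain `-C` run should report the new file as an addition, while the `--find-copies-harder` run should report it as a copy of the trunk file. How expensive the second form is on a tree the size of the real repository is not measured here.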
2) Partial checkouts and commits: We check out single directories and most importantly we commit to single directories without updating the rest of the repository. These operations come naturally to SVN, but they are against the very concept of git.
did you see http://vmiklos.hu/blog/sparse-checkout-example-in-git-1-7 ? is this not enough? I mean, you can clone the (complete) repository, checkout the git repo sparsely, commit in your subdirs, add the clone as remote in your original and pull in the changes. okay you do have the complete 57MB repository locally, but at least a clean checkout. I actually just tried this and it just works! Dieter
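The round trip described above (full clone, sparse working tree, commit, pull back into the original) can be sketched on a throwaway repository. This is an illustration under assumptions only: the packages `foo` and `bar`, the directory names and the committer identity are hypothetical, and it uses the git 1.7-era `core.sparseCheckout` recipe from the linked blog post, not any official Arch tooling.

```shell
set -eu
# Placeholder identity for the throwaway commits
export GIT_AUTHOR_NAME="Example Packager" GIT_AUTHOR_EMAIL="packager@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# A stand-in "original" repository with two hypothetical packages
git init -q "$work/upstream"
cd "$work/upstream"
mkdir foo bar
printf 'pkgname=foo\npkgrel=1\n' > foo/PKGBUILD
printf 'pkgname=bar\npkgrel=1\n' > bar/PKGBUILD
git add foo bar
git commit -q -m "add foo and bar"

# Clone everything, then restrict the working tree to foo/
git clone -q "$work/upstream" "$work/sparse"
cd "$work/sparse"
git config core.sparseCheckout true
mkdir -p .git/info
echo "foo/" > .git/info/sparse-checkout
git read-tree -m -u HEAD
ls    # only foo/ should remain in the working tree

# Commit inside the sparse working tree
printf 'pkgname=foo\npkgrel=2\n' > foo/PKGBUILD
git add foo/PKGBUILD
git commit -q -m "foo: bump pkgrel"

# Back in the original: add the clone as a remote and pull the change in
cd "$work/upstream"
git remote add sparse "$work/sparse"
git pull -q --ff-only sparse HEAD
cat foo/PKGBUILD
```

Note that the clone still downloads the complete history; only the working tree is sparse, which is the disk-space trade-off mentioned above.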
On 07/03/2010, Dieter Plaetinck <dieter@plaetinck.be> wrote:
On Sun, 07 Mar 2010 14:49:01 +0100 Thomas Bächler <thomas@archlinux.org> wrote:
1) We want to be able to see which PKGBUILD matches the package in the repository. In SVN, we use copy - which is subversion's equivalent to branching: By copying, you create a reference and all history of the copied file is still there. In git, copying means that the copy has no history, it is entirely unrelated to the original. The only equivalent in git would be branching - but you cannot branch a single file or path, you can only branch the entire tree.
Note that with "package in repository" Thomas means the <package>/repos/i686 and such directories in svn.
There are some approaches we could take: * git diff has a -C flag to detect copies. --find-copies-harder For performance reasons, by default, -C option finds copies only if the original file of the copy was modified in the same changeset. This flag makes the command inspect unmodified files as candidates for the source of copy. This is a very expensive operation for large projects, so use it with caution. Giving more than one -C option has the same effect.
so, we tell all packagers to do the add/update/test/add-to-repositories in one commit. (or use the slow -C flag, I don't know how often you want to do this)
* git branches. 3 branches or so for each package. that's a lot of branches, but maybe that's not really a problem, depends on how many times you want to merge branches i guess (i.e. how related packages are to each other)
* we could also get rid of these branch directories. what's the point of them anyway? the tools that build the packages (tarballs) must know the latest version for the particular architecture? maybe we can put tags in the commit messages, or keep a textfile in the package directory to know which "state" of the directory is usable to build packages for.
* just do normal copies and don't care about the histories.
I'm not really a packager so I don't know how feasible all approaches are, but some of them seem pretty feasible.
2) Partial checkouts and commits: We check out single directories and most importantly we commit to single directories without updating the rest of the repository. These operations come naturally to SVN, but they are against the very concept of git.
did you see http://vmiklos.hu/blog/sparse-checkout-example-in-git-1-7 ? is this not enough? I mean, you can clone the (complete) repository, checkout the git repo sparsely, commit in your subdirs, add the clone as remote in your original and pull in the changes. okay you do have the complete 57MB repository locally, but at least a clean checkout. I actually just tried this and it just works!
It will work, no doubt. But the problem is this:

    svn co $url --depth empty # nothing
    cd $dir
    svn up $pkg

..against this:

    git clone $url # everything
    cd $dir
    git config core.sparsecheckout true
    echo $pkg > .git/info/sparse-checkout
    git read-tree -m -u HEAD

And then with svn you can maintain the sparseness with 'svn up --set-depth empty' every time. And also I think the main thing here is git will work backwards, and as such, will pull in the whole repo:

"DO NOT CHECK OUT THE ENTIRE SVN REPO."

From: http://www.archlinux.org/svn/

So when someone says it's alright to do that, then I think it'll not be too hard to migrate the tools to git, and use those tools instead of using git directly. In this case, I think we'd no longer need 'archrelease' since a git commit is local only, and use push instead to "release" the package. That'd then eliminate all directories except for the package itself.

-- GPG/PGP ID: B42DDCAD
On 07.03.2010 17:49, Ray Rashif wrote:
And also I think the main thing here is git will work backwards, and as such, will pull in the whole repo:
"DO NOT CHECK OUT THE ENTIRE SVN REPO."
http://learn.github.com/p/intro.html#small_vs_svn Maybe someone (dev?) could clone the entire svn repo, convert it to git and post some numbers? -- Florian Pritz -- {flo,bluewind}@server-speed.net
On Mon, 8 Mar 2010 00:49:22 +0800 Ray Rashif <schivmeister@gmail.com> wrote:
It will work, no doubt. But the problem is this:
    svn co $url --depth empty # nothing
    cd $dir
    svn up $pkg
..against this:
    git clone $url # everything
    cd $dir
    git config core.sparsecheckout true
    echo $pkg > .git/info/sparse-checkout
    git read-tree -m -u HEAD
And then with svn you can maintain the sparseness with 'svn up --set-depth empty' everytime. And also I think the main thing here is git will work backwards, and as such, will pull in the whole repo:
once a user has done what you did, i think it's fine. if you want to track a new package, you write it into .git/info/sparse-checkout, which is not really harder than "svn up <packagename>"
"DO NOT CHECK OUT THE ENTIRE SVN REPO."
From: http://www.archlinux.org/svn/
So when someone says it's alright to do that, then I think it'll not be too hard to migrate the tools to git, and use those tools instead of using git directly. In this case, I think we'd no longer need 'archrelease' since a git commit is local only, and use push instead to "release" the package. That'd then eliminate all directories except for the package itself.
maybe - if we would switch to git - we could have mirrors mirror our git repository. Dieter
I agree with everyone else that's said it. This comes up often enough but no one ever has a good workflow that works. I have seen nothing proposed in this thread that is good. The ONLY thing gained is "oh neat, it's in git". We lose quite a bit, especially in the branch-ing department.

It seems like this is a "solution" that's looking for a problem to happen. As far as I know, working with svn isn't a big deal and isn't a problem.

On Sun, Mar 7, 2010 at 9:07 AM, Dieter Plaetinck <dieter@plaetinck.be> wrote:
* git branches. 3 branches or so for each package. that's a lot of branches, but maybe that's not really a problem, depends on how many times you want to merge branches i guess (i.e. how related packages are to each other)
This wouldn't allow you to safely check out multiple packages at a time, as you can only have one branch checked out at a time. Also, could you imagine scripting this to verify the version of, say, xorg-server that is in extra? Ugh. With svn, it's simply a checkout of $URL/xorg-server/extra-i686 and checking the PKGBUILD.
* we could also get rid of these branch directories. what's the point of them anyway? the tools that build the packages (tarballs) must know the latest version for the particular architecture? maybe we can put tags in the commit messages, or keep a textfile in the package directory to know which "state" of the directory is usable to build packages for.
Knowing the version of the files which built a package in the repos is important. Using tags in commit messages is a terrible idea because not only do we need to change the PAINFULLY simple way of checking with something using git, but we also have to parse plain text commit messages.
* just do normal copies and don't care about the histories.
This shouldn't ever happen. The history is important
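For the branch-based idea discussed above, git can read a file from another branch without checking that branch out, via `git show <branch>:<path>`. The following is only a sketch of what such a version check might look like; the branch name `extra`, the package `foo` and the flat `foo/PKGBUILD` layout are hypothetical, and it says nothing about whether a branch-per-repository scheme would suit the packaging workflow as a whole.

```shell
set -eu
# Placeholder identity for the throwaway commits
export GIT_AUTHOR_NAME="Example Packager" GIT_AUTHOR_EMAIL="packager@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Version 1.0 of a hypothetical package, marked as released via a branch
mkdir foo
printf 'pkgname=foo\npkgver=1.0\n' > foo/PKGBUILD
git add foo
git commit -q -m "foo 1.0"
git branch extra    # hypothetical branch standing for "what is in extra"

# Development continues on the current branch; "extra" stays at 1.0
printf 'pkgname=foo\npkgver=1.1\n' > foo/PKGBUILD
git commit -q -am "foo 1.1 (not released yet)"

# Read the released PKGBUILD without switching branches
released=$(git show extra:foo/PKGBUILD | sed -n 's/^pkgver=//p')
current=$(sed -n 's/^pkgver=//p' foo/PKGBUILD)
echo "released: $released, working tree: $current"
```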
On Tue, Mar 9, 2010 at 1:04 AM, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
It seems like this is a "solution" that's looking for a problem to happen. As far as I know, working with svn isn't a big deal and isn't a problem.
Just a side-note : I discovered git svn today, it's really a blessing ! I very rarely hack on any svn projects, but today was the case, and I am not able to do anything with svn. But there is a simple explanation to that : git is the only scm I learned and used. I wouldn't argue whether Arch packages should use git or not simply because I don't know better and I don't have any precise ideas how everything would work. So anyone in the same situation is invited to do the same :)
On Mon, 8 Mar 2010 18:04:50 -0600 Aaron Griffin <aaronmgriffin@gmail.com> wrote:
I agree with everyone else that's said it. This comes up often enough but no one ever has a good workflow that works. I have seen nothing proposed in this thread that is good. The ONLY thing gained is "oh neat, it's in git". We lose quite a bit, especially in the branch-ing department.
It seems like this is a "solution" that's looking for a problem to happen. As far as I know, working with svn isn't a big deal and isn't a problem.
The biggest advantage of a git-based approach is that imho it becomes much easier to - as an end user - maintain forks of packages. you could just have a clone with your own packages and merge in changes as ABS gets updated. with the current ABS, i always need to manually apply & commit the changes from abs into my customized packages. but if i'm missing something, please enlighten me. Dieter
On Tue, Mar 9, 2010 at 6:16 AM, Dieter Plaetinck <dieter@plaetinck.be> wrote:
On Mon, 8 Mar 2010 18:04:50 -0600 Aaron Griffin <aaronmgriffin@gmail.com> wrote:
I agree with everyone else that's said it. This comes up often enough but no one ever has a good workflow that works. I have seen nothing proposed in this thread that is good. The ONLY thing gained is "oh neat, it's in git". We lose quite a bit, especially in the branch-ing department.
It seems like this is a "solution" that's looking for a problem to happen. As far as I know, working with svn isn't a big deal and isn't a problem.
The biggest advantage of a git-based approach is that imho it becomes much easier to - as an end user - maintain forks of packages. you could just have a clone with your own packages and merge in changes as ABS gets updated. with the current ABS, i always need to manually apply & commit the changes from abs into my customized packages.
While that may be true, you're talking about complicating the system for developers and backend tools so that users can maintain their own "forks" of PKGBUILDs. Considering a "package" is just a dir with 1-5 files in it, it's not THAT big of a deal to handle that manually. Additionally, "git svn fetch" will cover this use case just fine.
The only viable solution I could think of is using one git repository per package - and that is just crazy.
With submodules it wouldn't be that bad.
On Sun, Mar 07, 2010 at 02:49:01PM +0100, Thomas Bächler wrote:
The only viable solution I could think of is using one git repository per package - and that is just crazy.
I wonder, is it really that crazy ? I've been looking into git as a replacement for my own use. One repo per project seems the 'natural' way to use it. Downloading the complete abs would require a lot of 'git clone' operations, but is that the typical use case ? I guess it is not. And if you really need everything you do it once, after that it's just updates. And most users probably don't need everything. Also, even if I find it hard to believe, it seems that git repos are typically much smaller than the equivalent in svn. Ciao, - FA O tu, che porte, correndo si ? E guerra e morte !
On Sun, Mar 7, 2010 at 10:14 PM, <fons@kokkinizita.net> wrote:
On Sun, Mar 07, 2010 at 02:49:01PM +0100, Thomas Bächler wrote:
The only viable solution I could think of is using one git repository per package - and that is just crazy.
I wonder, is it really that crazy ?
I've been looking into git as a replacement for my own use. One repo per project seems the 'natural' way to use it.
Downloading the complete abs would require a lot of 'git clone' operations, but is that the typical use case ? I guess it is not. And if you really need everything you do it once, after that it's just updates. And most users probably don't need everything.
Also, even if I find it hard to believe, it seems that git repos are typically much smaller than the equivalent in svn.
I have grepped the full abs tree many times for various reasons. It is very practical. And in 95% of the cases, I do not need any history, I just need the last version to read/edit/rebuild.
On Sun, Mar 07, 2010 at 10:24:42PM +0100, Xavier Chantry wrote:
I have grepped the full abs tree many times for various reasons. It is very practical.
No question about that, but it would still be possible to download everything. When you do a netinstall, pacman gets a few hundred packages individually. Abs could do the same; it would be transparent to the user.
And in 95% of the cases, I do not need any history, I just need the last version to read/edit/rebuild.
Yes, with git you get the full history. I've still not grokked why that is the only option... (except for Linus' motto: if in doubt, do the opposite of svn :-) Ciao, -- FA O tu, che porte, correndo si ? E guerra e morte !
On 07/03/10 21:34, fons@kokkinizita.net wrote:
On Sun, Mar 07, 2010 at 10:24:42PM +0100, Xavier Chantry wrote:
[...]
And in 95% of the cases, I do not need any history, I just need the last version to read/edit/rebuild.
Yes, with git you get the full history. I've still not grokked why that is the only option... (except for Linus' motto: if in doubt, do the opposite of svn :-)
Ciao,
Maybe you're looking for `git clone --depth 1`
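A shallow clone can be tried locally to see the effect. This is a minimal sketch under assumptions: the repository, the package `foo` and the committer identity are throwaway placeholders. Since git ignores `--depth` when cloning from a plain local path, the sketch uses a `file://` URL.

```shell
set -eu
# Placeholder identity for the throwaway commits
export GIT_AUTHOR_NAME="Example Packager" GIT_AUTHOR_EMAIL="packager@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"

# Three revisions of a hypothetical package
mkdir foo
for rel in 1 2 3; do
    printf 'pkgname=foo\npkgver=1.0\npkgrel=%s\n' "$rel" > foo/PKGBUILD
    git add foo
    git commit -q -m "foo: pkgrel $rel"
done

# Fetch only the most recent commit
git clone -q --depth 1 "file://$work/upstream" "$work/shallow"

full=$(git -C "$work/upstream" rev-list --count HEAD)
shallow=$(git -C "$work/shallow" rev-list --count HEAD)
echo "upstream commits: $full, shallow clone commits: $shallow"
```

The shallow clone still contains the latest version of every file, which matches the "read/edit/rebuild" use case mentioned above.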
The only way for this to actually happen would be for someone to set up a git repo with a handful of packages and demonstrate that it works better with the usual packaging workflow. That is what was done with SVN and why it was chosen when we switched from CVS. Allan
On Sun, Mar 7, 2010 at 11:23 PM, Allan McRae <allan@archlinux.org> wrote:
The only way for this to actually happen would be for someone to set up a git repo with a handful of packages and demonstrate that it works better with the usual packaging workflow. That is what was done with SVN and why it was chosen when we switched from CVS.
By the way, I am not sure anyone mentioned that there were actually two proposals for getting rid of cvs : git vs svn And svn won.
From http://mailman.archlinux.org/pipermail/arch-dev-public/2007-December/003330....
* Getting rid of CVS

Last status report, I pointed this guy out. Roman responded with a vote for Jason's SVN proposal. In summary:

* Jason has provided us with an svn solution, where sub-directories control the location of the package (i.e. package-name/repos/extra/PKGBUILD will place the package into extra)
* Dan has provided us with a git solution that uses named branches to control the location (i.e. a branch named "testing" has changes to PKGBUILDs present only in the testing repo)

I'm going to put my weight behind Jason's SVN proposal too, for the following reasons:

* There is no reason to manage our packages in a distributed manner
* SVN will be an easier transition for some users and developers unfamiliar with the esoteric commands of git.
* It has a real implementation
* One can use the git-svn porcelain on top of this, to still get the full power of git if they so wish.

So, the next steps: Jason, can you provide us with some more details on your implementation, or perhaps something on gerolde as a preliminary system? I'd like to setup something side-by-side for people to use and to play with a bit. This way we can easily flesh out the hairier details.

Paul, you did some similar work with repoman, yes? Do you have anything to add to this topic?
On 03/07/2010 07:49 AM, Thomas Bächler wrote:
This comes up every month at least (not necessarily on the mailing list, but somewhere) and people always say "use git" without even thinking how that would work - so far, nobody has ever presented a workflow that would match our packaging requirements and was based on git. We don't use SVN for fun - using SVN is everything but fun.
1) We want to be able to see which PKGBUILD matches the package in the repository. In SVN, we use copy - which is subversion's equivalent to branching: By copying, you create a reference and all history of the copied file is still there. In git, copying means that the copy has no history, it is entirely unrelated to the original. The only equivalent in git would be branching - but you cannot branch a single file or path, you can only branch the entire tree. 2) Partial checkouts and commits: We check out single directories and most importantly we commit to single directories without updating the rest of the repository. These operations come naturally to SVN, but they are against the very concept of git.
The only viable solution I could think of is using one git repository per package - and that is just crazy.
If it ain't broke..... don't fix it! -- David C. Rankin, J.D.,P.E. Rankin Law Firm, PLLC 510 Ochiltree Street Nacogdoches, Texas 75961 Telephone: (936) 715-9333 Facsimile: (936) 715-9339 www.rankinlawfirm.com
On 07.03.2010 12:03, Dieter Plaetinck wrote:
Couldn't find any discussion about this, but what about we maintain our packages in git instead of svn?
pros: 1) git is awesome
That is a personal opinion, not an argument.
2) we don't need abs/rsync anymore. users can just read from git.

Users would have to learn git commands.

3) git network communication is more efficient than rsync (afaik)

Maybe, I do not know.

4) users can check out older versions of packages easily, with limited storage overhead.

Do you want to store binary packages in the git repo? Maybe I misunderstand you. Checking out older PKGBUILDs would be doable in svn also, I guess.

5) makes it easier to maintain forks of packages (have your own git repository with some changes, then merge in upstream changes to keep them up to date. upstream == arch linux here)
cons:
1) using git for abs will use more disk space because you need the checkout + the repo (a 60% increase or so? my abs tree is now 57MB, so even if this becomes 100MB that's still ok imho)
2) svn->git migration is not trivial, since tools, the website, etc. will need to be adapted.
Dieter
We should not do that. git imho is far too complicated for end users to use. Let's keep it easy.
Regards Stefan
On Sun, 07 Mar 2010 19:51:30 +0100 Stefan Husmann <stefan-husmann@t-online.de> wrote:
4) users can check out older versions of packages easily, with limited storage overhead. Do you want to store binary packages in the git repo? Maybe I misunderstand you. Checking out older PKGBUILDs would be doable in svn also, I guess.
no, i was talking about the "source packages" (pkgbuilds, install files etc). now you can get all that stuff with ABS, but only the latest version. Dieter
On Sun, Mar 7, 2010 at 7:55 PM, Dieter Plaetinck <dieter@plaetinck.be> wrote:
On Sun, 07 Mar 2010 19:51:30 +0100 Stefan Husmann <stefan-husmann@t-online.de> wrote:
4) users can check out older versions of packages easily, with limited storage overhead. Do you want to store binary packages in the git repo? Maybe I misunderstand you. Checking out older PKGBUILDs would be doable in svn also, I guess.
no, i was talking about the "source packages" (pkgbuilds, install files etc). now you can get all that stuff with ABS, but only the latest version.
uhm? Ray already showed you can obviously do that with svn as well, and you even replied to him :) http://wiki.archlinux.org/index.php/Getting_PKGBUILDS_From_SVN
participants (14)
- Aaron Griffin
- Allan McRae
- Baho Utot
- David C. Rankin
- Dieter Plaetinck
- Florian Pritz
- fons@kokkinizita.net
- Muhammed Uluyol
- Nathan Wayde
- Pierre Schmitz
- Ray Rashif
- Stefan Husmann
- Thomas Bächler
- Xavier Chantry