[arch-dev-public] Killing CVS [was: Status Report 2007-10-15]
On 10/15/07, Andreas Radke <a.radke@arcor.de> wrote:
From what I read and talk around it seems most European devs are more comfortable with svn and American devs prefer git.
Heh, you forgot Canadian!
Either way, I think the distinction is more like so: The coders tend to prefer git, while the non-coders prefer svn.
I'm glad you brought this up as it's something that has gone by the wayside for far too long. I have basically been waiting to see how other discussions have gone.
So let me make a few salient points here:
* The SCM doesn't matter. It doesn't. We're not doing anything complicated that we need advanced features for. The *ONLY* reason we use an SCM at all for PKGBUILDs is to track history. We keep discussing all this stuff as if we need super advanced features for PKGBUILDs and local branches, complex N-way merges, etc etc. We don't. We need to commit, and update. That's it.
* There will always be a discomfort when we switch these things. However, if all developers use devtools except in extreme circumstances, there's no reason this discomfort should actually cause any problems.
* Pros and cons are useless here. As I said in the first point, we have no need for the advanced features based on our usage patterns. Sure, we can make use of them later, but right now it shouldn't be weighting any decision.
* Changing SCMs is fairly easy. If we fuck up, we change to something else - I could do all the grunt work here if people see this as any sort of issue (it's a 5-10 line config file for tailor)
------------------------------
So, here's what I'd like to do. This decision is fairly arbitrary, so let's try and throw out FACTUAL points FOR each SCM.
I don't want to see any "X is better than Y because of Z" drivel. I want to see "X supports Z". Everyone has their "baby" SCM. That's not the issue. We need to make a decision, and opinions should have no place here.
So. Bring up your points as they apply to our usage patterns. We'll discuss this until this weekend, and come next monday, I'd like to start a vote and see if we can get anywhere.
Thanks guys, Aaron
On 10/16/07, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
On 10/15/07, Andreas Radke <a.radke@arcor.de> wrote:
From what I read and talk around it seems most European devs are more comfortable with svn and American devs prefer git.
Heh, you forgot Canadian!
Either way, I think the distinction is more like so: The coders tend to prefer git, while the non-coders prefer svn.
I'm glad you brought this up as it's something that has gone by the wayside for far too long. I have basically been waiting to see how other discussions have gone.
So let me make a few salient points here:
* The SCM doesn't matter. It doesn't. We're not doing anything complicated that we need advanced features for. The *ONLY* reason we use an SCM at all for PKGBUILDs is to track history.
We keep discussing all this stuff as if we need super advanced features for PKGBUILDs and local branches, complex N-way merges, etc etc.
We don't. We need to commit, and update. That's it.
* There will always be a discomfort when we switch these things. However, if all developers use devtools except in extreme circumstances, there's no reason this discomfort should actually cause any problems.
* Pros and cons are useless here. As I said in the first point, we have no need for the advanced features based on our usage patterns. Sure, we can make use of them later, but right now it shouldn't be weighting any decision.
* Changing SCMs is fairly easy. If we fuck up, we change to something else - I could do all the grunt work here if people see this as any sort of issue (it's a 5-10 line config file for tailor)
------------------------------
So, here's what I'd like to do. This decision is fairly arbitrary, so let's try and throw out FACTUAL points FOR each SCM.
I don't want to see any "X is better than Y because of Z" drivel. I want to see "X supports Z". Everyone has their "baby" SCM. That's not the issue. We need to make a decision, and opinions should have no place here.
So. Bring up your points as they apply to our usage patterns. We'll discuss this until this weekend, and come next monday, I'd like to start a vote and see if we can get anywhere.
Thanks guys, Aaron
The only real salient (read: non fanboy) problem I have with subversion would be repository size. Subversion isn't very good at keeping repo size down, and with large checkouts this could be a problem. The arch pkgbuild repository would grow some considerable history over time, and this may be a factor. A workaround would be to occasionally nuke history. I have no problem with doing it for something like our pkgbuilds, since they only matter for 'today' anyway.
Other than that, I think I have come to accept that subversion is the 'low hanging fruit' so to speak, insofar as across the board adoption.
1. It is easy to learn for the non-coders.
2. It works 'well enough' for now.
3. We *can* always change it later, like Aaron said.
4. It interfaces with git, so I can still use 'my favorite tools' with it.
oh. forgot to mention.. subversion has a decent permission model when set up through apache with mod_dav_svn (if we go that route).
eliott wrote:
Other than that, I think I have come to accept that subversion is the 'low hanging fruit' so to speak, insofar as across the board adoption. 1. It is easy to learn for the non-coders.
But it will be primarily coders using it, no?
2. It works 'well enough' for now. 3. We *can* always change it later, like Aaron said.
This is true, though retrofitting the existing tools/etc. will take a lot more work than we probably realize. Switching twice costs twice x, where x is however much work one switch is, and x may not be a trivially small amount.
4. It interfaces with git, so I can still use 'my favorite tools' with it.
Good point. - P
Aaron Griffin wrote:
On 10/15/07, Andreas Radke <a.radke@arcor.de> wrote:
From what I read and talk around it seems most European devs are more comfortable with svn and American devs prefer git.
Heh, you forgot Canadian!
Either way, I think the distinction is more like so: The coders tend to prefer git, while the non-coders prefer svn.
I'm glad you brought this up as it's something that has gone by the wayside for far too long. I have basically been waiting to see how other discussions have gone.
So let me make a few salient points here:
* The SCM doesn't matter. It doesn't. We're not doing anything complicated that we need advanced features for. The *ONLY* reason we use an SCM at all for PKGBUILDs is to track history.
We keep discussing all this stuff as if we need super advanced features for PKGBUILDs and local branches, complex N-way merges, etc etc.
We don't. We need to commit, and update. That's it.
I know I could potentially be more productive if I could cherrypick PKGBUILD mods from community members. Sounds like a small thing, and it probably is, but git is a powerful way of working. That said, this is gravy, not necessity. Potential for improvement but not required.
* There will always be a discomfort when we switch these things. However, if all developers use devtools except in extreme circumstances, there's no reason this discomfort should actually cause any problems.
* Pros and cons are useless here. As I said in the first point, we have no need for the advanced features based on our usage patterns. Sure, we can make use of them later, but right now it shouldn't be weighting any decision.
* Changing SCMs is fairly easy. If we fuck up, we change to something else - I could do all the grunt work here if people see this as any sort of issue (it's a 5-10 line config file for tailor)
------------------------------
So, here's what I'd like to do. This decision is fairly arbitrary, so let's try and throw out FACTUAL points FOR each SCM.
I don't want to see any "X is better than Y because of Z" drivel. I want to see "X supports Z". Everyone has their "baby" SCM. That's not the issue. We need to make a decision, and opinions should have no place here.
The existing cvs seems to support everything, and if we're only evaluating based on support and not based on better features, why switch at all? Why not stay with CVS?
That said, repoman v0.1 will support SVN. Primarily because there is a nice python interface to svn. Maybe there's one for git, I don't know. I'm sure we will eventually do this in a pluggable kind of way, and support more than SVN, but for right now, I needed to choose 1 and I've chosen SVN.
SVN supports checking out only part of a tree/repo.
GIT supports gitweb and sign-offs, which are a really good way to view history and evaluate changes.
GIT supports tags as flexible as we need, I think. svn does, but not as flexibly as the way we currently use CURRENT and TESTING, etc. in cvs. So we'd need to figure out how the db scripts get modified and how they know which versions correspond to our old CVS tags.
This is intended just as a start for discussion.
- P
Paul Mattal schrieb:
I know I could potentially be more productive if I could cherrypick PKGBUILD mods from community members. Sounds like a small thing, and it probably is, but git is a powerful way of working.
Can anyone explain to me what cherrypicking is?
The existing cvs seems to support everything, and if we're only evaluating based on support and not based on better features, why switch at all? Why not stay with CVS?
CVS has no proper way of moving, removing or copying files or whole directories. We moved around between repositories by moving the directories between CVS trees to keep history, which is ugly. We cannot at the same time clean up the repository from directories we do not need anymore and keep the history (and I do want a clean working copy with only directories that are still required and used). So CVS does NOT support everything we need.
On 10/16/07, Thomas Bächler <thomas@archlinux.org> wrote:
Paul Mattal schrieb:
I know I could potentially be more productive if I could cherrypick PKGBUILD mods from community members. Sounds like a small thing, and it probably is, but git is a powerful way of working.
Can anyone explain to me what cherrypicking is?
man git-cherry-pick might help. It basically means picking one commit off another branch, and having it be correctly applied to the current branch through a possible 3-way merge if necessary.
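For anyone who wants to see it rather than read the man page, here is a minimal sketch in a throwaway repository. The branch name "community", the file names and the commit messages are all made up for illustration; only the git commands themselves are real.

```shell
#!/bin/sh
# Sketch only: a throwaway repository showing what "cherry-picking" does.
# Branch, file and commit names are invented for illustration.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Example Dev"
git config user.email "dev@example.org"

# One commit on the starting branch.
printf 'pkgver=1.0\npkgrel=1\n' > PKGBUILD
git add PKGBUILD
git commit -q -m "initial PKGBUILD"
base=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on git defaults

# A contributor branch with two independent commits.
git checkout -q -b community
printf 'fix for the build\n' > build-fix.patch
git add build-fix.patch
git commit -q -m "add build fix"
wanted=$(git rev-parse HEAD)
printf 'unrelated notes\n' > NOTES
git add NOTES
git commit -q -m "add unrelated notes"

# Back on the starting branch, take only the first of those commits.
git checkout -q "$base"
git cherry-pick "$wanted" > /dev/null

picked_subject=$(git log -1 --format=%s)
ls
```

After this runs, the starting branch has the build fix but not the unrelated notes, which is the whole point: one commit is picked, the rest of the other branch is left alone.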
The existing cvs seems to support everything, and if we're only evaluating based on support and not based on better features, why switch at all? Why not stay with CVS?
CVS has no proper way of moving, removing or copying files or whole directories. We moved around between repositories by moving the directories between CVS trees to keep history, which is ugly. We cannot at the same time clean up the repository from directories we do not need anymore and keep the history (and I do want a clean working copy with only directories that are still required and used). So CVS does NOT support everything we need.
Do you have a .cvsrc? Try adding an "update -dP" line (and a "checkout -P" line), or something along those lines.
GIT supports tags as flexible as we need, I think. svn does but not as flexibly as the way we currently use CURRENT and TESTING, etc. in cvs. So we'd need to figure out how the db scripts get modified and how they know which versions correspond to our old CVS tags.
With the way that git does tagging (which is more correct anyway), it's not easy to move tags from one commit to another. The way we do CVS tagging won't be portable to git. Jason
Jason Chu wrote:
GIT supports tags as flexible as we need, I think. svn does but not as flexibly as the way we currently use CURRENT and TESTING, etc. in cvs. So we'd need to figure out how the db scripts get modified and how they know which versions correspond to our old CVS tags.
With the way that git does tagging (which is more correct anyway), it's not easy to move tags from one commit to another. The way we do CVS tagging won't be portable to git.
Thanks for pointing this out. It actually occurred to me that this was the case while I was walking around outside at lunch! Git is entire-content centric, so the tag applies to the whole repo. - P
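A small sketch of what "the tag applies to the whole repo" means in practice. The package directories and the tag name CURRENT are only stand-ins for illustration; this is not a proposal for how tags should be used.

```shell
#!/bin/sh
# Sketch only: shows that a git tag names one commit (the whole tree),
# not a per-file revision as in CVS. Names and contents are invented.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Example Dev"
git config user.email "dev@example.org"

mkdir udev pacman
echo 1 > udev/PKGBUILD
echo 1 > pacman/PKGBUILD
git add .
git commit -q -m "first"
git tag CURRENT

echo 2 > udev/PKGBUILD
git commit -q -a -m "udev bump"

# The tag still names the first commit, for every file in the tree.
tagged_udev=$(git show CURRENT:udev/PKGBUILD)

# Re-tagging without force is refused ...
if git tag CURRENT 2>/dev/null; then plain_move=ok; else plain_move=refused; fi

# ... and forcing it moves the tag for all packages at once.
git tag -f CURRENT > /dev/null
moved_udev=$(git show CURRENT:udev/PKGBUILD)

echo "before: $tagged_udev, plain re-tag: $plain_move, after forced move: $moved_udev"
```

So there is no way to say "CURRENT for udev is here, CURRENT for pacman is there" with a single tag, which is what the CVS usage relies on.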
Aaron Griffin schrieb:
We don't. We need to commit, and update. That's it. [...] So, here's what I'd like to do. This decision is fairly arbitrary, so let's try and throw out FACTUAL points FOR each SCM.
Before 2.6.22, tpowa and I maintained rc-kernels and published them. We didn't do that for 2.6.23, but we will do it again in the future. We didn't use history tracking, so it was very difficult to keep in sync. I want to maintain the testing PKGBUILDs for kernel and modules in my local working copy with history (or maybe as a branch on the master repository, so we can share the PKGBUILDs). I think doing that is very easy with git (I would need some help from a git-guru in the beginning, but I am sure it will be easy), very ugly with subversion, but still possible. That is why I want git.
On 10/16/07, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
So let me make a few salient points here:
* The SCM doesn't matter. It doesn't. We're not doing anything complicated that we need advanced features for. The *ONLY* reason we use an SCM at all for PKGBUILDs is to track history.
We keep discussing all this stuff as if we need super advanced features for PKGBUILDs and local branches, complex N-way merges, etc etc.
We don't. We need to commit, and update. That's it.
I think you may be simplifying it a bit too much. We have this issue called current/testing. Right now, it sucks. We use CVS tags. Why is there no kernel 2.6.22.10 in our core repo right now? Oh yeah, because the CVS has already been shifted to the 2.6.23 path so we can't go back and update it. That is a problem whatever SCM we move to has to overcome.
I agree with Thomas here: maintaining RC branches for things like the kernel should at least be possible. Even if they do not get maintained on the main server, they should at least be able to be shared among developers.
On the other hand, and I know it's odd, but no one has proposed both an SCM tool and a usage pattern that sticks out to me as better than CVS. CVS, for all its shortcomings, does have one good thing that most projects see as bad: changes are tracked on a file basis, not an atomic commit basis. This makes a lot more sense for things like PKGBUILDs because packages (usually) do not depend on other changes in the repository, although in cases like a rebuild they could.
So I've proposed nothing here, sorry about that. But I really think the point I made in my first paragraph needs to be addressed by whatever SCM we switch to, otherwise there is no reason to switch away from the current (but working) CVS model we have.
-Dan
Dan McGee wrote:
On 10/16/07, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
So let me make a few salient points here:
* The SCM doesn't matter. It doesn't. We're not doing anything complicated that we need advanced features for. The *ONLY* reason we use an SCM at all for PKGBUILDs is to track history.
We keep discussing all this stuff as if we need super advanced features for PKGBUILDs and local branches, complex N-way merges, etc etc.
We don't. We need to commit, and update. That's it.
I think you may be simplifying it a bit too much. We have this issue called current/testing. Right now, it sucks. We use CVS tags. Why is there no kernel 2.6.22.10 in our core repo right now? Oh yeah, because the CVS has already been shifted to the 2.6.23 path so we can't go back and update it. That is a problem whatever SCM we move to has to overcome.
Actually, I don't think this is a problem the SCM should HAVE to overcome if we do our repo maintenance scripts properly. The SCM should not be tracking what is current/testing/etc. The repository manager should know what's CURRENT and should be pointing to the proper SCM version. This is the design in repoman. I believe right now we are really burdened because this division between repo maintenance and SCM is not optimal. - P
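To make the idea concrete, here is a minimal sketch of a repository manager that records which SCM revision each repo points to. The file format, the package names and the revision numbers are hypothetical and are not taken from repoman.

```shell
#!/bin/sh
# Sketch only: the file format, names and revision numbers are hypothetical,
# not repoman's real design.
set -e

map=$(mktemp)
cat > "$map" <<'EOF'
# repo     package   scm-revision
current    kernel26  1041
testing    kernel26  1187
current    udev      1102
EOF

# Print the SCM revision that a given repo points to for a package.
revision_for() {
    awk -v repo="$1" -v pkg="$2" \
        '$1 == repo && $2 == pkg { print $3 }' "$map"
}

current_rev=$(revision_for current kernel26)
testing_rev=$(revision_for testing kernel26)
echo "current -> r$current_rev, testing -> r$testing_rev"
```

With something like this, updating the package in current again later is a change to the table, not a tag that has already been moved past the version you need.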
On Tue, Oct 16, 2007 at 12:58:49PM -0500, Dan McGee wrote:
On 10/16/07, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
So let me make a few salient points here:
* The SCM doesn't matter. It doesn't. We're not doing anything complicated that we need advanced features for. The *ONLY* reason we use an SCM at all for PKGBUILDs is to track history.
We keep discussing all this stuff as if we need super advanced features for PKGBUILDs and local branches, complex N-way merges, etc etc.
We don't. We need to commit, and update. That's it.
I think you may be simplifying it a bit too much. We have this issue
YES! I totally agree that this is an oversimplification that I think is untrue.
called current/testing. Right now, it sucks. We use CVS tags. Why is there no kernel 2.6.22.10 in our core repo right now? Oh yeah, because the CVS has already been shifted to the 2.6.23 path so we can't go back and update it. That is a problem whatever SCM we move to has to overcome.
I agree with Thomas here- maintaining RC branches for things like the kernel should at least be possible if we can do that, even if they do not get maintained on the main server, they should at least be able to be shared among developers.
Yes!
On the other hand, and I know it's odd, but no one has proposed both an SCM tool and a usage pattern that sticks out to me as better than CVS. CVS, for all its shortcomings, does have one good thing that most projects see as bad: changes are tracked on a file basis, not an atomic commit basis. This makes a lot more sense for things like PKGBUILDs because packages (usually) do not depend on other changes in the repository, although in cases like a rebuild they could.
Agreed. That's why I haven't made any recommendations, though the closest was that slightly modified svn suggestion. I'm glad that someone else understands what I've been saying.
So I've proposed nothing here, sorry about that. But I really think the point I made in my first paragraph needs to be addressed by whatever SCM we switch to, otherwise there is no reason to switch away from the current (but working) CVS model we have.
I can't make a good suggestion right now either. I'll help people flesh out ideas because I've thought about this lots, but sometimes I feel like people think I impede things unnecessarily. If we run into problems later with a solution, I know I'll feel like saying "I told you so". Jason
On 10/16/07, Jason Chu <jason@archlinux.org> wrote:
If we run into problems later with a solution, I know I'll feel like saying "I told you so".
I didn't want this to turn to opinions and fanboy-ism, but it appears to have happened in record time. I need to repeat this point. About 2 months ago I converted extra and current to git repos. Full history and tags were kept and very little changed. It was done in about 10 minutes. I committed to git and converted back to CVS in about the same time. So, let me make this abundantly clear:
>>> We can't fuck this up. <<<
There is no way anything gets lost besides time. We're already soaking time by going off on these little tangents.
I need to repeat this point. About 2 months ago I converted extra and current to git repos. Full history and tags were kept and very little changed. It was done in about 10 minutes. I committed to git and converted back to CVS in about the same time.
Do you have those repos still? I wouldn't mind getting copies of them all and checking them out myself. Or even if you have the tailor conf files that you used. Jason
On Tue, Oct 16, 2007 at 02:30:18PM -0500, Aaron Griffin wrote:
On 10/16/07, Jason Chu <jason@archlinux.org> wrote:
If we run into problems later with a solution, I know I'll feel like saying "I told you so".
I didn't want this to turn to opinions and fanboy-ism, but it appears to have happened in record time.
I need to repeat this point. About 2 months ago I converted extra and current to git repos. Full history and tags were kept and very little changed. It was done in about 10 minutes. I committed to git and converted back to CVS in about the same time.
So, let me make this abundantly clear:
>>> We can't fuck this up. <<<
There is no way anything gets lost besides time. We're already soaking time by going off on these little tangents.
I don't care about what Aaron says here. Whether I disagree or not really doesn't matter. The point he's trying to get to is let's try something (without necessarily putting it into place right away) and talk about that instead of just talking about the ideas. So, I'm doing my part. A while ago there was talk of a different svn layout that we could use to help us track repos apart from the version control repo. For those of you that remember, good for you! To quote an email from Paul, the problems this method is trying to solve are thus:
a) Moving packages from one repo to another is hard.
b) Placing packages in multiple repos is hard.
c) Continued separate-track development on a package while in testing is hard.
d) Tracking multiple binary repos for different architectures is hard.
e) Maintenance of a package by more than one person is hard.
It addresses all of these issues fairly well, if I do say so myself. I tried writing some scripts for it and using a new tool (svnmerge) to possibly help keep versions in sync. I can recreate the svn repo using the newest changes in about an hour. I created the current repo based on changes as of last night.
I will now share what I have done. First the svn repo: http://projects.xennet.org/svnarch/
You can svn co the whole repo by itself, but (last time I tried) it takes about 2 hours to do (it isn't network traffic either... I think it's just a limitation of svn). A better suggestion (and the whole point of this layout) is to only check out the packages you need (and possibly even remove the working copies when you're done). I've written a couple of scripts (archco, archrelease, and archrm) to help with this flow.
The basic flow of this method goes like this:
1) archco the package you want to update
2) edit the files in trunk and commit as if you were a developer doing whatever you wanted to do to source code
3) once all changes are committed, run archrelease <repo> from the trunk directory -- this will merge all unmerged changes from trunk into that repo or create the repo if it doesn't currently exist
4) archrm the directory
While a checkout of the entire repo takes 2 hours, checking out a package takes about 5 seconds.
Now, how does this address Paul's points:
a) Moving packages from one repo to another is a simple svn copy (or svnmerge, depending on the situation)
b) To put a package in multiple repos, just archrelease the trunk (or svn copy or svnmerge from a different repo).
c) Files in <pkgname>/repos/* can be edited and committed to as if they were in trunk. This should work even when wanting to merge other changes from trunk into that repo later.
d) Different architectures are dealt with just like repos, it's the db scripts that will treat these directories differently.
e) Commits to trunk don't automatically go anywhere, people can make whatever changes they want without first rolling back other people's changes.
The major flaw that I can find with this layout is that bulk editing becomes more difficult. Because we don't a) abuse CVS tags and b) check out the whole repository, mass changes are difficult to apply to packages. Eventually, I'm confident that the tools we write can make up for this.
archco, archrelease, and archrm can be seen here: http://projects.xennet.org/svnarch-tools/
Notice that these scripts are really simple. archrelease would need to be expanded later (as would the FIXME in archrm).
Feel free to try things out. If you want write access, just send me an email. I've been applying random commits that I see on arch-commits and everything is really easy.
Jason
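The four-step flow above can be sketched as a dry run. These functions are NOT the real archco, archrelease and archrm scripts (those are at the svnarch-tools link); they only print the svn commands each step plausibly stands for, assuming the <pkgname>/trunk and <pkgname>/repos/<repo> layout described in the message. The "svnmerge merge" line and the "rm -rf" in the last step are assumptions about what the real tools do.

```shell
#!/bin/sh
# Dry-run sketch of the flow; NOT the real archco/archrelease/archrm scripts.
# Each function only prints the command it stands for.
set -e

SVNROOT="http://projects.xennet.org/svnarch"   # repo URL from the message

# 1) check out a single package instead of the whole tree
sketch_archco() {
    echo "svn checkout $SVNROOT/$1 $1"
}

# 3) release trunk into a repo: first release copies, later ones merge
sketch_archrelease() {   # args: package, repo, "new" or "existing"
    if [ "$3" = "new" ]; then
        echo "svn copy $1/trunk $1/repos/$2"
    else
        echo "svnmerge merge   # run inside $1/repos/$2"
    fi
}

# 4) drop the working copy when done (assumed behaviour)
sketch_archrm() {
    echo "rm -rf $1"
}

plan=$(
    sketch_archco gtkpod
    echo "svn commit -m 'my changes'   # step 2, run inside gtkpod/trunk"
    sketch_archrelease gtkpod extra existing
    sketch_archrm gtkpod
)
echo "$plan"
```

Nothing here touches a repository, so it is safe to run just to read the sequence of steps.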
On 10/17/07, Jason Chu <jason@archlinux.org> wrote:
I don't care about what Aaron says here. Whether I disagree or not really doesn't matter. The point he's trying to get to is let's try something (without necessarily putting it into place right away) and talk about that instead of just talking about the ideas.
Ok, keeping things rolling here. This thread is marked in my little "to respond" bucket, but I haven't had enough time to get to it.
Comments:
a) First off, this work is here, and that means quite a lot to me. Even if I don't think the concept is "perfect", it's done and it's good.
b) The concepts for the tools are solid. I actually didn't like the single package checkout at first, but it's grown on me. For those that don't get it, here's a use case:
    cd ~/mypackages
    archco gtkpod
    cd gtkpod && do stuff
    cd ..
    archco fetchmail
    ...etc etc...
This means two things. 1) You don't need the whole repo checked out all the time, so it saves on disk space, and 2) you can have a nice subset of packages in a directory, instead of having to hunt for packages.
c) Could you explain how the usage would work with the svnmerge tool? Personally, it's a little confusing with the repos/ dir, even though I had a small part in this original idea 8)
Other:
This implementation grew on me as archco did. I don't think it'd be as easy with another SCM. But I still am a little put off by svn - after having used fancier SCMs out there, I don't really like the tamer ones anymore. git has some GREAT tools, but I don't think we'd need half of them.
I'd still love to see a git or even mercurial implementation (bonus points: mercurial works on Win32 better, so we could ease into a 'cygwin' arch, which is something I wanted to try out for a while, hah). When I have time and finish up some things I wanted to finish, I will see about a git implementation. I believe Dan already has something, so he may beat me.
Anyone else have something to add?
Am Fri, 19 Oct 2007 19:06:39 -0500 schrieb "Aaron Griffin" <aaronmgriffin@gmail.com>:
This means two things. 1) You don't need the whole repo checked out all the time, so it saves on disk space, and 2) you can have a nice subset of packages in a directory, instead of having to hunt for packages.
Hm. Do you remember the way we try to find all deep dependencies? How should we do this without having the whole tree checked out? -Andy
On Sat, Oct 20, 2007 at 12:56:14PM +0200, Andreas Radke wrote:
Am Fri, 19 Oct 2007 19:06:39 -0500 schrieb "Aaron Griffin" <aaronmgriffin@gmail.com>:
This means two things. 1) You don't need the whole repo checked out all the time, so it saves on disk space, and 2) you can have a nice subset of packages in a directory, instead of having to hunt for packages.
Hm. Do you remember the way we try to find all deep dependencies? How should we do this without having the whole tree checked out?
That's a good point. It's not that you can't check out the whole repo, you still can, it's just not how you'd do it when maintaining packages. Abs will still exist too. It probably won't use csup, but it will have copies of all the PKGBUILDs in it. Jason
c) Could you explain how the usage would work with the svnmerge tool. Personally, it's a little confusing with the repos/ dir, even though I had a small part in this original idea 8)
svnmerge came out recently with subversion. The basic premise is that subversion is dumb about merges. It doesn't store any meta data when you merge a set of changes from one branch (directory) to another. This means that if you try to merge the same thing twice, you'll get a merge conflict (like applying a patch twice). svnmerge records which revisions have been merged into a branch.
Now, the repos/ dir is what we use to keep track of the actual package repositories. Possible subdirectories of repos/ are core, core-64, extra, extra-64, testing, testing-64, unstable, and unstable-64. Any files in these directories are "tagged" as the versions in the package repository. Other directories will be allowed when we want to start making custom repos on the fly (kernel26, xorg, etc).
If you think of the trunk/ directory as the HEAD CVS version (where most changes to a package are recorded), the repos/ subdirectories are our CURRENT/TESTING cvs tags. svnmerge helps us get the changes from trunk/ into our particular repo branch.
The reason we use svnmerge instead of just svn copy is because svn copy won't overwrite directories ("svn cp trunk repos/core" if repos/core exists will create repos/core/core) and, more importantly, because we can make specific changes in the repos that aren't tracked in trunk.
An example of such a situation came up with python and a security fix (I've modified the original situation slightly to explain things better). Python was updated in testing and a bug was found in the packaged version in extra. The bug had been fixed in the testing version, but people who weren't using testing didn't have access to it. In CVS, I would have to get a copy of the old PKGBUILD, roll back all the testing changes, apply the security fix patches, commit, tag, and apply the testing changes again for each architecture. In this new SVN scheme, I just have to edit the PKGBUILD in repos/extra and repos/extra-64.
I guess I never went through the actual usage.
Basic usage to expand on your example:
    cd ~/mypackages
    archco gtkpod
    cd gtkpod/trunk
    # Make changes
    svn commit -m 'my changes'
    archrelease extra
To explain the python example up above:
    cd ~/mypackages
    archco gtkpod
    cd gtkpod/repos/extra
    # Make extra-specific changes
    svn commit -m 'my changes'
    cd ../../trunk
    # Make trunk-specific changes
    svn commit -m 'awesome new version with changes going to testing'
    archrelease testing
    # Time passes (signoffs, etc)
    archrelease extra
There is one thing we'd want to check svnmerge for. In the second example, svnmerge will probably try to merge the specific extra changes with the overarching testing changes. If we applied a patch that isn't needed in the testing version, it'd be nice not to merge in this case. The problem is that it's a 3 or 4 step process with svn and one of those steps actually "untags" the package in the repo. We'd have to modify archrelease to handle this sort of case.
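The bookkeeping idea behind svnmerge (remember which revisions have already been merged into a branch, so a repeat merge is skipped instead of conflicting) can be sketched without svn at all. This only mimics the idea; the file used here and the revision numbers are hypothetical and say nothing about how svnmerge actually stores its data.

```shell
#!/bin/sh
# Sketch only: mimics the bookkeeping idea, not svnmerge's real storage format.
set -e

merged=$(mktemp)     # stands in for the per-branch record of merged revisions
applied=""

merge_rev() {
    if grep -qx "$1" "$merged"; then
        echo "r$1 already merged, skipping"
    else
        echo "merging r$1"
        echo "$1" >> "$merged"
        applied="$applied $1"
    fi
}

merge_rev 101
merge_rev 102
merge_rev 101    # second attempt is a no-op instead of a conflict
```

Plain svn merge has no such record, which is why merging the same range twice behaves like applying a patch twice.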
Other:
This implementation grew on me as archco did. I don't think it'd be as easy with another SCM. But I still am a little put off by svn - after having used fancier SCMs out there, I don't really like the tamer ones anymore. git has some GREAT tools, but I don't think we'd need half of them.
I really do think in this case we're using the strengths that svn has that other version control systems don't. The biggest two are partial checkouts and branches being directories (which means very very simple branch management). The funny part is that these are the exact features that don't help when working on complex interrelated source code. That's why it's more difficult to apply some other version control to package management.
I'd still love to see a git or even mercurial implementation (bonus points: mercurial works on Win32 better, so we could ease into a 'cygwin' arch, which is something I wanted to try out for a while, hah). When I have time and finish up some things I wanted to finish, I will see about a git implementation. I believe Dan already has something, so he may beat me.
Doesn't git work on win32 as well? Jason
On 10/20/07, Jason Chu <jason@archlinux.org> wrote:
Doesn't git work on win32 as well?
A bit off topic, but... it "works" but poorly. Performance is like 10X worse.
On 10/19/07, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
I'd still love to see a git or even mercurial implementation (bonus points: mercurial works on Win32 better, so we could ease into a 'cygwin' arch, which is something I wanted to try out for a while, hah). When I have time and finish up some things I wanted to finish, I will see about a git implementation. I believe Dan already has something, so he may beat me.
Anyone else have something to add?
And now for part 2 of 2 of my email. This one may be a bit long so hold on to your hats and try to stay with me.
Links:
Gitweb: http://www.archlinux.org/~dan/newrepo/gitweb.cgi?p=repo.git;a=summary
Download bare repo: http://www.archlinux.org/~dan/newrepo/repo.git.tar.gz
Clone repo address: http://www.archlinux.org/~dan/newrepo/repo.git
First off, I'm going to step through the examples in the repository, which is much easier to visualize if you have a local clone and are using a gitk or qgit-type tool where you can see the branches and merges. Let's start by stepping through my example repo.
Example 1- simple commit. Should be self explanatory. Given a package already in the repo, update the PKGBUILD and any auxiliary files, and make the commit straight to the master branch. Any SCM can do this.
Example 2- addition of a new package. As simple as git add <packagedir>, commit, and boom- we are in business.
Example 3- single package bump through testing. Here is where the GIT approach will start to slide from a centralized SCM. Because of lightweight branching, we have no issues with creating a lot of branches. For my proposal, each independent "testing group" gets its own branch. In this commit's case, we are creating a group for just one package, udev. The steps here were "git checkout -b test_udev; <edit files>; git commit". Note the testing prefix- this will be used by repo-generation scripts to know that a package is in what we currently call the testing repository.
Example 4- rebuild a set of packages. Here we see an example of rebuilding multiple packages. This happens frequently for us with things like db. I did the rebuild here in two separate commits- whether it is one commit or 100 it really doesn't matter as long as they are all done on the same branch.
Example 5- kernel RC development. This is something I have yet to see CVS do well at. We haven't ever tracked this in a version control system simply because our current framework won't let us. Here, we simply have a long running branch with the 'devel_' prefix used to indicate it is something special. There will be a few more commits on this branch, but that won't hurt us later when we...
< take a breather, still a few examples to go here :) >
(unnumbered example)- oops, forgot to number a commit, but the one labeled "Kernel pkg bump" was a commit to testing for a new minor kernel release. Note that we can successfully use our VCS to track both a kernel RC and the mainline release.
Example 6- merging test back into "current". Here we merge our pacman branch back into current, pulling all of the changes made on the branch in with us. This is a simple "git checkout master; git merge test_pacman". For your sake reading this, I did not delete the test_pacman branch tag at this point, although in the real usage of this system we would in order for DB scripts to drop the pacman package out of testing.
Example 7- merge multiple test branches at once. Here we pulled both the udev and kernel branches back into master. (once again, we would delete the 'test_' branches.)
Example 8- The new kernel is released. It's simple to base this branch off of the RC PKGBUILDs we have already been tracking, so we do this. In this sense, we are successfully tracking three different kernel PKGBUILDs if we need that capability.
Example 9- Another merge, this time the kernel. We would delete the test_kernel24 branch, but the devel_kernel branch could live on to support the next round of RCs.
Wow, sorry about that overload there.
Paul's list of problems to address:
a) Moving packages from one repo to another is hard. In this case, the definition of a repository will be outside of the actual PKGBUILD tree. Similar to Jason's flat-tree layout, the actual pushing of packages into repos wouldn't be determined by where the package files live, but by an external configuration. Thus all you would need to move a package is modify the config.
b) Placing packages in multiple repos is hard. I'm going to interpret this as the current/testing idea. With this system, placing a package in testing should be as easy as making the changes on a 'test_' topic branch.
c) Continued separate-track development on a package while in testing is hard. Addressed above.
d) Tracking multiple binary repos for different architectures is hard. This is something that needs to be addressed, and I do need to figure it out. The issue can be generalized more- how do we track what is actually released vs. what is in the repository? (A split CURRENT & HEAD tag scenario comes to mind from current CVS repositories) With GIT, we surely have the tools that can solve this issue but it needs some further thinking.
e) Maintenance of a package by more than one person is hard. Developers on this system can dive right into the full GIT toolset and leverage it to do what they need to do. If two people are working on a rebuild, they could easily push and pull trees between each other before making the final commit to the main repository. In addition, GIT merges rather than overwrites by default so it is hard to stomp on changes by other developers.
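The branch-prefix convention from the examples above ('test_' for testing groups, 'devel_' for long-running development, master for "current") is what a repo-generation script would key on. A minimal sketch of that lookup follows. The function name and the repo labels it prints are assumptions made for illustration, not part of any existing db script.

```shell
#!/bin/sh
# Hypothetical helper: map a branch name to the package repo it would feed.
# The 'test_' and 'devel_' prefixes and the master branch come from the
# proposal; the labels printed here are illustrative assumptions.
classify_branch() {
    case "$1" in
        master)  echo "current" ;;
        test_*)  echo "testing" ;;
        devel_*) echo "devel" ;;
        *)       echo "ignored" ;;
    esac
}

# Walk a few branch names from the example repo, plus one unprefixed name.
for branch in master test_udev test_pacman devel_kernel scratch; do
    printf '%s -> %s\n' "$branch" "$(classify_branch "$branch")"
done
```

Anything without a recognized prefix is ignored here, which would let developers keep private scratch branches without them leaking into a package repo. Whether that is the desired policy is an open choice.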
Things that need to be done:
1. Write a toolkit. Not looking for anything huge or crazy here (just take a look at Jason's SVN-based tools), but usage of this system by a developer should be easy. Typing one command instead of having to string together multiple GIT commands is a plus, and allows for more uniform procedure-following by all devs.
2. Examine feasibility of automated DB scripts. I think if things like the 'test_' prefix are done, we can automate db building when a master repository is pushed to. However, I don't know this for sure, and it would be wise to check on what would happen with edge cases and such.
3. Figure out best way to track binary repos and changes that haven't yet been pushed to a built package.
Please comment on anything and everything above and I'll respond. And as I said in my first email tonight- so far Jason's SVN solution is the only other one that looks good (besides hopefully this one). Let's keep the ball rolling here.
-Dan
Am Mon, 22 Oct 2007 21:32:44 -0500 schrieb "Dan McGee" <dpmcgee@gmail.com>:
Please comment on anything and everything above and I'll respond. And as I said in my first email tonight- so far Jason's SVN solution is the only other one that looks good (besides hopefully this one). Let's keep the ball rolling here.
I trust that you guys who work with revision control software every day will choose the one that best fits our needs. Please make sure we will have an easy way to add more "branches" (as I would call them in svn) from trunk. Right now we have the various repos for two architectures, but a few devs have (serious?) plans, or at least certain ideas, to add more branches in the future. Right now our infrastructure and manpower are the limits that prevent us from playing with more stuff. Andy
I'd just like to start by saying this is the best git suggestion I've seen so far ;) Well done.
<snip> I like how the package repositories are separate from the version control repository. Like you say later, it makes it really easy to move packages from one repo to another. And it helps to treat package repositories as branches... though... how do you know what PKGBUILD to use for a repository other than core/extra? There's a test_kernel24, test_kernel, and test_udev. Each of them have different versions of kernel24, kernel, and udev. In the svn suggestion, we're explicit about which version is in testing, how would we figure it out in git? If all commits dealt with only one PKGBUILD, I could see this being figured out, but if you have multiple changes in a commit, they could conflict. What then? I can't really find a good place to put all these comments, so they're just going at the top here... I could see a problem with possibly using branches *and* version control repositories to both represent package repositories. Things could get lost.
d) Tracking multiple binary repos for different architectures is hard. This is something that needs to be addressed, and I do need to figure it out. The issue can be generalized more- how do we track what is actually released vs. what is in the repository? (A split CURRENT & HEAD tag scenario comes to mind from current CVS repositories) With GIT, we surely have the tools that can solve this issue but it needs some further thinking.
Yeah, you'd almost want to have a "release" repo and a "development" repo. Any patches that go into the "release" repo are actually in the package repositories and "development" is just the staging area for changes. To track separate longterm repositories, you could have multiple branches. You just have to be sure to not lose commits in other branches.
Jason
On 10/29/07, Jason Chu <jason@archlinux.org> wrote:
And it helps to treat package repositories as branches... though... how do you know what PKGBUILD to use for a repository other than core/extra? There's a test_kernel24, test_kernel, and test_udev. Each of them have different versions of kernel24, kernel, and udev.
In the svn suggestion, we're explicit about which version is in testing, how would we figure it out in git?
The intent is to actually have two different versions. If 'test_kernel' and 'test_udev' spawned off their own mini-repos, each would have a kernel24. They would need to be merged to the master branch later.
When Dan and I talked about this, the whole purpose was experimental features and/or rebuilds. Here's a use-case:
Paul makes his AUFS changes to the kernel on a test_aufs branch (which may include the aufs and aufs-utils packages too). This allows the normal kernel to continue on as planned. People test it, everyone likes it, changes get rebased a few times, and eventually merged to master.
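The AUFS use-case can be sketched in plain git commands. This is only an illustration run in a throwaway repository: the directory names, PKGBUILD contents, and commit messages are made up, and an unrelated udev bump stands in for "normal" work continuing on master.

```shell
#!/bin/sh
set -e
# Throwaway repository; nothing here touches a real package tree.
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from the thread
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

# Starting point on master: two packages (contents are placeholders).
mkdir kernel udev
echo "pkgrel=1" > kernel/PKGBUILD
echo "pkgrel=1" > udev/PKGBUILD
git add kernel udev
git commit -q -m "import kernel and udev"

# Experimental work happens on its own topic branch.
git checkout -q -b test_aufs
mkdir aufs
echo "pkgrel=1" > aufs/PKGBUILD
echo "# aufs patch applied" >> kernel/PKGBUILD
git add aufs kernel
git commit -q -m "kernel, aufs: experimental AUFS support"

# Meanwhile master keeps moving.
git checkout -q master
echo "pkgrel=2" > udev/PKGBUILD
git commit -q -a -m "udev: routine bump"

# Rebase the topic branch onto the new master, then merge and drop it.
git checkout -q test_aufs
git rebase -q master
git checkout -q master
git merge -q test_aufs
git branch -q -d test_aufs
```

Because the branch was rebased first, the final merge is a fast-forward and master ends up with a linear history of three commits.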
On Mon, Oct 29, 2007 at 10:38:52AM -0500, Aaron Griffin wrote:
Ah, so this would get rid of the testing repo entirely. Instead we'd have "topic" package repos that would have specific changes contained within them. If that's the case, I like it. Jason
On 10/29/07, Jason Chu <jason@archlinux.org> wrote:
In a way, yes. Depending on the nomenclature we use though, we could always have a branch called "testing" that makes the testing repo. It's a harder sell, but the functionality is there. How we actually use it can be molded a little as we go.
On Mon, Oct 29, 2007 at 12:00:11PM -0500, Aaron Griffin wrote:
Yeah, the big problem with this sort of system and having a testing branch/repo with it is that we like to release individual packages from testing instead of migrating the whole thing all at once. Cherrypicking at that point becomes quite tedious if possible at all. Jason
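One partial workaround for releasing a single package from a shared testing branch is git's path-limited checkout, which copies one directory's state from another branch. The sketch below runs in a throwaway repository with made-up package contents. Note that it copies files rather than replaying commits, so it carries no history across from the testing branch, and whether that would be acceptable for this workflow is an assumption left open.

```shell
#!/bin/sh
set -e
# Throwaway repository with two placeholder packages.
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

mkdir udev kernel
echo "pkgrel=1" > udev/PKGBUILD
echo "pkgrel=1" > kernel/PKGBUILD
git add udev kernel
git commit -q -m "import udev and kernel"

# A single shared testing branch receives both bumps in one commit.
git checkout -q -b testing
echo "pkgrel=2" > udev/PKGBUILD
echo "pkgrel=2" > kernel/PKGBUILD
git commit -q -a -m "testing: bump udev and kernel together"

# Release only udev: take that directory's files from testing onto master.
git checkout -q master
git checkout -q testing -- udev
git commit -q -m "udev: release from testing"
```

After this, master has the new udev while kernel stays at its old release. The testing branch still contains both changes, so someone has to remember what has already been released, which is the bookkeeping problem raised above.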
OK, this is part one of two on the killing CVS topic for me tonight. In this email, I'll respond to Jason's SVN suggestion. In the next email, I'll present my GIT suggestion. Let me start this off (and I will finish my other email this way as well) by saying that to date, Jason's suggestion below is THE BEST solution I have seen yet. I'll let you be the judge on my GIT solution.
On 10/17/07, Jason Chu <jason@archlinux.org> wrote:
A while ago there was talk of a different svn layout that we could use to help us track repos apart from the version control repo. For those of you that remember, good for you!
To quote an email from Paul, the problems this method is trying to solve are thus:
a) Moving packages from one repo to another is hard.
b) Placing packages in multiple repos is hard.
c) Continued separate-track development on a package while in testing is hard.
d) Tracking multiple binary repos for different architectures is hard.
e) Maintenance of a package by more than one person is hard.
It addresses all of these issues fairly well, if I do say so myself.
I tried writing some scripts for it and using a new tool (svnmerge) to possibly help keep versions in sync. I can recreate the svn repo using the newest changes in about an hour. I created the current repo based on changes as of last night.
I will now share what I have done.
First the svn repo:
http://projects.xennet.org/svnarch/
You can svn co the whole repo by itself, but (last time I tried) it takes about 2 hours to do (it isn't network traffic either... I think it's just a limitation of svn).
I've noticed this at work as well- I do think an ssh-based checkout would go faster than an HTTP one?
A better suggestion (and the whole point of this layout) is to only check out the packages you need (and possibly even remove the working copies when you're done).
This does seem like a plus. However, some other developers did bring up the point that in order to get all deps right you will probably need the whole tree anyway. I think the takeaway point here is it shouldn't be a pain in the ass to get everything.
I've written a couple of scripts (archco, archrelease, and archrm) to help with this flow.
The basic flow of this method goes like this:
1) archco package you want to update
2) edit the files in trunk and commit as if you were a developer doing whatever you wanted to do to source code
3) once all changes are committed, run archrelease <repo> from the trunk directory -- this will merge all unmerged changes from trunk into that repo or create the repo if it doesn't currently exist
4) archrm the directory
While a checkout of the entire repo takes 2 hours, checking out a package takes about 5 seconds.
Now, how does this address Paul's points:
a) Moving packages from one repo to another is a simple svn copy (or svnmerge, depending on the situation)
Easy = good. Always. That is what I am afraid of if we pick any VCS besides CVS/SVN- the command set can be overwhelming and not familiar to anyone that has only used a centralized VCS.
b) To put a package in multiple repos, just archrelease the trunk (or svn copy or svnmerge from a different repo).
CVS tags were a dirty but effective solution for doing what we needed to do with multiple repos, but it just didn't cut it when we had to manually move files around from current/core to extra and stuff. This is clean and simple.
c) Files in <pkgname>/repos/* can be edited and committed to as if they were in trunk. This should work even when wanting to merge other changes from trunk into that repo later.
What would the advantage/disadvantage be of editing this file instead of the trunk file? If it was a testing branch file, I could see that. Actually nevermind, this makes sense- keep the edits local to where they belong, but make them at the highest point possible.
d) Different architectures are dealt with just like repos, it's the db scripts that will treat these directories differently.
As long as the strategy could logically expand architectures, I like it.
e) Commits to trunk don't automatically go anywhere, people can make whatever changes they want without first rolling back other people's changes.
This is smart and similar to the way HEAD and CURRENT can differ in our current repos.
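To picture the layout behind points a) through e), here is a rough simulation using plain directories. It is not the archrelease script: `cp -R` merely stands in for `svn copy`, the temporary paths are made up, and nothing is merged. It only shows that trunk and each `repos/<repo>` directory are independent copies of the same package.

```shell
#!/bin/sh
set -e
# Plain directories stand in for an svn working copy of one package.
root=$(mktemp -d)
pkg="$root/gtkpod"
mkdir -p "$pkg/trunk" "$pkg/repos"
echo "pkgrel=1" > "$pkg/trunk/PKGBUILD"

# Stand-in for releasing trunk to testing (an 'svn copy' in the real layout).
cp -R "$pkg/trunk" "$pkg/repos/testing"

# Point e): later commits to trunk do not leak into any repo directory.
echo "pkgrel=2" > "$pkg/trunk/PKGBUILD"

# Point a): moving testing -> extra is a copy plus removal of the old directory.
cp -R "$pkg/repos/testing" "$pkg/repos/extra"
rm -rf "$pkg/repos/testing"

# Show what each location now holds.
for dir in trunk repos/extra; do
    printf '%s: %s\n' "$dir" "$(cat "$pkg/$dir/PKGBUILD")"
done
```

At the end, `repos/extra` still holds the release that went through testing while trunk has moved on, which mirrors the way HEAD and CURRENT can differ in the existing CVS setup.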
The major flaw that I can find with this layout is that bulk editing becomes more difficult. Because we don't a) abuse CVS tags and b) check out the whole repository, mass changes are difficult to apply to packages. This could hurt when it comes to huge rebuilds.
Eventually, I'm confident that the tools we write can make up for this.
archco, archrelease, and archrm can be seen here:
http://projects.xennet.org/svnarch-tools/
Notice that these scripts are really simple. archrelease would need to be expanded later (as would the FIXME in archrm).
-Dan
I've noticed this at work as well- I do think an ssh-based checkout would go faster than an HTTP one?
Is this a question? I'm pretty sure that, in addition to speed ups because of more revisions, using ssh-based checkouts or svn:// based checkout would also make things faster.
The major flaw that I can find with this layout is that bulk editing becomes more difficult. Because we don't a) abuse CVS tags and b) check out the whole repository, mass changes are difficult to apply to packages. This could hurt when it comes to huge rebuilds.
I still think that we can extend tools around archrelease and archco to make changes on a mass scale. They wouldn't be used as often, but would be available when we needed them. Jason
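A bulk tool along those lines might do little more than loop over a package list and chain the existing scripts. The dry-run sketch below only prints the commands it would run. The `mass_release` name, its arguments, the commit message, and the exact `archrm` invocation are assumptions for illustration, while `archco <pkg>` and `archrelease <repo>` follow the usage shown earlier in the thread.

```shell
#!/bin/sh
# Dry run: print the per-package commands a bulk rebuild might chain together.
# 'mass_release' and its argument order are hypothetical.
mass_release() {
    repo=$1
    shift
    for pkg in "$@"; do
        echo "archco $pkg"
        echo "cd $pkg/trunk   # bump pkgrel here, by hand or by script"
        echo "svn commit -m 'rebuild'"
        echo "archrelease $repo"
        echo "cd ../.. && archrm $pkg   # assumed archrm usage"
    done
}

# Example: queue two packages for a rebuild going to testing.
mass_release testing gtkpod pacman
```

Printing instead of executing keeps the sketch safe to run anywhere. A real tool would also need to decide how to handle a failed commit or release partway through the list.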
Tuesday 16 October 2007, Aaron Griffin wrote:
| * Pros and cons are useless here. As I said in the first point, we
| have no need for the advanced features based on our usage
| patterns. Sure, we can make use of them later, but right now it
| shouldn't be weighting any decision.
that's why i would work with whatever you decide.
it's a tool we discuss here and we use it not for open-heart surgery but for hitting nails in wood... any heavy tool would do... of course a hammer has its advantages over some scissors... but the decision what hammer is arbitrary :)
... so pick one that is good for nails and give it to me... preferably with an instruction manual if it's one of these "for professionals" ones :)
LOL ... it's very late here... actually it's quite early already and this email is written maybe in a bit of a funny way, sorry for that.
- D
--
.·´¯`·.¸.·´¯`·.¸¸.·´¯`·.¸.·´¯`·.¸.·´¯`·.¸.·´¯`·.¸¸.·´ ° ° ° ° ° ° ><((((º> ° ° ° ° ° <º)))>< <º)))><
On 10/16/07, Damir Perisa <damir.perisa@solnet.ch> wrote:
Actually, I really like this email. It translates what I was trying to say into Damirisms!
participants (8)
- Aaron Griffin
- Andreas Radke
- Damir Perisa
- Dan McGee
- eliott
- Jason Chu
- Paul Mattal
- Thomas Bächler