[pacman-dev] Huge cleanup of add.c and my attempt at fixing FS#7484
Ok, I've been slaving away at this for a while and libalpm/add.c is finally getting a bit more manageable. Showing a patch is worthless because it is huge and is much harder to follow due to the refactoring stuff. Instead I've attached a drop-in add.c that compiles cleanly with no other changes needed. I'm looking for some feedback on this. Some highlights: 1. All pactests still pass, so no regressions on that front. 2. *Much* cleaner layout. add_commit has now been split across four functions (upgrade_remove, extract_single_file, commit_single_pkg, and _alpm_add_commit). 3. No more super functions. The heaviest function (extract_single_file) now weighs in at 350 lines, and a lot of that is comments. 4. Most of what Andrew started to do in his 7484 patch has been brought into this. However, I rewrote it a bit for clarity (at least I thought so). We may need a bit more meat added to this section. Comments/concerns/questions appreciated. I still want to cleanup the huge if(needbackup) section a bit, that is still quite long but hard to refactor. -Dan
Dan McGee wrote:
Ok, I've been slaving away at this for a while and libalpm/add.c is finally getting a bit more manageable. Showing a patch is worthless because it is huge and is much harder to follow due to the refactoring stuff. Instead I've attached a drop-in add.c that compiles cleanly with no other changes needed. I'm looking for some feedback on this.
Some highlights: 1. All pactests still pass, so no regressions on that front. 2. *Much* cleaner layout. add_commit has now been split across four functions (upgrade_remove, extract_single_file, commit_single_pkg, and _alpm_add_commit). 3. No more super functions. The heaviest function (extract_single_file) now weighs in at 350 lines, and a lot of that is comments. 4. Most of what Andrew started to do in his 7484 patch has been brought into this. However, I rewrote it a bit for clarity (at least I thought so). We may need a bit more meat added to this section.
Comments/concerns/questions appreciated. I still want to cleanup the huge if(needbackup) section a bit, that is still quite long but hard to refactor.
-Dan
------------------------------------------------------------------------
_______________________________________________ pacman-dev mailing list pacman-dev@archlinux.org http://archlinux.org/mailman/listinfo/pacman-dev A HUGE improvement :) The only thing missing is the directory permission checks and warning (see the latest patch on my FS7484 branch).
I've got a couple of ideas on how pacman should install a package, your break up of add_commit matches up fairly close to what I was planning. With the exception of the backup code, I'm thinking that all the conflict checks and checks introduced for FS7484 should be done before we start adding/removing files from the users system. But this is a post 3.1 task :) Andrew
On 7/14/07, Andrew Fyfe <andrew@neptune-one.net> wrote:
Dan McGee wrote:
Ok, I've been slaving away at this for a while and libalpm/add.c is finally getting a bit more manageable. Showing a patch is worthless because it is huge and is much harder to follow due to the refactoring stuff. Instead I've attached a drop-in add.c that compiles cleanly with no other changes needed. I'm looking for some feedback on this.
Some highlights: 1. All pactests still pass, so no regressions on that front. 2. *Much* cleaner layout. add_commit has now been split across four functions (upgrade_remove, extract_single_file, commit_single_pkg, and _alpm_add_commit). 3. No more super functions. The heaviest function (extract_single_file) now weighs in at 350 lines, and a lot of that is comments. 4. Most of what Andrew started to do in his 7484 patch has been brought into this. However, I rewrote it a bit for clarity (at least I thought so). We may need a bit more meat added to this section.
Comments/concerns/questions appreciated. I still want to cleanup the huge if(needbackup) section a bit, that is still quite long but hard to refactor.
-Dan
------------------------------------------------------------------------
_______________________________________________ pacman-dev mailing list pacman-dev@archlinux.org http://archlinux.org/mailman/listinfo/pacman-dev A HUGE improvement :) The only thing missing is the directory permission checks and warning (see the latest patch on my FS7484 branch).
I'm not too keen on having to add that strmode function just for this- I think we can get away with printing numeric style modes, such as 4777, 0644, etc. I should add some warning stuff in there though when permissions differ. Did we make a call yet on whether we should change the permissions to those of the installing package, or leave it for the user to do?
I've got a couple of ideas on how pacman should install a package, your break up of add_commit matches up fairly close to what I was planning. With the exception of the backup code, I'm thinking that all the conflict checks and checks introduced for FS7484 should be done before we start adding/removing files from the users system. But this is a post 3.1 task :)
Remember that all this conflict checking stuff does in fact get done before this add_commit code runs, so we could do all this then as well. It shouldn't be too hard to drop it in, although that will entail doing a second run though the archive unless we extract once to /tmp and copy later. My biggest worry about extracting to /tmp would be filling it up for those that use a tmpfs and are installing a very large package (some things in AUR, openoffice, etc). -Dan
On 7/14/07, Andrew Fyfe <andrew@neptune-one.net> wrote:
Dan McGee wrote:
Ok, I've been slaving away at this for a while and libalpm/add.c is finally getting a bit more manageable. Showing a patch is worthless because it is huge and is much harder to follow due to the refactoring stuff. Instead I've attached a drop-in add.c that compiles cleanly with no other changes needed. I'm looking for some feedback on this.
Some highlights: 1. All pactests still pass, so no regressions on that front. 2. *Much* cleaner layout. add_commit has now been split across four functions (upgrade_remove, extract_single_file, commit_single_pkg, and _alpm_add_commit). 3. No more super functions. The heaviest function (extract_single_file) now weighs in at 350 lines, and a lot of that is comments. 4. Most of what Andrew started to do in his 7484 patch has been brought into this. However, I rewrote it a bit for clarity (at least I thought so). We may need a bit more meat added to this section.
Comments/concerns/questions appreciated. I still want to cleanup the huge if(needbackup) section a bit, that is still quite long but hard to refactor.
-Dan
------------------------------------------------------------------------
_______________________________________________ pacman-dev mailing list pacman-dev@archlinux.org http://archlinux.org/mailman/listinfo/pacman-dev A HUGE improvement :) The only thing missing is the directory permission checks and warning (see the latest patch on my FS7484 branch).
I've got a couple of ideas on how pacman should install a package, your break up of add_commit matches up fairly close to what I was planning. With the exception of the backup code, I'm thinking that all the conflict checks and checks introduced for FS7484 should be done before we start adding/removing files from the users system. But this is a post 3.1 task :)
Here is a slightly updated version with some directory permission stuff added and a few other changes: http://code.toofishes.net/gitweb.cgi?p=pacman.git;a=shortlog;h=clean_add or just grab my clean_add branch. -Dan
On Sat, Jul 14, 2007 at 02:54:50PM -0400, Dan McGee wrote:
Ok, I've been slaving away at this for a while and libalpm/add.c is finally getting a bit more manageable. Showing a patch is worthless because it is huge and is much harder to follow due to the refactoring stuff. Instead I've attached a drop-in add.c that compiles cleanly with no other changes needed. I'm looking for some feedback on this.
Some highlights: 1. All pactests still pass, so no regressions on that front. 2. *Much* cleaner layout. add_commit has now been split across four functions (upgrade_remove, extract_single_file, commit_single_pkg, and _alpm_add_commit). 3. No more super functions. The heaviest function (extract_single_file) now weighs in at 350 lines, and a lot of that is comments. Comments/concerns/questions appreciated. I still want to cleanup the huge if(needbackup) section a bit, that is still quite long but hard to refactor.
Very nice job here, but yes, there is still that part to clean up. Maybe the suggestion from Aaron here could give some ideas : http://www.archlinux.org/pipermail/pacman-dev/2006-December/006286.html
4. Most of what Andrew started to do in his 7484 patch has been brought into this. However, I rewrote it a bit for clarity (at least I thought so). We may need a bit more meat added to this section.
Yes I find it more clear that way, but there are still several things to take care of in my opinion. + /* Check for file existence. This is one of the more crucial parts + * to get 'right'. Here are the possibilities, with the filesystem + * on the left and the package on the top: + * (F=file, N=node, S=symlink, D=dir) + * | F/N | S | D + * non-existent | 1 | 2 | 3 + * F/N | 4 | 5 | 6 + * S | 7 | 8 | 9 + * D | 10 | 11 | 12 + * + * 1,2,3- just extract, no magic necessary. lstat will fail here. + * 4,5,6,7,8,10- conflict checks should have caught this. overwrite/backup. + * 9- follow the symlink, hopefully it is a directory + * 11,12- skip extraction, dir already exists + */ About cases 6 and 10, I agree that conflict checks should catch this, but do they ? (they don't from my understanding of the code) About cases 9 and 11, I think it depends on the nature of the symlink : * if the symlink points to a directory, then skip extraction * if not, then conflict
On 7/15/07, Xavier <shiningxc@gmail.com> wrote:
On Sat, Jul 14, 2007 at 02:54:50PM -0400, Dan McGee wrote:
Ok, I've been slaving away at this for a while and libalpm/add.c is finally getting a bit more manageable. Showing a patch is worthless because it is huge and is much harder to follow due to the refactoring stuff. Instead I've attached a drop-in add.c that compiles cleanly with no other changes needed. I'm looking for some feedback on this.
Some highlights: 1. All pactests still pass, so no regressions on that front. 2. *Much* cleaner layout. add_commit has now been split across four functions (upgrade_remove, extract_single_file, commit_single_pkg, and _alpm_add_commit). 3. No more super functions. The heaviest function (extract_single_file) now weighs in at 350 lines, and a lot of that is comments. Comments/concerns/questions appreciated. I still want to cleanup the huge if(needbackup) section a bit, that is still quite long but hard to refactor.
Very nice job here, but yes, there is still that part to clean up. Maybe the suggestion from Aaron here could give some ideas : http://www.archlinux.org/pipermail/pacman-dev/2006-December/006286.html
4. Most of what Andrew started to do in his 7484 patch has been brought into this. However, I rewrote it a bit for clarity (at least I thought so). We may need a bit more meat added to this section.
Yes I find it more clear that way, but there are still several things to take care of in my opinion.
+ /* Check for file existence. This is one of the more crucial parts + * to get 'right'. Here are the possibilities, with the filesystem + * on the left and the package on the top: + * (F=file, N=node, S=symlink, D=dir) + * | F/N | S | D + * non-existent | 1 | 2 | 3 + * F/N | 4 | 5 | 6 + * S | 7 | 8 | 9 + * D | 10 | 11 | 12 + * + * 1,2,3- just extract, no magic necessary. lstat will fail here. + * 4,5,6,7,8,10- conflict checks should have caught this. overwrite/backup. + * 9- follow the symlink, hopefully it is a directory + * 11,12- skip extraction, dir already exists + */
About cases 6 and 10, I agree that conflict checks should catch this, but do they ? (they don't from my understanding of the code)
Then we should fix that in the conflict check code, probably. I haven't looked at it enough. By the way- is there a reason we even use the filelist provided with the package? Any package can lie about it, and since we go though the archive once anyway we might as well just use our own generated filelist. We can also gather information about each of the files at this time (dir, file, symlink, etc.).
About cases 9 and 11, I think it depends on the nature of the symlink : * if the symlink points to a directory, then skip extraction * if not, then conflict
If you look at the case 9 code, then you'll see that this is what happens, or at least tries to. I'm not sure how effective returning 1 is by extract_single_file, however it at least spits a warning/error here. For case 11, we don't care about the nature of the symlink- we want to skip extracting the symlink. See the explanation in dpkg code for details, which I have put below: * = If we are trying to install a symlink we do nothing - ie, * dpkg will never replace a directory tree with a symlink. This * is to avoid embarrassing effects such as replacing a directory * tree with a link to a link to the original directory tree. I think this is valid. Perhaps I should put a note like this in the code. In addition, I just realized 10 needs a bit more code to get it to work properly, because we would have to unlink an entire directory tree. However, this is quite dangerous as well in that it could interfere with other packages files. Perhaps we should never allow a file to overwrite a directory, even in a force situation? Comments welcome. New code comments: diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c index 2c1aa2c..9019c29 100644 --- a/lib/libalpm/add.c +++ b/lib/libalpm/add.c @@ -392,9 +392,14 @@ static int extract_single_file(struct archive *archive, * D | 10 | 11 | 12 * * 1,2,3- just extract, no magic necessary. lstat will fail here. - * 4,5,6,7,8,10- conflict checks should have caught this. overwrite/backup. - * 9- follow the symlink, hopefully it is a directory - * 11,12- skip extraction, dir already exists + * 4,5,6,7,8- conflict checks should have caught this. either overwrite + * or backup the file. + * 9- follow the symlink, hopefully it is a directory, check it. + * 10- file replacing directory. a simple unlink probably won't work here. + * TODO handle this case, it is a *very* dangerous one. + * 11- don't extract symlink- a dir exists here. we don't want links to + * links, etc. + * 12- skip extraction, dir already exists. */ struct stat lsbuf; if(lstat(filename, &lsbuf) != 0) { -Dan
On Sun, Jul 15, 2007 at 12:09:37PM -0400, Dan McGee wrote:
About cases 6 and 10, I agree that conflict checks should catch this, but do they ? (they don't from my understanding of the code)
Then we should fix that in the conflict check code, probably. I haven't looked at it enough.
Ok, I just felt it was important to check that :)
By the way- is there a reason we even use the filelist provided with the package? Any package can lie about it, and since we go though the archive once anyway we might as well just use our own generated filelist. We can also gather information about each of the files at this time (dir, file, symlink, etc.).
I think we could do that. Also that's what Andrew suggested here : http://www.archlinux.org/pipermail/pacman-dev/2007-July/008867.html except we maybe don't even need to extract everything to /tmp first.
About cases 9 and 11, I think it depends on the nature of the symlink : * if the symlink points to a directory, then skip extraction * if not, then conflict
If you look at the case 9 code, then you'll see that this is what happens, or at least tries to. I'm not sure how effective returning 1 is by extract_single_file, however it at least spits a warning/error here.
Indeed, I just meant that the description in the chart wasn't very clear. And also, that in the case it doesn't point to a symlink, pacman should first detect it as a conflict, and simply fails, before it's too late.
For case 11, we don't care about the nature of the symlink- we want to skip extracting the symlink. See the explanation in dpkg code for details, which I have put below: * = If we are trying to install a symlink we do nothing - ie, * dpkg will never replace a directory tree with a symlink. This * is to avoid embarrassing effects such as replacing a directory * tree with a link to a link to the original directory tree.
I think this is valid. Perhaps I should put a note like this in the code.
Ah ok, I guess that makes sense.
In addition, I just realized 10 needs a bit more code to get it to work properly, because we would have to unlink an entire directory tree. However, this is quite dangerous as well in that it could interfere with other packages files. Perhaps we should never allow a file to overwrite a directory, even in a force situation? Comments welcome.
Indeed, I think we should never allow that. But unlink and libarchive probably don't as well, they won't remove a whole directory afaik. So without --force , pacman should really just fail because of a conflict. With --force, it doesn't really matter, because unlink and libarchive should just fail.
New code comments: diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c index 2c1aa2c..9019c29 100644 --- a/lib/libalpm/add.c +++ b/lib/libalpm/add.c @@ -392,9 +392,14 @@ static int extract_single_file(struct archive *archive, * D | 10 | 11 | 12 * * 1,2,3- just extract, no magic necessary. lstat will fail here. - * 4,5,6,7,8,10- conflict checks should have caught this. overwrite/backup. - * 9- follow the symlink, hopefully it is a directory - * 11,12- skip extraction, dir already exists + * 4,5,6,7,8- conflict checks should have caught this. either overwrite + * or backup the file.
So as said above, 6 needs to be checked so that is true.
+ * 9- follow the symlink, hopefully it is a directory, check it. + * 10- file replacing directory. a simple unlink probably won't work here. + * TODO handle this case, it is a *very* dangerous one.
Just like 6, this should also fail with a conflict. And if --force flag is used, unlink and libarchive shouldn't do anything as dangerous as deleting a whole directory (but maybe double check that).
+ * 11- don't extract symlink- a dir exists here. we don't want links to + * links, etc.
Hm, now that I'm looking at this again, I think 10 and 11 should be the same (file or symlink replacing a directory).
+ * 12- skip extraction, dir already exists. */ struct stat lsbuf; if(lstat(filename, &lsbuf) != 0) {
On 7/15/07, Xavier <shiningxc@gmail.com> wrote:
On Sun, Jul 15, 2007 at 12:09:37PM -0400, Dan McGee wrote:
About cases 6 and 10, I agree that conflict checks should catch this, but do they ? (they don't from my understanding of the code)
Then we should fix that in the conflict check code, probably. I haven't looked at it enough.
Ok, I just felt it was important to check that :)
By the way- is there a reason we even use the filelist provided with the package? Any package can lie about it, and since we go though the archive once anyway we might as well just use our own generated filelist. We can also gather information about each of the files at this time (dir, file, symlink, etc.).
I think we could do that. Also that's what Andrew suggested here : http://www.archlinux.org/pipermail/pacman-dev/2007-July/008867.html
except we maybe don't even need to extract everything to /tmp first.
About cases 9 and 11, I think it depends on the nature of the symlink : * if the symlink points to a directory, then skip extraction * if not, then conflict
If you look at the case 9 code, then you'll see that this is what happens, or at least tries to. I'm not sure how effective returning 1 is by extract_single_file, however it at least spits a warning/error here.
Indeed, I just meant that the description in the chart wasn't very clear. And also, that in the case it doesn't point to a symlink, pacman should first detect it as a conflict, and simply fails, before it's too late.
For case 11, we don't care about the nature of the symlink- we want to skip extracting the symlink. See the explanation in dpkg code for details, which I have put below: * = If we are trying to install a symlink we do nothing - ie, * dpkg will never replace a directory tree with a symlink. This * is to avoid embarrassing effects such as replacing a directory * tree with a link to a link to the original directory tree.
I think this is valid. Perhaps I should put a note like this in the code.
Ah ok, I guess that makes sense.
In addition, I just realized 10 needs a bit more code to get it to work properly, because we would have to unlink an entire directory tree. However, this is quite dangerous as well in that it could interfere with other packages files. Perhaps we should never allow a file to overwrite a directory, even in a force situation? Comments welcome.
Indeed, I think we should never allow that. But unlink and libarchive probably don't as well, they won't remove a whole directory afaik. So without --force , pacman should really just fail because of a conflict. With --force, it doesn't really matter, because unlink and libarchive should just fail.
New code comments: diff --git a/lib/libalpm/add.c b/lib/libalpm/add.c index 2c1aa2c..9019c29 100644 --- a/lib/libalpm/add.c +++ b/lib/libalpm/add.c @@ -392,9 +392,14 @@ static int extract_single_file(struct archive *archive, * D | 10 | 11 | 12 * * 1,2,3- just extract, no magic necessary. lstat will fail here. - * 4,5,6,7,8,10- conflict checks should have caught this. overwrite/backup. - * 9- follow the symlink, hopefully it is a directory - * 11,12- skip extraction, dir already exists + * 4,5,6,7,8- conflict checks should have caught this. either overwrite + * or backup the file.
So as said above, 6 needs to be checked so that is true.
+ * 9- follow the symlink, hopefully it is a directory, check it. + * 10- file replacing directory. a simple unlink probably won't work here. + * TODO handle this case, it is a *very* dangerous one.
Just like 6, this should also fail with a conflict. And if --force flag is used, unlink and libarchive shouldn't do anything as dangerous as deleting a whole directory (but maybe double check that).
+ * 11- don't extract symlink- a dir exists here. we don't want links to + * links, etc.
Hm, now that I'm looking at this again, I think 10 and 11 should be the same (file or symlink replacing a directory).
+ * 12- skip extraction, dir already exists. */ struct stat lsbuf; if(lstat(filename, &lsbuf) != 0) {
I made a few changes after thinking about the above comments to my patch, and it is now on my working branch. More explicit handling of case 10 is done. -Dan
participants (3)
-
Andrew Fyfe
-
Dan McGee
-
Xavier