[pacman-dev] Changing how repo dbs are updated
Hi,

This message is more of a sounding board for me to get the issues surrounding this sorted and to point out what I am planning to do. But any comments on this would also be appreciated, especially on #3 below.

Issues I see currently and in the future with signed databases:

1) Currently the repo dbs are updated just like downloading a package file. If the update is started and canceled partway through, you get a repo.db.part file which pacman attempts to continue downloading. However, unlike package files, this file is not static content and so we should never continue the download. See https://bugs.archlinux.org/task/15657 . This can be handled by just deleting the repo.db.part file if present, but it might be better to never create .part files in the first place for repo dbs, by downloading to a temporary location and moving/deleting based on successful completion. That would mean having a different download function for repo dbs and packages. See #2 for additional reasons to split this...

2) Database signing. Currently the code downloads the database, deletes the old, now-invalid signature, then downloads the new signature. If the signature is valid, then all is fine. However, if it fails to download or is invalid, pacman issues an error about failing to update the database. The database on your system is now not correctly signed (which is bad given its signature is only checked on update...).

I think that the old database and signature should only be overwritten if the new database download is successful _and_ its signature is valid. This requires downloading the database and its signature to a temporary location and then moving the files only once they are confirmed valid. That would require a different download interface for package and database downloads, but that is a good thing as we can get rid of the force crap from the one used for packages.

3) pacman -Syy behavior. Instead of adding a "force" flag to overwrite the old database, would it be better to just delete the old database first? Currently, if you use pacman -Syy and a database download fails, you are left with the old sync database you told pacman to get rid of. Is leaving pacman with no database for that repo a better solution?

I'm not sure about #3... But to fix #1 and #2, I think we need to split the download handling for dbs and packages slightly, unless someone has a better idea of how to deal with those?

Allan
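For illustration, the "download to a temporary location, move only once valid" idea from #1 and #2 could be sketched roughly as below. This is Python rather than pacman's C, and the names update_db, fetch and verify are hypothetical placeholders, not pacman or libalpm functions. It assumes the signature lives next to the database as <db>.sig.

```python
import os
import shutil
import tempfile


def update_db(db_path, fetch, verify):
    """Replace db_path and db_path + '.sig' only if the new copies are valid.

    fetch(name, dest) downloads the remote file `name` to `dest`, and
    verify(db_file, sig_file) returns True for a good signature. Both are
    hypothetical callbacks supplied by the caller.
    """
    sig_path = db_path + ".sig"
    db_dir = os.path.dirname(os.path.abspath(db_path))
    db_name = os.path.basename(db_path)

    # Stage in the same directory so os.replace() stays on one filesystem.
    tmp_dir = tempfile.mkdtemp(prefix=".db-update-", dir=db_dir)
    try:
        tmp_db = os.path.join(tmp_dir, db_name)
        tmp_sig = tmp_db + ".sig"

        # If either download raises, the old files are never touched.
        fetch(db_name, tmp_db)
        fetch(db_name + ".sig", tmp_sig)

        if not verify(tmp_db, tmp_sig):
            return False  # invalid signature: keep the old db and sig

        # Each rename is atomic on POSIX; the pair as a whole is not.
        os.replace(tmp_db, db_path)
        os.replace(tmp_sig, sig_path)
        return True
    finally:
        # Nothing like a .part file is left behind, whatever happened.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Because the staging directory is always removed, an interrupted update leaves nothing to resume, which also covers the stale repo.db.part case from #1.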
On Wed, Dec 1, 2010 at 12:12 AM, Allan McRae <allan@archlinux.org> wrote:
3) pacman -Syy behavior. Instead of adding a "force" flag to overwrite the old database, would it be better to just delete the old database first? Currently, if you use pacman -Syy and a database download fails, you are left with the old sync database you told pacman to get rid of. Is leaving pacman with no database for that repo a better solution?
Allan
I'm not currently very familiar with pacman's codebase or operations on the dev side, but would it be better to back up the existing db, overwrite the existing db, and then delete the old db? That way if a database download fails, it can be rolled back.
On 02/12/10 00:40, Jeremiah Dodds wrote:
On Wed, Dec 1, 2010 at 12:12 AM, Allan McRae <allan@archlinux.org> wrote:
3) pacman -Syy behavior. Instead of adding a "force" flag to overwrite the old database, would it be better to just delete the old database first? Currently, if you use pacman -Syy and a database download fails, you are left with the old sync database you told pacman to get rid of. Is leaving pacman with no database for that repo a better solution?
Allan
I'm not currently very familiar with pacman's codebase or operations on the dev side, but would it be better to back up the existing db, overwrite the existing db, and then delete the old db? That way if a database download fails, it can be rolled back.
That is essentially what I proposed for #1 and #2 in my original email. The point with "-Syy" is that you have said you really do not want the old database, so is keeping it as a fallback the correct thing to do in this case?

Allan
On Thu, 2010-12-02 at 08:54 +1000, Allan McRae wrote:
On 02/12/10 00:40, Jeremiah Dodds wrote:
On Wed, Dec 1, 2010 at 12:12 AM, Allan McRae <allan@archlinux.org> wrote:
3) pacman -Syy behavior. Instead of adding a "force" flag to overwrite the old database, would it be better to just delete the old database first? Currently, if you use pacman -Syy and a database download fails, you are left with the old sync database you told pacman to get rid of. Is leaving pacman with no database for that repo a better solution?
Allan
I'm not currently very familiar with pacman's codebase or operations on the dev side, but would it be better to back up the existing db, overwrite the existing db, and then delete the old db? That way if a database download fails, it can be rolled back.
That is essentially what I proposed for #1 and #2 in my original email. The point with "-Syy" is that you have said you really do not want the old database, so is keeping it as a fallback the correct thing to do in this case?
Allan
I'm not a pacman developer, but wasn't there to be a method for updating packages via binary patches?
On 02/12/10 10:58, Yaro Kasear wrote:
I'm not a pacman developer, but wasn't there to be a method for updating packages via binary patches?
Umm.. yes... that has been around for ages. But it has nothing to do with this. Allan
On Wed, Dec 1, 2010 at 5:54 PM, Allan McRae <allan@archlinux.org> wrote:
That is essentially what I proposed for #1 and #2 in my original email. The point with "-Syy" is that you have said you really do not want the old database, so is keeping it as a fallback the correct thing to do in this case?
Allan
I can't imagine too many users would prefer to be left with no db versus their old one they were attempting to replace, given the choice.
On Tue, Nov 30, 2010 at 11:12 PM, Allan McRae <allan@archlinux.org> wrote:
Hi,
This message is more of a sounding board for me to get the issues surrounding this sorted and point out what I am planning to do. But any comments on this would also be appreciated. Especially #3 below.
Issues I see currently and in the future with signed databases:
1) Currently the repo dbs are updated just like downloading a package file. If the update is started and canceled partway through, you get a repo.db.part file which pacman attempts to continue downloading. However, unlike package files, this file is not static content and so we should never continue the download. See https://bugs.archlinux.org/task/15657 . This can be handled by just deleting the repo.db.part file if present, but it might be better to never create .part files in the first place for repo dbs, by downloading to a temporary location and moving/deleting based on successful completion. That would mean having a different download function for repo dbs and packages. See #2 for additional reasons to split this...
False? At least if the remote server is not broken, commit d2dbb04a9a should have definitely fixed this. With that said, the only user of download_single_file() outside of dload.c is the db download code at the moment.
2) Database signing. Currently the code downloads the database, deletes the old now invalid signature, then downloads the new signature. If the signature is valid, then all is fine. However, if it fails to download or is invalid, pacman issues an error about failing to update the database. The database on your system is now not correctly signed (which is bad given its signature is only checked on update...).
I think that the old database and signature should only be overwritten if the new database download is successful _and_ its signature is valid. This requires downloading the database and its signature to a temporary location and then moving the files only once they are confirmed valid. That would require a different download interface for package and database downloads, but that is a good thing as we can get rid of the force crap from the one used for packages.
But you can't, unless you are required to provide two callbacks for downloading files. :/ And don't forget that it would be good to support standalone package sigs; e.g. if I do
pacman -U http://example.com/mypkgs/foobar-1.0-arch.pkg.tar.xz
I would expect it to "do the right thing" and also look for a .sig there as well.
3) pacman -Syy behavior. Instead of adding a "force" flag to overwrite the old database, would it be better to just delete the old database first? Currently, if you use pacman -Syy and a database download fails, you are left with the old sync database you told pacman to get rid of. Is leaving pacman with no database for that repo a better solution?
I don't think so; this seems similar to #1 and leaves the user in a worse situation than they originally had. We really interpret -yy as "download the remote DB regardless of whether you think it has been updated", but if we wanted to change that, I guess I'm not completely opposed if we document it as such and change people's expectations. A failed -Syy (by a superuser) keeps any non-superuser from doing -Si, etc., so that is no good.
I'm not sure about #3... But to fix #1 and #2, I think we need to split the download handling for dbs and packages slightly unless someone has a better idea of how to deal with those?
Allan
On 03/12/10 02:14, Dan McGee wrote:
On Tue, Nov 30, 2010 at 11:12 PM, Allan McRae <allan@archlinux.org> wrote:
Hi,
This message is more of a sounding board for me to get the issues surrounding this sorted and point out what I am planning to do. But any comments on this would also be appreciated. Especially #3 below.
Issues I see currently and in the future with signed databases:
1) Currently the repo dbs are updated just like downloading a package file. If the update is started and canceled partway through, you get a repo.db.part file which pacman attempts to continue downloading. However, unlike package files, this file is not static content and so we should never continue the download. See https://bugs.archlinux.org/task/15657 . This can be handled by just deleting the repo.db.part file if present, but it might be better to never create .part files in the first place for repo dbs, by downloading to a temporary location and moving/deleting based on successful completion. That would mean having a different download function for repo dbs and packages. See #2 for additional reasons to split this...
False? At least if the remote server is not broken, commit d2dbb04a9a should have definitely fixed this. With that said, the only user of download_single_file() outside of dload.c is the db download code at the moment.
Correct. Looks like that bug can be closed...
2) Database signing. Currently the code downloads the database, deletes the old now invalid signature, then downloads the new signature. If the signature is valid, then all is fine. However, if it fails to download or is invalid, pacman issues an error about failing to update the database. The database on your system is now not correctly signed (which is bad given its signature is only checked on update...).
I think that the old database and signature should only be overwritten if the new database download is successful _and_ its signature is valid. This requires downloading the database and its signature to a temporary location and then moving the files only once they are confirmed valid. That would require a different download interface for package and database downloads, but that is a good thing as we can get rid of the force crap from the one used for packages.
But you can't, unless you are required to provide two callbacks for downloading files. :/ And don't forget that it would be good to support standalone package sigs; e.g. if I do
pacman -U http://example.com/mypkgs/foobar-1.0-arch.pkg.tar.xz
I would expect it to "do the right thing" and also look for a .sig there as well.
I had not thought of the remote pacman -U case. I need to think how best to download signatures some more and come up with a new plan.
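As a rough sketch of the remote -U case, the signature location could be derived from the package URL. This is Python for illustration only; signature_url is a hypothetical helper, and placing the signature at the package path plus ".sig" is an assumption based on "look for a .sig there as well".

```python
from urllib.parse import urlsplit, urlunsplit


def signature_url(pkg_url, suffix=".sig"):
    """Return the assumed URL of a detached signature for pkg_url.

    The suffix is appended to the path component only, so any query
    string or fragment on the package URL is preserved.
    """
    parts = urlsplit(pkg_url)
    return urlunsplit(parts._replace(path=parts.path + suffix))


print(signature_url("http://example.com/mypkgs/foobar-1.0-arch.pkg.tar.xz"))
# prints http://example.com/mypkgs/foobar-1.0-arch.pkg.tar.xz.sig
```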
3) pacman -Syy behavior. Instead of adding a "force" flag to overwrite the old database, would it be better to just delete the old database first? Currently, if you use pacman -Syy and a database download fails, you are left with the old sync database you told pacman to get rid of. Is leaving pacman with no database for that repo a better solution?
I don't think so; this seems similar to #1 and leaves the user in a worse situation than they originally had. We really interpret -yy as "download the remote DB regardless of whether you think it has been updated", but if we wanted to change that, I guess I'm not completely opposed if we document it as such and change people's expectations. A failed -Syy (by a superuser) keeps any non-superuser from doing -Si, etc., so that is no good.
OK - I'm now convinced that this is a bad idea. :P
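The -yy behavior the thread settles on (force skips the "is it updated?" check, but a failed download still leaves the old database in place) might be sketched like this. It is Python for illustration; sync_db, the (mtime, data) tuple and the timestamp comparison are all assumptions made for the example, not how pacman represents or checks its databases.

```python
def sync_db(current, remote_mtime, download, force=False):
    """Return the database to use after a refresh attempt.

    current is a (mtime, data) tuple, or None if no local copy exists.
    download() returns the new data or raises OSError on failure.
    All names here are hypothetical.
    """
    up_to_date = current is not None and remote_mtime <= current[0]
    if up_to_date and not force:
        return current  # plain -Sy: nothing newer upstream, skip download

    try:
        data = download()
    except OSError:
        # Even with force (-Syy), a failed download keeps the old database
        # so that queries such as -Si continue to work.
        return current
    return (remote_mtime, data)
```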
participants (4)
- Allan McRae
- Dan McGee
- Jeremiah Dodds
- Yaro Kasear