[arch-general] Ye Olde Package Manager
On Thu, Jan 13, 2011 at 12:31 AM, David C. Rankin <drankinatty@suddenlinkmail.com> wrote:
On 01/12/2011 10:51 PM, C Anthony Risinger wrote:
I want new thoughts about what the meaning of a package is ... and how it can be better represented for derivation from its integrals: source code and user state. the tarball has served us well for a long time, but ... in the now-time of cheap encryption, distributed clouds, parallel peers and ...
So you want to talk calculus do you? Perhaps the method of strained time and strained coordinates would give a better approximation? Poincaré-Lindstedt maybe?
ehm, it was more a play on words; but yeah, a package is a living target, derived from and bound to higher order constraints. they don't need to be the static, rigid snapshots they currently are, but it requires a different way of thinking about what a package is, and the purpose it fulfills.
Seriously, I've been very impressed with Arch packaging compared to rpm or deb. KISS has served Arch well.
oh sure it works well enough, else i probably wouldn't be here right now having this conversation... but that doesn't mean there is anything novel or even remotely interesting about pacman. they all suck equally IMO, but if you're willing to part with the various management features pacman drops and the complexities it adds to be "KISS", then it can usually drag you where you want to be. if you think a package's purpose is to simply add some files to your system in a revocable way then you've missed the point; computers work for us. _i_ chose to install that package for a purpose; _i_ want the computer to assume some initial state, and react to system events. _i_ don't really give a !@#$ about files, how it accomplishes my directive, or what the computer thinks it wants.
The only thoughts I've had on the issue were with the gzipped package directory format of the repository indexes, but I don't see anything wrong with it. It works as well as it would in a flat file format or in some record oriented db format. I'm sure there is a faster way to do it, but I haven't noticed any slowness with the current system.
it's less about speed, and more about creating a package structure that's able to track the source it's derived from, merge with custom "code sinks"/overlays, and integrate itself with the behavioral requirements of the target system, as defined by the integrator/user. it's about creating a structure that is at its core cross-distro; a structure with the plasticity to absorb modifications from many sources, re-configure and trim itself as needed, and acquire the bits it needs to assume the state asked of it.
A package in any distro provides the necessary installables along with a way to check for conflicts and resolve dependencies. The current packages do that well.
sure, they do it well if, during each release iteration, every little step has been perfectly spelled out for them by a human "packager", you don't deviate from the "official" monolithic package pools, and you never want to do anything interesting or custom. like i said, a bulk download + extraction is the part the manager performs ... ie. brain dead ... the rest of the logic was hand crafted by humans.
The only differences that I've seen with pacman vs. some of the others were in areas of handling or anticipating previously installed config or font files that cause rare install failures. But, I think the KISS vs. 'try to anticipate everything' argument weighs in favor of not radically changing how pacman handles these issues -- unless there is just a wealth of unused resources lying around to experiment with.
it's not about anticipation of the future/infinity, but rather guidance from the source, peer heuristics, and input from the integrator. technologies like systemd + augeas + git/DVCS + DBUS + ... + ... provide 99% of the tools/foundational ideas, they just need to be combined in an intelligent way and applied to the problem of state management.
Great question. I'd also be interested in what other ideas there are about positive and realistic improvements that can be made to the current system.
create DBUS bindings to the pacman core. implement the CLI/[GUI] interfaces on top of common dynamic runtimes like python using those bindings. leverage a revisioning system for package files instead of tarballs, even if only locally. store metadata in a non-relational engine like couchdb (peer replication), or at least something like sqlite, for sane access. C Anthony
On 01/13/2011 12:12 PM, C Anthony Risinger wrote:
leverage a revisioning system for package files instead of tarballs, even if only locally. store metadata in a non-relational engine like couchdb (peer replication), or at least something like sqlite, for sane access.
A relational engine is actually really helpful for packages. A while ago I tried writing a package manager like pacman but using sqlite, and it's MUCH faster and still easy to use. The huge pauses every time you need metadata are incredibly annoying, and they completely disappear when you store things in a real database. The problem is that it has to be used by the official package manager, because having package data stored in two formats causes issues (any time you use pacman, the other database doesn't know what changed).

Revisioning package files is also interesting; I don't see the point of doing it locally though. Once you have the package, installing it is fast. Checking if files are the same first seems like a waste of effort. There already is a mechanism for creating those .pacnew files (and I think auto-merging those into the existing file would mess with the "knowing what your system is doing" part of Arch). Using deltas for packages would be helpful though, especially in the case of huge packages with minor changes.

The rest of your changes sound like things that would make packaging harder, and you should know that some (most?) of us like Arch's packages because they're easy to make. It may seem overly simple, but that's exactly what I want out of a package: give a name, version number, source, and dependencies, then the commands I'd use to build it, and I'm done. If Arch ever became like Debian with its fancy, huge-time-sink packaging, I'd find a different distro.
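Brendan's sqlite experiment is easy to picture. Below is a minimal sketch of what a sqlite-backed metadata store might look like; the table and column names are invented here for illustration and are not pacman's actual on-disk format:

```python
import sqlite3

# Toy schema for a local package database. Names are invented for
# illustration; this is not pacman's real metadata layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE packages (
        name    TEXT PRIMARY KEY,
        version TEXT NOT NULL,
        descr   TEXT
    );
    CREATE TABLE depends (
        pkg TEXT REFERENCES packages(name),
        dep TEXT
    );
""")
conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", [
    ("pacman", "3.4.2", "package manager"),
    ("tar", "1.25", "archiver"),
])
conn.execute("INSERT INTO depends VALUES (?, ?)", ("pacman", "tar"))

# One indexed query replaces opening and parsing hundreds of small
# desc/depends files on every lookup -- that is where the pauses go.
deps = conn.execute(
    "SELECT dep FROM depends WHERE pkg = ?", ("pacman",)
).fetchall()
print(deps)  # [('tar',)]
```

The speedup comes from sqlite doing one seek into an indexed B-tree instead of the filesystem doing one open/read/close per tiny metadata file.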
On 16/01/11 07:34, Brendan Long wrote:
On 01/13/2011 12:12 PM, C Anthony Risinger wrote:
leverage a revisioning system for package files instead of tarballs, even if only locally. store metadata in a non-relational engine like couchdb (peer replication), or at least something like sqlite, for sane access.
A relational engine is actually really helpful for packages. A while ago I tried writing a package manager like pacman but using sqlite, and it's MUCH faster and still easy to use. The huge pauses every time you need metadata are incredibly annoying, and they completely disappear when you store things in a real database. The problem is that it has to be used by the official package manager, because having package data stored in two formats causes issues (because any time you use pacman, the other database doesn't know what changed).
It is not so much having the data in a real database, but not having it spread over hundreds of small files. This has been largely fixed in the development branch of pacman, which is a lot faster. It could probably be improved further, but the complaints-to-patches ratio is really poor.
Revisioning package files is also interesting; I don't see the point of doing it locally though. Once you have the package, installing it is fast. Checking if files are the same first seems like a waste of effort. There already is a mechanism for creating those .pacnew files (and I think auto-merging those into the existing file would mess with the "knowing what your system is doing" part of Arch). Using deltas for packages would be helpful though, especially in the case of huge packages with minor changes.
Binary delta support has been available in pacman for ages. Arch just does not use it... Allan
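pacman's real delta support leans on an external delta tool (xdelta); purely as a toy illustration of the underlying idea -- ship only the changed middle of a file and reuse the shared prefix and suffix already on disk -- here is a sketch that matches neither pacman's nor xdelta's actual format:

```python
def make_delta(old: bytes, new: bytes):
    """Toy binary delta: record the lengths of the shared prefix and
    suffix, plus only the differing middle of `new`."""
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[len(old) - 1 - s] == new[len(new) - 1 - s]:
        s += 1
    return p, s, new[p:len(new) - s]

def apply_delta(old: bytes, delta) -> bytes:
    """Rebuild `new` from `old` plus the recorded middle."""
    p, s, middle = delta
    suffix = old[len(old) - s:] if s else b""
    return old[:p] + middle + suffix

# A version bump in an otherwise unchanged payload:
old = b"pkgver=1.0.0 huge unchanged payload ..."
new = b"pkgver=1.0.1 huge unchanged payload ..."
delta = make_delta(old, new)
print(len(delta[2]), "byte(s) shipped instead of", len(new))
```

For huge packages with minor changes (the case mentioned above) the shipped middle is a tiny fraction of the full tarball, which is exactly why deltas pay off there.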
On 01/15/2011 03:14 PM, Allan McRae wrote:
It is not so much having the data in a real database, but not having it spread over hundreds of small files. This has been largely fixed in the developmental branch of pacman, which is a lot faster. It could probably be improved further, but the complaints to patches ratio is really poor.
I would've written a patch for pacman, but these threads[1][2] led me to believe that the Arch devs weren't interested. The "solution" given was to use a different filesystem (which sounds more like a temporary work-around to me). [1] https://bbs.archlinux.org/viewtopic.php?id=45077 [2] http://mailman.archlinux.org/pipermail/pacman-dev/2008-January/005521.html
On 16/01/11 09:56, Brendan Long wrote:
On 01/15/2011 03:14 PM, Allan McRae wrote:
It is not so much having the data in a real database, but not having it spread over hundreds of small files. This has been largely fixed in the developmental branch of pacman, which is a lot faster. It could probably be improved further, but the complaints to patches ratio is really poor.
I would've written a patch for pacman, but these threads[1][2] led me to believe that the Arch devs weren't interested. The "solution" given was to use a different filesystem (which sounds more like a temporary work-around to me).
[1] https://bbs.archlinux.org/viewtopic.php?id=45077 [2] http://mailman.archlinux.org/pipermail/pacman-dev/2008-January/005521.html
As I said, it is not having the data in a real database format that was needed; it was reducing the number of files that pacman had to read. Implementing a tar-based backend was well known to be an acceptable option to achieve this. In fact, the bug tracker task for that was opened by the lead pacman developer, so it was very likely to be accepted once coded...

There has been low interest in a real database solution due to potential issues recovering from corrupt databases and the additional dependencies. Also, no complete database solution was ever submitted (only very incomplete proof-of-concepts afaik), and no one had shown that a database solution would be markedly faster than the tar-based one once the issue of many small files is removed. Allan
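The win Allan describes -- one sequential read of a single archive instead of thousands of opens of tiny files -- can be sketched with Python's tarfile module (the entry layout below is invented for illustration, not pacman's real sync-db layout):

```python
import io
import tarfile

# Build an uncompressed tar holding per-package metadata entries,
# standing in for a local sync db. Entry names/content are invented.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    for name, desc in [("pacman-3.4.2/desc", b"pkgname = pacman"),
                       ("tar-1.25/desc", b"pkgname = tar")]:
        info = tarfile.TarInfo(name)
        info.size = len(desc)
        tf.addfile(info, io.BytesIO(desc))

# One open plus one sequential pass replaces one open()/read()/close()
# per tiny metadata file -- that is the whole point of the tar backend.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tf:
    entries = {m.name: tf.extractfile(m).read() for m in tf if m.isfile()}
print(sorted(entries))  # ['pacman-3.4.2/desc', 'tar-1.25/desc']
```

The archive is deliberately left uncompressed: the goal is fewer random reads, not smaller size, and uncompressed entries stay individually seekable.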
On Sunday the 16th at 2:01, Allan McRae wrote:
There has been low interest in a real database solution due to potential issues recovering from corrupt databases and with the additional dependencies.
A question from a random user (one that won't submit more patches than the average user): Why would a database be more subject to corruption than a tar file, and harder to recover? -- Frédéric Perrin -- http://tar-jx.bz
On Sun, 2011-01-16 at 12:22 +0100, Frédéric Perrin wrote:
On Sunday the 16th at 2:01, Allan McRae wrote:
There has been low interest in a real database solution due to potential issues recovering from corrupt databases and with the additional dependencies.
A question from a random user (one that won't submit more patches than the average user):
Why would a database be more subject to corruption than a tar file, and harder to recover?
An answer from an average user: I believe that with tar files, corruption would only affect the files stored at that 'place' in the tar file. So for example if there's a failing HD and 20 sections are affected, 'only' that information is lost. For a database, which is necessarily more complicated, the whole database may be a write-off. Of course, if there's a failing HD you've got bigger problems than just pacman's db =)
On Sunday, 16 January 2011 at 20:52 +0800, Ng Oon-Ee wrote:
For a database, which is necessarily more complicated, the whole database may be a write-off.
Databases are more complicated, but also more sophisticated, with mechanisms to prevent losing data when a file is corrupted. So I would say they are at the same level as a tar file, if not better. You also didn't mention that the tar files are always compressed, which adds another layer of failure. I just picked up /var/lib/pacman/extra.tar.gz, changed a few bytes here and there with hexedit, and I could only retrieve 15 directories starting with 'a'. That's pretty bad, I would say.
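solsTiCe's hexedit experiment is easy to reproduce in miniature: flip a single byte in the middle of a gzip stream and decompression fails, because the damage cascades through the rest of the compressed data (and the CRC check catches anything that limps through). A small sketch, using invented stand-in content rather than a real pacman db:

```python
import gzip
import zlib

# Stand-in for a compressed sync db: repetitive text, like pacman metadata.
data = b"pkgname = pacman\n" * 100
blob = bytearray(gzip.compress(data))

# Sanity check: the intact stream round-trips.
assert gzip.decompress(bytes(blob)) == data

# Flip a single byte in the middle of the deflate stream, as hexedit would.
blob[len(blob) // 2] ^= 0xFF

try:
    gzip.decompress(bytes(blob))
    corrupted_ok = True
except (OSError, EOFError, zlib.error):
    corrupted_ok = False
print(corrupted_ok)  # False: one flipped byte poisons everything after it
</antml>```

This is the "layer of failure" point: with compression on top, corruption is no longer local to one entry, since the deflate decoder's state after the flipped byte is garbage.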
On 01/16/2011 08:16 AM, solsTiCe d'Hiver wrote:
You also didn't mention that the tar files are always compressed, which adds another layer of failure.
tar files are generally used with compression, but they do have other purposes. In this case the tar file is being used for the sole purpose of grouping files. Pacman's bad performance in some cases is because it has to read thousands of tiny text files. Putting those tiny text files in an uncompressed tar greatly improves performance just by reducing the number of random reads.
On Sunday the 16th at 13:52, Ng Oon-Ee wrote:
On Sun, 2011-01-16 at 12:22 +0100, Frédéric Perrin wrote:
Why would a database be more subject to corruption than a tar file, and harder to recover?
An answer from an average user: I believe that with tar files, corruption would only affect the files stored at that 'place' in the tar file. So for example if there's a failing HD and 20 sections are affected, 'only' that information is lost.
tar files are stored as a sequence of entries: a header (a 512-byte block) that describes the file name, the size of the file, etc., then ceil(filesize / 512) blocks of data, then the next header, and so on; the archive ends with two blocks of \0. If one file header is corrupted and the file size can't be read or is wrong, everything after it can't be read reliably. Now, in our case the entries are text, so a reader could try to scan forward for something that looks like a valid header and resync there. Hum, maybe you're right, but tar files still seem rather badly prepared to deal with corruption. -- Frédéric Perrin -- http://tar-jx.bz
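The header layout described above can be checked directly: the name lives in the first 100 bytes of the 512-byte block, and the size is stored as octal text starting at offset 124. A small sketch using Python's tarfile to build the archive and plain slicing to read the raw header (entry name and content invented for illustration):

```python
import io
import tarfile

# Build a one-file ustar archive, then read its raw header fields by hand.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    payload = b"pkgname = pacman\n"
    info = tarfile.TarInfo("desc")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

header = buf.getvalue()[:512]                  # first 512-byte header block
name = header[0:100].rstrip(b"\0").decode()    # name: 100 bytes, NUL-padded
size = int(header[124:136].rstrip(b" \0"), 8)  # size: 12 bytes of octal text
print(name, size)  # desc 17

# Garble those 12 size bytes and a naive reader no longer knows where the
# next header starts -- exactly the failure mode discussed above.
```

The ustar header also carries a checksum field (offset 148), which is what recovery tools use to recognize candidate headers when resyncing past a corrupted region.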
On Sat, Jan 15, 2011 at 3:34 PM, Brendan Long <korin43@gmail.com> wrote:
On 01/13/2011 12:12 PM, C Anthony Risinger wrote:
leverage a revisioning system for package files instead of tarballs, even if only locally. store metadata in a non-relational engine like couchdb (peer replication), or at least something like sqlite, for sane access.
A relational engine is actually really helpful for packages. A while ago I tried writing a package manager like pacman but using sqlite, and it's MUCH faster and still easy to use. The huge pauses every time you need metadata are incredibly annoying, and they completely disappear when you store things in a real database. The problem is that it has to be used by the official package manager, because having package data stored in two formats causes issues (because any time you use pacman, the other database doesn't know what changed).
yeah it would help with speed, but that's not really my focus; i seek greater intelligence. i find the "package manager" to be an interesting _core_ commonality between all Linuxes, with the potential to really open up and become an incredible tool for developing/sharing/swapping state with other users in a secure, ad-hoc, distributed, and mostly distro-agnostic manner. think "social network", at the developer/distribution/packager/user level, but where the focus is on creation and knowledge expansion vs. spamming your friends all day and playing pointless games. a lofty goal, but nothing's unreachable when everything's already right next to you.
Revisioning package files is also interesting; I don't see the point of doing it locally though. Once you have the package, installing it is fast. Checking if files are the same first seems like a waste of effort. There already is a mechanism for creating those .pacnew files (and I think auto-merging those into the existing file would mess with the "knowing what your system is doing" part of Arch). Using deltas for packages would be helpful though, especially in the case of huge packages with minor changes.
again, speed is not relevant (though the tar backend thing from Allan's msg sounds kinda cool for this). a while back i had a small script that would update and "eat" my pacman cache nightly into a packed git repository. with this i could export an immediately installable package or list of packages, from any day. so even though pacman itself was oblivious, having it local was still useful. nowadays i just use my btrfs hooks and rsync scripts to snapshot my systems at various points of interest (ie. before pacman upgrades, on reboot, nightly, etc) to ensure stability.

i see no problem with "auto merging" if it's done in a controlled, consistent, and reproducible manner. look into redhat's `augeas` -- it loads any defined native config format into a tree that can be merged/manipulated/inspected before being written back out in the native format. they'll use this to provide clean control over VM host networks from virt-manager, on any distro with definitions. i'd trust that over sed any day; these sorts of config "suggestions" could be made visible before commit, similar to gentoo.

ultimately, package managers should let the user decide their own level of involvement. for instance, all i want to do is verify correctness... i really don't care to manually fuxxor around doing what my system could handle on its own. i'm a jelly filled human doughnut; my error rate is rather high.
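The augeas workflow described above -- load a native config format into a tree, manipulate the tree, then write it back in the native format -- can be sketched with Python's stdlib configparser as a toy stand-in (augeas generalizes this to arbitrary formats via its lenses; the config content and key names here are invented):

```python
import configparser
import io

# Toy stand-in for the augeas load/modify/save cycle, using INI syntax.
native = "[network]\ninterface = eth0\ndhcp = yes\n"

tree = configparser.ConfigParser()
tree.read_string(native)                 # parse the native format into a tree

tree["network"]["dhcp"] = "no"           # manipulate the tree, not raw text
tree["network"]["address"] = "10.0.0.2"  # a "suggestion" a user could review

out = io.StringIO()
tree.write(out)                          # serialize back to the native format
print("dhcp = no" in out.getvalue())  # True
```

Because the edit happens on a structured tree rather than raw text, a merge can be shown to the user key-by-key before commit -- the gentoo-style review mentioned above -- instead of hoping a sed expression hit the right line.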
The rest of your changes sound like things that would make packaging harder, and you should know that some (most?) of us like Arch's packages because they're easy to make. It may seem overly simple, but that's exactly what I want out of a package: give a name, version number, source, and dependencies, then the commands I'd use to build it, and I'm done. If Arch ever became like Debian with its fancy, huge-time-sink packaging, I'd find a different distro.
yeah i've been around long enough to get the general vibe... but really... do you actually _enjoy_ making packages? do you like it when things break (even if not often, i'm not bashing arch developers or anyone else here) because of small version mismatches/typos/etc. due to the constant requirement for human interaction every step of the way? do you appreciate the system requiring an unknown amount of your (limited) time each day you decide to update? don't you ever wish you could just say "hey computer 'ol pal, aggressively follow upstream source for package X and merge remote user Y's changes with the local configuration, unless either requires changes to package Z -- then ask me first, cuz i run the show here"?

i've written several packages and maintain numerous arch based hosts/VMs servicing my network, with other distros preceding that; maintaining packages is 100% annoying. it's an _absolute_ waste of time from a productivity standpoint; you're neither improving the source nor utilizing it... you're just trying your best to make it play nice with itself and others. we normally use computers to perform these sorts of tedious tasks as much as possible, but package managers have barely changed since they were first conceived... kind of like "PID 1" (until recently! oh how ye welcometh new thought like upstart and _especially_ systemd)... we grant these applications the most awesomest superpowers then give them the IQ of 17 or so. ehm... sense?

i'm simply calling for more vision on this front. i see packages as vectors that collectively determine your system's "plane"; they should be guided as needed, not break down every time the upstream/peerstream changes. bash is great and all, but a solid architecture implementing next-gen system/package management, along with maybe enabling upstreams to embed "hints" as needed (with verification/fallback via analysis and local/peer heuristics), will be more reliable + maintainable than your PKGBUILD will ever be. possibly even simpler.
look to system state managers like puppet for some inspiration behind these concepts. ------------------------------------------------------------------------------- please note this is not something i expect arch or pacman to realize; arch and the team behind it do a good job and i am pleased with the results and most decisions. i simply tend to rant now and again on the topic because i think it's a major, if not the primary, barrier to entry for Linux-based OSes at large. bleh... i'll create my own distro soon enough and add to the problem, just need to finish the state manager to manage it :-) progress be slow, but it's regular; got a 19-month-old child at large around here. well that was fun! i'm out. see you all next time. /end whatever you want to call the 4AM writings above this line/ C Anthony
On 01/16/2011 03:10 AM, C Anthony Risinger wrote:
yeah i've been around long enough to get the general vibe... but really... do you actually _enjoy_ making packages? do you like it when things break (even if not often, i'm not bashing arch developers or anyone else here) because of small version mismatches/typos/etc. due to the constant requirement for human interaction every step of the way? do you appreciate the system requiring an unknown amount of your (limited) time each day you decide to update? don't you ever wish you could just say "hey computer 'ol pal, aggressively follow upstream source for package X and merge remote user Y's changes with the local configuration, unless either requires changes to package Z -- then ask me first, cuz i run the show here"?
I actually hate making packages, which is why I like the Arch system. I like how if you know how to install a program from source, you know how to make an Arch package. In general, if I need to make a package, I copy a random PKGBUILD from abs, change the top couple of lines, and then set the build section to:

./configure --some-options
make install

It's not fancy, but it has two advantages:
* It's fast to write
* It installs the program the way that upstream designed it to be installed

Also, no matter how good a program you can write to automatically follow upstream, I don't always trust upstream ;) It's nice to have the Arch devs make sure something works for me.
On 01/16/2011 12:43 PM, Brendan Long wrote: {snip}
In general, if I need to make a package, I copy a random PKGBUILD from abs, change the top couple lines, and then set the build section to:
Not sure why you would copy an existing (possibly O.O.D.) PKGBUILD when you have options like newpkg OR just copying the /usr/share/pacman/PKGBUILD{-*}.proto that fits your needs. Just my whiny, unasked-for observations/comments.
On Sun, 2011-01-16 at 14:59 -0700, jwbirdsong wrote:
On 01/16/2011 12:43 PM, Brendan Long wrote: {snip}
In general, if I need to make a package, I copy a random PKGBUILD from abs, change the top couple lines, and then set the build section to:
Not sure why you would copy an existing (possibly O.O.D.) PKGBUILD when you have options like newpkg OR just copying the /usr/share/pacman/PKGBUILD{-*}.proto that fits your needs. Just my whiny, unasked-for observations/comments.
More OT - I've actually never used newpkg or the .proto files =p
On Sun, Jan 16, 2011 at 1:43 PM, Brendan Long <korin43@gmail.com> wrote:
On 01/16/2011 03:10 AM, C Anthony Risinger wrote:
... do you actually _enjoy_ making packages? do you like it when things break (even if not often, i'm not bashing arch developers or anyone else here) because of small version mismatches/typos/etc. due to the constant requirement for human interaction every step of the way? do you appreciate the system requiring an unknown amount of your (limited) time each day you decide to update? don't you ever wish you could just say "hey computer 'ol pal, aggressively follow upstream source for package X and merge remote user Y's changes with the local configuration, unless either requires changes to package Z -- then ask me first, cuz i run the show here"?
I actually hate making packages, which is why I like the Arch system. I like how if you know how to install a program from source, you know how to make an Arch package.
and i am with you 100% here. it's the total no-nonsense approach to package management, and one of the things that drew me here. i can't remember where i read it but someone once said "Arch is the swiss army knife of distributions" and that stuck with me; it's a great base to branch from for many learning and practical solutions. i'm only trying to make that knife a thunderous foundry ... which brings me to the other reason i'm here and the solution to said problem: AUR, peers, and networks of trust.
Also, no matter how good a program you can write to automatically follow upstream, I don't always trust upstream ;) It's nice to have the Arch devs make sure something works for me..
and therein lies the problem. you rely on the Arch developers (who are few, no matter how individually talented) to make decisions for you based on other groups of developers (upstream, peerstream/other distros, etc), which in turn make decisions based on more developers (dependencies, etc) ... ... a "package manager" is the only visible part of this process; the last link. what if you don't agree with Arch developers? and, what of those that don't agree with you? or neither of you? every other distro and its gobs of users are doing nearly the same things, in parallel, isolated from us, sharing nothing, with high degrees of overlap.

so ... let's dump all of us into the same pool, shoot for a 1-to-1 app-to-package relationship, with adaptive dependency structures depending on your personal trust network. let's make it a nice flexible platform. let's make it really easy for the headwaters to participate, and even easier for the downstreams and confluences ... ... implementation and ubiquity means you lose your bash PKGBUILDs, but you gain the mass/force of entire {Linux,GNU, ... }-based OS ecosystems. let's add some "depth" to the words we use ;-) C Anthony
participants (7)
- Allan McRae
- Brendan Long
- C Anthony Risinger
- Frédéric Perrin
- jwbirdsong
- Ng Oon-Ee
- solsTiCe d'Hiver