[pacman-dev] Fw: Pacman support for IPFS
This message was originally sent to Allan, who recommended moving the discussion to the pacman-dev mailing list:

I tested a bit and set up a mirror (via a cluster configuration) in IPFS for the packages, but it's kind of ugly.

I was wondering if there's interest from you or the team to implement native IPFS support in pacman?

The idea is to extend the current database definition by an additional field which holds the Content-ID necessary to fetch a package from IPFS.

If there's an API endpoint configured in the pacman config, pacman will query the IPFS API to resolve an IPNS public key[0]. The public key will lead to a folder listing which holds the latest databases, like this path[1] does.

The Content-ID of each database could be saved as a symbolic link in the db-patch. If one of the Content-IDs has changed, the new version is fetched. With the Content-IDs, the updated packages can be fetched from the network.

So if there's no API endpoint configured, the feature won't be active. If it has been activated, pacman can fetch packages much quicker, since IPFS can fetch from multiple sources, like BitTorrent, and will also connect to other peers on the local network via mDNS.

That's especially interesting if you run a large number of computers and currently need an rsync mirror server or a cache in the network, or if you're a home user with extremely limited bandwidth.

---

To maintain reliable redundancy of packages, a cluster installation is needed. An IPFS cluster[2] is basically just a small daemon which holds a database of which Content-IDs should be maintained on which cluster peer.

So if a package gets added to the database, its Content-ID would be added to the dataset of the cluster and the Content-ID of the old package gets removed. The folder with the databases would be replaced with its new version as well.

The cluster definition isn't static; everybody can join the cluster and hold the cluster data. There are some examples of clusters which are open for copies here[3].

You can set on each element how many copies the cluster should hold, at maximum and minimum. The cluster will start by replicating the maximum number, and if the redundancy drops below the minimum, additional copies will be made. You can also specify -1, which means every cluster member will hold a copy; this makes sense for the databases, for example.

---

How to add new package versions:

A package maintainer would do the normal sha256/md5/signing and additionally add the file to an IPFS client (for example on a server or their own computer). The Content-ID is generated while adding the file to IPFS and will be added to the database definition.

The new database file is then added to IPFS as well, and the folder of the current database listing (in IPFS) is altered to reference the new Content-ID of the database file. This will alter the Content-ID of the folder, since the references have changed.

The new Content-IDs of the folder and the package then get added to the cluster, which will fetch both new files from the computer of the maintainer or the server where they were added.

Now the IPNS record can be switched to the new Content-ID of the folder, and everyone can fetch the new database.

[0] https://docs-beta.ipfs.io/concepts/ipns/
[1] https://ipfs.io/ipns/pkg.pacman.store/arch/x86_64/default/db
[2] https://cluster.ipfs.io/
[3] https://collab.ipfscluster.io/

Best regards,
Ruben
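The update check described in this proposal (compare the Content-IDs stored for each database with the ones found in the current folder listing, and fetch whatever changed) could be sketched roughly as follows. This is only an illustration: the function name, the dictionary layout and the placeholder Content-IDs are assumptions, not anything pacman or IPFS provides.

```python
def databases_to_refresh(local_cids: dict[str, str],
                         listed_cids: dict[str, str]) -> list[str]:
    """Return the names of databases whose Content-ID is new or has changed.

    local_cids:  database name -> Content-ID stored from the last sync
                 (hypothetical local bookkeeping).
    listed_cids: database name -> Content-ID found in the folder listing
                 the IPNS key currently resolves to.
    """
    return sorted(
        name
        for name, cid in listed_cids.items()
        if local_cids.get(name) != cid
    )


# Placeholder values, not real Content-IDs.
local = {"core.db": "cid-core-1", "extra.db": "cid-extra-1"}
listed = {
    "core.db": "cid-core-2",            # changed since the last sync
    "extra.db": "cid-extra-1",          # unchanged
    "community.db": "cid-community-1",  # not seen before
}
print(databases_to_refresh(local, listed))
```

With these placeholder values only `community.db` and `core.db` are reported, so an unchanged database is never downloaded again.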
On 4/5/20 1:19 PM, @RubenKelevra wrote:
I was wondering if there's interest from you or the team to implement native IPFS support in pacman?
The idea is to extend the current database definition by an additional field which holds the Content-ID necessary to fetch a package from IPFS.
If the database gets extended by an additional field for every new network layer people come up with, where do we draw the line? This needs a solution that does not require the database format to be altered to suit protocol-specific metadata.
Best regards,
Ruben
-R
Hello Robin, On 2020-04-05 13:38:43 +0200 wrote Robin Broda <robin@broda.me>:
On 4/5/20 1:19 PM, @RubenKelevra wrote:
I was wondering if there's interest from you or the team to implement native IPFS support in pacman?
The idea is to extend the current database definition by an additional field which holds the Content-ID necessary to fetch a package from IPFS.
If the database gets extended by an additional field for every new network layer people come up with, where do we draw the line?
This needs a solution that does not require the database format to be altered to suit protocol-specific metadata.
I understand the concerns.

The Content-ID itself is built around the 'multihash' idea, where the string contains not only a checksum but also which checksum algorithm was used. This supports multiple checksum algorithms at the same time and stays compatible with new checksum algorithms. The Content-ID is defined here[0].

Behind the Content-ID is a JSON file, which gets fetched from IPFS. It contains entries for the individual blocks of a file, each with a size and a Content-ID for that block.

So the data structure is extremely simple and could be used with any protocol.

[0] https://github.com/multiformats/cid

Best regards,
Ruben
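To make the 'multihash' idea a bit more concrete, here is a minimal sketch that builds a version 1 Content-ID for a single raw block of data using only the Python standard library. The function names are made up for this example, and it deliberately covers only the simplest case (one raw block, SHA2-256); it does not reproduce the Content-ID IPFS reports for a file that is split into several blocks.

```python
import base64
import hashlib

# Codes from the multiformats tables (all below 0x80, so each fits in one varint byte).
SHA2_256_CODE = 0x12    # multihash code for sha2-256
SHA2_256_LENGTH = 0x20  # digest length in bytes (32)
CID_VERSION_1 = 0x01    # CID version
RAW_CODEC = 0x55        # multicodec code for a raw binary block


def multihash_sha256(data: bytes) -> bytes:
    """Prefix the SHA2-256 digest with its algorithm code and length."""
    return bytes([SHA2_256_CODE, SHA2_256_LENGTH]) + hashlib.sha256(data).digest()


def cid_v1_raw(data: bytes) -> str:
    """Return a CIDv1 string for a single raw block (illustrative helper)."""
    cid_bytes = bytes([CID_VERSION_1, RAW_CODEC]) + multihash_sha256(data)
    # Multibase: the prefix 'b' marks lowercase base32 without padding.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


print(cid_v1_raw(b"hello pacman"))
```

Because the algorithm code and digest length travel with the hash, a reader of the string can tell which checksum was used without any extra metadata field.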
On Sun, Apr 05, 2020 at 01:38:43PM +0200, Robin Broda wrote:
If the database gets extended by an additional field for every new network layer people come up with, where do we draw the line?
This needs a solution that does not require the database format to be altered to suit protocol-specific metadata.
This was the first point that stood out to me, too.

Can IPFS IDs have some representation as a URI? I'm spitballing here, but I'd far rather see e.g. the existing %FILENAME% registry extended to support some format like

ipfs://baBbysFirStIPFScOmMitId;foo=bar

That still feels a little shoehorn-y to me, but more comfortable than buying a new shoe for each new storage protocol supported in future (magnet anyone? ;)).

BR,
David
On Tue, 7 Apr 2020 17:44:07 +1200 David Phillips <david@sighup.nz> wrote:
On Sun, Apr 05, 2020 at 01:38:43PM +0200, Robin Broda wrote:
If the database gets extended by an additional field for every new network layer people come up with, where do we draw the line?
This needs a solution that does not require the database format to be altered to suit protocol-specific metadata.
This was the first point that stood out to me, too.
Can IPFS IDs have some representation as a URI? I'm spitballing here, but I'd far rather see e.g. the existing %FILENAME% registry extended to support some format like ipfs://baBbysFirStIPFScOmMitId;foo=bar That still feels a little shoehorn-y to me, but more comfortable than buying a new shoe for each new storage protocol supported in future (magnet anyone? ;)).
Sorry, the mail slipped by, or I would have answered sooner.

Yes, that's possible. The URL is indeed ipfs://$CID

If you have the IPFS browser plugin installed, those links will automatically be converted to something the browser can understand, like http://127.0.0.1:8080/ipfs/$CID, which then requests the file from the local HTTP gateway of the IPFS node.

If you have Opera for Android (which has built-in IPFS support), those ipfs://$CID links will work natively.

Maybe we could define something like the DLAGENTS in PKGBUILDs; this would allow extending the field in the future:

ipfs::ipfs://$CID

Best regards,
Ruben
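As a rough illustration of the two ideas in this mail, the sketch below splits a DLAGENTS-style `agent::url` entry and rewrites an `ipfs://` URI into a URL on the local HTTP gateway. The helper names and the entry format are hypothetical (nothing like this exists in pacman), and the gateway address is simply the one quoted above.

```python
# Address of the local IPFS HTTP gateway as quoted in the thread (assumed default).
DEFAULT_GATEWAY = "http://127.0.0.1:8080"


def split_agent_entry(entry: str) -> tuple[str, str]:
    """Split a hypothetical 'agent::url' entry into (agent, url)."""
    agent, sep, url = entry.partition("::")
    if not sep or not agent or not url:
        raise ValueError(f"not an 'agent::url' entry: {entry!r}")
    return agent, url


def to_gateway_url(uri: str, gateway: str = DEFAULT_GATEWAY) -> str:
    """Rewrite ipfs://<id>[/path] or ipns://<id>[/path] to a gateway URL."""
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in ("ipfs", "ipns") or not rest:
        raise ValueError(f"not an ipfs:// or ipns:// URI: {uri!r}")
    return f"{gateway.rstrip('/')}/{scheme}/{rest}"


# "QmExampleCid" is a placeholder, not a real Content-ID.
agent, url = split_agent_entry("ipfs::ipfs://QmExampleCid")
print(agent, to_gateway_url(url))
```

The resulting URL is plain HTTP, so it could be handed to any ordinary HTTP downloader.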
On 4/12/20 9:19 PM, @RubenKelevra wrote:
Sorry, the mail slipped by, or I would have answered sooner.
Yes, that's possible. The URL is indeed ipfs://$CID
If you have the IPFS browser plugin installed, those links will automatically be converted to something the browser can understand, like http://127.0.0.1:8080/ipfs/$CID, which then requests the file from the local HTTP gateway of the IPFS node.
If you have Opera for Android (which has built-in IPFS support), those ipfs://$CID links will work natively.
Maybe we could define something like the DLAGENTS in PKGBUILDs; this would allow extending the field in the future:
ipfs::ipfs://$CID
As far as pacman is concerned, *all* urls will be directly downloaded with curl. And curl in turn supports many other protocols, though not AFAICT ipfs. I'd probably recommend running a local http->ipfs proxy for this.

I guess you could also use a custom XferCommand which only supports ipfs, once you switch your mirrorlist to only use an ipfs server?

How is this content id generated? Is it deterministically generated based on the file contents, so that repo-add can generate it the same way it generates the checksum hash, and do so *offline*? Or can different people end up with different content ids when uploading the same file?

--
Eli Schwartz
Bug Wrangler and Trusted User
On Mon, 13 Apr 2020 03:23:35 -0400 Eli Schwartz <eschwartz@archlinux.org> wrote:
How is this content id generated? Is it deterministically generated based on the file contents, so that repo-add can generate it the same way it generates the checksum hash, and do so *offline*? Or can different people end up with different content ids when uploading the same file?
The Content-ID is deterministic. The default settings split files into 256 KByte chunks, which get hashed with SHA256. Those hashes get stored in a JSON file, and the JSON file then gets hashed again to get the final hash for the file.

The hashes are just extended by some meta information, like which hash algorithm was used, and get encoded as base32 to make them shorter than a hex representation.
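The scheme described above (fixed-size chunks, one SHA256 per chunk, then one hash over the list of chunk hashes) can be sketched in a few lines to show why the result depends only on the file contents and can be computed offline. This is a simplified stand-in: the function names are invented, the chunk list is serialized as plain JSON for illustration, and the output will not match the Content-ID a real IPFS node produces, since IPFS uses its own node encoding.

```python
import hashlib
import json

CHUNK_SIZE = 256 * 1024  # default chunk size mentioned in the thread


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Hash every fixed-size chunk of the data with SHA256."""
    return [
        hashlib.sha256(data[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(data), chunk_size)
    ]


def root_hash(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash the (JSON-serialized) list of chunk hashes to get one root hash."""
    listing = json.dumps(chunk_hashes(data, chunk_size), separators=(",", ":"))
    return hashlib.sha256(listing.encode("ascii")).hexdigest()


package = b"a" * (600 * 1024)  # stand-in for a package file of 600 KiB
print(len(chunk_hashes(package)), root_hash(package))
```

Two people running this over the same bytes with the same chunk size get the same root hash; changing the chunk size or the hash algorithm changes the result, which is why the settings need to be agreed on.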
As far as pacman is concerned, *all* urls will be directly downloaded with curl. And curl in turn supports many other protocols, though not AFAICT ipfs. I'd probably recommend running a local http->ipfs proxy for this. I guess you could also use a custom XferCommand which only supports ipfs, once you switch your mirrorlist to only use an ipfs server?
IPFS actually does run an HTTP->ipfs proxy by default. So if the CID is known for a file in the pacman database, we can access it via curl.

So accessing the files via IPFS would just require an IP address/port definition in the pacman settings, which pacman uses to access the CIDs via the locally running IPFS.

A URL for the local web gateway would look like:
http://localhost:8080/ipfs/$CID

IPFS also allows publishing changing CIDs via a public/private key system. So the databases could be published signed on IPFS. Pacman would just need to have a public key for each database file.

A URL to access this 'IPFS namesystem' would look like this and never change:
http://localhost:8080/ipns/$hash

---

The advantage of using IPFS is that each computer accessing the updates via IPFS will also share the data locally via mDNS, without any need for configuration.

For larger networks, some centralized update servers could be installed, and the administrator would just have to configure the DNS names of those servers in pacman as IPFS gateways. Since IPFS caches the downloaded files, this shortens the update time significantly, since the updates only have to be fetched once for each network.

Distributing the updates to mirror servers requires setting up a cluster, which is just a daemon which makes sure that certain content is moved to the right IPFS peers. Setting up a mirror just requires the IPFS daemon and running a 'cluster follower', which takes care of the full setup by just fetching a small config file.

---

Since the protocol just addresses the content, there's no need to care about which mirror is online or holds the latest data anymore, so a longer mirror maintenance is no issue at all. When it's turned on again, it will catch up with the cluster and delete/add all content which has been changed.

Hope this makes the concept a bit clearer. :)

Best regards
Ruben
participants (4)
- @RubenKelevra
- David Phillips
- Eli Schwartz
- Robin Broda