On Mon, 13 Apr 2020 03:23:35 -0400 Eli Schwartz <eschwartz@archlinux.org> wrote:
> How is this content id generated? Is it deterministically generated based on the file contents, so that repo-add can generate it the same way it generates the checksum hash, and do so *offline*? Or can different people end up with different content ids when uploading the same file?
The Content-ID is deterministic. By default, files are split into 256 KByte chunks, and each chunk is hashed with SHA256. Those chunk hashes are stored in a JSON file, and that JSON file is then hashed again to produce the final hash for the file. The hashes are extended with some metadata, such as which hash algorithm was used, and encoded as base32 to make them shorter than a hex representation.
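A simplified sketch of the idea in Python (the real implementation builds a Merkle DAG and encodes the root as a multihash CID, so the actual IDs differ, but the determinism works the same way):

    import base64
    import hashlib

    CHUNK_SIZE = 256 * 1024  # default chunk size: 256 KByte

    def content_id(path):
        """Chunk a file, hash each chunk with SHA256, then hash the
        list of chunk hashes to get a single deterministic ID."""
        chunk_hashes = []
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                chunk_hashes.append(hashlib.sha256(chunk).digest())
        root = hashlib.sha256(b''.join(chunk_hashes)).digest()
        # base32 (52 chars) is shorter than hex (64 chars) for 32 bytes
        return base64.b32encode(root).decode().lower().rstrip('=')

Since only the file bytes go into the calculation, anyone, including repo-add, can compute the same ID offline.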
> As far as pacman is concerned, *all* URLs will be directly downloaded with curl. And curl in turn supports many other protocols, though not AFAICT ipfs. I'd probably recommend running a local http->ipfs proxy for this. I guess you could also use a custom XferCommand which only supports ipfs, once you switch your mirrorlist to only use an ipfs server?
IPFS actually runs an HTTP->IPFS gateway by default. So if the CID for a file is known in the pacman database, we can access it via curl. Accessing files over IPFS would then only require an IP address/port setting in the pacman configuration, which pacman would use to fetch CIDs through the locally running IPFS daemon. A URL for the local web gateway would look like:

http://localhost:8080/ipfs/$CID

IPFS also allows publishing changing CIDs via a public/private key system. So the databases could be published, signed, on IPFS; pacman would just need a public key for each database file. A URL to access this 'IPFS name system' (IPNS) would look like this and never change:

http://localhost:8080/ipns/$hash

The advantage of using IPFS is that each computer fetching updates over IPFS also shares the data locally via mDNS, without any configuration. For larger networks, a few centralized update servers could be set up, and the administrator would only have to configure the DNS names of those servers in pacman as IPFS gateways. Since IPFS caches the downloaded files, this speeds up updates significantly: each update only has to be fetched once per network.

Distributing the updates across mirror servers requires setting up a cluster, which is just a daemon that makes sure certain content is moved to the right IPFS peers. Setting up a mirror then only requires running the IPFS daemon plus a 'cluster follower', which takes care of the full setup by fetching a small config file.

Since the protocol addresses the content itself, there is no need to care anymore about which mirror is online or holds the latest data, so a longer mirror maintenance window is no issue at all. When a mirror comes back online, it catches up with the cluster and deletes/adds all content that has changed.
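To make this concrete, a minimal mirror setup might look like the following (the cluster name and config URL are placeholders, and the exact ipfs-cluster-follow invocation should be checked against its documentation):

    # start the IPFS daemon; it serves the HTTP gateway on localhost:8080
    ipfs daemon

    # join the distribution cluster as a follower; it pins whatever
    # content the cluster tells it to
    ipfs-cluster-follow archlinux-cluster init https://example.org/cluster-config.json
    ipfs-cluster-follow archlinux-cluster run

On the client side, a single mirrorlist entry pointing at the never-changing IPNS address would then be enough ($hash again being the published database key):

    Server = http://localhost:8080/ipns/$hash/$repo/os/$arch

Hope this makes the concept a bit clearer. :)

Best regards
Ruben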