[pacman-dev] Fw: Pacman support for IPFS

Tue Apr 14 07:35:41 UTC 2020

On Mon, 13 Apr 2020 03:23:35 -0400
Eli Schwartz <eschwartz at archlinux.org> wrote:

> How is this content id generated? Is it deterministically generated
> based on the file contents, so that repo-add can generate it the same
> way it generates the checksum hash, and do so *offline*? Or can
> different people end up with different content ids when uploading the
> same file?

The Content-ID is deterministic, as long as the same settings are used.
The default settings split files into 256 KByte chunks, and each chunk
gets hashed with SHA256.

Those hashes get stored in a JSON file and the JSON file then gets
hashed again, to get the final hash for the file.

The hashes are just extended by some meta information, like which
hash algorithm was used, and get encoded as base32 to make them shorter
than a hex representation.
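
As a rough illustration of the scheme described above, here is a minimal
Python sketch. It is a toy model only: the function name, the JSON
layout and the two-byte prefix are my assumptions for illustration, and
real IPFS encodes the chunk index differently, so the output will not
match what `ipfs add` prints. It only shows why the identifier is
deterministic and can be computed offline.

```python
import base64
import hashlib
import json

CHUNK_SIZE = 256 * 1024  # 256 KByte chunks, as in the default settings


def toy_content_id(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Return a deterministic, CID-like identifier (illustration only)."""
    # Hash every fixed-size chunk with SHA-256.
    chunk_hashes = [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]
    # Store the chunk hashes in a JSON document and hash that again.
    index = json.dumps({"chunks": chunk_hashes}, sort_keys=True).encode()
    root = hashlib.sha256(index).digest()
    # Prepend meta information (assumed layout): 0x12 marks sha2-256,
    # 0x20 is the digest length of 32 bytes.
    tagged = bytes([0x12, 0x20]) + root
    # Encode as lowercase base32 without padding.
    return base64.b32encode(tagged).decode().lower().rstrip("=")
```

The same bytes always give the same identifier, no matter who computes
it or where, and no network access is involved.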

> As far as pacman is concerned, *all* urls will be directly downloaded
> with curl. And curl in turn supports many other protocols, though not
> AFAICT ipfs. I'd probably recommend running a local http->ipfs proxy
> for this. I guess you could also use a custom XferCommand which only
> supports ipfs, once you switch your mirrorlist to only use an ipfs
> server?

IPFS actually does run an HTTP->ipfs proxy by default. So if the pacman
database records the CID of a file, we can access it via curl.

So accessing the files via IPFS would just require an IP address/port
setting in the pacman configuration, which pacman uses to access the
CIDs via the locally running IPFS daemon.

A URL for the local Web-Gateway would look like:

http://localhost:8080/ipfs/$CID
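
To make that concrete, a small Python sketch of what a download helper
could look like. The function names and the default host/port arguments
are hypothetical; the only thing taken from above is the URL layout of
the local gateway. Nothing here is existing pacman code.

```python
import shutil
import urllib.request


def gateway_url(cid: str, host: str = "localhost", port: int = 8080) -> str:
    """Build the local gateway URL for a given CID."""
    return f"http://{host}:{port}/ipfs/{cid}"


def fetch_from_gateway(cid: str, dest: str,
                       host: str = "localhost", port: int = 8080) -> None:
    """Download the content behind a CID to dest via plain HTTP.

    Assumes an IPFS daemon with its HTTP gateway is listening on host:port.
    """
    with urllib.request.urlopen(gateway_url(cid, host, port)) as resp:
        with open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)
```

Since this is ordinary HTTP, the same URL would work with curl as well.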

IPFS also allows publishing changing CIDs via a public/private key
system. So the databases could be published, signed, on IPFS.

Pacman would just need to have a public key for each database file.

A URL to access this 'IPFS namesystem' would look like this and never
change:

http://localhost:8080/ipns/$hash
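
A sketch of how one key per database file could map to stable URLs. The
repository names and key strings below are placeholders I made up, and
the mapping itself is a hypothetical configuration, not something pacman
has today.

```python
# Hypothetical mapping: one IPNS key (public key hash) per database file.
IPNS_KEYS = {
    "core": "example-key-for-core-db",
    "extra": "example-key-for-extra-db",
}


def database_url(repo: str, keys: dict = IPNS_KEYS,
                 gateway: str = "http://localhost:8080") -> str:
    """Return the stable IPNS URL for a repository database."""
    return f"{gateway}/ipns/{keys[repo]}"
```

The URL stays the same across updates; only the CID that the name
resolves to changes when a new database is published.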

---

The advantage of using IPFS is that each computer accessing the
updates via IPFS will also share the data on the local network,
discovering peers via mDNS, without any need for configuration. For
larger networks, some centralized update servers could be installed,
and the administrator would just have to configure the DNS names of
those servers in pacman as IPFS gateways.

Since IPFS caches the downloaded files, this speeds up updates
significantly: the updates only have to be fetched once for each
network.

Distributing the updates to mirror servers requires setting up a
cluster, which is just a daemon that makes sure that certain content
is moved to the right IPFS peers.

Setting up a mirror just requires the IPFS daemon and running a
'cluster follower' which takes care of the full setup by just fetching
a small config file.

---

Since the protocol just addresses the content, there's no need to care
anymore about which mirror is online or holds the latest data, so a
longer maintenance window on a mirror is no issue at all. When it's
turned on again, it will catch up with the cluster and add or delete
whatever content has changed.
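
Conceptually, the catch-up is just a comparison of two sets of CIDs. The
sketch below is only meant to show that idea; the function name is
hypothetical and this is not how the cluster software is actually
implemented.

```python
from typing import Set, Tuple


def catch_up(local_cids: Set[str], cluster_cids: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_delete) for a mirror that was offline."""
    # Content the cluster holds now but the mirror is missing.
    to_add = cluster_cids - local_cids
    # Content the mirror still holds but the cluster dropped meanwhile.
    to_delete = local_cids - cluster_cids
    return to_add, to_delete
```

Because content is addressed by its hash, anything present in both sets
is already known to be identical and does not need to be transferred
again.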

Hope this makes the concept a bit clearer. :)

Best regards

Ruben