[aur-general] Auto-generated Github tarballs format change (Was: TU Application: Daniel M. Capella)

Levente Polyak anthraxx at archlinux.org
Thu Nov 15 10:23:45 UTC 2018


On 11/15/18 10:52 AM, Baptiste Jonglez wrote:
> On 15-11-18, Eli Schwartz via aur-general wrote:
>> On 11/14/18 11:50 PM, Daniel M. Capella via aur-general wrote:
>>> Quoting Levente Polyak via aur-general (2018-11-14 17:00:38)
>>>> - tests are awesome <3 run them whenever possible! more is better!
>>>>   pulling sources from github is favorable when you get free tests
>>>>   and sometimes manpages/docs
>>>
>>> Will work with the upstreams to distribute these. I prefer to use published
>>> offerings as they are what the authors intend to be used. GitHub autogenerated
>>> tarballs are also subject to change:
>>> https://marc.info/?l=openbsd-ports&m=151973450514279&w=2
>>
>> I've seen the occasional *claim* that this happens, but I've yet to see
>> any actual case where this happens and it isn't because of upstream
>> force-pushing a tag.
> 
> See https://bugs.archlinux.org/task/60382 for an example.
> 
> I still had the old archive around so I spent some time comparing it with
> the new one:
> 
> - I compared the checksum of each individual file in the archives, and
>   they were all identical
> 
> - I compared the raw tar files after decompressing, and there were just a
>   few bytes that were moved around
> 
> This really suggests a slight format change in the way the tarball was
> generated (could be file ordering).
> 
> If you want to double check, here they are:
> 
> - old archive from May 2017: https://files.polyno.me/arch/kashmir-20150805-20170525.tar.gz
> 
> - new archive: https://files.polyno.me/arch/kashmir-20150805.tar.gz
> 
> Baptiste
> 

GitHub invalidating caches is not the problem here; they should be
allowed to do that whenever they wish. The root of the issue is
unreproducibility, as already pointed out here.

The tarballs are stable per se, provided that no weird magic applies via
git export rules (like export-subst writing dates into files) and that
no force pushes are done to the tree. They are generated with git
archive via tar, which is itself reproducible. In fact, detached
pre-generated tarballs sometimes change as well, so blame upstream for
any such occurrence (at least nowadays :P).
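
Why ordering matters for reproducibility can be sketched with Python's standard tarfile module. This is an illustration under stated assumptions, not how git archive is implemented. The file names and contents are hypothetical, and per-member metadata (mtime, owner, mode) is pinned to fixed values so that only content and member order can change the output bytes.

```python
import hashlib
import io
import tarfile


def build_tar(files: dict[str, bytes], order: list[str]) -> bytes:
    """Pack `files` into an uncompressed ustar archive, adding members in `order`.

    Metadata that often varies between runs is pinned to fixed
    (hypothetical) values, so only content and ordering affect the bytes.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in order:
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0              # fixed timestamp
            info.uid = info.gid = 0     # fixed numeric owner
            info.uname = info.gname = ""
            info.mode = 0o644           # fixed permissions
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def sha256(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


# Hypothetical project files, for illustration only.
files = {
    "kashmir/README": b"hello\n",
    "kashmir/src/main.c": b"int main(void) { return 0; }\n",
}

a = build_tar(files, sorted(files))                # sorted order
b = build_tar(files, sorted(files))                # same order again
c = build_tar(files, sorted(files, reverse=True))  # same files, other order

print(sha256(a) == sha256(b))  # True: same input, same bytes
print(sha256(a) == sha256(c))  # False: identical files, different member order
```

Same files plus same ordering plus same metadata gives the same checksum; changing only the member order is enough to change the checksum of the archive even though every file inside is identical.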

Anyway, the differences we see here are just digital legacy from a time
when the format was not yet reproducible.

The example tarball indeed contains only metadata changes related to
the ordering of filenames inside the archive. This is definitely stable
today.
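
The comparison Baptiste describes (per-file checksums plus member order) can be sketched with Python's tarfile and hashlib modules. The function names are hypothetical; the sketch only looks at regular-file names, contents and order, so it cannot see header fields such as timestamps or permissions the way diffoscope does.

```python
import hashlib
import tarfile


def member_digests(path: str) -> list[tuple[str, str]]:
    """Return (name, sha256 of content) for every regular file, in archive order."""
    digests = []
    with tarfile.open(path, mode="r:*") as tar:  # "r:*" auto-detects gzip/bz2/xz
        for member in tar:
            if member.isfile():
                data = tar.extractfile(member).read()
                digests.append((member.name, hashlib.sha256(data).hexdigest()))
    return digests


def compare(old_path: str, new_path: str) -> tuple[bool, bool]:
    """Return (same_content, same_order) for two tarballs."""
    old = member_digests(old_path)
    new = member_digests(new_path)
    # Same set of (name, digest) pairs, regardless of position in the archive.
    same_content = sorted(old) == sorted(new)
    # Same sequence of member names.
    same_order = [name for name, _ in old] == [name for name, _ in new]
    return same_content, same_order
```

After downloading the two kashmir archives linked above, something like compare("kashmir-20150805-20170525.tar.gz", "kashmir-20150805.tar.gz") would report whether the file contents match and, separately, whether the member order differs.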


PS: You can simply use diffoscope for this kind of analysis; it was
created for exactly this purpose and is aware not only of content but
also of metadata.

cheers,
Levente
