Re: [aur-general] New AUR Metadata Archives
On 11/11/21 06:22, aur-general-request@lists.archlinux.org wrote:
Hello AUR users,
In addition to pre-existing archives, we've introduced two new archives that can be used instead of bulk queries against the RPC.
Pre-existing archives:
- packages.gz - Listing of all packages separated by line break.
- pkgbase.gz - Listing of all package bases separated by line break.
- users.gz - Listing of all users separated by line break.
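The pre-existing archives are plain lists with one name per line, so the standard library is enough to load one. Below is a minimal Python sketch. It assumes a local copy that is still gzip-compressed on disk, as `curl --output` would save it. The helper name `load_name_list` is hypothetical. Skipping `#` lines is only a precaution in case an archive carries a header comment; it is not documented behavior.

```python
from __future__ import annotations

import gzip


def load_name_list(path: str) -> set[str]:
    """Read a newline-separated archive such as packages.gz into a set.

    Blank lines are dropped, and lines starting with '#' are skipped as a
    precaution (assumption: valid package names never start with '#').
    """
    names = set()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            name = line.strip()
            if name and not name.startswith("#"):
                names.add(name)
    return names
```

The same helper works for pkgbase.gz and users.gz. A set gives constant-time membership checks, which are typically what a client wants from these lists.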
Metadata archives:
- packages-meta-v1.json.gz - A complete `type=search` formatted JSON package archive.
- packages-meta-ext-v1.json.gz - A complete `type=multiinfo` formatted JSON package archive.
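The metadata archives are a single JSON array of package objects. The responses quoted later in this thread begin with `[` and contain fields such as `ID`, `Name`, `PackageBase`, `Version` and `Depends`. Below is a minimal Python sketch that loads one archive and indexes it by package name. It assumes a local copy that is still gzip-compressed and that `Name` is unique per entry. The helper name `load_meta_index` is hypothetical.

```python
import gzip
import json


def load_meta_index(path: str) -> dict:
    """Load a packages-meta-*.json.gz archive and index its entries by Name.

    The archive body is a single JSON array of package objects.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        packages = json.load(fh)
    # Assumption: package names are unique, so Name can serve as the key
    return {pkg["Name"]: pkg for pkg in packages}
```

For example, `load_meta_index("packages-meta-ext-v1.json.gz")["bubblemon"]["Version"]` would look up one package locally instead of sending a `type=multiinfo` request.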
All archives support the Last-Modified and ETag headers. Each archive is updated roughly every 5 minutes. We ask any bulk users of the RPC to consider these archives as a replacement for repeated searches or bulk multiinfo requests.
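Because the archives carry ETag and Last-Modified validators, a client can revalidate its copy with a conditional GET and skip the download when nothing has changed. Below is a minimal sketch using Python's `urllib.request`. The helper names are hypothetical, and storing the validators between runs is left to the caller.

```python
from __future__ import annotations

import urllib.error
import urllib.request


def conditional_headers(etag: str | None, last_modified: str | None) -> dict:
    """Build revalidation headers from validators saved off an earlier 200."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url: str, etag: str | None = None,
                     last_modified: str | None = None):
    """Return (body, etag, last_modified); body is None if the archive is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(request, timeout=60) as resp:
            return (resp.read(),
                    resp.headers.get("ETag", etag),
                    resp.headers.get("Last-Modified", last_modified))
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError
        if err.code == 304:
            return None, etag, last_modified
        raise
```

On the first run, call `fetch_if_changed(url)` with no validators. On later runs, pass the ETag and Last-Modified values returned by the previous call. `urllib` does not apply `Content-Encoding`, so `body` holds the bytes exactly as the server sent them. The archives are regenerated about every 5 minutes, and the response headers quoted later in the thread include `cache-control: max-age=300`, so polling more often than that gains little.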
All archives are available for download at https://aur.archlinux.org/archive-name.gz, for example https://aur.archlinux.org/packages-meta-ext-v1.json.gz.
Using these archives will drastically reduce the amount of traffic the AUR has to serve to API clients, particularly clients that issue large numbers of queries on their own.
We thank you all for contributing to the world of AUR and helping those who can use your maintained software as a result.
Regards, Kevin
-- Kevin Morris Software Developer
That's awesome, Kevin. I ran into a problem while downloading 'packages-meta-ext-v1.json.gz' with wget: I somehow corrupted it... So, is there any hash I can check it against? Something like the '.sig' files for '.iso' files, or maybe a hash in a text file? Sorry if I'm being dumb. Yours, zoorat.
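At this point in the thread, the AUR does not publish checksum or signature files for these archives; Kevin only floats the idea in his reply. If a SHA-256 digest were published next to an archive, checking a download against it could look like the minimal Python sketch below. The helper names and the source of `expected_hex` are hypothetical.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file against a hex digest from a (hypothetical) checksum file."""
    return sha256_file(path) == expected_hex.strip().lower()
```

The archives are regenerated about every 5 minutes, so a published digest would only match the exact copy it was computed from. A plain digest detects accidental corruption. Only a signature, like the `.sig` files for ISOs, would also authenticate the file.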
Hi zoorat,
The files are downloadable in gzip format; when you receive the file, you'll need to uncompress it.
Example:
$ curl --output packages-meta-ext-v1.json.gz \
    'https://aur.archlinux.org/packages-meta-ext-v1.json.gz'
# zcat is cat for gz files; it uncompresses and cats the content.
$ zcat packages-meta-ext-v1.json.gz > packages-meta-ext-v1.json
You do bring up a nice point about the signature; perhaps we should provide sigs for all of these archives.
Regards,
Kevin
On Thu, Nov 11, 2021 at 06:34:31AM +0000, zoorat via aur-general wrote:
-- Kevin Morris Software Developer Identities: - kevr @ Libera
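Python clients can perform the same two steps as Kevin's `curl` and `zcat` commands with the standard library. Below is a minimal sketch. Like the `curl` example, it assumes the server sends the gzip bytes. `download` and `gunzip_to` are hypothetical helper names.

```python
import gzip
import shutil
import urllib.request


def download(url: str, dst: str) -> None:
    """Save the response body to dst, like `curl --output dst url`.

    urllib does not apply Content-Encoding, so the bytes are stored
    exactly as the server sent them.
    """
    with urllib.request.urlopen(url, timeout=60) as resp, open(dst, "wb") as fout:
        shutil.copyfileobj(resp, fout)


def gunzip_to(src: str, dst: str) -> None:
    """Decompress src into dst, like `zcat src > dst`, streaming in chunks."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

Usage mirrors the shell session: `download("https://aur.archlinux.org/packages-meta-ext-v1.json.gz", "packages-meta-ext-v1.json.gz")`, then `gunzip_to("packages-meta-ext-v1.json.gz", "packages-meta-ext-v1.json")`. `shutil.copyfileobj` copies in chunks, so neither step holds the whole archive in memory.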
* Kevin Morris via aur-general (aur-general@lists.archlinux.org) wrote:
The files are downloadable in gzip format; when you receive the file, you'll need to uncompress it.
There might in fact be an issue here, as I've run into it. The file may be unexpectedly returned uncompressed by some clients, for instance python requests:
% python3 -c 'import requests; print(requests.get("https://aur.archlinux.org/packages-meta-ext-v1.json.gz").content[:50])'
b'[\n{"ID":208446,"Name":"bubblemon","PackageBaseID":'
If I'm not mistaken, it's caused by a superfluous `content-encoding: gzip` header:
% curl -I https://aur.archlinux.org/packages-meta-ext-v1.json.gz
HTTP/2 200
server: nginx
date: Thu, 11 Nov 2021 21:05:03 GMT
content-type: application/gzip
content-length: 7410251
last-modified: Thu, 11 Nov 2021 21:05:03 GMT
etag: "618d857f-71124b"
expires: Thu, 11 Nov 2021 21:10:03 GMT
cache-control: max-age=300
content-encoding: gzip
accept-ranges: bytes
This basically says that the transferred gzip file is additionally encoded with gzip (i.e. doubly compressed, which is probably not the case). The client therefore decompresses it upon retrieval, which in effect decompresses the original gzip and returns plain JSON.
-- Dmitry Marakasov . 55B5 0596 FF1E 8D84 5F56 9510 D35A 80DD F9D2 F77D
amdmi3@amdmi3.ru ..: https://github.com/AMDmi3 https://amdmi3.ru
Following the previous archive spec in the project (PHP), we intended to keep the `Content-Type: text/plain` + `Content-Encoding: gzip` headers for our new archives, as should be the case for all archives found in the AUR. aur.al's nginx frontend wasn't handling this correctly, and a patch has been merged that resolves the issue you were seeing (application/gzip encoded with gzip).
That being said, we do want to handle MIME types better for these archives, especially now that we have the new .json.gz archives. We would supply both gzip-encoded transports and raw application/gzip transports. It won't happen immediately, but here's an issue I've just opened about this: https://gitlab.archlinux.org/archlinux/aurweb/-/issues/175
Thanks for the heads-up; I didn't realize this was mismatched on live.
Regards,
Kevin
On Fri, Nov 12, 2021 at 12:16:21AM +0300, Dmitry Marakasov wrote:
-- Kevin Morris Software Developer Identities: - kevr @ Libera
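With the headers Kevin describes (`Content-Type: text/plain` plus `Content-Encoding: gzip`), the result depends on the client. One that honors `Content-Encoding`, such as requests in Dmitry's example, receives plain JSON. One that ignores it, such as Python's `urllib.request`, receives the gzip bytes. A client can avoid depending on either behavior by checking for the gzip magic bytes (`1f 8b`, per RFC 1952) before decompressing. Below is a minimal sketch; the helper names are hypothetical.

```python
import gzip
import json

# First two bytes of every gzip member (ID1, ID2 in RFC 1952)
GZIP_MAGIC = b"\x1f\x8b"


def ensure_decompressed(body: bytes) -> bytes:
    """Return plain bytes whether or not the HTTP client already decoded gzip."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


def parse_archive(body: bytes) -> list:
    """Parse a packages-meta-*.json.gz response body into a list of packages."""
    return json.loads(ensure_decompressed(body).decode("utf-8"))
```

The check is unambiguous for these archives: JSON text cannot start with the control byte `0x1f`, and the metadata archives start with `[`. For example, `parse_archive(requests.get(url).content)` gives the same result whether or not requests decoded the body.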
Hi Kevin, thanks for replying, but look at this output:
```
❯ zcat packages-meta-ext-v1.json.broken.gz > packages-meta-ext-v1.broken.json
gzip: packages-meta-ext-v1.json.broken.gz: invalid compressed data--crc error
gzip: packages-meta-ext-v1.json.broken.gz: invalid compressed data--length error
❯ jq . packages-meta-ext-v1.broken.json
parse error: Invalid numeric literal at line 56106, column 838
❯ sed -n -e '56106,56106p' packages-meta-ext-v1.broken.json
{"ID":702872,"Name":"pulseaudio-bluedio","PackageBaseID":149681,"PackageBase":"pulseaudio-bluedio","Version":"13.0-3","Description":"A featureful, general-purpose sound server","URL":"https://www.freedesktop.org/wiki/Software/PulseAudio/","NumVotes":0,"Popularity":0.0,"OutOfDate":null,"Maintainer":"sdrik","FirstSubmitted":1583056828,"LastModified":1583066523,"URLPath":"/cgit/aur.git/snapshot/pulseaudio-bluedio.tar.gz","License":["GPL"],"Keywords":[],"Depends":["libpulse-bluedio=13.0-3","rtkit","libltdl","speexdsp","tdb","orc","libsoxr","webrtc-audio-processing"],"MakeDepends":["libasyncns","libcap","attr","libxtst","libsm","libsndfile","rtkit","libsoxr","speexdsp","tdb","systemd","dbus","avahi","bluez","bluez-libs","jack2","sbc","lirc","openssl","fftw","orc","gtk3","webrtc-audio-processing","check","git","meson","xmltomp,"git","i-xbrtrdmmm/lets-cli/lets"kdh7rtrdmmm/lGPL"],"Kprof2-git"],"Prcheck","gitxen<=9],"Decheck","gitends"<=1ackds":["goldendictMakeDepends":,"License":["ools"]},
❯ sed -n -e '56106,56106p' packages-meta-ext-v1.broken.json | jq
parse error: Invalid numeric literal at line 1, column 838
```
Before sending that first mail, I had decompressed it with '7z', and 'jq' gave me the same error, even though '7z' itself worked without any problem. I noticed something was wrong when I tried to pretty-print the original JSON file with 'jq'.
yours,
zoorat.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, November 11th, 2021 at 23:27, Kevin Morris <kevr@0cost.org> wrote:
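As the `crc error` and `length error` messages above show, every gzip file already carries a CRC-32 and the uncompressed length in its trailer. A damaged download can therefore usually be detected locally, without a separate hash file, by decompressing it completely; `gzip -t` works on the same idea. Below is a minimal Python sketch; `gzip_is_intact` is a hypothetical helper. It only detects accidental corruption and does not replace a signature.

```python
import gzip
import zlib


def gzip_is_intact(path: str, chunk_size: int = 1 << 20) -> bool:
    """Decompress the whole file, discarding output, so the gzip module
    verifies the CRC-32 and length stored in the trailer."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad header, CRC or length mismatch);
        # EOFError means the file was truncated; zlib.error means corrupt deflate data.
        return False
    return True
```

A client could run this check right after downloading and fetch the archive again if it returns `False`, before handing the data to `jq` or a JSON parser.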
participants (3)
- Dmitry Marakasov
- Kevin Morris
- zoorat