[arch-mirrors] Mirror Stats Project - Using weblogs to gather useful information

Tyler Dence tyzoid.d at gmail.com
Tue Aug 22 11:23:07 UTC 2017


My best guess is that it's some monitoring software package. I'm seeing the
user agent of a lot of these requests as Python-urllib/3.6, which is
different than the user agent for pacman:

$ grep cryptsetup access.log | tail -n 5 | cut -f4- -d' '
[22/Aug/2017:07:14:01 -0400] "GET /core/os/x86_64/cryptsetup-1.7.5-1-x86_64.pkg.tar.xz HTTP/1.1" 200 246880 "-" "Python-urllib/3.6"
[22/Aug/2017:07:17:34 -0400] "GET /core/os/x86_64/cryptsetup-1.7.5-1-x86_64.pkg.tar.xz HTTP/1.1" 200 246880 "-" "Python-urllib/3.6"
[22/Aug/2017:07:18:15 -0400] "GET /core/os/x86_64/cryptsetup-1.7.5-1-x86_64.pkg.tar.xz HTTP/1.1" 200 246880 "-" "Python-urllib/3.6"
[22/Aug/2017:07:18:43 -0400] "GET /core/os/x86_64/cryptsetup-1.7.5-1-x86_64.pkg.tar.xz HTTP/1.1" 200 246880 "-" "Python-urllib/3.6"
[22/Aug/2017:07:19:30 -0400] "GET /core/os/x86_64/cryptsetup-1.7.5-1-x86_64.pkg.tar.xz HTTP/1.1" 200 246880 "-" "Python-urllib/3.6"

Not shown is the IP of each request, but they are all unique.
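The same check can be scripted so that the "all unique IPs" observation becomes a number without printing any addresses. This is a minimal sketch, assuming the log is in the standard Apache/nginx "combined" format (the fields left after `cut -f4-` match it); `LOG_RE` and `agents_for` are hypothetical names. It tallies requests per user agent for paths containing a given string and counts distinct client IPs per agent.

```python
import re
from collections import Counter, defaultdict

# Assumed Apache/nginx "combined" log format:
# IP ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def agents_for(lines, needle):
    """Map each user agent to (request count, distinct client IPs) for
    requests whose path contains `needle`. Unparseable lines are skipped."""
    requests = Counter()
    ips = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or needle not in m.group("path"):
            continue
        agent = m.group("agent")
        requests[agent] += 1
        ips[agent].add(m.group("ip"))
    # Only counts leave this function; the IP sets themselves are discarded.
    return {agent: (n, len(ips[agent])) for agent, n in requests.items()}
```

Passing an open file works directly, e.g. `agents_for(open("access.log"), "cryptsetup")`; if the request count for `Python-urllib/3.6` equals its distinct-IP count, every request came from a different address.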

vs

$ grep archlinux-keyring access.log | tail -n 5 | cut -f4- -d' '
[21/Aug/2017:15:02:03 -0400] "GET /core/os/x86_64/archlinux-keyring-20170611-1-any.pkg.tar.xz HTTP/1.1" 200 677725 "-" "python-requests/2.18.1"
[21/Aug/2017:16:12:31 -0400] "GET /core/os/x86_64/archlinux-keyring-20170611-1-any.pkg.tar.xz HTTP/1.1" 200 677725 "-" "python-requests/2.18.1"
[21/Aug/2017:17:21:07 -0400] "GET /core/os/x86_64/archlinux-keyring-20170611-1-any.pkg.tar.xz HTTP/1.1" 200 677669 "-" "pacman/5.0.2 (Linux x86_64) libalpm/10.0.2"
[22/Aug/2017:03:48:54 -0400] "GET /core/os/i686/archlinux-keyring-20170611-1-any.pkg.tar.xz HTTP/1.1" 200 677669 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
[22/Aug/2017:03:53:22 -0400] "GET /core/os/i686/archlinux-keyring-20170611-1-any.pkg.tar.xz.sig HTTP/1.1" 200 554 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
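To compare clients at a glance, user agents can be bucketed coarsely. This is only a sketch: the category names, prefixes and crawler hints below are assumptions chosen to cover the agents seen above (pacman, Python-urllib, python-requests, the Yahoo! Slurp crawler), not a complete taxonomy.

```python
from collections import Counter

# Hypothetical buckets, matched as case-insensitive prefixes of the agent.
PREFIX_CATEGORIES = (
    ("pacman", ("pacman/",)),
    ("script", ("python-urllib/", "python-requests/", "curl/", "wget/")),
)
# Substrings that commonly appear in crawler user agents (illustrative only).
CRAWLER_HINTS = ("bot", "crawl", "spider", "slurp")


def classify(agent: str) -> str:
    """Return a coarse category name for one user-agent string."""
    lowered = agent.lower()
    for category, prefixes in PREFIX_CATEGORIES:
        if lowered.startswith(prefixes):  # str.startswith accepts a tuple
            return category
    if any(hint in lowered for hint in CRAWLER_HINTS):
        return "crawler"
    return "other"


def summarize(agents) -> Counter:
    """Count how many requests fall into each category."""
    return Counter(classify(agent) for agent in agents)
```

Matching is case-insensitive because `python-requests` and `Python-urllib` differ in capitalization. A crawler that sends a plain browser user agent would land in `other`, so the crawler count is a lower bound.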

That said, I'm still very much interested in finding out what the root
cause is.

On Tue, Aug 22, 2017 at 12:33 AM, Miłosz Tyborowski <milosz at tyborek.pl> wrote:

> The mere fact that cryptsetup has been downloaded over 12 times more often
> than the linux package is interesting, to say the least.
>
> 2017-08-22 4:53 GMT+02:00 Tyler Dence <tyzoid.d at gmail.com>:
>
>> I've made a post about this topic on the arch forums, but I thought I'd
>> bring it to the attention of the mailing list here.
>>
>> So I run a mirror over at https://arlm.tyzoid.com/, and I decided it
>> would be cool to see if I could get some interesting information out of
>> the logs it generates.
>>
>> Currently, I have a list of the most downloaded packages (compiled
>> nightly):
>>
>>    - https://stats.arlm.tyzoid.com/pkgstats.json
>>    - https://stats.arlm.tyzoid.com/pkgstats.txt
>>
>>
>> And a graph of network traffic:
>>
>>    - https://stats.arlm.tyzoid.com/
>>
>>
>> I'm wondering if there's any other information that might be interesting
>> to make available, or if anyone else here is interested in contributing
>> some of their own collected data for the stats project. If so, we should
>> get a project up on GitHub sometime to have a unified method/web viewer.
>>
>> Some caveats that I impose for now, to protect privacy:
>>
>>    - IP addresses will not be made available, nor will any IP prefix.
>>    - Geographic data will be made available at no finer than weekly
>>    granularity (except for summary data spanning multiple weeks).
>>    - Geographic data will not be made available at a finer granularity
>>    than state/province.
>>
>> Perhaps there should also be some discussion regarding what logs are
>> kept/deleted?
>>
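The nightly "most downloaded packages" list and the cryptsetup vs. linux comparison both come down to counting downloads per package name. One possible shape for that job, sketched under assumptions: it takes (path, status) pairs already extracted from the log, recognizes Arch package filenames of the form `pkgname-pkgver-pkgrel-arch.pkg.tar.*`, skips `.sig` and repository database files, and counts only complete 200 responses, so partial (206) range requests from resumed downloads are ignored. `package_name`, `rank` and `to_json` are hypothetical names, and the JSON layout is illustrative, not the actual schema of pkgstats.json.

```python
import json
import posixpath
import re
from collections import Counter

# pkgver, pkgrel and arch never contain '-', so a greedy name group
# leaves exactly the last three hyphen-separated fields for them.
PKG_RE = re.compile(
    r"^(?P<name>.+)-(?P<ver>[^-]+)-(?P<rel>[^-]+)-(?P<arch>[^-]+)"
    r"\.pkg\.tar(?:\.\w+)?$"
)


def package_name(path):
    """Return the pkgname for a package file path, or None for .sig, .db, etc."""
    match = PKG_RE.match(posixpath.basename(path))
    return match.group("name") if match else None


def rank(requests, top=None):
    """requests: iterable of (path, status_int). Returns [(name, count), ...]."""
    counts = Counter()
    for path, status in requests:
        if status != 200:  # skip partial (206), not-modified, errors
            continue
        name = package_name(path)
        if name is not None:
            counts[name] += 1
    return counts.most_common(top)


def to_json(ranking):
    """Serialize a ranking into a (hypothetical) list-of-objects layout."""
    return json.dumps(
        [{"package": name, "downloads": n} for name, n in ranking], indent=2
    )
```

A ratio like the one quoted above is then `counts["cryptsetup"] / counts["linux"]` with `counts = dict(rank(requests))`.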
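The geographic caveats in the original post (weekly granularity, nothing finer than state/province, no IPs) can be enforced at aggregation time rather than at publication time. A minimal sketch, assuming each request's IP has already been resolved to a (country, region) pair by some GeoIP lookup upstream and then discarded; that lookup is not shown, and `parse_log_time` and `weekly_regions` are hypothetical names. Timestamps are converted to UTC before bucketing by ISO week so that mirrors in different time zones agree on week boundaries.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_log_time(stamp):
    """Parse an access-log timestamp such as '22/Aug/2017:07:14:01 -0400'."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def weekly_regions(records):
    """records: iterable of (timestamp_str, country, region), IPs already dropped.

    Returns counts keyed by (ISO year, ISO week, country, region); nothing
    finer than a week or a state/province survives this step.
    """
    counts = Counter()
    for stamp, country, region in records:
        year, week, _ = parse_log_time(stamp).astimezone(timezone.utc).isocalendar()
        counts[(year, week, country, region)] += 1
    return counts
```

Note that `%b` matches English month abbreviations only under the default C locale.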