On Tue, Aug 17, 2010 at 9:28 AM, Thomas Bächler <thomas@archlinux.org> wrote:
On 17.08.2010 16:12, Dan McGee wrote:
Hey guys,
A package went in so big today that it made reporead blow up on my local database due to the installed size being > 2GB:
http://www.archlinux.org/packages/community/i686/sage-mathematics/
http://www.archlinux.org/packages/community/x86_64/sage-mathematics/
  File "/usr/lib/python2.6/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
django.db.utils.DatabaseError: integer out of range
You should be able to fix that, right?
Yes, and it doesn't blow up on MySQL so not a huge rush. I was just showing the error for the curious.
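For the curious, the "integer out of range" error is consistent with the installed size being stored in a signed 32-bit column: PostgreSQL's `integer` type tops out at 2147483647, one byte short of 2 GiB. The following is only a minimal sketch of that limit, assuming the column is a plain `integer` (the thread does not show the schema); `fits_int32` and the sample sizes are hypothetical names and values for illustration.

```python
# Range of a signed 32-bit integer, which is what PostgreSQL's
# `integer` type can store.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value):
    """Return True if value can be stored in a signed 32-bit column."""
    return INT32_MIN <= value <= INT32_MAX


# Hypothetical sizes in bytes, for illustration only.
just_under_2gib = 2 * 1024**3 - 1
exactly_2gib = 2 * 1024**3

print(fits_int32(just_under_2gib))  # True
print(fits_int32(exactly_2gib))     # False
```

A 64-bit column type (`bigint` in PostgreSQL) would hold such a value without trouble.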
I'm wondering if we need to be more careful when it comes to these big packages entering our repositories. This one is especially suspect as of its 71096 files (and 71094 in the other architecture), a ton of them are things like *.py, *.pyc, *.html, or *.png files. This is ripe for splitting into a -data package (or not including some of this junk, if it is that, at all).
We partly discussed this on aur-general, and making sage smaller is a bit of a long-term task, if even possible. Anyway, dbscripts lack support for having split packages with one package being architecture-independent, so even splitting these data files away won't be easy.
That's news to me:

archweb=# select p.id, pkgname, a.name, r.name, installed_size, compressed_size
    from packages p
    left join arches a on a.id = p.arch_id
    left join repos r on r.id = p.repo_id
    where pkgname like 'nexuiz%' or pkgname like 'vdrift%'
       or pkgname like 'sauerbraten%' or pkgname like 'openarena%'
       or pkgname like 'flightgear%'
    order by compressed_size desc;
  id   |     pkgname      |  name  |   name    | installed_size | compressed_size
-------+------------------+--------+-----------+----------------+-----------------
  9562 | nexuiz-data      | any    | Community |      891768832 |       882807981
 16811 | vdrift-data      | any    | Community |      593498112 |       523473616
 15925 | sauerbraten-data | any    | Community |      538304512 |       443680072
 16882 | openarena-data   | any    | Community |      345866240 |       333849916
 13406 | flightgear-data  | any    | Community |      572440576 |       317831046
  5490 | nexuiz           | x86_64 | Community |        6164480 |         2757201
  4879 | flightgear       | x86_64 | Community |       10555392 |         2579360
   711 | flightgear       | i686   | Community |       10272768 |         2548000
  1260 | nexuiz           | i686   | Community |        5562368 |         2314704
  6033 | sauerbraten      | x86_64 | Community |        3518464 |         1142808
  1822 | sauerbraten      | i686   | Community |        3420160 |         1017648
 16830 | vdrift           | x86_64 | Community |        2994176 |          820512
 16812 | vdrift           | i686   | Community |        2899968 |          787368
 16900 | openarena        | x86_64 | Community |        1937408 |          601684
 16886 | openarena        | i686   | Community |        1593344 |          493280
  4880 | flightgear-atlas | x86_64 | Community |         901120 |          354938
   712 | flightgear-atlas | i686   | Community |         811008 |          329020
(17 rows)
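To put a number on what those `any` packages are worth, here is a rough sketch that sums the compressed sizes of the five -data rows in the query result above. The comparison rests on an assumption not stated in the thread: that without an `any` split, the same data would be shipped once per architecture (i686 and x86_64). The names `data_packages` and `total_compressed` are made up for this example.

```python
# (pkgname, installed_size, compressed_size) in bytes, copied from the
# rows with arch "any" in the query result above.
data_packages = [
    ("nexuiz-data", 891768832, 882807981),
    ("vdrift-data", 593498112, 523473616),
    ("sauerbraten-data", 538304512, 443680072),
    ("openarena-data", 345866240, 333849916),
    ("flightgear-data", 572440576, 317831046),
]


def total_compressed(packages):
    """Sum the compressed_size field over all given packages."""
    return sum(compressed for _name, _installed, compressed in packages)


# Mirror space used when the data is shipped once, as an "any" package.
once = total_compressed(data_packages)

# Assumption: without the split, the data would be duplicated for each
# of the two architectures.
twice = 2 * once

print("shipped once:  %d bytes" % once)
print("shipped twice: %d bytes" % twice)
print("saved:         %d bytes" % (twice - once))
```

Under that assumption the split saves roughly 2.5 GB of mirror space for these five packages alone.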
tl;dr: I think we need some standards with these huge packages, and people need to be a lot more cognizant as to how big they are. We have lost more than one mirror due to complaints over needed space and stuff like this doesn't help.
If a mirror cannot cope with a few GB, then it should be dropped anyway. Our repos will get bigger, one way or the other.
It isn't "a few GB"; it is one package taking up 1.5 GB between the two architectures. That is, to me, a bit out of control considering we used to not even ship info pages to save package size. I'm not saying "OMG take it out of the repos", but we need to at least not let 10 more of these in without some serious thought as to what we intend to package and distribute.

-Dan