[arch-dev-public] Large packages in repositories
Hey guys,

A package went in so big today that it made reporead blow up on my local database due to the installed size being > 2GB:

http://www.archlinux.org/packages/community/i686/sage-mathematics/
http://www.archlinux.org/packages/community/x86_64/sage-mathematics/

  File "/usr/lib/python2.6/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
django.db.utils.DatabaseError: integer out of range

I'm wondering if we need to be more careful when it comes to these big packages entering our repositories. This one is especially suspect as of its 71096 files (and 71094 in the other architecture), a ton of them are things like *.py, *.pyc, *.html, or *.png files. This is ripe for splitting into a -data package (or not including some of this junk, if it is that, at all).

mysql> select count(*), substring_index(path, '.', -1) from package_files where pkg_id = 49860 and path like '%.%' group by substring_index(path, '.', -1) order by count(*) desc limit 25;
+----------+--------------------------------+
| count(*) | substring_index(path, '.', -1) |
+----------+--------------------------------+
|    29787 | png                            |
|     9410 | py                             |
|     6602 | pyc                            |
|     1804 | h                              |
|     1722 | html                           |
|     1659 | txt                            |
|     1168 | pyo                            |
|      988 | doctree                        |
|      988 | rst                            |
|      886 | hpp                            |

tl;dr: I think we need some standards with these huge packages, and people need to be a lot more cognizant as to how big they are. We have lost more than one mirror due to complaints over needed space and stuff like this doesn't help.

If any TU's would like to forward this along and solicit their thoughts I'd appreciate it.

-Dan
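For anyone who wants to run the same per-extension breakdown against a plain file list rather than the package_files table, here is a minimal Python sketch. It mirrors the query's logic (keep only paths containing a '.', group by the text after the last '.', sort by count). The function name and the sample paths are made up for illustration and are not taken from the real package.

```python
from collections import Counter


def count_extensions(paths, limit=25):
    """Count paths by the text after their last '.', most common first.

    Paths without any '.' are skipped, like the `path like '%.%'` filter.
    """
    counts = Counter(
        path.rsplit(".", 1)[1]  # same idea as substring_index(path, '.', -1)
        for path in paths
        if "." in path
    )
    return counts.most_common(limit)


# Hypothetical sample paths, for illustration only.
sample_paths = [
    "opt/sage/doc/a.png",
    "opt/sage/doc/b.png",
    "opt/sage/lib/mod.py",
    "opt/sage/lib/mod.pyc",
    "opt/sage/bin/sage",
]

for extension, count in count_extensions(sample_paths):
    print(count, extension)
```

As with the SQL version, a '.' in a directory name counts as the last dot if the file name itself has none, so the result is only a rough guide.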
On 17.08.2010 16:12, Dan McGee wrote:
Hey guys,
A package went in so big today that it made reporead blow up on my local database due to the installed size being > 2GB:

http://www.archlinux.org/packages/community/i686/sage-mathematics/
http://www.archlinux.org/packages/community/x86_64/sage-mathematics/

  File "/usr/lib/python2.6/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
django.db.utils.DatabaseError: integer out of range
You should be able to fix that, right?
I'm wondering if we need to be more careful when it comes to these big packages entering our repositories. This one is especially suspect as of its 71096 files (and 71094 in the other architecture), a ton of them are things like *.py, *.pyc, *.html, or *.png files. This is ripe for splitting into a -data package (or not including some of this junk, if it is that, at all).
We partly discussed this on aur-general, and making sage smaller is a bit of a long-term task, if even possible. Anyway, dbscripts lack support for having split packages with one package being architecture-independent, so even splitting these data files away won't be easy.
tl;dr: I think we need some standards with these huge packages, and people need to be a lot more cognizant as to how big they are. We have lost more than one mirror due to complaints over needed space and stuff like this doesn't help.
If a mirror cannot cope with a few GB, then it should be dropped anyway. Our repos will get bigger, one way or the other.
On Tue, Aug 17, 2010 at 9:28 AM, Thomas Bächler <thomas@archlinux.org> wrote:
On 17.08.2010 16:12, Dan McGee wrote:
Hey guys,
A package went in so big today that it made reporead blow up on my local database due to the installed size being > 2GB:

http://www.archlinux.org/packages/community/i686/sage-mathematics/
http://www.archlinux.org/packages/community/x86_64/sage-mathematics/

  File "/usr/lib/python2.6/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
django.db.utils.DatabaseError: integer out of range
You should be able to fix that, right?
Yes, and it doesn't blow up on MySQL so not a huge rush. I was just showing the error for the curious.
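For the curious, a minimal sketch of why PostgreSQL rejects the value: its `integer` type is a 4-byte signed integer, so anything above 2147483647 is out of range. The sketch assumes the installed size is stored in such a column, which the thread does not state explicitly, and the helper name is made up for illustration.

```python
INT32_MIN = -2**31      # -2147483648, lower bound of a 4-byte signed integer
INT32_MAX = 2**31 - 1   # 2147483647, upper bound


def fits_int32(value):
    """Return True if value can be stored in a 4-byte signed integer column."""
    return INT32_MIN <= value <= INT32_MAX


two_gib = 2 * 1024**3   # 2147483648 bytes, one more than INT32_MAX

print(fits_int32(891768832))  # nexuiz-data installed size from this thread
print(fits_int32(two_gib))    # any installed size of 2 GiB or more
```

One possible fix on the Django side would be a 64-bit column (for example `BigIntegerField`), though that is only a suggestion and not necessarily what archweb ends up doing.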
I'm wondering if we need to be more careful when it comes to these big packages entering our repositories. This one is especially suspect as of its 71096 files (and 71094 in the other architecture), a ton of them are things like *.py, *.pyc, *.html, or *.png files. This is ripe for splitting into a -data package (or not including some of this junk, if it is that, at all).
We partly discussed this on aur-general, and making sage smaller is a bit of a long-term task, if even possible. Anyway, dbscripts lack support for having split packages with one package being architecture-independent, so even splitting these data files away won't be easy.
That's news to me:

archweb=# select p.id, pkgname, a.name, r.name, installed_size, compressed_size from packages p left join arches a on a.id = p.arch_id left join repos r on r.id = p.repo_id where pkgname like 'nexuiz%' or pkgname like 'vdrift%' or pkgname like 'sauerbraten%' or pkgname like 'openarena%' or pkgname like 'flightgear%' order by compressed_size desc;
  id   |     pkgname      |  name  |   name    | installed_size | compressed_size
-------+------------------+--------+-----------+----------------+-----------------
  9562 | nexuiz-data      | any    | Community |      891768832 |       882807981
 16811 | vdrift-data      | any    | Community |      593498112 |       523473616
 15925 | sauerbraten-data | any    | Community |      538304512 |       443680072
 16882 | openarena-data   | any    | Community |      345866240 |       333849916
 13406 | flightgear-data  | any    | Community |      572440576 |       317831046
  5490 | nexuiz           | x86_64 | Community |        6164480 |         2757201
  4879 | flightgear       | x86_64 | Community |       10555392 |         2579360
   711 | flightgear       | i686   | Community |       10272768 |         2548000
  1260 | nexuiz           | i686   | Community |        5562368 |         2314704
  6033 | sauerbraten      | x86_64 | Community |        3518464 |         1142808
  1822 | sauerbraten      | i686   | Community |        3420160 |         1017648
 16830 | vdrift           | x86_64 | Community |        2994176 |          820512
 16812 | vdrift           | i686   | Community |        2899968 |          787368
 16900 | openarena        | x86_64 | Community |        1937408 |          601684
 16886 | openarena        | i686   | Community |        1593344 |          493280
  4880 | flightgear-atlas | x86_64 | Community |         901120 |          354938
   712 | flightgear-atlas | i686   | Community |         811008 |          329020
(17 rows)
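To put a number on what a -data split buys a mirror, here is a rough Python sketch using the compressed sizes of the nexuiz rows in the output above. It assumes, purely for illustration, that an unsplit package would be about as large as the data package plus the per-architecture binary package; the function name is made up.

```python
# Compressed sizes in bytes, copied from the nexuiz rows of the query output.
NEXUIZ_DATA_ANY = 882807981
NEXUIZ_I686 = 2314704
NEXUIZ_X86_64 = 2757201


def mirror_footprint(shared, per_arch, split):
    """Bytes a mirror stores for one package across all architectures.

    shared   -- size of the architecture-independent data
    per_arch -- sizes of the architecture-specific parts, one per arch
    split    -- True if the shared data ships once as an "any" package
    """
    if split:
        return shared + sum(per_arch)
    # Assumption: without a split, every arch carries its own copy of the data.
    return sum(shared + size for size in per_arch)


arches = [NEXUIZ_I686, NEXUIZ_X86_64]
with_split = mirror_footprint(NEXUIZ_DATA_ANY, arches, split=True)
without_split = mirror_footprint(NEXUIZ_DATA_ANY, arches, split=False)

print(with_split, without_split, without_split - with_split)
```

With two architectures the saving is exactly one copy of the shared data, so the benefit grows with how much of a package is really architecture-independent.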
tl;dr: I think we need some standards with these huge packages, and people need to be a lot more cognizant as to how big they are. We have lost more than one mirror due to complaints over needed space and stuff like this doesn't help.
If a mirror cannot cope with a few GB, then it should be dropped anyway. Our repos will get bigger, one way or the other.
It isn't "a few GB"; it is one package taking up 1.5 GB between the two architectures. That is, to me, a bit out of control considering we used to not even ship info pages to save package size. I'm not "OMG take it out of the repos", but we need to at least not let 10 more of these in without some serious thought as to what we intend to package and distribute.

-Dan
On 17.08.2010 16:39, Dan McGee wrote:
That's news to me:
archweb=# select p.id, pkgname, a.name, r.name, installed_size, compressed_size from packages p left join arches a on a.id = p.arch_id left join repos r on r.id = p.repo_id where pkgname like 'nexuiz%' or pkgname like 'vdrift%' or pkgname like 'sauerbraten%' or pkgname like 'openarena%' or pkgname like 'flightgear%' order by compressed_size desc;
  id   |     pkgname      |  name  |   name    | installed_size | compressed_size
-------+------------------+--------+-----------+----------------+-----------------
  9562 | nexuiz-data      | any    | Community |      891768832 |       882807981
 16811 | vdrift-data      | any    | Community |      593498112 |       523473616
 15925 | sauerbraten-data | any    | Community |      538304512 |       443680072
 16882 | openarena-data   | any    | Community |      345866240 |       333849916
 13406 | flightgear-data  | any    | Community |      572440576 |       317831046
  5490 | nexuiz           | x86_64 | Community |        6164480 |         2757201
  4879 | flightgear       | x86_64 | Community |       10555392 |         2579360
   711 | flightgear       | i686   | Community |       10272768 |         2548000
  1260 | nexuiz           | i686   | Community |        5562368 |         2314704
  6033 | sauerbraten      | x86_64 | Community |        3518464 |         1142808
  1822 | sauerbraten      | i686   | Community |        3420160 |         1017648
 16830 | vdrift           | x86_64 | Community |        2994176 |          820512
 16812 | vdrift           | i686   | Community |        2899968 |          787368
 16900 | openarena        | x86_64 | Community |        1937408 |          601684
 16886 | openarena        | i686   | Community |        1593344 |          493280
  4880 | flightgear-atlas | x86_64 | Community |         901120 |          354938
   712 | flightgear-atlas | i686   | Community |         811008 |          329020
(17 rows)
These have all been built using separate PKGBUILDs, not split ones. You won't be able to do this in sage as easily. However, I do agree that we should split that - somehow.
It isn't "a few GB"; it is one package taking up 1.5 GB between the two architectures. That is, to me, a bit out of control considering we used to not even ship info pages to save package size.
It's a big package. I was the first one to upload nexuiz (i686 only, 800MB maybe?) to community back in 2005. I considered that a "huge" package, and nobody ever noticed or complained.
I'm not "OMG take it out of the repos", but we need to at least not let 10 more of these in without some serious thought as to what we intend to package and distribute.
I can try to work with td123 to try and make the sage package saner, but it's nothing that will happen quickly.
On Tue, 17 Aug 2010 16:28:17 +0200, Thomas Bächler <thomas@archlinux.org> wrote:
We partly discussed this on aur-general, and making sage smaller is a bit of a long-term task, if even possible. Anyway, dbscripts lack support for having split packages with one package being architecture-independent, so even splitting these data files away won't be easy.
Just a side note: I fear there won't be support for split packages with different arches in the near future due to some design problems, mainly with our svn repo layout. (Sorry for the lack of details, but I'll send a separate mail about it some day.)

So for now, use a separate PKGBUILD if an "any" package is really needed.

Greetings,

Pierre

--
Pierre Schmitz, https://users.archlinux.de/~pierre
On 17.08.2010 17:12, Pierre Schmitz wrote:
On Tue, 17 Aug 2010 16:28:17 +0200, Thomas Bächler <thomas@archlinux.org> wrote:
We partly discussed this on aur-general, and making sage smaller is a bit of a long-term task, if even possible. Anyway, dbscripts lack support for having split packages with one package being architecture-independent, so even splitting these data files away won't be easy.
Just a side note: I fear there won't be support for split packages with different arches in the near future due to some design problems, mainly with our svn repo layout. (Sorry for the lack of details, but I'll send a separate mail about it some day.)
So for now, use a separate PKGBUILD if an "any" package is really needed.
If you do it all manually, you could just archrelease your PKGBUILD to repo-any and repo-{i686,x86_64}. db-move and db-remove might fail, but at least db-update should work fine. For some packages, such workarounds might be acceptable for the time being.
participants (3)
- Dan McGee
- Pierre Schmitz
- Thomas Bächler