[arch-dev-public] backward timestamps in py/pyc/pyo
On 15/11/13 08:26, keenerd wrote:
Background: https://bugs.archlinux.org/task/37006
Packages with weirdness: http://pkgbuild.com/~kkeen/misc/backwards_list.txt
Script to try yourself: http://pkgbuild.com/~kkeen/misc/find_backwards.py
If the argument is a directory, it will scan every package in the directory. If the argument is a single package file, it will report every py file with backwards timestamps it can find.
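Here is a minimal Python sketch of the tarball half of such a scan. It is not find_backwards.py, and the function names (source_for, find_backwards, scan_package, package_paths) are hypothetical. It compares only the whole-second mtimes stored in the tar headers. It relies on tarfile detecting gzip/bzip2/xz compression by itself, and xz support needs Python 3.3 or later. Each .pyc/.pyo is paired with its .py for both the side-by-side Python 2 layout and the PEP 3147 __pycache__ layout.

```python
import os
import posixpath
import sys
import tarfile


def source_for(compiled):
    """Map a compiled member name to the .py it came from, or None.

    Handles the side-by-side layout (foo.pyc next to foo.py) and the
    PEP 3147 layout (__pycache__/foo.cpython-33.pyc)."""
    if not compiled.endswith((".pyc", ".pyo")):
        return None
    head, tail = posixpath.split(compiled)  # tar member names use "/"
    if posixpath.basename(head) == "__pycache__":
        module = tail.split(".")[0]  # "foo.cpython-33.pyc" -> "foo"
        return posixpath.join(posixpath.dirname(head), module + ".py")
    return compiled[:-1]  # "foo.pyc" -> "foo.py"


def find_backwards(mtimes):
    """Given {member name: mtime}, return sorted (source, compiled) pairs
    where the source is strictly newer than its compiled file."""
    hits = []
    for name in sorted(mtimes):
        src = source_for(name)
        if src is not None and src in mtimes and mtimes[src] > mtimes[name]:
            hits.append((src, name))
    return hits


def scan_package(path):
    # Mode "r" lets tarfile detect gzip/bzip2/xz compression by itself.
    with tarfile.open(path) as tar:
        mtimes = {m.name: m.mtime for m in tar.getmembers() if m.isfile()}
    return find_backwards(mtimes)


def package_paths(arg):
    # A directory means "every package in it"; anything else is one package.
    if os.path.isdir(arg):
        return sorted(os.path.join(arg, n) for n in os.listdir(arg)
                      if ".pkg.tar" in n and not n.endswith(".sig"))
    return [arg]


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        for pkg in package_paths(arg):
            for src, compiled in scan_package(pkg):
                print("%s: %s is newer than %s" % (pkg, src, compiled))
```

A package compressed in a format tarfile cannot read will raise tarfile.ReadError. A real scanner would need to catch that or use another decompressor.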
Felixonmars wrote the first script and found a few packages. I reworked it with some advice from Bluewind and ran it against the entire repo. The full scan of 24136 packages took 146 minutes and found about 60 packages with issues. (It could be faster; the scan was run under nice and ionice.)
Basically, Python generates pyc/pyo files from py files. If the py file is newer, Python regenerates the pyc/pyo and overwrites it. In $HOME this is nice. But /usr is read-only as far as users are concerned, so when the py is newer than its pyc/pyo, Python recompiles it on every import without being able to save the result, and the program is delayed every time it starts.
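Strictly speaking, CPython does not compare the two files' mtimes directly. It compares the source's current mtime with a value recorded in the .pyc header when that file was written, and since Python 3.3 it also compares the source's size. The file mtimes in a package are therefore a proxy for that check. The sketch below assumes the Python 3.7+ header layout from PEP 552 (magic number, flags, mtime, size) and handles only timestamp-based caches. pyc_is_stale is a hypothetical helper name.

```python
import importlib.util
import os
import struct


def pyc_is_stale(source_path):
    """Return True if CPython (3.7+) would ignore the cached bytecode for
    source_path and recompile it; only timestamp-based caches are handled."""
    cache = importlib.util.cache_from_source(source_path)
    try:
        with open(cache, "rb") as f:
            header = f.read(16)
    except FileNotFoundError:
        return True  # no cache file at all
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return True  # truncated, or written by a different Python version
    flags, mtime, size = struct.unpack("<III", header[4:16])
    if flags != 0:
        raise ValueError("hash-based .pyc (PEP 552); not handled here")
    st = os.stat(source_path)
    # The header records the source's mtime (whole seconds) and size as they
    # were at compile time; CPython treats any mismatch as stale. The .pyc
    # file's own mtime is never consulted.
    return (mtime != (int(st.st_mtime) & 0xFFFFFFFF)
            or size != (st.st_size & 0xFFFFFFFF))
```

As an example, compile a file with py_compile.compile(path, invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP). Then change its mtime the way a late sed in package() would, and the result changes from False to True. The invalidation mode is passed explicitly because py_compile defaults to hash-based caches when SOURCE_DATE_EPOCH is set.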
The timestamps end up backwards when a py file is edited after it and its pyc/pyo have already been installed into $pkgdir. A number of packages do this, most often to fix shebangs after the files were installed. That really should happen in prepare() or build(), not package().
Instead of trying to analyse the pkgbuilds, I decided to analyse the tarballs. This found a reasonably large number of packages with py/pyc/pyo out of order. On Bluewind's advice, I analysed the .MTREE files too, which doubled the number found. (Tar stores mtime with 1-second granularity, mtree with microsecond granularity.)
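Here is a minimal Python sketch of the .MTREE comparison. It makes several assumptions:
- Entries have the form `./path time=SECONDS.FRACTION ...`, with the fraction written as an ordinary decimal fraction of a second.
- The .MTREE member is gzip-compressed.
- Paths contain no escaped characters.
- Only foo.pyc/foo.pyo files with a foo.py in the same directory need pairing.

All function names are hypothetical.

```python
import gzip
from decimal import Decimal


def mtree_times(lines):
    """Map each path in an mtree listing to its time= value (as a Decimal,
    so sub-second differences are kept exactly)."""
    times = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "/set", "/unset")):
            continue  # comments and default-setting directives
        path, *keywords = line.split()
        for kw in keywords:
            if kw.startswith("time="):
                times[path] = Decimal(kw[len("time="):])
    return times


def backwards_in_mtree(times):
    """Return (source, compiled, gap_in_seconds) for every foo.pyc/foo.pyo
    that is older than the foo.py next to it."""
    found = []
    for path in sorted(times):
        if path.endswith((".pyc", ".pyo")):
            src = path[:-1]
            if src in times and times[src] > times[path]:
                found.append((src, path, times[src] - times[path]))
    return found


def scan_mtree(mtree_path):
    # Assumption: the package's .MTREE member is gzip-compressed text.
    with gzip.open(mtree_path, "rt", encoding="utf-8", errors="replace") as f:
        return backwards_in_mtree(mtree_times(f))
```

The times are kept as Decimal rather than float so that no rounding happens at the sub-second level. That precision is the reason to read the mtree in addition to the tarball.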
This turned up several general classes of weirdness. There are the packages where we did something silly in the pkgbuild and the error shows up in the tarball. These packages will be slow to load for users. Some examples are cited in the bug thread.
There are packages where we do something silly but get lucky. Gimp is one of these. The mtree times are backwards, but only by a fraction of a second, so they end up identical in the tarball. The mtree times tell the whole story. One day we will not get lucky: a build will happen where the files straddle a second boundary instead, spontaneously causing the error.
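The "lucky" case comes down to simple arithmetic. Truncating both times to whole seconds hides a sub-second inversion until the two writes happen to fall on either side of a second boundary. Here is a small Python illustration with made-up timestamps (looks_backwards is a hypothetical name):

```python
import math


def looks_backwards(py_mtime, pyc_mtime, whole_seconds):
    """True if the .py appears newer than its .pyc. With whole_seconds=True
    both times are truncated first, as in a tar header with 1 s resolution."""
    if whole_seconds:
        py_mtime, pyc_mtime = math.floor(py_mtime), math.floor(pyc_mtime)
    return py_mtime > pyc_mtime


# Lucky build: the .py is edited 0.3 s after the .pyc, within one second.
print(looks_backwards(100.8, 100.5, whole_seconds=False))  # True  (mtree)
print(looks_backwards(100.8, 100.5, whole_seconds=True))   # False (tarball)
# Unlucky build: the same 0.3 s gap straddles a second boundary.
print(looks_backwards(101.1, 100.8, whole_seconds=True))   # True  (tarball)
```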
Some are seemingly upstream's fault: we do everything properly, but the timestamps are still backwards. Pitivi is one example. Possibly its make install copies the files in the wrong order? I have not looked into these in great detail.
What should we do with these packages? Should we add a check for backwards timestamps to namcap? My script is halfway there; it just needs to be reworked as a namcap test.
Add a check to namcap. Create a rebuild list.
Footnote:
If you are wondering why I care about this issue, I do have an ulterior motive beyond my usual pedantry. Personally, I'd like to see pyc/pyo files removed from 95% of packages. Twenty years ago, CPUs and HDDs were both pretty slow, and pre-parsing scripts was a sensible way to gain speed. Note, however, that pyc/pyo files do not make your code run any faster; they are only meant to speed up the initial loading/importing stage. For long-running processes they do nothing.
CPUs are insanely fast compared to those of 20 years ago, while hard drives are still relatively pokey. Even on "weak" modern computers, recompiling takes no longer than hunting down three chunks of metadata on the drive. Across a wide variety of packages, removing pyc/pyo files either makes the program start up faster or makes no difference at all. Most obviously, removing these files can reduce the size of a library by 60%.
The only case I've seen where this does not hold is the sympy library. It is a huge library (40 MB installed) written entirely in Python. The initial import was around 30% slower without pyc/pyo files. But small programs and thin wrappers around big C libraries seem to be unaffected and sometimes benefit from the streamlining.
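You can look at the CPU side of this trade-off on a given machine by timing compile() against marshal.loads(). marshal.loads() is the deserialization step that loading a .pyc body involves. This Python 3 sketch (compare_load_costs is a hypothetical name) deliberately leaves out disk access, which is the other half of the argument above. Its numbers are therefore only a partial picture and will vary from machine to machine.

```python
import marshal
import sys
import time
import tokenize


def compare_load_costs(source, filename="<bench>", repeat=100):
    """Return average seconds to (compile source, unmarshal its bytecode).

    Only CPU time is measured: the disk lookups that a .pyc itself costs
    are not included."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    start = time.perf_counter()
    for _ in range(repeat):
        code = compile(source, filename, "exec")
    compile_s = (time.perf_counter() - start) / repeat

    blob = marshal.dumps(code)  # the same serialization a .pyc body uses
    start = time.perf_counter()
    for _ in range(repeat):
        marshal.loads(blob)
    load_s = (time.perf_counter() - start) / repeat
    return compile_s, load_s


if __name__ == "__main__":
    path = sys.argv[1]
    with tokenize.open(path) as f:  # honours any PEP 263 coding cookie
        c, l = compare_load_costs(f.read(), path)
    print("compile: %.3f ms, unmarshal: %.3f ms" % (c * 1e3, l * 1e3))
```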
Removing them from packages results in people generating these files whenever they run programs as root. That leaves untracked files in the filesystem, which is bad on its own, and after an update those stale files cause the slowdown noted here. We know this is an issue from the number of bug reports we get about conflicts whenever some .pyc/.pyo files are added to a package.
Also, doesn't Python check for the .pyc and .pyo files anyway? If so, the disk-read overhead is (at least partially) there regardless.
I *strongly* advocate for any file ending in .py having associated .pyc and .pyo files. No files in /usr/bin should ever end in .py. Either the suffix needs to be stripped, or the files need to be packaged in /usr/lib/$pkgname with a symlink added to /usr/bin. There are currently 32 packages in violation of this.
Looking at other distributions, Fedora includes them in the package, while Debian, openSUSE and Gentoo all generate them in post_install() and remove them in pre_remove(). So "everyone" generates these. I'd prefer to generate and track them.
Allan
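For reference, the generate-on-install / remove-on-uninstall approach can be sketched with the standard library. This is a Python 3 illustration with hypothetical function names. It is not the tooling that Debian, openSUSE or Gentoo actually use. It also writes PEP 3147 __pycache__ files rather than Python 2 .pyc/.pyo files next to the source.

```python
import importlib.util
import os
import py_compile


def generate_bytecode(sources):
    """Byte-compile each .py file (the post-install step); return the cache
    file paths that were written."""
    # doraise=True turns a syntax error into an exception instead of a
    # message on stderr, so a broken file cannot be silently skipped.
    return [py_compile.compile(src, doraise=True) for src in sources]


def remove_bytecode(sources):
    """Delete the cache files for each .py file (the pre-remove step)."""
    for src in sources:
        cache = importlib.util.cache_from_source(src)
        try:
            os.remove(cache)
        except FileNotFoundError:
            pass  # never generated, or already removed
        try:
            os.rmdir(os.path.dirname(cache))  # drop __pycache__ if now empty
        except OSError:
            pass  # still holds other files, or already gone
```

Files written this way belong to no package unless the package manager is told about them, which is the tracking problem raised above.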
On 11/14/13, Allan McRae <allan@archlinux.org> wrote:
Add a check to namcap. Create a rebuild list.
As a rough draft of a namcap check, how does this look?
https://github.com/keenerd/namcap
It is missing a test case; I am still figuring out how to do that properly.
-Kyle
http://kmkeen.com
participants (2)
- Allan McRae
- keenerd