[arch-dev-public] backward timestamps in py/pyc/pyo

keenerd keenerd at gmail.com
Thu Nov 14 17:26:46 EST 2013


Background: https://bugs.archlinux.org/task/37006

Packages with weirdness: http://pkgbuild.com/~kkeen/misc/backwards_list.txt

Script to try yourself: http://pkgbuild.com/~kkeen/misc/find_backwards.py
If the argument is a directory it will scan every package in the directory.
If the argument is a single file it will report every py file with
backwards timestamps it can find.

Felixonmars wrote the first script and found a few packages.  I reworked
it with some advice from Bluewind and ran it against the entire repo.
The full scan of 24136 packages took 146 minutes and found some 60
packages with issues.  (It could be faster; that run was wrapped in nice
and ionice.)

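Basically, Python generates pyc/pyo files from py files.  Each pyc/pyo
records the mtime of the py it was compiled from; if the py has changed
since (for instance, it is newer than its pyc/pyo), Python recompiles it
and tries to overwrite the cached file.  In $HOME this is nice.  But
/usr is read-only as far as users are concerned, so when a py is newer
than its pyc/pyo the cache can never be refreshed, and every new process
that imports the module pays the recompile cost.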
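For anyone who wants to poke at the check directly, here is a minimal sketch of the staleness test as modern CPython (3.7+) does it, reading the 16-byte header described in PEP 552.  The 2013-era Python 2 header held only the magic number and the source mtime, so treat this as an illustration of the idea rather than the exact logic our packages hit.  Note that Python 3 also compares the source size.  `pyc_is_stale` and `demo` are hypothetical helper names, not part of any existing tool.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def pyc_is_stale(py_path, pyc_path):
    """Return True if CPython would reject pyc_path and recompile py_path.

    Assumes the 16-byte PEP 552 header of CPython 3.7+:
    magic | flags | source mtime | source size (little-endian uint32 fields).
    """
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return True  # truncated, or written by a different interpreter version
    flags, mtime, size = struct.unpack("<III", header[4:16])
    if flags & 0b1:
        raise ValueError("hash-based pyc: no source timestamp is recorded")
    st = os.stat(py_path)
    # CPython compares whole seconds (mod 2**32) plus the source size.
    return (mtime != (int(st.st_mtime) & 0xFFFFFFFF)
            or size != (st.st_size & 0xFFFFFFFF))


def demo():
    """Byte-compile a script, then 'fix' its shebang afterwards, package()-style."""
    with tempfile.TemporaryDirectory() as d:
        py = os.path.join(d, "tool.py")
        with open(py, "w") as f:
            f.write("#!/usr/bin/env python\nprint('hi')\n")
        pyc = py_compile.compile(
            py, doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
        before = pyc_is_stale(py, pyc)
        with open(py, "w") as f:  # the late shebang edit
            f.write("#!/usr/bin/python2\nprint('hi')\n")
        after = pyc_is_stale(py, pyc)
    return before, after
```

`demo()` should report the cache as fresh right after compiling and stale after the edit.  Because the shebang fix changes the file length, the size field catches it even if the edit lands in the same clock tick as the compile.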

The timestamps end up backwards when the py file is edited after it
has been installed (and byte-compiled) into $pkgdir.  A number of
packages do this, most often to fix shebangs after the files were
installed.  That really should happen in prepare() or build(), not in
package().

Instead of trying to analyse the PKGBUILDs, I decided to analyse the
tarballs.  This found a reasonably large number of packages with
py/pyc/pyo out of order.  On Bluewind's advice, the .MTREE files were
analysed too, which doubled the number found.  (Tar stores mtime at
1-second granularity, mtree at microseconds.)
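The tarball-plus-mtree comparison can be sketched roughly as follows.  This is an independent illustration, not the contents of find_backwards.py; the helper names are hypothetical.  It assumes the side-by-side layout of the time (foo.py next to foo.pyc/foo.pyo), a gzip-compressed `.MTREE` member, and mtree lines of the form `./path key=value ...`; escaped characters in mtree paths are not decoded.

```python
import gzip
import tarfile
from decimal import Decimal

PY_SUFFIXES = (".py", ".pyc", ".pyo")


def tar_mtimes(pkg_path):
    """{path: mtime} for py/pyc/pyo members, as stored in the tar headers."""
    with tarfile.open(pkg_path) as tar:  # mode "r" detects gz/bz2/xz itself
        return {m.name: m.mtime for m in tar.getmembers()
                if m.isfile() and m.name.endswith(PY_SUFFIXES)}


def parse_mtree(text):
    """{path: mtime} for py/pyc/pyo entries in a decompressed .MTREE listing.

    Skips '#' comments and '/set'-style directives; Decimal keeps the
    sub-second digits exact.
    """
    out = {}
    for line in text.splitlines():
        if not line or line.startswith(("#", "/")):
            continue
        path, *keywords = line.split()
        if not path.endswith(PY_SUFFIXES):
            continue
        if path.startswith("./"):
            path = path[2:]  # match tar member names
        for kw in keywords:
            if kw.startswith("time="):
                out[path] = Decimal(kw[5:])
    return out


def mtree_mtimes(pkg_path):
    """Read the gzip-compressed .MTREE member of a package and parse it."""
    with tarfile.open(pkg_path) as tar:
        raw = tar.extractfile(".MTREE").read()
    return parse_mtree(gzip.decompress(raw).decode("utf-8", "replace"))


def backwards_pairs(mtimes):
    """Yield (py, compiled, py_time, compiled_time) where the py is newer."""
    for path, t in sorted(mtimes.items()):
        if path.endswith((".pyc", ".pyo")):
            src = path[:-1]  # foo.pyc -> foo.py
            if src in mtimes and mtimes[src] > t:
                yield src, path, mtimes[src], t
```

Running `backwards_pairs(tar_mtimes(pkg))` shows what users hit today; running it on `mtree_mtimes(pkg)` additionally surfaces the sub-second cases that the 1-second tar headers hide.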

This turned up several general classes of weirdness.  There are the
packages where we did something silly in the PKGBUILD and the error
appears in the tarball.  These packages will be slow to load for
users.  Some examples are cited in the bug thread.

There are packages where we do something silly but we get lucky.  Gimp
is one of these.  The mtree times are backwards, but the difference is
sub-second, so the times end up identical in the tarball.  The mtree
times tell the whole story.  One day we will not get lucky: a build
will happen where the files straddle a second boundary, and the error
will appear spontaneously.
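The difference between the Gimp-style near miss and a real failure is just truncation.  As a sketch, assuming the tarball keeps whole seconds and drops the fraction: mtree times of 100.9 (py) and 100.2 (pyc) both become 100 in the tarball, while 101.1 and 100.9 straddle a second and come out as 101 and 100.  `classify` is a hypothetical helper.

```python
import math


def classify(py_time, compiled_time):
    """Sort one py/pyc(o) pair into the cases seen in the scan.

    Takes precise (.MTREE) timestamps and assumes the tarball keeps only
    whole seconds, truncating the fraction.
    """
    if py_time <= compiled_time:
        return "ok"
    if math.floor(py_time) > math.floor(compiled_time):
        return "backwards in tarball"  # users recompile on every import
    return "latent"  # same second once truncated: lucky, for now
```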

Some are seemingly upstream's fault.  We do everything properly, but
the timestamps are still backwards.  For example, pitivi.  Possibly
the make install copies the files in the wrong order?  I have not
looked into these in great detail.

What should we do with these packages?  Should we add a check to
Namcap for backwards timestamps?  My script is halfway there; it just
needs to be reworked as a namcap test.



Footnote:

If you are wondering why I care about this issue, I do have an
ulterior motive beyond my usual pedantry.  Personally, I'd like to see
pyc/pyo files removed from 95% of packages.  20 years ago CPUs and
HDDs were both pretty slow, and pre-compiling scripts was a sensible
way to gain speed.  Note, however, that pyc/pyo do not make your code
run any faster.  They are meant to improve speed during the initial
loading/importing stage.  For long-running processes they make
essentially no difference.
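What a pyc saves is the source-to-bytecode compile step; after that, the same code object runs either way.  A minimal sketch that times the two load paths in memory (in CPython a .pyc is a header followed by `marshal.dumps(code)`).  `compare_load_paths` is a hypothetical helper, and it deliberately ignores the disk reads the argument below is about, so absolute numbers will vary by machine and Python version.

```python
import marshal
import time


def compare_load_paths(source, repeat=200):
    """Seconds spent compiling `source` vs unmarshalling cached bytecode."""
    code = compile(source, "<sketch>", "exec")
    blob = marshal.dumps(code)  # roughly the body of a .pyc after its header
    t0 = time.perf_counter()
    for _ in range(repeat):
        compile(source, "<sketch>", "exec")  # what a missing/stale pyc costs
    t1 = time.perf_counter()
    for _ in range(repeat):
        marshal.loads(blob)  # what a valid pyc costs
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1
```

Either way `exec` then runs an identical code object, which is why the cache only affects startup.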

CPUs are insanely fast compared to 20 years ago.  Hard drives are
still relatively pokey.  Even on "weak" modern computers, recompiling
takes no longer than hunting down three chunks of metadata on the
drive.  Across a wide variety of packages, removing pyc/pyo either
makes the program start up faster or makes no difference at all.  Most
obviously, removing these files can reduce the size of a library by
60%.
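To check the size claim on a given library, a small sketch that totals the bytecode share of an installed directory tree could look like this.  `bytecode_share` is a hypothetical helper; it skips symlinks and counts only regular files.

```python
import os


def bytecode_share(root):
    """Return (bytecode_bytes, total_bytes) for regular files under root."""
    bytecode = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # don't count link targets or break on dangling links
            size = os.path.getsize(path)
            total += size
            if name.endswith((".pyc", ".pyo")):
                bytecode += size
    return bytecode, total
```

Dividing the first number by the second, for a package's site-packages directory, gives the fraction that removing pyc/pyo would save.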

The only case I've seen where this does not hold is the sympy library.
It is a huge library (40MB installed) written entirely in Python.  The
initial import was around 30% slower without pyc/pyo.  But small
programs and thin wrappers around big C libraries seem to be unaffected,
and sometimes benefit from the streamlining.
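One way to repeat this kind of comparison is to time a fresh interpreter importing a module with and without cached bytecode.  A sketch under these assumptions: Python 3.8+ (for `PYTHONPYCACHEPREFIX`), and that pointing the cache prefix at an empty directory while passing `-B` forces every module to be compiled from source without writing anything back.  `time_import` is a hypothetical helper, and a warm-up run is advisable before trusting the cached case.

```python
import os
import statistics
import subprocess
import sys
import tempfile
import time


def time_import(module, runs=5, use_bytecode=True):
    """Median wall-clock seconds for a fresh interpreter to `import module`."""
    env = dict(os.environ)
    args = [sys.executable]
    with tempfile.TemporaryDirectory() as empty_prefix:
        if not use_bytecode:
            # -B: never write pyc; empty cache prefix: nothing cached to read
            args.append("-B")
            env["PYTHONPYCACHEPREFIX"] = empty_prefix
        args += ["-c", f"import {module}"]
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(args, env=env, check=True)
            samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Comparing `time_import("sympy")` against `time_import("sympy", use_bytecode=False)` is the kind of measurement described above; results will differ by machine, disk, and Python version.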

-Kyle
http://kmkeen.com

