[arch-dev-public] backward timestamps in py/pyc/pyo
allan at archlinux.org
Thu Nov 14 18:32:24 EST 2013
On 15/11/13 08:26, keenerd wrote:
> Background: https://bugs.archlinux.org/task/37006
> Packages with weirdness: http://pkgbuild.com/~kkeen/misc/backwards_list.txt
> Script to try yourself: http://pkgbuild.com/~kkeen/misc/find_backwards.py
> If the argument is a directory it will scan every package in the directory.
> If the argument is a single file it will report every py file with
> backwards timestamps it can find.
> Felixonmars did the first script and found a few packages. I reworked
> it with some advice from Bluewind and ran it against the entire repo.
> The full scan of 24136 packages took 146 minutes and found some 60
> packages with issues. (Could be faster, that was wrapped in nice and
> Basically python will generate pyc/pyo files from py files. If the py
> file is newer it will re-generate and overwrite. In $home this is
> nice. But /usr is read-only as far as users are concerned, so when
> the py is newer than the pyc/pyo, python recompiles it on every run
> and cannot save the result, which causes delays.
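For reference, here is a sketch of the staleness test. Current CPython 3 stores the source file's mtime and size in the pyc header and recompiles whenever either no longer matches, which covers the "py is newer" case above. The function name pyc_is_stale is hypothetical. The header layout assumed is the 16-byte one from PEP 552 (Python 3.7+); the Python 2 and 3.3-3.6 headers are shorter.

```python
import importlib.util
import os
import struct


def pyc_is_stale(source_path, pyc_path):
    """Return True if python would recompile source_path instead of using pyc_path.

    Assumes the 16-byte Python 3.7+ header (PEP 552): magic, flags,
    source mtime, source size, each 4 bytes little-endian.
    """
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # A different magic number means another interpreter version wrote it.
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return True
    flags, mtime, size = struct.unpack("<3I", header[4:16])
    if flags != 0:
        raise ValueError("hash-based pyc: timestamps are not checked")
    st = os.stat(source_path)
    # CPython compares with !=, so any change (not only a newer py) forces recompilation.
    return (mtime != (int(st.st_mtime) & 0xFFFFFFFF)
            or size != (st.st_size & 0xFFFFFFFF))
```

On a read-only /usr the recompiled code object is simply discarded, so the check fails the same way on every run.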
> It is possible for the times to get backwards by editing the py file
> after dropping it in $pkgdir. There are a number of packages that do
> this, most often to fix shebangs after the files were installed. That
> really should happen in prepare() or build() and not package().
> Instead of trying to analyse the pkgbuilds I decided to analyse the
> tarballs. This found a reasonably large number of packages with
> py/pyc/pyo out of order. At the advice of Bluewind, the .MTREE files
> were analyzed too. This doubled the number found. (Tar stores mtime
> at 1-second granularity, mtree at microseconds.)
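Here is a sketch of the .MTREE side of the scan. It assumes entries of the form `./path time=SECONDS.FRACTION ...` after decompression. Times are kept as Decimal so that the sub-second part survives the comparison. The helper names are hypothetical. Escaped characters in paths (such as \040 for a space) are not decoded, and source_for() only handles the two common layouts: foo.pyc beside foo.py, and __pycache__/foo.<tag>.pyc.

```python
import gzip
from decimal import Decimal


def mtree_times(text):
    """Map each path in an mtree listing to its mtime, keeping sub-second precision."""
    times = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and /set or /unset directives.
        if not line or line.startswith(("#", "/")):
            continue
        path, *keywords = line.split()
        for kw in keywords:
            if kw.startswith("time="):
                times[path] = Decimal(kw[len("time="):])
    return times


def read_mtree(mtree_file):
    """Read a gzip-compressed .MTREE file as found inside Arch packages."""
    with gzip.open(mtree_file, "rt") as f:
        return mtree_times(f.read())


def source_for(cached):
    """Map a .pyc/.pyo path back to the .py it was compiled from, or None."""
    if not cached.endswith((".pyc", ".pyo")):
        return None
    head, _, name = cached.rpartition("/")
    if head.endswith("__pycache__"):
        # Python 3 layout: pkg/__pycache__/mod.cpython-33.pyc -> pkg/mod.py
        return head[: -len("__pycache__")] + name.split(".", 1)[0] + ".py"
    # Python 2 layout: pkg/mod.pyc -> pkg/mod.py
    return cached[:-1]


def backwards(times):
    """Return sorted (source, compiled) pairs where the .py is newer than its compiled file."""
    pairs = []
    for cached, t in times.items():
        src = source_for(cached)
        if src in times and times[src] > t:
            pairs.append((src, cached))
    return sorted(pairs)
```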
> This found several general classes of weirdness. There are the
> packages where we did something silly in the pkgbuild and the error
> appears in the tarball. These packages will load slowly for users. Some
> examples are cited in the bug thread.
> There are packages where we do something silly but we get lucky. Gimp
> is one of these. The mtree times are backwards however they are
> sub-second so they end up identical in the tarball. The mtree times
> tell the whole story. One day we will not get lucky and a build will
> happen where the files straddle a second instead, spontaneously
> causing the error.
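The Gimp case comes down to whole-second truncation. Tar keeps only the integer part of each mtime, so a backwards pair shows up in the tarball only when the two times fall in different seconds. Here is a minimal sketch using a hypothetical classify() helper:

```python
from decimal import Decimal


def classify(py_time, pyc_time):
    """Classify a py/pyc pair by what .MTREE (sub-second) and tar (whole-second) times show."""
    if py_time <= pyc_time:
        return "ok"
    # Tar headers keep only whole seconds.
    if int(py_time) == int(pyc_time):
        return "latent"   # backwards in .MTREE only, hidden in the tarball
    return "visible"      # backwards in the tarball too


print(classify(Decimal("100.7"), Decimal("100.3")))  # same second
print(classify(Decimal("101.1"), Decimal("100.9")))  # straddles a second
```

With the pyc at 100.3 and the py at 100.7, both truncate to 100 and the tarball looks fine. With the pyc at 100.9 and the py at 101.1, the tarball shows 100 vs 101.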
> Some are seemingly upstream's fault. We do everything properly, but
> the timestamps are still backwards. For example, pitivi. Possibly
> the make install copies the files in the wrong order? I have not
> looked into these in great detail.
> What should we do with these packages? Should we add a check to
> namcap for backwards timestamps? My script is halfway there; it just
> needs to be reworked as a namcap test.
Add a check to namcap. Create a rebuild list.
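As a starting point for the namcap check, here is a sketch of the tarball half of the scan that uses only the standard tarfile module. backwards_in_tarball() and source_for() are hypothetical names. This is not namcap's rule API; wiring the check in as a namcap test is left open. The "r:*" mode handles gzip-, bzip2- and xz-compressed packages. Because tar headers only hold whole seconds, this misses the pairs that show up only in .MTREE.

```python
import sys
import tarfile


def source_for(cached):
    """Map a .pyc/.pyo path back to the .py it was compiled from, or None."""
    if not cached.endswith((".pyc", ".pyo")):
        return None
    head, _, name = cached.rpartition("/")
    if head.endswith("__pycache__"):
        # Python 3 layout: pkg/__pycache__/mod.cpython-33.pyc -> pkg/mod.py
        return head[: -len("__pycache__")] + name.split(".", 1)[0] + ".py"
    # Python 2 layout: pkg/mod.pyc -> pkg/mod.py
    return cached[:-1]


def backwards_in_tarball(pkg_path):
    """Return sorted (source, compiled) pairs whose .py has a later tar mtime."""
    with tarfile.open(pkg_path, "r:*") as tar:
        times = {m.name: m.mtime for m in tar.getmembers() if m.isfile()}
    pairs = []
    for cached, t in times.items():
        src = source_for(cached)
        if src in times and times[src] > t:
            pairs.append((src, cached))
    return sorted(pairs)


if __name__ == "__main__":
    for pkg in sys.argv[1:]:
        for src, cached in backwards_in_tarball(pkg):
            print(f"{pkg}: {src} is newer than {cached}")
```

The printed pairs are the raw material for the rebuild list.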
> If you are wondering why I care about this issue, I do have an
> ulterior motive beyond my usual pedantry. Personally I'd like to see
> pyc/pyo files removed in 95% of packages. 20 years ago CPUs and HDDs
> were both pretty slow and pre-parsing scripts was a sensible way to
> gain speed. Note however that pyc/pyo do not make your code run any
> faster. They are meant to improve speed during the initial
> loading/importing stage. For long-running processes they do nothing.
> CPUs are insanely fast compared to 20 years ago. Hard drives are
> still relatively pokey. Even on "weak" modern computers, recompiling
> takes no longer than hunting down three chunks of metadata on the
> drive. Across a wide variety of packages, removing pyc/pyo either
> makes the program start up faster or makes no difference at all. Most
> obviously, removing these files can reduce the size of a library by
> The only case I've seen where this does not hold is the sympy library.
> It is a huge library (40MB installed) written entirely in python. The
> initial import was around 30% slower without pyc/pyo. But small
> programs and thin wrappers to big C libraries seem to be unaffected
> and sometimes benefit from the streamlining.
Not shipping them results in python generating these files whenever a
script is run as root.
This leaves untracked files in the filesystem, which is bad on its own,
but after an update causes the slowdown noted here. We know this is an
issue from the number of bug reports we get about conflicts whenever
some .pyc/.pyo files are added to a package.
Also, doesn't python check for the .pyc and .pyo files anyway? So the
disk read overhead is (at least partially) there.
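Python does check. On Python 3.2+ the import system derives the cache location from the source path with importlib.util.cache_from_source() and tries to read that file before compiling. Python 2 instead looks for foo.pyc or foo.pyo next to foo.py. The small illustration below uses cached_path(), a hypothetical wrapper; its optimization argument requires Python 3.5+, where the -O cache files replaced .pyo.

```python
import importlib.util


def cached_path(source, optimization=None):
    """Return where Python 3 looks for bytecode compiled from `source`.

    optimization=1 gives the path used under -O (the 3.5+ replacement for .pyo).
    """
    return importlib.util.cache_from_source(source, optimization=optimization)


print(cached_path("/usr/lib/python3/site-packages/foo.py"))
print(cached_path("/usr/lib/python3/site-packages/foo.py", optimization=1))
```

The output looks something like .../__pycache__/foo.cpython-XY.pyc, with the tag depending on the interpreter.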
I *strongly* advocate for every file ending in .py having associated .pyc
and .pyo files. No file in /usr/bin should ever end in .py. Either the
suffix needs to be stripped, or the files need to be packaged in
/usr/lib/$pkgname with a symlink added to /usr/bin. There are currently
32 packages in violation.
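Here is a sketch of how offending packages could be spotted from a package's file list, for example the member names of the package tarball. py_in_usr_bin() is a hypothetical helper. It only flags files and does not rewrite them.

```python
import posixpath


def py_in_usr_bin(paths):
    """Return the entries installed directly in /usr/bin whose names end in .py."""
    hits = []
    for p in paths:
        # Normalise "./usr/bin/x", "usr/bin/x" and "/usr/bin/x" to one form.
        full = posixpath.normpath("/" + p.lstrip("/"))
        if posixpath.dirname(full) == "/usr/bin" and full.endswith(".py"):
            hits.append(full)
    return sorted(hits)


print(py_in_usr_bin(["./usr/bin/foo.py", "usr/bin/bar", "usr/lib/foo/foo.py"]))
```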
Looking at other distributions, Fedora includes them in the package
while Debian, openSUSE and Gentoo all generate them in post_install()
and remove them in pre_remove(). So "everyone" generates these. I'd
prefer to generate and track them.
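Here is a sketch of the "generate and track" approach. It byte-compiles in package(), after any edits to the .py files (including shebang fixes), so the pyc files land in $pkgdir and end up in the package's file list. compile_and_list() is a hypothetical helper around the standard compileall module. root is assumed to be a directory under $pkgdir, and the expected cache paths follow the Python 3 __pycache__ layout.

```python
import compileall
import importlib.util
import os


def compile_and_list(root, ddir=None):
    """Byte-compile every .py under root; return (success, expected cache files).

    ddir, if given, is the path recorded in the bytecode instead of root
    (for example the final install location rather than $pkgdir).
    """
    ok = compileall.compile_dir(root, ddir=ddir, quiet=1)
    expected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                expected.append(importlib.util.cache_from_source(os.path.join(dirpath, name)))
    return bool(ok), sorted(expected)
```

compile_dir also accepts an optimize argument, which would be the way to produce the -O variants (the old .pyo) in the same step.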