On 15 October 2013 02:01, Jeremy Heiner <scalaprotractor@gmail.com> wrote:
On Sun, Oct 13, 2013 at 7:55 PM, Allan McRae <allan@archlinux.org> wrote:
I am going to merge all these patches apart from this one and the final patch. If a consensus can be found on how to deal with this issue, I will pull it in - I am not familiar enough with python issues to make the decision myself.
Thanks, Allan. I'm gratified that I can help make (some small) improvements.
Sorry for the delay, but I said I would explain how the Python 2 string gotchas impact the pacman testing framework. I think I found a way to shorten this from what I had anticipated, so hopefully it won't be completely boring...
There are two pmtests with non-7bit-ascii chars: remove071 and sync600. remove071 creates one pmpkg (p2) and adds it to the "local" pmdb. sync600 copy-n-pastes that same p2 pmpkg setup, but also creates and adds sp2 to the "sync" pmdb.
The framework does very different things for the two pmdbs: "local" stuff gets written to the filesystem (simulating in Python code what pacman would do on install), while "sync" stuff gets written to a tarfile (for later processing by the pacman binary being tested). That is the key difference and the stumbling block (and also why this can't be dealt with in sync600 itself).
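For context, the sync600-style setup looks roughly like this (a from-memory sketch, not a quote of the actual tests - the package name, the description text, and the exact attributes are made up for illustration):

    # the "local" half: the framework itself writes this entry into the test filesystem
    p2 = pmpkg("pkg2")
    p2.desc = "错误"
    self.addpkg2db("local", p2)

    # the "sync" half: the framework packs this entry into a sync db tarfile instead
    sp2 = pmpkg("pkg2")
    sp2.desc = p2.desc
    self.addpkg2db("sync", sp2)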
Python's file-writing API accepts whichever string type is native to the running interpreter and handles the char-to-byte conversion itself where one is needed, so the "local" pmpkg p2 (in both pmtests) works great, but...
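As a tiny illustration (made-up path and content; on Python 3 it also assumes the locale's default encoding can represent the text):

    # coding=utf8
    # Python 2 writes the byte-string literal straight through; Python 3
    # treats the literal as text and encodes it on write using the locale's
    # default encoding.
    with open("/tmp/pactest-desc", "w") as f:
        f.write("%DESC%\n错误\n")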
The tarfile.addfile API requires a file object, so the caller of that API is responsible for handling the low-level char-to-byte conversion. Python 2.7's StringIO meets that need. But 3.x doesn't have just one kind of file object: there are binary streams such as BytesIO (a BufferedIOBase, which reads and writes bytes) and text streams such as StringIO (a TextIOBase, which reads and writes str). tarfile.addfile writes bytes, so in 3.x it fails when it tries to read bytes from a text stream.
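For reference, the shape that 3.x is happy with looks like this (again a sketch with made-up names, not the framework's code; because of the u prefix this particular snippet wants 2.7 or 3.3+):

    # coding=utf8
    import io
    import tarfile

    data = u"错误".encode("utf-8")         # do the char-to-byte step ourselves
    info = tarfile.TarInfo(name="desc")
    info.size = len(data)

    with tarfile.open("demo.tar", "w") as tar:
        tar.addfile(info, io.BytesIO(data))        # bytes in: fine on 2.7 and on 3.x
        # tar.addfile(info, io.StringIO(u"错误"))  # fails on 3.x: read() hands back str where bytes are needed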
So how do we feed tarfile.addfile what it wants without special-casing on the Python runtime version? Rather than typing up a long explanation of why there is no way to meet that goal, I've attached a Python script that tries all the options I could think of and prints the reason for failure in each case. The last line of the printout lists the successful options - those that work for that particular Python runtime. Running it on 2.7 and 3.3 shows that no single option succeeds on both.
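This is not the attached script, just a sketch of its shape (the option labels and file names are invented): wrap the test string in a file object in various ways, feed each one to tarfile.addfile, and report what the running interpreter accepts.

    # coding=utf8
    from __future__ import print_function

    import io
    import tarfile

    STRING = "错误"   # a plain literal: bytes on Python 2, text on Python 3

    def options():
        # Each option is a label plus a factory returning (fileobj, size).
        yield "io.StringIO(literal)", lambda: (io.StringIO(STRING), len(STRING))
        yield "io.BytesIO(literal)", lambda: (io.BytesIO(STRING), len(STRING))
        yield "io.BytesIO(encoded)", lambda: (io.BytesIO(STRING.encode("utf-8")),
                                              len(STRING.encode("utf-8")))

    successes = []
    for label, factory in options():
        try:
            fileobj, size = factory()
            info = tarfile.TarInfo(name="probe")
            info.size = size
            with tarfile.open("probe.tar", "w") as tar:
                tar.addfile(info, fileobj)
        except Exception as exc:
            print("%s failed: %r" % (label, exc))
        else:
            successes.append(label)
    print("successful options: %s" % ", ".join(successes))

Running something like this on 2.7 and on 3.3 gives two different "successful options" lines, which is the whole problem.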
The attached script covers what Martin suggested (assuming I haven't misunderstood what he meant). And if anyone can think of an option that I didn't, please post a reply - I love learning new things.
Here are two suggestions:

1. If you put the "u" prefix on the Chinese string, it becomes a Unicode string in Python 2, and encode("utf-8") then works for 2.7 and 3.3+. You can also put the "u" prefix on the other two ASCII strings, but that is optional:

    # . . .
    for entry in ["ascii", u"错误", u"7bit"]:
    # . . .

2. Put the following __future__ line right at the top of the file and remove all the "u" prefixes from the strings. This effectively makes them all Unicode strings in Python 2 as well as 3, so the encode("utf-8") call works for 2.7; and since the "u" prefix can be dropped, the file also works on Python 3.2 and earlier (there is a fuller sketch of this applied to the tarfile case at the end of this mail):

    # coding=utf8
    from __future__ import unicode_literals
    import io
    import os
    # . . .
    for entry in ["ascii", "错误", "7bit"]:
    # . . .

I think the second option is the best, since it uses proper Python 3 syntax and keeps compatibility with Python 3.2. I am going away from the Internet for a week, but if you are still having trouble coming up with a good solution after that, I might be motivated to actually run the test suite myself and come up with a patch :P
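To make suggestion 2 concrete, here is the self-contained sketch mentioned above. The archive name, member names, and entry list are invented for illustration and are not the real pactest code, but the snippet should run unchanged on 2.7, 3.2, and 3.3+:

    # coding=utf8
    from __future__ import unicode_literals

    import io
    import tarfile

    # With unicode_literals every plain literal below is a Unicode string on
    # Python 2 as well, so .encode("utf-8") behaves the same on both runtimes.
    with tarfile.open("example.tar", "w") as tar:
        for i, entry in enumerate(["ascii", "错误", "7bit"]):
            data = entry.encode("utf-8")           # explicit char-to-byte step
            info = tarfile.TarInfo(name="entry%d" % i)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))    # BytesIO satisfies addfile on 2.7 and 3.x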