On 8/2/19 1:24 PM, John Z. wrote:
Could you verify that the encoding of the file path is, in fact, UTF-8? File paths on Linux are free to be arbitrary bytes (anything except NUL and '/'), regardless of the locale settings. Most tools don't care, though I would expect the file path to display incorrectly in the terminal and file browser if it were not UTF-8. So it is probably a long shot, but perhaps worth checking.
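A check like the one suggested above could be sketched as follows. This is not the script referred to in the thread (that script is not shown here); it is a minimal illustration, and the function name `non_utf8_names` is made up for the example. It lists a directory as raw bytes and reports any entry name that does not decode as UTF-8.

```python
import os


def non_utf8_names(directory):
    """Return the raw byte names in `directory` that are not valid UTF-8.

    Hypothetical helper for illustration only.
    """
    bad = []
    # Passing a bytes path makes os.listdir() return the names as raw bytes,
    # exactly as stored on disk, without any locale-dependent decoding.
    for name in os.listdir(os.fsencode(directory)):
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


if __name__ == "__main__":
    # Check the current directory; prints nothing if every name is valid UTF-8.
    for name in non_utf8_names("."):
        print(repr(name))
```

If this prints nothing for the directory in question, the names themselves are valid UTF-8 and the problem lies elsewhere, for example in the locale configuration.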
Hi, thank you for the suggestion. I tried running your script, and all filenames were decoded correctly; no exception was thrown. (I also tried without the try/except, just in case something else was being thrown.)
However, you might be onto something here because, interestingly enough, while the Bash prompt and the autocompletion feature both decode the character correctly, `ls` does not and outputs a sequence of escape codes:
Proc'$'\303\251''dures
instead of
Procédures (where the first 'e' is the Unicode character é, i.e. an 'e' with a French acute accent)
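For reference, the escape shown by `ls` can be decoded by hand: `\303\251` are the octal values of the bytes 0xC3 0xA9, which is the UTF-8 encoding of é. A small sketch (the variable names are only illustrative) that confirms this, and shows that the same bytes cannot be decoded as plain ASCII:

```python
# Python bytes literals accept the same octal escapes that ls printed.
raw = b"Proc\303\251dures"

# Decoding as UTF-8 recovers the accented name.
name = raw.decode("utf-8")
print(name)                 # Procédures
print(len(raw), len(name))  # 11 bytes, 10 characters

# An ASCII-only decoder rejects the two bytes of the accented character.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print("decodable as ASCII:", ascii_ok)
```

So the bytes on disk are fine; the escaping only says that whatever printed them did not treat them as UTF-8.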
The ls command will by default escape a character into its numeric code if it thinks the character is invalid in your locale. I can get ls to print the same thing as you (using the shell-escaped $'\303\251') *iff* I first export LC_ALL=C (which is not a UTF-8 locale and therefore cannot print Unicode characters).

This indicates something is wrong with your locale, because at the very least, your shell cannot parse the character correctly, and maybe neither can LibreOffice.

--
Eli Schwartz
Bug Wrangler and Trusted User
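One way to see which character encoding the current environment's locale actually selects is sketched below. It uses only the Python standard library on a Unix-like system; the function name `current_codeset` is an assumption made for this example, not something from the thread.

```python
import locale


def current_codeset():
    """Return the codeset of the locale named by the environment, or None.

    Hypothetical helper for illustration only (Unix-like systems).
    """
    try:
        # An empty string means "use LC_ALL / LC_CTYPE / LANG from the environment".
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not available on this system.
        return None
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print(current_codeset())
```

A working UTF-8 locale should report UTF-8 here. An ASCII name, or None, would be consistent with `ls` escaping the accented character.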