[arch-general] Opening a document with unicode in path
johnz at pleasantnightmare.com
Fri Aug 2 17:24:00 UTC 2019
> Could you verify that the encoding of the filepath is, in fact, UTF8?
> Filepaths in Linux are free to be arbitrary bytes regardless of the locale
> settings. Most tools don't care, though I would expect the filepath to
> display incorrectly in the terminal and file browser if it were not UTF8.
> So it is probably a long shot but perhaps worth checking.
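As a concrete illustration of the point above (not part of the original script), the same accented word can reach the filesystem as different byte sequences. This is a minimal sketch that assumes the name 'Procédures' from this thread. The Latin-1 variant and the `is_valid_utf8` helper are hypothetical examples and were not taken from any real file here:

```python
# Hypothetical example names: the same word stored as UTF-8 bytes and as Latin-1 bytes.
utf8_name = "Procédures".encode("utf-8")      # b'Proc\xc3\xa9dures'
latin1_name = "Procédures".encode("latin-1")  # b'Proc\xe9dures'


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for name in (utf8_name, latin1_name):
    # Same message format as the script in this thread.
    print(name, "is valid UTF8" if is_valid_utf8(name) else "is not valid UTF8")
```

Only the UTF-8 form decodes cleanly. In the Latin-1 form, the byte `\xe9` is followed by an ASCII letter, which is not a valid UTF-8 continuation byte, so decoding raises `UnicodeDecodeError`.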
Hi, thank you for the suggestion. I tried running your script, and all
filenames decode correctly; no exception was thrown. (I also tried it
without the try/except, just in case something else gets thrown.)
However, you might be onto something here because, interestingly enough,
while the Bash prompt and its autocompletion both display the character
correctly, `ls` does not and outputs a sequence of escape codes instead:
Procedures (where the first 'e' is the Unicode character with a French accent)
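The escape sequence that `ls` printed did not survive in this message. The following sketch assumes those escapes were octal byte escapes. The UTF-8 encoding of 'é' is the two bytes 0xC3 0xA9, which is `\303\251` in octal. GNU `ls` typically prints bytes in that form, inside `$'...'` quoting, when the locale it runs under does not treat those bytes as printable. The `octal_escape` helper below is hypothetical and only imitates that idea for non-ASCII bytes. It is not how coreutils implements its quoting:

```python
def octal_escape(name: bytes) -> str:
    """Render non-ASCII bytes as backslash-octal escapes, similar in spirit to
    the $'\\NNN' sequences GNU ls prints for bytes it treats as unprintable."""
    parts = []
    for byte in name:
        if 0x20 <= byte < 0x7F:
            # Printable ASCII is kept as-is.
            parts.append(chr(byte))
        else:
            # Anything else becomes a three-digit octal escape.
            parts.append("\\%03o" % byte)
    return "".join(parts)


print(octal_escape("Procédures".encode("utf-8")))  # Proc\303\251dures
```

Suppose `ls` shows escapes like these while the shell prompt shows 'é'. In that case, one thing worth checking (an assumption, not a confirmed diagnosis) is whether `ls` runs under a different locale than the shell, for example by comparing the output of `locale` in that terminal.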
> The following Python script, run in the directory containing the
> file/directory with the French character, should tell you if it is
> valid UTF8:
> import os
> for item in os.listdir(b'.'):
>     try:
>         item.decode('utf8')
>     except UnicodeDecodeError:
>         print(item, "is not valid UTF8")
> On Fri, Aug 2, 2019 at 12:48 PM Eli Schwartz via arch-general <
> arch-general at archlinux.org> wrote:
> > On 8/2/19 8:59 AM, John Z. wrote:
> > > Hi everyone,
> > > there's a document on Dropbox that has a Unicode character in its
> > > path (a French character). Trying to open this document with
> > > LibreOffice (Plasma is running) fails with 'file not found', and the
> > > path shown in the error message has that Unicode character replaced
> > > by '??'.
> > >
> > > What I tried:
> > > * copy the document to a path with no Unicode characters - it opens
> > > * copy the document using the shell - it works
> > > * copy the document using Dolphin (from Plasma) - it works
> > > * check $LANG - it's set to `en_CA.UTF8`
> > > * search for 'libreoffice unicode path', 'archlinux unicode path'
> > > and plethora of similar search terms - not much came through
> > >
> > > This makes me think the issue is actually with LibreOffice, but the
> > > reason I ask here, and not in their forum, is that on another
> > > computer running Ubuntu - this works without fail, so I'm fairly
> > > certain the issue is in some local configuration.
> > >
> > > Could anyone shed some light on this, please, or at least point me
> > > in some direction where I could look?
> > Can you determine some steps that exactly reproduce the problem?
> > Assuming that the problem should manifest when opening the file using
> > /usr/bin/loffice /path/to/file, I tried creating a test file and opening
> > it, and it worked:
> > $ mkdir -p '/tmp/unicode paths are 💩/'
> > $ touch '/tmp/unicode paths are 💩/testfile.txt'
> > $ loffice '/tmp/unicode paths are 💩/testfile.txt'
> > $
> > I could successfully edit this file in libreoffice, save content, or
> > reopen it.
> > Tested with LANG=en_US.UTF-8 and the libreoffice-fresh package
> > --
> > Eli Schwartz
> > Bug Wrangler and Trusted User
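The filesystem half of the reproduction above can also be checked without LibreOffice. The following is a minimal Python sketch that assumes Python 3.7 or later. The directory name is an arbitrary example that uses the accented letter from this thread, and everything is created inside a throwaway `tempfile.TemporaryDirectory()`:

```python
import os
import sys
import tempfile

# Encoding Python uses to convert str paths to bytes on this system.
print("filesystem encoding:", sys.getfilesystemencoding())

with tempfile.TemporaryDirectory() as base:
    # Arbitrary non-ASCII directory name, mirroring the shell test above.
    dir_name = "unicode paths are é"
    target_dir = os.path.join(base, dir_name)
    os.makedirs(target_dir)
    target = os.path.join(target_dir, "testfile.txt")
    with open(target, "w", encoding="utf-8") as fh:
        fh.write("hello\n")

    # List the parent directory as raw bytes and check the name round-trips.
    raw_names = os.listdir(os.fsencode(base))
    print(raw_names)
    ok = os.path.exists(target) and os.fsencode(dir_name) in raw_names
    print("round-trip ok:", ok)
```

Suppose this succeeds but LibreOffice still shows '??'. That would suggest, though not prove, that the problem lies in how the application decodes the path rather than in the bytes on disk. On Linux, `sys.getfilesystemencoding()` normally reflects the locale, so a value other than `utf-8` would itself be a hint.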
"That gum you like is going to come back in style."