[arch-general] Opening a document with unicode in path

John Z. johnz at pleasantnightmare.com
Fri Aug 2 17:24:00 UTC 2019


> Could you verify that the encoding of the filepath is, in fact, UTF8?
> Filepaths in linux are free to be arbitrary bytes despite the locale
> settings. Most tools don't care, though I would expect the filepath to
> display incorrectly in the terminal and file browser if it were not UTF8.
> So it is probably a long shot but perhaps worth checking.

Hi, thank you for the suggestion. I ran your script: all filenames
decode correctly and no exception was thrown (I also tried it without
the try/except, in case something else was being raised).

However, you might be onto something here because, interestingly enough,
while the Bash prompt and autocompletion both display the character
correctly, `ls` does not, and instead outputs a sequence of escape codes:

    Proc'$'\303\251''dures

instead of

    Procédures (where the first 'e' is the Unicode character é, with a French acute accent)
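
For reference, the escapes can be decoded by hand. Below is a minimal
Python sketch, assuming the octal escapes that `ls` prints are the raw
bytes of the directory name. The helper names `describe` and
`locale_is_usable` are made up for illustration:

```python
import locale

# Bytes as shown by ls in its $'...' form: octal 303 251 is hex C3 A9.
raw_name = b"Proc\303\251dures"


def describe(name: bytes) -> str:
    """Decode a filename as UTF-8 and list its non-ASCII code points."""
    text = name.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    extras = [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 127]
    return f"{text} {extras}"


def locale_is_usable(name: str) -> bool:
    """Return True if the C library accepts this locale name.

    An empty string means "whatever LANG/LC_* in the environment select".
    """
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False


print(describe(raw_name))
print(locale_is_usable(""))
```

If the first line prints the accented name, the bytes are valid UTF-8
(C3 A9 is the UTF-8 encoding of U+00E9). If the second line prints
False, the locale named in the environment is not available to the C
library, which might be why `ls` escapes the character. That last part
is a guess, not something confirmed here.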


> 
> The following Python script, run in the directory containing the
> file/directory with the French character, should tell you if it is
> valid UTF8:
> 
> import os
> for item in os.listdir(b'.'):
>     try:
>         item.decode('utf8')
>     except UnicodeDecodeError:
>         print(item, "is not valid UTF8")
>         raise
> 
> On Fri, Aug 2, 2019 at 12:48 PM Eli Schwartz via arch-general <
> arch-general at archlinux.org> wrote:
> 
> > On 8/2/19 8:59 AM, John Z. wrote:
> > > Hi everyone,
> > >     there's a document on Dropbox that has a Unicode character in its
> > >     path (a French character). Trying to open this document with
> > >     LibreOffice (Plasma is running) fails with 'file not found', and the
> > >     path shown in the error clearly presents the path with that Unicode
> > >     character replaced by '??'
> > >
> > >     What I tried:
> > >     * copy the document in a path where there's no unicode - it opens
> > >     * copy the document using shell - it works
> > >     * copy the document using Dolphin (from Plasma) - it works
> > >     * check $LANG - it's set to `en_CA.UTF8`
> > >     * search for 'libreoffice unicode path', 'archlinux unicode path'
> > >       and plethora of similar search terms - not much came through
> > >
> > >     This makes me think the issue is actually with LibreOffice, but the
> > >     reason I ask here, and not in their forum, is that on another
> > >     computer running Ubuntu - this works without fail, so I'm fairly
> > >     certain the issue is in some local configuration.
> > >
> > >     Could anyone shed some light on this, please, or at least point me
> > >     in some direction where I could look?
> >
> > Can you determine some steps that exactly reproduce the problem?
> > Assuming that the problem should manifest when opening the file using
> > /usr/bin/loffice /path/to/file, I tried creating a test file and opening
> > it, and it worked:
> >
> > $ mkdir -p '/tmp/unicode paths are 💩/'
> > $ touch '/tmp/unicode paths are 💩/testfile.txt'
> > $ loffice '/tmp/unicode paths are 💩/testfile.txt'
> > $
> >
> > I could successfully edit this file in libreoffice, save content, or
> > reopen it.
> > Tested with LANG=en_US.UTF-8 and the libreoffice-fresh package
> >
> > --
> > Eli Schwartz
> > Bug Wrangler and Trusted User
> >
> >

-- 
"That gum you like is going to come back in style."

