[arch-dev-public] packaging hunspell dictionaries converted for qt5-webengine

Eli Schwartz eschwartz at archlinux.org
Tue Aug 13 15:43:19 UTC 2019


On August 13, 2019 5:17:27 AM EDT, Florian Bruhin <me at the-compiler.org> wrote:
> Hey,
> 
> My $0.02 as qutebrowser maintainer (off-list because I can't send to
> arch-dev-public):

Forwarded back to a-d-p with inline comments. :)

> On Mon, Aug 12, 2019 at 07:50:44PM -0400, Eli Schwartz via
> arch-dev-public wrote:
> > QtWebEngine supports spellchecking:
> > https://doc.qt.io/qt-5/qtwebengine-features.html#spellchecker
> > 
> > However, they have helpfully decided (steered by upstream chromium) to
> > *not* use hunspell dictionaries, and instead to use... hunspell
> > dictionaries stored in /usr/share/qt/qtwebengine_dictionaries/ as
> > ".bdic" files, because this is supposedly "more efficiently read by
> > chromium".
> 
> The actual spell checking is implemented inside Chromium; all
> QtWebEngine does with the dictionaries is pass them to Chromium. I
> don't think they're happy with bdic files either, but the alternatives
> aren't really an option (completely reimplementing spell checking
> support by patching their copy of Chromium, with a lot of added
> friction each time they want to update their Chromium snapshot).
> 
> So it pretty much boils down to "blame Google/Chromium" ;)

Yeah, but I still wanna blame them for not patching it for the purpose of integrating well. :p

> > (Actually QtWebEngine's spell-checking infrastructure is entirely
> > willing to read dictionaries in /usr/bin/qtwebengine_dictionaries
> > before looking in /usr/share because clearly they've put great
> > thought into how this is all supposed to work on a conceptual design
> > level especially for distro packaging.)
> 
> Agreed this doesn't make much sense for Linux distributions. It
> happens because it looks next to the executable, which probably *does*
> make a lot of sense for Windows, macOS, embedded scenarios, bundled
> apps, etc. It doesn't help much for distributions, but it also doesn't
> hurt.
> 
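
For illustration, a minimal Python sketch of the lookup order described above: a qtwebengine_dictionaries directory next to the executable wins, and the data directory under /usr/share/qt is only tried afterwards. The function name and its parameters are hypothetical; this models the behaviour discussed in the thread and is not QtWebEngine's actual code.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_dictionary_dir(
    executable: Path,
    data_dirs: Iterable[Path],
    dirname: str = "qtwebengine_dictionaries",
) -> Optional[Path]:
    """Return the first existing dictionary directory, or None.

    Hypothetical helper: checks next to the executable first, then each
    data directory in the order given.
    """
    candidates = [Path(executable).parent / dirname]
    candidates += [Path(d) / dirname for d in data_dirs]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

With executable=/usr/bin/pageedit and data_dirs=[/usr/share/qt], the candidates are /usr/bin/qtwebengine_dictionaries followed by /usr/share/qt/qtwebengine_dictionaries, which matches the order complained about above.
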
> > So I have a program -- pageedit -- which just added spellchecking
> > support via qtwebengine in the latest release, and I would like to
> > support that. And I don't want to see people being personally
> > responsible for installing their own stuff in /usr/share. While I'm at
> > it, Morten (Foxboron) pointed out to me that qutebrowser also supports
> > spellchecking, and it currently provides a user script which downloads
> > preconverted dictionaries from chromium's git repository into
> > $HOME/.local/share/qutebrowser/ ... because there's apparently no
> > guidance or precedent for actually distributing these dictionaries. (In
> > fact, currently only Fedora seems to make these dictionaries available
> > to users.)
> 
> Oh, I didn't know Fedora packages them! I opened a qutebrowser issue too:
> https://github.com/qutebrowser/qutebrowser/issues/4966
> 
> > It's possible to convert them yourself, using the
> > qwebengine_convert_dict tool shipped in the qt5-webengine package. I
> > think it would be nice if users were able to obtain these dictionaries
> > properly, but I'm not positive what the best way would be. Ideas:
> > 
> > - Ship a pacman hook to convert whatever the user has installed,
> >   implemented via the following libalpm script and hooks:
> >   https://paste.xinu.at/m-ydTjU/
> > - make every hunspell-* package makedepend on qt5-webengine and produce
> >   those dictionaries
> > - same thing but also make split packages for basically a tiny data file
> > - force users to install an out of date AUR package not kept in sync
> >   with hunspell-* (this one is just a joke)
> > 
> > The advantage of a hook is that users with webengine installed
> > automatically get magic google-approved dictionaries corresponding to
> > the hunspell dictionaries they have installed.
> > 
> > The advantage of modifying each hunspell-* package is saving about 0.38
> > seconds per file at installation time, plus users don't have weird
> > untracked files in some cloistered dir in /usr/
> > 
> > The advantage of doing anything other than possibility #3 is "avoid
> > adding another 34 packages to the repositories, which users need to
> > manually install in addition to the other dictionaries they explicitly
> > installed".
> 
> Depending on how big those dictionaries are, they all could be in a
> single qt5-webengine-dicts package? Though I guess they aren't much
> smaller than the hunspell ones, and there probably was a reason those
> were split.

Well, they are different source code with different versions, so I see no gain or practical way to implement a combined hunspell package. A combined webengine dicts package would need to makedepend on all hunspell dict packages, then get updated for any hunspell dict update.
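
To make the conversion step concrete, here is a rough Python sketch of what a hook or a package build would end up running for each dictionary. It assumes qwebengine_convert_dict takes an input .dic path and an output .bdic path as its two arguments and reads the matching .aff from next to the .dic. The helper names, the requirement that an .aff exists, and keeping the file stem unchanged for the .bdic name are all assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def plan_conversions(src_dir, dest_dir):
    """Pair each .dic that has a sibling .aff with its .bdic output path."""
    pairs = []
    for dic in sorted(Path(src_dir).glob("*.dic")):
        # Assumption: the converter needs the .aff file alongside the .dic.
        if dic.with_suffix(".aff").exists():
            pairs.append((dic, Path(dest_dir) / (dic.stem + ".bdic")))
    return pairs


def convert_all(src_dir, dest_dir, tool="qwebengine_convert_dict"):
    """Run the converter once per planned (dic, bdic) pair."""
    # `tool` may be a bare name looked up on PATH or a full path.
    tool_path = shutil.which(tool)
    if tool_path is None:
        raise FileNotFoundError(f"{tool} not found; pass its full path")
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    for dic, bdic in plan_conversions(src_dir, dest_dir):
        # Assumed invocation: <tool> <input.dic> <output.bdic>
        subprocess.run([tool_path, str(dic), str(bdic)], check=True)
```

Something like convert_all("/usr/share/hunspell", "/usr/share/qt/qtwebengine_dictionaries") would be the hook variant; the per-package variant would run the same thing inside each hunspell-* build against its own files.
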

> On Tue, Aug 13, 2019 at 09:22:39AM +0200, Jan Alexander Steffens via
> arch-dev-public wrote:
> > On Tue, Aug 13, 2019 at 9:04 AM Bartłomiej Piotrowski via
> > arch-dev-public <arch-dev-public at archlinux.org> wrote:
> > > I'd go with updating all packages to ship the converted files.
> > > Cluttering /usr with untracked files doesn't sound good.
> > 
> > Yeah, I agree. I think we should package convert_dict from the Chromium
> > sources as a new package to makedepend on.
> 
> I'm assuming those are compatible with each other? It does seem like it
> from the sources:
> 
> https://github.com/qt/qtwebengine/blob/v5.13.0/src/tools/qwebengine_convert_dict/main.cpp
> https://github.com/qt/qtwebengine-chromium/blob/75-based/chromium/chrome/tools/convert_dict/convert_dict.cc
> 
> > Assuming that WebEngine will not be the only consumer of .bdic
> > dictionaries, how about putting them in /usr/share/bdic, and then either
> > patching sources to use that dir or linking whatever engine-specific
> > dictionaries there?
> > 
> > We could also put them with the other dictionaries into
> > /usr/share/hunspell, assuming that won't cause problems.
> 
> I guess Qt wouldn't be opposed to a change (for Qt 5.14 I guess) adding
> one of those paths.
> 
> Maybe the Chromium package could load them as well from there?
> 
> Florian

I have no idea at all what Chromium does right now!
-- 
Eli Schwartz
Bug Wrangler and Trusted User