[arch-dev-public] packaging hunspell dictionaries converted for qt5-webengine
QtWebEngine supports spellchecking:
https://doc.qt.io/qt-5/qtwebengine-features.html#spellchecker

However, they have helpfully decided (steered by upstream chromium) to *not* use hunspell dictionaries, and instead to use... hunspell dictionaries stored in /usr/share/qt/qtwebengine_dictionaries/ as ".bdic" files, because this is supposedly "more efficiently read by chromium".

(Actually QtWebEngine's spell-checking infrastructure is entirely willing to read dictionaries in /usr/bin/qtwebengine_dictionaries before looking in /usr/share because clearly they've put great thought into how this is all supposed to work on a conceptual design level especially for distro packaging.)

So I have a program -- pageedit -- which just added spellchecking support via qtwebengine in the latest release, and I would like to support that. And I don't want to see people being personally responsible for installing their own stuff in /usr/share.

While I'm at it, Morten (Foxboron) pointed out to me that qutebrowser also supports spellchecking, and it currently provides a user script which downloads preconverted dictionaries from chromium's git repository into $HOME/.local/share/qutebrowser/ ... because there's apparently no guidance or precedent for actually distributing these dictionaries. (In fact, currently only Fedora seems to make these dictionaries available to users.)

It's possible to convert them yourself, using the qwebengine_convert_dict tool shipped in the qt5-webengine package.

I think it would be nice if users were able to obtain these dictionaries properly, but I'm not positive what the best way would be.
Ideas:

- Ship a pacman hook to convert whatever the user has installed, implemented via the following libalpm script and hooks: https://paste.xinu.at/m-ydTjU/
- make every hunspell-* package makedepend on qt5-webengine and produce those dictionaries
- same thing but also make split packages for basically a tiny data file
- force users to install an out of date AUR package not kept in sync with hunspell-* (this one is just a joke)

The advantage of a hook is that users with webengine installed automatically get magic google-approved dictionaries corresponding to the hunspell dictionaries they have installed.

The advantage of modifying each hunspell-* package is saving about 0.38 seconds per file at installation time, plus users don't have weird untracked files in some cloistered dir in /usr/

The advantage of doing anything other than possibility #3 is "avoid adding another 34 packages to the repositories, which users need to manually install in addition to the other dictionaries they explicitly installed".

...

Prior art:

Fedora uses rpm post-install filetriggers: https://src.fedoraproject.org/rpms/qt5-qtwebengine/blob/master/f/qt5-qtweben...

Gentoo has a proposal for a package that runs the conversion tool on each file the user has installed in /usr/share/hunspell/ and packages the results.

...

Thoughts on the best way forward to make these dictionaries available on Arch Linux?

-- 
Eli Schwartz
Bug Wrangler and Trusted User
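For illustration only, the hook idea could be sketched roughly as follows. This is not the script from the paste linked above. It assumes the hook hands over the matched file paths one per line, relative to the filesystem root (as pacman does for hooks that request their targets), and that the tool is invoked as `qwebengine_convert_dict <input.dic> <output.bdic>`. The function name `plan_conversions` and the choice to keep the hunspell file stem unchanged are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: the converter takes the input .dic and the output .bdic as its
# two positional arguments.
CONVERT_TOOL = "qwebengine_convert_dict"
HUNSPELL_DIR = PurePosixPath("/usr/share/hunspell")
BDIC_DIR = PurePosixPath("/usr/share/qt/qtwebengine_dictionaries")


def plan_conversions(targets):
    """Turn hook targets into a list of conversion commands.

    `targets` is an iterable of lines such as "usr/share/hunspell/en_US.dic".
    Anything that is not a .dic file directly inside HUNSPELL_DIR is ignored.
    """
    commands = []
    for line in targets:
        rel = line.strip()
        if not rel:
            continue
        path = PurePosixPath("/") / rel
        if path.parent != HUNSPELL_DIR or path.suffix != ".dic":
            continue
        # Hypothetical naming choice: reuse the hunspell stem for the .bdic.
        out = BDIC_DIR / (path.stem + ".bdic")
        commands.append([CONVERT_TOOL, str(path), str(out)])
    return commands
```

A real hook script would read the targets from standard input and run each command, for example with `subprocess.run(cmd, check=True)`. The files written this way are exactly the untracked files in /usr that the packaging alternative avoids.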
I'd go with updating all packages to ship the converted files. Cluttering /usr with untracked files doesn't sound good.

BP
On Tue, Aug 13, 2019 at 9:04 AM Bartłomiej Piotrowski via arch-dev-public <arch-dev-public@archlinux.org> wrote:
> I'd go with updating all packages to ship the converted files. Cluttering /usr with untracked files doesn't sound good.

Yeah, I agree. I think we should package convert_dict from the Chromium sources as a new package to makedepend on.

Assuming that WebEngine will not be the only consumer of .bdic dictionaries, how about putting them in /usr/share/bdic, and then either patching sources to use that dir or linking whatever engine-specific dictionaries there?

We could also put them with the other dictionaries into /usr/share/hunspell, assuming that won't cause problems.
On August 13, 2019 3:22:39 AM EDT, Jan Alexander Steffens via arch-dev-public <arch-dev-public@archlinux.org> wrote:
> On Tue, Aug 13, 2019 at 9:04 AM Bartłomiej Piotrowski via arch-dev-public <arch-dev-public@archlinux.org> wrote:
>> I'd go with updating all packages to ship the converted files. Cluttering /usr with untracked files doesn't sound good.
>
> Yeah, I agree. I think we should package convert_dict from the Chromium sources as a new package to makedepend on.

Do we need that, really? We could just splitpkg the one in qt5-webengine as it only links to QtCore, and that would at least save on webengine in a build chroot. But no users actually suffer because of large makedeps.

> Assuming that WebEngine will not be the only consumer of .bdic dictionaries, how about putting them in /usr/share/bdic, and then either patching sources to use that dir or linking whatever engine-specific dictionaries there?

I doubt much of anything uses this other than chromium derivatives. I have no idea how chromium handles this and couldn't find a packaged .bdic file to base assumptions on. I similarly have no clue what electron's story is. The grammalecte package does have _dictionaries/*.bdic in its python site-packages datadir. I think that probably somehow ties into kde things that use qt5-webengine.

> We could also put them with the other dictionaries into /usr/share/hunspell, assuming that won't cause problems.

I don't think it will cause problems but I also don't think it will help. Things that expect hunspell dicts won't expect chromium bdics to be there and won't use them. And qt5-webengine won't look there. Maybe if they added support for that, it would make sense.

-- 
Eli Schwartz
Bug Wrangler and Trusted User
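To make the point about lookup locations concrete, here is a small model of the search order as described in this thread: a qtwebengine_dictionaries directory next to the executable first, then the one under /usr/share/qt. It is a sketch of that description only, not QtWebEngine's actual code. The function names and the `data_dir` parameter are hypothetical.

```python
from pathlib import Path


def dictionary_search_dirs(executable, data_dir="/usr/share/qt"):
    """Return candidate dictionary directories, highest priority first.

    Assumption (taken from the discussion above): the directory next to the
    application binary is consulted before the shared data directory.
    """
    app_dir = Path(executable).parent
    return [
        app_dir / "qtwebengine_dictionaries",
        Path(data_dir) / "qtwebengine_dictionaries",
    ]


def find_bdic(language, executable, data_dir="/usr/share/qt"):
    """Return the first existing <language>.bdic in search order, or None."""
    for directory in dictionary_search_dirs(executable, data_dir):
        candidate = directory / f"{language}.bdic"
        if candidate.is_file():
            return candidate
    return None
```

Under this model a .bdic file placed in /usr/share/hunspell is never considered, which is why moving the files there would not help unless the engine were patched.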
On 8/13/19 12:05 PM, Eli Schwartz wrote:
> On August 13, 2019 3:22:39 AM EDT, Jan Alexander Steffens wrote:
>> Assuming that WebEngine will not be the only consumer of .bdic dictionaries, how about putting them in /usr/share/bdic, and then either patching sources to use that dir or linking whatever engine-specific dictionaries there?
>
> I doubt much of anything uses this other than chromium derivatives. I have no idea how chromium handles this and couldn't find a packaged .bdic file to base assumptions on.
>
> I similarly have no clue what electron's story is.
>
> The grammalecte package does have _dictionaries/*.bdic in its python site-packages datadir. I think that probably somehow ties into kde things that use qt5-webengine.
>
>> We could also put them with the other dictionaries into /usr/share/hunspell, assuming that won't cause problems.
>
> I don't think it will cause problems but I also don't think it will help. Things that expect hunspell dicts won't expect chromium bdics to be there and won't use them. And qt5-webengine won't look there. Maybe if they added support for that, it would make sense.

Status update: after discussion on IRC, I was able to convince heftig that it is indeed unnecessary to try to reorganize the final installation locations of these dictionaries.

...

I've gotten several positive responses and no negative ones so far. Although it's been noted that it might be nice to have a more lightweight convert tool packaged, I think for now we can stick with the one in qt5-webengine.

Anyone else have any last-minute objections? Should I create a TODO list for all our dictionary packages?

-- 
Eli Schwartz
Bug Wrangler and Trusted User
On 2019-09-02 23:56:39 (-0400), Eli Schwartz via arch-dev-public wrote:
> I've gotten several positive responses and no negative ones so far. Although it's been noted that it might be nice to have a more lightweight convert tool packaged, I think for now we can stick with the one in qt5-webengine.
>
> Anyone else have any last-minute objections? Should I create a TODO list for all our dictionary packages?

Not that it is of our direct concern, but qt5-webengine seems to suffer from unresolved questionable licensing issues, which is why e.g. Parabola doesn't package it [1]. I don't know the specifics, but assume that it is due to the Chromium license [2].

Best,
David

[1] https://www.parabola.nu/packages/?q=qt5-
[2] https://github.com/qt/qtwebengine/blob/5.12/LICENSE.Chromium

-- 
https://sleepmap.de
On 9/3/19 4:47 AM, David Runge wrote:
> Not that it is of our direct concern, but qt5-webengine seems to suffer from unresolved questionable licensing issues, which is why e.g. Parabola doesn't package it [1]. I don't know the specifics, but assume that it is due to the Chromium license [2].
>
> Best,
> David
>
> [1] https://www.parabola.nu/packages/?q=qt5-
> [2] https://github.com/qt/qtwebengine/blob/5.12/LICENSE.Chromium

I mean, if we're concerned about that we should remove the chromium package first, *then* remove qt5-webengine (and electron).

Since it is a complex issue, I will mostly drop links and expect interested people to read up on it, rather than giving a summary myself.

The GNU FSDG considers chromium to be "not provably free":
https://libreplanet.org/wiki/List_of_software_that_does_not_respect_the_Free...

Original chromium project bug report:
https://bugs.chromium.org/p/chromium/issues/detail?id=28291

Parabola meta-bug tracking their general stance on chromium and affected packages:
https://labs.parabola.nu/issues/1167

What Qt developers think about this:
https://bugs.kde.org/show_bug.cgi?id=374808#c4
https://lists.qt-project.org/pipermail/qtwebengine/2017-January/000409.html

-- 
Eli Schwartz
Bug Wrangler and Trusted User
On August 13, 2019 3:03:59 AM EDT, "Bartłomiej Piotrowski via arch-dev-public" <arch-dev-public@archlinux.org> wrote:
> I'd go with updating all packages to ship the converted files. Cluttering /usr with untracked files doesn't sound good.

I agree, that's my preferred option too -- but I need buy-in from all hunspell-* maintainers, hence the mail. :D

Failing that I'd go with the hook since it's the simplest way to efficiently do this without requiring me to maintain 34 new packages containing trivial modifications of other packages, something I do *not* want to do.

-- 
Eli Schwartz
Bug Wrangler and Trusted User
On Tue, 13 Aug 2019 at 18:50, Eli Schwartz via arch-dev-public <arch-dev-public@archlinux.org> wrote:
> On August 13, 2019 3:03:59 AM EDT, "Bartłomiej Piotrowski via arch-dev-public" <arch-dev-public@archlinux.org> wrote:
>> I'd go with updating all packages to ship the converted files. Cluttering /usr with untracked files doesn't sound good.
>
> I agree, that's my preferred option too -- but I need buy-in from all hunspell-* maintainers, hence the mail. :D
+1 for shipping .bdic files under /usr/share/qt/qtwebengine_dictionaries/ as part of hunspell-* packages. Using qwebengine_convert_dict for the conversion seems fine (pulling in qt5-webengine as a build dep).
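As a rough sketch of what the build-time variant could look like, the helper below lists the conversions a hunspell-* package build might run against its staging directory, so that the resulting .bdic files are tracked by the package. The function name `staged_conversions` is hypothetical, the `pkgdir` layout is assumed to mirror the installed paths named in this thread, and the two-argument invocation of `qwebengine_convert_dict` is an assumption.

```python
from pathlib import Path


def staged_conversions(pkgdir):
    """List conversion commands for every .dic staged under pkgdir.

    Assumes dictionaries are staged in <pkgdir>/usr/share/hunspell and the
    results belong in <pkgdir>/usr/share/qt/qtwebengine_dictionaries.
    """
    src = Path(pkgdir) / "usr/share/hunspell"
    dst = Path(pkgdir) / "usr/share/qt/qtwebengine_dictionaries"
    return [
        ["qwebengine_convert_dict", str(dic), str(dst / (dic.stem + ".bdic"))]
        for dic in sorted(src.glob("*.dic"))
    ]
```

The build would create the destination directory first and then run each command. Compared with the hook, the cost is qt5-webengine as a build dependency; the benefit is that nothing is generated on the user's machine.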
participants (5)
- Bartłomiej Piotrowski
- David Runge
- Eli Schwartz
- Evangelos Foutras
- Jan Alexander Steffens