[arch-general] Conflict python-html2text/html2text (both /usr/bin/html2text)?
David C. Rankin
drankinatty at suddenlinkmail.com
Tue Mar 10 14:54:10 UTC 2015
All,
I have been working on a project that retrieves and parses statutes, basically
by running the pipeline 'curl -s url | html2text -utf8' and passing the read end
of the second pipe to getline. When I moved the code from my laptop (SUSE) to my
servers (Arch), the pipeline broke. Poking around, I found that the cause was Arch
packaging python-html2text (as /usr/bin/html2text) instead of the gcc-libs
version (see: https://aur.archlinux.org/packages/html2text-with-utf8).
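For readers following along in shell rather than C, here is a rough analogue of feeding that pipeline's output to a getline() loop. It is only a sketch: the function name fetch_statute_lines is hypothetical, and the loop body is a placeholder for the real parsing. The one design point worth noting is bash's pipefail option, which makes the whole pipeline report failure when curl or html2text exits with an error, instead of the loop silently receiving no lines.

```shell
#!/usr/bin/env bash
# Sketch only: shell analogue of reading the output of
#   curl -s url | html2text -utf8
# line by line, as a getline() loop would. Function name is hypothetical.
set -o pipefail   # pipeline fails if curl or html2text exits non-zero

fetch_statute_lines() {
  local url=$1
  curl -s "$url" | html2text -utf8 |
    while IFS= read -r line; do
      printf '%s\n' "$line"   # placeholder for the real per-line parsing
    done
}
```

Called as `fetch_statute_lines "$url" || echo "conversion failed" >&2`, a broken converter shows up as a non-zero status rather than as empty output.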
While not updated recently, the gcc-libs version is quite a bit more robust
and flexible (not to mention it actually provides a man page). See:
http://www.mbayer.de/html2text/
There are formatting shortcomings with the python version as well. A big one is
that you cannot both control the word wrap (--body-width) and prevent double
line breaks after block elements, because --single-line-break requires
--body-width=0.
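One possible workaround, offered only as an untested sketch: let the python html2text emit unwrapped text with single line breaks (--body-width=0 --single-line-break, the combination described above), then re-wrap the result with coreutils fold. The function name py_html2text_wrapped and the default width of 78 are hypothetical choices. Note that fold -s only breaks at blanks, so it will not reproduce the converter's own handling of indented lists or quotes.

```shell
# Hypothetical wrapper around the python html2text: turn off its own wrapping
# (required for --single-line-break), then re-wrap at blanks with fold.
py_html2text_wrapped() {
  local width=${1:-78}   # target line width (assumed default)
  html2text --body-width=0 --single-line-break | fold -s -w "$width"
}
```

Usage would look like `curl -s "$url" | py_html2text_wrapped 72`.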
Is there any particular reason Arch packages the python version instead? If
nothing else, is there any interest in at least renaming the resulting
executable to avoid a direct conflict with the gcc-libs version ('pyhtml2text'
makes sense)?
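Until the conflict is resolved, a script can at least check which implementation /usr/bin/html2text actually is before relying on options such as -utf8. The sketch below rests on an assumption rather than a documented interface: that the python-html2text entry point is a text script starting with "#!", while the gcc-libs html2text is a compiled (ELF) binary. The helper name html2text_flavor is hypothetical, and `head -c` is the GNU coreutils form.

```shell
# Hypothetical helper: guess which html2text implementation a path is.
# Assumption: python-html2text installs a "#!" script, while the gcc-libs
# html2text is a compiled ELF binary.
html2text_flavor() {
  local bin="${1:-$(command -v html2text)}"
  if [ -z "$bin" ] || [ ! -e "$bin" ]; then
    echo missing
  elif [ "$(head -c 2 "$bin")" = '#!' ]; then
    echo script    # e.g. a setuptools-generated python entry point
  else
    echo binary    # e.g. the compiled gcc-libs html2text
  fi
}
```

A caller could, for example, refuse to run `html2text -utf8` when `html2text_flavor` prints `script`.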
As an example, compare the output of:
(gcc-libs version)
$ curl -s http://www.statutes.legis.state.tx.us/Docs/TN/htm/TN.1.htm | html2text -utf8
(python version)
$ curl -s http://www.statutes.legis.state.tx.us/Docs/TN/htm/TN.1.htm | html2text
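If both converters were installed side by side, the comparison above could be repeated without downloading the page twice. This sketch assumes the python version has already been renamed to pyhtml2text (the name proposed in this thread, not an existing command) and uses bash process substitution; the function name compare_html2text is hypothetical. It returns diff's status, so 0 means identical output and 1 means the outputs differ.

```shell
# Sketch: fetch the page once, then diff the two converters' output.
# Assumes the python html2text has been renamed to pyhtml2text (hypothetical).
compare_html2text() {
  local url=$1 page status
  page=$(mktemp) || return 1
  curl -s "$url" > "$page"
  diff -u <(html2text -utf8 < "$page") <(pyhtml2text < "$page")
  status=$?
  rm -f "$page"   # clean up the temporary copy of the page
  return "$status"
}
```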
This conflict could easily be avoided in the python version with a rename:
$ diff -uNb --label PKGBUILD PKGBUILD.orig PKGBUILD
--- PKGBUILD
+++ PKGBUILD 2015-03-10 09:25:43.906168003 -0500
@@ -11,8 +11,8 @@
 url="https://pypi.python.org/pypi/html2text/"
 license=('GPL3')
 depends=('python-setuptools')
-provides=('html2text')
-replaces=('html2text')
+provides=('pyhtml2text')
+replaces=('pyhtml2text')
 source=(https://pypi.python.org/packages/source/h/html2text/html2text-$pkgver.tar.gz)
 sha256sums=('c3977dfe6fd1ba0d4091f85963306488b3e9e236cfe60d8821158ce5a7fcb619')
@@ -29,5 +29,6 @@
 package() {
   cd "${srcdir}"/html2text-${pkgver}
   python setup.py install --root="${pkgdir}"
+  mv "${pkgdir}"/usr/bin/html2text "${pkgdir}"/usr/bin/pyhtml2text
 }
The install script could even check for the presence of /usr/bin/html2text
and, if it is absent, create a symlink.
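A minimal sketch of such an install scriptlet, assuming the rename above has been applied. post_install, post_upgrade and pre_remove are the standard pacman scriptlet hooks; the file name pyhtml2text.install and the BINDIR override (used only so the logic can be exercised outside /usr/bin) are hypothetical. Because pacman does not track files created by scriptlets, the link has to be removed by the scriptlet itself.

```shell
# pyhtml2text.install: hypothetical pacman install scriptlet for a package
# whose executable has been renamed to /usr/bin/pyhtml2text.
# BINDIR is an assumed override for testing; pacman would use /usr/bin.
BINDIR=${BINDIR:-/usr/bin}

post_install() {
  # Only provide the old name if nothing (not even a dangling link) is there.
  if [ ! -e "$BINDIR/html2text" ] && [ ! -L "$BINDIR/html2text" ]; then
    ln -s pyhtml2text "$BINDIR/html2text"
  fi
}

post_upgrade() {
  post_install
}

pre_remove() {
  # pacman does not track this link, so remove it here, but only if it
  # still points at pyhtml2text (never touch another package's html2text).
  if [ -L "$BINDIR/html2text" ] &&
     [ "$(readlink "$BINDIR/html2text")" = pyhtml2text ]; then
    rm -f "$BINDIR/html2text"
  fi
}
```

One caveat with this design: since the link is not in the package's file list, installing the gcc-libs html2text afterwards would likely make pacman report a file conflict on /usr/bin/html2text until the link is removed.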
I just found it very odd that Arch ships an executable named 'html2text' that
works completely differently from the traditional 'html2text' that many
distributions have shipped for years.
I don't know if there is any interest among the devs in rectifying this. If
there is, I'm happy to file a feature request, etc. Let me know. Thanks.
--
David C. Rankin, J.D.,P.E.