[arch-general] Conflict python-html2text/html2text (both /usr/bin/html2text)?

David C. Rankin drankinatty at suddenlinkmail.com
Tue Mar 10 14:54:10 UTC 2015


All,

   I have been working on a project that retrieves and parses statutes, making 
use of the piped processes 'curl -s url | html2text -utf8' with the read end 
of the second pipe passed to getline. I moved the code from my laptop (SUSE) 
to my servers (Arch) and the pipe process broke. Poking around, I found it was 
due to Arch packaging python-html2text (as /usr/bin/html2text) instead of the 
gcc-libs version (see: https://aur.archlinux.org/packages/html2text-with-utf8).
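
   To confirm which implementation owns the name on a given machine, 
'pacman -Qo /usr/bin/html2text' reports the owning package on Arch. On systems 
without pacman, a rough check is sketched below. The function name 
html2text_flavor is hypothetical, and it assumes the python version is 
installed as a script whose first line is a python shebang, while the C++ 
version is a compiled binary.

```shell
# Hypothetical helper (not part of either package): guess which html2text
# implementation a path holds. Assumption: python-html2text installs a
# setuptools script starting with a python shebang; the C++ html2text is
# a compiled binary.
html2text_flavor() {
  local exe="${1:-/usr/bin/html2text}"
  if [ ! -e "$exe" ]; then
    echo "missing"
    return 1
  fi
  # Read only the first line; strip NUL bytes so binaries don't upset the shell.
  local first
  first=$(head -n 1 -- "$exe" | tr -d '\000')
  case "$first" in
    '#!'*python*) echo "python" ;;
    *) echo "native" ;;
  esac
}
```

Called with no argument it inspects /usr/bin/html2text; a result of "python" 
would explain why the '-utf8' option stopped working.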

   While not updated recently, the gcc-libs version is quite a bit more robust 
and flexible (not to mention it actually provides a man page). See:

http://www.mbayer.de/html2text/

   There are formatting shortcomings in the python version as well. A big one 
is that you cannot both control the word wrap (--body-width) and prevent double 
line breaks after block elements, because --single-line-break requires 
--body-width=0.

   Is there any particular reason Arch is packaging the python version 
instead? If nothing else, is there any interest in at least renaming the 
resulting executable to prevent a direct conflict with the gcc-libs version? 
('pyhtml2text' makes sense.)

   As an example, compare the output of:

(gcc-libs version)

$ curl -s http://www.statutes.legis.state.tx.us/Docs/TN/htm/TN.1.htm | html2text -utf8

(python version)

$ curl -s http://www.statutes.legis.state.tx.us/Docs/TN/htm/TN.1.htm | html2text

   This conflict can easily be avoided in the python version with a rename:

$ diff -uNb --label PKGBUILD PKGBUILD.orig PKGBUILD
--- PKGBUILD
+++ PKGBUILD    2015-03-10 09:25:43.906168003 -0500
@@ -11,8 +11,8 @@
  url="https://pypi.python.org/pypi/html2text/"
  license=('GPL3')
  depends=('python-setuptools')
-provides=('html2text')
-replaces=('html2text')
+provides=('pyhtml2text')
+replaces=('pyhtml2text')
  source=(https://pypi.python.org/packages/source/h/html2text/html2text-$pkgver.tar.gz)
  sha256sums=('c3977dfe6fd1ba0d4091f85963306488b3e9e236cfe60d8821158ce5a7fcb619')

@@ -29,5 +29,6 @@
  package() {
    cd "${srcdir}"/html2text-${pkgver}
    python setup.py install --root="${pkgdir}"
+  mv "${pkgdir}"/usr/bin/html2text "${pkgdir}"/usr/bin/pyhtml2text
  }

   The install script could even check for the presence of /usr/bin/html2text 
and, if it is absent, provide a symlink.
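
   A minimal sketch of such an install script, assuming the package ships 
/usr/bin/pyhtml2text as in the patch above. The file name pyhtml2text.install 
and the BINDIR override (there only so the functions can be tried outside 
pacman) are hypothetical; post_install and post_remove are the standard pacman 
install-script hooks.

```shell
# pyhtml2text.install -- hypothetical pacman install script (sketch only).
# Assumes the package ships /usr/bin/pyhtml2text, as in the renaming patch.

# Hypothetical override so the hooks can be exercised outside pacman;
# under pacman this stays /usr/bin.
BINDIR="${BINDIR:-/usr/bin}"

post_install() {
  # Create the convenience link only if nothing (a file or a link,
  # e.g. the C++ html2text) already occupies the name.
  if [ ! -e "$BINDIR/html2text" ] && [ ! -L "$BINDIR/html2text" ]; then
    ln -s pyhtml2text "$BINDIR/html2text"
  fi
}

post_remove() {
  # Remove the link only if it is the one post_install created.
  if [ -L "$BINDIR/html2text" ] &&
     [ "$(readlink "$BINDIR/html2text" 2>/dev/null)" = "pyhtml2text" ]; then
    rm -f "$BINDIR/html2text"
  fi
}
```

It would be wired in with install=pyhtml2text.install in the PKGBUILD. One 
likely caveat: the link is not owned by any package, so installing the C++ 
html2text afterwards would probably stop with an "exists in filesystem" 
conflict until the link is removed.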

   I just found it very odd to find an executable in Arch named 'html2text' that 
works completely differently from the traditional 'html2text' that many 
distributions have shipped for years.

   I don't know if there is any interest among the devs in rectifying this. If 
there is, I'm happy to file a feature request, etc. Let me know. Thanks.

-- 
David C. Rankin, J.D.,P.E.

