[arch-releng] Use pandoc instead of make-doc.sh

Dario Giovannetti dariogiova at gmail.com
Mon May 28 09:47:54 EDT 2012


On 16/05/12 23:41, Dario Giovannetti wrote:
> On 15/05/12 22:40, Dieter Plaetinck wrote:
>> On Mon, 14 May 2012 23:32:35 +0200
>> Dario Giovannetti<dariogiova at gmail.com>  wrote:
>>
>>> I would like to propose using pandoc (http://johnmacfarlane.net/pandoc/)
>>> instead of the make-doc.sh script for converting the installation
>>> guide (Markdown syntax) to the document hosted in the wiki (MediaWiki
>>> syntax).
>>> I've tested it and the result looks pretty good, with only a few minor
>>> manual refinements required (which I volunteer to perform, if needed).
>>> Whereas the current script produces what is practically an HTML
>>> document, pandoc would give us a well-formed, much neater MediaWiki
>>> document.
>>> This would greatly simplify further improvements in the wikification of
>>> the guide, like the adaptation to ArchWiki's style standards.
>>>
>>> Thank you
>>>
>>> Dario
>>>
>> If you have a patch that I can apply, I'll try it out.
>> if the code using pandoc is more elegant, and the output result is
>> comparable (and/or better), i'm up for it.
>>
>> Dieter
> As requested, here's the patch. It is quite radical, partly because the
> previous script was creating a header that is no longer used in the
> wiki. As you can see, I've rewritten almost everything in Python,
> since I'm much more comfortable with regular expressions in that
> language; besides, the code is much more readable and flexible that
> way.
>
> NOTE 1: you will need to install the "pandoc" package, currently in
> the AUR: http://aur.archlinux.org/packages.php?ID=32490
>
> NOTE 2: the patch has been committed on the "develop" branch.
>
>
> From 593842cd1182ae0342efc9356477b16739641455 Mon Sep 17 00:00:00 2001
> From: Dario Giovannetti <dariogiova at gmail.com>
> Date: Wed, 16 May 2012 23:12:50 +0200
> Subject: [PATCH 50/50] revise the automatic procedure for creating the
>  MediaWiki version of the installation guide
>
> ---
>  README                             |    3 +-
>  doc/official_installation_guide_en |    3 +-
>  make-doc.sh                        |   49 ++-------------
>  make_doc_fixes.py                  |  122 ++++++++++++++++++++++++++++++++++++
>  4 files changed, 131 insertions(+), 46 deletions(-)
>  create mode 100755 make_doc_fixes.py
>
> diff --git a/README b/README
> index 6981de7..35456cd 100644
> --- a/README
> +++ b/README
> @@ -19,7 +19,8 @@ Homepage:    http://github.com/Dieterbe/aif
>   - libui-sh
>   - iproute2
>  Optionally:
> - - markdown: to generate the html installation guide
> + - pandoc: to generate the MediaWiki installation guide
> + - python: to generate the MediaWiki installation guide
>   - cryptsetup: for encryption support
>   - lvm2: for LVM support
>   - dhcpd: for dhcp networking support
> diff --git a/doc/official_installation_guide_en b/doc/official_installation_guide_en
> index 0faea07..5b2e580 100644
> --- a/doc/official_installation_guide_en
> +++ b/doc/official_installation_guide_en
> @@ -2,7 +2,7 @@
>
>  General installation documentation for the Arch Linux distribution.
>
> -This guide is only valid for release 2010.05 or newer.
> +This guide is only valid for release 2011.08 or newer.
>  This guide is maintained in [aif git](http://projects.archlinux.org/?p=aif.git)
>  Git pull requests, patches, comments are welcome on the arch
>  [releng mailing list](http://www.archlinux.org/mailman/listinfo/arch-releng)
> @@ -218,6 +218,7 @@ You can find more info on the wiki
>  [Community contributed documentation](http://wiki.archlinux.org/index.php/Archiso-as-pxe-server)
>
>  (this section could be a bit more elaborate)
> +
>  ### Client
>
>  Configure your system to try network booting (pxe) first.
> diff --git a/make-doc.sh b/make-doc.sh
> index 4e6c5a2..39914f5 100755
> --- a/make-doc.sh
> +++ b/make-doc.sh
> @@ -1,50 +1,11 @@
>  #!/bin/sh
> -which markdown &>/dev/null || echo "Need markdown utility!" >&2
> +which pandoc &>/dev/null || echo "Need pandoc utility!" >&2
> +
> +echo "generating mediawiki document..."
>
> -echo "generating html..."
>  for i in doc/official_installation_guide_??
>  do
>      echo $i
> -    # convert markdown to html, convert html links to wiki ones.
> -    cat $i | markdown | sed 's|<a href="\([^"]*\)"[^>]*>\([^<]*\)</a>|[\1 \2]|g' > $i.html
> -    # turn code markup into a syntax that mediawiki understands
> -    sed -i 's#<pre><code>#<pre>#g' $i.html
> -    sed -i 's#</code></pre>#</pre>#g' $i.html
> -
> +    # convert markdown to mediawiki and perform further adaptations
> +    cat $i | pandoc -f markdown -t mediawiki | xargs -0 ./make_doc_fixes.py $i > $i.mw
>  done
> -
> -echo "adding special wiki thingies..."
> -
> -i=doc/official_installation_guide_en
> -echo $i
> -
> -
> -summary_begin='<p><strong>Article summary<\/strong><\/p>'
> -summary_end_plus_one='<p><strong>Related articles<\/strong><\/p>'
> -related_begin='<p><strong>Related articles<\/strong><\/p>'
> -related_end_plus_one='<h1>Introduction<\/h1>'
> -
> -summary=`sed -n "/$summary_begin/, /$summary_end_plus_one/p;" $i.html | sed "/$summary_begin/d; /$summary_end_plus_one/d"`
> -related=`sed -n "/$related_begin/, /$related_end_plus_one/p;" $i.html | sed "/$related_begin/d; /$related_end_plus_one/d"`
> -
> -# prepare $related for wikiing.
> -# note that like this we always keep the absulolute url's even if they are on the same subdomain eg: {{Article summary wiki|http://foo/bar bar}} (note).
> -# wiki renders absolute url a bit uglier.  always having absolute url's is not needed if the page can be looked up on the same wiki, but like this it was simplest to implement..
> -related=`echo "$related"| sed -e 's#<p>\[\(.*\)\] \(.*\)<\/p>#{{Article summary wiki|\1}} \2#'`
> -
> -# preare $summary for wiiking: replace email address by nice mailto links
> -summary=`echo "$summary" | sed 's/\([^"|, ]*@[-A-Za-z0-9_.]*\)/[mailto:\1 \1]/'`
> -
> -
> -echo -e "[[Category:Getting and installing Arch (English)]]\n[[Category:HOWTOs (English)]]
> -[[Category:Accessibility (English)]]
> -[[Category:Website Resources]]
> -{{Article summary start}}\n{{Article summary text| 1=$summary}}\n{{Article summary heading|Available Languages}}\n
> -{{i18n_entry|English|Official Arch Linux Install Guide}}\n
> -{{Article summary heading|Related articles}}
> -$related
> -{{Article summary end}}" | cat - $i.html > $i.html.tmp && mv $i.html.tmp $i.html
> -
> -# remove summary and related articles from actual content
> -sed "/$summary_end_plus_one/p; /$summary_begin/, /$summary_end_plus_one/d" $i.html > $i.html.tmp && mv $i.html.tmp $i.html
> -sed "/$related_end_plus_one/p; /$related_begin/, /$related_end_plus_one/d" $i.html > $i.html.tmp && mv $i.html.tmp $i.html
> diff --git a/make_doc_fixes.py b/make_doc_fixes.py
> new file mode 100755
> index 0000000..a7200a0
> --- /dev/null
> +++ b/make_doc_fixes.py
> @@ -0,0 +1,122 @@
> +#!/usr/bin/env python3
> +"""
> +This script is not meant to be run as a standalone application, instead it is
> +called by make-doc.sh to perform further adaptations to the MediaWiki version
> +of the installation guide.
> +"""
> +
> +import sys
> +import re
> +
> +FILENAME = sys.argv[1]
> +INPUT = sys.argv[2]
> +
> +# Used in fix_multiline_list_items
> +LIST_REGEXP = "^([\:\*#]+.+<br(?: /)?>)( *\n)"
> +LIST_REPLACE = "\g<1>"
> +
> +# Used in wikify_internal_links
> +LINK_REGEXP = "\[{baseurl}([^\]\s]+?) ([^\]\n]+?)\]"
> +LINK_REPLACE = "[[\g<1>|\g<2>]]"
> +
> +# If a translation of the guide is added, a proper entry should be added to
> +# this dictionary; the key names must be 2-character language tags
> +LANGFIXES = {
> +    "en": {
> +        "baseurl": "https?://wiki\.archlinux\.org/index\.php/",  # regexp
> +        "header": """\
> +[[Category:Getting and installing Arch]]
> +[[fr:Guide officiel de l'installation]]
> +[[ro:Ghid de instalare oficial]]
> +{{i18n|Official Installation Guide}}
> +""",  # string
> +        "intro": """The Official Installation Guide is maintained in [http://projects.archlinux.org/aif.git/ aif.git].
> +
> +The version included with the latest [http://www.archlinux.org/download/ release] (2011.08.19) can be found [http://projects.archlinux.org/aif.git/plain/doc/official_installation_guide_en?id=13c8c0813328eb8f52b03b3c53a32f1f40558021 here].
> +
> +The latest version can be found [http://projects.archlinux.org/aif.git/plain/doc/official_installation_guide_en here].
> +
> +The (unofficial) [[Beginners' Guide]] provides a thorough walkthrough of the installation and configuration process.
> +
> +""",
> +        "summary_heading": None,  # must be None only for English
> +        "summary": "'''Article summary'''",  # string
> +        "related": "'''Related articles'''",  # string
> +        "introduction": "= Introduction =",  # string
> +    },
> +}
> +
> +
> +def fix_multiline_list_items(text):
> +    """
> +    pandoc doesn't convert multiline list items correctly, so this function
> +    compensates for that.
> +    """
> +    test = ""
> +    # It's necessary to run this multiple times because of how the regular
> +    # expression is designed
> +    while text != test:
> +        test = text
> +        text = re.sub(LIST_REGEXP, LIST_REPLACE, text, flags=re.MULTILINE)
> +    return text
> +
> +
> +def wikify_internal_links(text, patches):
> +    """
> +    Turns external links that point to the local subdomain into proper internal
> +    links.
> +    """
> +    regexp = LINK_REGEXP.format(**patches)
> +    text = re.sub(regexp, LINK_REPLACE, text)
> +    return text
> +
> +
> +def insert_header(text, patches):
> +    """
> +    Inserts the standard article header.
> +    """
> +    text = patches["header"] + text
> +    return text
> +
> +
> +def assemble_summary(text, patches):
> +    """
> +    Converts the article summary and related links into a standard summary
> +    """
> +    # NOTE: this function requires some fixes if more languages are added
> +    part_a = text.partition(patches["summary"])
> +    part_b = part_a[2].partition(patches["related"])
> +    part_c = part_b[2].partition(patches["introduction"])
> +    related_links = part_c[0].strip().split("\n")
> +    summary_heading = ("|" + patches["summary_heading"]
> +                       if (patches["summary_heading"])
> +                       else "")
> +    summary_text = part_b[0].strip()
> +    related = "\n".join(["{{{{Article summary text|1={}}}}}".format(r)
> +                         for r in related_links])
> +    summary = """{{{{Article summary start{}}}}}
> +{{{{Article summary text|1={}}}}}
> +{{{{Article summary heading|Related articles}}}}
> +{}
> +{{{{Article summary end}}}}
> +
> +""".format(summary_heading , summary_text, related)
> +    text = part_a[0] + summary + patches["intro"] + part_c[1] + part_c[2]
> +    return text
> +
> +
> +def main(filename, text):
> +    """
> +    Main function
> +    """
> +    language = filename[-2:]
> +    text = fix_multiline_list_items(text)
> +    if language in LANGFIXES:
> +        patches = LANGFIXES[language]
> +        text = wikify_internal_links(text, patches)
> +        text = insert_header(text, patches)
> +        text = assemble_summary(text, patches)
> +    return text
> +
> +if __name__ == "__main__":
> +    print(main(FILENAME, INPUT))
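
To illustrate what the two regular-expression helpers in the patch above do, here is a minimal standalone sketch. The patterns are copied from make_doc_fixes.py (written as raw strings here); the sample input strings are made up for illustration and are not actual pandoc output, and the BASEURL constant simply stands in for the "baseurl" entry of LANGFIXES["en"].

```python
import re

# Patterns copied from make_doc_fixes.py, written here as raw strings.
LIST_REGEXP = r"^([\:\*#]+.+<br(?: /)?>)( *\n)"
LIST_REPLACE = r"\g<1>"
LINK_REGEXP = r"\[{baseurl}([^\]\s]+?) ([^\]\n]+?)\]"
LINK_REPLACE = r"[[\g<1>|\g<2>]]"
# Stand-in for LANGFIXES["en"]["baseurl"] (itself a regular expression).
BASEURL = r"https?://wiki\.archlinux\.org/index\.php/"


def fix_multiline_list_items(text):
    """Join list items that were split after a <br /> onto a single line."""
    previous = None
    # Each pass removes at most one line break per list item, so repeat
    # until the text stops changing.
    while text != previous:
        previous = text
        text = re.sub(LIST_REGEXP, LIST_REPLACE, text, flags=re.MULTILINE)
    return text


def wikify_internal_links(text, baseurl=BASEURL):
    """Turn [baseurl/Page label] external links into [[Page|label]]."""
    return re.sub(LINK_REGEXP.format(baseurl=baseurl), LINK_REPLACE, text)


# Hypothetical inputs, only meant to exercise the two patterns.
sample_list = "* first line<br />\ncontinued<br />\nend\n* second\n"
sample_links = ("[http://wiki.archlinux.org/index.php/Archiso-as-pxe-server "
                "Community contributed documentation] and "
                "[http://www.archlinux.org/download/ release]")

print(fix_multiline_list_items(sample_list))
print(wikify_internal_links(sample_links))
```

In this sketch the three-line list item collapses into one line, the link to the wiki becomes an internal link, and the link to www.archlinux.org is left as an external one.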

I've opened a bug report for this request with an _updated_ patch: 
https://bugs.archlinux.org/task/30045?project=6

Patch: https://bugs.archlinux.org/task/30045?getfile=8834

Dario


