[arch-releng] Use pandoc instead of make-doc.sh
Dario Giovannetti
dariogiova at gmail.com
Mon May 28 09:47:54 EDT 2012
On 16/05/12 23:41, Dario Giovannetti wrote:
> On 15/05/12 22:40, Dieter Plaetinck wrote:
>> On Mon, 14 May 2012 23:32:35 +0200
>> Dario Giovannetti<dariogiova at gmail.com> wrote:
>>
>>> I would like to propose using pandoc (
>>> http://johnmacfarlane.net/pandoc/
>>> ) instead of the make-doc.sh script for converting the installation
>>> guide (Markdown syntax) to the document hosted in the wiki (MediaWiki
>>> syntax).
>>> I've tested it and the result looks pretty good, with only a few minor
>>> manual refinements required (which I volunteer to perform, if needed):
>>> instead of the current script, which practically produces an html
>>> document, we would get a correctly-formed, much neater MediaWiki
>>> document.
>>> This would greatly simplify further improvements in the wikification of
>>> the guide, like the adaptation to ArchWiki's style standards.
>>>
>>> Thank you
>>>
>>> Dario
>>>
>> If you have a patch that I can apply, I'll try it out.
>> If the code using pandoc is more elegant, and the output is
>> comparable (or better), I'm up for it.
>>
>> Dieter
> As requested, here's the patch: it's quite radical also because the
> previous script was creating a header that is not used any longer in
> the wiki. As you can see I've rewritten almost everything in Python,
> since I'm much more comfortable in that language with regular
> expressions; besides, that way the code is much more readable and
> flexible.
>
> NOTE 1: you will need to install the "pandoc" package, currently in
> the AUR: http://aur.archlinux.org/packages.php?ID=32490
>
> NOTE 2: the patch has been committed on the "develop" branch.
>
>
> From 593842cd1182ae0342efc9356477b16739641455 Mon Sep 17 00:00:00 2001
> From: Dario Giovannetti <dariogiova at gmail.com>
> Date: Wed, 16 May 2012 23:12:50 +0200
> Subject: [PATCH 50/50] revise the automatic procedure for creating the
> MediaWiki version of the installation guide
>
> ---
> README | 3 +-
> doc/official_installation_guide_en | 3 +-
> make-doc.sh | 49 ++-------------
> make_doc_fixes.py | 122 ++++++++++++++++++++++++++++++++++++
> 4 files changed, 131 insertions(+), 46 deletions(-)
> create mode 100755 make_doc_fixes.py
>
> diff --git a/README b/README
> index 6981de7..35456cd 100644
> --- a/README
> +++ b/README
> @@ -19,7 +19,8 @@ Homepage: http://github.com/Dieterbe/aif
> - libui-sh
> - iproute2
> Optionally:
> - - markdown: to generate the html installation guide
> + - pandoc: to generate the MediaWiki installation guide
> + - python: to generate the MediaWiki installation guide
> - cryptsetup: for encryption support
> - lvm2: for LVM support
> - dhcpd: for dhcp networking support
> diff --git a/doc/official_installation_guide_en b/doc/official_installation_guide_en
> index 0faea07..5b2e580 100644
> --- a/doc/official_installation_guide_en
> +++ b/doc/official_installation_guide_en
> @@ -2,7 +2,7 @@
>
> General installation documentation for the Arch Linux distribution.
>
> -This guide is only valid for release 2010.05 or newer.
> +This guide is only valid for release 2011.08 or newer.
> This guide is maintained in [aif git](http://projects.archlinux.org/?p=aif.git)
> Git pull requests, patches, comments are welcome on the arch [releng mailing list](http://www.archlinux.org/mailman/listinfo/arch-releng)
> @@ -218,6 +218,7 @@ You can find more info on the wiki
> [Community contributed documentation](http://wiki.archlinux.org/index.php/Archiso-as-pxe-server)
>
> (this section could be a bit more elaborate)
> +
> ### Client
>
> Configure your system to try network booting (pxe) first.
> diff --git a/make-doc.sh b/make-doc.sh
> index 4e6c5a2..39914f5 100755
> --- a/make-doc.sh
> +++ b/make-doc.sh
> @@ -1,50 +1,11 @@
> #!/bin/sh
> -which markdown &>/dev/null || echo "Need markdown utility!" >&2
> +which pandoc &>/dev/null || echo "Need pandoc utility!" >&2
> +
> +echo "generating mediawiki document..."
>
> -echo "generating html..."
> for i in doc/official_installation_guide_??
> do
> echo $i
> - # convert markdown to html, convert html links to wiki ones.
> - cat $i | markdown | sed 's|<a href="\([^"]*\)"[^>]*>\([^<]*\)</a>|[\1 \2]|g' > $i.html
> - # turn code markup into a syntax that mediawiki understands
> - sed -i 's#<pre><code>#<pre>#g' $i.html
> - sed -i 's#</code></pre>#</pre>#g' $i.html
> -
> + # convert markdown to mediawiki and perform further adaptations
> + cat $i | pandoc -f markdown -t mediawiki | xargs -0 ./make_doc_fixes.py $i > $i.mw
> done
> -
> -echo "adding special wiki thingies..."
> -
> -i=doc/official_installation_guide_en
> -echo $i
> -
> -
> -summary_begin='<p><strong>Article summary<\/strong><\/p>'
> -summary_end_plus_one='<p><strong>Related articles<\/strong><\/p>'
> -related_begin='<p><strong>Related articles<\/strong><\/p>'
> -related_end_plus_one='<h1>Introduction<\/h1>'
> -
> -summary=`sed -n "/$summary_begin/, /$summary_end_plus_one/p;" $i.html | sed "/$summary_begin/d; /$summary_end_plus_one/d"`
> -related=`sed -n "/$related_begin/, /$related_end_plus_one/p;" $i.html | sed "/$related_begin/d; /$related_end_plus_one/d"`
> -
> -# prepare $related for wikiing.
> -# note that like this we always keep the absulolute url's even if they are on the same subdomain eg: {{Article summary wiki|http://foo/bar bar}} (note).
> -# wiki renders absolute url a bit uglier. always having absolute url's is not needed if the page can be looked up on the same wiki, but like this it was simplest to implement..
> -related=`echo "$related"| sed -e 's#<p>\[\(.*\)\] \(.*\)<\/p>#{{Article summary wiki|\1}} \2#'`
> -
> -# preare $summary for wiiking: replace email address by nice mailto links
> -summary=`echo "$summary" | sed 's/\([^"|, ]*@[-A-Za-z0-9_.]*\)/[mailto:\1 \1]/'`
> -
> -
> -echo -e "[[Category:Getting and installing Arch
> (English)]]\n[[Category:HOWTOs (English)]]
> -[[Category:Accessibility (English)]]
> -[[Category:Website Resources]]
> -{{Article summary start}}\n{{Article summary text|
> 1=$summary}}\n{{Article summary heading|Available Languages}}\n
> -{{i18n_entry|English|Official Arch Linux Install Guide}}\n
> -{{Article summary heading|Related articles}}
> -$related
> -{{Article summary end}}" | cat - $i.html > $i.html.tmp && mv
> $i.html.tmp $i.html
> -
> -# remove summary and related articles from actual content
> -sed "/$summary_end_plus_one/p; /$summary_begin/, /$summary_end_plus_one/d" $i.html > $i.html.tmp && mv $i.html.tmp $i.html
> -sed "/$related_end_plus_one/p; /$related_begin/, /$related_end_plus_one/d" $i.html > $i.html.tmp && mv $i.html.tmp $i.html
> diff --git a/make_doc_fixes.py b/make_doc_fixes.py
> new file mode 100755
> index 0000000..a7200a0
> --- /dev/null
> +++ b/make_doc_fixes.py
> @@ -0,0 +1,122 @@
> +#!/usr/bin/env python3
> +"""
> +This script is not meant to be run as a standalone application, instead it is
> +called by make-doc.sh to perform further adaptations to the MediaWiki version
> +of the installation guide.
> +"""
> +
> +import sys
> +import re
> +
> +FILENAME = sys.argv[1]
> +INPUT = sys.argv[2]
> +
> +# Used in fix_multiline_list_items
> +LIST_REGEXP = "^([\:\*#]+.+<br(?: /)?>)( *\n)"
> +LIST_REPLACE = "\g<1>"
> +
> +# Used in wikify_internal_links
> +LINK_REGEXP = "\[{baseurl}([^\]\s]+?) ([^\]\n]+?)\]"
> +LINK_REPLACE = "[[\g<1>|\g<2>]]"
> +
> +# If a translation of the guide is added, a proper entry should be added to
> +# this dictionary; the key names must be 2-character language tags
> +LANGFIXES = {
> +    "en": {
> +        "baseurl": "https?://wiki\.archlinux\.org/index\.php/",  # regexp
> +        "header": """\
> +[[Category:Getting and installing Arch]]
> +[[fr:Guide officiel de l'installation]]
> +[[ro:Ghid de instalare oficial]]
> +{{i18n|Official Installation Guide}}
> +""",  # string
> +        "intro": """The Official Installation Guide is maintained in [http://projects.archlinux.org/aif.git/ aif.git].
> +
> +The version included with the latest [http://www.archlinux.org/download/ release] (2011.08.19) can be found [http://projects.archlinux.org/aif.git/plain/doc/official_installation_guide_en?id=13c8c0813328eb8f52b03b3c53a32f1f40558021 here].
> +
> +The latest version can be found [http://projects.archlinux.org/aif.git/plain/doc/official_installation_guide_en here].
> +
> +The (unofficial) [[Beginners' Guide]] provides a thorough walkthrough of the the installation and configuration process.
> +
> +""",
> +        "summary_heading": None,  # must be None only for English
> +        "summary": "'''Article summary'''",  # string
> +        "related": "'''Related articles'''",  # string
> +        "introduction": "= Introduction =",  # string
> +    },
> +}
> +
> +
> +def fix_multiline_list_items(text):
> +    """
> +    pandoc doesn't convert multiline list items correctly, so this function
> +    compensates for that.
> +    """
> +    test = ""
> +    # It's necessary to run this multiple times because of how the regular
> +    # expression is designed
> +    while text != test:
> +        test = text
> +        text = re.sub(LIST_REGEXP, LIST_REPLACE, text, flags=re.MULTILINE)
> +    return text
> +
> +
> +def wikify_internal_links(text, patches):
> +    """
> +    Turns external links that point to the local subdomain into proper internal
> +    links.
> +    """
> +    regexp = LINK_REGEXP.format(**patches)
> +    text = re.sub(regexp, LINK_REPLACE, text)
> +    return text
> +
> +
> +def insert_header(text, patches):
> +    """
> +    Inserts the standard article header.
> +    """
> +    text = patches["header"] + text
> +    return text
> +
> +
> +def assemble_summary(text, patches):
> +    """
> +    Converts the article summary and related links into a standard summary
> +    """
> +    # NOTE: this function requires some fixes if more languages are added
> +    part_a = text.partition(patches["summary"])
> +    part_b = part_a[2].partition(patches["related"])
> +    part_c = part_b[2].partition(patches["introduction"])
> +    related_links = part_c[0].strip().split("\n")
> +    summary_heading = ("|" + patches["summary_heading"]
> +                       if (patches["summary_heading"])
> +                       else "")
> +    summary_text = part_b[0].strip()
> +    related = "\n".join(["{{{{Article summary text|1={}}}}}".format(r)
> +                         for r in related_links])
> +    summary = """{{{{Article summary start{}}}}}
> +{{{{Article summary text|1={}}}}}
> +{{{{Article summary heading|Related articles}}}}
> +{}
> +{{{{Article summary end}}}}
> +
> +""".format(summary_heading , summary_text, related)
> +    text = part_a[0] + summary + patches["intro"] + part_c[1] + part_c[2]
> +    return text
> +
> +
> +def main(filename, text):
> +    """
> +    Main function
> +    """
> +    language = filename[-2:]
> +    text = fix_multiline_list_items(text)
> +    if language in LANGFIXES:
> +        patches = LANGFIXES[language]
> +        text = wikify_internal_links(text, patches)
> +        text = insert_header(text, patches)
> +        text = assemble_summary(text, patches)
> +    return text
> +
> +if __name__ == "__main__":
> +    print(main(FILENAME, INPUT))
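
For anyone who wants to see what the two regular expressions in make_doc_fixes.py do without installing pandoc, here is a minimal standalone sketch. The patterns and the base URL are copied from the patch above; writing them as raw strings is my own choice for this sketch, and the sample inputs are hand-written illustrations, not real pandoc output.

```python
import re

# Patterns copied from the patch; raw strings are used here only so that
# the backslashes reach the re module unchanged.
LIST_REGEXP = r"^([\:\*#]+.+<br(?: /)?>)( *\n)"
LIST_REPLACE = r"\g<1>"
LINK_REGEXP = r"\[{baseurl}([^\]\s]+?) ([^\]\n]+?)\]"
LINK_REPLACE = r"[[\g<1>|\g<2>]]"
BASEURL = r"https?://wiki\.archlinux\.org/index\.php/"


def fix_multiline_list_items(text):
    """Join list items that were split after a <br> tag.

    Each re.sub pass can only join one continuation line per list item,
    because the joined line is not rescanned within the same pass, so the
    substitution is repeated until the text stops changing.
    """
    previous = None
    while text != previous:
        previous = text
        text = re.sub(LIST_REGEXP, LIST_REPLACE, text, flags=re.MULTILINE)
    return text


def wikify_internal_links(text, baseurl=BASEURL):
    """Turn [<baseurl>Page label] into [[Page|label]]."""
    regexp = LINK_REGEXP.format(baseurl=baseurl)
    return re.sub(regexp, LINK_REPLACE, text)


if __name__ == "__main__":
    # Hypothetical sample inputs, written by hand for illustration.
    sample_list = "* a<br />\nb<br />\nc\n"
    sample_link = (
        "[http://wiki.archlinux.org/index.php/Archiso-as-pxe-server "
        "Community contributed documentation]"
    )
    print(repr(fix_multiline_list_items(sample_list)))
    print(wikify_internal_links(sample_link))
```

The first sample needs two substitution passes before the text stops changing, which is why the function loops. Links that do not start with the wiki base URL are left as plain external links.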
I've opened a bug report for this request with an _updated_ patch:
https://bugs.archlinux.org/task/30045?project=6
Patch: https://bugs.archlinux.org/task/30045?getfile=8834
Dario