[arch-releng] Use pandoc instead of make-doc.sh

Dario Giovannetti dariogiova at gmail.com
Wed Jun 13 07:12:04 EDT 2012


On 28/05/12 15:45, Dario Giovannetti wrote:
> On 16/05/12 23:41, Dario Giovannetti wrote:
>> On 15/05/12 22:40, Dieter Plaetinck wrote:
>>> On Mon, 14 May 2012 23:32:35 +0200
>>> Dario Giovannetti<dariogiova at gmail.com>  wrote:
>>>
>>>> I would like to propose using pandoc
>>>> (http://johnmacfarlane.net/pandoc/) instead of the make-doc.sh script
>>>> for converting the installation guide (Markdown syntax) to the
>>>> document hosted in the wiki (MediaWiki syntax).
>>>> I've tested it and the result looks pretty good, with only a few minor
>>>> manual refinements required (which I volunteer to perform, if needed).
>>>> The current script produces what is practically an HTML document;
>>>> with pandoc we would get a well-formed, much neater MediaWiki
>>>> document.
>>>> This would greatly simplify further improvements in the wikification
>>>> of the guide, such as adapting it to ArchWiki's style standards.
>>>>
>>>> Thank you
>>>>
>>>> Dario
>>>>
>>> If you have a patch that I can apply, I'll try it out.
>>> If the code using pandoc is more elegant, and the output is
>>> comparable (and/or better), I'm up for it.
>>>
>>> Dieter
>> As requested, here's the patch. It's quite radical, partly because the
>> previous script was creating a header that is no longer used in the
>> wiki. As you can see, I've rewritten almost everything in Python,
>> since I'm much more comfortable handling regular expressions in that
>> language; besides, the code is much more readable and flexible that
>> way.
>>
>> NOTE 1: you will need to install the "pandoc" package, currently
>> in the AUR: http://aur.archlinux.org/packages.php?ID=32490
>>
>> NOTE 2: the patch has been committed on the "develop" branch.
>>
>>
>> From 593842cd1182ae0342efc9356477b16739641455 Mon Sep 17 00:00:00 2001
>> From: Dario Giovannetti <dariogiova at gmail.com>
>> Date: Wed, 16 May 2012 23:12:50 +0200
>> Subject: [PATCH 50/50] revise the automatic procedure for creating the
>>  MediaWiki version of the installation guide
>>
>> ---
>>  README                             |    3 +-
>>  doc/official_installation_guide_en |    3 +-
>>  make-doc.sh                        |   49 ++-------------
>>  make_doc_fixes.py                  |  122 ++++++++++++++++++++++++++++++++++++
>>  4 files changed, 131 insertions(+), 46 deletions(-)
>>  create mode 100755 make_doc_fixes.py
>>
>> diff --git a/README b/README
>> index 6981de7..35456cd 100644
>> --- a/README
>> +++ b/README
>> @@ -19,7 +19,8 @@ Homepage:    http://github.com/Dieterbe/aif
>>   - libui-sh
>>   - iproute2
>>  Optionally:
>> - - markdown: to generate the html installation guide
>> + - pandoc: to generate the MediaWiki installation guide
>> + - python: to generate the MediaWiki installation guide
>>   - cryptsetup: for encryption support
>>   - lvm2: for LVM support
>>   - dhcpd: for dhcp networking support
>> diff --git a/doc/official_installation_guide_en 
>> b/doc/official_installation_guide_en
>> index 0faea07..5b2e580 100644
>> --- a/doc/official_installation_guide_en
>> +++ b/doc/official_installation_guide_en
>> @@ -2,7 +2,7 @@
>>
>>  General installation documentation for the Arch Linux distribution.
>>
>> -This guide is only valid for release 2010.05 or newer.
>> +This guide is only valid for release 2011.08 or newer.
>>  This guide is maintained in [aif 
>> git](http://projects.archlinux.org/?p=aif.git)
>>  Git pull requests, patches, comments are welcome on the arch
>>  [releng mailing 
>> list](http://www.archlinux.org/mailman/listinfo/arch-releng)
>> @@ -218,6 +218,7 @@ You can find more info on the wiki
>>  [Community contributed 
>> documentation](http://wiki.archlinux.org/index.php/Archiso-as-pxe-server)
>>
>>  (this section could be a bit more elaborate)
>> +
>>  ### Client
>>
>>  Configure your system to try network booting (pxe) first.
>> diff --git a/make-doc.sh b/make-doc.sh
>> index 4e6c5a2..39914f5 100755
>> --- a/make-doc.sh
>> +++ b/make-doc.sh
>> @@ -1,50 +1,11 @@
>>  #!/bin/sh
>> -which markdown &>/dev/null || echo "Need markdown utility!" >&2
>> +which pandoc &>/dev/null || echo "Need pandoc utility!" >&2
>> +
>> +echo "generating mediawiki document..."
>>
>> -echo "generating html..."
>>  for i in doc/official_installation_guide_??
>>  do
>>      echo $i
>> -    # convert markdown to html, convert html links to wiki ones.
>> -    cat $i | markdown | sed 's|<a 
>> href="\([^"]*\)"[^>]*>\([^<]*\)</a>|[\1 \2]|g' > $i.html
>> -    # turn code markup into a syntax that mediawiki understands
>> -    sed -i 's#<pre><code>#<pre>#g' $i.html
>> -    sed -i 's#</code></pre>#</pre>#g' $i.html
>> -
>> +    # convert markdown to mediawiki and perform further adaptations
>> +    cat $i | pandoc -f markdown -t mediawiki | xargs -0 ./make_doc_fixes.py $i > $i.mw
>>  done
>> -
>> -echo "adding special wiki thingies..."
>> -
>> -i=doc/official_installation_guide_en
>> -echo $i
>> -
>> -
>> -summary_begin='<p><strong>Article summary<\/strong><\/p>'
>> -summary_end_plus_one='<p><strong>Related articles<\/strong><\/p>'
>> -related_begin='<p><strong>Related articles<\/strong><\/p>'
>> -related_end_plus_one='<h1>Introduction<\/h1>'
>> -
>> -summary=`sed -n "/$summary_begin/, /$summary_end_plus_one/p;" 
>> $i.html | sed "/$summary_begin/d; /$summary_end_plus_one/d"`
>> -related=`sed -n "/$related_begin/, /$related_end_plus_one/p;" 
>> $i.html | sed "/$related_begin/d; /$related_end_plus_one/d"`
>> -
>> -# prepare $related for wikiing.
>> -# note that like this we always keep the absulolute url's even if 
>> they are on the same subdomain eg: {{Article summary 
>> wiki|http://foo/bar bar}} (note).
>> -# wiki renders absolute url a bit uglier.  always having absolute 
>> url's is not needed if the page can be looked up on the same wiki, 
>> but like this it was simplest to implement..
>> -related=`echo "$related"| sed -e 's#<p>\[\(.*\)\] 
>> \(.*\)<\/p>#{{Article summary wiki|\1}} \2#'`
>> -
>> -# preare $summary for wiiking: replace email address by nice mailto 
>> links
>> -summary=`echo "$summary" | sed 's/\([^"|, 
>> ]*@[-A-Za-z0-9_.]*\)/[mailto:\1 \1]/'`
>> -
>> -
>> -echo -e "[[Category:Getting and installing Arch 
>> (English)]]\n[[Category:HOWTOs (English)]]
>> -[[Category:Accessibility (English)]]
>> -[[Category:Website Resources]]
>> -{{Article summary start}}\n{{Article summary text| 
>> 1=$summary}}\n{{Article summary heading|Available Languages}}\n
>> -{{i18n_entry|English|Official Arch Linux Install Guide}}\n
>> -{{Article summary heading|Related articles}}
>> -$related
>> -{{Article summary end}}" | cat - $i.html > $i.html.tmp && mv 
>> $i.html.tmp $i.html
>> -
>> -# remove summary and related articles from actual content
>> -sed "/$summary_end_plus_one/p; /$summary_begin/, 
>> /$summary_end_plus_one/d" $i.html > $i.html.tmp && mv $i.html.tmp 
>> $i.html
>> -sed "/$related_end_plus_one/p; /$related_begin/, 
>> /$related_end_plus_one/d" $i.html > $i.html.tmp && mv $i.html.tmp 
>> $i.html
>> diff --git a/make_doc_fixes.py b/make_doc_fixes.py
>> new file mode 100755
>> index 0000000..a7200a0
>> --- /dev/null
>> +++ b/make_doc_fixes.py
>> @@ -0,0 +1,122 @@
>> +#!/usr/bin/env python3
>> +"""
>> +This script is not meant to be run as a standalone application, instead it is
>> +called by make-doc.sh to perform further adaptations to the MediaWiki version
>> +of the installation guide.
>> +"""
>> +
>> +import sys
>> +import re
>> +
>> +FILENAME = sys.argv[1]
>> +INPUT = sys.argv[2]
>> +
>> +# Used in fix_multiline_list_items
>> +LIST_REGEXP = "^([\:\*#]+.+<br(?: /)?>)( *\n)"
>> +LIST_REPLACE = "\g<1>"
>> +
>> +# Used in wikify_internal_links
>> +LINK_REGEXP = "\[{baseurl}([^\]\s]+?) ([^\]\n]+?)\]"
>> +LINK_REPLACE = "[[\g<1>|\g<2>]]"
>> +
>> +# If a translation of the guide is added, a proper entry should be added to
>> +# this dictionary; the key names must be 2-character language tags
>> +LANGFIXES = {
>> +    "en": {
>> +        "baseurl": "https?://wiki\.archlinux\.org/index\.php/",  # regexp
>> +        "header": """\
>> +[[Category:Getting and installing Arch]]
>> +[[fr:Guide officiel de l'installation]]
>> +[[ro:Ghid de instalare oficial]]
>> +{{i18n|Official Installation Guide}}
>> +""",  # string
>> +        "intro": """The Official Installation Guide is maintained in [http://projects.archlinux.org/aif.git/ aif.git].
>> +
>> +The version included with the latest [http://www.archlinux.org/download/ release] (2011.08.19) can be found [http://projects.archlinux.org/aif.git/plain/doc/official_installation_guide_en?id=13c8c0813328eb8f52b03b3c53a32f1f40558021 here].
>> +
>> +The latest version can be found [http://projects.archlinux.org/aif.git/plain/doc/official_installation_guide_en here].
>> +
>> +The (unofficial) [[Beginners' Guide]] provides a thorough walkthrough of the installation and configuration process.
>> +
>> +""",
>> +        "summary_heading": None,  # must be None only for English
>> +        "summary": "'''Article summary'''",  # string
>> +        "related": "'''Related articles'''",  # string
>> +        "introduction": "= Introduction =",  # string
>> +    },
>> +}
>> +
>> +
>> +def fix_multiline_list_items(text):
>> +    """
>> +    pandoc doesn't convert multiline list items correctly, so this function
>> +    compensates for that.
>> +    """
>> +    test = ""
>> +    # It's necessary to run this multiple times because of how the regular
>> +    # expression is designed
>> +    while text != test:
>> +        test = text
>> +        text = re.sub(LIST_REGEXP, LIST_REPLACE, text, flags=re.MULTILINE)
>> +    return text
>> +
>> +
>> +def wikify_internal_links(text, patches):
>> +    """
>> +    Turns external links that point to the local subdomain into proper internal
>> +    links.
>> +    """
>> +    regexp = LINK_REGEXP.format(**patches)
>> +    text = re.sub(regexp, LINK_REPLACE, text)
>> +    return text
>> +
>> +
>> +def insert_header(text, patches):
>> +    """
>> +    Inserts the standard article header.
>> +    """
>> +    text = patches["header"] + text
>> +    return text
>> +
>> +
>> +def assemble_summary(text, patches):
>> +    """
>> +    Converts the article summary and related links into a standard summary
>> +    """
>> +    # NOTE: this function requires some fixes if more languages are added
>> +    part_a = text.partition(patches["summary"])
>> +    part_b = part_a[2].partition(patches["related"])
>> +    part_c = part_b[2].partition(patches["introduction"])
>> +    related_links = part_c[0].strip().split("\n")
>> +    summary_heading = ("|" + patches["summary_heading"]
>> +                       if (patches["summary_heading"])
>> +                       else "")
>> +    summary_text = part_b[0].strip()
>> +    related = "\n".join(["{{{{Article summary text|1={}}}}}".format(r)
>> +                         for r in related_links])
>> +    summary = """{{{{Article summary start{}}}}}
>> +{{{{Article summary text|1={}}}}}
>> +{{{{Article summary heading|Related articles}}}}
>> +{}
>> +{{{{Article summary end}}}}
>> +
>> +""".format(summary_heading , summary_text, related)
>> +    text = part_a[0] + summary + patches["intro"] + part_c[1] + part_c[2]
>> +    return text
>> +
>> +
>> +def main(filename, text):
>> +    """
>> +    Main function
>> +    """
>> +    language = filename[-2:]
>> +    text = fix_multiline_list_items(text)
>> +    if language in LANGFIXES:
>> +        patches = LANGFIXES[language]
>> +        text = wikify_internal_links(text, patches)
>> +        text = insert_header(text, patches)
>> +        text = assemble_summary(text, patches)
>> +    return text
>> +
>> +if __name__ == "__main__":
>> +    print(main(FILENAME, INPUT))
>
> I've opened a bug report for this request with an _updated_ patch: 
> https://bugs.archlinux.org/task/30045?project=6
>
> Patch: https://bugs.archlinux.org/task/30045?getfile=8834
>
> Dario
>
I've updated the patch to follow the latest ArchWiki internationalization
standards:

https://bugs.archlinux.org/task/30045?getfile=8909
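
For anyone who wants to check the internal-link rewriting without
applying the patch, below is a minimal standalone sketch of what
wikify_internal_links does. The patterns are the ones from the first
version of the patch quoted above, written as raw strings; the sample
text and the default "baseurl" argument are only illustrative
assumptions, and the updated attachment may differ in detail.

```python
import re

# Patterns copied from LINK_REGEXP / LINK_REPLACE in the quoted patch,
# written here as raw strings so the backslashes reach the regex engine as-is.
BASEURL = r"https?://wiki\.archlinux\.org/index\.php/"
LINK_REGEXP = r"\[{baseurl}([^\]\s]+?) ([^\]\n]+?)\]"
LINK_REPLACE = r"[[\g<1>|\g<2>]]"


def wikify_internal_links(text, baseurl=BASEURL):
    """Turn external links that point to the wiki itself into internal links.

    The default baseurl is an assumption for this sketch; in the patch it
    comes from the per-language LANGFIXES dictionary.
    """
    regexp = LINK_REGEXP.format(baseurl=baseurl)
    return re.sub(regexp, LINK_REPLACE, text)


# Illustrative input: one link to the wiki, one link to another host.
sample = (
    "See [http://wiki.archlinux.org/index.php/Archiso-as-pxe-server "
    "Community contributed documentation] and the "
    "[http://www.archlinux.org/download/ release] page."
)
print(wikify_internal_links(sample))
```

Only the first link is rewritten (to
[[Archiso-as-pxe-server|Community contributed documentation]]); the
second one points to a different host, so it is left as an external
link.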

Dario


