On 16/05/12 23:41, Dario Giovannetti wrote:
On Mon, 14 May 2012 23:32:35 +0200 Dario Giovannetti <dariogiova@gmail.com> wrote:
I would like to propose using pandoc ( http://johnmacfarlane.net/pandoc/ ) instead of the make-doc.sh script for converting the installation guide (Markdown syntax) to the document hosted in the wiki (MediaWiki syntax). I've tested it and the result looks pretty good, with only a few minor manual refinements required (which I volunteer to perform, if needed). The current script practically produces an HTML document; with pandoc we would get a well-formed, much neater MediaWiki document. This would greatly simplify further improvements in the wikification of the guide, such as adapting it to ArchWiki's style standards.
Thank you
Dario
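For illustration, the proposed conversion step could be driven from Python roughly as follows. This is only a minimal sketch, not part of the proposal itself: the helper names pandoc_command and markdown_to_mediawiki are hypothetical, and it assumes the pandoc executable is reachable through PATH. The "-f markdown -t mediawiki" options are the ones that appear in the patch further down this thread.

```python
import shutil
import subprocess


def pandoc_command(source_format="markdown", target_format="mediawiki"):
    """Build the pandoc argument list (hypothetical helper)."""
    return ["pandoc", "-f", source_format, "-t", target_format]


def markdown_to_mediawiki(markdown_text):
    """Pipe Markdown text through pandoc and return the MediaWiki markup."""
    # Assumption: pandoc is installed and reachable through PATH.
    if shutil.which("pandoc") is None:
        raise RuntimeError("Need pandoc utility!")
    result = subprocess.run(
        pandoc_command(),
        input=markdown_text,   # sent to pandoc on stdin
        capture_output=True,   # collect stdout and stderr
        text=True,             # work with str instead of bytes
        check=True,            # raise if pandoc exits with an error
    )
    return result.stdout
```

Feeding the text on standard input mirrors the "cat $i | pandoc ..." pipeline in the patch; no temporary file is needed.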
On 15/05/12 22:40, Dieter Plaetinck wrote:
If you have a patch that I can apply, I'll try it out. If the code using pandoc is more elegant and the output is comparable or better, I'm up for it.
Dieter
As requested, here's the patch. It's quite radical, partly because the previous script was creating a header that is no longer used in the wiki. As you can see, I've rewritten almost everything in Python, since I'm much more comfortable handling regular expressions in that language; besides, the code is much more readable and flexible that way.
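To give an idea of the kind of regular-expression work involved, here is a small sketch of turning external links that point at the wiki into internal links. The patterns are adapted from the patch in this thread (rewritten as raw strings); the sample text is made up for illustration, and the single-argument function signature is a simplification, not the one used in the patch.

```python
import re

# Base URL of the wiki, written as a regular expression (taken from the patch).
BASEURL = r"https?://wiki\.archlinux\.org/index\.php/"

# "[<baseurl><page> <label>]" is an external link in MediaWiki syntax.
LINK_REGEXP = r"\[{baseurl}([^\]\s]+?) ([^\]\n]+?)\]".format(baseurl=BASEURL)
# "[[<page>|<label>]]" is the equivalent internal link.
LINK_REPLACE = r"[[\g<1>|\g<2>]]"


def wikify_internal_links(text):
    """Rewrite external links to the wiki itself as internal links."""
    return re.sub(LINK_REGEXP, LINK_REPLACE, text)


# Hypothetical sample: one link into the wiki, one link elsewhere.
sample = (
    "See [http://wiki.archlinux.org/index.php/Archiso-as-pxe-server "
    "Community contributed documentation] and the "
    "[http://www.archlinux.org/download/ release] page."
)
print(wikify_internal_links(sample))
```

Only the first link is rewritten; links to other hosts do not match the base URL and are left untouched.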
NOTE 1: you will need to install the "pandoc" package, currently in the AUR: http://aur.archlinux.org/packages.php?ID=32490
NOTE 2: the patch has been committed on the "develop" branch.
From 593842cd1182ae0342efc9356477b16739641455 Mon Sep 17 00:00:00 2001
From: Dario Giovannetti <dariogiova@gmail.com>
Date: Wed, 16 May 2012 23:12:50 +0200
Subject: [PATCH 50/50] revise the automatic procedure for creating the MediaWiki version of the installation guide

---
 README                             |    3 +-
 doc/official_installation_guide_en |    3 +-
 make-doc.sh                        |   49 ++-------------
 make_doc_fixes.py                  |  122 ++++++++++++++++++++++++++++++++++++
 4 files changed, 131 insertions(+), 46 deletions(-)
 create mode 100755 make_doc_fixes.py

diff --git a/README b/README
index 6981de7..35456cd 100644
--- a/README
+++ b/README
@@ -19,7 +19,8 @@ Homepage: http://github.com/Dieterbe/aif
  - libui-sh
  - iproute2
 Optionally:
- - markdown: to generate the html installation guide
+ - pandoc: to generate the MediaWiki installation guide
+ - python: to generate the MediaWiki installation guide
  - cryptsetup: for encryption support
  - lvm2: for LVM support
  - dhcpd: for dhcp networking support
diff --git a/doc/official_installation_guide_en b/doc/official_installation_guide_en
index 0faea07..5b2e580 100644
--- a/doc/official_installation_guide_en
+++ b/doc/official_installation_guide_en
@@ -2,7 +2,7 @@
 
 General installation documentation for the Arch Linux distribution.
 
-This guide is only valid for release 2010.05 or newer.
+This guide is only valid for release 2011.08 or newer.
 This guide is maintained in [aif git](http://projects.archlinux.org/?p=aif.git)
 Git pull requests, patches, comments are welcome on the arch [releng mailing list](http://www.archlinux.org/mailman/listinfo/arch-releng)
 
@@ -218,6 +218,7 @@ You can find more info on the wiki [Community contributed documentation](http://wiki.archlinux.org/index.php/Archiso-as-pxe-server)
 
 (this section could be a bit more elaborate)
 
+
 ### Client
 
 Configure your system to try network booting (pxe) first.
diff --git a/make-doc.sh b/make-doc.sh
index 4e6c5a2..39914f5 100755
--- a/make-doc.sh
+++ b/make-doc.sh
@@ -1,50 +1,11 @@
 #!/bin/sh
-which markdown &>/dev/null || echo "Need markdown utility!" >&2
+which pandoc &>/dev/null || echo "Need pandoc utility!" >&2
+
+echo "generating mediawiki document..."
 
-echo "generating html..."
 for i in doc/official_installation_guide_??
 do
 	echo $i
-	# convert markdown to html, convert html links to wiki ones.
-	cat $i | markdown | sed 's|<a href="\([^"]*\)"[^>]*>\([^<]*\)</a>|[\1 \2]|g' > $i.html
-	# turn code markup into a syntax that mediawiki understands
-	sed -i 's#<pre><code>#<pre>#g' $i.html
-	sed -i 's#</code></pre>#</pre>#g' $i.html
-
+	# convert markdown to mediawiki and perform further adaptations
+	cat $i | pandoc -f markdown -t mediawiki | xargs -0 ./make_doc_fixes.py $i > $i.mw
 done
-
-echo "adding special wiki thingies..."
-
-i=doc/official_installation_guide_en
-echo $i
-
-
-summary_begin='<p><strong>Article summary<\/strong><\/p>'
-summary_end_plus_one='<p><strong>Related articles<\/strong><\/p>'
-related_begin='<p><strong>Related articles<\/strong><\/p>'
-related_end_plus_one='<h1>Introduction<\/h1>'
-
-summary=`sed -n "/$summary_begin/, /$summary_end_plus_one/p;" $i.html | sed "/$summary_begin/d; /$summary_end_plus_one/d"`
-related=`sed -n "/$related_begin/, /$related_end_plus_one/p;" $i.html | sed "/$related_begin/d; /$related_end_plus_one/d"`
-
-# prepare $related for wikiing.
-# note that like this we always keep the absulolute url's even if they are on the same subdomain eg: {{Article summary wiki|http://foo/bar bar}} (note).
-# wiki renders absolute url a bit uglier. always having absolute url's is not needed if the page can be looked up on the same wiki, but like this it was simplest to implement..
-related=`echo "$related"| sed -e 's#<p>\[\(.*\)\] \(.*\)<\/p>#{{Article summary wiki|\1}} \2#'`
-
-# preare $summary for wiiking: replace email address by nice mailto links
-summary=`echo "$summary" | sed 's/\([^"|, ]*@[-A-Za-z0-9_.]*\)/[mailto:\1 \1]/'`
-
-
-echo -e "[[Category:Getting and installing Arch (English)]]\n[[Category:HOWTOs (English)]]
-[[Category:Accessibility (English)]]
-[[Category:Website Resources]]
-{{Article summary start}}\n{{Article summary text| 1=$summary}}\n{{Article summary heading|Available Languages}}\n
-{{i18n_entry|English|Official Arch Linux Install Guide}}\n
-{{Article summary heading|Related articles}}
-$related
-{{Article summary end}}" | cat - $i.html > $i.html.tmp && mv $i.html.tmp $i.html
-
-# remove summary and related articles from actual content
-sed "/$summary_end_plus_one/p; /$summary_begin/, /$summary_end_plus_one/d" $i.html > $i.html.tmp && mv $i.html.tmp $i.html
-sed "/$related_end_plus_one/p; /$related_begin/, /$related_end_plus_one/d" $i.html > $i.html.tmp && mv $i.html.tmp $i.html
diff --git a/make_doc_fixes.py b/make_doc_fixes.py
new file mode 100755
index 0000000..a7200a0
--- /dev/null
+++ b/make_doc_fixes.py
@@ -0,0 +1,122 @@
+#!/usr/bin/env python3
+"""
+This script is not meant to be run as a standalone application, instead it is
+called by make-doc.sh to perform further adaptations to the MediaWiki version
+of the installation guide.
+"""
+
+import sys
+import re
+
+FILENAME = sys.argv[1]
+INPUT = sys.argv[2]
+
+# Used in fix_multiline_list_items
+LIST_REGEXP = "^([\:\*#]+.+<br(?: /)?>)( *\n)"
+LIST_REPLACE = "\g<1>"
+
+# Used in wikify_internal_links
+LINK_REGEXP = "\[{baseurl}([^\]\s]+?) ([^\]\n]+?)\]"
+LINK_REPLACE = "[[\g<1>|\g<2>]]"
+
+# If a translation of the guide is added, a proper entry should be added to
+# this dictionary; the key names must be 2-character language tags
+LANGFIXES = {
+    "en": {
+        "baseurl": "https?://wiki\.archlinux\.org/index\.php/", # regexp
+        "header": """\
+[[Category:Getting and installing Arch]]
+[[fr:Guide officiel de l'installation]]
+[[ro:Ghid de instalare oficial]]
+{{i18n|Official Installation Guide}}
+""", # string
+        "intro": """The Official Installation Guide is maintained in [http://projects.archlinux.org/aif.git/ aif.git].
+
+The version included with the latest [http://www.archlinux.org/download/ release] (2011.08.19) can be found [http://projects.archlinux.org/aif.git/plain/doc/official_installation_guide_... here].
+
+The latest version can be found [http://projects.archlinux.org/aif.git/plain/doc/official_installation_guide_... here].
+
+The (unofficial) [[Beginners' Guide]] provides a thorough walkthrough of the the installation and configuration process.
+
+""",
+        "summary_heading": None, # must be None only for English
+        "summary": "'''Article summary'''", # string
+        "related": "'''Related articles'''", # string
+        "introduction": "= Introduction =", # string
+    },
+}
+
+
+def fix_multiline_list_items(text):
+    """
+    pandoc doesn't convert multiline list items correctly, so this function
+    compensates for that.
+    """
+    test = ""
+    # It's necessary to run this multiple times because of how the regular
+    # expression is designed
+    while text != test:
+        test = text
+        text = re.sub(LIST_REGEXP, LIST_REPLACE, text, flags=re.MULTILINE)
+    return text
+
+
+def wikify_internal_links(text, patches):
+    """
+    Turns external links that point to the local subdomain into proper internal
+    links.
+    """
+    regexp = LINK_REGEXP.format(**patches)
+    text = re.sub(regexp, LINK_REPLACE, text)
+    return text
+
+
+def insert_header(text, patches):
+    """
+    Inserts the standard article header.
+    """
+    text = patches["header"] + text
+    return text
+
+
+def assemble_summary(text, patches):
+    """
+    Converts the article summary and related links into a standard summary
+    """
+    # NOTE: this function requires some fixes if more languages are added
+    part_a = text.partition(patches["summary"])
+    part_b = part_a[2].partition(patches["related"])
+    part_c = part_b[2].partition(patches["introduction"])
+    related_links = part_c[0].strip().split("\n")
+    summary_heading = ("|" + patches["summary_heading"]
+                       if (patches["summary_heading"])
+                       else "")
+    summary_text = part_b[0].strip()
+    related = "\n".join(["{{{{Article summary text|1={}}}}}".format(r)
+                         for r in related_links])
+    summary = """{{{{Article summary start{}}}}}
+{{{{Article summary text|1={}}}}}
+{{{{Article summary heading|Related articles}}}}
+{}
+{{{{Article summary end}}}}
+
+""".format(summary_heading , summary_text, related)
+    text = part_a[0] + summary + patches["intro"] + part_c[1] + part_c[2]
+    return text
+
+
+def main(filename, text):
+    """
+    Main function
+    """
+    language = filename[-2:]
+    text = fix_multiline_list_items(text)
+    if language in LANGFIXES:
+        patches = LANGFIXES[language]
+        text = wikify_internal_links(text, patches)
+        text = insert_header(text, patches)
+        text = assemble_summary(text, patches)
+    return text
+
+if __name__ == "__main__":
+    print(main(FILENAME, INPUT))
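The comment in fix_multiline_list_items says the substitution has to be run several times. A small stand-alone sketch may make the reason clearer. The pattern below is the one from the patch, rewritten as a raw string; the helper join_once and the sample input are hypothetical and only serve the illustration.

```python
import re

# A list line (starting with ':', '*' or '#') that ends in <br> or <br />,
# followed by optional spaces and the newline that should be dropped.
LIST_REGEXP = r"^([:*#]+.+<br(?: /)?>)( *\n)"
LIST_REPLACE = r"\g<1>"


def join_once(text):
    """Apply the substitution a single time (hypothetical helper)."""
    return re.sub(LIST_REGEXP, LIST_REPLACE, text, flags=re.MULTILINE)


def fix_multiline_list_items(text):
    """Repeat the substitution until the text stops changing."""
    previous = None
    while text != previous:
        previous = text
        text = join_once(text)
    return text


# Hypothetical input: one list item spread over three lines.
sample = "* first<br />\nsecond<br />\nthird\n"

print(repr(join_once(sample)))
print(repr(fix_multiline_list_items(sample)))
```

A single pass joins only the first continuation line: the second line does not start with a list marker, so it cannot match until it has been merged into the list line. Repeating the substitution until nothing changes handles items of any length.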
I've opened a bug report for this request with an _updated_ patch:
https://bugs.archlinux.org/task/30045?project=6
Patch: https://bugs.archlinux.org/task/30045?getfile=8834
Dario