[arch-releng] Use pandoc instead of make-doc.sh

Dario Giovannetti dariogiova at gmail.com
Wed May 16 17:41:17 EDT 2012


On 15/05/12 22:40, Dieter Plaetinck wrote:
> On Mon, 14 May 2012 23:32:35 +0200
> Dario Giovannetti<dariogiova at gmail.com>  wrote:
>
>> I would like to propose using pandoc ( http://johnmacfarlane.net/pandoc/
>> ) instead of the make-doc.sh script for converting the installation
>> guide (Markdown syntax) to the document hosted in the wiki (MediaWiki
>> syntax).
>> I've tested it and the result looks pretty good, with only a few minor
>> manual refinements required (which I volunteer to perform, if needed):
>> instead of the current script, which practically produces an html
>> document, we would get a correctly-formed, much neater MediaWiki document.
>> This would greatly simplify further improvements in the wikification of
>> the guide, such as adapting it to ArchWiki's style standards.
>>
>> Thank you
>>
>> Dario
>>
> If you have a patch that I can apply, I'll try it out.
> If the code using pandoc is more elegant, and the output is comparable (or better), I'm up for it.
>
> Dieter
As requested, here's the patch. It's quite radical, partly because the
previous script generated a header that is no longer used in the wiki.
As you can see, I've rewritten almost everything in Python, since I'm
much more comfortable handling regular expressions in that language;
besides, the code is much more readable and flexible that way.

NOTE 1: you will need to install the "pandoc" package, currently in
the AUR: http://aur.archlinux.org/packages.php?ID=32490

NOTE 2: the patch has been committed on the "develop" branch.
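
For anyone who wants to try the internal-link conversion in isolation, here is a minimal standalone sketch. The regular expression and replacement mirror LINK_REGEXP and LINK_REPLACE from the patch below; the sample sentence and the simplified signature (taking the base URL directly instead of the LANGFIXES dictionary) are assumptions made purely for illustration.

```python
import re

# Base URL pattern, as in the "baseurl" entry of LANGFIXES["en"] in the patch.
BASEURL = r"https?://wiki\.archlinux\.org/index\.php/"

# Same pattern and replacement as LINK_REGEXP / LINK_REPLACE in the patch.
LINK_REGEXP = r"\[{baseurl}([^\]\s]+?) ([^\]\n]+?)\]"
LINK_REPLACE = r"[[\g<1>|\g<2>]]"


def wikify_internal_links(text, baseurl=BASEURL):
    """Turn external-style links that point at the wiki into internal links."""
    regexp = LINK_REGEXP.format(baseurl=baseurl)
    return re.sub(regexp, LINK_REPLACE, text)


# Hypothetical sample text, not taken from the installation guide.
sample = ("See the [http://wiki.archlinux.org/index.php/Pacman pacman page] "
          "and [http://example.org/ an external site].")
print(wikify_internal_links(sample))
```

Only the link whose URL starts with the wiki's base URL is rewritten to the [[Page|label]] form; the other link is left untouched.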


 From 593842cd1182ae0342efc9356477b16739641455 Mon Sep 17 00:00:00 2001
From: Dario Giovannetti <dariogiova at gmail.com>
Date: Wed, 16 May 2012 23:12:50 +0200
Subject: [PATCH 50/50] revise the automatic procedure for creating the
  MediaWiki version of the installation guide

---
  README                             |    3 +-
  doc/official_installation_guide_en |    3 +-
  make-doc.sh                        |   49 ++-------------
  make_doc_fixes.py                  |  122 ++++++++++++++++++++++++++++++++++++
  4 files changed, 131 insertions(+), 46 deletions(-)
  create mode 100755 make_doc_fixes.py

diff --git a/README b/README
index 6981de7..35456cd 100644
--- a/README
+++ b/README
@@ -19,7 +19,8 @@ Homepage:    http://github.com/Dieterbe/aif
   - libui-sh
   - iproute2
  Optionally:
- - markdown: to generate the html installation guide
+ - pandoc: to generate the MediaWiki installation guide
+ - python: to generate the MediaWiki installation guide
   - cryptsetup: for encryption support
   - lvm2: for LVM support
   - dhcpd: for dhcp networking support
diff --git a/doc/official_installation_guide_en b/doc/official_installation_guide_en
index 0faea07..5b2e580 100644
--- a/doc/official_installation_guide_en
+++ b/doc/official_installation_guide_en
@@ -2,7 +2,7 @@

  General installation documentation for the Arch Linux distribution.

-This guide is only valid for release 2010.05 or newer.
+This guide is only valid for release 2011.08 or newer.
  This guide is maintained in [aif git](http://projects.archlinux.org/?p=aif.git)
  Git pull requests, patches, comments are welcome on the arch
  [releng mailing list](http://www.archlinux.org/mailman/listinfo/arch-releng)
@@ -218,6 +218,7 @@ You can find more info on the wiki
  [Community contributed documentation](http://wiki.archlinux.org/index.php/Archiso-as-pxe-server)

  (this section could be a bit more elaborate)
+
  ### Client

  Configure your system to try network booting (pxe) first.
diff --git a/make-doc.sh b/make-doc.sh
index 4e6c5a2..39914f5 100755
--- a/make-doc.sh
+++ b/make-doc.sh
@@ -1,50 +1,11 @@
  #!/bin/sh
-which markdown &>/dev/null || echo "Need markdown utility!" >&2
+command -v pandoc >/dev/null 2>&1 || { echo "Need pandoc utility!" >&2; exit 1; }
+
+echo "generating mediawiki document..."

-echo "generating html..."
  for i in doc/official_installation_guide_??
  do
      echo $i
-    # convert markdown to html, convert html links to wiki ones.
-    cat $i | markdown | sed 's|<a href="\([^"]*\)"[^>]*>\([^<]*\)</a>|[\1 \2]|g' > $i.html
-    # turn code markup into a syntax that mediawiki understands
-    sed -i 's#<pre><code>#<pre>#g' $i.html
-    sed -i 's#</code></pre>#</pre>#g' $i.html
-
+    # convert markdown to mediawiki and perform further adaptations
+    cat $i | pandoc -f markdown -t mediawiki | xargs -0 ./make_doc_fixes.py $i > $i.mw
  done
-
-echo "adding special wiki thingies..."
-
-i=doc/official_installation_guide_en
-echo $i
-
-
-summary_begin='<p><strong>Article summary<\/strong><\/p>'
-summary_end_plus_one='<p><strong>Related articles<\/strong><\/p>'
-related_begin='<p><strong>Related articles<\/strong><\/p>'
-related_end_plus_one='<h1>Introduction<\/h1>'
-
-summary=`sed -n "/$summary_begin/, /$summary_end_plus_one/p;" $i.html | sed "/$summary_begin/d; /$summary_end_plus_one/d"`
-related=`sed -n "/$related_begin/, /$related_end_plus_one/p;" $i.html | sed "/$related_begin/d; /$related_end_plus_one/d"`
-
-# prepare $related for wikiing.
-# note that like this we always keep the absulolute url's even if they are on the same subdomain eg: {{Article summary wiki|http://foo/bar bar}} (note).
-# wiki renders absolute url a bit uglier.  always having absolute url's is not needed if the page can be looked up on the same wiki, but like this it was simplest to implement..
-related=`echo "$related"| sed -e 's#<p>\[\(.*\)\] \(.*\)<\/p>#{{Article summary wiki|\1}} \2#'`
-
-# preare $summary for wiiking: replace email address by nice mailto links
-summary=`echo "$summary" | sed 's/\([^"|, ]*@[-A-Za-z0-9_.]*\)/[mailto:\1 \1]/'`
-
-
-echo -e "[[Category:Getting and installing Arch (English)]]\n[[Category:HOWTOs (English)]]
-[[Category:Accessibility (English)]]
-[[Category:Website Resources]]
-{{Article summary start}}\n{{Article summary text| 1=$summary}}\n{{Article summary heading|Available Languages}}\n
-{{i18n_entry|English|Official Arch Linux Install Guide}}\n
-{{Article summary heading|Related articles}}
-$related
-{{Article summary end}}" | cat - $i.html > $i.html.tmp && mv $i.html.tmp $i.html
-
-# remove summary and related articles from actual content
-sed "/$summary_end_plus_one/p; /$summary_begin/, /$summary_end_plus_one/d" $i.html > $i.html.tmp && mv $i.html.tmp $i.html
-sed "/$related_end_plus_one/p; /$related_begin/, /$related_end_plus_one/d" $i.html > $i.html.tmp && mv $i.html.tmp $i.html
diff --git a/make_doc_fixes.py b/make_doc_fixes.py
new file mode 100755
index 0000000..a7200a0
--- /dev/null
+++ b/make_doc_fixes.py
@@ -0,0 +1,122 @@
+#!/usr/bin/env python3
+"""
+This script is not meant to be run as a standalone application, instead it is
+called by make-doc.sh to perform further adaptations to the MediaWiki version
+of the installation guide.
+"""
+
+import sys
+import re
+
+FILENAME = sys.argv[1]
+INPUT = sys.argv[2]
+
+# Used in fix_multiline_list_items
+LIST_REGEXP = r"^([\:\*#]+.+<br(?: /)?>)( *\n)"
+LIST_REPLACE = r"\g<1>"
+
+# Used in wikify_internal_links
+LINK_REGEXP = r"\[{baseurl}([^\]\s]+?) ([^\]\n]+?)\]"
+LINK_REPLACE = r"[[\g<1>|\g<2>]]"
+
+# If a translation of the guide is added, a proper entry should be added to
+# this dictionary; the key names must be 2-character language tags
+LANGFIXES = {
+    "en": {
+        "baseurl": r"https?://wiki\.archlinux\.org/index\.php/",  # regexp
+        "header": """\
+[[Category:Getting and installing Arch]]
+[[fr:Guide officiel de l'installation]]
+[[ro:Ghid de instalare oficial]]
+{{i18n|Official Installation Guide}}
+""",  # string
+        "intro": """The Official Installation Guide is maintained in [http://projects.archlinux.org/aif.git/ aif.git].
+
+The version included with the latest [http://www.archlinux.org/download/ release] (2011.08.19) can be found [http://projects.archlinux.org/aif.git/plain/doc/official_installation_guide_en?id=13c8c0813328eb8f52b03b3c53a32f1f40558021 here].
+
+The latest version can be found [http://projects.archlinux.org/aif.git/plain/doc/official_installation_guide_en here].
+
+The (unofficial) [[Beginners' Guide]] provides a thorough walkthrough of the installation and configuration process.
+
+""",
+        "summary_heading": None,  # must be None only for English
+        "summary": "'''Article summary'''",  # string
+        "related": "'''Related articles'''",  # string
+        "introduction": "= Introduction =",  # string
+    },
+}
+
+
+def fix_multiline_list_items(text):
+    """
+    pandoc doesn't convert multiline list items correctly, so this function
+    compensates for that.
+    """
+    test = ""
+    # It's necessary to run this multiple times because of how the regular
+    # expression is designed
+    while text != test:
+        test = text
+        text = re.sub(LIST_REGEXP, LIST_REPLACE, text, flags=re.MULTILINE)
+    return text
+
+
+def wikify_internal_links(text, patches):
+    """
+    Turns external links that point to the local subdomain into proper internal
+    links.
+    """
+    regexp = LINK_REGEXP.format(**patches)
+    text = re.sub(regexp, LINK_REPLACE, text)
+    return text
+
+
+def insert_header(text, patches):
+    """
+    Inserts the standard article header.
+    """
+    text = patches["header"] + text
+    return text
+
+
+def assemble_summary(text, patches):
+    """
+    Converts the article summary and related links into a standard summary
+    """
+    # NOTE: this function requires some fixes if more languages are added
+    part_a = text.partition(patches["summary"])
+    part_b = part_a[2].partition(patches["related"])
+    part_c = part_b[2].partition(patches["introduction"])
+    related_links = part_c[0].strip().split("\n")
+    summary_heading = ("|" + patches["summary_heading"]
+                       if (patches["summary_heading"])
+                       else "")
+    summary_text = part_b[0].strip()
+    related = "\n".join(["{{{{Article summary text|1={}}}}}".format(r)
+                         for r in related_links])
+    summary = """{{{{Article summary start{}}}}}
+{{{{Article summary text|1={}}}}}
+{{{{Article summary heading|Related articles}}}}
+{}
+{{{{Article summary end}}}}
+
+""".format(summary_heading , summary_text, related)
+    text = part_a[0] + summary + patches["intro"] + part_c[1] + part_c[2]
+    return text
+
+
+def main(filename, text):
+    """
+    Main function
+    """
+    language = filename[-2:]
+    text = fix_multiline_list_items(text)
+    if language in LANGFIXES:
+        patches = LANGFIXES[language]
+        text = wikify_internal_links(text, patches)
+        text = insert_header(text, patches)
+        text = assemble_summary(text, patches)
+    return text
+
+if __name__ == "__main__":
+    print(main(FILENAME, INPUT))
-- 
1.7.5.4
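
The loop in fix_multiline_list_items can also be tried on its own with the minimal sketch below. The regular expression is the one from the patch above; the sample text is a hypothetical input (not actual pandoc output), chosen so that a list item spans three lines and therefore needs more than one substitution pass.

```python
import re

# Same pattern and replacement as LIST_REGEXP / LIST_REPLACE in the patch:
# a list line ending in <br> or <br /> is glued to the line that follows it.
LIST_REGEXP = r"^([\:\*#]+.+<br(?: /)?>)( *\n)"
LIST_REPLACE = r"\g<1>"


def fix_multiline_list_items(text):
    """Repeat the substitution until the text stops changing."""
    previous = None
    while text != previous:
        previous = text
        # A single pass only joins lines that already start with a list
        # marker, so a continuation line that itself ends in <br /> is
        # joined on a later pass, once it has become part of the list line.
        text = re.sub(LIST_REGEXP, LIST_REPLACE, text, flags=re.MULTILINE)
    return text


# Hypothetical sample: one list item broken over three lines, then a second item.
sample = "* first line<br />\nsecond line<br />\nthird line\n* next item\n"
print(fix_multiline_list_items(sample))
```

With this input the first pass joins only the first two lines, and the second pass joins the third; the next pass changes nothing, which ends the loop.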



Dario


