Converting documents to mediawiki markup

From 3kWiki
Revision as of 17:54, 7 April 2009 by imported>Floobity (→‎Excel Documents)

Intro

There appear to be few existing tools that automatically convert other document formats (e.g. .doc, .xls, .ppt) to MediaWiki markup. The simplest approach right now appears to be to convert documents to HTML and then convert the HTML to wiki markup, since good HTML-to-wiki converters exist. Here are some good links.

More tools and techniques have been developed at Appropedia (the sustainable development wiki) including for converting PDFs: Porting formatted content to MediaWiki and Help:Porting PDF files to MediaWiki

HTML documents

html2wiki converter based on the HTML::WikiConverter Perl module
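
To illustrate what an HTML-to-wiki conversion involves, here is a minimal Python sketch. It is not the html2wiki tool or the HTML::WikiConverter module, and it handles only a small subset of tags (headings, bold, italic, links, lists, paragraphs). The names `WikiTextParser` and `html_to_wiki` are made up for this example; tables, images, and alignment are ignored entirely.

```python
import re
from html.parser import HTMLParser


class WikiTextParser(HTMLParser):
    """Translate a small subset of HTML into MediaWiki markup (illustrative only)."""

    # Inline tags use the same wiki token when opening and closing.
    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}
    HEADINGS = {"h1": "=", "h2": "==", "h3": "===", "h4": "===="}

    def __init__(self):
        super().__init__()
        self.out = []          # output fragments, joined at the end
        self.list_stack = []   # "*" for <ul>, "#" for <ol>; depth gives nesting
        self.href = None       # target of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append("\n" + self.HEADINGS[tag] + " ")
        elif tag in ("ul", "ol"):
            self.list_stack.append("*" if tag == "ul" else "#")
        elif tag == "li":
            self.out.append("\n" + "".join(self.list_stack) + " ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.out.append("[" + self.href + " ")
        elif tag == "p":
            self.out.append("\n\n")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append(" " + self.HEADINGS[tag] + "\n")
        elif tag in ("ul", "ol") and self.list_stack:
            self.list_stack.pop()
            if not self.list_stack:
                self.out.append("\n")
        elif tag == "a" and self.href:
            self.out.append("]")
            self.href = None

    def handle_data(self, data):
        # Collapse runs of whitespace, as a browser would when rendering.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_wiki(html):
    """Return wiki markup for the supported subset of HTML in `html`."""
    parser = WikiTextParser()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    # Keep at most one blank line between blocks.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

For example, `html_to_wiki('<h2>Title</h2><p>Some <b>bold</b> text.</p>')` returns the heading line `== Title ==`, a blank line, and then `Some '''bold''' text.` For real documents a full converter is still the better choice, since this sketch silently drops anything it does not recognize.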

Word Documents

Saving a relatively simple Word document (no images or tables) as HTML and then running it through the converter here produced good MediaWiki formatting. A document including images, tables, and centered text did not work as well: the images would need to be added to the wiki separately, the table did not come out quite right, and centered text was no longer centered.

A direct converter can be found here.

A series of Word macros for doing simple conversions (including tables) is here; they seem to work reasonably well but aren't designed for sophisticated layouts.

Also, with the release of OpenOffice 2.4, OpenOffice can now export documents to MediaWiki format. Since OpenOffice can also read MS Word documents, this allows OpenOffice to serve as a Word-to-MediaWiki converter.

Images

I have had good success with the following steps for porting images embedded in Word documents to MediaWiki format on a Mac:

  1. Click on the image in the Word document and choose Edit->Copy from the menu (cmd-C)
  2. Go to the application GraphicConverter and choose File->New->Image with clipboard (cmd-J)
  3. Choose File->Save as and save as a JPEG/JFIF format (.jpg) file with 100% Quality.

Alternatively, if you want to take an image which has associated text boxes, it seems to come out well if you take a screenshot of a selection with Grab (in the /Applications/Utilities folder), save as a .tiff (your only option) and then open in GraphicConverter and save as a JPEG as described above.

Excel Documents

  • If you can export the data in comma-separated values (CSV) format, then a converter exists.
  • Simpler, less feature-rich script supporting "copy and paste" conversion: Excel to Wiki Table Converter
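
Both routes above boil down to turning delimited rows into MediaWiki table syntax. The following Python sketch shows the idea. It is not the code of either linked converter, and the function name `csv_to_wikitable`, the `wikitable` CSS class, and the choice to treat the first row as a header are assumptions made for this example.

```python
import csv
import io


def csv_to_wikitable(text, delimiter=",", header=True):
    """Convert delimited text into MediaWiki table markup (illustrative only).

    Use delimiter="\t" for cells copied and pasted straight from a spreadsheet.
    """
    # Skip blank lines, which csv.reader reports as empty rows.
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    lines = ['{| class="wikitable"']
    for i, row in enumerate(rows):
        lines.append("|-")
        # "!" marks a header cell, "|" an ordinary data cell.
        marker = "!" if header and i == 0 else "|"
        for cell in row:
            # Escape pipes so cell text is not mistaken for table syntax.
            safe = cell.strip().replace("|", "&#124;")
            lines.append(f"{marker} {safe}")
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_wikitable("Name,Qty\nApple,3\n"))
```

Each cell is written on its own line, which keeps the output easy to edit by hand in the wiki afterwards. Multi-line cells, merged cells, and cell formatting are not handled.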