Xml Based Wiki

Moved from WhyDoesntWikiDoHtml

At one point last year, AlanFrancis and I simultaneously started writing Wiki clones based on servlets. When we discovered we were both doing the same thing, we traded code.

In mine, I had followed the lead of AtisWiki and designed for multiple page source formats (one of which was planned to be HTML). Alan had taken precisely the opposite approach, assuming only the wiki source format but multiple possible output formats.

Doing either of these is easy, but doing both is much harder (you have to design an intermediate abstract formatting model that's a superset of all the things that all of your source formats can do). I thought about it for five minutes and decided that Alan's way was clearly better.
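The "intermediate abstract formatting model" above can be sketched in a few lines. This is an illustrative toy, not code from either wiki: a single parser builds an abstract node tree, and independent renderers walk it to produce HTML or plain text. All names (Node, parse_wiki, render_html) are assumptions for the sketch.

```python
# Toy intermediate model: one parser in, many renderers out.
class Node:
    def __init__(self, kind, text="", children=None):
        self.kind = kind          # e.g. "doc", "para", "bold", "text"
        self.text = text
        self.children = children or []

def parse_wiki(source):
    """Toy wiki parser: handles '''bold''' inside a single paragraph."""
    para = Node("para")
    for i, chunk in enumerate(source.split("'''")):
        if chunk:
            para.children.append(Node("bold" if i % 2 else "text", chunk))
    return Node("doc", children=[para])

def render_html(node):
    # Leaf nodes fall back to their own text.
    inner = "".join(render_html(c) for c in node.children) or node.text
    tags = {"doc": ("", ""), "para": ("<p>", "</p>"),
            "bold": ("<b>", "</b>"), "text": ("", "")}
    open_t, close_t = tags[node.kind]
    return open_t + inner + close_t

def render_text(node):
    return node.text + "".join(render_text(c) for c in node.children)

doc = parse_wiki("Hello '''world''' again")
print(render_html(doc))   # <p>Hello <b>world</b> again</p>
print(render_text(doc))   # Hello world again
```

Adding a second source format means writing one more parser that targets the same Node tree; the hard part, as noted above, is making that tree a superset of everything every source format can express.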

-- GlennVanderburg

Glenn, I had a similar idea - have WikiWiki MarkupLanguage translated on the fly to an XML (DTD based on the constructs made possible by the TextFormattingRules). Then we could take advantage of XSL(T) tools as well as have an excuse to rewrap text before presenting it to the user. -- MattBehrens
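The on-the-fly translation Matt describes might look like this. The element names (wiki, h1, li, p) stand in for an assumed DTD, and the heading/bullet rules are illustrative, not the actual TextFormattingRules:

```python
# Sketch: translate a few wiki-markup constructs into XML, so that
# standard XSL(T) tools can take over the presentation step.
import xml.etree.ElementTree as ET

def wiki_to_xml(source):
    root = ET.Element("wiki")
    for line in source.splitlines():
        if line.startswith("!"):          # assumed heading rule
            ET.SubElement(root, "h1").text = line[1:].strip()
        elif line.startswith("*"):        # assumed bullet rule
            ET.SubElement(root, "li").text = line[1:].strip()
        elif line.strip():
            ET.SubElement(root, "p").text = line.strip()
    return ET.tostring(root, encoding="unicode")

print(wiki_to_xml("!Title\n*one\nsome text"))
# <wiki><h1>Title</h1><li>one</li><p>some text</p></wiki>
```

Once the page is well-formed XML, a stylesheet can serialize it to HTML, rewrap the text, or round-trip it back to wiki markup.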

Me too. I was thinking of using JavaHtmlTidy (or a simplified version of it) to fix up any embedded HTML tags while I was at it. A previous wiki I wrote (it allowed embedded HTML) suffered greatly once the users learned presentational markup, since the meaning was lost and the styling could not easily be changed. My intention would be to convert wiki markup to XML, then in a second pass convert recognized presentational markup to document-structure markup. Finally, as you say, XSL can be used either to recover the wiki markup or to re-present it as HTML. I just came back here a few minutes ago to see if anyone had already done this... -- BrianEwins
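The second pass described above — rewriting presentational markup into document-structure markup so styling stays changeable — could be as simple as a tag-rename walk over the XML tree. The b/i/u mapping below is an illustrative assumption, not a fixed rule:

```python
# Sketch of the second pass: presentational tags -> structural tags.
import xml.etree.ElementTree as ET

STRUCTURAL = {"b": "strong", "i": "em", "u": "em"}  # assumed mapping

def presentational_to_structural(xml_text):
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in STRUCTURAL:
            elem.tag = STRUCTURAL[elem.tag]
    return ET.tostring(root, encoding="unicode")

print(presentational_to_structural("<p>A <b>bold</b> and <i>italic</i> word</p>"))
# <p>A <strong>bold</strong> and <em>italic</em> word</p>
```

A real pass would need context (e.g. a whole line wrapped in b is probably a heading), but the shape is the same: the meaning is recovered once, and the styling is decided later by the stylesheet.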


Perhaps we can use this page for Schema/DTD sharing? A standardized XML Wikispec would be mucho useful, especially when trying to import/export wikipages from one engine to another.

The WikiTypeFramework used an XML format for all of its pages. I can dig into its spec if others are interested. (As it stands, WikiTypeFramework is now a defunct project)

Others are interested. Could you post the spec, or link to it?

Ok, this is under the GPL.

It seems that WTF uses two levels of XML: one level for the page metadata, and another for the page data. Here is an example of the metadata:
	<!-- Home page -->
	<content:6>
		<classid>-20631383</classid>
		<objectid>-1561770585</objectid>
		<title>Wiki Type Framework</title>
		<version>4</version>
		<workspaceid>0</workspaceid>
		<creatorid>385153371</creatorid>
		<creatorName>Root</creatorName>
		<creatorHomeid>385153371</creatorHomeid>
		<creatorDatetime>2002-06-10 0:02:26</creatorDatetime>
		<updatorid>385153371</updatorid>
		<updatorName>Root</updatorName>
		<updatorHomeid>385153371</updatorHomeid>
		<updatorDatetime>2002-07-06 18:16:38</updatorDatetime>
		<viewGroup>Everyone</viewGroup>
		<editGroup>Editors</editGroup>
		<deleteGroup>Gods</deleteGroup>
		<adminGroup>Gods</adminGroup>
		<content><![CDATA[<p>Welcome to the Wiki Type Framework.</p>]]></content>
		<contentIsXML>1</contentIsXML>
		<attributes>
		</attributes>
	</content:6>

The content XML is a subset of HTML and maps directly to it.

The root element is <wtf>; it is responsible for displaying the page header (and, via the closing </wtf> tag, the footer).

These tags map directly to HTML tags:
	br, p, a, b, i, u, pre, ul, ol, li, img, hr
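Because those tags map one-to-one onto HTML, a renderer for the content XML only has to whitelist them. This sketch is an assumption about how such a renderer might work (the tag list is from the spec above; the handling of unknown tags, and the omission of attributes like href and src, are simplifications):

```python
# Sketch: render WTF content XML by whitelisting the HTML-equivalent tags.
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

HTML_TAGS = {"br", "p", "a", "b", "i", "u", "pre",
             "ul", "ol", "li", "img", "hr"}

def render(elem):
    inner = escape(elem.text or "") + "".join(
        render(child) + escape(child.tail or "") for child in elem)
    if elem.tag in HTML_TAGS:
        return "<%s>%s</%s>" % (elem.tag, inner, elem.tag)
    return inner          # unknown tag: keep its content, drop the tag

doc = ET.fromstring("<wtf><p>Hello <b>wiki</b></p><unknown>x</unknown></wtf>")
print(render(doc))        # <p>Hello <b>wiki</b></p>x
```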

There are also some higher-level elements, some super-high-level elements specific to WTF, and finally some system elements. If you have any questions, don't hesitate to ask.

The abbreviation "WTF" seems rather unfortunate... -- KarlKnechtel

Or on purpose. It wouldn't surprise me in the least. ;)

WTF was really neat, because almost everything was a wikipage: users were wikipages, the edit page was a wikipage, and the "Not Found, please describe" page was a wikipage.


Another option, which I have recently implemented, involves storing pages in a RelationalDatabase (this is more out of laziness) and in their "raw" markup form. The Wiki runs on ApacheCocoon. When accessed, the page gets parsed with Chaperon into an XML stream, manipulated through various transformations and serialized to HTML or PDF (or whatever). Searching uses the DB. History uses a separate table that stores all previous versions. One improvement would be to cache the XML to avoid reparsing the syntax every time. But my priority is to save the page first, so if something goes wrong with the parsing, nothing is lost (or in an unstable state) and I avoid frustrating the user. -- AndreThenot?
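The save-first ordering described above is worth making explicit: commit the raw markup before parsing, so a parser failure can never lose an edit, and cache the parsed XML per page. This sketch uses illustrative names throughout; nothing here is a Cocoon or Chaperon API, and the "parse" step is a stand-in for the real grammar-driven one:

```python
# Sketch: persist raw markup first, cache the parsed XML lazily.
pages = {}      # page name -> raw wiki markup (the durable store)
xml_cache = {}  # page name -> parsed XML (rebuilt on demand)

def save_page(name, raw):
    pages[name] = raw            # persist first: nothing can be lost
    xml_cache.pop(name, None)    # invalidate; reparse lazily on next read

def get_xml(name):
    if name not in xml_cache:
        # stand-in for the real wiki-syntax parse to XML
        xml_cache[name] = "<page>%s</page>" % pages[name]
    return xml_cache[name]

save_page("HomePage", "Hello world")
print(get_xml("HomePage"))   # <page>Hello world</page>
```

If parsing blows up inside get_xml, the raw page is already safely stored, which is exactly the failure mode the approach above is designed around.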


EditText of this page (last edited September 24, 2004) or FindPage with title or text search