Print Body Text

We have the text of a page in one big string, which we split into lines to be processed individually. This colors our TextFormattingRules, especially those dealing with bullet lists. Since we've now forced authors to be newline-conscious, we give them the opportunity to escape a newline with a backslash (\); each backslash-newline pair is replaced with a single blank.

  sub PrintBodyText {
    s/\\\n/ /g;
    foreach (split(/\n/, $_)){
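
A minimal sketch of what these two steps do to a page, run on a made-up string. The sub name split_page_lines and the sample text are illustrative only and are not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: join backslash-continued lines, then split on newlines.
sub split_page_lines {
    my ($text) = @_;
    $text =~ s/\\\n/ /g;    # backslash + newline becomes one blank
    return split /\n/, $text;
}

# The first two physical lines are joined because of the trailing backslash.
my @lines = split_page_lines("one\\\ntwo\nthree\n");
print "[$_]\n" for @lines;
```

Note that the author sees two lines in the edit box but the formatter sees one, which is what lets a long bullet item be wrapped by hand.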

We begin by looking for in-place URLs, now, before we begin inserting our own generated URLs. We pull the URL text out of the line, substituting a TranslationToken, so that later steps won't MangleTheUrl.

[I added recognition for "news:..." URL's here. -- MarnixKlooster]

[Replaced this with a more nearly correct URL regex, from the spec. There's a philosophical difference too: the old regex excluded most, but not all, illegal chars from a URL; the regex below only includes correct ones. Also, it doesn't need to be changed when we start linking to ldap stuff. Last, and most importantly, it makes Wiki capable of handling javascript URLs. -- BrianEwins]

[Does the regex below really work? It seems to be including the ' character, which will conflict with emphasis, and excluding the # character. -- JohnBelmonte]

[Haven't been back here in a while, but I just noticed John's question. The answer on the quote mark is "oops": my wiki didn't use that convention. Sorry! On octothorp/hash/pound signs: these are actually excluded in the URI spec, and with good reason. (I think I wrote the below from the earlier RFC1738.) The fragment identifier indicates the beginning of something which is not part of the URL, and in http URLs it can include any text whatsoever, including spaces, making the job of figuring out what is and isn't a URL kinda hard unless you are using some other delimiter at the start of the URL, as you would in hrefs. As for 'does it work' generally, I cut and pasted it from my own (working) wiki at the time. -- BrianEwins]

    while (s/\b\b([a-z]{3,}:[\$-:=\?-Z_a-z~]+[\$-+\/-Z_a-z~-])/$TranslationToken$InPlaceUrl$TranslationToken/) {
      $InPlaceUrl[$InPlaceUrl++] = $1;
    }

A picture of my neighborhood is at http://terraserver.homea ... amp;Z=11&W=2 and it wouldn't TurnBlue till I put in more ampersands and an asterisk
 #  while (s/\b\b([a-z]{3,}:[&\$-:=\?-Z_a-z~]+[.&\$-+\/-Z_a-z~-]*)/$TranslationToken$InPlaceUrl$TranslationToken/) {
Hmmm, this wiki made it TurnBlue without the extra characters. What have I missed? --ChrisGarrod
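
The token trick described above can be sketched in isolation. This is an illustration under stated assumptions, not the script's code: the token value "\263", the sub names protect_urls and restore_urls, and the deliberately simplified URL pattern are all made up for the example.

```perl
use strict;
use warnings;

# Assumption: any string that cannot occur in page text will do as a token.
my $token = "\263";

# Swap each URL for token-index-token and remember the URL under that index.
sub protect_urls {
    my ($line, $urls) = @_;
    # Simplified pattern for illustration: scheme, colon, non-blank run,
    # ending on a character that is not sentence punctuation.
    while ($line =~ s{\b([a-z]{3,}:\S*[^\s.,;:!?])}{$token . scalar(@$urls) . $token}e) {
        push @$urls, $1;
    }
    return $line;
}

# Later, after all other formatting, put the URLs back as anchors.
sub restore_urls {
    my ($line, $urls) = @_;
    $line =~ s{$token(\d+)$token}{<a href="$urls->[$1]">$urls->[$1]</a>}g;
    return $line;
}

my @urls;
my $safe = protect_urls("see http://example.com/a?b=1, ok", \@urls);
print restore_urls($safe, \@urls), "\n";
```

Between the two calls the line contains no URL text at all, so the emphasis and link rules cannot damage it.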

Various combinations of tabs, spaces and newlines are interpreted as paragraph breaks, various kinds of lists, or preformatted text. We emit sufficient codes to raise (or lower) us to the desired level of the desired kind of list. If the line doesn't match any of these tests, then we request the zeroth level of nothing in particular.

	$code = "";
	s/^\s*$/<p>/                  && ($code = '...');             
	s/^(\t+)(.+):\t/<dt>$2<dd>/   && &EmitCode(DL, length $1);
	s/^(\t+)\*/<li>/              && &EmitCode(UL, length $1);
	s/^(\t+)\d+\.?/<li>/          && &EmitCode(OL, length $1);
	 /^\s/                        && &EmitCode(PRE, 1);
	$code                         || &EmitCode("", 0);
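
EmitCode itself is not shown in this section. One plausible way to "raise or lower" the list level is a stack of open tags, sketched below. The name emit_code, the @open stack, and the choice to return the tags instead of printing them are assumptions made for the example, not the script's actual implementation.

```perl
use strict;
use warnings;

my @open;    # stack of currently open list tags, innermost last

# Hypothetical stand-in for EmitCode: returns the tags needed to be
# $depth levels deep in lists of kind $tag.
sub emit_code {
    my ($tag, $depth) = @_;
    my $out = '';
    # Close levels that are deeper than requested.
    while (@open > $depth) {
        $out .= '</' . pop(@open) . ">\n";
    }
    # Open new levels until we are deep enough.
    while (@open < $depth) {
        push @open, $tag;
        $out .= "<$tag>\n";
    }
    # Right depth but wrong kind of list: swap the innermost level.
    if (@open && $open[-1] ne $tag) {
        $out .= '</' . pop(@open) . ">\n<$tag>\n";
        push @open, $tag;
    }
    return $out;
}

print emit_code('UL', 2);    # opens two nested lists
print emit_code('', 0);      # closes everything again
```

Requesting level zero of "nothing in particular" then simply unwinds the whole stack, which is also what the final call after the loop relies on.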

We look for spans of repeated characters to denote various emphasis (italic and bold) and horizontal rules. (Unfortunately, the regular expressions match the longest possible spans so two independently quoted strings cannot appear on the same line.)

Assuming Perl5, here's how to handle multiple quoted strings per line. --DaveSmith

	s{ '{3} (.*?) '{3} }{<strong>$1</strong>}gx;
	s{ '{2} (.*?) '{2} }{<em>$1</em>}gx;
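
To see the non-greedy version at work, the two substitutions can be wrapped in a small sub and fed a line with several quoted spans. The sub name emphasize and the sample line are made up for the illustration.

```perl
use strict;
use warnings;

# Apply the two Perl5 substitutions above to one line.
sub emphasize {
    my ($line) = @_;
    # Three quotes first, so they are not consumed as two plus one.
    $line =~ s{ '{3} (.*?) '{3} }{<strong>$1</strong>}gx;
    $line =~ s{ '{2} (.*?) '{2} }{<em>$1</em>}gx;
    return $line;
}

print emphasize("''one'' and ''two'' and '''three'''"), "\n";
# prints: <em>one</em> and <em>two</em> and <strong>three</strong>
```

The (.*?) stops at the first closing run of quotes, so each span is matched on its own.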

We look for text that should be hyperlinks. That would be either a span with unusual capitalization, a number in square brackets, or the TranslationToken we inserted above. We use subroutines to expand these to HTML, and add the e and o options to tell the pattern matcher that the replacement is code to be evaluated and that the pattern need only be compiled once.
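
The substitutions for this step are not reproduced here. As a rough sketch of the e option only, the following expands capitalized page names through a subroutine. The sub names as_wiki_link and link_wiki_words, the "wiki?" href form, and the page-name pattern are all assumptions for the example and may differ from what the script does.

```perl
use strict;
use warnings;

# Hypothetical expander: turns a page name into an anchor.
sub as_wiki_link {
    my ($name) = @_;
    return qq{<a href="wiki?$name">$name</a>};
}

sub link_wiki_words {
    my ($line) = @_;
    # /e evaluates the replacement as Perl code, so the sub is called
    # once per match with the captured page name.
    $line =~ s/\b((?:[A-Z][a-z]+){2,})\b/as_wiki_link($1)/ge;
    return $line;
}

print link_wiki_words("see WelcomeVisitors now"), "\n";
```

The o option is left out of the sketch because the pattern interpolates no variables, so there is nothing to recompile.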


We look for special keyword substitutions, of which we only have one, inside square-brackets. (Why square-brackets? Well, I was thinking of it as another variation of reference when I wrote it. I suppose angle-brackets or ampersand-semicolon would be more in keeping with web tradition.)


Finally, we finish by printing the line; close our loop over all lines; and, when the loop finally terminates, we emit any codes necessary to be at list level zero.

	print "$_\n";
    }
    &EmitCode("", 0);
  }


Last edited August 31, 2003