Mechanical Web Authoring - using m4 to write HTML and Perl.

1. Some limitations of HTML

It's amazing how easy it is to write simple HTML pages - and the availability of WYSIWYG HTML editors like Netscape Composer lulls one into a mood of "don't worry, be happy". However, managing multiple, interrelated pages of HTML rapidly becomes very difficult. I recently had a slightly complex set of pages to put together and it started me thinking - "there has to be an easier way".

I immediately turned to the WWW and looked up all sorts of tools - but quite honestly I was rather disappointed. Mostly, they were what I would call Typing Aids - instead of having to remember arcane incantations like <a href="link">text</a>, you are given a button or a magic keychord like ALT-CTRL-j which remembers the syntax and does all that nasty typing for you.

Linux to the rescue! HTML is built as ordinary text files and therefore the normal Linux text management tools can be used. This includes the revision control tools such as RCS and the text manipulation tools like awk, perl, etc. These offer significant help in version control and managing development by multiple users as well as in automating the process of extracting from a database and displaying the results (the classic "grep |sort |awk" pipeline).

The use of these tools with HTML is documented elsewhere, e.g. see Jim Weirich's article in Linux Journal Issue 36, April 1997, "Using Perl to Check Web Links", which I'd highly recommend as yet another way to really flex those Linux muscles when writing HTML.

What I will cover here is a little work I've done recently with using m4 in maintaining HTML. The ideas can probably be extended to the more general SGML case very easily.

I decided to use m4 after looking at various other pre-processors, including cpp, the C preprocessor. While cpp is perhaps a little too C-specific to be very useful with HTML, m4 is a very generic and clean macro expansion program - and it's available under most Unices including Linux.

Instead of editing *.html files, I create *.m4 files with my favourite text editor. These look something like this:

m4_include(stdlib.m4)
_HEADER(`This is my header')
<P>This is some plain text</P>
_HEAD1(`This is a main heading')
<P>This is some more plain text</P>
_TRAILER

The format is simple: it is just HTML, but you can now include files and define macros, much as in C. By convention I write my own macros in capitals with a leading "_", so that they stand out from the HTML and avoid name-space collisions.

The m4 file is then processed to create an .html file, e.g.

m4 -P <file.m4 >file.html

This is especially easy if you create a "makefile" to automate this in the usual way. Something like:

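# Build each .html page from the .m4 source of the same name.
.SUFFIXES: .m4 .html
.m4.html:
	m4 -P $*.m4 >$*.html
default: index.html
# Rebuild every page when the shared macro library changes.
*.html: stdlib.m4
all: default PROJECT1 PROJECT2
# Recurse into each sub-project directory.
PROJECT1:
	(cd project1; make all)
PROJECT2:
	(cd project2; make all)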

The most useful commands in m4 include the following which are very similar to the cpp equivalents (shown in brackets):

m4_include:: includes a common file into your HTML (#include)
m4_define:: defines an m4 variable (#define)
m4_ifdef, m4_ifelse:: conditionals (#ifdef, #if)

Some other commands which are useful are:

m4_changecom:: change the m4 comment character (normally #)
m4_debugmode:: control error diagnostics
m4_traceon/off:: turn tracing on and off
m4_dnl:: discard the rest of the line, including the newline (used for comments)
m4_incr, m4_decr:: simple arithmetic
m4_eval:: more general arithmetic
m4_esyscmd:: execute a Linux command and use the output
m4_divert(i):: This is a little complicated, so skip on first reading. It is a way of storing text for output at the end of normal processing - it will come in useful later, when we get to automatic numbering of headings. It sends output from m4 to a temporary (internal) file number i. At the end of processing, any text which was diverted is then output, in the order of the file number i. File number -1 is the bit bucket and can be used to comment out chunks. File number 0 is the normal output stream. Thus, for example, you can m4_divert text to file 1 and it will only be output at the end.

	Apples	Oranges	Lemons
England	100	250	300
France	200	500	100
Germany	500	50	90
Spain		23	2444
Denmark			20

Contents: