International Competition in Programming

ICP '2000

Correspondence stage problem

Simple browser

Write a browser that can be used to view pages written in a simple markup language. The markup language allows arbitrarily named tags. A pair of tags always marks a part of the document. For example:

<paragraph>The sample paragraph</paragraph>

A part of the document marked in this way is called an element. It is always surrounded by a start tag (of the form <name>) and an end tag (of the form </name>). The name of an element may contain only letters of the English alphabet. Names are case sensitive.

The document can contain any number of elements. Elements can be nested inside one another. The whole document must be enclosed in a single element.

<article>
  <author>James Brown</author>
  <title>Sample doc</title>
  <para>First sample paragraph. It contains <em>some
	marked text</em> inside.</para>
  <para>Second very simple paragraph.</para>
  <para>Last paragraph contains link to
  	<link><target>second.sml</target>other file</link>.
	Paragraph then continues.</para>
</article>

Because of its special meaning, the '<' character must be written in a document as the character sequence '&lt;'. The ampersand '&' is written as '&amp;'. For example, the document

<document>3 &lt; 5</document>

would be displayed as '3 < 5'.
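
The rules above (start and end tags, nesting, a single root element, the two escape sequences) can be sketched as a small stack-based parser. This is only an illustration in Python, not a required design; the names Element, decode_entities and parse are hypothetical, and the error handling is an assumption, since the problem statement does not say how malformed documents should be treated.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)  # Element or str items


# A start tag, an end tag, or a run of text up to the next '<'.
TOKEN = re.compile(r"<(/?)([A-Za-z]+)>|([^<]+)")


def decode_entities(text):
    # Decode both escapes in one pass so '&amp;lt;' becomes '&lt;', not '<'.
    return re.sub(r"&(lt|amp);",
                  lambda m: {"lt": "<", "amp": "&"}[m.group(1)], text)


def parse(source):
    root = None
    stack = []  # currently open elements, innermost last
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"malformed tag at offset {pos}")
        pos = m.end()
        closing, name, text = m.groups()
        if text is not None:
            if stack:
                stack[-1].children.append(decode_entities(text))
            elif text.strip():
                raise ValueError("text outside the root element")
        elif not closing:
            if root is not None and not stack:
                raise ValueError("more than one root element")
            element = Element(name)
            if stack:
                stack[-1].children.append(element)
            else:
                root = element
            stack.append(element)
        else:
            # Names are case sensitive, so compare them exactly.
            if not stack or stack[-1].name != name:
                raise ValueError(f"unexpected end tag </{name}>")
            stack.pop()
    if stack or root is None:
        raise ValueError("document is not enclosed in a single element")
    return root
```

For example, parse("<document>3 &lt; 5</document>") yields an element named document whose only child is the text '3 < 5'.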

When the content of a document is displayed, line ends are treated as spaces, and a run of consecutive spaces is treated as a single space.
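
A minimal sketch of this whitespace rule in Python follows. The function name collapse_whitespace is hypothetical. Treating tab characters like spaces is an assumption: the statement mentions only line ends and spaces, but the sample documents are indented with tabs.

```python
import re


def collapse_whitespace(text):
    # Line ends count as spaces; any run of spaces becomes one space.
    # Assumption: tabs are handled the same way as spaces.
    return re.sub(r"[ \t\r\n]+", " ", text)
```

With this rule, the text 'some', a line end, a tab and 'marked  text' is displayed as 'some marked text'.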

The display of a document is specified by a style sheet stored in a separate file with the extension sty. The name of the root element determines which style sheet is used. For example, the style sheet for our sample document is read from the file article.sty because the whole document is enclosed in the article element. The browser looks for style sheets in a directory that can be changed in the configuration of your program. If the style sheet is not present in this directory, it is read from the same directory as the displayed document. If the style sheet cannot be found, the browser should report an error and stop displaying the document.

A style sheet file also uses the syntax of our simple markup language. For each element present in the document, the style sheet contains a definition of its appearance: the information needed to determine the element's color, font size and type of display.
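
The lookup order described above could be sketched as follows. The function name find_style_sheet and its parameters are hypothetical; raising FileNotFoundError stands in for whatever error report the browser shows to the user.

```python
import os


def find_style_sheet(root_name, style_dir, document_path):
    file_name = root_name + ".sty"
    candidates = [
        # 1. the configurable style sheet directory
        os.path.join(style_dir, file_name),
        # 2. the directory of the displayed document
        os.path.join(os.path.dirname(document_path), file_name),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    # The caller is expected to report this and stop displaying the document.
    raise FileNotFoundError(f"style sheet {file_name} not found")
```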

<style>
  <article>
    <display>block</display>
    <size>12</size>
    <color>
	  <red>0</red>
	  <green>0</green>
	  <blue>0</blue>
    </color>
  </article>
  <author>
    <display>block</display>
    <size>11</size>
    <color>
	  <red>0</red>
	  <green>255</green>
	  <blue>0</blue>
    </color>
  </author>
  <title>
    <display>block</display>
    <size>20</size>
    <color>
	  <red>0</red>
	  <green>0</green>
	  <blue>255</blue>
    </color>
  </title>
  <para>
    <display>block</display>
  </para>
  <em>
    <display>inline</display>
    <color>
	  <red>255</red>
	  <green>0</green>
	  <blue>0</blue>
    </color>
  </em>
  <link>
    <display>inline</display>
    <color>
	  <red>0</red>
	  <green>0</green>
	  <blue>255</blue>
    </color>
  </link>
  <target>
    <display>fileref</display>
    <color>
	  <red>0</red>
	  <green>0</green>
	  <blue>255</blue>
    </color>
  </target>
</style>

Each element in the style sheet may contain three elements: display, size and color.

The content of the display element defines how the element is displayed. The value block means that the element is formatted as a standalone paragraph, with a line break before and after it. The value inline means that the element's content is displayed as part of its parent element, possibly with a different font color and/or size. The value fileref means that the element contains the filename of a linked document. The immediately surrounding element is then used to activate the link, for example by a mouse click.

The content of the size element is the font size in pixels. The element is displayed in a font of this size.

The color of an element is specified by the color element. The color is defined in the RGB color space. The subelements red, green and blue each contain the value of the corresponding color component as an integer between 0 and 255.

Specifying the color and size elements in the style sheet is optional. If they are omitted, their values are inherited from ancestor elements.
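
To illustrate the three display values, here is a text-only sketch in Python that turns a document tree into lines and a list of links. It ignores fonts and colors. The function name render and the tree representation (a tuple of an element name and a list of children, where a child is a string or another tuple) are hypothetical. Two assumptions are made that the statement does not spell out: the filename inside a fileref element is not shown, and elements without a display value are treated as inline.

```python
import re


def render(node, display):
    lines = [""]  # lines[-1] is the line currently being filled
    links = []    # (target file name, visible link text) pairs

    def plain_text(n):
        # Visible text of a subtree; fileref elements contribute nothing.
        if isinstance(n, str):
            return n
        name, children = n
        if display.get(name) == "fileref":
            return ""
        return "".join(plain_text(c) for c in children)

    def line_break():
        # Finish the current line unless it is still empty.
        lines[-1] = re.sub(r"\s+", " ", lines[-1]).strip()
        if lines[-1]:
            lines.append("")

    def walk(n):
        if isinstance(n, str):
            lines[-1] += n
            return
        name, children = n
        kind = display.get(name, "inline")  # assumption: default is inline
        if kind == "fileref":
            return  # assumption: the file name itself is not displayed
        if kind == "block":
            line_break()
        for child in children:
            if not isinstance(child, str) and display.get(child[0]) == "fileref":
                # The surrounding element (n) becomes the clickable link.
                target = "".join(c for c in child[1] if isinstance(c, str))
                text = re.sub(r"\s+", " ", plain_text(n)).strip()
                links.append((target.strip(), text))
            walk(child)
        if kind == "block":
            line_break()

    walk(node)
    line_break()
    return [line for line in lines if line], links
```

A real browser would attach each link to the screen region of the surrounding element rather than to its text, but the traversal order stays the same.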
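
Inheritance of size and color can be sketched as a function that combines the already resolved style of the parent with the rule of the current element. The names Style and resolve_style are hypothetical, as is the representation of a rule as a dictionary. The defaults used when even the root element omits a value (12 pixels, black) and the fallback to inline display are assumptions; the statement does not define them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Style:
    display: str = "inline"               # assumption: fallback display type
    size: int = 12                        # assumption: default size in pixels
    color: Tuple[int, int, int] = (0, 0, 0)  # assumption: default is black


def resolve_style(parent, rule):
    # display is taken from the element's own rule; size and color fall
    # back to the parent's resolved values when the rule omits them.
    return Style(
        display=rule.get("display", "inline"),
        size=rule.get("size", parent.size),
        color=rule.get("color", parent.color),
    )
```

With the sample style sheet, para inherits size 12 and black from article, and em inside para keeps size 12 but switches to red:

```python
article = resolve_style(Style(), {"display": "block", "size": 12, "color": (0, 0, 0)})
para = resolve_style(article, {"display": "block"})
em = resolve_style(para, {"display": "inline", "color": (255, 0, 0)})
```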

The browser will allow the user to select a file to show at any time. Moving between files will be possible through links in documents. Assume that documents contain only characters from the US-ASCII character set. Filenames in links can contain the sequence '..', which means going up one level in the directory structure. Use the character '/' as the directory separator.
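
Handling of '..' and '/' in link targets can be sketched with Python's posixpath module, which always uses '/' as the separator regardless of the operating system. The function name resolve_link is hypothetical. Resolving a link relative to the directory of the document that contains it is an assumption; the statement does not say what relative filenames are relative to.

```python
import posixpath


def resolve_link(current_document, target):
    # Assumption: targets are relative to the linking document's directory.
    base = posixpath.dirname(current_document)
    # normpath removes '..' segments by going up one directory level.
    return posixpath.normpath(posixpath.join(base, target))
```

For example, a link to '../second.sml' inside 'docs/a/first.sml' resolves to 'docs/second.sml'.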

Optionally, you can add image support to your browser. The style sheet may then use the value picture in the display element. The content of such an element is treated as the filename of a picture, which may be stored in any format you choose to support.

You may also add support for the HTTP protocol. Links can then contain URL addresses instead of filenames.

You should focus on writing an effective rendering engine for your browser. You may not use existing components (e.g. Internet Explorer) for rendering and displaying. Our language is a subset of XML, so you can use existing XML libraries to read documents, but this is not required.