Crash course on writing HTML


This document is part of the materials for the Internet course given at INFN - Bari.
This document has been adapted for this course from the following original. Related material can be found in the Bibliography.



The Absolute Essentials

If all you want to do is put some text into HTML format, there are only a few things that you need to do.

  1. Write the ASCII text with your normal editor, naming the file something.htm
  2. Add a title at the beginning, surrounding it with "<h1>" and "</h1>"
  3. If you want to insert a headline, surround it with "<h2>" and "</h2>"
  4. If there are any subheads in the text itself, surround the text of each subhead with "<h3>" and "</h3>"
  5. In front of each paragraph (not including the headings that you have marked already), type "<p>"
  6. Sign your file, typing "<p><i>" before your name and the date and "</i>" after them.
That's it, you're finished. This quick-and-dirty list glosses over numerous complications and technicalities, but at least your document will be read and displayed correctly by most browsers.
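
As a rough sketch, a file produced by following the six steps above might look like this. The heading texts, the name, and the date are placeholders, and the sketch deliberately leaves out everything the quick-and-dirty recipe glosses over.

```html
<h1>My first page</h1>

<h2>A headline</h2>

<p>This is the first paragraph of the text.

<h3>A subhead</h3>

<p>This is another paragraph, placed under the subhead.

<p><i>Your Name, 1 January 1995</i>
```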

However, if you want to do anything at all fancy, such as links to other documents, graphics, bulleted or numbered lists, or whatever, then read on. This document isn't long and you'll be able to do a lot more when you're finished working through it. Let me know if there are any confusing sections so I can improve it.


What is HTML?

The Hypertext Markup Language (or HTML) is the language used to create the documents for the World Wide Web. Although most browsers will display any document that is written in plain text, there are advantages that you get by writing documents using HTML. When HTML documents are read by applications specifically designed for the Web (usually called browsers or Web browsers) they can include formatting, graphics, and even links to other documents.

As a markup language, HTML is not so much concerned about the appearance of the documents, but about the structure of a document. Rich Text Format (RTF), on the other hand, is an example of a formatting language. The difference between them is that, in HTML, you would use commands to mark the headings, normal paragraphs, lists (and whether they are numbered or not), and even things like addresses. In RTF, you would use commands (usually the word processing program does this for you) to indicate the typeface, font size, and style of the text in a document.

Although HTML is an application of Standard Generalized Markup Language (SGML), it differs from some other SGML applications in its simplicity. HTML is simple enough to just type in directly without using some sort of HTML editor. HTML editors are useful, especially if you have massive quantities of documents to write, but they are not necessary to get started.

One thing to remember about HTML, though, is that it is a new standard that's constantly changing. It's difficult to define HTML exactly, which is why you often see use of the word "usually" in this document. Also, browsers differ in their support of various HTML extensions and differ (usually a lot) in the way they handle many standard HTML constructs. However, by sticking to the fairly simple subset of the commands outlined in this document, you can be reasonably assured that your documents will look good in just about any Web browser.

In general, HTML commands begin with a < and end with a >. The commands are almost never case sensitive and are usually "containers" (although there are numerous exceptions to both of those generalizations). By a container, I mean that there is usually a beginning command and an ending command. The commands would thus be applied to the text in-between the beginning and ending commands. An example of a container command is the title command, which surrounds the text that is designated as the document's title with <title> and </title>.

White space (spaces, tabs, and line breaks) is mostly ignored in HTML: a run of white space is displayed as a single space. Leaving a blank line in your document will generally not create a blank line when the document is displayed in a browser.
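
A small test page can make the handling of white space visible. This is only a sketch with placeholder text: in a typical browser the first paragraph should come out as one ordinary line of words separated by single spaces, and the only visible break should be the one created by the second <p>.

```html
<html>
<head>
<title>White space test</title>
</head>
<body>
<p>These    words     are
spread over



several lines.</p>
<p>A new paragraph starts only here.</p>
</body>
</html>
```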

Finally, not every element common to typical documents is included in HTML. You will occasionally have problems converting some documents. For example, the version of HTML in common use today doesn't support tables or equations. There are a few tricks you can use (which we will go into later) to get around these limitations. Both tables and equations are part of the next proposed version of HTML, but it will likely be some time before the new version of HTML is supported by most browsers.

A quick word about the HTML philosophy

Using a markup language is a lot like using styles in Word. In Word, for example, you would often have styles for the various heading levels, bullet lists, and other common elements of a document. The advantage of this is that you can change the underlying formatting easily without having to reapply the formatting throughout the document.

However, as an author of a Web-based document, you don't control the actual formatting of your document. So what's the advantage? The advantage is that, with everybody using the same "styles" so-to-speak, the World Wide Web has a certain consistency to it that is absent from many other forms of information access on the Internet. This consistency is all the more impressive when you consider that there is no one, central design authority controlling the content of the Web.

Sometimes you see documents on the Web in which authors have gone to extraordinary effort to make them look exactly the way they wanted. They use lots of in-line graphics (for things like bullets, horizontal rules, and even individual characters). Usually, however, the result is that those pages look terrific in one particular browser but look dreadful in others.

I suggest that you try to use the Web's built-in "style-sheet" rather than trying to devise your own. On good, configurable browsers, the users can then see documents the way they want to, not the way the author wants them to.


The Heading

Every HTML document should start by declaring itself as such. You do that with the <html> command at the very top of the document. The very last text in a document should be the ending command, </html>.

The top part of the document should also have a section for heading information which is surrounded by the <head> and </head> commands. There are several items of information that you can put in the header, but almost all of it is totally ignored by most browsers out there. One piece of information you should always have in the heading is the title. The title (as we mentioned earlier) is surrounded by the <title> and </title> commands.

The title of a document is not normally displayed as part of the page, but is often displayed in some sort of special section in most browsers (Mosaic puts the title in a Document Title box just under its menu, for example). However, the title is also used by most browsers when saving the user's "hotlist," so it should be both descriptive and short enough to fit comfortably on one line.

Finally, the "body" of the document should also be marked off with the <body> and </body> commands. This is the part of the document that is normally displayed as the page in most browsers.

This is what a typical document would look like so far:

<html>
<head>
<title>This is my title</title>
</head>
<body>



</body>
</html>

(Remember that white space doesn't matter, so this stuff could all be on one line if you wanted. It makes no difference.)


The Body

Heading levels

HTML is easiest to use with documents with a fairly rigid structure -- ones with a definite outline of headings, subheadings, sub-sub-headings, and lists. It is not required, but it is good practice to write your document so that the heading "levels" used reflect the organization of your document. For example, the first heading should be a "level 1" heading (I'll show you how to do this soon), subheadings should be "level 2," and so on.

Most browsers recognize at least four heading levels. There is support in HTML for more than that, but what I mean by "recognize" is that the browser gives up to four levels of headings a distinct style. After the fourth level, it gets difficult to tell the heading levels apart. If you get much beyond that, you should consider breaking up your document into multiple pages.

The heading commands look like <hX> and </hX>, where X is the heading level. In most documents on the Web, the first heading is a duplicate of the document's title. Our typical document would look like this after we added the first heading:

<html>
<head>
<title>This is my title</title>
</head>
<body>

<h1>This is my title</h1>


</body>
</html>

Paragraphs

Normal paragraphs are separated with the <p> command. This is one of those exceptional commands that is not a container, although it can be. The next proposed version of HTML makes this a container, so that paragraphs begin with a <p> and end with a </p>. However, very few documents actually use the <p> command this way, so it's up to you if you want to use it.
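
For comparison with the separator style used in the samples of this document, here is a minimal sketch of the container form. The sentences are placeholders, and the fragment is meant to sit between <body> and </body>.

```html
<p>This is the first paragraph, written as a container.</p>

<p>This is the second paragraph. Its end is marked explicitly,
so nothing depends on what follows it.</p>
```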

Our sample document would look like this after we added a paragraph or two and a subheading:

<html>
<head>
<title>This is my title</title>
</head>
<body>

<h1>This is my title</h1>

This is a sample paragraph. The majority of most documents contain
this type of construct. <p>

<h2>This is a subheading</h2>

The quick brown fox jumped over the slow lazy dogs.
The quick brown fox jumped over the slow lazy dogs.<p>

The quick brown fox jumped over the slow lazy dogs.
The quick brown fox jumped over the slow lazy dogs.<p>

</body>
</html>

I put the paragraph marks after each paragraph in this example, but they can just as easily go in front of each paragraph. The <p> is just a separator.

Lists

There are three kinds of lists in HTML: ordered, unordered, and a special kind called a definition list. The ordered lists are numbered. Unordered ones typically just use bullets to mark each item.

In ordered lists the browsers take care of inserting the actual numbers. This behavior is convenient for authors because if you insert or delete items in an ordered list, you don't have to worry about renumbering everything. An ordered list begins with <ol> and ends with </ol>.

Unordered lists typically use bullets to mark off each item in the list, but this is up to the browser (a DOS browser may use asterisks or dashes, for example). An unordered list begins with <ul> and ends with </ul>.

In both kinds of lists, the individual items are designated with a <li> command. This is another one of those commands that isn't typically used as a container (i.e. it doesn't have a corresponding </li> command), but it can have one if you want. The new proposed version of HTML uses it as a container, but since almost no web documents use it that way, you're pretty much assured that it will always be recognized as a separator (the same goes for <p>, by the way).

You can nest lists to get an outline effect by putting one list inside an item of another. All current browsers that I'm aware of recognize nested lists.
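
As a sketch of the container form, here is a nested list with every <li> closed explicitly; the inner list sits inside the item it belongs to. The item texts are placeholders, and the fragment is meant to sit between <body> and </body>.

```html
<ul>
  <li>an item</li>
  <li>an item with a nested list
    <ul>
      <li>a nested item</li>
      <li>another nested item</li>
    </ul>
  </li>
  <li>the last item</li>
</ul>
```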

Here's our sample document with a few lists thrown in:

<html>
<head>
<title>This is my title</title>
</head>
<body>

<h1>This is my title</h1>

This is a sample paragraph. The majority of most documents contain
this type of construct. <p>

<h2>This is a subheading</h2>

The quick brown fox jumped over the slow lazy dogs.
The quick brown fox jumped over the slow lazy dogs.<p>

Here's an ordered list:<p>
<ol>
  <li> first item.
  <li> second item.
  <li> notice that &lt;p&gt; commands are not necessary to
       separate list items.
</ol>

Here's an unordered list:<p>

<ul>
  <li> an item.
  <li> another item.
  <li> here's a nested list
    <ul>
      <li> a nested item
      <li> another nested item
    </ul>
  <li> the last item
</ul>

The quick brown fox jumped over the slow lazy dogs.
The quick brown fox jumped over the slow lazy dogs.<p>

</body>
</html>

This is what the lists above would look like when rendered with your browser:

Here's an ordered list:

  1. first item.
  2. second item.
  3. notice that <p> commands are not necessary to separate list items.

Here's an unordered list:

  * an item.
  * another item.
  * here's a nested list
      - a nested item
      - another nested item
  * the last item

Definition Lists

A definition list is a very flexible type of list that is more useful than its name implies. It's useful for lists where a bit of explanatory text should accompany each item. Each item in the list has two parts, a term (indicated with the <dt> command) and a definition (which uses the <dd> command). The list itself is started with a <dl> command and closed with a </dl> command.

Here's a sample definition list:

<dl>
  <dt> First Term
  <dd> First term's definition.
  <dt> Second term (or title, or whatever)
  <dd> Text that explains or expands on the second term.
</dl>

And this is what it would look like in your browser:

First Term
    First term's definition.
Second term (or title, or whatever)
    Text that explains or expands on the second term.

Links

Links are what make Web documents unique. Unfortunately, some of what is required to create a link is slightly complicated. The most complex part of it is the URL that points to the resource you're linking to.

URLs

A URL (Uniform Resource Locator) is the address of a document or resource. It usually takes this form:

protocol://machine.name[:port]/directory/document.name

The protocol is the Internet protocol used to reach the document or resource. On the Web, it is typically "http", but it can be any of numerous other things (such as ftp, gopher, telnet, etc). The machine.name is just what you think it is: the name of the host where the document resides (such as www.ba.infn.it). The ":port" portion of the address is optional and is only necessary when a resource is accessible through a non-standard TCP port number. The standard port number for HTTP is 80, for example; since our web server does not use the standard port, we have to put its port number (8080) in the URL. This goes for other protocols, as well. If you want to reach a gopher at something other than port 70 or telnet at something other than port 23, you have to put in a port number.

The directory and document.name components of the URL are self explanatory.

The easiest way to get the URL of a document is to find it using Netscape and then copy the URL into your HTML document. In Netscape, you would copy the text in the Location field near the top of Netscape's window.

Putting Links in HTML documents

The HTML command for putting a link into a document takes this form:

<a href="URL">text of link</a>

You put the URL in the quotes following the "href=" and put the text of the link (the part that users will click or select to activate the link) after the > and before the </a>.
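
The fragment below is a sketch of how the URL forms described earlier fit into the link command. Only www.ba.infn.it comes from this document; the example.org hosts, the paths, and the link texts are hypothetical placeholders.

```html
<ul>
  <li><a href="http://www.ba.infn.it/">A Web server on the standard HTTP port (80)</a></li>
  <li><a href="http://www.example.org:8080/docs/intro.html">A Web server on port 8080</a></li>
  <li><a href="ftp://ftp.example.org/pub/readme.txt">A file on an FTP server</a></li>
  <li><a href="gopher://gopher.example.org:7070/">A gopher server on a non-standard port</a></li>
</ul>
```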

So, here's our document with a few links:

<html>
<head>
<title>This is my title</title>
</head>
<body>

<h1>This is my title</h1>

This is a sample paragraph. The majority of most documents contain
this type of construct. Here's a link embedded in the document right
<a href="http://axahc1.cern.ch:8080/disk$user/hcal_shift/logbook/public/monitproc.html">here
</a>.<p>

<h2>This is a subheading</h2>

The quick brown fox <a
href="http://axahc1.cern.ch:8080/disk$user/hcal_shift/logbook/private/1995/cale1995.html">jumped</a>
over the slow lazy dogs.
The quick brown fox jumped over the slow lazy dogs.<p>

...and so on...

Fortunately you don't always need to give such long addresses. If you know that the document is in the same directory as the document you are writing, then you can simply write href="name_of_document", without writing out all the directories and subdirectories. The work of adding these, thus completing the address, is done by the browser.
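
A short sketch of such shortened addresses follows. All file and directory names are hypothetical; besides a document in the same directory, it also shows a document in a subdirectory and one in the parent directory, which are resolved in the same way.

```html
<p>
See the <a href="chapter2.html">next chapter</a> in this directory,
a <a href="figures/setup.html">page in a subdirectory</a>,
or the <a href="../index.html">index one level up</a>.
</p>
```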

Images

One of the great things about the web is the ability to create and share some snazzy looking documents across platforms. However, you should restrain yourself somewhat. I've noticed in the log files for our server that many people turn off in-line images to increase performance. Even with fast network access to the Internet, some users find documents loaded down with images to be annoying.

However, a dash of colorful images can be nice. Images are also often necessary to make a point that can't be made using text only.

To add an image to your document, you first need to convert it into GIF or JPEG format. The main tool for doing that is Paint Shop Pro. You first produce the image you want to convert on the screen. Then you start Paint Shop Pro and capture it. Now you can modify the size, number of colours, etc. When you are satisfied, you save the image in the GIF or JPEG format. If you want to add a transparent background or the interlace feature (i.e. images which build themselves up with increasing detail), then use the program LView Pro (remember to save as GIF89a, which is the format that supports a transparent background color). Click here for an example of image processing.

To make things easier on yourself, put the images that you want to show in your document into the same directory as your document. It is possible to display a GIF image that is stored almost anywhere (even somewhere on the Internet), but I won't get into those kinds of complexities right now.

The HTML command for inserting an image at the current position takes the following form:

<IMG SRC="name_of_image.gif">

That's all there is to inserting a GIF file into your document. Notice that the location is given relative to the current document. The location does not have to be a full URL (but it can be if you want). This same trick can be used for normal links (not just for images). Some say it's faster to use relative URLs; I've never really noticed a great difference in speed, but I've somehow got used to using relative URLs for images and full URLs for everything else. Feel free to disagree with me here.

Image Options

There are also a couple of optional arguments to the IMG command that you may want to use occasionally.

You can "suggest" to the browser that the image be aligned in a particular way with the surrounding text using the "align=" directive. The choices are "top", "middle", or "bottom", which indicate whether the top, the middle, or the bottom of the image should line up with the surrounding line of text.

Here's a couple of examples:

<img align="top" src="iper0.gif"> Some Text.
<img align="middle" src="iper0.gif"> Some Text.
<img align="bottom" src="iper0.gif"> Some Text.

Here's how it would look in your browser:

Some Text.

Some Text.

Some Text.

Another useful option is to suggest a text-only alternative for browsers that don't support in-line images. The "alt" directive is used like this:

<img alt="o" src="./image.gif"> Some Text.
<img alt="o" src="./image.gif"> Some Text.
<img alt="o" src="./image.gif"> Some Text.

For users on a text-only browser like W3-mode of Emacs or Lynx, these items would appear as just "o" instead of something like [IMAGE]. Some web servers make very good use of this directive to display icons for the image-oriented and simple word links (like "[Home] [Next]") for text-only browsers.

Images with links attached

Between the <a href=URL> and </a> you can put not only text but also images. In this case the image itself becomes the link to the document indicated by the URL.
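
A minimal sketch of images used as links, combined with the "alt" directive so that text-only browsers show simple word links instead. The file names home.html, next.html, home.gif, and next.gif are hypothetical.

```html
<p>
<a href="home.html"><img alt="[Home]" src="home.gif"></a>
<a href="next.html"><img alt="[Next]" src="next.gif"></a>
</p>
```

A graphical browser would show the two icons as clickable images, while a text-only browser would show [Home] [Next] as ordinary links.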

It is also possible to attach a link to a section of an image. For example, in a map of Italy you can attach a different document to each town. This is done by including the USEMAP="#nomemap" option in the IMG tag.

After the IMG tag you must describe the image layout using the following tags:
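
A sketch of what such a description can look like, assuming the MAP and AREA tags used for client-side image maps. The image file, the target documents, and all coordinates are hypothetical; only the map name nomemap comes from the text above.

```html
<img src="italy.gif" alt="Map of Italy" usemap="#nomemap">

<map name="nomemap">
  <!-- rect: left, top, right, bottom -->
  <area shape="rect" coords="10,10,60,40" href="torino.html" alt="Torino">
  <!-- circle: center x, center y, radius -->
  <area shape="circle" coords="120,90,15" href="roma.html" alt="Roma">
  <!-- poly: a list of x,y corner points -->
  <area shape="poly" coords="150,130,180,140,170,170,145,160" href="bari.html" alt="Bari">
</map>
```

The name given in the MAP tag must match the name after the # in the USEMAP option; each AREA then connects one region of the image to its own URL.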

This catalog of images is an example of use.


Scripts (execution of command files)

A link can start the execution of an arbitrary command file. This is very useful if you want to add interactivity to your hypertext. The simplest script has the following form:

print "HTTP/1.0 200 OK\n";
print "Content-type: text/plain\n";
print "\n";
print "hello world\n";

Of course, this script isn't very useful, since it doesn't do anything; it just returns a page with a line of text. Try it here.

Now we want to know the time. Easily done! We replace the last line with the Perl commands that give the local time.

print "HTTP/1.0 200 OK\n";
print "Content-type: text/plain\n";
print "\n";
@timelist = localtime (time());
$currtime = join(" ",@timelist[2,1,0]);
print (" $currtime"); 
Try it.

In fact, instead of the commands shown you can put any set of commands, starting programs and doing what you want. What happens is that all the commands found in the script are executed, and the output that you would normally get on the remote computer you now get on the page sent as the reply to the script execution. Another example is a script that will give the content of the directory test. The script itself is listed here.


Passing parameters to scripts

You can send parameters to a script by including them in the link after the script address, separated from the script name with a ? (several parameters are separated from each other with a +). As an example, try to open the following URL: http://www.ba.infn.it/cgi-bin/nph-listfile.pl?testpage.html This script types the file whose name you give after the ?. You can specify any file name after the ?. The code for this script is:

print "HTTP/1.0 200 OK\n";
print "Content-type: text/plain\n";
print "\n";
chdir("/user/www/data/test");
# $ARGV[0] holds the text that follows the ? in the URL.
# The leading "<" opens the file for reading only.
if (open(MYFILE, "< $ARGV[0]")) {
    $line = <MYFILE>;
    while ($line ne "") {
        print($line);
        $line = <MYFILE>;
    }
    close(MYFILE);
}

To understand how it works, we will use the following script:

print "HTTP/1.0 200 OK\n";
print "Content-type: text/plain\n\n";
print "CGI/1.0 test script report\n\n";
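if ($ENV{'REQUEST_METHOD'} eq "POST") {
    # With POST, the form data arrive on the standard input.
    $form = <STDIN>;
    print "$form \n";
} else {
    # Otherwise the parameters given after the ? arrive in @ARGV.
    print "argc is ", scalar(@ARGV), "\nargv is ";
    while (@ARGV) {
        $ARGV = shift;
        print "$ARGV ";
    }
}
print "\n";
# Report the environment variables set by the server.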
print "SERVER_SOFTWARE = $ENV{'SERVER_SOFTWARE'}\n";
print "SERVER_NAME = $ENV{'SERVER_NAME'}\n";
print "GATEWAY_INTERFACE = $ENV{'GATEWAY_INTERFACE'}\n";
print "SERVER_PROTOCOL = $ENV{'SERVER_PROTOCOL'}\n";
print "SERVER_PORT = $ENV{'SERVER_PORT'}\n";
print "SERVER_ROOT = $ENV{'SERVER_ROOT'}\n";
print "REQUEST_METHOD = $ENV{'REQUEST_METHOD'}\n";
print "HTTP_ACCEPT = $ENV{'HTTP_ACCEPT'}\n";
print "PATH_INFO = $ENV{'PATH_INFO'}\n";
print "PATH = $ENV{'PATH'}\n";
print "PATH_TRANSLATED = $ENV{'PATH_TRANSLATED'}\n";
print "SCRIPT_NAME = $ENV{'SCRIPT_NAME'}\n";
print "QUERY_STRING = $ENV{'QUERY_STRING'}\n";
print "QUERY_STRING_UNESCAPED = $ENV{'QUERY_STRING_UNESCAPED'}\n";
print "REMOTE_HOST = $ENV{'REMOTE_HOST'}\n";
print "REMOTE_IDENT = $ENV{'REMOTE_IDENT'}\n";
print "REMOTE_ADDR = $ENV{'REMOTE_ADDR'}\n";
print "REMOTE_USER = $ENV{'REMOTE_USER'}\n";
print "AUTH_TYPE = $ENV{'AUTH_TYPE'}\n";
print "CONTENT_TYPE = $ENV{'CONTENT_TYPE'}\n";
print "CONTENT_LENGTH = $ENV{'CONTENT_LENGTH'}\n";
print "DOCUMENT_ROOT = $ENV{'DOCUMENT_ROOT'}\n";
print "DOCUMENT_URI = $ENV{'DOCUMENT_URI'}\n";
print "DOCUMENT_NAME = $ENV{'DOCUMENT_NAME'}\n";
print "DATE_LOCAL = $ENV{'DATE_LOCAL'}\n";
print "DATE_GMT = $ENV{'DATE_GMT'}\n";
print "LAST_MODIFIED = $ENV{'LAST_MODIFIED'}\n";

This is a script called nph-testcgi.pl that you can start by opening the following URL:

http://www.ba.infn.it/cgi-bin/nph-testcgi.pl?temp.htm

As you see, before calling the script the server has put the parameter we passed in the variable $ARGV[0]. This is the way you develop these scripts. You first use nph-testcgi.pl to see where the parameters have gone and then write the appropriate Perl program.


Passing parameters through forms

Passing parameters to scripts by writing them in the URL is rather clumsy for the user. A more elegant way is to ask the user for these parameters through a form. A form is a normal HTML page with some special commands that instruct the browser to set up windows, buttons, etc. You can steal a form just by browsing around until you find one that fits your needs, and then saving it on disk. Anyhow, if you want to know everything about forms, just try this URL: http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html

The simplest form is the following:

<TITLE>Request of a single parameter for a script </TITLE>
<H1>Request of a single parameter for a script </H1>
<FORM METHOD="POST" ACTION="http://www.ba.infn.it/cgi-bin/nph-type2.pl">
Give here your parameter:<INPUT NAME="parameter"><p>
To start the script push: <INPUT TYPE="submit" VALUE="here">
</FORM>
Try it here.

The script activated by the form is a dummy one containing, as usual, only the commands that show what has been input to the script. By looking at the output of this script, after having entered something in the input field, you can see that your parameter arrives on the input stream ("STDIN") in the form

parameter=value_you_wrote

Now you can write the real script using the input field. For example, this is the same form and script changed in such a way that it requests a file name and types it.
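
A form that requests a file name might look like the following sketch. The script name nph-typefile.pl and the field name filename are hypothetical; with this form the script would find filename=value_you_wrote on its input stream.

```html
<html>
<head>
<title>Request of a file name</title>
</head>
<body>
<h1>Request of a file name</h1>
<form method="POST" action="http://www.ba.infn.it/cgi-bin/nph-typefile.pl">
<p>Name of the file to type: <input type="text" name="filename" size="30"></p>
<p>To start the script push: <input type="submit" value="here"></p>
</form>
</body>
</html>
```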

A more complex example of a script can be found in two pieces called nph-guest.pl and nph-gshow.pl. These are used to implement a kind of guestbook.

Here, instead, is an example of a form used to collect data.


Debugging scripts

Writing scripts isn't very difficult, but debugging them is rather tricky. Here are some guidelines and tips that will help you:

Images with scripts attached

There is a third way to pass parameters to a script: through an image. If you write:

<a href="/htbin/porta4.com"> <IMG SRC="porta4.gif" ISMAP ></A>

then a script called porta4.com is activated whenever we click on the image porta4.gif. The parameters passed to the script are the coordinates of the clicked pixel. Here you see the script that is connected to the inline image in the following document. (Note that this script is written in a language different from Perl and works on a VAX OpenVMS machine; the previous Perl scripts were running on an NT machine.)

How does it work? When you click on the image, Netscape computes the coordinates of the pixel, let's say 50,30, and sends a request for the following URL:

 http://www.ba.infn.it/htbin/porta4.com?50,30

The script will process the parameters to do something that depends on the part of the image selected: in this case it will compute a different image for each pixel.

Try to open the URL directly, without clicking the image, and you will get the same result.


Clickable images

(Note: this section describes server-side image maps. Now that most browsers support client-side image maps, this part may be considered obsolete.)

By now you must have got the (right) impression that writing scripts is a dirty affair and also depends heavily on the machine where the server runs. This is very different from HTML, which is platform independent. The only notable exception is clickable images, which allow even inexperienced users to write sophisticated interfaces involving scripts.

First try the following example of clickable image of this kind. In this case the result of the click will be interpreted by the server using the map in galleria.map file. You "program" the hypertext by writing the map file for the image.

The map file will define a set of geometrical shapes in the image and connect a different URL to each one of them. The shapes that you can use are: rectangle, circle, and polygon.

How do you compute the coordinates to define a shape? The easiest way is to use Netscape itself.

Programs that create these configuration files automatically exist on some platforms. A more complex example of a clickable image can be found here.


Where to go from here

This document should get you started. To learn more, remember to steal, steal, steal. That is, use the "view source" option built in to most browsers to view the HTML commands that make up some of your favorite pages on the web. I'm not advocating that you steal documents outright, but study the formatting commands that are used and try to use the same tricks in your own documents.

Also, check out our Bibliography page for more advanced docs.


Giuseppe Zito
