Getting Started with Web Pages

Introduction
- Purpose of this document
- Limits of this document
Structure separate from appearance
- Electronic publishing features
- Structure
- Appearance
Elements and tags
- Purpose of elements
- Container elements are delimited by start and end tags
- Empty elements are indicated by a start tag only
- Containers should not overlap
- Some tags can take one or more parameters
Information whereabouts
- Hypertext and addressing
- URL format
- Protocols
- File specifications
- Labels
Set up the framework of the page
- Purpose of the framework
- HTML container
- HEAD container
- BODY container
Fill in content
- TITLE container
- P containers (end tag optional)
- IMG empty element
- EMBED empty elements
Delineate structure
- H containers
- Style and Phrase containers
- BR empty element
- HR empty element
- List containers
- TABLE containers
Establish links
- A containers with HREF parameter (link origin)
- A containers with NAME parameter (link destination)

Introduction
- Purpose of this document
  - This document is intended as an introduction for first-time authors to the HyperText Markup Language (HTML), the standard used to compose web pages. It assumes a basic understanding of Internet communications and Unix naming conventions.
  - In this document, we will attempt to condense the knowledge necessary to create a basic web page into 3 concepts and 4 activities:
    - Concept 1: the structure of a web page and its appearance are defined separately.
    - Concept 2: the structure of a web page is indicated by elements, which in turn are delimited by tags.
    - Concept 3: the information within web pages can be linked to other information items, each identified in a standard manner.
    - Activity 1: define the basic framework for the page.
    - Activity 2: fill in multimedia content (text, pictures, animations, etc.).
    - Activity 3: delineate the structure of the page so readers can better use it.
    - Activity 4: establish origin and destination of links to additional material.
- Limits of this document
  - The basic instructions given here leave out many advanced web features. The main goal is ease of learning (and, as an added bonus, the generation of pages compatible with virtually all current browser software).
    - You should be able to flesh out the basic pages resulting from this process by incrementally adding elements, without losing any of the work you did initially.
  - The information presented here is based on my own understanding of the HTML standard, and of its actual implementations. Due to wide-ranging and rapid changes in this area, not everything will work as described here in all circumstances.
    - Regarding terminology, I have tried to comply with general usage as much as possible. Changing and conflicting definitions, as well as a desire to simplify the topic, may have led me to use some terms in unintended ways.

Structure separate from appearance
- Electronic publishing features
  - Unlike desktop publishing, which is geared towards the inflexible format of paper, web page design should take full advantage of the flexibility of electronic publishing.
  - To this end, the standard used to define web pages, the HyperText Markup Language (HTML), allows for strict separation between the structure of the web page and the way that same structure is expressed (visually, aurally, spatially, etc.).
  - As HTML evolved, the separation of structure and appearance was increasingly muddled by ad hoc additions (such as font specifications). In the upcoming version 4.0 of the standard, this separation is reaffirmed by the introduction of stylesheets--which also afford more visual diversity than was previously possible.
- Structure
  - Defining the structure of the web page is what the author does when he/she marks up the page.
  - The structure of the page defines the relative importance and purpose of its elements. For example:
    - Main headlines are defined as more prominent than subheadings.
    - Body type is divided into paragraphs.
    - Lists of items are classified hierarchically.
    - Tabular material is divided into rows and columns.
  - Structural specifications do not imply a specific appearance for the content. Rather, they spell out the purpose of appearance specifications (for instance, a main headline may take on a bold appearance because its purpose is to be more prominent).
- Appearance
  - The appearance of the web page is determined at the time the page is formatted by the browser program.
  - Formatting is partly the result of the author's choices and the browser's defaults--but ultimately both should yield to the reader's preferences.
    - Even when readers use the same browser software and the same type of computer as the author, differences in the appearance of the page will occur--due to availability of fonts, width of windows, type of browser window features enabled, display hardware and software settings, and many other variables. Attempting to fully specify appearance in a web page is a counterproductive waste of time.
  - The goal of formatting should be to convey the structure and content of the page in a way best suited to the individual reader. It should not be to constrain the page within a preconceived notion of what looks or works best.

Elements and tags
- Purpose of elements
  - Markup involves assigning each content item to a specific place in the structure of a web page. This is done by assigning the content item to an element.
  - Except for special circumstances, browsers ignore any other formatting of the text in the HTML file. In particular, carriage returns, tabs, and multiple spaces are each displayed as a single space: they can be used to clarify the HTML code itself for the benefit of the author--without interfering with the reader's view of the page.
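  - As a quick sketch of this point (the P container used here is covered later in this document), the two fragments below differ only in their line breaks and spacing, and a browser should display them the same way:
    ```html
    <!-- Version 1: everything on one line -->
    <P>Mix eggs and flour in a bowl.</P>

    <!-- Version 2: extra line breaks, indentation and spaces in the source -->
    <P>
        Mix eggs     and flour
        in a bowl.
    </P>
    ```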
- Container elements are delimited by start and end tags
  - A start tag in HTML is a word (the name of the tag) enclosed in angle brackets (less-than and greater-than symbols).
    <sampletag>
  - An end tag is the same as the corresponding start tag, but adds a leading slash.
    <sampletag></sampletag>
- Empty elements are indicated by a start tag only
- Containers should not overlap
  - If a container starts before a previous container closes, then it should also close before the previous container.
  - In other words, if a container is inside of another container, no part of it should 'stick out'.
  - These containers are okay, since they do not overlap (container B is fully inside container A)
    <sampleA> <sampleB> </sampleB> </sampleA>
  - These containers are okay, since they also do not overlap (container B is fully outside container A)
    <sampleA> </sampleA> <sampleB> </sampleB>
  - These containers won't work, since they do overlap (container B is partially inside container A)
    <sampleA> <sampleB> </sampleA> </sampleB>
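  - The same rule, sketched with two tags described later in this document (P for a paragraph, B for bold type). Browsers differ in how they recover from the second form, so it is best avoided:
    ```html
    <!-- Okay: B opens and closes inside P -->
    <P><B>Carrot</B> muffins are good for you.</P>

    <!-- Overlapping: B is still open when P closes -->
    <P><B>Carrot muffins</P> are good for you.</B>
    ```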
- Some tags can take one or more parameters
  - The parameters are used to fully specify the meaning of the tag. For instance, a tag that calls up a picture will take a parameter with the location of the picture file.
  - Parameters appear within the same angle brackets as the tag name, following the tag name. Parameters are separated from the tag name, and from each other, by one or more spaces.
    - The parameters are only listed after the start tag, not after the end tag.
  - Each parameter is made up of a word (the name of the parameter), followed by = (equals sign), followed by the value of the parameter.
    <sampletag parameter1="some text" parameter2=256>
    - The parameter value may be a number, a text string, or whatever combination is appropriate for the specific parameter.
    - The parameter value may or may not need to be enclosed in " (straight double quotes). As a rule of thumb, text parameters should be quoted; purely numeric parameters need not be, although quoting them does no harm.

Information whereabouts
- Hypertext and addressing
  - An essential feature of the Web is that it supports hypertext--the linking of information items in various locations within a page, within a web site, and even across web sites.
  - To achieve this linking it is necessary to specify the location of the information items down to a very detailed level--more detailed than previous Internet addressing standards would allow. Linking requirements led to the development of Uniform Resource Locators (URLs), an addressing scheme that extends previous Internet standards.
- URL format
  - Most fully qualified URLs contain the specification for the computer file that contains the information item, a protocol (indicating how the reader's computer should access the file), and a label (indicating the specific location of the information item within the file).
    - If all three are present, they are listed in the following order: the protocol, then a : (colon), then the file specification, then a # (pound sign), then the label.
  - Some protocols are not used to access files, and the corresponding URLs have different formats.
  - While the actual results are dependent on the computer hosting the web site, you should assume that URLs are case sensitive.
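  - To illustrate, here is how the three parts line up in the sample URL used near the end of this document (the site name is made up). The A container that carries the URL is described under 'Establish links':
    ```html
    <!--
      http://www.someplace.com/news/info.html#contents

      protocol:            http
      file specification:  //www.someplace.com/news/info.html
      label:               contents
    -->
    <A HREF="http://www.someplace.com/news/info.html#contents">News summary</A>
    ```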
- Protocols
  - The most common protocol is the HyperText Transfer Protocol (HTTP), used to access files through web server software. It is indicated by http.
  - Other commonly used protocols are:
    - File Transfer Protocol (FTP), used to download and upload large files. Indicated by ftp.
    - Gopher, used for electronic publishing over the Internet before the Web. Indicated by gopher.
    - File--indicating that the file should be accessed through the normal operating system facilities of the reader's computer. This generally means that the file is stored on the same computer that is running the browser software. Indicated by file.
  - Two other commonly used protocols require a special URL format:
    - Mail--used to send e-mail. Indicated by mailto. Instead of a file specification and label, it takes a standard Internet mail address. For example:
      mailto:somebody@some.place.com
      would send mail to the user 'somebody' of the computer with DNS name 'some.place.com'.
    - Network News Transfer Protocol (NNTP), indicated by news. Takes a standard Internet newsgroup designation instead of a file specification and label. For example:
      news:alt.somehobby
      would access the messages posted to the newsgroup 'alt.somehobby'.
- File specifications
  - A relative specification starts from the location of the file containing the URL, and lists the folders (directories) that must be opened in order to get to the file containing the information item.
    - Relative specifications are preferable, since they allow some changes to the location of the files without having to update the URLs.
    - Relative specifications cannot be used to access files on a site different from the one where the URL resides.
    - The elements of the file specification are listed according to Unix conventions: directory names are separated by a / (forward slash), and an enclosing (parent) directory is indicated by .. (two periods).
  - Absolute file specifications start from a fixed location, then list folders leading to the file containing the information item.
    - A file specification starting with a leading slash is an absolute file specification starting at the root (top level) of the filesystem of the same site where the URL resides.
    - To refer to a location on a different site, the file specification should start with // (two forward slashes), followed by the DNS name or IP address of the site, then the absolute path name to the file starting from the root of the site.
  - If the file specification is missing, the browser program assumes that it should look within the file that contains the URL itself.
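  - A sketch of the different kinds of file specification, as they might appear in a page stored at http://www.sample.com/recipes/muffins.html (the site, directory, file and label names are all invented for illustration):
    ```html
    <!-- Relative: a file in the same directory as the page -->
    <IMG SRC="muffin.gif">

    <!-- Relative: go down into the pictures directory -->
    <IMG SRC="pictures/muffin.gif">

    <!-- Relative: go up to the parent directory, then into images -->
    <IMG SRC="../images/muffin.gif">

    <!-- Absolute: start from the root of the same site -->
    <IMG SRC="/images/muffin.gif">

    <!-- Different site: protocol, two slashes, DNS name, then the path -->
    <IMG SRC="http://pix.sample.com/images/sampleimage.gif">

    <!-- No file specification: a label within this same page -->
    <A HREF="#ingredients">Jump to the ingredients</A>
    ```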
- Labels
  - Prior to using a URL containing a label, the same label must be attached to a portion of the destination file using the appropriate markup.
  - If a URL does not include a label, it defaults to the beginning of the destination file.

Set up the framework of the page
- Purpose of the framework
  - This is the bare minimum necessary to create a valid, but empty, web page. It sets up the locations where the actual contents will be placed.
- HTML container
  - The 'outermost' container for the page. All other containers are located inside this container.
    <HTML> </HTML>
- HEAD container
  - The first part of the web page, mostly containing items invisible to the reader (for instance, indexing information, language, authorship) and used by automatic retrieval systems.
  - The HEAD container goes inside the HTML container.
    <HTML> <HEAD> </HEAD> </HTML>
- BODY container
  - The main part of the web page, containing the information displayed to the reader.
  - The BODY container goes inside the HTML container, outside of and after the HEAD container.
    <HTML> <HEAD> </HEAD> <BODY> </BODY> </HTML>

Fill in content
- TITLE container
  - This is the only content visible to the reader which goes inside the HEAD container. All other content items we will consider will go inside the BODY container.
  - TITLE contains a short piece of text describing the purpose of the web page.
    <TITLE>How to Bake Carrot Muffins</TITLE>
  - A succinct but descriptive title will help readers find the page in the history and bookmark menus available in most browsers. It will also improve automatic indexing of the page by Internet search engines.
  - By default, most browsers display the contents of TITLE in the title bar of the browser window.
- P containers (end tag optional)
  - A separate P container is used for each paragraph of the text content of the web page.
  - Most browsers accept P containers with or without the end tag. The following two examples are generally interpreted in the same manner:
    <P>Mix eggs and flour in a bowl.
    <P>Mix eggs and flour in a bowl.</P>
  - Using paragraph containers appropriately highlights the meaning of the text, and makes it more legible.
  - By default, most browsers separate paragraphs with blank lines.
- IMG empty element
  - IMG elements are used for images displayed within the body of the web page, called inline images. Non-inline images are displayed separately from the calling web page, using a hyperlink.
  - The IMG element is empty (it takes no end tag). It does, however, require a SRC parameter, whose value is the URL pointing to the image file:
    <IMG SRC="http://pix.sample.com/images/sampleimage.gif">
  - Adding WIDTH and HEIGHT parameters for the size (in pixels) of the image will speed up the display of the page, since the text can be laid out immediately, before the graphics finish loading. Many graphics programs (such as Photoshop) will provide this information.
    <IMG SRC="otherimage.gif" WIDTH=256 HEIGHT=128>
  - By default, most browsers will display GIF and JPEG images inline. More recent versions add support for PNG images.
- EMBED empty elements
  - This is used for other inline media elements (video, sound, animation, etc.). It is a Netscape extension, not an approved part of the HTML standard, and may eventually be replaced by a similar OBJECT container.
  - The EMBED element is also empty. In its basic format it is similar to IMG--it requires a SRC parameter, whose value is the URL pointing to the media file:
    <EMBED SRC="http://glitz.sample.com/animations/samplemovie.dcr">
  - The EMBED container may take a variety of parameters, depending on the specific requirements of the type of media displayed. WIDTH and HEIGHT are commonly used for visual media.
  - Embedded media elements are generally not supported directly by the browser--readers will need to install appropriate software add-ons called plugins.
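  - Putting the framework and the content containers together, a complete (if plain) page might look like the sketch below. The picture file name and its dimensions are placeholders:
    ```html
    <HTML>
    <HEAD>
      <!-- TITLE is the only visible content inside HEAD -->
      <TITLE>How to Bake Carrot Muffins</TITLE>
    </HEAD>
    <BODY>
      <!-- One P container per paragraph -->
      <P>Carrot muffins are good for you.</P>
      <P>Mix eggs and flour in a bowl.</P>
      <!-- Inline image; WIDTH and HEIGHT are in pixels -->
      <IMG SRC="muffin.gif" WIDTH=256 HEIGHT=128>
    </BODY>
    </HTML>
    ```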

Delineate structure
- H containers
  - This is a family of containers, delimited by tags named H1, H2, ... through H6. These containers hold text headings, decreasing in prominence as the tag number increases. Typically, H1 is used only once for the main headline at the top of the page, while H2 and H3 are used for subheads within the text.
  - In the following example, the main headline is followed by a brief paragraph, then by a subhead and another paragraph:
    <H1>Carrot Muffin Central</H1>
    <P>Carrot muffins are good for you. Here is how to whip 'em up.</P>
    <H2>Getting Started</H2>
    <P>Mix eggs and flour in a bowl.</P>
  - Heads and subheads are important signposts that assist the reader in understanding how the web page is organized.
  - By default, most browsers display headings as larger, bold type, followed by a blank line. Beyond H3 or H4, however, it is generally hard to distinguish the various heading levels.
- Style and Phrase containers
  - These containers are used for small portions of text that need to be displayed in a unique manner (for instance, to emphasize a technical word).
  - Some style tags will mandate the manner in which the content is displayed (for instance, by italicizing it), thus partially violating the separation between structure and appearance:
    <P><B>Carrot</B> muffins are <I>good</I> for you.</P>
    - In this example, the word 'carrot' will be bolded, and 'good' will be italicized.
  - Other tags simply indicate that the text needs to be differentiated somehow, leaving the specifics to the browser and, possibly, to the reader:
    <P><STRONG>Carrot</STRONG> muffins are <EM>good</EM> for you.</P>
    - This example shows the two available emphasis tags, EM (interpreted as italics in most browsers), and STRONG (generally interpreted as bold). Different visual and/or aural devices could be used to highlight the text in these containers.
  - A special case is where type needs to be displayed in a monospaced font (one whose characters are all the same width). This may be necessary to align tabular data without resorting to tables (which may not be supported in some browsers). Putting type in a TT container will accomplish this:
    <H2>Ingredients</H2>
    <P><TT>Flour__________2 lb.</TT></P>
    <P><TT>Eggs___________6</TT></P>
    <P><TT>Carrots________1.5 lb.</TT></P>
- BR empty element
  - BR inserts a line break in the text content of the web page. Notice that this is conceptually different from delineating a paragraph, and is generally displayed differently in browsers (P is followed by a blank line, BR isn't).
  - One use of BR is to break the lines of a poem according to the meter of the verse, separately from the division into stanzas (which may be handled as paragraphs):
    <P>Carrots are orange<BR>
    Berries are blue<BR>
    Muffins are yummy<BR>
    And berries are, too.</P>
- HR empty element
  - HR inserts a horizontal rule (line) in the body of the web page.
  - HR is useful to indicate the boundaries between major sections of content, and to visually organize the web page:
    <P>This paragraph concludes our discussion of carrot muffins.</P>
    <HR>
    <H3>All About Blueberry Muffins</H3>
    <P>Man does not live by carrot muffins alone.</P>
- List containers
  - This is another family of containers used to list content items in hierarchical order, usually displayed in an indented outline format.
  - Most lists are made up of three nested levels:
    - The enclosing list container. Of the many types included in early versions of HTML, only two are now recommended:
      - UL (unordered list), generally displayed as a bulleted list.
      - OL (ordered list), generally displayed as a numbered list.
    - One or more LI (list item) containers inside the list container. The end tag of LI is optional.
    - Actual content items (text and/or graphics) inside each LI.
  - A special case is a DL (definition list). Instead of listing simple LIs, it contains:
    - DT (definition term), meant to be used for the word or words being defined.
    - DD (definition description), used for the explanation of DT's contents, and usually displayed indented from it.
  - To create more complex outlines, lists can be nested (e.g., an LI can contain a list container, possibly of a different kind).
  - In the example below, an unordered list is nested within an ordered list:
    <OL>
    <LI>Build up an appetite
    <LI>Pick your berry
      <UL>
      <LI>Boysenberry
      <LI>Blueberry
      <LI>Raspberry
      <LI>Strawberry
      </UL>
    <LI>Storm into the kitchen
    </OL>
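  - A definition list follows the same pattern. In this sketch (the definitions are made up for illustration), each DT is followed by its DD, and the DT and DD end tags are left out in the same way as the LI end tags in the previous example:
    ```html
    <DL>
      <!-- DT holds the term being defined -->
      <DT>Boysenberry
      <!-- DD holds the explanation of the preceding DT -->
      <DD>A large, dark purple berry.
      <DT>Blueberry
      <DD>A small, round, blue berry.
    </DL>
    ```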
- TABLE containers
  - TABLE, with its subcontainers, is used to list content items in tabular order, usually displayed in a grid of rows and columns.
    - TABLEs are used extensively to create more complex layouts than early HTML would allow. Unfortunately this makes for a rigid arrangement that is not amenable to alternative appearances. More flexible approaches for positioning content elements have emerged in later versions of HTML.
  - A complete table is made up of four nested levels:
    - The enclosing TABLE
    - One or more TR (table row) containers inside TABLE
    - One or more cell containers inside each row. These can be TH (table headers, displayed more prominently) or TD (table data, ordinary content).
    - Actual content items (text and/or graphics) inside each cell.
  - To create more complex arrangements, tables can be nested (e.g., a cell container can contain a TABLE container).
  - Except for TABLE itself (which always requires a closing tag), the other containers can be entered either with or without an end tag.
  - The example below shows a simple table of 5 rows and 2 columns, with one row of 2 header cells followed by 4 rows of 2 data cells each:
    <TABLE>
    <TR> <TH>Berry <TH>Color
    <TR> <TD>Blueberry <TD>blue
    <TR> <TD>Boysenberry <TD>purple
    <TR> <TD>Raspberry <TD>magenta
    <TR> <TD>Strawberry <TD>red
    </TABLE>
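  - For comparison, here are the first two rows of the same table with every end tag written out, a form that some authors may find easier to read and check:
    ```html
    <TABLE>
      <!-- Header row -->
      <TR>
        <TH>Berry</TH>
        <TH>Color</TH>
      </TR>
      <!-- First data row -->
      <TR>
        <TD>Blueberry</TD>
        <TD>blue</TD>
      </TR>
    </TABLE>
    ```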

Establish links
- A containers with HREF parameter (link origin)
  - These contain the text or picture which the reader should click to activate the link.
    <A HREF="http://www.someplace.com/news/info.html#contents">click here to see a news summary</A>
  - The value of the HREF parameter is the URL pointing to the destination of the link.
  - By default, most browsers display the contents of these containers as blue underlined text. The color changes after the link has been activated, to remind the reader that s/he has seen it already.
- A containers with NAME parameter (link destination)
  - These contain the text or picture that should be displayed after the reader clicks a link leading to the labeled location.
    <A NAME="contents">News Summary</A>
  - The value of the NAME parameter is the label attached to the contents. This label can be used in URLs pointing to the location.
  - By default, most browsers do not highlight the contents of these containers in any special way.
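  - A sketch of an origin and a destination working together within a single page (the label name 'ingredients' and the link text are made up). Since the first URL has no file specification, the browser looks for the label in the same file:
    ```html
    <!-- Link origin: the reader clicks this text -->
    <P><A HREF="#ingredients">Skip to the list of ingredients</A></P>

    <!-- Link destination: the label is attached to the heading text -->
    <H2><A NAME="ingredients">Ingredients</A></H2>

    <!-- A mailto URL can serve as a link origin too -->
    <P><A HREF="mailto:somebody@some.place.com">Send comments</A></P>
    ```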