
3. On SGML and HTML

This section of the document introduces SGML and discusses its relationship to HTML. A complete discussion of SGML is left to the standard (see [ISO8879]).

3.1 Introduction to SGML

SGML is a system for defining markup languages. Authors mark up their documents by representing structural, presentational, and semantic information alongside content. HTML is one example of a markup language. Here is an example of an HTML document:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
   <HEAD>
      <TITLE>My first HTML document</TITLE>
   </HEAD>
   <BODY>
      <P>Hello world!
   </BODY>
</HTML>

An HTML document is divided into a head section (here, between <HEAD> and </HEAD>) and a body (here, between <BODY> and </BODY>). The title of the document appears in the head (along with other information about the document), and the content of the document appears in the body. The body in this example contains just one paragraph, marked up with <P>.
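The head/body split can be observed by feeding the example document to a parser and recording which start tags fall inside each section. The sketch below uses Python's standard html.parser module, which follows HTML5-style tokenization rather than SGML and does not infer omitted HEAD or BODY tags; it works here only because both are written out explicitly. The class name SectionTracker is hypothetical.

```python
from html.parser import HTMLParser

# The example document from above, with HEAD and BODY written out.
DOC = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
   <HEAD>
      <TITLE>My first HTML document</TITLE>
   </HEAD>
   <BODY>
      <P>Hello world!
   </BODY>
</HTML>"""


class SectionTracker(HTMLParser):
    """Records which element start tags occur inside HEAD and inside BODY."""

    def __init__(self):
        super().__init__()
        self.section = None
        self.found = {"head": [], "body": []}

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag in ("head", "body"):
            self.section = tag
        elif self.section is not None:
            self.found[self.section].append(tag)

    def handle_endtag(self, tag):
        if tag == self.section:
            self.section = None


tracker = SectionTracker()
tracker.feed(DOC)
tracker.close()
print(tracker.found)  # {'head': ['title'], 'body': ['p']}
```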

Each markup language defined in SGML is called an SGML application. An SGML application is generally characterized by:

  1. An SGML declaration. The SGML declaration specifies which characters and delimiters may appear in the application.
  2. A document type definition (DTD). The DTD defines the syntax of markup constructs. The DTD may include additional definitions such as character entity references.
  3. A specification that describes the semantics to be ascribed to the markup. This specification also imposes syntax restrictions that cannot be expressed within the DTD.
  4. Document instances containing data (content) and markup. Each instance contains a reference to the DTD to be used to interpret it.

This specification includes an SGML declaration, three document type definitions (see the section on HTML version information for a description of the three), and a list of character references.

3.2 SGML constructs used in HTML

The following sections introduce SGML constructs that are used in HTML.

The appendix lists some SGML features that are not widely supported by HTML tools and user agents and should be avoided.

3.2.1 Elements

An SGML document type definition declares element types that represent structures or desired behavior. HTML includes element types that represent paragraphs, hypertext links, lists, tables, images, etc.

Each element type declaration generally describes three parts: a start tag, content, and an end tag.

The element's name appears in the start tag (written <element-name>) and the end tag (written </element-name>); note the slash before the element name in the end tag. For example, the start and end tags of the UL element type delimit the items in a list:

<UL>
   <LI><P>...list item 1...
   <LI><P>...list item 2...
</UL>

Some HTML element types allow authors to omit end tags (e.g., the P and LI element types). A few element types also allow the start tags to be omitted; for example, HEAD and BODY. The HTML DTD indicates for each element type whether the start tag and end tag are required.

Some HTML element types have no content. For example, the line break element BR has no content; its only role is to terminate a line of text. Such empty elements never have end tags. The document type definition and the text of the specification indicate whether an element type is empty (has no content) or, if it can have content, what is considered legal content.

Element names are always case-insensitive.

Please consult the SGML standard for information about rules governing elements (e.g., they must be properly nested; an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags (section 7.5.1); etc.).

For example, the following paragraph:

<P>This is the first paragraph.</P>
...a block element...

may be rewritten without its end tag:

<P>This is the first paragraph.
...a block element...

since the <P> start tag is closed by the following block element. Similarly, if a paragraph is enclosed by a block element, as in:

<DIV>
<P>This is the paragraph.
</DIV>

the end tag of the enclosing block element (here, </DIV>) implies the end tag of the open <P> start tag.
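The two inference rules above (a block-level start tag closes an open P; an end tag closes every intervening element back to its matching start tag) can be sketched as a small stack machine over tags. This is a minimal illustration, not a conforming SGML parser: it ignores text, and the BLOCK and EMPTY sets are hypothetical, abbreviated stand-ins for the real %block; entity and EMPTY declarations in the DTD.

```python
import re

# Hypothetical, abbreviated element sets for illustration only; the real
# lists come from the %block; entity and the EMPTY declarations in the DTD.
BLOCK = {"P", "DIV", "UL", "OL", "DL", "PRE", "TABLE", "BLOCKQUOTE", "HR"}
EMPTY = {"BR", "IMG", "HR"}

TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def make_p_end_tags_explicit(markup):
    """Return the tag sequence with omitted end tags filled in."""
    stack, out = [], []
    for m in TAG_RE.finditer(markup):
        is_end, name = m.group(1) == "/", m.group(2).upper()
        if not is_end:
            # A block-level start tag closes a paragraph that is still open.
            if name in BLOCK and stack and stack[-1] == "P":
                stack.pop()
                out.append("</P>")
            out.append(f"<{name}>")
            if name not in EMPTY:  # empty elements never have end tags
                stack.append(name)
        elif name in stack:
            # An end tag closes, back to its matching start tag, every
            # intervening element whose end tag was omitted.
            while True:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
    return out


print(make_p_end_tags_explicit("<DIV><P>This is the paragraph.</DIV>"))
# ['<DIV>', '<P>', '</P>', '</DIV>']
```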

Elements are not tags. Some people refer to elements as tags (e.g., "the P tag"). Remember that the element is one thing, and the tag (be it start or end tag) is another. For instance, the HEAD element is always present, even though both start and end HEAD tags may be missing in the markup.

All the element types declared in this specification are listed in the element index.

3.2.2 Attributes

Elements may have associated properties, called attributes, which may have values (by default, or set by authors or scripts). Attribute/value pairs appear before the final ">" of an element's start tag. Any number of (legal) attribute/value pairs, separated by spaces, may appear in an element's start tag. They may appear in any order.

In this example, the id attribute is set for an H1 element:

<H1 id="section1">This is an identified heading thanks to the id attribute</H1> 

By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. Authors may also use numeric character references to represent double quotes (&#34;) and single quotes (&#39;). For double quotes, authors can also use the character entity reference &quot;.

In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks even when it is possible to eliminate them.
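The quoting rules above translate directly into two small helpers: one tests whether a value falls within the unquoted character set, and one always quotes (as recommended), choosing the delimiter that avoids a clash and falling back to &#34; when both quote kinds occur. The function names are hypothetical, and this sketch does not escape "&" (character references are covered in 3.2.3).

```python
import re

# Characters permitted in an unquoted attribute value: letters, digits,
# hyphens, periods, underscores and colons.
UNQUOTED_OK = re.compile(r"[A-Za-z0-9\-._:]+")


def may_be_unquoted(value):
    """True if the value may legally appear without quotation marks."""
    return UNQUOTED_OK.fullmatch(value) is not None


def format_attribute(name, value):
    """Render name="value", always quoting as the text recommends."""
    if '"' not in value:
        return f'{name}="{value}"'
    if "'" not in value:
        # Double quotes inside are fine when the value is single-quoted.
        return f"{name}='{value}'"
    # Both kinds present: delimit with double quotes and write " as &#34;.
    escaped = value.replace('"', "&#34;")
    return f'{name}="{escaped}"'


print(format_attribute("id", "section1"))  # id="section1"
```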

Attribute names are always case-insensitive.

Attribute values are generally case-insensitive. The definition of each attribute in the reference manual indicates whether its value is case-insensitive.

All the attributes defined by this specification are listed in the attribute index.

3.2.3 Character references

Character references are numeric or symbolic names for characters that may be included in an HTML document. They are useful for referring to rarely used characters, or those that authoring tools make it difficult or impossible to enter. You will see character references throughout this document; they begin with a "&" sign and end with a semicolon (;). Some common examples include:

  • "&lt;" represents the < sign.
  • "&gt;" represents the > sign.
  • "&quot;" represents the " mark.
  • "&#229;" (in decimal) represents the letter "a" with a small circleabove it.
  • "&#1048;" (in decimal) represents the Cyrillic capital letter "I".
  • "&#x6C34;" (in hexadecimal) represents the Chinese character forwater.

We discuss HTML character references in detail later in the section on the HTML document character set. The specification also contains a list of character references that may appear in HTML 4 documents.

3.2.4 Comments

HTML comments have the following syntax:

<!-- this is a comment -->
<!-- and so is this one,
    which occupies more than one line -->

White space is not permitted between the markup declaration open delimiter ("<!") and the comment open delimiter ("--"), but is permitted between the comment close delimiter ("--") and the markup declaration close delimiter (">"). A common error is to include a string of hyphens ("---") within a comment. Authors should avoid putting two or more adjacent hyphens inside comments.
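The delimiter rules above can be checked with a regular expression plus one extra test on the comment text. This is a minimal sketch for a declaration holding a single comment; the function name comment_body is hypothetical, and SGML's rarer forms (several comments in one declaration) are deliberately rejected.

```python
import re

# "<!" must be followed directly by "--"; white space is allowed only
# between the closing "--" and ">".
COMMENT_RE = re.compile(r"<!--(?P<body>.*?)--\s*>", re.DOTALL)


def comment_body(decl):
    """Return the text of a single comment declaration, or raise ValueError."""
    m = COMMENT_RE.fullmatch(decl)
    if m is None:
        raise ValueError(f"not a well-formed comment: {decl!r}")
    body = m.group("body")
    # Two adjacent hyphens inside the text (or a trailing hyphen that forms
    # "---" with the closer) would be read as comment delimiters by SGML.
    if "--" in body or body.endswith("-"):
        raise ValueError(f"hyphen run inside comment: {decl!r}")
    return body


print(repr(comment_body("<!-- this is a comment -->")))  # ' this is a comment '
```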

Information that appears within comments has no special meaning (e.g., character references are not interpreted as such).

Note that comments are markup.

3.3 How to read the HTML DTD

Each element and attribute declaration in this specification is accompanied by its document type definition fragment. We have chosen to include the DTD fragments in the specification rather than seek a more approachable, but longer and less precise, means of describing an element's properties. The following tutorial should allow readers unfamiliar with SGML to read the DTD and understand the technical details of the HTML specification.

3.3.1 DTD Comments

In DTDs, comments may spread over one or more lines. In the DTD, comments are delimited by a pair of "--" marks, e.g.,

<!ELEMENT PARAM - O EMPTY   -- named property value -->
Here, the comment "named property value" explains the use of the PARAM element type. Comments in the DTD are informative only.

3.3.2 Parameter entity definitions

The HTML DTD begins with a series of parameter entity definitions. A parameter entity definition defines a kind of macro that may be referenced and expanded elsewhere in the DTD. These macros may not appear in HTML documents, only in the DTD. Other types of macros, called character references, may be used in the text of an HTML document or within attribute values.

When the parameter entity is referred to by name in the DTD, it is expanded into a string.

A parameter entity definition begins with the keyword <!ENTITY % followed by the entity name, the quoted string the entity expands to, and finally a closing >. Instances of parameter entities in a DTD begin with "%", followed by the parameter entity name, and are terminated by an optional ";".

The following example defines the string that the "%fontstyle;" entity will expand to.

<!ENTITY % fontstyle "TT | I | B | BIG | SMALL">

The string the parameter entity expands to may contain other parameter entity names. These names are expanded recursively. In the following example, the "%inline;" parameter entity is defined to include the "%fontstyle;", "%phrase;", "%special;" and "%formctrl;" parameter entities.

<!ENTITY % inline "#PCDATA | %fontstyle; | %phrase; | %special; | %formctrl;">
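Recursive expansion amounts to repeated textual substitution. The sketch below replaces each "%name" reference (with its optional ";") by the entity's value and then expands that value in turn, with a depth limit to catch circular definitions. The %fontstyle; and %inline; values are copied from the declarations above; the values given for %phrase;, %special; and %formctrl; are abbreviated stand-ins, not the DTD's actual lists.

```python
import re

# %fontstyle; and %inline; are copied from the declarations above; the
# other three values are abbreviated stand-ins, not the DTD's.
ENTITIES = {
    "fontstyle": "TT | I | B | BIG | SMALL",
    "phrase": "EM | STRONG",
    "special": "A | IMG",
    "formctrl": "INPUT | SELECT",
    "inline": "#PCDATA | %fontstyle; | %phrase; | %special; | %formctrl;",
}

# "%", then the entity name, then an optional ";".
REF_RE = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);?")


def expand(text, entities, depth=0):
    """Recursively replace parameter entity references with their values."""
    if depth > 20:
        raise RecursionError("parameter entities nested too deeply")
    return REF_RE.sub(
        lambda m: expand(entities[m.group(1)], entities, depth + 1), text
    )


print(expand("%inline;", ENTITIES))
# #PCDATA | TT | I | B | BIG | SMALL | EM | STRONG | A | IMG | INPUT | SELECT
```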

You will encounter two DTD entities frequently in the HTML DTD: "%block;" and "%inline;". They are used when the content model includes block-level and inline elements, respectively (defined in the section on the global structure of an HTML document).

3.3.3 Element declarations

The bulk of the HTML DTD consists of the declarations of element types and their attributes. The <!ELEMENT keyword begins a declaration and the > character ends it. Between these are specified:

  1. The element's name.
  2. Whether the element's tags are optional. Two hyphens that appear after the element name mean that the start and end tags are mandatory. One hyphen followed by the letter "O" indicates that the end tag can be omitted. A pair of letter "O"s indicates that both the start and end tags can be omitted.
  3. The element's content, if any. The allowed content for an element is called its content model. Element types that are designed to have no content are called empty elements. The content model for such element types is declared using the keyword "EMPTY".

In this example:

 <!ELEMENT UL - - (LI)+>
  • The element type being declared is UL.
  • The two hyphens indicate that both the start tag <UL> and the end tag </UL> for this element type are required.
  • The content model for this element type is declared to be "at least one LI element". Below, we explain how to specify content models.

This example illustrates the declaration of an empty element type:

 <!ELEMENT IMG - O EMPTY>
  • The element type being declared is IMG.
  • The hyphen and the following "O" indicate that the end tag can be omitted, but together with the content model "EMPTY", this is strengthened to the rule that the end tag must be omitted.
  • The "EMPTY" keyword means that instances of this type must not have content.
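The three parts of an element declaration can be pulled apart with a regular expression once DTD comments are stripped. This is a minimal sketch (the function name parse_element_decl is hypothetical): it reports only what the two tag-omission characters say, so for IMG it reports the end tag as optional; the stronger "must be omitted" rule follows from the EMPTY content model, as described above.

```python
import re

DECL_RE = re.compile(
    r"<!ELEMENT\s+(?P<name>\S+)\s+(?P<start>[-O])\s+(?P<end>[-O])"
    r"\s+(?P<model>.*?)\s*>",
    re.DOTALL | re.IGNORECASE,
)
# DTD comments are delimited by a pair of "--" marks.
COMMENT_RE = re.compile(r"--.*?--", re.DOTALL)


def parse_element_decl(decl):
    """Split an <!ELEMENT ...> declaration into name, tag rules and model."""
    m = DECL_RE.fullmatch(COMMENT_RE.sub("", decl).strip())
    if m is None:
        raise ValueError(f"unrecognised declaration: {decl!r}")
    return {
        "name": m["name"],
        "start_tag_optional": m["start"].upper() == "O",
        "end_tag_optional": m["end"].upper() == "O",
        "content_model": m["model"],
    }


print(parse_element_decl("<!ELEMENT IMG - O EMPTY>"))
# {'name': 'IMG', 'start_tag_optional': False, 'end_tag_optional': True, 'content_model': 'EMPTY'}
```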

Content model definitions 

The content model describes what may be contained by an instance of an element type. Content model definitions may include:

  • The names of allowed or forbidden element types (e.g., the UL element contains instances of the LI element type, and the P element type may not contain other P elements).
  • DTD entities (e.g., the LABEL element contains instances of the "%inline;" parameter entity).
  • Document text (indicated by the SGML construct "#PCDATA"). Text may contain character references. Recall that these begin with & and end with a semicolon (e.g., "Herg&eacute;'s adventures of Tintin" contains the character entity reference for the "e acute" character).

The content model of an element is specified with the following syntax. Please note that the list below is a simplification of the full SGML syntax rules and does not address, e.g., precedences.

  • ( ... ): Delimits a group.
  • A: A must occur, one time only.
  • A+: A must occur one or more times.
  • A?: A must occur zero or one time.
  • A*: A may occur zero or more times.
  • +(A): A may occur.
  • -(A): A must not occur.
  • A | B: Either A or B must occur, but not both.
  • A , B: Both A and B must occur, in that order.
  • A & B: Both A and B must occur, in any order.
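Most of these operators have direct regular-expression counterparts, so a simple content model can be checked by translating it into a regex over the sequence of child element names. This is a sketch under stated limits: it handles ( ), |, ",", ?, * and +, but not "&", +(A) or -(A), and the function names are hypothetical.

```python
import re

# One token: #PCDATA, an element name, or a single operator character.
TOKEN_RE = re.compile(r"\s*(#PCDATA|[A-Za-z][A-Za-z0-9]*|[()|,?*+])")
PASS_THROUGH = {")", "|", "?", "*", "+"}


def model_to_regex(model):
    """Translate a simple content model into a regex over child names.

    Supports ( ), |, ",", ?, * and +; "&", +(A) and -(A) are not handled.
    """
    parts, pos, model = [], 0, model.strip()
    while pos < len(model):
        m = TOKEN_RE.match(model, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at {model[pos:]!r}")
        tok, pos = m.group(1), m.end()
        if tok == "(":
            parts.append("(?:")
        elif tok == ",":
            continue  # "A , B": A then B, which is plain concatenation
        elif tok in PASS_THROUGH:
            parts.append(tok)
        else:
            # Each child is encoded as its name followed by one space.
            parts.append(f"(?:{re.escape(tok)} )")
    return re.compile("".join(parts))


def children_match(model, children):
    """True if the sequence of child element names satisfies the model."""
    encoded = "".join(f"{name} " for name in children)
    return model_to_regex(model).fullmatch(encoded) is not None


print(children_match("(DT|DD)+", ["DT", "DD", "DD"]))  # True
```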

Here are some examples from the HTML DTD:

   <!ELEMENT UL - - (LI)+>

The UL element must contain one or more LI elements.

   <!ELEMENT DL - - (DT|DD)+>

The DL element must contain one or more DT or DD elements in any order.

   <!ELEMENT OPTION - O (#PCDATA)>

The OPTION element may only contain text and entities, such as &amp; (this is indicated by the SGML data type #PCDATA).

A few HTML element types use an additional SGML feature to exclude elements from their content model. Excluded elements are preceded by a hyphen. Explicit exclusions override permitted elements.

In this example, the -(A) signifies that the element A cannot appear in another A element (i.e., anchors may not be nested).

   <!ELEMENT A - - (%inline;)* -(A)>

Note that the A element type is part of the DTD parameter entity"%inline;", but is excluded explicitly because of -(A).

Similarly, the following element type declaration for FORM prohibits nested forms:

   <!ELEMENT FORM - - (%block;|SCRIPT)+ -(FORM)>
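In SGML an exclusion applies to the whole content of the element, including elements nested further down, so an A inside a SPAN inside an A is still forbidden. The sketch below walks a tree and accumulates exclusions on the way down; the Node class and the EXCLUSIONS table (taken from the two declarations above) are hypothetical illustrations, not part of any HTML tool.

```python
from dataclasses import dataclass, field

# Exclusions from the two declarations above: -(A) on A and -(FORM) on FORM.
EXCLUSIONS = {"A": {"A"}, "FORM": {"FORM"}}


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def exclusion_violations(node, excluded=frozenset(), path=()):
    """Yield element paths that break an inherited exclusion."""
    path = path + (node.name,)
    if node.name in excluded:
        yield "/".join(path)
    # Exclusions apply to the whole content, so they accumulate downward.
    inner = excluded | EXCLUSIONS.get(node.name, set())
    for child in node.children:
        yield from exclusion_violations(child, inner, path)


print(list(exclusion_violations(Node("A", [Node("SPAN", [Node("A")])]))))
# ['A/SPAN/A']
```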

3.3.4 Attribute declarations

The <!ATTLIST keyword begins the declaration of attributes that an element may take. It is followed by the name of the element in question, a list of attribute definitions, and a closing >. Each attribute definition is a triplet that defines:

  • The name of an attribute.
  • The type of the attribute's value or an explicit set of possible values. Values defined explicitly by the DTD are case-insensitive. Please consult the section on basic HTML data types for more information about attribute value types.
  • Whether the default value of the attribute is implicit (keyword "#IMPLIED"), in which case the default value must be supplied by the user agent (in some cases via inheritance from parent elements); always required (keyword "#REQUIRED"); or fixed to the given value (keyword "#FIXED"). Some attribute definitions explicitly specify a default value for the attribute.

In this example, the name attribute is defined for the MAP element. The attribute is optional for this element.

<!ATTLIST MAP
  name        CDATA     #IMPLIED
  >

The type of values permitted for the attribute is given as CDATA, an SGML data type. CDATA is text that may contain character references.

For more information about "CDATA", "NAME", "ID", and other data types, please consult the section on HTML data types.

The following examples illustrate several attribute definitions:

rowspan     NUMBER     1         -- number of rows spanned by cell --
http-equiv  NAME       #IMPLIED  -- HTTP response header name  --
id          ID         #IMPLIED  -- document-wide unique id --
valign      (top|middle|bottom|baseline) #IMPLIED

The rowspan attribute requires values of type NUMBER. The default value is given explicitly as "1". The optional http-equiv attribute requires values of type NAME. The optional id attribute requires values of type ID. The optional valign attribute is constrained to take values from the set {top, middle, bottom, baseline}.
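Because each attribute definition is a triplet, a list of definitions can be split by stripping DTD comments, tokenizing (keeping a parenthesized value set or a quoted string as one token), and taking three tokens at a time, plus one extra token after "#FIXED". This is a minimal sketch; the function name is hypothetical, and the "shape" definition in the test is an invented example used only to exercise the "#FIXED" branch.

```python
import re

# DTD comments are delimited by a pair of "--" marks.
COMMENT_RE = re.compile(r"--.*?--", re.DOTALL)
# A token is a parenthesised value list, a quoted string, or a bare word.
TOKEN_RE = re.compile(r'\([^)]*\)|"[^"]*"|\S+')

SAMPLE = """
rowspan     NUMBER     1         -- number of rows spanned by cell --
http-equiv  NAME       #IMPLIED  -- HTTP response header name  --
id          ID         #IMPLIED  -- document-wide unique id --
valign      (top|middle|bottom|baseline) #IMPLIED
"""


def parse_attribute_defs(text):
    """Split attribute definitions into (name, type, default) triplets."""
    tokens = TOKEN_RE.findall(COMMENT_RE.sub(" ", text))
    defs, i = [], 0
    while i < len(tokens):
        name, value_type, default = tokens[i:i + 3]
        i += 3
        if default == "#FIXED":
            # "#FIXED" is followed by the fixed value itself.
            default = f"#FIXED {tokens[i]}"
            i += 1
        defs.append((name, value_type, default))
    return defs


for definition in parse_attribute_defs(SAMPLE):
    print(definition)
```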

DTD entities in attribute definitions 

Attribute definitions may also contain parameter entity references.

In this example, we see that the attribute definition list for the LINK element begins with the "%attrs;" parameter entity.

<!ELEMENT LINK - O EMPTY               -- a media-independent link -->
<!ATTLIST LINK
  %attrs;                              -- %coreattrs, %i18n, %events --
  charset     %Charset;      #IMPLIED  -- char encoding of linked resource --
  href        %URI;          #IMPLIED  -- URI for linked resource --
  hreflang    %LanguageCode; #IMPLIED  -- language code --
  type        %ContentType;  #IMPLIED  -- advisory content type --
  rel         %LinkTypes;    #IMPLIED  -- forward link types --
  rev         %LinkTypes;    #IMPLIED  -- reverse link types --
  media       %MediaDesc;    #IMPLIED  -- for rendering on these media --
  >

Start tag: required, End tag: forbidden

The "%attrs;" parameter entity is defined as follows:

<!ENTITY % attrs "%coreattrs; %i18n; %events;">

The "%coreattrs;" parameter entity in the "%attrs;" definition expands as follows:

<!ENTITY % coreattrs
 "id          ID             #IMPLIED  -- document-wide unique id --
  class       CDATA          #IMPLIED  -- space-separated list of classes --
  style       %StyleSheet;   #IMPLIED  -- associated style info --
  title       %Text;         #IMPLIED  -- advisory title --"
  >

The "%attrs;" parameter entity has been defined for convenience since these attributes are defined for most HTML element types.

Similarly, the DTD defines the "%URI;" parameter entity as expanding into the string "CDATA".

<!ENTITY % URI "CDATA" -- a Uniform Resource Identifier,   see [URI] -->

As this example illustrates, the parameter entity "%URI;" provides readers of the DTD with more information as to the type of data expected for an attribute. Similar entities have been defined for "%Color;", "%Charset;", "%Length;", "%Pixels;", etc.

Boolean attributes 

Some attributes play the role of boolean variables (e.g., the selected attribute for the OPTION element). Their appearance in the start tag of an element implies that the value of the attribute is "true". Their absence implies a value of "false".

Boolean attributes may legally take a single value: the name of the attribute itself (e.g., selected="selected").

This example defines the selected attribute to be a boolean attribute.

selected (selected)  #IMPLIED  -- option is pre-selected --

The attribute is set to "true" by appearing in the element's start tag:

<OPTION selected="selected">...contents...</OPTION>

In HTML, boolean attributes may appear in minimized form: the attribute's value appears alone in the element's start tag. Thus, selected may be set by writing:

<OPTION selected>

instead of:

<OPTION selected="selected">

Authors should be aware that many user agents only recognize the minimized form of boolean attributes and not the full form.
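A tool that reads HTML should treat the minimized and full forms identically: the attribute's presence means "true", whatever value is written. The sketch below uses Python's standard html.parser module (which follows HTML5-style tokenization, not SGML); it reports a minimized attribute as (name, None) and the full form as (name, "selected"), and the hypothetical OptionCollector class treats both as selected.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collects whether each OPTION start tag carries the selected attribute."""

    def __init__(self):
        super().__init__()
        self.selected = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            # html.parser gives (name, None) for a minimized attribute and
            # (name, "selected") for the full form; both mean "true".
            names = {name for name, _value in attrs}
            self.selected.append("selected" in names)


parser = OptionCollector()
parser.feed('<SELECT name="c">'
            "<OPTION selected>Red</OPTION>"
            '<OPTION selected="selected">Green</OPTION>'
            "<OPTION>Blue</OPTION></SELECT>")
parser.close()
print(parser.selected)  # [True, True, False]
```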

Copyright © 1997-1999 W3C® (MIT, INRIA, Keio), All Rights Reserved.