Timed text tracks - HTML 5 | Wiki eduNitas.com

HTML 5


4.8.10.12. Timed text tracks

4.8.10.12.1 Text track model

A media element can have a group of associated text tracks, known as the media element's list of text tracks. The text tracks are sorted as follows:

The text tracks corresponding to track element children of the media element, in tree order.
Any text tracks added using the addTextTrack() method, in the order they were added, oldest first.
Any media-resource-specific text tracks (text tracks corresponding to data in the media resource), in the order defined by the media resource's format specification.
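The ordering above can be modelled as a two-level sort. This is only a minimal sketch: the `source` and `order` fields are assumptions introduced for illustration and are not part of any API.

```javascript
// Rank of each track source, following the three groups listed above.
const SOURCE_RANK = { "track-element": 0, "addTextTrack": 1, "in-band": 2 };

// `source` and `order` are hypothetical fields. `order` is the position of a
// track within its own group (tree order, insertion order, or format order).
function sortTextTracks(tracks) {
  return [...tracks].sort(
    (a, b) => SOURCE_RANK[a.source] - SOURCE_RANK[b.source] || a.order - b.order
  );
}
```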

A text track consists of:

The kind of text track

This decides how the track is handled by the user agent. The kind is represented by a string. The possible strings are:

subtitles
captions
descriptions
chapters
metadata

The kind of track can change dynamically, in the case of a text track corresponding to a track element.

A label

This is a human-readable string intended to identify the track for the user.

The label of a track can change dynamically, in the case of a text track corresponding to a track element.

When a text track label is the empty string, the user agent should automatically generate an appropriate label from the text track's other properties (e.g. the kind of text track and the text track's language) for use in its user interface. This automatically-generated label is not exposed in the API.

An in-band metadata track dispatch type

This is a string extracted from the media resource specifically for in-band metadata tracks to enable such tracks to be dispatched to different scripts in the document.

For example, a traditional TV station broadcast streamed on the Web and augmented with Web-specific interactive features could include text tracks with metadata for ad targeting, trivia game data during game shows, player states during sports games, recipe information during food programs, and so forth. As each program starts and ends, new tracks might be added or removed from the stream, and as each one is added, the user agent could bind them to dedicated script modules using the value of this attribute.

Other than for in-band metadata text tracks, the in-band metadata track dispatch type is the empty string. How this value is populated for different media formats is described in steps to expose a media-resource-specific text track.

A language

This is a string (a BCP 47 language tag) representing the language of the text track's cues. [BCP47]

The language of a text track can change dynamically, in the case of a text track corresponding to a track element.

A readiness state

One of the following:

Not loaded: Indicates that the text track's cues have not been obtained.
Loading: Indicates that the text track is loading and there have been no fatal errors encountered so far. Further cues might still be added to the track by the parser.
Loaded: Indicates that the text track has been loaded with no fatal errors.
Failed to load: Indicates that the text track was enabled, but when the user agent attempted to obtain it, this failed in some way (e.g. URL could not be resolved, network error, unknown text track format). Some or all of the cues are likely missing and will not be obtained.

The readiness state of a text track changes dynamically as the track is obtained.

A mode

One of the following:

Disabled: Indicates that the text track is not active. Other than for the purposes of exposing the track in the DOM, the user agent is ignoring the text track. No cues are active, no events are fired, and the user agent will not attempt to obtain the track's cues.
Hidden: Indicates that the text track is active, but that the user agent is not actively displaying the cues. If no attempt has yet been made to obtain the track's cues, the user agent will perform such an attempt momentarily. The user agent is maintaining a list of which cues are active, and events are being fired accordingly.
Showing: Indicates that the text track is active. If no attempt has yet been made to obtain the track's cues, the user agent will perform such an attempt momentarily. The user agent is maintaining a list of which cues are active, and events are being fired accordingly. In addition, for text tracks whose kind is subtitles or captions, the cues are being overlaid on the video as appropriate; for text tracks whose kind is descriptions, the user agent is making the cues available to the user in a non-visual fashion; and for text tracks whose kind is chapters, the user agent is making available to the user a mechanism by which the user can navigate to any point in the media resource by selecting a cue.

A list of zero or more cues

A list of text track cues, along with rules for updating the text track rendering. For example, for WebVTT, the rules for updating the display of WebVTT text tracks. [WEBVTT]

The list of cues of a text track can change dynamically, either because the text track has not yet been loaded or is still loading, or due to DOM manipulation.

Each text track has a corresponding TextTrack object.

Each media element has a list of pending text tracks, which must initially be empty, a blocked-on-parser flag, which must initially be false, and a did-perform-automatic-track-selection flag, which must also initially be false.

When the user agent is required to populate the list of pending text tracks of a media element, the user agent must add to the element's list of pending text tracks each text track in the element's list of text tracks whose text track mode is not disabled and whose text track readiness state is loading.

Whenever a track element's parent node changes, the user agent must remove the corresponding text track from any list of pending text tracks that it is in.

Whenever a text track's text track readiness state changes to either loaded or failed to load, the user agent must remove it from any list of pending text tracks that it is in.

When a media element is created by an HTML parser or XML parser, the user agent must set the element's blocked-on-parser flag to true. When a media element is popped off the stack of open elements of an HTML parser or XML parser, the user agent must honor user preferences for automatic text track selection, populate the list of pending text tracks, and set the element's blocked-on-parser flag to false.

The text tracks of a media element are ready when both the element's list of pending text tracks is empty and the element's blocked-on-parser flag is false.
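The bookkeeping described above can be sketched with plain objects. The field names (`pendingTextTracks`, `blockedOnParser`, `readinessState`) are assumptions chosen to mirror the prose; a real user agent keeps this state internally and does not expose it to scripts.

```javascript
// Add every non-disabled track that is still loading to the pending list.
function populatePendingTextTracks(media) {
  for (const track of media.textTracks) {
    const pending = track.mode !== "disabled" && track.readinessState === "loading";
    if (pending && !media.pendingTextTracks.includes(track)) {
      media.pendingTextTracks.push(track);
    }
  }
}

// A track leaves the pending list once it is loaded or has failed to load.
function setReadinessState(media, track, state) {
  track.readinessState = state;
  if (state === "loaded" || state === "failed to load") {
    media.pendingTextTracks = media.pendingTextTracks.filter((t) => t !== track);
  }
}

// Ready means: nothing pending and the parser is no longer blocking.
function textTracksAreReady(media) {
  return media.pendingTextTracks.length === 0 && !media.blockedOnParser;
}
```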

A text track cue is the unit of time-sensitive data in a text track. For subtitles and captions, for instance, it corresponds to the text that appears at a particular time and disappears at another time.

Each text track cue consists of:

An identifier

An arbitrary string.

A start time

The time, in seconds and fractions of a second, that describes the beginning of the range of the media data to which the cue applies.

An end time

The time, in seconds and fractions of a second, that describes the end of the range of the media data to which the cue applies.

A pause-on-exit flag

A boolean indicating whether playback of the media resource is to pause when the end of the range to which the cue applies is reached.

A writing direction

A writing direction, either horizontal (a line extends horizontally and is positioned vertically, with consecutive lines displayed below each other), vertical growing left (a line extends vertically and is positioned horizontally, with consecutive lines displayed to the left of each other), or vertical growing right (a line extends vertically and is positioned horizontally, with consecutive lines displayed to the right of each other).

If the writing direction is horizontal, then line position percentages are relative to the height of the video, and text position and size percentages are relative to the width of the video.

Otherwise, line position percentages are relative to the width of the video, and text position and size percentages are relative to the height of the video.

A snap-to-lines flag

A boolean indicating whether the line's position is a line position (positioned to a multiple of the line dimensions of the first line of the cue), or whether it is a percentage of the dimension of the video.

Cues whose text track cue snap-to-lines flag is set will be placed within the title-safe area on user agents that use overscan. Cues with the flag unset will be positioned as requested (modulo overlap avoidance if multiple cues are in the same place).

A line position

Either a number giving the position of the lines of the cue, to be interpreted as defined by the writing direction and snap-to-lines flag of the cue, or the special value auto, which means the position is to depend on the other active tracks.

A text track cue has a text track cue computed line position whose value is that returned by the following algorithm, which is defined in terms of the other aspects of the cue:

If the text track cue line position is numeric, the text track cue snap-to-lines flag of the text track cue is not set, and the text track cue line position is negative or greater than 100, then return 100 and abort these steps.
If the text track cue line position is numeric, return the value of the text track cue line position and abort these steps. (Either the text track cue snap-to-lines flag is set, so any value, not just those in the range 0..100, is valid, or the value is in the range 0..100 and is thus valid regardless of the value of that flag.)
If the text track cue snap-to-lines flag of the text track cue is not set, return the value 100 and abort these steps. (The text track cue line position is the special value auto.)
Let cue be the text track cue.
If cue is not in a list of cues of a text track, or if that text track is not in the list of text tracks of a media element, return −1 and abort these steps.
Let track be the text track whose list of cues the cue is in.
Let n be the number of text tracks whose text track mode is showing and that are in the media element's list of text tracks before track.
Increment n by one.
Negate n.
Return n.
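The algorithm above translates fairly directly into code. In this sketch the cue is a plain object with hypothetical fields `line` (a number or the string "auto"), `snapToLines`, and `track` (null when the cue is not in any track's list of cues); `media.textTracks` is assumed to be an ordinary array.

```javascript
function computedLinePosition(cue, media) {
  const numeric = typeof cue.line === "number";
  // Step 1: out-of-range percentage on a cue that does not snap to lines.
  if (numeric && !cue.snapToLines && (cue.line < 0 || cue.line > 100)) return 100;
  // Step 2: any other numeric value is used as-is.
  if (numeric) return cue.line;
  // Step 3: "auto" without snap-to-lines resolves to 100.
  if (!cue.snapToLines) return 100;
  // Steps 4-5: the cue must belong to a track in the media element's list.
  const track = cue.track;
  if (!track || !media || !media.textTracks.includes(track)) return -1;
  // Steps 6-10: count showing tracks before this one, add one, negate.
  const index = media.textTracks.indexOf(track);
  const n = media.textTracks.slice(0, index).filter((t) => t.mode === "showing").length;
  return -(n + 1);
}
```

With two showing tracks, an "auto" cue in the second track resolves to -2, which places it one line above cues from the first track.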

A text position

A number giving the position of the text of the cue within each line, to be interpreted as a percentage of the video, as defined by the writing direction.

A size

A number giving the size of the box within which the text of each line of the cue is to be aligned, to be interpreted as a percentage of the video, as defined by the writing direction.

An alignment

An alignment for the text of each line of the cue, one of:

Start alignment: The text is aligned towards its start side.
Middle alignment: The text is aligned centered between its start and end sides.
End alignment: The text is aligned towards its end side.
Left alignment: The text is aligned to the left.
Right alignment: The text is aligned to the right.

Which sides are the start and end sides depends on the Unicode bidirectional algorithm and the writing direction. [BIDI]

The text of the cue

The raw text of the cue, and rules for its interpretation, allowing the text to be rendered and converted to a DOM fragment.

Each text track cue has a corresponding TextTrackCue object. A text track cue's in-memory representation can be dynamically changed through this TextTrackCue API.
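As an illustration of how the writing direction selects the video dimension that percentages refer to, the following sketch resolves percentages to pixels. The function names and the `video` object with `width` and `height` are assumptions for this example only.

```javascript
// For horizontal cues, line positions are measured against the video height
// and text position/size against the width; vertical cues swap the two.
function percentageBases(writingDirection, video) {
  const horizontal = writingDirection === "horizontal";
  return {
    lineBasis: horizontal ? video.height : video.width,
    textBasis: horizontal ? video.width : video.height,
  };
}

// Convert a percentage (0..100) of the given basis into pixels.
function toPixels(percent, basis) {
  return (percent / 100) * basis;
}
```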

In addition, each text track cue has two pieces of dynamic information:

The active flag

This flag must be initially unset. The flag is used to ensure events are fired appropriately when the cue becomes active or inactive, and to make sure the right cues are rendered.

The user agent must synchronously unset this flag whenever the text track cue is removed from its text track's text track list of cues; whenever the text track itself is removed from its media element's list of text tracks or has its text track mode changed to disabled; and whenever the media element's readyState is changed back to HAVE_NOTHING. When the flag is unset in this way for one or more cues in text tracks that were showing prior to the relevant incident, the user agent must, after having unset the flag for all the affected cues, apply the rules for updating the text track rendering of those text tracks. For example, for text tracks based on WebVTT, the rules for updating the display of WebVTT text tracks. [WEBVTT]

The display state

This is used as part of the rendering model, to keep cues in a consistent position. It must initially be empty. Whenever the text track cue active flag is unset, the user agent must empty the text track cue display state.

When a text track cue whose active flag is set has its writing direction, snap-to-lines flag, line position, text position, size, alignment, or text change value, then the user agent must empty the text track cue display state, and then immediately run the text track's rules for updating the display of WebVTT text tracks.

The text track cues of a media element's text tracks are ordered relative to each other in the text track cue order, which is determined as follows: first group the cues by their text track, with the groups being sorted in the same order as their text tracks appear in the media element's list of text tracks; then, within each group, cues must be sorted by their start time, earliest first; then, any cues with the same start time must be sorted by their end time, latest first; and finally, any cues with identical end times must be sorted in the order they were last added to their respective text track list of cues, oldest first (so e.g. for cues from a WebVTT file, that would initially be the order in which the cues were listed in the file). [WEBVTT]
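The text track cue order can be expressed as a comparator. This is a sketch under stated assumptions: each cue object carries a hypothetical `track` reference and an `addedSeq` counter recording when it was last added to its track's list of cues.

```javascript
// Returns a comparator implementing the text track cue order for the
// given list of text tracks (an array, in list-of-text-tracks order).
function cueOrderComparator(trackList) {
  return (a, b) => {
    const byTrack = trackList.indexOf(a.track) - trackList.indexOf(b.track);
    if (byTrack !== 0) return byTrack;                           // group by track
    if (a.startTime !== b.startTime) return a.startTime - b.startTime; // earliest start first
    if (a.endTime !== b.endTime) return b.endTime - a.endTime;   // latest end first
    return a.addedSeq - b.addedSeq;                              // oldest addition first
  };
}
```

Usage would look like `allCues.sort(cueOrderComparator(tracks))`.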

4.8.10.12.2 Sourcing in-band text tracks

A media-resource-specific text track is a text track that corresponds to data found in the media resource.

Rules for processing and rendering such data are defined by the relevant specifications, e.g. the specification of the video format if the media resource is a video.

When a media resource contains data that the user agent recognises and supports as being equivalent to a text track, the user agent runs the steps to expose a media-resource-specific text track with the relevant data, as follows.

Associate the relevant data with a new text track and its corresponding new TextTrack object. The text track is a media-resource-specific text track.
Set the new text track's kind, label, and language based on the semantics of the relevant data, as defined by the relevant specification. If there is no label in that data, then the label must be set to the empty string.
Associate the text track list of cues with the rules for updating the text track rendering appropriate for the format in question.
If the new text track's kind is metadata, then set the text track in-band metadata track dispatch type as follows, based on the type of the media resource:

If the media resource is an Ogg file

The text track in-band metadata track dispatch type must be set to the value of the Name header field. [OGGSKELETONHEADERS]

If the media resource is a WebM file

The text track in-band metadata track dispatch type must be set to the value of the CodecID element. [WEBMCG]

If the media resource is an MPEG-2 file

Let stream type be the value of the "stream_type" field describing the text track's type in the file's program map section, interpreted as an 8-bit unsigned integer. Let length be the value of the "ES_info_length" field for the track in the same part of the program map section, interpreted as an integer as defined by the MPEG-2 specification. Let descriptor bytes be the length bytes following the "ES_info_length" field. The text track in-band metadata track dispatch type must be set to the concatenation of the stream type byte and the zero or more bytes in descriptor bytes, expressed in hexadecimal using uppercase ASCII hex digits. [MPEG2]

If the media resource is an MPEG-4 file

Let the first stsd box of the first stbl box of the first minf box of the first mdia box of a trak box of the first moov box of the file be the stsd box, if any. If the file has no stsd box, or if the stsd box has neither a mett box nor a metx box, then the text track in-band metadata track dispatch type must be set to the empty string. Otherwise, if the stsd box has a mett box then the text track in-band metadata track dispatch type must be set to the concatenation of the string "mett", a U+0020 SPACE character, and the value of the first mime_format field of the first mett box of the stsd box, or the empty string if that field is absent in that box. Otherwise, if the stsd box has no mett box but has a metx box then the text track in-band metadata track dispatch type must be set to the concatenation of the string "metx", a U+0020 SPACE character, and the value of the first namespace field of the first metx box of the stsd box, or the empty string if that field is absent in that box. [MPEG4]
Populate the new text track's list of cues with the cues parsed so far, following the guidelines for exposing cues, and begin updating it dynamically as necessary.
Set the new text track's readiness state to loaded.
Set the new text track's mode to the mode consistent with the user's preferences and the requirements of the relevant specification for the data.
Add the new text track to the media element's list of text tracks.
Fire a trusted event with the name addtrack, that does not bubble and is not cancelable, and that uses the TrackEvent interface, with the track attribute initialized to the text track's TextTrack object, at the media element's textTracks attribute's TextTrackList object.
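The dispatch type rules for MPEG-2 and MPEG-4 in the steps above are string constructions, sketched below. The inputs are assumed to have been extracted already by a demuxer: `streamType` and `descriptorBytes` are numbers in the range 0..255, and `stsd` is a hypothetical object whose optional `mett` and `metx` entries hold the first box of each type.

```javascript
// MPEG-2: stream type byte followed by the descriptor bytes,
// two uppercase hexadecimal digits per byte.
function mpeg2DispatchType(streamType, descriptorBytes) {
  const hex = (b) => (b & 0xff).toString(16).toUpperCase().padStart(2, "0");
  return [streamType, ...descriptorBytes].map(hex).join("");
}

// MPEG-4: "mett" takes precedence over "metx"; a missing field
// contributes the empty string after the space.
function mpeg4DispatchType(stsd) {
  if (!stsd) return "";
  if (stsd.mett) return "mett " + (stsd.mett.mimeFormat ?? "");
  if (stsd.metx) return "metx " + (stsd.metx.namespace ?? "");
  return "";
}
```

A script could then switch on the resulting string (exposed as `inBandMetadataTrackDispatchType`) to hand each metadata track to the module that understands it.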

When a media element is to forget the media element's media-resource-specific text tracks, the user agent must remove from the media element's list of text tracks all the media-resource-specific text tracks.

4.8.10.12.3 Sourcing out-of-band text tracks

When a track element is created, it must be associated with a new text track (with its value set as defined below) and its corresponding new TextTrack object.

The text track kind is determined from the state of the element's kind attribute according to the following table; for a state given in a cell of the first column, the kind is the string given in the second column:

State	String
Subtitles	`subtitles`
Captions	`captions`
Descriptions	`descriptions`
Chapters	`chapters`
Metadata	`metadata`

The text track label is the element's track label.

The text track language is the element's track language, if any, or the empty string otherwise.

As the kind, label, and srclang attributes are set, changed, or removed, the text track must update accordingly, as per the definitions above.

Changes to the track URL are handled in the algorithm below.

The text track readiness state is initially not loaded, and the text track mode is initially disabled.

The text track list of cues is initially empty. It is dynamically modified when the referenced file is parsed. Associated with the list are the rules for updating the text track rendering appropriate for the format in question; for WebVTT, this is the rules for updating the display of WebVTT text tracks. [WEBVTT]

When a track element's parent element changes and the new parent is a media element, then the user agent must add the track element's corresponding text track to the media element's list of text tracks, and then queue a task to fire a trusted event with the name addtrack, that does not bubble and is not cancelable, and that uses the TrackEvent interface, with the track attribute initialized to the text track's TextTrack object, at the media element's textTracks attribute's TextTrackList object.

When a track element's parent element changes and the old parent was a media element, then the user agent must remove the track element's corresponding text track from the media element's list of text tracks, and then queue a task to fire a trusted event with the name removetrack, that does not bubble and is not cancelable, and that uses the TrackEvent interface, with the track attribute initialized to the text track's TextTrack object, at the media element's textTracks attribute's TextTrackList object.

When a text track corresponding to a track element is added to a media element's list of text tracks, the user agent must queue a task to run the following steps for the media element:

If the element's blocked-on-parser flag is true, abort these steps.
If the element's did-perform-automatic-track-selection flag is true, abort these steps.
Honor user preferences for automatic text track selection for this element.

When the user agent is required to honor user preferences for automatic text track selection for a media element, the user agent must run the following steps:

Perform automatic text track selection for subtitles and captions.
Perform automatic text track selection for descriptions.
Perform automatic text track selection for chapters.
If there are any text tracks in the media element's list of text tracks whose text track kind is metadata that correspond to track elements with a default attribute set whose text track mode is set to disabled, then set the text track mode of all such tracks to hidden.
Set the element's did-perform-automatic-track-selection flag to true.

When the steps above say to perform automatic text track selection for one or more text track kinds, it means to run the following steps:

Let candidates be a list consisting of the text tracks in the media element's list of text tracks whose text track kind is one of the kinds that were passed to the algorithm, if any, in the order given in the list of text tracks.
If candidates is empty, then abort these steps.
If any of the text tracks in candidates have a text track mode set to showing, abort these steps.
If the user has expressed an interest in having a track from candidates enabled based on its text track kind, text track language, and text track label, then set its text track mode to showing, and if there are any text tracks in candidates that correspond to track elements with a default attribute set whose text track mode is set to disabled, then additionally set the text track mode of the first such track to hidden.

For example, the user could have set a browser preference to the effect of "I want French captions whenever possible", or "If there is a subtitle track with 'Commentary' in the title, enable it", or "If there are audio description tracks available, enable one, ideally in Swiss German, but failing that in Standard Swiss German or Standard German".

Otherwise, if there are any text tracks in candidates that correspond to track elements with a default attribute set whose text track mode is set to disabled, then set the text track mode of the first such track to showing.
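The selection steps above can be sketched as follows. How a user agent represents user preferences is not specified, so `userWants` is a hypothetical predicate supplied by the caller; `fromTrackElement` and `isDefault` are likewise illustrative fields standing in for "corresponds to a track element with a default attribute set".

```javascript
const isDisabledDefault = (t) =>
  t.fromTrackElement && t.isDefault && t.mode === "disabled";

function performAutomaticSelection(textTracks, kinds, userWants) {
  const candidates = textTracks.filter((t) => kinds.includes(t.kind));
  if (candidates.length === 0) return;
  if (candidates.some((t) => t.mode === "showing")) return;
  const preferred = candidates.find(userWants);
  if (preferred) {
    preferred.mode = "showing";
    // The first still-disabled default track is kept loaded but not shown.
    const fallback = candidates.find(isDisabledDefault);
    if (fallback) fallback.mode = "hidden";
  } else {
    const fallback = candidates.find(isDisabledDefault);
    if (fallback) fallback.mode = "showing";
  }
}

function honorUserPreferences(media, userWants) {
  performAutomaticSelection(media.textTracks, ["subtitles", "captions"], userWants);
  performAutomaticSelection(media.textTracks, ["descriptions"], userWants);
  performAutomaticSelection(media.textTracks, ["chapters"], userWants);
  // Default metadata tracks are activated without being displayed.
  for (const t of media.textTracks) {
    if (t.kind === "metadata" && isDisabledDefault(t)) t.mode = "hidden";
  }
  media.didPerformAutomaticTrackSelection = true;
}
```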

When a text track corresponding to a track element experiences any of the following circumstances, the user agent must start the track processing model for that text track and its track element:

The track element is created.
The text track has its text track mode changed.
The track element's parent element changes and the new parent is a media element.

When a user agent is to start the track processing model for a text track and its track element, it must run the following algorithm. This algorithm interacts closely with the event loop mechanism; in particular, it has a synchronous section (which is triggered as part of the event loop algorithm). The steps in that section are marked with ⌛.

If another occurrence of this algorithm is already running for this text track and its track element, abort these steps, letting that other algorithm take care of this element.
If the text track's text track mode is not set to one of hidden or showing, abort these steps.
If the text track's track element does not have a media element as a parent, abort these steps.
Run the remainder of these steps asynchronously, allowing whatever caused these steps to run to continue.
Top: Await a stable state. The synchronous section consists of the following steps. (The steps in the synchronous section are marked with ⌛.)
⌛ Set the text track readiness state to loading.
⌛ Let URL be the track URL of the track element.
⌛ If the track element's parent is a media element then let CORS mode be the state of the parent media element's crossorigin content attribute. Otherwise, let CORS mode be No CORS.
End the synchronous section, continuing the remaining steps asynchronously.
If URL is not the empty string, perform a potentially CORS-enabled fetch of URL, with the mode being CORS mode, the origin being the origin of the track element's Document, and the default origin behaviour set to fail.

The resource obtained in this fashion, if any, contains the text track data. If any data is obtained, it is by definition CORS-same-origin (cross-origin resources that are not suitably CORS-enabled do not get this far).

The tasks queued by the fetching algorithm on the networking task source to process the data as it is being fetched must determine the type of the resource. If the type of the resource is not a supported text track format, the load will fail, as described below. Otherwise, the resource's data must be passed to the appropriate parser (e.g. the WebVTT parser) as it is received, with the text track list of cues being used for that parser's output. [WEBVTT]

The appropriate parser will synchronously (during these networking task source tasks) and incrementally (as each such task is run with whatever data has been received from the network) update the text track list of cues.

This specification does not currently say whether or how to check the MIME types of text tracks, or whether or how to perform file type sniffing using the actual file data. Implementors differ in their intentions on this matter and it is therefore unclear what the right solution is. In the absence of any requirement here, the HTTP specification's strict requirement to follow the Content-Type header prevails ("Content-Type specifies the media type of the underlying data." ... "If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource.").

If the fetching algorithm fails for any reason (network error, the server returns an error code, a cross-origin check fails, etc), if URL is the empty string, or if the type of the resource is not a supported text track format, then run these steps:
1. Queue a task to first change the text track readiness state to failed to load and then fire a simple event named error at the track element.
2. Wait until the text track readiness state is no longer set to loading.
3. Wait until the track URL is no longer equal to URL, at the same time as the text track mode is set to hidden or showing.
4. Jump to the step labeled top.
If the fetching algorithm does not fail, then the final task that is queued by the networking task source must run the following steps after it has tried to parse the data:
1. Change the text track readiness state to loaded.
2. If the file was successfully processed, fire a simple event named load at the track element.
  
  Otherwise, the file was not successfully processed (e.g. the format in question is an XML format and the file contained a well-formedness error that the XML specification requires be detected and reported to the application); fire a simple event named error at the track element.
3. Wait until the track URL is no longer equal to URL, at the same time as the text track mode is set to hidden or showing.
4. Jump back to the step labeled top.
If, while the fetching algorithm is active, either:
- the track URL changes so that it is no longer equal to URL, while the text track mode is set to hidden or showing; or
- the text track mode changes to hidden or showing, while the track URL is not equal to URL
...then the user agent must run the following steps:
1. Abort the fetching algorithm, discarding any pending tasks generated by that algorithm (and in particular, not adding any cues to the text track list of cues after the moment the URL changed).
2. Jump back to the step labeled top.
Until one of the above circumstances occurs, the user agent must remain on this step.
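Two parts of the track processing model lend themselves to a small sketch: the guards that decide whether processing starts at all, and the mapping from a fetch result to the readiness state and event. The object shapes and field names below are assumptions for illustration; the real algorithm also involves task queuing and waiting for URL or mode changes, which this sketch leaves out.

```javascript
// Guards corresponding to the first three steps of the algorithm.
function canStartProcessing(track) {
  if (track.processingAlreadyRunning) return false;
  if (track.mode !== "hidden" && track.mode !== "showing") return false;
  return track.parentIsMediaElement === true;
}

// Outcome of one load attempt. Note that a file that was fetched but could
// not be processed still ends up "loaded", with an error event fired.
function loadOutcome({ url, fetchFailed, supportedFormat, parsedOk }) {
  if (url === "" || fetchFailed || !supportedFormat) {
    return { readinessState: "failed to load", event: "error" };
  }
  return { readinessState: "loaded", event: parsedOk ? "load" : "error" };
}
```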

Whenever a track element has its src attribute set, changed, or removed, the user agent must synchronously empty the element's text track's text track list of cues. (This also causes the algorithm above to stop adding cues from the resource being obtained using the previously given URL, if any.)

4.8.10.12.4 Guidelines for exposing cues in various formats as text track cues

How a specific format's text track cues are to be interpreted for the purposes of processing by an HTML user agent is defined by that format. In the absence of such a specification, this section provides some constraints within which implementations can attempt to consistently expose such formats.

To support the text track model of HTML, each unit of timed data is converted to a text track cue. Where the mapping of the format's features to the aspects of a text track cue as defined in this specification are not defined, implementations must ensure that the mapping is consistent with the definitions of the aspects of a text track cue as defined above, as well as with the following constraints:

The text track cue identifier: Should be set to the empty string if the format has no obvious analogue to a per-cue identifier.
The text track cue pause-on-exit flag: Should be set to false.
The text track cue writing direction: Should be set to horizontal if the concept of writing direction doesn't really apply (e.g. the cue consists of a bitmap image).
The text track cue snap-to-lines flag: Should be set to false unless the format uses a rendering and positioning model for cues that is largely consistent with the WebVTT cue text rendering rules.
The text track cue line position
The text track cue text position
The text track cue size
The text track cue alignment: If the format uses a rendering and positioning model for cues that can be largely simulated using the WebVTT cue text rendering rules, then these should be set to the values that would give the same effect for WebVTT cues. Otherwise, they should be set to zero.

4.8.10.12.5 Text track API

interface TextTrackList : EventTarget {
  readonly attribute unsigned long length;
  getter TextTrack (unsigned long index);

  attribute EventHandler onaddtrack;
  attribute EventHandler onremovetrack;
};

media . textTracks . length: Returns the number of text tracks associated with the media element (e.g. from track elements). This is the number of text tracks in the media element's list of text tracks.
media . textTracks[ n ]: Returns the TextTrack object representing the nth text track in the media element's list of text tracks.
track . track: Returns the TextTrack object representing the track element's text track.

A TextTrackList object represents a dynamically updating list of text tracks in a given order.

The textTracks attribute of media elements must return a TextTrackList object representing the TextTrack objects of the text tracks in the media element's list of text tracks, in the same order as in the list of text tracks. The same object must be returned each time the attribute is accessed. [WEBIDL]

The length attribute of a TextTrackList object must return the number of text tracks in the list represented by the TextTrackList object.

The supported property indices of a TextTrackList object at any instant are the numbers from zero to the number of text tracks in the list represented by the TextTrackList object minus one, if any. If there are no text tracks in the list, there are no supported property indices.

To determine the value of an indexed property of a TextTrackList object for a given index index, the user agent must return the indexth text track in the list represented by the TextTrackList object.
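Because a TextTrackList exposes only `length` and indexed access, scripts typically walk it with an index loop. The helper below is a hypothetical example, not part of the API; in a browser it could be called as `tracksOfKind(video.textTracks, "captions")`, where `video` is a media element.

```javascript
// Collect the tracks of a given kind from any list-like object that
// provides `length` and numeric indices, as TextTrackList does.
function tracksOfKind(trackList, kind) {
  const result = [];
  for (let i = 0; i < trackList.length; i++) {
    if (trackList[i].kind === kind) result.push(trackList[i]);
  }
  return result;
}
```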

enum TextTrackMode { "disabled", "hidden", "showing" };

interface TextTrack : EventTarget {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
  readonly attribute DOMString inBandMetadataTrackDispatchType;

  attribute TextTrackMode mode;

  readonly attribute TextTrackCueList? cues;
  readonly attribute TextTrackCueList? activeCues;

  void addCue(TextTrackCue cue);
  void removeCue(TextTrackCue cue);

  attribute EventHandler oncuechange;
};

textTrack = media . addTextTrack( kind [, label [, language ] ] )

Creates and returns a new TextTrack object, which is also added to the media element's list of text tracks.

textTrack . kind

Returns the text track kind string.

textTrack . label

Returns the text track label, if there is one, or the empty string otherwise (indicating that a custom label probably needs to be generated from the other attributes of the object if the object is exposed to the user).

textTrack . language

Returns the text track language string.

textTrack . inBandMetadataTrackDispatchType

Returns the text track in-band metadata track dispatch type string.

textTrack . mode [ = value ]

Returns the text track mode, represented by a string from the following list:

"disabled": The text track disabled mode.
"hidden": The text track hidden mode.
"showing": The text track showing mode.

Can be set, to change the mode.

textTrack . cues

Returns the text track list of cues, as a TextTrackCueList object.

textTrack . activeCues

Returns the text track cues from the text track list of cues that are currently active (i.e. that start before the current playback position and end after it), as a TextTrackCueList object.

textTrack . addCue( cue )

Adds the given cue to textTrack's text track list of cues.

textTrack . removeCue( cue )

Removes the given cue from textTrack's text track list of cues.

The addTextTrack(kind, label, language) method of media elements, when invoked, must run the following steps:

If kind is not one of the following strings, then throw a SyntaxError exception and abort these steps:
- subtitles
- captions
- descriptions
- chapters
- metadata
If the label argument was omitted, let label be the empty string.
If the language argument was omitted, let language be the empty string.
Create a new TextTrack object.
Create a new text track corresponding to the new object, and set its text track kind to kind, its text track label to label, its text track language to language, its text track readiness state to the text track loaded state, its text track mode to the text track hidden mode, and its text track list of cues to an empty list. Associate the text track list of cues with the rules for updating the display of WebVTT text tracks as its rules for updating the text track rendering. [WEBVTT]
Add the new text track to the media element's list of text tracks.
Queue a task to fire a trusted event with the name addtrack, that does not bubble and is not cancelable, and that uses the TrackEvent interface, with the track attribute initialized to the new text track's TextTrack object, at the media element's textTracks attribute's TextTrackList object.
Return the new TextTrack object.
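
The steps above can be modelled roughly as follows. This is only a sketch: addTextTrackModel, VALID_KINDS and the readinessState property are hypothetical names, the list of text tracks is a plain array, the built-in JavaScript SyntaxError stands in for the SyntaxError exception, and the queued addtrack event is reduced to a comment.

```javascript
// Kinds accepted by addTextTrack(), as listed in the steps above.
var VALID_KINDS = ['subtitles', 'captions', 'descriptions', 'chapters', 'metadata'];

// Hypothetical model of the addTextTrack() steps, operating on a plain array
// (`listOfTextTracks`) instead of a real media element.
function addTextTrackModel(listOfTextTracks, kind, label, language) {
  if (VALID_KINDS.indexOf(kind) === -1) {
    // Stand-in for the SyntaxError exception named in the steps above.
    throw new SyntaxError('Unsupported text track kind: ' + kind);
  }
  var track = {
    kind: kind,
    label: label === undefined ? '' : label,          // omitted label -> ''
    language: language === undefined ? '' : language, // omitted language -> ''
    readinessState: 'loaded',                         // text track loaded state
    mode: 'hidden',                                   // text track hidden mode
    cues: []                                          // empty list of cues
  };
  listOfTextTracks.push(track);
  // A real user agent would now queue a task to fire `addtrack`.
  return track;
}
```

Note that a track created this way starts in the hidden mode, so its cues are processed without being displayed.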

The kind attribute must return the text track kind of the text track that the TextTrack object represents.

The label attribute must return the text track label of the text track that the TextTrack object represents.

The language attribute must return the text track language of the text track that the TextTrack object represents.

The inBandMetadataTrackDispatchType attribute must return the text track in-band metadata track dispatch type of the text track that the TextTrack object represents.

The mode attribute, on getting, must return the string corresponding to the text track mode of the text track that the TextTrack object represents, as defined by the following list:

"disabled": The text track disabled mode.
"hidden": The text track hidden mode.
"showing": The text track showing mode.

On setting, if the new value isn't equal to what the attribute would currently return, the new value must be processed as follows:

If the new value is "disabled": Set the text track mode of the text track that the TextTrack object represents to the text track disabled mode.
If the new value is "hidden": Set the text track mode of the text track that the TextTrack object represents to the text track hidden mode.
If the new value is "showing": Set the text track mode of the text track that the TextTrack object represents to the text track showing mode.
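
A short sketch of how a page might use the mode attribute follows. The helper showLanguage is hypothetical, and the policy it applies (show matching subtitle or caption tracks, disable the others, leave other kinds untouched) is just one possible choice, not something this section requires.

```javascript
// Hypothetical helper: show subtitle/caption tracks in the preferred language
// and disable the other subtitle/caption tracks. Other kinds are left alone.
// Returns the number of tracks that were switched to "showing".
function showLanguage(trackList, preferredLanguage) {
  var shown = 0;
  for (var i = 0; i < trackList.length; i++) {
    var track = trackList[i];
    if (track.kind !== 'subtitles' && track.kind !== 'captions') continue;
    if (track.language === preferredLanguage) {
      track.mode = 'showing';
      shown++;
    } else {
      track.mode = 'disabled';
    }
  }
  return shown;
}
```

The function only relies on length, indexed access, kind, language and mode, so it accepts either a real TextTrackList or an array of stand-in objects.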

If the text track mode of the text track that the TextTrack object represents is not the text track disabled mode, then the cues attribute must return a live TextTrackCueList object that represents the subset of the text track list of cues of the text track that the TextTrack object represents whose end times occur at or after the earliest possible position when the script started, in text track cue order. Otherwise, it must return null. When an object is returned, the same object must be returned each time.

The earliest possible position when the script started is whatever the earliest possible position was the last time the event loop reached step 1.

If the text track mode of the text track that the TextTrack object represents is not the text track disabled mode, then the activeCues attribute must return a live TextTrackCueList object that represents the subset of the text track list of cues of the text track that the TextTrack object represents whose active flag was set when the script started, in text track cue order. Otherwise, it must return null. When an object is returned, the same object must be returned each time.

A text track cue's active flag was set when the script started if its text track cue active flag was set the last time the event loop reached step 1.
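
Because cues and activeCues return null while a track is in the disabled mode, scripts generally need to check for that case. The following sketch shows one way to do so; the helper name activeCueIds is hypothetical.

```javascript
// Hypothetical helper: collect the ids of the currently active cues.
// `activeCues` is null while the track is disabled, so that case is
// handled explicitly instead of assuming a list is always present.
function activeCueIds(track) {
  var list = track.activeCues;
  if (list === null) return [];
  var ids = [];
  for (var i = 0; i < list.length; i++) {
    ids.push(list[i].id);
  }
  return ids;
}
```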

The addCue(cue) method of TextTrack objects, when invoked, must run the following steps:

If the given cue is in a text track list of cues, then remove cue from that text track list of cues.
Add cue to the method's TextTrack object's text track's text track list of cues.

The removeCue(cue) method of TextTrack objects, when invoked, must run the following steps:

If the given cue is not currently listed in the method's TextTrack object's text track's text track list of cues, then throw a NotFoundError exception.
Remove cue from the method's TextTrack object's text track's text track list of cues.
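
The addCue() and removeCue() steps above can be modelled with plain objects, as in the sketch below. TrackModel and cueList are hypothetical names, a generic Error with its name set to NotFoundError stands in for the DOM exception, and the sorting of cues into text track cue order is deliberately left out.

```javascript
// Hypothetical model of a text track that keeps its cues in a plain array.
function TrackModel() {
  this.cueList = [];
}

TrackModel.prototype.addCue = function (cue) {
  // Step 1: a cue already in some list of cues is removed from it first.
  if (cue.track) {
    var oldList = cue.track.cueList;
    oldList.splice(oldList.indexOf(cue), 1);
  }
  // Step 2: add the cue to this track's list of cues.
  this.cueList.push(cue);
  cue.track = this;
};

TrackModel.prototype.removeCue = function (cue) {
  var index = this.cueList.indexOf(cue);
  if (index === -1) {
    var error = new Error('Cue is not in this track');
    error.name = 'NotFoundError'; // stand-in for the DOM exception
    throw error;
  }
  this.cueList.splice(index, 1);
  cue.track = null;
};
```

One consequence visible in the model is that adding a cue to a second track moves it rather than copying it.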

In this example, an audio element is used to play a specific sound-effect from a sound file containing many sound effects. A cue is used to pause the audio, so that it ends exactly at the end of the clip, even if the browser is busy running some script. If the page had relied on script to pause the audio, then the start of the next clip might be heard if the browser was not able to run the script at the exact time specified.

var sfx = new Audio('sfx.wav');
var sounds = sfx.addTextTrack('metadata');

// add sounds we care about
function addFX(start, end, name) {
  var cue = new TextTrackCue(start, end, '');
  cue.id = name;
  cue.pauseOnExit = true;
  sounds.addCue(cue);
}
addFX(12.783, 13.612, 'dog bark');
addFX(13.612, 15.091, 'kitten mew');

function playSound(id) {
  // getCueById() is defined on TextTrackCueList, so look the cue up via cues
  sfx.currentTime = sounds.cues.getCueById(id).startTime;
  sfx.play();
}

// play a bark as soon as we can
sfx.oncanplaythrough = function () {
  playSound('dog bark');
};

// meow when the user tries to leave
window.onbeforeunload = function () {
  playSound('kitten mew');
  return 'Are you sure you want to leave this awesome page?';
};

interface TextTrackCueList {
  readonly attribute unsigned long length;
  getter TextTrackCue (unsigned long index);
  TextTrackCue? getCueById(DOMString id);
};

cuelist . length

Returns the number of cues in the list.

cuelist[index]

Returns the text track cue with index index in the list. The cues are sorted in text track cue order.

cuelist . getCueById( id )

Returns the first text track cue (in text track cue order) with text track cue identifier id.

Returns null if none of the cues have the given identifier or if the argument is the empty string.

A TextTrackCueList object represents a dynamically updating list of text track cues in a given order.

The length attribute must return the number of cues in the list represented by the TextTrackCueList object.

The supported property indices of a TextTrackCueList object at any instant are the numbers from zero to the number of cues in the list represented by the TextTrackCueList object minus one, if any. If there are no cues in the list, there are no supported property indices.

To determine the value of an indexed property for a given index index, the user agent must return the indexth text track cue in the list represented by the TextTrackCueList object.

The getCueById(id) method, when called with an argument other than the empty string, must return the first text track cue in the list represented by the TextTrackCueList object whose text track cue identifier is id, if any, or null otherwise. If the argument is the empty string, then the method must return null.
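
The lookup rule for getCueById() can be sketched over an ordinary array. The function name getCueByIdModel is hypothetical, and the array is assumed to be in text track cue order already.

```javascript
// Hypothetical model of getCueById() over an array of cue-like objects
// that is assumed to be in text track cue order already.
function getCueByIdModel(cues, id) {
  if (id === '') return null; // the empty string never matches
  for (var i = 0; i < cues.length; i++) {
    if (cues[i].id === id) return cues[i]; // first match wins
  }
  return null;
}
```

The explicit check for the empty string matters because cues have the empty string as their identifier by default, so without it a lookup for '' would return an arbitrary unnamed cue.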

enum AutoKeyword { "auto" };

[Constructor(double startTime, double endTime, DOMString text)]
interface TextTrackCue : EventTarget {
  readonly attribute TextTrack? track;

  attribute DOMString id;
  attribute double startTime;
  attribute double endTime;
  attribute boolean pauseOnExit;
  attribute DOMString vertical;
  attribute boolean snapToLines;
  attribute (long or AutoKeyword) line;
  attribute long position;
  attribute long size;
  attribute DOMString align;
  attribute DOMString text;
  DocumentFragment getCueAsHTML();

  attribute EventHandler onenter;
  attribute EventHandler onexit;
};

cue = new TextTrackCue( startTime, endTime, text )

Returns a new TextTrackCue object, for use with the addCue() method.

The startTime argument sets the text track cue start time.

The endTime argument sets the text track cue end time.

The text argument sets the text track cue text.

cue . track

Returns the TextTrack object to which this text track cue belongs, if any, or null otherwise.

cue . id [ = value ]

Returns the text track cue identifier.

Can be set.

cue . startTime [ = value ]

Returns the text track cue start time, in seconds.

Can be set.

cue . endTime [ = value ]

Returns the text track cue end time, in seconds.

Can be set.

cue . pauseOnExit [ = value ]

Returns true if the text track cue pause-on-exit flag is set, false otherwise.

Can be set.

cue . vertical [ = value ]

Returns a string representing the text track cue writing direction, as follows:

If it is horizontal: The empty string.
If it is vertical growing left: The string "rl".
If it is vertical growing right: The string "lr".

Can be set.

cue . snapToLines [ = value ]

Returns true if the text track cue snap-to-lines flag is set, false otherwise.

Can be set.

cue . line [ = value ]

Returns the text track cue line position. In the case of the value being auto, the string "auto" is returned.

Can be set.

cue . position [ = value ]

Returns the text track cue text position.

Can be set.

cue . size [ = value ]

Returns the text track cue size.

Can be set.

cue . align [ = value ]

Returns a string representing the text track cue alignment, as follows:

If it is start alignment: The string "start".
If it is middle alignment: The string "middle".
If it is end alignment: The string "end".
If it is left alignment: The string "left".
If it is right alignment: The string "right".

Can be set.

cue . text [ = value ]

Returns the text track cue text in raw unparsed form.

Can be set.

fragment = cue . getCueAsHTML()

Returns the text track cue text as a DocumentFragment of HTML elements and other DOM nodes.

The TextTrackCue(startTime, endTime, text) constructor, when invoked, must run the following steps:

Create a new text track cue. Let cue be that text track cue.
Let cue's text track cue start time be the value of the startTime argument, interpreted as a time in seconds.
Let cue's text track cue end time be the value of the endTime argument, interpreted as a time in seconds.
Let cue's text track cue text be the value of the text argument, and let the rules for its interpretation be the WebVTT cue text parsing rules, the WebVTT cue text rendering rules, and the WebVTT cue text DOM construction rules. [WEBVTT]
Let cue's text track cue identifier be the empty string.
Let cue's text track cue pause-on-exit flag be false.
Let cue's text track cue writing direction be horizontal.
Let cue's text track cue snap-to-lines flag be true.
Let cue's text track cue line position be auto.
Let cue's text track cue text position be 50.
Let cue's text track cue size be 100.
Let cue's text track cue alignment be middle alignment.
Return the TextTrackCue object representing cue.
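
The initial state produced by the constructor steps above can be summarised as a plain object. This is only an illustrative model: createCueModel is a hypothetical name, and each property holds the value that the corresponding IDL attribute would return for a freshly constructed cue.

```javascript
// Hypothetical model of the state of a newly constructed cue, expressed as
// the values its IDL attributes would return.
function createCueModel(startTime, endTime, text) {
  return {
    startTime: startTime, // seconds
    endTime: endTime,     // seconds
    text: text,           // raw, unparsed cue text
    id: '',               // identifier: the empty string
    pauseOnExit: false,   // pause-on-exit flag: false
    vertical: '',         // writing direction: horizontal
    snapToLines: true,    // snap-to-lines flag: true
    line: 'auto',         // line position: auto
    position: 50,         // text position: 50
    size: 100,            // size: 100
    align: 'middle'       // alignment: middle alignment
  };
}
```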

The track attribute, on getting, must return the TextTrack object of the text track in whose list of cues the text track cue that the TextTrackCue object represents finds itself, if any; or null otherwise.

The id attribute, on getting, must return the text track cue identifier of the text track cue that the TextTrackCue object represents. On setting, the text track cue identifier must be set to the new value.

The startTime attribute, on getting, must return the text track cue start time of the text track cue that the TextTrackCue object represents, in seconds. On setting, the text track cue start time must be set to the new value, interpreted in seconds.

The endTime attribute, on getting, must return the text track cue end time of the text track cue that the TextTrackCue object represents, in seconds. On setting, the text track cue end time must be set to the new value, interpreted in seconds.

The pauseOnExit attribute, on getting, must return true if the text track cue pause-on-exit flag of the text track cue that the TextTrackCue object represents is set; or false otherwise. On setting, the text track cue pause-on-exit flag must be set if the new value is true, and must be unset otherwise.

The vertical attribute, on getting, must return the string from the second cell of the row in the table below whose first cell is the text track cue writing direction of the text track cue that the TextTrackCue object represents:

Text track cue writing direction	`vertical` value
Horizontal	"" (the empty string)
Vertical growing left	"`rl`"
Vertical growing right	"`lr`"

On setting, the text track cue writing direction must be set to the value given in the first cell of the row in the table above whose second cell is a case-sensitive match for the new value, if any. If none of the values match, then the user agent must instead throw a SyntaxError exception.
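
The setter behaviour for vertical amounts to a case-sensitive table lookup, which the following sketch models. WRITING_DIRECTIONS and writingDirectionFor are hypothetical names, and the built-in JavaScript SyntaxError stands in for the SyntaxError exception.

```javascript
// Attribute value -> text track cue writing direction, from the table above.
var WRITING_DIRECTIONS = {
  '': 'horizontal',
  'rl': 'vertical growing left',
  'lr': 'vertical growing right'
};

// Hypothetical model of the `vertical` setter: a case-sensitive lookup that
// throws when the new value matches none of the rows.
function writingDirectionFor(value) {
  if (!Object.prototype.hasOwnProperty.call(WRITING_DIRECTIONS, value)) {
    throw new SyntaxError('Invalid value for vertical: "' + value + '"');
  }
  return WRITING_DIRECTIONS[value];
}
```

The align setter follows the same pattern with its own table of five values.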

The snapToLines attribute, on getting, must return true if the text track cue snap-to-lines flag of the text track cue that the TextTrackCue object represents is set; or false otherwise. On setting, the text track cue snap-to-lines flag must be set if the new value is true, and must be unset otherwise.

The line attribute, on getting, must return the text track cue line position of the text track cue that the TextTrackCue object represents. The special value auto must be represented as the string "auto". On setting, the text track cue line position must be set to the new value; if the new value is the string "auto", then it must be interpreted as the special value auto.

The position attribute, on getting, must return the text track cue text position of the text track cue that the TextTrackCue object represents. On setting, if the new value is negative or greater than 100, then an IndexSizeError exception must be thrown. Otherwise, the text track cue text position must be set to the new value.

The size attribute, on getting, must return the text track cue size of the text track cue that the TextTrackCue object represents. On setting, if the new value is negative or greater than 100, then an IndexSizeError exception must be thrown. Otherwise, the text track cue size must be set to the new value.
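
The range check shared by the position and size setters can be sketched as below. The helper name checkPercentage is hypothetical, and a generic Error with its name set to IndexSizeError stands in for the DOM exception.

```javascript
// Hypothetical model of the range check used by the `position` and `size`
// setters: values below 0 or above 100 are rejected, others are accepted.
function checkPercentage(attributeName, value) {
  if (value < 0 || value > 100) {
    var error = new Error(attributeName + ' must be between 0 and 100, got ' + value);
    error.name = 'IndexSizeError'; // stand-in for the DOM exception
    throw error;
  }
  return value;
}
```

Both boundary values, 0 and 100, are accepted.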

The align attribute, on getting, must return the string from the second cell of the row in the table below whose first cell is the text track cue alignment of the text track cue that the TextTrackCue object represents:

Text track cue alignment	`align` value
Start alignment	"`start`"
Middle alignment	"`middle`"
End alignment	"`end`"
Left alignment	"`left`"
Right alignment	"`right`"

On setting, the text track cue alignment must be set to the value given in the first cell of the row in the table above whose second cell is a case-sensitive match for the new value, if any. If none of the values match, then the user agent must instead throw a SyntaxError exception.

The text attribute, on getting, must return the raw text track cue text of the text track cue that the TextTrackCue object represents. On setting, the text track cue text must be set to the new value.

The getCueAsHTML() method must convert the text track cue text to a DocumentFragment for the script's document of the entry script, using the appropriate rules for doing so. For example, for WebVTT, those rules are the WebVTT cue text parsing rules and the WebVTT cue text DOM construction rules. [WEBVTT]

4.8.10.12.6 Text tracks describing chapters

Chapters are segments of a media resource with a given title. Chapters can be nested, in the same way that sections in a document outline can have subsections.

Each text track cue in a text track being used for describing chapters has three key features: the text track cue start time, giving the start time of the chapter, the text track cue end time, giving the end time of the chapter, and the text track cue text giving the chapter title.

The rules for constructing the chapter tree from a text track are as follows. They produce a potentially nested list of chapters, each of which has a start time, end time, title, and a list of nested chapters. This algorithm discards cues that do not correctly nest within each other, or that are out of order.

Let list be a copy of the list of cues of the text track being processed.
Remove from list any text track cue whose text track cue end time is before its text track cue start time.
Let output be an empty list of chapters, where a chapter is a record consisting of a start time, an end time, a title, and a (potentially empty) list of nested chapters. For the purpose of this algorithm, each chapter also has a parent chapter.
Let current chapter be a stand-in chapter whose start time is negative infinity, whose end time is positive infinity, and whose list of nested chapters is output. (This is just used to make the algorithm easier to describe.)
Loop: If list is empty, jump to the step labeled end.
Let current cue be the first cue in list, and then remove it from list.
If current cue's text track cue start time is less than the start time of current chapter, then return to the step labeled loop.
While current cue's text track cue start time is greater than or equal to current chapter's end time, let current chapter be current chapter's parent chapter.
If current cue's text track cue end time is greater than the end time of current chapter, then return to the step labeled loop.
Create a new chapter new chapter, whose start time is current cue's text track cue start time, whose end time is current cue's text track cue end time, whose title is current cue's text track cue text interpreted according to its rules for interpretation, and whose list of nested chapters is empty.
Append new chapter to current chapter's list of nested chapters, and let new chapter's parent chapter be current chapter.
Let current chapter be new chapter.
Return to the step labeled loop.
End: Return output.
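
The steps above can be sketched in script as follows. This is an illustrative model rather than a user agent implementation: buildChapterTree and the property names chapters and parent are hypothetical, the input is an array of cue-like objects assumed to be in text track cue order already, and the cue text is used as the title without further interpretation.

```javascript
// Hypothetical model of the chapter tree algorithm. `cues` is an array of
// objects with startTime, endTime and text, in text track cue order.
function buildChapterTree(cues) {
  // Copy the list, dropping cues that end before they start.
  var list = cues.filter(function (cue) {
    return cue.endTime >= cue.startTime;
  });
  var output = [];
  // Stand-in chapter that spans all time and owns the top-level list.
  var current = {
    startTime: -Infinity, endTime: Infinity,
    title: null, chapters: output, parent: null
  };
  while (list.length > 0) {
    var cue = list.shift();
    // Out-of-order cue: discard it.
    if (cue.startTime < current.startTime) continue;
    // Climb to the nearest ancestor that has not ended yet.
    while (cue.startTime >= current.endTime) current = current.parent;
    // Cue overruns the chapter that would contain it: discard it.
    if (cue.endTime > current.endTime) continue;
    var chapter = {
      startTime: cue.startTime, endTime: cue.endTime,
      title: cue.text, chapters: [], parent: current
    };
    current.chapters.push(chapter);
    current = chapter;
  }
  return output;
}

// Usage with times in seconds; the last two cues are discarded.
var chapterTree = buildChapterTree([
  { startTime: 0, endTime: 3000, text: 'Part one' },
  { startTime: 0, endTime: 600, text: 'Introduction' },
  { startTime: 600, endTime: 2700, text: 'Main topic' },
  { startTime: 3000, endTime: 6000, text: 'Part two' },
  { startTime: 3000, endTime: 3300, text: 'Recap' },
  { startTime: 3200, endTime: 7000, text: 'Overruns its parent' },
  { startTime: 100, endTime: 50, text: 'Ends before it starts' }
]);
```

Because each chapter keeps a reference to its parent, the resulting structure is cyclic and cannot be passed to JSON.stringify directly.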

The following snippet of a WebVTT file shows how nested chapters can be marked up. The file describes three 50-minute chapters, "Astrophysics", "Computational Physics", and "General Relativity". The first has three subchapters, the second has four, and the third has two. [WEBVTT]

WEBVTT

00:00:00.000 --> 00:50:00.000
Astrophysics

00:00:00.000 --> 00:10:00.000
Introduction to Astrophysics

00:10:00.000 --> 00:45:00.000
The Solar System

00:45:00.000 --> 00:50:00.000
Coursework Description

00:50:00.000 --> 01:40:00.000
Computational Physics

00:50:00.000 --> 00:55:00.000
Introduction to Programming

00:55:00.000 --> 01:30:00.000
Data Structures

01:30:00.000 --> 01:35:00.000
Answers to Last Exam

01:35:00.000 --> 01:40:00.000
Coursework Description

01:40:00.000 --> 02:30:00.000
General Relativity

01:40:00.000 --> 02:00:00.000
Tensor Algebra

02:00:00.000 --> 02:30:00.000
The General Relativistic Field Equations

4.8.10.12.7 Event definitions

The following are the event handlers (and their corresponding event handler event types) that must be supported, as IDL attributes, by all objects implementing the TextTrackList interface:

Event handler	Event handler event type
`onaddtrack`	`addtrack`
`onremovetrack`	`removetrack`

The following are the event handlers (and their corresponding event handler event types) that must be supported, as IDL attributes, by all objects implementing the TextTrack interface:

Event handler	Event handler event type
`oncuechange`	`cuechange`

The following are the event handlers (and their corresponding event handler event types) that must be supported, as IDL attributes, by all objects implementing the TextTrackCue interface:

Event handler	Event handler event type
`onenter`	`enter`
`onexit`	`exit`
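
As a small illustration of the cue-level handlers, the sketch below assigns onenter and onexit on a cue-like object and records each call. The helper name traceCue and the log array are hypothetical; in a page the handlers would be invoked by the user agent when the corresponding events fire, whereas a stand-in object needs them to be called by hand.

```javascript
// Hypothetical helper: record when a cue becomes active and when it stops
// being active, using the onenter and onexit event handler attributes.
function traceCue(cue, log) {
  cue.onenter = function () {
    log.push('enter ' + cue.id);
  };
  cue.onexit = function () {
    log.push('exit ' + cue.id);
  };
}
```

The same assignment pattern applies to onaddtrack and onremovetrack on a TextTrackList and to oncuechange on a TextTrack.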
