
Whitespace character

In computer science, white space or whitespace is any character or series of characters that represents horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visual mark, but it typically does occupy an area on a page. For example, the common whitespace character U+0020 SPACE (HTML: &#32;), also ASCII 32, represents a blank space and is used as a word divider in Western scripts.

The term "whitespace" comes from the resulting appearance on ordinary paper, where such characters leave the page blank.


Definition and ambiguity

The most common whitespace characters can be typed with the space bar or the Tab key. Depending on context, a line break generated by the Return or Enter key may also be considered whitespace.

Unicode

In the Unicode Character Database, the following 26 characters are defined as whitespace characters:

Whitespace[a] (Unicode character property WSpace=Y)
Dec    Code point  Name                       Script     General category      Remark
9      U+0009      <control>                  Common     Other, control        HT, Horizontal Tab
10     U+000A      <control>                  Common     Other, control        LF, Line feed
11     U+000B      <control>                  Common     Other, control        VT, Vertical Tab
12     U+000C      <control>                  Common     Other, control        FF, Form feed
13     U+000D      <control>                  Common     Other, control        CR, Carriage return
32     U+0020      space                      Common     Separator, space
133    U+0085      <control>                  Common     Other, control        NEL, Next line
160    U+00A0      no-break space             Common     Separator, space
5760   U+1680      ogham space mark           Ogham      Separator, space
6158   U+180E      mongolian vowel separator  Mongolian  Separator, space
8192   U+2000      en quad                    Common     Separator, space
8193   U+2001      em quad                    Common     Separator, space
8194   U+2002      en space                   Common     Separator, space
8195   U+2003      em space                   Common     Separator, space
8196   U+2004      three-per-em space         Common     Separator, space
8197   U+2005      four-per-em space          Common     Separator, space
8198   U+2006      six-per-em space           Common     Separator, space
8199   U+2007      figure space               Common     Separator, space
8200   U+2008      punctuation space          Common     Separator, space
8201   U+2009      thin space                 Common     Separator, space
8202   U+200A      hair space                 Common     Separator, space
8232   U+2028      line separator             Common     Separator, line
8233   U+2029      paragraph separator        Common     Separator, paragraph
8239   U+202F      narrow no-break space      Common     Separator, space
8287   U+205F      medium mathematical space  Common     Separator, space
12288  U+3000      ideographic space          Common     Separator, space
a. Unicode 6.0, Chapter 4.6
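As a sketch of how the table can be used in practice, the Python code below hardcodes its 26 code points and compares them with Python's built-in str.isspace(). The names WSPACE_UNICODE_6_0, is_table_whitespace and describe are hypothetical. Python documents its own rule for isspace() (general category Zs, or bidirectional class WS, B or S), so the two sets are not identical: U+001C to U+001F count as whitespace for isspace() but are not in the table, and a current Python, whose Unicode database is newer than 6.0, no longer reports U+180E as a space separator.

```python
import unicodedata

# Hypothetical constant: the 26 WSpace=Y code points from the table (Unicode 6.0).
WSPACE_UNICODE_6_0 = frozenset(
    [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85, 0xA0, 0x1680, 0x180E]
    + list(range(0x2000, 0x200B))  # U+2000 en quad .. U+200A hair space
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def is_table_whitespace(ch: str) -> bool:
    """Return True if ch is one of the 26 characters listed in the table."""
    return ord(ch) in WSPACE_UNICODE_6_0


def describe(ch: str) -> str:
    """Format ch as 'U+XXXX NAME (category)' using Python's Unicode database."""
    # Control characters have no name in unicodedata, so use a placeholder.
    name = unicodedata.name(ch, "<control>")
    return f"U+{ord(ch):04X} {name} ({unicodedata.category(ch)})"


if __name__ == "__main__":
    for cp in sorted(WSPACE_UNICODE_6_0):
        ch = chr(cp)
        # str.isspace() follows Python's own rule, not WSpace=Y directly.
        print(describe(ch), "isspace" if ch.isspace() else "not isspace")
    # U+001C is accepted by str.isspace() but is not in the table.
    print(describe("\x1c"), is_table_whitespace("\x1c"), "\x1c".isspace())
```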

Within the Unicode Bidirectional Algorithm, Unicode uses another definition of whitespace (bidirectional character type WS). These Bidi-WS characters (18 of the 26 listed in the table) are neutral: they do not determine a writing direction themselves but take it from neighboring characters. The other eight characters listed have different bidirectional types, such as segment separator (S) or paragraph separator (B).
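The bidirectional types can be checked with Python's unicodedata.bidirectional(). This minimal sketch tallies the types of the 26 table entries; the TABLE constant and the bidi_classes helper are hypothetical. Assuming a Python whose Unicode database is newer than 6.0, the WS count may come out as 17 rather than 18, because U+180E has since been reclassified; the remaining entries show types such as S, B and CS.

```python
import unicodedata
from collections import Counter

# Hypothetical constant: the 26 WSpace=Y code points from the table.
TABLE = (
    [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85, 0xA0, 0x1680, 0x180E]
    + list(range(0x2000, 0x200B))  # U+2000 .. U+200A
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def bidi_classes(code_points):
    """Map each code point to its bidirectional type in Python's database."""
    return {cp: unicodedata.bidirectional(chr(cp)) for cp in code_points}


if __name__ == "__main__":
    classes = bidi_classes(TABLE)
    # Tally of bidirectional types across the table.
    print(Counter(classes.values()))
    # List the entries that are not Bidi-WS.
    for cp, cls in classes.items():
        if cls != "WS":
            print(f"U+{cp:04X}: {cls}")
```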

Usage

Computer languages

In most programming languages, runs of whitespace (beyond the first whitespace character) within source code are ignored; such languages are free-form. In Haskell and Python, however, whitespace and indentation are used for syntactic purposes. In the esoteric language called Whitespace, whitespace characters are the only valid characters for programming, and all other characters are ignored.
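Python shows both behaviors in a single language. In this minimal sketch (the helper names are hypothetical), extra spaces and tabs between the tokens of an expression are ignored, while removing the indentation of a block body makes the built-in compile() raise IndentationError.

```python
def free_form_sum() -> int:
    """Extra spaces and tabs between tokens do not change an expression's meaning."""
    return eval("1   +\t2")


def indentation_matters(source: str) -> bool:
    """Return True if source compiles, False if Python rejects its indentation."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:
        return False
    return True


if __name__ == "__main__":
    print(free_form_sum())
    print(indentation_matters("if True:\n    x = 1\n"))  # indented body: accepted
    print(indentation_matters("if True:\nx = 1\n"))      # unindented body: rejected
```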

Still, in most programming languages, excessive whitespace, especially trailing whitespace at the end of lines, is considered a nuisance.[by whom?] However, appropriate use of whitespace makes code easier to read and helps group related logic. In interpreted languages, parsing unnecessary whitespace may affect the speed of execution. In markup languages such as HTML, unnecessary whitespace increases file size and may therefore slow transfer over a network. On the other hand, unnecessary whitespace can also mark code inconspicuously, much like comments but less noticeable. This can help prove license or copyright infringement committed by copying and pasting.
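As an illustration of cleaning up trailing whitespace, here is a minimal Python sketch; strip_trailing_whitespace and bytes_saved are hypothetical names. It removes only spaces and tabs before each line ending and keeps the line endings themselves, so files using CRLF or LF keep their style. bytes_saved reports how much smaller the UTF-8 encoded text becomes.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line, keeping line endings."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")   # line content without its ending
        ending = line[len(body):]    # "\r\n", "\n", "\r" or "" on the last line
        out.append(body.rstrip(" \t") + ending)
    return "".join(out)


def bytes_saved(text: str) -> int:
    """Number of UTF-8 bytes removed by strip_trailing_whitespace."""
    cleaned = strip_trailing_whitespace(text)
    return len(text.encode("utf-8")) - len(cleaned.encode("utf-8"))
```

For example, strip_trailing_whitespace("x = 1   \ny") returns "x = 1\ny".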

The C language defines whitespace as "... space, horizontal tab, new-line, vertical tab, and form-feed".[1] The HTTP network protocol requires different types of whitespace in different parts of the protocol, such as only the space character in the status line, CRLF at the end of a line, and "linear white space" in header values.[2]
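The following Python sketch shows how the HTTP rules can look in a parser, assuming the RFC 2616 grammar: the status line is split on single space characters only, every line must end in CRLF, and spaces and tabs around a header value are trimmed. The function names parse_status_line and parse_header are hypothetical, and the sketch does not handle folded (multi-line) header values, which RFC 2616 also counts as linear white space.

```python
SP = " "
CRLF = "\r\n"


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split an HTTP/1.1 status line on single SP characters only."""
    if not line.endswith(CRLF):
        raise ValueError("status line must end with CRLF")
    # maxsplit=2 keeps any spaces inside the reason phrase intact.
    version, code, reason = line[:-2].split(SP, 2)
    return version, int(code), reason


def parse_header(line: str) -> tuple[str, str]:
    """Split 'Name: value' and trim surrounding spaces and tabs from the value."""
    if not line.endswith(CRLF):
        raise ValueError("header line must end with CRLF")
    name, sep, value = line[:-2].partition(":")
    if not sep:
        raise ValueError("missing colon in header line")
    # Spaces and tabs around the value are not significant.
    return name, value.strip(" \t")
```

For example, parse_status_line("HTTP/1.1 404 Not Found\r\n") returns ("HTTP/1.1", 404, "Not Found").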

Visible symbol

Sometimes the visible symbol ␣ (Unicode U+2423, decimal 9251, OPEN BOX) is used to indicate a space. It resembles a closing square bracket (]) rotated a quarter-turn clockwise and placed below the writing line, though not as wide. Some fonts render it too narrowly.

This symbol is used in a textbook on the Modula-2 programming language, published around 1985 by Springer-Verlag, where spaces must be indicated explicitly. The symbol is also used in the keypad silkscreening of the TI-8x series of graphing calculators from Texas Instruments.[3]
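Making spaces visible in this way is a simple character substitution. A minimal Python sketch, where show_spaces is a hypothetical name; it replaces only U+0020 SPACE and leaves tabs and other whitespace unchanged:

```python
OPEN_BOX = "\u2423"  # the visible space symbol, U+2423


def show_spaces(text: str) -> str:
    """Replace each U+0020 SPACE with U+2423 so that spaces become visible."""
    return text.replace(" ", OPEN_BOX)
```

For example, show_spaces("MODULE Hello;") returns "MODULE␣Hello;".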

File names

A similar practice appears in multiword file names written for operating systems and applications that mishandle embedded spaces: such file names use an underscore (_) as the word separator instead, as_in_this_phrase.
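The conversion can be sketched in Python with str.split(), which, when called without arguments, splits on runs of any whitespace and discards leading and trailing whitespace. The function name underscore_name is hypothetical.

```python
def underscore_name(name: str) -> str:
    """Join the words of name with underscores instead of whitespace.

    str.split() with no arguments treats any run of whitespace as one
    separator, so "a  b" and "a\tb" both become "a_b".
    """
    return "_".join(name.split())
```

For example, underscore_name("as in this phrase") returns "as_in_this_phrase".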

Another such symbol is U+2422 ␢ BLANK SYMBOL. It was used in the early years of computer programming when writing on coding forms, and keypunch operators immediately recognized it as an "explicit space".[citation needed]


References

  1. Section 6.4, paragraph 3, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
  2. R. Fielding et al., "2.2 Basic Rules", Hypertext Transfer Protocol -- HTTP/1.1, RFC 2616.
  3. Above the zero "0" or negative "(‒)" key.

