Uuencoding

Uuencoding is a form of binary-to-text encoding that originated in the Unix program uuencode, for encoding binary data for transmission over the uucp mail system.

The name "uuencoding" is derived from "Unix-to-Unix encoding". Since uucp converted characters between various computers' character sets, uuencode was used to convert the data to fairly common characters that were unlikely to be "translated" and thereby destroy the file. The program uudecode reverses the effect of uuencode, recreating the original binary file exactly. The uuencode/uudecode pair became popular for sending binary files by e-mail and for posting them to Usenet newsgroups.

It has now been largely replaced by MIME and yEnc. With MIME, files that might have been uuencoded are transferred with base64 encoding.

Encoded format

A uuencoded file starts with a header line of the form:

 begin <mode> <file><newline>

<mode> is the file's Unix read/write/execute permissions as three octal digits. It is typically significant only on Unix-like operating systems. Common values are 644 (6: the user can read and write; 4: the group can read; 4: others can read) and 744 (the same, except 7: the user can read, write, and execute). These values are often written with a leading 0, as in 0644, to mark the number as octal.
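As an illustration of the header layout, the following minimal Python sketch splits a header line into its mode and file name. The function name parse_begin_line is hypothetical, and the sketch assumes the fields are separated by single spaces.

```python
import stat

def parse_begin_line(line):
    """Split a 'begin <mode> <file>' header into (mode, file name)."""
    # Split at most twice so that a file name containing spaces stays whole.
    keyword, mode_text, name = line.rstrip("\r\n").split(" ", 2)
    if keyword != "begin":
        raise ValueError("not a uuencode header line")
    return int(mode_text, 8), name  # the mode digits are octal

mode, name = parse_begin_line("begin 644 cat.txt\n")
print(oct(mode), name)                     # 0o644 cat.txt
print(stat.filemode(stat.S_IFREG | mode))  # -rw-r--r--
```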

<file> is the file name to be used when recreating the binary data.

<newline> signifies a newline character.

Each data line uses the format:

 <length character><formatted characters><newline>

<length character> is a single character indicating the number of data bytes encoded on that line. Each data line ends with a newline character.

The length character is the ASCII character whose code is the actual byte count plus 32, with the sole exception of a grave accent "`" (ASCII code 96), which signifies zero bytes. All data lines except the last (if the data length is not divisible by 45) carry 45 bytes of encoded data. Therefore, the vast majority of length characters are "M" (32 + 45 = ASCII code 77).
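The length rule can be sketched in Python as follows. The helper names length_char and decode_length are hypothetical, and the 45-byte limit is taken from the line format described above.

```python
def length_char(byte_count, zero_as_grave=True):
    """Return the length character for a line carrying byte_count bytes."""
    if not 0 <= byte_count <= 45:
        raise ValueError("a data line carries 0 to 45 bytes")
    if byte_count == 0 and zero_as_grave:
        return "`"                 # special case: grave accent means zero bytes
    return chr(32 + byte_count)    # otherwise: byte count plus 32

def decode_length(ch):
    """Recover the byte count; masking with 63 maps the grave accent (96) to 0."""
    return (ord(ch) - 32) & 0x3F

print(length_char(45), length_char(3), length_char(0))   # M # `
print(decode_length("M"), decode_length("`"))            # 45 0
```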

<formatted characters> are encoded characters. See Formatting Mechanism for more details on the actual implementation.

The file ends with two lines:

 `<newline>
 end<newline>

The second-to-last line is itself a length character: the grave accent, signifying a line of zero bytes.

As a complete file, the uuencoded output for a plain text file named cat.txt containing only the characters Cat would be

 begin 644 cat.txt
 #0V%T
 `
 end

The begin line is a standard uuencode header; the '#' indicates that its line encodes three characters; the last two lines appear at the end of all uuencoded files.
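The complete layout (header line, data lines of up to 45 bytes, and the two closing lines) can be assembled with a short Python sketch. It relies on the standard library function binascii.b2a_uu to encode each data line; the wrapper name uuencode and its default mode of 644 are assumptions made for this example.

```python
import binascii

def uuencode(data, name, mode=0o644):
    """Build a complete uuencoded file as text."""
    lines = [f"begin {mode:o} {name}"]
    for i in range(0, len(data), 45):
        # b2a_uu accepts at most 45 bytes and returns one line ending in "\n".
        line = binascii.b2a_uu(data[i:i + 45]).decode("ascii")
        lines.append(line.rstrip("\n"))
    lines += ["`", "end"]   # zero-length line, then the end marker
    return "\n".join(lines) + "\n"

print(uuencode(b"Cat", "cat.txt"), end="")
```

For the input Cat this prints the same four lines as the cat.txt example.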

Formatting Mechanism

The mechanism of uuencoding repeats the following for every 3 bytes:

  1. Start with (3) bytes from the source.
  2. Convert to 24 bits.
  3. Convert into (4) 6-bit groupings, bits (00-05),(06-11),(12-17),(18-23).
  4. Evaluate the decimal equivalent of each of the (4) 6-bit groupings. 6 bits allows a range of 0 to 63.
  5. Add 32 to each of the 4. With the addition of 32 this means that possible results can be between 32 (" " space) and 95 ("_" underline). 96 ("`" grave accent) as the "special character" is a logical extension of this range.
  6. Output the ASCII equivalent of these numbers.

If the source length is not divisible by 3, the last group of source bytes is padded with extra bytes so that it still encodes to a full 4-character section. The padding bytes are not counted in the line's <length character>, so the decoder does not append unwanted null characters to the file.
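The six steps and the padding rule can be sketched in Python as below. The function name encode_line is hypothetical. The sketch pads with zero bytes and lets a zero-valued group come out as a space (0 + 32); as noted elsewhere in this article, some encoders emit a grave accent instead.

```python
def encode_line(chunk):
    """Uuencode 1 to 45 bytes into one data line (without the newline)."""
    if not 0 < len(chunk) <= 45:
        raise ValueError("a data line carries 1 to 45 bytes")
    padded = chunk + b"\x00" * (-len(chunk) % 3)    # pad to a multiple of 3
    out = [chr(32 + len(chunk))]                    # length counts real bytes only
    for i in range(0, len(padded), 3):
        n = int.from_bytes(padded[i:i + 3], "big")  # 3 bytes as one 24-bit number
        for shift in (18, 12, 6, 0):                # four 6-bit groups, high to low
            out.append(chr(32 + ((n >> shift) & 0x3F)))
    return "".join(out)

print(encode_line(b"Cat"))   # #0V%T
```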

Uudecoding is the reverse of the above: subtract 32 from each character's ASCII code, combine the resulting four 6-bit values into 24 bits, then output 3 bytes.
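A matching decoder might look like the following Python sketch. The name decode_line is hypothetical. Masking each value with 63 lets a grave accent decode as zero, and short lines are padded with spaces on the assumption that trailing spaces may have been stripped in transit.

```python
def decode_line(line):
    """Decode one uuencoded data line into bytes."""
    line = line.rstrip("\r\n")
    count = (ord(line[0]) - 32) & 0x3F       # number of real data bytes
    body = line[1:]
    out = bytearray()
    for i in range(0, len(body), 4):
        group = body[i:i + 4].ljust(4)       # restore stripped trailing spaces
        n = 0
        for ch in group:
            n = (n << 6) | ((ord(ch) - 32) & 0x3F)   # four 6-bit values -> 24 bits
        out += n.to_bytes(3, "big")          # 24 bits -> 3 bytes
    return bytes(out[:count])                # drop the padding bytes

print(decode_line("#0V%T"))   # b'Cat'
```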

The encoding process is demonstrated by this table, which shows the derivation of the above encoding for "Cat".

 Original characters        C          a          t
 Original ASCII, decimal    67         97         116
 ASCII, binary              01000011   01100001   01110100
 New decimal values         16       54       5        52
 +32                        48       86       37       84
 Uuencoded characters       0        V        %        T

Uuencode table

The following table shows the conversion of the decimal value of the 6-bit fields obtained during the conversion process and their corresponding ASCII character output code and character.

Note that 96 ("`" grave accent) is a character that is seen in uuencoded files but is typically only used to signify a 0-length line, usually at the end of a file. It will never naturally occur in the actual converted data since it is outside the range of 32 to 95. The sole exception to this is that some uuencoding programs use the grave accent to signify padding bytes instead of a space. However, the character used for the padding byte is not standardized, so either is a possibility.

 six   ASCII ASCII | six   ASCII ASCII | six   ASCII ASCII | six   ASCII ASCII | six   ASCII ASCII | six   ASCII ASCII | six   ASCII ASCII
 bits  code  char  | bits  code  char  | bits  code  char  | bits  code  char  | bits  code  char  | bits  code  char  | bits  code  char
 00    32    SP    | 10    42    *     | 20    52    4     | 30    62    >     | 40    72    H     | 50    82    R     | 60    92    \
 01    33    !     | 11    43    +     | 21    53    5     | 31    63    ?     | 41    73    I     | 51    83    S     | 61    93    ]
 02    34    "     | 12    44    ,     | 22    54    6     | 32    64    @     | 42    74    J     | 52    84    T     | 62    94    ^
 03    35    #     | 13    45    -     | 23    55    7     | 33    65    A     | 43    75    K     | 53    85    U     | 63    95    _
 04    36    $     | 14    46    .     | 24    56    8     | 34    66    B     | 44    76    L     | 54    86    V     |
 05    37    %     | 15    47    /     | 25    57    9     | 35    67    C     | 45    77    M     | 55    87    W     |
 06    38    &     | 16    48    0     | 26    58    :     | 36    68    D     | 46    78    N     | 56    88    X     |
 07    39    '     | 17    49    1     | 27    59    ;     | 37    69    E     | 47    79    O     | 57    89    Y     |
 08    40    (     | 18    50    2     | 28    60    <     | 38    70    F     | 48    80    P     | 58    90    Z     |
 09    41    )     | 19    51    3     | 29    61    =     | 39    71    G     | 49    81    Q     | 59    91    [     |

Forks (File, Resource)

Unix traditionally has a single fork where file data is stored. However, some file systems support multiple forks associated with a single file. For example, classic Mac OS HFS supported a data fork and a resource fork. Mac OS HFS+ supports multiple forks, as do Microsoft Windows NTFS alternate data streams. Most uuencoding tools handle only the data in the primary data fork, which can result in a loss of information when encoding and decoding (for example, Windows NTFS file comments are kept in a different fork). Some tools (like the classic Mac OS application UUTool) solved the problem by concatenating the different forks into one file and differentiating them by file name.

Relation to Xxencode and Base64

Despite its limited range of characters, uuencoded data is sometimes mangled on passage through certain computers using non-ASCII character sets such as EBCDIC. One attempt to fix the problem was the Xxencode format, which used only alphanumeric characters and the plus and minus symbols. More common today is the Base64 format, which follows the same idea of a mostly alphanumeric alphabet (letters, digits, plus, and slash) instead of ASCII 32-95. All three formats use 6 bits (64 different characters) to represent their input data.

Base64 can also be generated by the uuencode program and is similar in format, with the exception of the actual character translation:

The header is changed to

begin-base64 <mode> <file>

the trailer becomes

====

and lines between are encoded with characters chosen from

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
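The begin-base64 layout described above can be approximated in Python with the standard base64 module. This is only a sketch: the function name uuencode_base64 is hypothetical, and the choice of 45 source bytes per line is an assumption carried over from the classic format rather than something stated here.

```python
import base64

def uuencode_base64(data, name, mode=0o644):
    """Build a begin-base64 style file as text."""
    lines = [f"begin-base64 {mode:o} {name}"]
    for i in range(0, len(data), 45):   # assumed: 45 source bytes per line
        lines.append(base64.b64encode(data[i:i + 45]).decode("ascii"))
    lines.append("====")                # trailer replaces the "`" and "end" lines
    return "\n".join(lines) + "\n"

print(uuencode_base64(b"Cat", "cat.txt"), end="")
```

Unlike the classic format, the data lines carry no length character; Base64 marks a short final group with "=" padding instead.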

Disadvantages

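  • Uuencoding turns every 3 source bytes into 4 characters (a 33% increase) and also adds a length character and a newline to every line, plus the begin/end lines and the file name. A full 45-byte line becomes 62 characters with a one-character newline, or 63 with a two-character newline, so the overall data overhead is close to 40% of the source size.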
  • Newer alternatives exist such as yEnc and MIME.
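The overhead figure can be checked with a little arithmetic. The Python sketch below counts the output size for a given input length; the function name uuencoded_size and its parameters are hypothetical, and it assumes 45-byte lines, a three-digit mode, and the two closing lines described earlier.

```python
def uuencoded_size(n, name="cat.txt", newline=1):
    """Estimate the size in bytes of the uuencoded form of n source bytes."""
    full, rest = divmod(n, 45)
    size = full * (1 + 60 + newline)                    # length char + 60 chars + newline
    if rest:
        size += 1 + 4 * ((rest + 2) // 3) + newline     # shorter last data line
    size += len("begin ") + 3 + 1 + len(name) + newline # header with 3-digit mode
    size += (1 + newline) + (3 + newline)               # "`" line and "end" line
    return size

for nl in (1, 2):   # LF versus CRLF line endings
    overhead = uuencoded_size(45_000, newline=nl) / 45_000 - 1
    print(f"newline of {nl} byte(s): {overhead:.1%} overhead")
```

For a 45,000-byte input this reports 37.8% with one-byte newlines and 40.1% with two-byte newlines.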

Support in Python

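The Python language supports uuencoding through the codecs module with the codec "uu". For example, in Python 2:

 >>> "Cat".encode("uu")
 'begin 666 <data>\n#0V%T\n \nend\n'

In Python 3 the codec operates on bytes rather than text, so the equivalent call is codecs.encode(b"Cat", "uu").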

Support in Perl

The Perl language supports uuencoding natively using the pack() and unpack() operators with the format string "u". For example:

 perl -e 'print pack("u","Cat")'
 #0V%T

Decoding Base64 with unpack can likewise be accomplished by translating the characters:

 perl -e '
   $a="Q2F0";
   $a=~tr#A-Za-z0-9+/\.\_##cd;   # remove non-base64 chars
   $a=~tr#A-Za-z0-9+/# -_#;      # translate sets
   print unpack("u",pack("C",32+int(length($1)*6/8)) . $1) while($a=~s/(.{60}|.+)//);
 '
 Cat
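The same translation trick can be sketched in Python: map each Base64 character to the uuencode character with the same 6-bit value, prepend a length character, and hand the line to the standard library's binascii.a2b_uu. The function name base64_via_uudecode is hypothetical, and the sketch assumes well-formed Base64 input.

```python
import binascii

B64 = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
UU = bytes(range(32, 96))                  # space through underscore
B64_TO_UU = bytes.maketrans(B64, UU)

def base64_via_uudecode(text):
    """Decode Base64 text by rewriting it as uuencoded lines."""
    keep = set(B64)
    # Drop padding, newlines, and anything else outside the Base64 alphabet.
    chars = bytes(b for b in text.encode("ascii") if b in keep)
    out = bytearray()
    for i in range(0, len(chars), 60):     # 60 characters carry 45 bytes
        piece = chars[i:i + 60]
        count = len(piece) * 6 // 8        # whole bytes carried by this piece
        line = bytes([32 + count]) + piece.translate(B64_TO_UU)
        out += binascii.a2b_uu(line)
    return bytes(out)

print(base64_via_uudecode("Q2F0"))   # b'Cat'
```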

See also

  • Binary-to-text encoding for a comparison of various encoding algorithms

External links

  • UUDeview - open-source program to encode/decode Base64, BinHex, uuencode, xxencode, etc. for Unix/Windows/DOS
  • UUENCODE-UUDECODE - open-source program to encode/decode created by Clem "Grandad" Dye
  • StUU - Open Source fast UUDecoder for Macintosh by Stuart Cheshire
  • UUENCODE-UUDECODE - Free on-line UUEncoder and UUDecoder