    Perl User Manual
Functions for fixed length data or records

Convert a list into a binary representation

  • pack TEMPLATE,LIST

    Takes a LIST of values and converts it into a string using the rules given by the TEMPLATE. The resulting string is the concatenation of the converted values. Typically, each converted value looks like its machine-level representation. For example, on 32-bit machines an integer may be represented by a sequence of 4 bytes, which will in Perl be presented as a string that's 4 characters long.
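
    As a minimal sketch of that idea (the variable names are illustrative only), packing one 32-bit integer with the N format should yield a 4-character string, one character per byte:

```perl
use strict;
use warnings;

# Pack one unsigned 32-bit integer in big-endian ("network") order.
my $packed = pack("N", 0x01020304);

# Each byte of the machine-level representation becomes one character.
my $hex = join " ", map { sprintf "%02x", ord($_) } split //, $packed;
print length($packed), " characters: $hex\n";   # 4 characters: 01 02 03 04
```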

    See perlpacktut for an introduction to this function.

    The TEMPLATE is a sequence of characters that give the order and type of values, as follows:

    a  A string with arbitrary binary data, will be null padded.
    A  A text (ASCII) string, will be space padded.
    Z  A null-terminated (ASCIZ) string, will be null padded.

    b  A bit string (ascending bit order inside each byte, like vec()).
    B  A bit string (descending bit order inside each byte).
    h  A hex string (low nybble first).
    H  A hex string (high nybble first).

    c  A signed char (8-bit) value.
    C  An unsigned char (octet) value.
    W  An unsigned char value (can be greater than 255).

    s  A signed short (16-bit) value.
    S  An unsigned short value.

    l  A signed long (32-bit) value.
    L  An unsigned long value.

    q  A signed quad (64-bit) value.
    Q  An unsigned quad value.
       (Quads are available only if your system supports 64-bit
       integer values _and_ if Perl has been compiled to support
       those. Raises an exception otherwise.)

    i  A signed integer value.
    I  An unsigned integer value.
       (This 'integer' is _at_least_ 32 bits wide. Its exact
       size depends on what a local C compiler calls 'int'.)

    n  An unsigned short (16-bit) in "network" (big-endian) order.
    N  An unsigned long (32-bit) in "network" (big-endian) order.
    v  An unsigned short (16-bit) in "VAX" (little-endian) order.
    V  An unsigned long (32-bit) in "VAX" (little-endian) order.

    j  A Perl internal signed integer value (IV).
    J  A Perl internal unsigned integer value (UV).

    f  A single-precision float in native format.
    d  A double-precision float in native format.

    F  A Perl internal floating-point value (NV) in native format.
    D  A float of long-double precision in native format.
       (Long doubles are available only if your system supports
       long double values _and_ if Perl has been compiled to
       support those. Raises an exception otherwise.)

    p  A pointer to a null-terminated string.
    P  A pointer to a structure (fixed-length string).

    u  A uuencoded string.
    U  A Unicode character number. Encodes to a character in character
       mode and UTF-8 (or UTF-EBCDIC in EBCDIC platforms) in byte mode.

    w  A BER compressed integer (not an ASN.1 BER, see perlpacktut
       for details). Its bytes represent an unsigned integer in
       base 128, most significant digit first, with as few digits
       as possible. Bit eight (the high bit) is set on each byte
       except the last.

    x  A null byte (a.k.a. ASCII NUL, "\000", chr(0)).
    X  Back up a byte.
    @  Null-fill or truncate to absolute position, counted from the
       start of the innermost ()-group.
    .  Null-fill or truncate to absolute position specified by
       the value.
    (  Start of a ()-group.
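
    The w entry describes a small algorithm, so a worked case may help. The sketch below packs the arbitrarily chosen value 300 and inspects the bytes as hex; the variable names are illustrative only:

```perl
use strict;
use warnings;

# 300 = 2 * 128 + 44, so it needs two base-128 digits: 2 and 44 (0x2c).
# The high bit is set on every byte except the last: 0x82, 0x2c.
my $ber     = pack("w", 300);
my $ber_hex = unpack("H*", $ber);           # "822c"

# Values below 128 fit in a single byte.
my $small   = unpack("H*", pack("w", 5));   # "05"

# unpack reverses the encoding.
my ($n) = unpack("w", $ber);                # 300
print "$ber_hex $small $n\n";
```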

    One or more modifiers below may optionally follow certain letters in the TEMPLATE (the second column lists letters for which the modifier is valid):

    !   sSlLiI     Forces native (short, long, int) sizes instead
                   of fixed (16-/32-bit) sizes.

        xX         Make x and X act as alignment commands.

        nNvV       Treat integers as signed instead of unsigned.

        @.         Specify position as byte offset in the internal
                   representation of the packed string. Efficient
                   but dangerous.

    >   sSiIlLqQ   Force big-endian byte-order on the type.
        jJfFdDpP   (The "big end" touches the construct.)

    <   sSiIlLqQ   Force little-endian byte-order on the type.
        jJfFdDpP   (The "little end" touches the construct.)

    The > and < modifiers can also be used on () groups to force a particular byte-order on all components in that group, including all its subgroups.

    The following rules apply:

    • Each letter may optionally be followed by a number indicating the repeat count. A numeric repeat count may optionally be enclosed in brackets, as in pack("C[80]", @arr). The repeat count gobbles that many values from the LIST when used with all format types other than a, A, Z, b, B, h, H, @, ., x, X, and P, where it means something else, described below. Supplying a * for the repeat count instead of a number means to use however many items are left, except for:

      • @, x, and X, where it is equivalent to 0.

      • ., where it means relative to the start of the string.

      • u, where it is equivalent to 1 (or 45, which here is equivalent).

      One can replace a numeric repeat count with a template letter enclosed in brackets to use the packed byte length of the bracketed template for the repeat count.

      For example, the template x[L] skips as many bytes as in a packed long, and the template "$t X[$t] $t" unpacks twice whatever $t (when variable-expanded) unpacks. If the template in brackets contains alignment commands (such as x![d]), its packed length is calculated as if the start of the template had the maximal possible alignment.

      When used with Z, a * as the repeat count is guaranteed to add a trailing null byte, so the resulting string is always one byte longer than the byte length of the item itself.

      When used with @, the repeat count represents an offset from the start of the innermost () group.

      When used with ., the repeat count determines the starting position to calculate the value offset as follows:

      • If the repeat count is 0, it's relative to the current position.

      • If the repeat count is *, the offset is relative to the start of the packed string.

      • And if it's an integer n, the offset is relative to the start of the nth innermost ( ) group, or to the start of the string if n is bigger than the group level.

      The repeat count for u is interpreted as the maximal number of bytes to encode per line of output, with 0, 1 and 2 replaced by 45. The repeat count should not be more than 65.
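
      The repeat-count rules above can be sketched as follows. The values and variable names are illustrative only, and the template containing @ is single-quoted so that Perl does not try to interpolate it as an array:

```perl
use strict;
use warnings;

my $fixed     = pack("C3",   65, 66, 67);       # "ABC"
my $bracketed = pack("C[3]", 65, 66, 67);       # same as C3
my $rest      = pack("C*",   65, 66, 67, 68);   # "ABCD": * takes all remaining items

# x[N] skips as many bytes as one packed N occupies (4), then packs a char.
my $skipped   = pack("x[N] C", 65);             # "\0\0\0\0A"

# @4 null-fills up to absolute position 4 before the next item.
my $at        = pack('C @4 C', 1, 2);           # "\x01\0\0\0\x02"

# With Z, a * count always appends a trailing null byte.
my $zstar     = pack("Z*", "abc");              # "abc\0"
```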

    • The a, A, and Z types gobble just one value, but pack it as a string of length count, padding with nulls or spaces as needed. When unpacking, A strips trailing whitespace and nulls, Z strips everything after the first null, and a returns data with no stripping at all.

      If the value to pack is too long, the result is truncated. If it's too long and an explicit count is provided, Z packs only $count-1 bytes, followed by a null byte. Thus Z always packs a trailing null, except when the count is 0.
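
      A short sketch of the padding, truncation, and stripping rules just described (the sample strings and variable names are illustrative only):

```perl
use strict;
use warnings;

# Padding when the value is shorter than the count.
my $a_pad = pack("a5", "hi");          # "hi\0\0\0"  (null padded)
my $A_pad = pack("A5", "hi");          # "hi   "     (space padded)
my $Z_pad = pack("Z5", "hi");          # "hi\0\0\0"  (null padded)

# Truncation when the value is longer than the count.
my $a_cut = pack("a5", "abcdefg");     # "abcde"
my $Z_cut = pack("Z5", "abcdefg");     # "abcd\0": Z keeps room for the null

# Stripping on unpack.
my ($A_out) = unpack("A5", "hi   ");      # "hi": trailing spaces removed
my ($Z_out) = unpack("Z5", "hi\0xx");     # "hi": everything after first null removed
my ($a_out) = unpack("a5", "hi\0\0\0");   # "hi\0\0\0": nothing removed
```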

    • Likewise, the b and B formats pack a string that's that many bits long. Each such format generates 1 bit of the result. These are typically followed by a repeat count like B8 or B64.

      Each result bit is based on the least-significant bit of the corresponding input character, i.e., on ord($char)%2. In particular, characters "0" and "1" generate bits 0 and 1, as do characters "\000" and "\001".

      Starting from the beginning of the input string, each 8-tuple of characters is converted to 1 character of output. With format b, the first character of the 8-tuple determines the least-significant bit of a character; with format B, it determines the most-significant bit of a character.

      If the length of the input string is not evenly divisible by 8, the remainder is packed as if the input string were padded by null characters at the end. Similarly during unpacking, "extra" bits are ignored.

      If the input string is longer than needed, remaining characters are ignored.

      A * for the repeat count uses all characters of the input field. On unpacking, bits are converted to a string of 0s and 1s.
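
      To make the two bit orders concrete, the following sketch packs the byte 0x41 ("A" on ASCII systems) both ways. The bit strings and variable names are illustrative only:

```perl
use strict;
use warnings;

# B: the first character is the most-significant bit.
my $msb_first = pack("B8", "01000001");     # 0x41, "A"

# b: the first character is the least-significant bit,
# so the same byte is written in reverse.
my $lsb_first = pack("b8", "10000010");     # also 0x41, "A"

# A short input is padded with zero bits at the end.
my $padded = pack("B4", "1111");            # "\xF0"

# Unpacking yields a string of 0s and 1s.
my ($bits_B) = unpack("B8", "A");           # "01000001"
my ($bits_b) = unpack("b8", "A");           # "10000010"
```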

    • The h and H formats pack a string that many nybbles (4-bit groups, representable as hexadecimal digits, "0".."9" "a".."f") long.

      For each such format, pack() generates 4 bits of result. With non-alphabetical characters, the result is based on the 4 least-significant bits of the input character, i.e., on ord($char)%16. In particular, characters "0" and "1" generate nybbles 0 and 1, as do bytes "\000" and "\001". For characters "a".."f" and "A".."F", the result is compatible with the usual hexadecimal digits, so that "a" and "A" both generate the nybble 0xA==10. Use only these specific hex characters with this format.

      Starting from the beginning of the input string, each pair of characters is converted to 1 character of output. With format h, the first character of the pair determines the least-significant nybble of the output character; with format H, it determines the most-significant nybble.

      If the length of the input string is not even, it behaves as if padded by a null character at the end. Similarly, "extra" nybbles are ignored during unpacking.

      If the input string is longer than needed, extra characters are ignored.

      A * for the repeat count uses all characters of the input field. For unpack(), nybbles are converted to a string of hexadecimal digits.
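
      The nybble ordering can be sketched the same way. The hex strings and variable names below are illustrative only:

```perl
use strict;
use warnings;

# H: the first digit of each pair is the high nybble.
my $high_first = pack("H4", "1f2e");        # "\x1f\x2e"

# h: the first digit of each pair is the low nybble.
my $low_first  = pack("h4", "1f2e");        # "\xf1\xe2"

# An odd number of digits is padded with a zero nybble.
my $odd = pack("H3", "abc");                # "\xab\xc0"

# Unpacking with H* is a common way to hex-dump a string.
my ($dump) = unpack("H*", "\xde\xad");      # "dead"
```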

    • The p format packs a pointer to a null-terminated string. You are responsible for ensuring that the string is not a temporary value, as that could potentially get deallocated before you got around to using the packed result. The P format packs a pointer to a structure of the size indicated by the length. A null pointer is created if the corresponding value for p or P is undef; similarly with unpack(), where a null pointer unpacks into undef.

      If your system has a strange pointer size--meaning a pointer is neither as big as an int nor as big as a long--it may not be possible to pack or unpack pointers in big- or little-endian byte order. Attempting to do so raises an exception.

    • The / template character allows packing and unpacking of a sequence of items where the packed structure contains a packed item count followed by the packed items themselves. This is useful when the structure you're unpacking has encoded the sizes or repeat counts for some of its fields within the structure itself as separate fields.

      For pack, you write length-item/sequence-item, and the length-item describes how the length value is packed. Formats likely to be of most use are integer-packing ones like n for Java strings, w for ASN.1 or SNMP, and N for Sun XDR.

      For pack, sequence-item may have a repeat count, in which case the minimum of that and the number of available items is used as the argument for length-item. If it has no repeat count or uses a '*', the number of available items is used.

      For unpack, an internal stack of integer arguments unpacked so far is used. You write /sequence-item and the repeat count is obtained by popping off the last element from the stack. The sequence-item must not have a repeat count.

      If sequence-item refers to a string type ("A", "a", or "Z"), the length-item is the string length, not the number of strings. With an explicit repeat count for pack, the packed string is adjusted to that length. For example:

      # This code:                             gives this result:

      unpack("W/a", "\004Gurusamy")            ("Guru")
      unpack("a3/A A*", "007 Bond  J ")        (" Bond", "J")
      unpack("a3 x2 /A A*", "007: Bond, J.")   ("Bond, J", ".")

      pack("n/a* w/a","hello,","world")        "\000\006hello,\005world"
      pack("a/W2", ord("a") .. ord("z"))       "2ab"

      The length-item is not returned explicitly from unpack.

      Supplying a count to the length-item format letter is only useful with A, a, or Z. Packing with a length-item of a or Z may introduce "\000" characters, which Perl does not regard as legal in numeric strings.
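
      As a round-trip sketch of the / mechanism, the following packs two strings, each prefixed by a 16-bit big-endian length, and reads them back. The record layout and names are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Each string is stored as a 16-bit length followed by its bytes.
my $record = pack("n/a* n/a*", "alice", "bob");
# $record is "\x00\x05alice\x00\x03bob"

# On unpack the length is consumed internally and not returned.
my @names = unpack("n/a n/a", $record);     # ("alice", "bob")
print join(",", @names), "\n";
```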

    • The integer types s, S, l, and L may be followed by a ! modifier to specify native shorts or longs. As shown in the example above, a bare l means exactly 32 bits, although the native long as seen by the local C compiler may be larger. This is mainly an issue on 64-bit platforms. You can see whether using ! makes any difference this way:

      printf "format s is %d, s! is %d\n",
          length pack("s"), length pack("s!");

      printf "format l is %d, l! is %d\n",
          length pack("l"), length pack("l!");

      i! and I! are also allowed, but only for completeness' sake: they are identical to i and I.

      The actual sizes (in bytes) of native shorts, ints, longs, and long longs on the platform where Perl was built are also available from the command line:

      $ perl -V:{short,int,long{,long}}size
      shortsize='2';
      intsize='4';
      longsize='4';
      longlongsize='8';

      or programmatically via the Config module:

      use Config;
      print $Config{shortsize}, "\n";
      print $Config{intsize}, "\n";
      print $Config{longsize}, "\n";
      print $Config{longlongsize}, "\n";

      $Config{longlongsize} is undefined on systems without long long support.

    • The integer formats s, S, i, I, l, L, j, and J are inherently non-portable between processors and operating systems because they obey native byteorder and endianness. For example, a 4-byte integer 0x12345678 (305419896 decimal) would be ordered natively (arranged in and handled by the CPU registers) into bytes as

      0x12 0x34 0x56 0x78  # big-endian
      0x78 0x56 0x34 0x12  # little-endian

      Basically, Intel and VAX CPUs are little-endian, while everybody else, including Motorola m68k/88k, PPC, Sparc, HP PA, Power, and Cray, are big-endian. Alpha and MIPS can be either: Digital/Compaq uses (well, used) them in little-endian mode, but SGI/Cray uses them in big-endian mode.

      The names big-endian and little-endian are comic references to the egg-eating habits of the little-endian Lilliputians and the big-endian Blefuscudians from the classic Jonathan Swift satire, Gulliver's Travels. This entered computer lingo via the paper "On Holy Wars and a Plea for Peace" by Danny Cohen, USC/ISI IEN 137, April 1, 1980.

      Some systems may have even weirder byte orders such as

      0x56 0x78 0x12 0x34
      0x34 0x12 0x78 0x56

      You can determine your system endianness with this incantation:

      printf("%#02x ", $_) for unpack("W*", pack L=>0x12345678);

      The byteorder on the platform where Perl was built is also available via Config:

      use Config;
      print "$Config{byteorder}\n";

      or from the command line:

      $ perl -V:byteorder

      Byteorders "1234" and "12345678" are little-endian; "4321" and "87654321" are big-endian.

      For portably packed integers, either use the formats n, N, v, and V or else use the > and < modifiers described immediately below. See also perlport.
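
      As a sketch of the difference, the fixed-order formats below should give the same bytes on any host, while L follows the native order. The sample value is the one used in the text; the variable names are illustrative only:

```perl
use strict;
use warnings;
use Config;

my $value = 0x12345678;

# N and V fix the byte order regardless of the host CPU.
my $big    = unpack("H*", pack("N", $value));   # "12345678"
my $little = unpack("H*", pack("V", $value));   # "78563412"

# L uses the native order, so its output depends on the host.
my $native = unpack("H*", pack("L", $value));
print "native L: $native (byteorder $Config{byteorder})\n";
```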

    • Starting with Perl 5.9.2, integer and floating-point formats, along with the p and P formats and () groups, may all be followed by the > or < endianness modifiers to respectively enforce big- or little-endian byte-order. These modifiers are especially useful given how n, N, v, and V don't cover signed integers, 64-bit integers, or floating-point values.

      Here are some concerns to keep in mind when using an endianness modifier:

      • Exchanging signed integers between different platforms works only when all platforms store them in the same format. Most platforms store signed integers in two's-complement notation, so usually this is not an issue.

      • The > or < modifiers can only be used on floating-point formats on big- or little-endian machines. Otherwise, attempting to use them raises an exception.

      • Forcing big- or little-endian byte-order on floating-point values for data exchange can work only if all platforms use the same binary representation such as IEEE floating-point. Even if all platforms are using IEEE, there may still be subtle differences. Being able to use > or < on floating-point values can be useful, but also dangerous if you don't know exactly what you're doing. It is not a general way to portably store floating-point values.

      • When using > or < on a () group, this affects all types inside the group that accept byte-order modifiers, including all subgroups. It is silently ignored for all other types. You are not allowed to override the byte-order within a group that already has a byte-order modifier suffix.
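
      A minimal sketch of the modifiers on signed integers and on a group, assuming a two's-complement platform (the values and variable names are illustrative only):

```perl
use strict;
use warnings;

# Signed 32-bit -2 in each byte order (two's complement assumed).
my $be = unpack("H*", pack("l>", -2));          # "fffffffe"
my $le = unpack("H*", pack("l<", -2));          # "feffffff"

# The same modifier must be used to read the value back.
my ($back) = unpack("l>", pack("l>", -2));      # -2

# A modifier on a group applies to every type inside it.
my $group = unpack("H*", pack("(s l)>", 1, 2)); # "000100000002"
```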

    • Real numbers (floats and doubles) are in native machine format only. Due to the multiplicity of floating-point formats and the lack of a standard "network" representation for them, no facility for interchange has been made. This means that packed floating-point data written on one machine may not be readable on another, even if both use IEEE floating-point arithmetic (because the endianness of the memory representation is not part of the IEEE spec). See also perlport.

      If you know exactly what you're doing, you can use the > or < modifiers to force big- or little-endian byte-order on floating-point values.

      Because Perl uses doubles (or long doubles, if configured) internally for all numeric calculation, converting from double into float and thence to double again loses precision, so unpack("f", pack("f", $foo)) will not in general equal $foo.
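
      The precision loss can be observed directly. This sketch assumes the native float is narrower than Perl's NV, which is the usual case; the variable names are illustrative only:

```perl
use strict;
use warnings;

my $x = 0.1;    # not exactly representable in binary floating point

# Through a single-precision float: some precision is lost.
my ($via_float) = unpack("f", pack("f", $x));

# Through Perl's own NV type: the value survives unchanged.
my ($via_nv) = unpack("F", pack("F", $x));

# A value that fits exactly in a float survives either way.
my ($half) = unpack("f", pack("f", 0.5));

printf "original %.17g, via f %.17g\n", $x, $via_float;
```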

    • Pack and unpack can operate in two modes: character mode (C0 mode) where the packed string is processed per character, and UTF-8 mode (U0 mode) where the packed string is processed in its UTF-8-encoded Unicode form on a byte-by-byte basis. Character mode is the default unless the format string starts with U. You can always switch mode mid-format with an explicit C0 or U0 in the format. This mode remains in effect until the next mode change, or until the end of the () group it (directly) applies to.

      Using C0 to get Unicode characters while using U0 to get non-Unicode bytes is not necessarily obvious. Probably only the first of these is what you want:

      $ perl -CS -E 'say "\x{3B1}\x{3C9}"' |
        perl -CS -ne 'printf "%v04X\n", $_ for unpack("C0A*", $_)'
      03B1.03C9
      $ perl -CS -E 'say "\x{3B1}\x{3C9}"' |
        perl -CS -ne 'printf "%v02X\n", $_ for unpack("U0A*", $_)'
      CE.B1.CF.89
      $ perl -CS -E 'say "\x{3B1}\x{3C9}"' |
        perl -C0 -ne 'printf "%v02X\n", $_ for unpack("C0A*", $_)'
      CE.B1.CF.89
      $ perl -CS -E 'say "\x{3B1}\x{3C9}"' |
        perl -C0 -ne 'printf "%v02X\n", $_ for unpack("U0A*", $_)'
      C3.8E.C2.B1.C3.8F.C2.89

      Those examples also illustrate that you should not try to use pack/unpack as a substitute for the Encode module.
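
      On the packing side, the effect of the leading U versus an explicit C0 can be sketched as below, assuming an ASCII-based (non-EBCDIC) platform. The code point is the Greek alpha from the examples above; the variable names are illustrative only:

```perl
use strict;
use warnings;

# A template starting with U yields characters.
my $chars = pack("U", 0x3B1);       # one character, U+03B1

# Switching to C0 first yields the UTF-8 encoded bytes instead.
my $bytes = pack("C0U", 0x3B1);     # two bytes: 0xCE 0xB1

printf "%d character, %d bytes\n", length($chars), length($bytes);
```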

    • You must yourself do any alignment or padding by inserting, for example, enough "x"es while packing. There is no way for pack() and unpack() to know where characters are going to or coming from, so they handle their output and input as flat sequences of characters.

    • A () group is a sub-TEMPLATE enclosed in parentheses. A group may take a repeat count either as postfix, or for unpack(), also via the / template character. Within each repetition of a group, positioning with @ starts over at 0. Therefore, the result of

      pack('@1A((@2A)@3A)', qw[X Y Z])

      is the string "\0X\0\0YZ".
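
      A sketch of both group behaviours: the positioning example from the text, and a group with a postfix repeat count. The templates containing @ are single-quoted to keep Perl from interpolating them as arrays; the field values and names are illustrative only:

```perl
use strict;
use warnings;

# Inside each group, @ positions are counted from the group's start.
my $nested = pack('@1A((@2A)@3A)', qw[X Y Z]);     # "\0X\0\0YZ"

# A repeat count on a group packs the same layout several times.
my $pairs  = pack("(A2 n)2", "ab", 1, "cd", 2);    # "ab\x00\x01cd\x00\x02"
my @fields = unpack("(A2 n)2", $pairs);            # ("ab", 1, "cd", 2)
```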

    • x and X accept the ! modifier to act as alignment commands: they jump forward or back to the closest position aligned at a multiple of count characters. For example, to pack() or unpack() a C structure like

      struct {
          char   c;    /* one signed, 8-bit character */
          double d;
          char   cc[2];
      }

      one may need to use the template c x![d] d c[2]. This assumes that doubles must be aligned to the size of double.

      For alignment commands, a count of 0 is equivalent to a count of 1;both are no-ops.
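
      The alignment commands can be sketched as follows. The struct values are arbitrary, and the size of a packed double is measured at run time rather than assumed; all variable names are illustrative only:

```perl
use strict;
use warnings;

# x!4 pads with nulls up to the next multiple of 4.
my $aligned = pack("C x!4 N", 1, 2);    # "\x01\0\0\0" . "\0\0\0\x02"

# If the position is already aligned, x!4 adds nothing.
my $already = pack("N x!4 C", 1, 2);    # 5 bytes, no padding

# The struct template from the text, packed and read back.
my $dsize   = length pack("d", 0);
my $struct  = pack("c x![d] d c[2]", 65, 3.5, 66, 67);
my @members = unpack("c x![d] d c[2]", $struct);    # (65, 3.5, 66, 67)
```

      Note that the template adds no padding after cc, so the packed string may be shorter than the C compiler's sizeof for the same struct.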

    • n, N, v and V accept the ! modifier to represent signed 16-/32-bit integers in big-/little-endian order. This is portable only when all platforms sharing packed data use the same binary representation for signed integers; for example, when all platforms use two's-complement representation.
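
      A minimal sketch of the signed variants, assuming two's-complement representation (the value -2 and the variable names are illustrative only):

```perl
use strict;
use warnings;

# Signed 16-bit -2 in big- and little-endian order.
my $net = pack("n!", -2);                   # "\xff\xfe"
my $vax = pack("v!", -2);                   # "\xfe\xff"

# The modifier also decides how the bytes are read back.
my ($signed)   = unpack("n!", $net);        # -2
my ($unsigned) = unpack("n",  $net);        # 65534
```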

    • Comments can be embedded in a TEMPLATE using # through the end of line. White space can separate pack codes from each other, but modifiers and repeat counts must follow immediately. Breaking complex templates into individual line-by-line components, suitably annotated, can do as much to improve legibility and maintainability of pack/unpack formats as /x can for complicated pattern matches.
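
      An annotated template might look like the sketch below. The record layout (type, flags, tag) is hypothetical and exists only to show the commenting style:

```perl
use strict;
use warnings;

# One pack code per line, each with its own comment.
my $template = <<'END';
    n      # message type, 16-bit big-endian
    C      # flags, one unsigned byte
    a4     # tag, 4 bytes, null padded
END

my $record = pack($template, 7, 1, "ab");
my $hex    = unpack("H*", $record);         # "00070161620000"
my @fields = unpack($template, $record);    # (7, 1, "ab\0\0")
```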

    • If TEMPLATE requires more arguments than pack() is given, pack() assumes additional "" arguments. If TEMPLATE requires fewer arguments than given, extra arguments are ignored.
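
      Both cases can be sketched briefly (the values and variable names are illustrative only):

```perl
use strict;
use warnings;

# Too few arguments: the missing ones are treated as "".
my $short = pack("a2 a2", "x");     # "x\0" . "\0\0"
my $zero  = pack("N");              # "\0\0\0\0"

# Too many arguments: the extras are ignored.
my $extra = pack("C2", 1, 2, 3);    # "\x01\x02"
```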

    Examples:

    $foo = pack("WWWW",65,66,67,68);
    # foo eq "ABCD"
    $foo = pack("W4",65,66,67,68);
    # same thing
    $foo = pack("W4",0x24b6,0x24b7,0x24b8,0x24b9);
    # same thing with Unicode circled letters.
    $foo = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9);
    # same thing with Unicode circled letters. You don't get the
    # UTF-8 bytes because the U at the start of the format caused
    # a switch to U0-mode, so the UTF-8 bytes get joined into
    # characters
    $foo = pack("C0U4",0x24b6,0x24b7,0x24b8,0x24b9);
    # foo eq "\xe2\x92\xb6\xe2\x92\xb7\xe2\x92\xb8\xe2\x92\xb9"
    # This is the UTF-8 encoding of the string in the
    # previous example

    $foo = pack("ccxxcc",65,66,67,68);
    # foo eq "AB\0\0CD"

    # NOTE: The examples above featuring "W" and "c" are true
    # only on ASCII and ASCII-derived systems such as ISO Latin 1
    # and UTF-8. On EBCDIC systems, the first example would be
    # $foo = pack("WWWW",193,194,195,196);

    $foo = pack("s2",1,2);
    # "\001\000\002\000" on little-endian
    # "\000\001\000\002" on big-endian

    $foo = pack("a4","abcd","x","y","z");
    # "abcd"

    $foo = pack("aaaa","abcd","x","y","z");
    # "axyz"

    $foo = pack("a14","abcdefg");
    # "abcdefg\0\0\0\0\0\0\0"

    $foo = pack("i9pl", gmtime);
    # a real struct tm (on my system anyway)

    $utmp_template = "Z8 Z8 Z16 L";
    $utmp = pack($utmp_template, @utmp1);
    # a struct utmp (BSDish)

    @utmp2 = unpack($utmp_template, $utmp);
    # "@utmp1" eq "@utmp2"

    sub bintodec {
        unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
    }

    $foo = pack('sx2l', 12, 34);
    # short 12, two zero bytes padding, long 34
    $bar = pack('s@4l', 12, 34);
    # short 12, zero fill to position 4, long 34
    # $foo eq $bar
    $baz = pack('s.l', 12, 4, 34);
    # short 12, zero fill to position 4, long 34

    $foo = pack('nN', 42, 4711);
    # pack big-endian 16- and 32-bit unsigned integers
    $foo = pack('S>L>', 42, 4711);
    # exactly the same
    $foo = pack('s<l<', -42, 4711);
    # pack little-endian 16- and 32-bit signed integers
    $foo = pack('(sl)<', -42, 4711);
    # exactly the same

    The same template may generally also be used in unpack().

 
Source : perldoc.perl.org - Official documentation for the Perl programming language
Documentation maintained by the Perl 5 Porters