8 Expressions - 8.17 Bit Syntax Expressions - 《Erlang Reference Manual User's Guide v10.5》

8.17 Bit Syntax Expressions

<<>>
<<E1,...,En>>

Each element Ei specifies a segment of the bit string. Each element Ei is a value, followed by an optional size expression and an optional type specifier list.

Ei = Value |
     Value:Size |
     Value/TypeSpecifierList |
     Value:Size/TypeSpecifierList

Used in a bit string construction, Value is an expression that must evaluate to an integer, float, or bit string. If the expression is not a single literal or variable, it must be enclosed in parentheses.

Used in a bit string matching, Value must be a variable, or an integer, float, or string literal.

Notice that, for example, using a string literal as in <<"abc">> is syntactic sugar for <<$a,$b,$c>>.
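The rules for Value in construction can be illustrated with a minimal sketch. The module name bit_value_demo and its function names are hypothetical and chosen only for this example.

```erlang
-module(bit_value_demo).
-export([parenthesized/1, string_sugar/0]).

%% Value is an arithmetic expression here, so each one is wrapped in
%% parentheses. A bare variable or literal would not need them.
parenthesized(X) ->
    <<(X + 1):8, (X * 2):16>>.

%% A string literal expands to one integer segment per character.
string_sugar() ->
    <<"abc">> =:= <<$a, $b, $c>>.
```

For example, bit_value_demo:parenthesized(41) builds <<42,0,82>>, and bit_value_demo:string_sugar() returns true.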

Used in a bit string construction, Size is an expression that must evaluate to an integer.

Used in a bit string matching, Size must be an integer, or a variable bound to an integer.

The value of Size specifies the size of the segment in units (see below). The default value depends on the type (see below):

For integer it is 8.
For float it is 64.
For binary and bitstring it is the whole binary or bit string. In matching, this default value is only valid for the last element. All other bit string or binary elements in the matching must have a size specification.

For the utf8, utf16, and utf32 types, Size must not be given. The size of the segment is implicitly determined by the type and value itself.
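As a rough illustration of the default sizes, the following sketch measures segments built without a Size, and then matches a binary where only the last binary segment omits Size. The module name bit_size_demo and its functions are hypothetical.

```erlang
-module(bit_size_demo).
-export([default_sizes/3, split_at/2]).

%% Build one segment of each kind without a Size and report its length in bits.
default_sizes(Int, Float, Bin) ->
    {bit_size(<<Int>>),           % integer: 8 bits
     bit_size(<<Float/float>>),   % float: 64 bits
     bit_size(<<Bin/binary>>)}.   % binary: the whole binary

%% In a pattern, Head needs an explicit Size (N is already bound to an
%% integer); only the last segment, Tail, may rely on the default.
split_at(N, Bin) ->
    <<Head:N/binary, Tail/binary>> = Bin,
    {Head, Tail}.
```

With these definitions, bit_size_demo:default_sizes(1, 1.0, <<1,2,3>>) returns {8,64,24}, and bit_size_demo:split_at(2, <<10,20,30>>) returns {<<10,20>>,<<30>>}.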

TypeSpecifierList is a list of type specifiers, in any order, separated by hyphens (-). Default values are used for any omitted type specifiers.

Type= integer | float | binary | bytes | bitstring | bits | utf8 | utf16 | utf32
The default is integer. bytes is a shorthand for binary and bits is a shorthand for bitstring. See below for more information about the utf types.
Signedness= signed | unsigned
Only matters for matching and when the type is integer. The default is unsigned.
Endianness= big | little | native
Native-endian means that the endianness is resolved at load time to be either big-endian or little-endian, depending on what is native for the CPU that the Erlang machine is run on. Endianness only matters when the Type is either integer, utf16, utf32, or float. The default is big.
Unit= unit:IntegerLiteral
The allowed range is 1..256. Defaults to 1 for integer, float, and bitstring, and to 8 for binary. A unit specifier must not be given for the types utf8, utf16, and utf32. The value of Size multiplied by the unit gives the number of bits. A segment of type binary must have a size in bits that is evenly divisible by 8.
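The type specifiers described above can be combined as in the following sketch. The module name bit_spec_demo and its function names are hypothetical, used only for illustration.

```erlang
-module(bit_spec_demo).
-export([signedness/1, endianness/1, units/1]).

%% The same 8 bits read as an unsigned and as a signed integer.
signedness(Bin) ->
    <<U:8/unsigned>> = Bin,
    <<S:8/signed>> = Bin,
    {U, S}.

%% The same integer written as 16 bits in big-endian and little-endian order.
endianness(N) ->
    {<<N:16/big>>, <<N:16/little>>}.

%% Size is counted in units: A is 2 * 8 bits (binary defaults to unit 8),
%% B is 2 * 8 bits because of the explicit unit:8, Rest takes what is left.
units(Bin) ->
    <<A:2/binary, B:2/integer-unit:8, Rest/bits>> = Bin,
    {A, B, Rest}.
```

For example, bit_spec_demo:signedness(<<255>>) returns {255,-1}, bit_spec_demo:endianness(1) returns {<<0,1>>,<<1,0>>}, and bit_spec_demo:units(<<1,2,3,4,5:4>>) returns {<<1,2>>,772,<<5:4>>}.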

Note

When constructing binaries, if the size N of an integer segment is too small to contain the given integer, the most significant bits of the integer are silently discarded and only the N least significant bits are put into the binary.
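The truncation described in the note can be observed with a small sketch. The module name bit_trunc_demo and its functions are hypothetical.

```erlang
-module(bit_trunc_demo).
-export([low_byte/1, same_as_mask/1]).

%% Store N in an 8-bit integer segment; any higher bits are dropped silently.
low_byte(N) ->
    <<N:8>>.

%% For comparison: keeping only the 8 least significant bits by masking
%% gives the same segment.
same_as_mask(N) ->
    <<N:8>> =:= <<(N band 16#FF):8>>.
```

Here bit_trunc_demo:low_byte(256) returns <<0>>, bit_trunc_demo:low_byte(257) returns <<1>>, and bit_trunc_demo:low_byte(16#ABCD) returns <<205>> (that is, 16#CD). No exception is raised in any of these cases.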

The types utf8, utf16, and utf32 specify encoding/decoding of the Unicode Transformation Formats UTF-8, UTF-16, and UTF-32, respectively.

When constructing a segment of a utf type, Value must be an integer in the range 0..16#D7FF or 16#E000..16#10FFFF. Construction fails with a badarg exception if Value is outside the allowed ranges. The size of the resulting binary segment depends on the type, on Value, or on both:

For utf8, Value is encoded in 1-4 bytes.
For utf16, Value is encoded in 2 or 4 bytes.
For utf32, Value is always encoded in 4 bytes.

When constructing, a literal string can be given followed by one of the UTF types, for example: <<"abc"/utf8>>, which is syntactic sugar for <<$a/utf8,$b/utf8,$c/utf8>>.
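The encoded sizes and the badarg failure can be checked with a minimal sketch along these lines. The module name utf_build_demo and its function names are hypothetical.

```erlang
-module(utf_build_demo).
-export([sizes/1, try_utf8/1, string_sugar/0]).

%% Number of bytes produced for one code point by each utf type.
sizes(CodePoint) ->
    {byte_size(<<CodePoint/utf8>>),
     byte_size(<<CodePoint/utf16>>),
     byte_size(<<CodePoint/utf32>>)}.

%% Construction raises badarg for values outside the allowed ranges,
%% for example 16#D800.
try_utf8(CodePoint) ->
    try
        {ok, <<CodePoint/utf8>>}
    catch
        error:badarg -> badarg
    end.

%% A string literal with a utf type expands to one utf segment per character.
string_sugar() ->
    <<"abc"/utf8>> =:= <<$a/utf8, $b/utf8, $c/utf8>>.
```

For example, utf_build_demo:sizes($a) returns {1,2,4}, utf_build_demo:sizes(16#1F600) returns {4,4,4}, and utf_build_demo:try_utf8(16#D800) returns badarg.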

A successful match of a segment of a utf type results in an integer in the range 0..16#D7FF or 16#E000..16#10FFFF. The match fails if the returned value falls outside those ranges.

A segment of type utf8 matches 1-4 bytes in the binary, if the binary at the match position contains a valid UTF-8 sequence. (See RFC-3629 or the Unicode standard.)

A segment of type utf16 can match 2 or 4 bytes in the binary. The match fails if the binary at the match position does not contain a legal UTF-16 encoding of a Unicode code point. (See RFC-2781 or the Unicode standard.)

A segment of type utf32 can match 4 bytes in the binary in the same way as an integer segment matches 32 bits. The match fails if the resulting integer is outside the legal ranges mentioned above.
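The matching behaviour of the utf types can be sketched as follows. The module name utf_match_demo and its functions are hypothetical; the second clause of each function is there only to turn a failed match into the atom error.

```erlang
-module(utf_match_demo).
-export([first_utf8/1, first_utf32/1]).

%% Decode one UTF-8 code point from the front of a binary.
first_utf8(<<CodePoint/utf8, Rest/binary>>) ->
    {ok, CodePoint, Rest};
first_utf8(_) ->
    error.

%% Decode one UTF-32 code point. The first clause does not match if the
%% 32-bit value is outside the legal ranges, such as 16#D800.
first_utf32(<<CodePoint/utf32, Rest/binary>>) ->
    {ok, CodePoint, Rest};
first_utf32(_) ->
    error.
```

For example, utf_match_demo:first_utf8(<<208,128,$x>>) returns {ok,1024,<<"x">>}, while utf_match_demo:first_utf8(<<255,1>>) and utf_match_demo:first_utf32(<<0,0,16#D8,0>>) both return error.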

Examples:

1> Bin1 = <<1,17,42>>.
<<1,17,42>>
2> Bin2 = <<"abc">>.
<<97,98,99>>
3> Bin3 = <<1,17,42:16>>.
<<1,17,0,42>>
4> <<A,B,C:16>> = <<1,17,42:16>>.
<<1,17,0,42>>
5> C.
42
6> <<D:16,E,F>> = <<1,17,42:16>>.
<<1,17,0,42>>
7> D.
273
8> F.
42
9> <<G,H/binary>> = <<1,17,42:16>>.
<<1,17,0,42>>
10> H.
<<17,0,42>>
11> <<G,J/bitstring>> = <<1,17,42:12>>.
<<1,17,2,10:4>>
12> J.
<<17,2,10:4>>
13> <<1024/utf8>>.
<<208,128>>

Notice that bit string patterns cannot be nested.

Notice also that "B=<<1>>" is interpreted as "B =< <1>>" (where =< is the less-than-or-equal operator), which is a syntax error. The correct way is to write a space after '=': "B= <<1>>".

More examples are provided in Programming Examples.