$strLenBytes (aggregation)

Definition

  • $strLenBytes

New in version 3.4.

Returns the number of UTF-8 encoded bytes in the specified string.

$strLenBytes has the following operator expression syntax:

  { $strLenBytes: <string expression> }

The argument can be any valid expression as long as it resolves to a string. For more information on expressions, see Expressions.

If the argument resolves to a value of null or refers to a missing field, $strLenBytes returns an error.
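
Because a null or missing field produces an error, one possible way to keep a pipeline from failing is to substitute a default string before measuring it. The following is a minimal sketch, not part of the original page: it assumes a field called name and uses the $ifNull operator to fall back to an empty string. The variable names guardedLength and guardedPipeline are hypothetical.

```javascript
// Hypothetical guard: replace a null or missing "name" with "" so that
// $strLenBytes always receives a string for those two cases.
const guardedLength = {
  $strLenBytes: { $ifNull: ["$name", ""] }
};

// A $project stage that keeps "name" and adds the guarded byte length.
const guardedPipeline = [
  { $project: { name: 1, length: guardedLength } }
];

// In mongosh, this could be run against a collection of your own, e.g.:
// db.food.aggregate(guardedPipeline)
```

Note that this only covers null and missing values. A field that holds a non-string value, such as a number, would still need separate handling.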

Behavior

The $strLenBytes operator counts the number of UTF-8 encoded bytes in a string, where each character may use between one and four bytes.

For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters (i.e. Latin characters outside of the English alphabet) are encoded using two bytes. Chinese, Japanese and Korean characters typically require three bytes, and other planes of Unicode (emoji, mathematical symbols, etc.) require four bytes.

The $strLenBytes operator differs from the $strLenCP operator, which counts the code points in the specified string regardless of how many bytes each character uses.

Example                               Results  Notes
{ $strLenBytes: "abcde" }             5        Each character is encoded using one byte.
{ $strLenBytes: "Hello World!" }      12       Each character is encoded using one byte.
{ $strLenBytes: "cafeteria" }         9        Each character is encoded using one byte.
{ $strLenBytes: "cafétéria" }         11       é is encoded using two bytes.
{ $strLenBytes: "" }                  0        Empty strings return 0.
{ $strLenBytes: "$€λG" }              7        € is encoded using three bytes. λ is encoded using two bytes.
{ $strLenBytes: "寿司" }              6        Each character is encoded using three bytes.
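
The byte counts in the table can be checked outside the database. The sketch below is an illustration only, not how the server implements the operator: it assumes a JavaScript runtime that provides the standard TextEncoder global (Node.js and mongosh do), and the helper names utf8ByteLength and codePointLength are hypothetical. Non-ASCII characters are written as escapes so the exact code points are unambiguous.

```javascript
// Number of bytes in the UTF-8 encoding of a string, which is the
// quantity $strLenBytes reports.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

// Number of Unicode code points, which is the quantity $strLenCP reports.
// Array.from iterates a string by code point, not by UTF-16 code unit.
function codePointLength(str) {
  return Array.from(str).length;
}

const samples = [
  "abcde",
  "caf\u00e9t\u00e9ria",  // "cafétéria" with precomposed é
  "$\u20ac\u03bbG",       // "$€λG"
  "\u5bff\u53f8",         // "寿司"
  ""
];

for (const s of samples) {
  console.log(JSON.stringify(s), utf8ByteLength(s), codePointLength(s));
}
```

The two counts agree only for strings made up entirely of single-byte (US-ASCII) characters.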

Example

Single-Byte and Multibyte Character Set

A collection named food contains the following documents:

  { "_id" : 1, "name" : "apple" }
  { "_id" : 2, "name" : "banana" }
  { "_id" : 3, "name" : "éclair" }
  { "_id" : 4, "name" : "hamburger" }
  { "_id" : 5, "name" : "jalapeño" }
  { "_id" : 6, "name" : "pizza" }
  { "_id" : 7, "name" : "tacos" }
  { "_id" : 8, "name" : "寿司" }

The following operation uses the $strLenBytes operator to calculate the length of each name value:

  db.food.aggregate(
    [
      {
        $project: {
          "name": 1,
          "length": { $strLenBytes: "$name" }
        }
      }
    ]
  )

The operation returns the following results:

  { "_id" : 1, "name" : "apple", "length" : 5 }
  { "_id" : 2, "name" : "banana", "length" : 6 }
  { "_id" : 3, "name" : "éclair", "length" : 7 }
  { "_id" : 4, "name" : "hamburger", "length" : 9 }
  { "_id" : 5, "name" : "jalapeño", "length" : 9 }
  { "_id" : 6, "name" : "pizza", "length" : 5 }
  { "_id" : 7, "name" : "tacos", "length" : 5 }
  { "_id" : 8, "name" : "寿司", "length" : 6 }

The documents with _id: 3 and _id: 5 each contain a character with a diacritic (é and ñ, respectively) that requires two bytes to encode. The document with _id: 8 contains two Japanese characters that are encoded using three bytes each. As a result, length is greater than the number of characters in name for the documents with _id: 3, _id: 5 and _id: 8.
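
The difference between byte length and character count can also be used to select documents. The following sketch is an assumption-laden illustration rather than part of the original example: it compares $strLenBytes with $strLenCP inside $expr, which requires a server version that supports $expr in $match (3.6 or later), and the names multibyteNames, bytes and codePoints are hypothetical.

```javascript
// Hypothetical pipeline: keep only documents whose "name" needs more
// bytes than it has code points, i.e. names with multibyte characters.
const multibyteNames = [
  {
    $match: {
      $expr: {
        $gt: [ { $strLenBytes: "$name" }, { $strLenCP: "$name" } ]
      }
    }
  },
  {
    // Report both measurements side by side for the matched documents.
    $project: {
      name: 1,
      bytes: { $strLenBytes: "$name" },
      codePoints: { $strLenCP: "$name" }
    }
  }
];

// In mongosh, against the food collection described above:
// db.food.aggregate(multibyteNames)
```

For the food collection, this would be expected to match the documents with _id: 3, _id: 5 and _id: 8. Every document in that collection has a string name, so no guard against null or missing values is included here.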

See also

$strLenCP