$substrCP (aggregation)

Definition

  • $substrCP
  • Returns the substring of a string. The substring starts with the character at the specified UTF-8 code point (CP) index (zero-based) in the string and extends for the specified number of code points.

$substrCP has the following operator expression syntax:

    { $substrCP: [ <string expression>, <code point index>, <code point count> ] }

Field (Type): Description

  • string expression (string)

    The string from which the substring will be extracted. string expression can be any valid expression as long as it resolves to a string. For more information on expressions, see Expressions.

    If the argument resolves to a value of null or refers to a field that is missing, $substrCP returns an empty string.

    If the argument does not resolve to a string or null, and does not refer to a missing field, $substrCP returns an error.

  • code point index (number)

    Indicates the starting point of the substring. code point index can be any valid expression as long as it resolves to a non-negative integer.

  • code point count (number)

    The number of code points to extract. Can be any valid expression as long as it resolves to a non-negative integer or a number that can be represented as an integer (such as 2.0).

Example                                       Results
{ $substrCP: [ "abcde", 1, 2 ] }              "bc"
{ $substrCP: [ "Hello World!", 6, 5 ] }       "World"
{ $substrCP: [ "cafétéria", 0, 5 ] }          "cafét"
{ $substrCP: [ "cafétéria", 5, 4 ] }          "éria"
{ $substrCP: [ "cafétéria", 7, 3 ] }          "ia"
{ $substrCP: [ "cafétéria", 3, 1 ] }          "é"
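The rules above can be approximated outside the database. The following is a minimal sketch in plain JavaScript, assuming a hypothetical helper named substrCP (it is not part of MongoDB or any driver) and assuming that the error cases map onto TypeError and RangeError. It only mirrors the documented behavior: null or missing input yields an empty string, other non-string input is an error, and the index and count are counted in code points.

```javascript
// Hypothetical helper for illustration only; not a MongoDB API.
function substrCP(str, index, count) {
  // A null or missing (undefined) argument yields an empty string.
  if (str === null || str === undefined) return "";
  // Any other non-string argument is treated as an error (assumed error type).
  if (typeof str !== "string") {
    throw new TypeError("substrCP expects a string or null");
  }
  // Index and count must be non-negative integers; 2.0 passes Number.isInteger.
  if (!Number.isInteger(index) || index < 0 ||
      !Number.isInteger(count) || count < 0) {
    throw new RangeError("index and count must be non-negative integers");
  }
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  const codePoints = Array.from(str);
  return codePoints.slice(index, index + count).join("");
}

console.log(substrCP("abcde", 1, 2));     // "bc"
console.log(substrCP("cafétéria", 0, 5)); // "cafét"
console.log(substrCP("cafétéria", 7, 3)); // "ia" (only two code points remain)
console.log(substrCP(null, 0, 2));        // ""
```

A count that runs past the end of the string simply returns the remaining code points, as the third call shows.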

Behavior

The $substrCP operator uses code points to extract the substring. This behavior differs from the $substrBytes operator, which extracts the substring by the number of bytes, where each character uses between one and four bytes.
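To make the difference between code points and bytes concrete, the sketch below counts both for one string in plain JavaScript. It assumes a runtime that provides the standard TextEncoder global (for example a recent Node.js or a browser); the variable names are illustrative only.

```javascript
// Normalize to NFC so that each "é" is a single code point (U+00E9).
const text = "cafétéria".normalize("NFC");
const encoder = new TextEncoder(); // encodes strings as UTF-8

// Number of code points: what $substrCP counts.
const codePointCount = Array.from(text).length;

// Number of UTF-8 bytes: what $substrBytes counts.
const byteCount = encoder.encode(text).length;

// UTF-8 bytes used by each individual code point.
const bytesPerCodePoint = Array.from(text, (ch) => encoder.encode(ch).length);

console.log(codePointCount);    // 9
console.log(byteCount);         // 11
console.log(bytesPerCodePoint); // [ 1, 1, 1, 2, 1, 2, 1, 1, 1 ]
```

Because each "é" occupies two bytes, a byte index and a code point index into this string diverge after the fourth character.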

Example

Single-Byte Character Set

Consider an inventory collection with the following documents:

    { "_id" : 1, "item" : "ABC1", quarter: "13Q1", "description" : "product 1" }
    { "_id" : 2, "item" : "ABC2", quarter: "13Q4", "description" : "product 2" }
    { "_id" : 3, "item" : "XYZ1", quarter: "14Q2", "description" : null }

The following operation uses the $substrCP operator to separate the quarter value into a yearSubstring and a quarterSubstring. The quarterSubstring field represents the rest of the string from the specified code point index following the yearSubstring. Its length is calculated by subtracting the code point index from the length of the string, which is obtained with $strLenCP.

    db.inventory.aggregate(
      [
        {
          $project: {
            item: 1,
            // First two code points of the quarter value.
            yearSubstring: { $substrCP: [ "$quarter", 0, 2 ] },
            // Remaining code points: total length minus the start index (2).
            quarterSubstring: {
              $substrCP: [
                "$quarter", 2, { $subtract: [ { $strLenCP: "$quarter" }, 2 ] }
              ]
            }
          }
        }
      ]
    )

The operation returns the following results:

    { "_id" : 1, "item" : "ABC1", "yearSubstring" : "13", "quarterSubstring" : "Q1" }
    { "_id" : 2, "item" : "ABC2", "yearSubstring" : "13", "quarterSubstring" : "Q4" }
    { "_id" : 3, "item" : "XYZ1", "yearSubstring" : "14", "quarterSubstring" : "Q2" }
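The "rest of the string" arithmetic in this example can be checked without a database. The sketch below is plain JavaScript, not an aggregation pipeline; the quarters array and the yearLength constant are illustrative assumptions that stand in for the quarter field values and the start index of 2.

```javascript
// Sample quarter values, copied from the inventory documents.
const quarters = ["13Q1", "13Q4", "14Q2"];
const yearLength = 2; // code point index where the quarter part starts

const split = quarters.map((quarter) => {
  const codePoints = Array.from(quarter); // one entry per code point
  // Mirrors { $subtract: [ { $strLenCP: "$quarter" }, 2 ] }.
  const restCount = codePoints.length - yearLength;
  return {
    yearSubstring: codePoints.slice(0, yearLength).join(""),
    quarterSubstring: codePoints
      .slice(yearLength, yearLength + restCount)
      .join(""),
  };
});

console.log(split);
```

Computing the count from the string length, rather than hard-coding it, keeps the second substring correct if the part after the year is ever longer than two code points.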

Single-Byte and Multibyte Character Set

A collection named food contains the following documents:

    { "_id" : 1, "name" : "apple" }
    { "_id" : 2, "name" : "banana" }
    { "_id" : 3, "name" : "éclair" }
    { "_id" : 4, "name" : "hamburger" }
    { "_id" : 5, "name" : "jalapeño" }
    { "_id" : 6, "name" : "pizza" }
    { "_id" : 7, "name" : "tacos" }
    { "_id" : 8, "name" : "寿司sushi" }

The following example uses the $substrCP operator to create a three-code-point menuCode from the name value:

    db.food.aggregate(
      [
        {
          $project: {
            "name": 1,
            "menuCode": { $substrCP: [ "$name", 0, 3 ] }
          }
        }
      ]
    )

The operation returns the following results:

    { "_id" : 1, "name" : "apple", "menuCode" : "app" }
    { "_id" : 2, "name" : "banana", "menuCode" : "ban" }
    { "_id" : 3, "name" : "éclair", "menuCode" : "écl" }
    { "_id" : 4, "name" : "hamburger", "menuCode" : "ham" }
    { "_id" : 5, "name" : "jalapeño", "menuCode" : "jal" }
    { "_id" : 6, "name" : "pizza", "menuCode" : "piz" }
    { "_id" : 7, "name" : "tacos", "menuCode" : "tac" }
    { "_id" : 8, "name" : "寿司sushi", "menuCode" : "寿司s" }

See also

$substrBytes