$regexFindAll (aggregation)

Definition

$regexFindAll

New in version 4.2.

Provides regular expression (regex) pattern matching capability in aggregation expressions. The operator returns an array of documents that contain information on each match. If no match is found, it returns an empty array.

MongoDB uses Perl compatible regular expressions (i.e. “PCRE”) version 8.41 with UTF-8 support.

Prior to MongoDB 4.2, an aggregation pipeline could only use the query operator $regex in the $match stage. For more information on using regex in a query, see $regex.

Syntax

The $regexFindAll operator has the following syntax:

{ $regexFindAll: { input: <expression> , regex: <expression>, options: <expression> } }

Field Description

input The string to which you want to apply the regex pattern. Can be a string or any valid expression that resolves to a string.

regex The regex pattern to apply. Can be any valid expression that resolves to either a string or a regex pattern /<pattern>/. When using the regex /<pattern>/, you can also specify the regex options i and m (but not the s or x options):
- "pattern"
- /<pattern>/
- /<pattern>/<options>
Alternatively, you can specify the regex options with the options field. To specify the s or x options, you must use the options field. You cannot specify options in both the regex and the options field.

options Optional. The following <options> are available for use with regular expressions.

Note

You cannot specify options in both the regex and the options field.

Option	Description
`i`	Case insensitivity to match both upper and lower cases. You can specify the option in the `options` field or as part of the regex field.
`m`	For patterns that include anchors (i.e. `^` for the start, `$` for the end), match at the beginning or end of each line for strings with multiline values. Without this option, these anchors match at the beginning or end of the string. If the pattern contains no anchors or if the string value has no newline characters (e.g. `\n`), the `m` option has no effect.
`x`	“Extended” capability to ignore all white space characters in the pattern unless escaped or included in a character class. Additionally, it ignores characters in-between and including an un-escaped hash/pound (`#`) character and the next new line, so that you may include comments in complicated patterns. This only applies to data characters; white space characters may never appear within special character sequences in a pattern. The `x` option does not affect the handling of the VT character (i.e. code 11). You can specify the option only in the `options` field.
`s`	Allows the dot character (i.e. `.`) to match all characters including newline characters. You can specify the option only in the `options` field.
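
The option rules above (i and m may be written inline, s and x only in the options field, and never both places at once) can be checked on the client before a pipeline is sent. The following is a minimal sketch in plain JavaScript. The helper name buildRegexFindAll is hypothetical; it is not part of MongoDB or of any driver, and it only mirrors the rules stated in this section.

```javascript
// Hypothetical helper (not a MongoDB or driver API): builds a $regexFindAll
// expression and applies the option rules described above on the client side.
function buildRegexFindAll(input, regex, options) {
  // Flags written inline on a JavaScript regex literal, e.g. /line/i -> "i".
  const inlineFlags = regex instanceof RegExp ? regex.flags : "";

  if (inlineFlags !== "" && options !== undefined) {
    throw new Error("Specify options in the regex or in the options field, not both");
  }
  for (const flag of inlineFlags) {
    if (flag !== "i" && flag !== "m") {
      throw new Error(`Inline option "${flag}" is not allowed; only i and m may be written inline`);
    }
  }
  for (const flag of options ?? "") {
    if (!"imxs".includes(flag)) {
      throw new Error(`Unknown option "${flag}"`);
    }
  }

  const spec = { input, regex };
  if (options !== undefined) spec.options = options;
  return { $regexFindAll: spec };
}

const stage = { $addFields: { returnObject: buildRegexFindAll("$description", /line/, "si") } };
console.log(stage.$addFields.returnObject.$regexFindAll.options); // "si"
```

The returned object has the same shape as the syntax shown earlier, so it can be placed wherever an aggregation expression is expected.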

Returns

The operator returns an array:

If the operator does not find a match, the operator returns an empty array.
If the operator finds a match, the operator returns an array of documents that contain the following information for each match:
- the matching string in the input,
- the code point index (not byte index) of the matching string in the input, and
- an array of the strings that correspond to the groups captured by the matching string. Capturing groups are specified with parentheses () in the regex pattern.

[ { "match" : <string>, "idx" : <num>, "captures" : <array of strings> }, ... ]
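
The distinction between a code point index and a byte index only shows up when the input contains non-ASCII characters. The following plain JavaScript sketch illustrates the two ways of counting for one string; it runs on the client and only illustrates the arithmetic, not how the server computes idx.

```javascript
const description = "m\u00e9tier work vocation"; // "métier work vocation"
const start = description.indexOf("tier");       // offset in UTF-16 code units

// Everything before the match: "mé"
const prefix = description.slice(0, start);

// Counting code points: "m" and "é" are one code point each.
const codePointIndex = Array.from(prefix).length;

// Counting UTF-8 bytes: "m" is one byte, "é" is two bytes.
const byteIndex = new TextEncoder().encode(prefix).length;

console.log({ codePointIndex, byteIndex }); // { codePointIndex: 2, byteIndex: 3 }
```

The idx field corresponds to the first way of counting.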

Behavior

$regexFindAll and Collation

$regexFindAll ignores the collation specified for the collection, db.collection.aggregate(), and the index, if used.

For example, create a sample collection with collation strength 1 (i.e. compare base character only and ignore other differences such as case and diacritics):

db.createCollection( "myColl", { collation: { locale: "fr", strength: 1 } } )

Insert the following documents:

db.myColl.insertMany([
   { _id: 1, category: "café" },
   { _id: 2, category: "cafe" },
   { _id: 3, category: "cafE" }
])

Using the collection’s collation, the following operation performs a case-insensitive and diacritic-insensitive match:

db.myColl.aggregate( [ { $match: { category: "cafe" } } ] )

The operation returns the following 3 documents:

{ "_id" : 1, "category" : "café" }
{ "_id" : 2, "category" : "cafe" }
{ "_id" : 3, "category" : "cafE" }

However, the aggregation expression $regexFindAll ignores collation; that is, the following regular expression pattern matching examples are case-sensitive and diacritic-sensitive:

db.myColl.aggregate( [ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ }  } } } ] )
db.myColl.aggregate(
   [ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ }  } } } ],
   { collation: { locale: "fr", strength: 1 } }       // Ignored in the $regexFindAll
)

Both operations return the following:

{ "_id" : 1, "category" : "café", "results" : [ ] }
{ "_id" : 2, "category" : "cafe", "results" : [ { "match" : "cafe", "idx" : 0, "captures" : [ ] } ] }
{ "_id" : 3, "category" : "cafE", "results" : [ ] }

To perform case-insensitive regex pattern matching, use the i option instead. See i Option for an example.
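
The contrast between collation-based comparison and literal regex matching can be reproduced on the client. The sketch below is an analogy only: it uses JavaScript's Intl.Collator with sensitivity "base" as a stand-in for collation strength 1 (an assumption made for illustration), and JavaScript's regex engine as a stand-in for $regexFindAll.

```javascript
const categories = ["caf\u00e9", "cafe", "cafE"]; // "café", "cafe", "cafE"

// sensitivity "base" compares base letters only, ignoring case and diacritics.
// Assumption: used here as a rough analogue of collation strength 1.
const collator = new Intl.Collator("fr", { sensitivity: "base" });
const collationMatches = categories.filter((c) => collator.compare(c, "cafe") === 0);

// A regular expression compares characters literally and ignores any collation.
const regexMatches = categories.filter((c) => /cafe/.test(c));

console.log(collationMatches.length); // 3
console.log(regexMatches);            // [ 'cafe' ]
```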

Examples

$regexFindAll and Its Options

To illustrate the behavior of the $regexFindAll operator as discussed in this example, create a sample collection products with the following documents:

db.products.insertMany([
   { _id: 1, description: "Single LINE description." },
   { _id: 2, description: "First lines\nsecond line" },
   { _id: 3, description: "Many spaces before     line" },
   { _id: 4, description: "Multiple\nline descriptions" },
   { _id: 5, description: "anchors, links and hyperlinks" },
   { _id: 6, description: "métier work vocation" }
])

By default, $regexFindAll performs a case-sensitive match. For example, the following aggregation performs a case-sensitive $regexFindAll on the description field. The regex pattern /line/ does not specify any grouping:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/ } } } }
])

The operation returns the following:

{
   "_id" : 1,
   "description" : "Single LINE description.",
   "returnObject" : [ ]
}
{
   "_id" : 2,
   "description" : "First lines\nsecond line",
   "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ]}, { "match" : "line", "idx" : 19, "captures" : [ ] } ]
}
{
   "_id" : 3,
   "description" : "Many spaces before     line",
   "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ]
}
{
   "_id" : 4,
   "description" : "Multiple\nline descriptions",
   "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] } ]
}
{
   "_id" : 5,
   "description" : "anchors, links and hyperlinks",
   "returnObject" : [ ]
}
{
   "_id" : 6,
   "description" : "métier work vocation",
   "returnObject" : [ ]
}

The following regex pattern /lin(e|k)/ specifies a grouping (e|k) in the pattern:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k)/ } } } }
])

The operation returns the following:

{
   "_id" : 1,
   "description" : "Single LINE description.",
   "returnObject": [ ]
}
{
   "_id" : 2,
   "description" : "First lines\nsecond line",
   "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ]
}
{
   "_id" : 3,
   "description" : "Many spaces before     line",
   "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ]
}
{
   "_id" : 4,
   "description" : "Multiple\nline descriptions",
   "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ]
}
{
   "_id" : 5,
   "description" : "anchors, links and hyperlinks",
   "returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ]
}
{
   "_id" : 6,
   "description" : "métier work vocation",
   "returnObject" : [ ]
}
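
The shape of these results (match, idx, and captures for every match) can be imitated on the client for quick experiments. The following is a rough sketch in plain JavaScript, not MongoDB's implementation: the function name regexFindAllSketch is hypothetical, and it relies on JavaScript's regex engine rather than PCRE, so it should only be trusted for simple patterns such as the ones used here.

```javascript
// Hypothetical client-side imitation of the $regexFindAll return shape.
// Uses JavaScript's regex engine, not PCRE.
function regexFindAllSketch(input, regex) {
  // matchAll requires the global flag, so add it when it is missing.
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const results = [];
  for (const m of input.matchAll(new RegExp(regex.source, flags))) {
    results.push({
      match: m[0],
      // m.index counts UTF-16 code units; convert it to a code point index.
      idx: Array.from(input.slice(0, m.index)).length,
      // Everything after element 0 is a captured group.
      captures: m.slice(1),
    });
  }
  return results;
}

console.log(regexFindAllSketch("anchors, links and hyperlinks", /lin(e|k)/));
// [ { match: 'link', idx: 9, captures: [ 'k' ] },
//   { match: 'link', idx: 24, captures: [ 'k' ] } ]
console.log(regexFindAllSketch("Single LINE description.", /lin(e|k)/)); // []
```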

In the returned documents, the idx field is the code point index and not the byte index. To illustrate, consider the following example that uses the regex pattern /tier/:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /tier/ } } } }
])

The operation returns the following, where only the last document matches the pattern and the returned idx is 2 (instead of 3, which a byte index would give, because é occupies two bytes in UTF-8):

{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ ] }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : [ ] }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ ] }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] }
{ "_id" : 6, "description" : "métier work vocation",
             "returnObject" : [ { "match" : "tier", "idx" : 2, "captures" : [ ] } ] }

i Option

Note

You cannot specify options in both the regex and the options field.

To perform case-insensitive pattern matching, include the i option as part of the regex field or in the options field:

// Specify i as part of the regex field
{ $regexFindAll: { input: "$description", regex: /line/i } }
 
// Specify i in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "i" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "i" } }

For example, the following aggregation performs a case-insensitive $regexFindAll on the description field. The regex pattern /line/i does not specify any grouping:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/i } } } }
])

The operation returns the following documents:

{
   "_id" : 1,
   "description" : "Single LINE description.",
   "returnObject" : [ { "match" : "LINE", "idx" : 7, "captures" : [ ] } ]
}
{
   "_id" : 2,
   "description" : "First lines\nsecond line",
   "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ] }, { "match" : "line", "idx" : 19, "captures" : [ ] } ]
}
{
   "_id" : 3,
   "description" : "Many spaces before     line",
   "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ]
}
{
   "_id" : 4,
   "description" : "Multiple\nline descriptions",
   "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] } ]
}
{
   "_id" : 5,
   "description" : "anchors, links and hyperlinks",
   "returnObject" : [ ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }

m Option

Note

You cannot specify options in both the regex and the options field.

To match the specified anchors (e.g. ^, $) for each line of a multiline string, include the m option as part of the regex field or in the options field:

// Specify m as part of the regex field
{ $regexFindAll: { input: "$description", regex: /line/m } }
 
// Specify m in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "m" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "m" } }

The following example includes both the i and the m options to match lines starting with either the letter s or S for multiline strings:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /^s/im } } } }
])

The operation returns the following:

{
   "_id" : 1,
   "description" : "Single LINE description.",
   "returnObject" : [ { "match" : "S", "idx" : 0, "captures" : [ ] } ]
}
{
   "_id" : 2,
   "description" : "First lines\nsecond line",
   "returnObject" : [ { "match" : "s", "idx" : 12, "captures" : [ ] } ]
}
{
   "_id" : 3,
   "description" : "Many spaces before     line",
   "returnObject" : [ ]
}
{
   "_id" : 4,
   "description" : "Multiple\nline descriptions",
   "returnObject" : [ ]
}
{
   "_id" : 5,
   "description" : "anchors, links and hyperlinks",
   "returnObject" : [ ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
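
The effect of the m option on the ^ anchor can be observed in isolation with plain JavaScript, whose i and m flags behave the same way for this simple pattern. This is a client-side illustration only, not a substitute for running the aggregation.

```javascript
const multiline = "First lines\nsecond line";

// Without m, ^ matches only at the very start of the string ("F"),
// so /^s/i finds nothing.
const withoutM = [...multiline.matchAll(/^s/gi)].map((m) => m.index);

// With m, ^ also matches immediately after each newline.
const withM = [...multiline.matchAll(/^s/gim)].map((m) => m.index);

console.log(withoutM); // []
console.log(withM);    // [ 12 ]
```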

x Option

Note

You cannot specify options in both the regex and the options field.

To ignore all unescaped white space characters and comments (denoted by the un-escaped hash # character and the next new-line character) in the pattern, include the x option in the options field:

// Specify x in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "x" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "x" } }

The following example includes the x option to skip unescaped white space and comments:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k) # matches line or link/, options:"x" } } } }
])

The operation returns the following:

{
   "_id" : 1,
   "description" : "Single LINE description.",
   "returnObject" : [ ]
}
{
   "_id" : 2,
   "description" : "First lines\nsecond line",
   "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ]
}
{
   "_id" : 3,
   "description" : "Many spaces before     line",
   "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ]
}
{
   "_id" : 4,
   "description" : "Multiple\nline descriptions",
   "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ]
}
{
   "_id" : 5,
   "description" : "anchors, links and hyperlinks",
   "returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
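
To see why this pattern behaves like /lin(e|k)/, it can help to strip by hand what the x option ignores. The following plain JavaScript sketch does that under simplified assumptions: it understands only backslash escapes, [...] character classes, and # comments, and the function name stripExtended is hypothetical. It is not PCRE's parser.

```javascript
// Rough sketch of what the x option ignores. Simplified rules only:
// backslash escapes, [...] character classes, and # comments.
function stripExtended(pattern) {
  // White space ignored here; VT (code 11) is deliberately left alone.
  const ignorable = " \t\n\f\r";
  let out = "";
  let inClass = false;
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\") {
      // Keep an escaped character (including an escaped space or #) as-is.
      out += ch + (pattern[i + 1] ?? "");
      i++;
    } else if (inClass) {
      // Inside a character class every character is kept.
      if (ch === "]") inClass = false;
      out += ch;
    } else if (ch === "[") {
      inClass = true;
      out += ch;
    } else if (ch === "#") {
      // Skip the comment up to and including the next newline.
      while (i < pattern.length && pattern[i] !== "\n") i++;
    } else if (!ignorable.includes(ch)) {
      out += ch;
    }
  }
  return out;
}

console.log(stripExtended("lin(e|k) # matches line or link")); // lin(e|k)
```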

s Option

Note

You cannot specify options in both the regex and the options field.

To allow the dot character (i.e. .) in the pattern to match all characters including the new line character, include the s option in the options field:

// Specify s in the options field
{ $regexFindAll: { input: "$description", regex: /m.*line/, options: "s" } }
{ $regexFindAll: { input: "$description", regex: "m.*line", options: "s" } }

The following example includes the s option to allow the dot character (i.e. .) to match all characters including new line as well as the i option to perform a case-insensitive match:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex:/m.*line/, options: "si"  } } } }
])

The operation returns the following:

{
   "_id" : 1,
   "description" : "Single LINE description.",
   "returnObject" : [ ]
}
{
   "_id" : 2,
   "description" : "First lines\nsecond line",
   "returnObject" : [ ]
}
{
   "_id" : 3,
   "description" : "Many spaces before     line",
   "returnObject" : [ { "match" : "Many spaces before     line", "idx" : 0, "captures" : [ ] } ]
}
{
   "_id" : 4,
   "description" : "Multiple\nline descriptions",
   "returnObject" : [ { "match" : "Multiple\nline", "idx" : 0, "captures" : [ ] } ]
}
{
   "_id" : 5,
   "description" : "anchors, links and hyperlinks",
   "returnObject" : [ ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
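
The difference the s option makes can be checked on a single string in plain JavaScript, where the s flag (dotAll) has the same meaning for this pattern. This is a client-side illustration using JavaScript's regex engine, not the server's.

```javascript
const text = "Multiple\nline descriptions";

// Without s, the dot stops at the newline, so there is no match.
const withoutS = text.match(/m.*line/i);

// With s, the dot also matches "\n".
const withS = text.match(/m.*line/si);

console.log(withoutS);                 // null
console.log(JSON.stringify(withS[0])); // "Multiple\nline"
```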

Use $regexFindAll to Parse Email from String

Create a sample collection feedback with the following documents:

db.feedback.insertMany([
   { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com"  },
   { "_id" : 2, comment: "I wanted to concatenate a string" },
   { "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" },
   { "_id" : 4, comment: "It's just me. I'm testing.  fred@MongoDB.com" }
])

The following aggregation uses $regexFindAll to extract all email addresses from the comment field (case-insensitive):

db.feedback.aggregate( [
    { $addFields: {
       "email": { $regexFindAll: { input: "$comment", regex: /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } }
    } },
    { $set: { email: "$email.match"} }
] )

First Stage
This stage uses $addFields to add a new field email to the document. The new field is an array that contains the result of performing $regexFindAll on the comment field:

{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ ] } ] }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] }
{ "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ { "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ ] }, { "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ ] } ] }
{ "_id" : 4, "comment" : "It's just me. I'm testing.  fred@MongoDB.com", "email" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ ] } ] }

Second Stage
This stage uses $set to replace the elements of the email array with their "email.match" value(s). If the current value of email is null, the new value of email is set to null.

{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ "aunt.arc.tica@example.com" ] }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] }
{ "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ "cam@mongodb.com", "c.dia@mongodb.com" ] }
{ "_id" : 4, "comment" : "It's just me. I'm testing.  fred@MongoDB.com", "email" : [ "fred@MongoDB.com" ] }
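
The two stages can be pictured as a match step followed by a projection of the match field. The following plain JavaScript sketch walks one comment through both steps on the client. It uses JavaScript's regex engine rather than the server, so it is an illustration of the data flow only.

```javascript
const emailPattern = /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/gi;
const comment =
  "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com";

// First stage analogue: one { match, idx, captures } document per match.
const email = [...comment.matchAll(emailPattern)].map((m) => ({
  match: m[0],
  idx: m.index, // same as the code point index here because the text is ASCII
  captures: m.slice(1),
}));

// Second stage analogue: "$email.match" picks the match field of every element.
const addresses = email.map((e) => e.match);

console.log(email.map((e) => e.idx)); // [ 56, 75 ]
console.log(addresses);               // [ 'cam@mongodb.com', 'c.dia@mongodb.com' ]
```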

Use Captured Groupings to Parse User Name

This example uses the same feedback collection as the previous example. If you have not already created it, create the collection with the following documents:

db.feedback.insertMany([
   { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com"  },
   { "_id" : 2, comment: "I wanted to concatenate a string" },
   { "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" },
   { "_id" : 4, comment: "It's just me. I'm testing.  fred@MongoDB.com" }
])

To reply to the feedback, assume you want to parse the local part of the email address to use as the name in the greetings. Using the captures field returned in the $regexFindAll results, you can parse out the local part of each email address:

db.feedback.aggregate( [
    { $addFields: {
       "names": { $regexFindAll: { input: "$comment", regex: /([a-z0-9_.+-]+)@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } },
    } },
    { $set: { names: { $reduce: { input:  "$names.captures", initialValue: [ ], in: { $concatArrays: [ "$$value", "$$this" ] } } } } }
] )

First Stage
This stage uses $addFields to add a new field names to the document. The new field contains the result of performing $regexFindAll on the comment field:

{
   "_id" : 1,
   "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com",
   "names" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ "aunt.arc.tica" ] } ]
}
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] }
{
   "_id" : 3,
   "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com",
   "names" : [
      { "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ "cam" ] },
      { "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ "c.dia" ] }
    ]
}
{
   "_id" : 4,
   "comment" : "It's just me. I'm testing.  fred@MongoDB.com",
   "names" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ "fred" ] } ]
}

Second Stage
This stage uses $set with the $reduce operator to reset names to an array that contains the "$names.captures" elements:

{
   "_id" : 1,
   "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com",
   "names" : [ "aunt.arc.tica" ]
}
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] }
{
   "_id" : 3,
   "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com",
   "names" : [ "cam", "c.dia" ]
}
{
   "_id" : 4,
   "comment" : "It's just me. I'm testing.  fred@MongoDB.com",
   "names" : [ "fred" ]
}
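
The $reduce step is needed because "$names.captures" is an array of arrays, one inner array per match. The following plain JavaScript sketch shows the same flattening with Array.prototype.reduce, using the captures from the document with _id 3 as sample data. It is an analogy for the expression, not server code.

```javascript
// Shape of "$names.captures" for the document with _id 3:
// one captures array per match.
const capturesPerMatch = [["cam"], ["c.dia"]];

// $reduce with initialValue [ ] and $concatArrays appends each inner array
// to the accumulated value; reduce with concat does the same here.
const names = capturesPerMatch.reduce((value, current) => value.concat(current), []);

// With no matches the input is an empty array and the result stays empty.
const noNames = [].reduce((value, current) => value.concat(current), []);

console.log(names);   // [ 'cam', 'c.dia' ]
console.log(noNames); // []
```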