Character String Patterns
Patterns for character strings are formed from concatenations and operators.
Concatenations
A concatenation consists of valid regular expressions written directly after each other. If r and s are regular expressions, the concatenation rs matches every character string that consists of a string matching r followed by a string matching s.
Examples
The following table shows some results of a test.
Pattern | Text | Match |
---|---|---|
H[aeu]llo | Hallo | X |
H[aeu]llo | Hello | X |
H[aeu]llo | Hullo | X |
H[aeu]llo | Hollo | - |
H[aeu]llo is the concatenation of five regular expressions for single characters.
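The checks in the table can be reproduced with a short program. The following is a minimal sketch, assuming the static method MATCHES of the class CL_ABAP_MATCHER, which compares the complete text with the pattern. The variable name is_match is chosen for illustration only, and releases before 7.02 may additionally need the statement TYPE-POOLS abap.

```abap
DATA is_match TYPE abap_bool.

" The whole text must match the concatenation H, [aeu], l, l, o.
is_match = cl_abap_matcher=>matches( pattern = 'H[aeu]llo'
                                     text    = 'Hallo' ).
IF is_match = abap_true.
  WRITE / 'Hallo matches'.
ENDIF.

" The second character o is not in the set [aeu].
is_match = cl_abap_matcher=>matches( pattern = 'H[aeu]llo'
                                     text    = 'Hollo' ).
IF is_match = abap_false.
  WRITE / 'Hollo does not match'.
ENDIF.
```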
Operators for Character Strings
These operators are made up of the special characters {, }, *, +, ?, |, (, ), and \. The special characters can be made into literal characters using the prefix \ or by enclosing with \Q ... \E.
Concatenation Operators
The operators {n}, {n,m}, *, +, and ? (whereby n and m are natural numbers, including zero) can be written directly after a regular expression r, and thus generate concatenations rrr... of the regular expression:
- The regular expression r{n} is equivalent to an n-fold concatenation of r. The regular expression r{0} matches an empty character string, and therefore also the offset before the first character of a character string, the positions between the characters of a character string, and the offset after the last character of a character string.
- The regular expression r{n,m} is equivalent to at least n and a maximum of m concatenations of r. The value of n must be smaller than or equal to the value of m. The expression r{n,} is equivalent to at least n concatenations of r.
- The regular expression r? is equivalent to r{0,1}, which means the expression r or the empty character string.
- The regular expression r* is equivalent to r{0,}, i.e. a concatenation of r of any length, including the empty character string. When subgroups are used (see below), and in a text search, r* matches the longest possible substring (greedy behavior).
- The regular expression r+ is equivalent to r{1,}, i.e. a concatenation of any length of r excluding the empty character string. When subgroups are used, and in a text search, r+ matches the longest possible substring (greedy behavior).
- The regular expressions r{n,m}?, r*? and r+? are reserved for later language enhancements (non-greedy behavior) and currently trigger the exception CX_SY_INVALID_REGEX.
Note
A primary rule that applies to a regular expression with concatenation operators is that the entire expression must match if possible. This rule limits the length of the character strings matched by concatenations with the operators * and +, and thus limits their greedy behavior.
Example
The following table shows some results from a test.
Pattern | Text | Match |
---|---|---|
`Hel{2}o` | Hello | X |
`H.{4}` | Hello | X |
`.{0,4}` | Hello | - |
`.{4,}` | Hello | X |
`.+H.+e.+l.+l.+o.+` | Hello | - |
`x*Hx*ex*lx*lx*ox*` | Hello | X |
`l+` | ll | X |
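The operators {n}, {n,m}, ?, *, and + can also be tried out directly. The following is a minimal sketch, assuming the static method MATCHES of the class CL_ABAP_MATCHER for comparing a complete text with a pattern. The patterns and texts are hypothetical and chosen for illustration only.

```abap
" b{2} stands for exactly two b characters.
IF cl_abap_matcher=>matches( pattern = 'ab{2}c' text = 'abbc' ) = abap_true.
  WRITE / 'ab{2}c matches abbc'.
ENDIF.

" a{2,3} allows at most three a characters, so four are too many.
IF cl_abap_matcher=>matches( pattern = 'a{2,3}' text = 'aaaa' ) = abap_false.
  WRITE / 'a{2,3} does not match aaaa'.
ENDIF.

" b? allows b to be missing, like b{0,1}.
IF cl_abap_matcher=>matches( pattern = 'ab?c' text = 'ac' ) = abap_true.
  WRITE / 'ab?c matches ac'.
ENDIF.

" b* allows any number of b characters, including none.
IF cl_abap_matcher=>matches( pattern = 'ab*c' text = 'abbbc' ) = abap_true.
  WRITE / 'ab*c matches abbbc'.
ENDIF.

" b+ requires at least one b.
IF cl_abap_matcher=>matches( pattern = 'ab+c' text = 'ac' ) = abap_false.
  WRITE / 'ab+c does not match ac'.
ENDIF.
```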
Example
The first subexpression a+ is matched with the first five characters "aaaaa" from text, while the last "a" character from text is left for the second subexpression a.
DATA text TYPE string.
DATA result_tab TYPE match_result_tab.
text = 'aaaaaa'.
" (a+) is greedy but must leave one "a" for the second subgroup (a).
FIND ALL OCCURRENCES OF REGEX '(a+)(a)'
IN text RESULTS result_tab.
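To see which substring each subgroup received, the column SUBMATCHES of the results table can be evaluated. The following is a minimal sketch that repeats the search and reads each subgroup by its offset and length. The helper names match, sub, and part are assumptions made for illustration.

```abap
DATA text       TYPE string.
DATA result_tab TYPE match_result_tab.
DATA match      TYPE match_result.
DATA sub        TYPE submatch_result.
DATA part       TYPE string.

text = 'aaaaaa'.
FIND ALL OCCURRENCES OF REGEX '(a+)(a)'
     IN text RESULTS result_tab.

" Each found location carries one SUBMATCHES line per subgroup.
LOOP AT result_tab INTO match.
  LOOP AT match-submatches INTO sub.
    " Cut the substring of the subgroup out of the text.
    part = text+sub-offset(sub-length).
    WRITE / part.
  ENDLOOP.
ENDLOOP.
```

According to the description above, the first subgroup should yield "aaaaa" and the second "a".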
Alternatives
The operator | can be written between two regular expressions r and s, and thus generates a single regular expression r|s, which matches every character string that matches r or s.
Note
Concatenations and the other operators bind more tightly than |. This means that r|st and r|s+ are equivalent to r|(?:st) and r|(?:s+) respectively, and not to (?:r|s)t or (?:r|s)+.
Examples
The following table shows some results of a test.
Pattern | Text | Match |
---|---|---|
`H(e\|a\|u)llo` | Hello | X |
`H(e\|a\|u)llo` | Hollo | - |
`He\|a\|ullo` | Hallo | - |
`He\|a\|ullo` | ullo | X |
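The effect of the binding rule for | can be checked in a small program. The following is a minimal sketch, assuming the static method MATCHES of the class CL_ABAP_MATCHER for comparing a complete text with a pattern. The texts are chosen for illustration only.

```abap
" The parentheses limit the alternatives to the second character.
IF cl_abap_matcher=>matches( pattern = 'H(e|a|u)llo' text = 'Hallo' ) = abap_true.
  WRITE / 'H(e|a|u)llo matches Hallo'.
ENDIF.

" Without parentheses, the alternatives are He, a, and ullo.
IF cl_abap_matcher=>matches( pattern = 'He|a|ullo' text = 'Hallo' ) = abap_false.
  WRITE / 'He|a|ullo does not match Hallo'.
ENDIF.

IF cl_abap_matcher=>matches( pattern = 'He|a|ullo' text = 'ullo' ) = abap_true.
  WRITE / 'He|a|ullo matches ullo'.
ENDIF.
```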
Subgroups
The operators ( ... ) and (?: ... ) group concatenations of regular expressions together into one entity and thus influence the range of effectiveness of other operators such as * or |, which act on this entity. The regular expressions (r) and (?:r) match the same character strings as the regular expression r.
Note
The aforementioned greedy behavior of concatenation operators also applies to subgroups, from left to right. However, this does not violate the primary rule that the entire regular expression has to match.
Examples
The following table shows some results of a test.
Pattern | Text | Match |
---|---|---|
Tral+a | Tralala | - |
Tr(al)+a | Tralala | X |
Tr(?:al)+a | Tralala | X |
In the first expression, the operator + acts on the literal character l only. In the second and third expressions, it acts on the subgroup al.
Subgroups with Registration
The operator ( ... ) acts in the same way as (?: ... ) in the formation of subgroups. In addition, when the regular expression is compared with a character string, the substrings that match the subgroups ( ... ) of the expression, are stored sequentially in registers. In this process, an operator \1, \2, \3, ... is assigned to each subgroup, which can be listed within the expression after its subgroup, and thus acts as a placeholder for the character string stored in the corresponding register. In text replacements, the special characters $1, $2, $3, ... can be used to access the last assignment to the register.
The number of subgroups and registers is only limited by the capacity of the platform.
Note
The addition SUBMATCHES of the statements FIND and REPLACE and the eponymous column of the results table filled using the addition RESULTS can be used to access the content of all registers for a found location. The class CL_ABAP_MATCHER contains the method GET_SUBMATCH for this purpose.
Examples
The following table shows some results of a test.
Pattern | Text | Match |
---|---|---|
(["']).+\1 | "Hello" | X |
(["']).+\1 | "Hello' | - |
(["']).+\1 | 'Hello' | X |
The concatenation (["']).+\1 matches all text strings of which the first character is " or ' and the last character is the same as the first. In the two successful checks, the register contains " and ' respectively.
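The placeholders \1 and $1 can be illustrated with the statements FIND and REPLACE. The following is a minimal sketch. The variable name quote and the second pattern (a)(b) are hypothetical. The backquotes delimit ABAP string literals so that the character ' can be written inside them directly.

```abap
DATA text  TYPE string.
DATA quote TYPE string.

text = `'Hello'`.
" \1 stands for whatever the first subgroup matched.
FIND REGEX `(["']).+\1` IN text SUBMATCHES quote.
IF sy-subrc = 0.
  " quote now holds the content of the first register.
  WRITE / quote.
ENDIF.

" $1 and $2 insert the register contents into the replacement text,
" so the two characters change places.
text = 'ab'.
REPLACE REGEX '(a)(b)' IN text WITH '$2$1'.
WRITE / text.
```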
Example
The example demonstrates the greedy behavior of the + operator in subgroups and how it conforms to the primary rule that the entire regular expression must match if possible. The first subgroup is matched with as many "a" characters as possible, in this case the first four characters "aaaa". One "a" character is left for each of the other two subgroups.
DATA text TYPE string.
DATA result_tab TYPE match_result_tab.
text = 'aaaaaa'.
FIND ALL OCCURRENCES OF REGEX '(a+)(a+)(a+)'
IN text RESULTS result_tab.
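The registers can also be read with the method GET_SUBMATCH of the class CL_ABAP_MATCHER. The following is a minimal sketch, assuming the factory method CREATE and the instance method MATCH of that class. The names matcher, sub, and idx are chosen for illustration.

```abap
DATA matcher TYPE REF TO cl_abap_matcher.
DATA sub     TYPE string.
DATA idx     TYPE i.

matcher = cl_abap_matcher=>create( pattern = '(a+)(a+)(a+)'
                                   text    = 'aaaaaa' ).

" MATCH compares the complete text with the pattern.
IF matcher->match( ) = abap_true.
  " Read the registers of the three subgroups one after the other.
  DO 3 TIMES.
    idx = sy-index.
    sub = matcher->get_submatch( idx ).
    WRITE / sub.
  ENDDO.
ENDIF.
```

According to the description above, the output should be "aaaa", "a", and "a".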
Literal Characters
The operators \Q ... \E form a character string of literal characters from all enclosed characters. Special characters have no effect in this character string.
The following table shows some results of a test.
Pattern | Text | Match |
---|---|---|
`.+\w\d` | Special: \w\d | - |
`.+\\w\\d` | Special: \w\d | X |
`.+\Q\w\d\E` | Special: \w\d | X |
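Both ways of making special characters literal can be compared in a short program. The following is a minimal sketch, assuming the static method MATCHES of the class CL_ABAP_MATCHER for comparing a complete text with a pattern. ABAP literals do not treat the backslash specially, so the patterns can be written as they are.

```abap
DATA text TYPE string.

text = 'Special: \w\d'.

" Between \Q and \E, \w and \d are ordinary characters.
IF cl_abap_matcher=>matches( pattern = '.+\Q\w\d\E' text = text ) = abap_true.
  WRITE / 'Pattern with \Q ... \E matches'.
ENDIF.

" The prefix \ in front of each backslash has the same effect.
IF cl_abap_matcher=>matches( pattern = '.+\\w\\d' text = text ) = abap_true.
  WRITE / 'Pattern with the prefix \ matches'.
ENDIF.
```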
Reserved Enhancements
The character string (? ... ) is generally reserved for later language enhancements, and with the exception of the already supported operators (?:...), (?=...) and (?!...), triggers the exception CX_SY_INVALID_REGEX.
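A program that receives patterns at runtime can handle the exception as follows. This is a minimal sketch. The pattern is a hypothetical example of a (? ... ) form that is not among the supported operators listed above, and whether a particular form is rejected may depend on the release. The names pattern, invalid, and msg_text are chosen for illustration.

```abap
DATA pattern  TYPE string.
DATA text     TYPE string.
DATA msg_text TYPE string.
DATA invalid  TYPE REF TO cx_sy_invalid_regex.

" Hypothetical pattern with a (? ... ) form not listed as supported.
pattern = '(?<x>a)'.
text    = 'abc'.

TRY.
    FIND REGEX pattern IN text.
  CATCH cx_sy_invalid_regex INTO invalid.
    " The exception text describes why the pattern was rejected.
    msg_text = invalid->get_text( ).
    WRITE / msg_text.
ENDTRY.
```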