Regular Expressions
In some cases, you can use regular expressions to increase the flexibility and adaptability of your Vuser scripts.
Regular expressions can be used in the following functions:
- lr_xml_find
- TE_find_text
- TE_wait_text
- Most Web Click and Script functions
- web_reg_dialog
- web_reg_save_param_ex
- web_reg_save_param_regexp
Note: In RTE Vuser scripts only, all regular expressions must begin with an exclamation point (!).
Regular expressions are a major programming topic, far beyond what is described here. They are often referred to as "REs" or "regexes", or in the singular, "RE" or "regex". You can get more information from the many textbooks dedicated to the subject and by searching the Internet.
This topic describes the features of REs most commonly used in Vuser scripts.
Note that load testing rarely requires the full power of regular expressions, and that sophisticated use of regular expressions makes a Vuser script harder to understand and debug. Moreover, use of REs adversely affects performance.
Avoid careless and unnecessary use of REs. For example, REs like ".+" and ".*" can have unexpected results, because they can match far more text than intended.
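The following sketch illustrates how ".*" can match more than intended. It is not VuGen code: it uses the POSIX regcomp and regexec functions as a stand-in engine, and the helper name first_match_length is hypothetical. The engine behind a given Vuser function may differ in details.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the first match of the
   POSIX extended RE `pattern` in `text`, or -1 if there is no match
   or the pattern does not compile. */
int first_match_length(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 1, &m, 0);
    regfree(&re);
    return rc == 0 ? (int)(m.rm_eo - m.rm_so) : -1;
}
```

With the text "name=ann;role=dev;", the RE "name=.*;" matches all 18 characters, running through to the last semicolon, whereas the narrower "name=[^;]*;" matches only the 9 characters of "name=ann;".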
With functions that take left and right boundaries, do not use REs to specify an open-ended boundary. To search from the beginning of a string to the right boundary, do not specify the left boundary. To search from the left boundary to the end of a string, do not specify the right boundary.
Regular Expression Syntax
An RE specifies a pattern. A string that contains the pattern is said to match the RE. Generally, there is a match if the RE occurs anywhere in the string, but an RE can be anchored to the beginning or end of the string. If the RE is preceded by a caret (^), it matches the string only if it occurs at the beginning of the string. If the RE ends with the dollar sign ($), it matches only if the RE occurs at the end of the string. The dollar sign is a common source of confusion, because there may be invisible characters at the end of a string, so that no match is found although casual inspection of the source seems to indicate that there is a match.
If an RE is anchored to both the beginning and the end, it matches only an entire string. For example, ^abc$ matches "abc", but not "_abc" or "abc_".
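The anchoring rules can be tried outside VuGen with a small sketch. It assumes the POSIX regcomp and regexec functions as a stand-in engine, and the helper name re_matches is hypothetical, not a LoadRunner function.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the POSIX extended RE `pattern`
   is found in `text`, 0 if it is not, and -1 if the pattern does not
   compile. */
int re_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Under these assumptions, re_matches("abc", "_abc_") returns 1, while re_matches("^abc$", "_abc") and re_matches("^abc$", "abc_") return 0. A trailing invisible character has the same effect: re_matches("abc$", "abc ") returns 0 because of the final space.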
Any character or phrase that is not one of the special metacharacters described below is matched literally. For example, a regular expression of "abc" matches exactly that: "abc".
Some characters, called metacharacters, have special meanings. The most common are square brackets ([ and ]), the backslash (\), the caret (^), the dollar sign ($), the dot (.), the vertical bar (|), the question mark (?), the asterisk (*), the plus sign (+), and parentheses ( and ). To search for an occurrence of a literal character that serves as a metacharacter, "escape" it by preceding it with a backslash. For example, to search for the string "Enter a \ or |", use the regular expression, "Enter a \\ or \|".
Note that ANSI C treats a single backslash as part of the C syntax, rather than as part of the regular expression. In C scripts, to pass a regular expression that contains a backslash, precede the RE backslash with another backslash. For example, to pass the regular expression \*, meaning to find a literal asterisk, pass \\*:
web_reg_save_param_regexp( "ParamName=stam", "RegExp=\\*", LAST);
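To see what the doubled backslash does, the following sketch compiles the same C literal with the POSIX regcomp function as a stand-in engine. The helper names re_matches and has_literal_asterisk are hypothetical and are not part of the Vuser function library.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if `pattern` is found in `text`, 0 if not,
   -1 if the pattern does not compile. */
int re_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* The C literal "\\*" holds two characters, a backslash and an
   asterisk, so the RE engine receives \* and looks for a literal
   asterisk. */
int has_literal_asterisk(const char *text)
{
    return re_matches("\\*", text);
}
```

Here has_literal_asterisk("2*3") returns 1 and has_literal_asterisk("23") returns 0. The literal "\\*" occupies three bytes in memory: the backslash, the asterisk, and the terminating null.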
Some metacharacters only function as metacharacters when they appear in specific contexts or positions, as shown in the following table. If you want to use a metacharacter as a literal character and it's not clear how it will function, you can always escape it to avoid doubt.
Meta-character | Means | When it occurs |
---|---|---|
[ | Begins a character set or range | Anywhere except between brackets |
] | Ends a character set or range | As first ] after an opening bracket, [ |
( ) | Indicates that the characters between the parentheses are treated as a group | Anywhere |
^ | 1 - Anchors the RE to the start of the string. 2 - Negates a character set. | 1 - As the first character in the RE. 2 - As the first character after the opening bracket ([) in a character set. |
$ | Anchors the RE to end of string | As last character in the RE |
. | Matches any one character | Anywhere |
? | Indicates that the preceding character or group is optional. For example, "ab?c" matches both "abc" and "ac", and "A(123)?B" matches both "AB" and "A123B" | Anywhere except the first character in the RE |
+ | Indicates that the preceding character or group appears one or more times | Anywhere except the first character in the RE |
* | Indicates that the preceding character or group is optional but can appear any number of times | Anywhere except the first character in the RE |
\ | The escape character. Indicates that the following character is interpreted literally | Anywhere except after itself |
| | Indicates that either the preceding or the following character or group appears. For example, "(ABC)|(123)" matches "ABC" or "123" | Anywhere except the first character in the RE |
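The optional and alternation examples from the table can be checked with a short sketch. It assumes the POSIX regcomp and regexec functions as a stand-in engine, and re_matches is a hypothetical helper. The anchors and the extra outer group are added here only so that each test covers the whole string.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if `pattern` is found in `text`, 0 if not,
   -1 if the pattern does not compile. */
int re_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, re_matches("^ab?c$", "ac") and re_matches("^A(123)?B$", "A123B") return 1, re_matches("^A(123)?B$", "A12B") returns 0, and re_matches("^((ABC)|(123))$", "123") returns 1.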
A common pitfall in the use of REs is copying a phrase containing metacharacters from another source into the script. Always check phrases pasted into REs for metacharacters and escape them as necessary.
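One way to guard against this pitfall is to escape a pasted phrase mechanically before using it as an RE. The following is a minimal sketch, not a LoadRunner function: the name escape_re_literal is hypothetical, and it escapes only the metacharacters listed in this topic, for use outside square brackets.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `src` into `dst`, placing a backslash
   before each metacharacter listed in this topic.
   Returns 0 on success, or -1 if `dst` is too small. */
int escape_re_literal(const char *src, char *dst, size_t dst_size)
{
    const char *meta = "[]\\^$.|?*+()";
    size_t out = 0;

    if (dst_size == 0)
        return -1;
    for (; *src != '\0'; ++src) {
        /* A metacharacter needs two output bytes, any other one. */
        size_t need = strchr(meta, *src) != NULL ? 2 : 1;
        /* Always keep one byte free for the terminating null. */
        if (out + need + 1 > dst_size)
            return -1;
        if (need == 2)
            dst[out++] = '\\';
        dst[out++] = *src;
    }
    dst[out] = '\0';
    return 0;
}
```

Given the phrase Enter a \ or |, the helper produces Enter a \\ or \|. In a C script, the result would still need its backslashes doubled if it is written as a string literal.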
The following are among the options that can be used to create regular expressions:
A dot (.) matches any single character. For example, "welcome." matches welcomes, welcomed, or welcome followed by a space or any other single character. If you specifically want to match welcome followed by a period and no other character, escape the dot: "welcome\.".
A series of dots matches the same number of unspecified characters as there are dots.
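The difference between a dot and an escaped dot can be sketched as follows, assuming the POSIX regcomp and regexec functions as a stand-in engine. The helper name re_matches is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if `pattern` is found in `text`, 0 if not,
   -1 if the pattern does not compile. */
int re_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, re_matches("welcome.", "welcomes") returns 1, but re_matches("welcome.", "welcome") returns 0 because the dot requires one more character. The escaped form re_matches("welcome\\.", "welcomes") returns 0, and re_matches("^a...e$", "apple") returns 1 because three dots stand for exactly three characters.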
To match a single character from a character set, use square brackets ([ ]). For example, to search for a year that is either 1968 or 1969, use:
196[89]
Use a hyphen (-) in a character set to indicate a range. (Note that a hyphen functions as a metacharacter here, but in all other contexts it is a regular literal character.) For example, to match any year in the 1960s, use:
196[0-9]
A hyphen does not signify a range if it appears as the first or last character within brackets, or immediately after a caret (^).
A caret (^) as the first character after the left bracket creates a negated character set. A negated character set matches any character except for those specified. For example:
[^A-Za-z]
matches any non-alphabetic character. The caret has this special meaning only when it appears first within the brackets.
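The character set, range, and negation examples above can be exercised with a short sketch. It assumes the POSIX regcomp and regexec functions as a stand-in engine and a hypothetical helper named re_matches.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if `pattern` is found in `text`, 0 if not,
   -1 if the pattern does not compile. */
int re_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Here re_matches("196[89]", "1967") returns 0, re_matches("196[0-9]", "1963") returns 1, and re_matches("[^A-Za-z]", "abc") returns 0 because every character is alphabetic. A hyphen placed last in the set is literal, so re_matches("^[0-9-]+$", "1968-1969") returns 1.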
Note that within brackets, all special characters except ], \, ^, and - are literals, not metacharacters. If the right bracket is the first character in the set, it is also literal. For example,
"[]g-m]"
matches the right bracket, and g through m. Another way of expressing this is by placing a backslash before the right bracket:
"[g-m\]]"
An asterisk (*) matches zero or more occurrences of the preceding character or character class. If the asterisk follows a period, the search locates any combination of characters. For example:
FAQ*
matches FA, FAQ, FAQQ, FAQQQ, etc.
[a-zA-Z ]*
matches a string of any length containing only letters and spaces. In the string "abc9hij", this RE matches "abc", stopping at the "9". Note that because the asterisk matches zero or more occurrences, [a-zA-Z ]* also matches an empty string.
The plus sign (+) behaves like the asterisk, but matches one or more occurrences. Therefore, Q+ does not match "FA", which contains no "Q", and [a-zA-Z ]+ does not match an empty string.
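The difference between the asterisk and the plus sign shows up clearly when the length of the matched text is inspected. The sketch below assumes the POSIX regcomp and regexec functions as a stand-in engine. The helper name first_match_length is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the first match of the
   POSIX extended RE `pattern` in `text`, or -1 if there is no match
   or the pattern does not compile. A return value of 0 means the RE
   matched an empty string. */
int first_match_length(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 1, &m, 0);
    regfree(&re);
    return rc == 0 ? (int)(m.rm_eo - m.rm_so) : -1;
}
```

Under these assumptions, first_match_length("FAQ*", "FA") returns 2 and first_match_length("[a-zA-Z ]*", "abc9hij") returns 3. With the text "9", the RE "[a-zA-Z ]*" returns 0 (an empty match), while "[a-zA-Z ]+" returns -1. Likewise, first_match_length("Q+", "FA") returns -1.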