A regex is a string of text that allows you to create patterns that help match, locate, and manage text. Sheetkraft contains various useful regex functions that can be helpful for matching, replacing, searching strings. We use various dialects to create patterns for any operation (Search, Match, Replace, Split) for regex:
. matches any single character, not just an actual dot character. Dot is thus a special character in a pattern.
. matches a literal dot. The backslash is an escape character. It escapes the meaning that the following character would have if the backslash were not present. Since dot matches any character, escaping this meaning makes it match the literal dot only.
\ matches a single literal backslash. The first backslash escapes the meaning of the second backslash so that the second backslash is no longer an escape character.
abc matches the literal text abc. All letters and digits match themselves. They do not have a special meaning.
\s matches a single space character. Other than the normal space (character code 32 there are various other characters like tab, non-breaking space etc that are considered spaces. \s matches any such space character.
\S matches any single character that is not a space. In general, if a backslash followed by a lower-case letter matches something, a backslash followed by an upper-case letter matches anything other than that something.
\d matches any single digit.
\w matches any single word character. A word character matches a letter or digit or underscore.
\d\d matches two consecutive digits.
\d\d.\d\d matches two consecutive digits immediately followed by a literal dot immediately followed by two consecutive digits.
[a-z] matches any single character in the range lower case a to lower case z
[a-z0-9] matches any single character in the range a to z or 0 to 9.
[0-9_] matches any single character in the range 0 to 9 or an underscore.
[-+] matches any single character that is a literal minus or a literal plus. Minus does not have the same meaning as in the examples above because there is no character between the opening square bracket and itself.
[0-9] matches any single character that is a 0 or a literal minus or a 9. Minus does not form a range because it is escaped.
[[]] matches any single character that is an opening square bracket or a closing square bracket. Note that the second opening square bracket does not need to be escaped but the first closing square bracket needs to be escaped. This same pattern could also be written as [[]] where the second opening square bracket is escaped unnecessarily. This works because there is no meaning defined for an escaped opening square bracket in this context, so the escape character is effectively ignored.
\d* matches zero or more consecutive digits
\d+ matches one or more consecutive digits
[a-z]+ matches one or more consecutive characters each of which is in the range a to z.
\d{1,3} matches at least 1 and at most 3 consecutive digits
\d{3,} matches 3 or more consecutive digits
v\d{3,} matches a literal v followed by 3 or more consecutive digits
\d{5} matches exactly 5 consecutive digits
? is a shortcut for {0,1}, just as * is a shortcut for {0,} and + is a shortcut for {1,}
abc|def matches either abc or def. | has very low precedence. So, this is not ab followed by either of c or d followed by ef. It is either of abc or def.
We can restrict where the match occurs by using anchors ^ and $
^abc matches any text starting with abc. It does not match xabc
abc$ matches any text ending with abc. It does not match abcy
^a.c$ matches any text starting with a followed by any single character and ending with c. It matches abc but does not match xabc or abcy
^^abc is equivalent to ^abc. To get a literal ^, it will need to be escaped as in ^^abc which matches ^abc.
Let us consider a simple case:
(Note: This is a simple application of regex functions. This can be implemented using various other search patterns. During Implementation you can come across various problem statements, for example: file path parsing, date extraction, Account number extraction from a bank statement, which can be achieved effectively using regex functions.)
Question: You need to extract the buyer, seller, number of units and date from the given data. Then from the original data you need to remove the Date to form a new data. Check whether units were not lost from the new data. After that finding the day, month, year from date. All the above tasks should be done using regex functions.
Fill down the formulae using filldown.sk function.
Fill down the above formula for all the entries.
We want to store all the details in the database. We will use lazy regex here instead of greedy. We will fetch table data using Regex 2a and Regex 2b between two strings (See Figure).
We can see the function arguments from UI of Microsoft Excel (See Figure)
If the pattern has multiple capture groups, then the 3rd argument is used (if required).
Capture Groups: In order to extract parts of a match, the part has to be put in parenthesis, which is called as capturing group.
For Example:
(\d{3}).(\d{4}) matches a sequence of 3 digits followed by a literal dot followed by a sequence of 4 digits.
(?:abc){2}(\d{3}) matches two consecutive occurrences of abc followed by three digits. , for example abcabc121. Parenthesis create a capturing group, but sometimes we want to group without capturing the grouped part. Here abc is grouped but not captured. Grouping it allows specification of the quantifier {2} that applies to the whole group. \d{3} is grouped and captured. Since there is no capturing group before it, it is numbered 1. Non-capturing groups are not numbered as there would be no point to numbering them.
((?:abc){2}|(?:def){2})xyz matches either abcabcxyz or defdefxyz. This contains a single capturing group.
((abc){2}|(def){2})xyz also matches either abcabcxyz or defdefxyz. But this contains three capturing groups. The number of the capturing group is based on the order in which the opening parenthesis of the capturing group occurs in the pattern. The first capturing group is (abc){2}|(def){2}, the second group is abc and the third is def.
\\/+ matches a sequence of at least one character that is not a forward or backward slash. ^ negates the character class. This can be used to match a part of a file system path. For example \\/+$ can be used to extract the file name from a path.
In regex functions, “1$”, “2$” is used, which depicts the 1st capturing group and 2nd capturing group, respectively. This can be used in Regexsearch, SearchAll, Replace functions.
For example, we want to maintain a database of employees’ vehicles in some database (assuming 1 vehicle per head) in multiple capture groups such as State/Union Territory, RTO.
Final Output will look like this:
The same example in Regex Replace can be seen here. Regex replace can be used using sheetkraft UI in “More” options. (See Figure)