Posted: July 16, 2021
Updated: Jan. 7, 2022
RegEx (regular expression) is a sequence of characters to define a search pattern, often used in search tools for functions such as find and find and replace. The right RegEx can neatly clean up a list of client or company names in Google Sheets with the click of a button - find and remove duplicates, remove titles such as Dr., Mr., Inc., and Co. Ltd., all in one go.
Many programming languages have RegEx either built-in or via libraries. For example, Google BigQuery is based on SQL but supports RegEx as well, allowing users to write cleaner code and avoid complicated subqueries.
In standard SQL, the query below extracts the substring between two parentheses.
substr (column, instr (column, '(')+1,
instr (column, ')') instr(column '(')-1)
In ReGex, you can rewrite it as follows.
regex_extract (column, r'/((.*)')
Basic RegEx:
. Any character
.* 0 or more characters
.+ 1 or more characters
? Optional character
^ Beginning of a line
$ End of a line
\ Escape a special character
REGEXREPLACE(A2,"\?[^?]*$","")
Extract substring to the left of ?, useful for removing parameters from URLs
[/]*$
Remove the final / in a string, if exists, which is useful for identifying duplicate URLs
RegEx is available in many well-loved digital marketing tools such as Data Studio, Tag Manager, BigQuery, and recently Search Console.
Resources
Slash Escape: interactive learning game
regex101: test and debug.