RegEx for SEO Projects

Posted: July 16, 2021

Updated: Jan. 7, 2022

RegEx (regular expression) is a sequence of characters to define a search pattern, often used in search tools for functions such as find and find and replace. The right RegEx can neatly clean up a list of client or company names in Google Sheets with the click of a button - find and remove duplicates, remove titles such as Dr., Mr., Inc., and Co. Ltd., all in one go.

Many programming languages have RegEx either built-in or via libraries. For example, Google BigQuery is based on SQL but supports RegEx as well, allowing users to write cleaner code and avoid complicated subqueries.

In standard SQL, the query below extracts the substring between two parentheses.

substr (column, instr (column, '(')+1,
instr (column, ')') instr(column '(')-1)

In ReGex, you can rewrite it as follows.

regex_extract (column, r'/((.*)')

Basic RegEx:

. Any character

.*  0 or more characters

.+ 1 or more characters

?  Optional character

^  Beginning of a line

$  End of a line

\  Escape a special character

  • Extract substring to the left of ?, useful for removing parameters from URLs

  • Remove the final / in a string, if exists, which is useful for identifying duplicate URLs

RegEx is available in many well-loved digital marketing tools such as Data Studio, Tag Manager, BigQuery, and recently Search Console.