Posted: July 16, 2021

Updated: Jan. 7, 2022


RegEx is an easy-to-learn and amazingly useful tool for SEO. If you work in web analytics, it will save you a great deal of time. You don't need programming knowledge to use RegEx: although there are some similarities, RegEx is not a programming language; it describes patterns rather than executing logic or making decisions.


What is RegEx

RegEx (regular expression) is a sequence of characters that defines a search pattern. For example, RegEx is used in search tools for functions such as find, and find and replace. The right RegEx can neatly clean up a list of client or company names in Google Sheets with the click of a button: find and remove duplicates, and strip titles such as Dr., Mr., Inc., and Co. Ltd., all in one go.
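As a sketch of that kind of cleanup, the same idea can be expressed with Python's re module. The list of titles in this pattern is a hypothetical example, not an exhaustive one:

```python
import re

# Hypothetical pattern: strip common titles and company suffixes from names
titles = re.compile(r'\b(?:Dr|Mrs|Mr|Ms|Inc|Co|Ltd)\b\.?\s*')

names = ['Dr. Jane Smith', 'Acme Co. Ltd.', 'Mr. John Doe']
cleaned = [titles.sub('', name).strip() for name in names]
print(cleaned)  # ['Jane Smith', 'Acme', 'John Doe']
```

The same pattern works in Google Sheets' REGEXREPLACE, which is what makes this a one-click cleanup there.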


Many programming languages support RegEx, either built in or via libraries. For example, Google BigQuery is based on SQL but supports RegEx as well, allowing users to write cleaner code and avoid complicated subqueries.


In standard SQL, the query below extracts the substring between two parentheses.

substr(column, instr(column, '(') + 1,
instr(column, ')') - instr(column, '(') - 1)

In RegEx, you can rewrite it as follows.

regexp_extract(column, r'\((.*)\)')

Here are some basic RegEx to get started:

.  Any single character

.*  Zero or more characters

.+  One or more characters

?  Preceding character is optional

^  Beginning of a line

$  End of a line

\  Escape a special character
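To see these in action, here is a quick sanity check using Python's re module (any regex tester works just as well):

```python
import re

print(bool(re.fullmatch(r'c.t', 'cat')))            # . matches any single character
print(bool(re.fullmatch(r'se.*', 'seo')))           # .* matches zero or more characters
print(bool(re.fullmatch(r'colou?r', 'color')))      # ? makes the previous character optional
print(bool(re.search(r'^https', 'https://a.com')))  # ^ anchors to the start of the line
print(bool(re.search(r'\.com$', 'https://a.com')))  # \ escapes the dot, $ anchors the end
```

All five checks print True.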


RegEx for SEO

Some common uses of RegEx might be mass-updating a product or service name for a rebrand, or web scraping to extract metadata.


In Google Analytics, RegEx is a sequence of characters that matches patterns in your analytics data. It helps you effectively customize your queries for view filters, goals, segments, audiences, channel groupings, etc. RegEx also works in Data Studio, Tag Manager, BigQuery, and, more recently, Search Console.


RegEx Resources

Posted: Aug. 19, 2021

Updated: Jan. 6, 2022


Python is perfect for automating repetitive tasks, leaving more time for your SEO strategy. This post introduces Python scripts that you can use with little to no programming knowledge.


Python Modules for SEO

Before we get started, here are the most common Python modules for SEO.

  • webbrowser: opens a browser to a specific page

  • requests: downloads files and webpages from the internet

  • bs4: parses HTML and XML

  • selenium: launches and controls a web browser

  • google-search-results: scrapes and parses Google, Google Scholar, Bing, Baidu, Yandex, Yahoo, and eBay results using SerpApi

webbrowser is part of the Python standard library, so it should have been installed automatically along with Python 3.


For the remaining modules, you can install them with pip, which is also pre-installed. Simply type pip install followed by the module name. For example, pip install bs4.


If you're not sure whether the modules are installed, or you just want to double-check, run this test in the terminal.

$ python -c "import webbrowser"
$ echo $?
0 # webbrowser module exists in system

$ python -c "import bs4"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: No module named bs4
$ echo $?
1 # bs4 module does not exist in system

To get a list of installed modules without pip, type help("modules") in any Python console. To do the same thing using pip, type pip list.


The Python scripts introduced in this post mostly focus on data extraction and analysis. You can use most of them by simply copying and pasting, even if you are a non-programmer. If you are already familiar with Python and HTML, you can of course edit and customize the templates.


Python SEO Analyzer

The Python SEO Analyzer is a comprehensive crawler for your website or XML sitemap that analyzes some of the most important ranking factors: page title, meta description, keywords, image alt text, and word count. This is useful for automating simple audits.


URL scraping

This article explains three ways to scrape search results pages in bulk using Python. The author provides an example of scraping hostel data from Hostelworld, but the same workflow can be applied to your specific needs. For more on web scraping with Python for beginners, I recommend the web scraping section of Automate the Boring Stuff with Python.
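Whatever tool you pick, scraping boils down to fetching HTML and pulling out the attributes you care about. Here is a minimal sketch using only the standard library's html.parser (requests and bs4 make this shorter, but they are third-party); the HTML snippet is a stand-in for a fetched page:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)

# In practice this string would come from urlopen(url).read().decode()
html = '<a href="https://example.com/a">A</a><p>text</p><a href="/b">B</a>'
parser = LinkCollector()
parser.feed(html)
print(parser.links)
```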


Screaming Frog Shingling

Most SEOs are probably already familiar with Screaming Frog, but this Python script can make your audit process easier. Screaming Frog Shingling uses Screaming Frog's internal HTML crawl with text extraction, along with a shingling algorithm, to flag possible duplicate content across your pages. Once the comparison is complete, it creates a CSV file that includes a similarity score between pages.
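The shingling idea itself fits in a few lines: split each page's text into overlapping word k-grams ("shingles") and compute the Jaccard similarity of the two sets. This is a simplified illustration of the technique, not the Screaming Frog Shingling script itself:

```python
def shingles(text, k=4):
    """Overlapping k-word shingles from a page's text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=4):
    """Jaccard similarity of two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

print(similarity('the quick brown fox jumps over',
                 'the quick brown fox leaps over'))
```

Scores close to 1.0 suggest near-duplicate content worth reviewing.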


Optimize Images

Optimize Images is a command-line interface (CLI) utility to help you reduce image file sizes and boost page speed and performance.


Check for Page Updates

This depends on the page's source code, but you can start by checking whether the Last-Modified date or ETag has been updated.

import requests

url = 'yoururl'
headers = requests.get(url).headers
# Use .get() so missing headers return None instead of raising KeyError
last_mod = headers.get('Last-Modified')
etag = headers.get('ETag')

print('ETag: {}'.format(etag))
print('Last-Modified: {}'.format(last_mod))

If Last-Modified and ETag are not available, you can try a workaround with hashlib or http.client, or check out the urlwatch project.
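The hashlib workaround is simply: hash the page body on each fetch and compare fingerprints. A minimal sketch (the fetch itself is omitted here; in practice you would feed it the bytes from requests.get(url).content):

```python
import hashlib

def fingerprint(body):
    """SHA-256 digest of a page body, for change detection."""
    return hashlib.sha256(body).hexdigest()

# Two hypothetical fetches of the same page
old = fingerprint(b'<html>version 1</html>')
new = fingerprint(b'<html>version 2</html>')
print('changed' if old != new else 'unchanged')
```

Store the previous digest between runs and you have a basic change monitor.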


Time Series Analysis

If your business is on a seasonal or cyclical schedule, you will need to do time series analysis to compare results over relevant time periods. There are many different ways to do time series analysis, but this comprehensive article explains how to create an additive model for financial time series data using python and the Prophet forecasting package developed by Facebook.
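Prophet handles trend and seasonality fitting for you, but the core idea of an additive model is simple: observed = trend + seasonal + noise. Here is a toy sketch of extracting a monthly seasonal component in pure Python, with made-up data; Prophet's actual model is far more sophisticated:

```python
from statistics import mean

def seasonal_components(monthly):
    """Average deviation of each calendar month from the overall mean.
    Expects whole years of monthly data, January first."""
    years = len(monthly) // 12
    overall = mean(monthly[:years * 12])
    return [mean(monthly[y * 12 + m] for y in range(years)) - overall
            for m in range(12)]

# Two years of toy data with a fixed monthly pattern
data = [100, 90, 110, 120, 130, 150, 160, 155, 140, 125, 115, 105] * 2
print(seasonal_components(data))
```

A positive component means that month typically runs above the yearly average, which is the kind of signal you would adjust for when comparing periods.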


External Links

The code below, only slightly modified from the template in Web Scraping with Python (published by O'Reilly), outputs all external links from a given URL.

from urllib.request import urlopen
from urllib.parse import urlsplit
from bs4 import BeautifulSoup
import re

ext = set()

def getExt(url):
    o = urlsplit(url)
    html = urlopen(url)
    bs = BeautifulSoup(html, 'html.parser')
    # Only look at absolute links (starting with http:// or https://)
    for link in bs.find_all('a', href=re.compile('^((https://)|(http://))')):
        if 'href' in link.attrs:
            # Skip links that point back to this site's own domain
            if o.netloc in link.attrs['href']:
                continue
            ext.add(link.attrs['href'])

getExt('https://www.reimorikawa.com/')
for i in ext:
    print(i)

As you can see, I tested it with my website's homepage. The output shows only the social media links in the footer, which are the only external links on reimorikawa.com.

https://www.facebook.com/rei620m
https://www.linkedin.com/in/reimorikawa/
https://www.twitter.com/rei620m/
https://www.instagram.com/rei620m/

Crawl 404

As the name suggests, Crawl 404 checks pages via your XML sitemap to find 404 errors, i.e., broken links. It works by searching for links (the "href" attribute), checking their response codes, and dumping 404-error URLs to a text file. If, after running Crawl 404, you need to mass-create URL redirects, please also check my recent article: How to Create URL Redirects in Google Sheets.


Organize Local Files

This last use case diverges from the web scraping theme so far, and is not limited to digital marketing. You can use the shutil module for basic interactions with your local OS, such as reading, renaming, and moving files. I recently started using this method to automate my invoicing workflow, which saved a lot of time on operational tasks.


See my repository here. This Python script creates a new folder for organizing invoices and checks that the file size is not too large.
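My repository covers the specifics, but the core pattern looks something like this. The folder name, file extension, and size limit here are hypothetical stand-ins:

```python
import os
import shutil
import tempfile

MAX_BYTES = 5_000_000  # hypothetical size limit for an invoice file

def organize_invoices(src_dir):
    """Move PDF invoices into an 'invoices' subfolder, flagging oversized files."""
    dest = os.path.join(src_dir, 'invoices')
    os.makedirs(dest, exist_ok=True)
    too_large = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path) or not name.lower().endswith('.pdf'):
            continue
        if os.path.getsize(path) > MAX_BYTES:
            too_large.append(name)  # flag instead of moving
            continue
        shutil.move(path, os.path.join(dest, name))
    return too_large
```

Point it at your downloads folder (or wherever invoices land) and the filing happens in one call.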


Posted: June 23, 2021

Updated: Dec. 8, 2021


Google Sheets is a collaborative, user-friendly software that lets you easily manipulate data and share reports, making it an important tool for digital marketing. Knowing how to get the maximum utility out of Google Sheets will surely boost your performance and help you work smarter, not longer.


In this post, I will list some interesting and effective uses of Google Sheets that I've actually been using in my digital marketing projects.


Create URL Redirects

If you need to mass-create a lot of redirects, such as for a site migration, you can quickly do that using Google Sheets to create the required RegEx code and make sure that all URLs are healthy. I explained this in detail and shared a template in a previous article: How to Create URL Redirects in Google Sheets.


Separate URLs into their Components

If you're working in SEO, at some point you will need to separate URLs into their components, for example when analyzing a sitemap or site structure. You can do this by going to the Data menu in Google Sheets → Split text to columns, and then choosing "/" as the separator. Or, you can use the SPLIT formula: =SPLIT(A1,"/",true,true)
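The same split is a one-liner in Python, which can be handy when the list of URLs lives outside a spreadsheet. The URL below is a made-up example:

```python
from urllib.parse import urlsplit

parts = urlsplit('https://www.example.com/blog/seo-tips?utm_source=newsletter')
# Drop empty strings produced by leading/trailing slashes
segments = [s for s in parts.path.split('/') if s]
print(parts.netloc, segments, parts.query)
```

urlsplit also separates the scheme and fragment, which the "/" separator in Sheets cannot do cleanly.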


Check Your Users' Browser Window Widths

This one is probably most useful when creating a new landing page. You can get the browser size (both width and height) in Google Analytics, and then export that data to Google Sheets to see the distribution of browser window widths. I first learned about this in a blog post from 2018, and have used it regularly. It's a bit old but still works the same: How to make a chart of your users' window sizes.


Ad Spend Tracker

You can use simple subtraction, division, and date formulas to pace monthly ad spend. When managing PPC campaigns, I usually set up a quick Google Sheet to track monthly ad spend against days elapsed in the month, to make sure that I'm spending the budget evenly.

=DAY(TODAY())/DAY(DATE(YEAR(NOW()),MONTH(NOW())+1,0)) 
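The formula above is just days elapsed divided by days in the month. The same pacing math in Python, for anyone scripting their reports:

```python
import calendar
from datetime import date

def month_pace(today):
    """Fraction of the month elapsed, mirroring the sheet formula."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return today.day / days_in_month

print(month_pace(date(2021, 6, 15)))  # mid-June: 15/30 = 0.5
```

Compare this fraction against spend-to-date divided by monthly budget: if spend is running ahead of the pace, you're burning budget too fast.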

If I have enough accumulated data, I also create a few simple formulas to project the total monthly conversions, to make sure I'm on track to meet my goals.


You can also do this with Google Data Studio, as explained in this Medium article. However, Google Data Studio has been a bit finicky for me lately, especially with blended data, so I prefer keeping my analysis in Sheets when possible.


Predict Organic Traffic

As a prerequisite, you'll need to prepare competitive research covering your site and competitors. Include basic metrics such as target keywords, domain rating, Google ranking, and site traffic. Then, use a multi-factor forecasting method to create a formula for predicting site traffic based on domain rating and Google ranking. You can do this with a programming language, but a simple Google Sheet works fine too. I discussed this as part of a previous article: How to Measure ROI for SEO Projects.
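As a minimal sketch of the forecasting step, here is an ordinary least-squares fit on a single factor (domain rating) in pure Python; a true multi-factor model would use LINEST in Sheets or a regression library. The data points are hypothetical:

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical competitive-research data: domain rating -> monthly traffic
dr = [20, 35, 50, 65, 80]
traffic = [400, 700, 1000, 1300, 1600]
slope, intercept = fit_line(dr, traffic)
print(slope * 55 + intercept)  # predicted traffic at domain rating 55
```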


Google Ads add-on

There is a built-in Google Ads add-on in Google Sheets that you can use to pull campaign data. I've found this tool to be a bit buggy, so I don't use it to automate regular reporting, and my view is that you should be checking your Google Ads account itself anyway. But the add-on is useful for quickly pulling data for one-time analyses, especially for larger accounts.


Google Analytics add-on

Similarly, there is also a Google Analytics add-on in Google Sheets. I recently started using it after talking to a colleague, and had an overall better experience than with the Google Ads add-on. You can automatically populate Google Sheets with analytics data, modify it where needed (fix historical data, make any customizations, etc.), and use that to import into Data Studio.


Identify SEO Issues

If you're an SEO professional, you are probably already using tools like Ahrefs or SEMrush. But in a pinch, you can also quickly identify SEO issues with Google Sheets by importing the HTML of pages on your site, then checking for things like missing h1s, titles, meta tags, image alts, etc.


Use the ImportXML function to scrape structured data from webpages. Then, you can use an XPath query to specify which part of the XML to extract.


Some simple XPath queries:

  • //h1 to scrape h1s

  • //title to scrape titles

  • //@href to scrape links

Other common XPath queries for SEO:

  • //meta[@name='description']/@content for the meta description

  • //link[@rel='canonical']/@href for canonicals


---


The next few examples use Google Apps Script, which is a great way to automate data movement between Google properties and save time on repetitive tasks. It mostly runs on JavaScript 1.6, with some additions from 1.7 and 1.8. So you'll need some JavaScript knowledge to edit the scripts and link them to Google properties, but not at the level of app or website development.


Email Reminders

Recently, I was asked to set up a CRM system in Google Sheets to send reminders to follow up with prospective customers. I delivered that task with Google Apps Script, using a modification of this system: Trigger Email Reminders Based on Dates in Google Sheets.


Email Notifications

Although most SEO tools have an option to get email notifications, Google Apps Script allows you to customize the exact type and timing. You can feed your SEO data directly into Google Sheets by connecting it with Google Search Console, or third-party tools such as SEMRush or Screaming Frog. This can be useful for tracking KPIs and addressing issues as soon as they arise. For example, you can set up alerts for when a certain keyword falls below the top 3 search results, or the crawl rate drops below 50%.


Confirmation Emails

Still staying with the theme of emails, you can use Google Apps Script to send confirmation emails to users who fill out your Google Form. Although Google Forms has an option to send a standard confirmation email, Google Apps Script allows for further customization. For example, you can send different confirmation emails depending on the user's responses, or include other outgoing links. Here's a post by Google Workspace for Developers that explains in depth how to do this.


Site Migration

Use Google Apps Script to make sure all pages are accounted for during a site migration. You can write a script to keep an eye on the status codes of all pages before and after the migration, or to trace the redirect chain, if any. Here's a step-by-step guide on how to use Google Apps Script to check for broken links. Another use case is checking which pages have been indexed.
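The chain-tracing logic itself is language-agnostic. Here it is sketched in Python against a hypothetical old-URL → new-URL map; in Apps Script, UrlFetchApp's Location headers would supply the mapping:

```python
def redirect_chain(url, redirects, limit=10):
    """Follow a URL through a redirect map and return the full chain.
    Stops after `limit` hops to guard against redirect loops."""
    chain = [url]
    while url in redirects and len(chain) <= limit:
        url = redirects[url]
        chain.append(url)
    return chain

# Hypothetical migration redirect map
redirects = {'/old-page': '/new-page', '/new-page': '/final-page'}
print(redirect_chain('/old-page', redirects))
```

Chains longer than two entries indicate a redirect chain you would want to collapse into a single hop before the migration goes live.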


Scrape SERPs

Tracking your own site is not enough; you need to be aware of competitors to succeed in SEO. Fortunately, you can use Google Apps Script for this too. You can create a script to retrieve Google search results and capture data that interests you: url, title, meta description, ranking position, etc., all according to pre-selected criteria such as language and keyword.


For more Google Sheets resources, you can check Keywords in Sheets for free SEO scripts. There are templates for tasks such as retrieving old webpages from the Wayback Machine, SEO forecasting, and clustering.


Manage Google Calendar Events

This last use case isn't directly related to digital marketing, but it can be useful for anyone who needs to manage a lot of meetings or appointments. You can automatically export data from Google Sheets to Google Calendar events.


