Posted: Sept. 29, 2021
Updated: Feb. 7, 2022
Technical SEO is unrelated to the actual content on your site, but still affects organic rankings. This post serves as a basic checklist for things to consider in a technical SEO audit.
I recommend starting with Search Console and Screaming Frog to get the big picture and identify basic issues like page titles, meta descriptions, image size, canonicals and server response codes. Screaming Frog also offers advanced features to check more technical factors such as pagination issues, international SEO, and site structure.
An XML sitemap is a file that helps search engines understand your website's structure and crawl it. It lists the pages on your site and can include each page's priority, when it was last modified, and how frequently it is updated. Usually, the pages are categorized by topic, post, product, etc.
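To make the structure concrete, here is a minimal sketch that reads those fields out of a sitemap with Python's standard library. The sample sitemap, its URLs, and its dates are made up for illustration; only the tag names and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample sitemap; URLs and dates are illustrative only.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2022-02-07</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog/technical-seo</loc>
    <lastmod>2021-09-29</lastmod>
  </url>
</urlset>"""


def parse_sitemap(xml_text):
    """Return one dict per <url> entry; optional fields come back as None."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=SITEMAP_NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=SITEMAP_NS),
            "changefreq": url.findtext("sm:changefreq", namespaces=SITEMAP_NS),
            "priority": url.findtext("sm:priority", namespaces=SITEMAP_NS),
        })
    return entries


for entry in parse_sitemap(SAMPLE_SITEMAP):
    print(entry["loc"], entry["lastmod"], entry["priority"])
```

Only loc is required by the protocol, which is why the second entry has no priority or change frequency.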
You'll probably check the sitemap at the beginning of a technical audit. You can usually find the sitemap of any site by typing /sitemap.xml after the domain, for example, https://example.com/sitemap.xml. If the site has multiple sitemaps, try /sitemap_index.xml.
Register your sitemap with Google Search Console, which includes several tools to check technical SEO metrics such as mobile optimization and page speed. The Google Search Console Sitemaps report gives you the insight you need to keep a 1:1 ratio between the URLs on the site and the URLs in the sitemap, and to confirm that the sitemap is updated as pages are added.
Ideally, your site has an internal linking structure that connects all pages efficiently, so you don’t need a sitemap. It’s actually optional. But it’s best practice for large sites to have a sitemap.
If your site has a sitemap, it's best practice to include every URL that returns 200 OK, so that the URLs in the sitemap match the URLs on the site 1:1. Remove URLs that return 4xx or 5xx, orphaned pages, and URLs with parameters.
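One way to check that 1:1 ratio is to compare the sitemap against a crawl export. The sketch below assumes you already have the sitemap URLs as a list and the crawl results as a mapping of URL to status code (for example, exported from a crawler); the function name and the sample data are hypothetical.

```python
from urllib.parse import urlsplit


def audit_sitemap(sitemap_urls, crawl_statuses):
    """Compare sitemap URLs against crawl results (URL -> HTTP status code)."""
    sitemap = set(sitemap_urls)
    # URLs that belong in the sitemap: 200 OK and no query-string parameters.
    indexable = {
        url for url, status in crawl_statuses.items()
        if status == 200 and not urlsplit(url).query
    }
    return {
        # Live pages the sitemap does not list yet.
        "missing_from_sitemap": sorted(indexable - sitemap),
        # Errors, parameterized URLs, and URLs the crawl never reached
        # (a hint that the page may be orphaned).
        "remove_from_sitemap": sorted(sitemap - indexable),
    }


# Hypothetical data for illustration.
crawl = {
    "https://example.com/": 200,
    "https://example.com/blog": 200,
    "https://example.com/old": 404,
    "https://example.com/shop?sort=price": 200,
}
sitemap = [
    "https://example.com/",
    "https://example.com/old",
    "https://example.com/shop?sort=price",
    "https://example.com/orphan",
]
print(audit_sitemap(sitemap, crawl))
```

When both lists come back empty, the sitemap and the site are in a 1:1 ratio.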
Server Response Code and Redirects
Easily bulk check server response codes with this Google Apps Script.
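If you would rather run the check locally, the same idea can be sketched in Python with only the standard library. This is not the Apps Script mentioned above, just a rough equivalent; the function names are my own, and note that urllib follows redirects, so a redirected URL reports the status of its final destination.

```python
import urllib.error
import urllib.request
from collections import defaultdict


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a URL, or None if it is unreachable.

    Uses a HEAD request. urllib follows redirects, so the code returned
    is that of the final destination.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; keep the code.
        return error.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, etc.
        return None


def group_by_class(statuses):
    """Group a URL -> status mapping into 2xx, 3xx, 4xx, 5xx buckets."""
    groups = defaultdict(list)
    for url, code in statuses.items():
        label = "unreachable" if code is None else f"{code // 100}xx"
        groups[label].append(url)
    return dict(groups)


def bulk_check(urls):
    """Fetch every URL in turn and group the results by status class."""
    return group_by_class({url: fetch_status(url) for url in urls})
```

Calling bulk_check(["https://example.com/", "https://example.com/missing"]) returns a dictionary keyed by status class, which makes the 4xx and 5xx URLs easy to pull out for cleanup.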
Use RegEx redirects to bulk redirect multiple source URLs to the same destination, or the .htaccess file for smaller-scale redirects.
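To illustrate how a RegEx redirect rule works, here is a small Python sketch that maps request paths to destinations. The rules and paths are hypothetical, and a real site would define them in its server or redirect plugin rather than in Python; the point is only to show one pattern covering many source URLs.

```python
import re

# Hypothetical rules: (pattern, destination). Paths are illustrative only.
REDIRECT_RULES = [
    # Many source URLs, one destination.
    (re.compile(r"^/blog/2019/.*$"), "/blog/archive"),
    # Capture part of the old path and reuse it in the new one.
    (re.compile(r"^/products/old-(?P<slug>[\w-]+)$"), r"/products/\g<slug>"),
]


def resolve_redirect(path):
    """Return the redirect destination for a path, or None if no rule matches."""
    for pattern, destination in REDIRECT_RULES:
        match = pattern.match(path)
        if match:
            # expand() fills in any captured groups referenced in the destination.
            return match.expand(destination)
    return None


print(resolve_redirect("/blog/2019/05/some-post"))
print(resolve_redirect("/products/old-blue-widget"))
print(resolve_redirect("/about"))
```

Rules are evaluated in order and the first match wins, so put the most specific patterns first.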
If your WordPress site is hosted on WP Engine, be careful about relying on .htaccess, because WP Engine is deprecating it for PHP 7.4 and subsequent versions. WP Engine suggests alternatives such as using RegEx redirects directly in WordPress or managing redirects in Yoast SEO Premium.
If you are completely removing a page, orphan it (remove all internal links to it), then return a 410 (Gone) so that Google can remove it from the index more quickly.
On the topic of page speed, other best practices include using a fast DNS provider and minimizing HTTP requests by keeping CSS style sheets, scripts, and plugins to a minimum. You can also compress web pages by reducing image file sizes and cleaning up the code, especially for content in the first view. PageSpeed Insights is a free tool that checks your page speed and provides specific recommendations on how to improve it.
The robots.txt file, also called the robots exclusion protocol or standard, is a text file that tells search engines which pages to crawl or not crawl. You can see the robots.txt file for any website by adding /robots.txt to the end of the domain. For example, https://www.reimorikawa.com/robots.txt
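Search engines look at robots.txt before crawling a site, so a disallowed page will not be crawled. Keep in mind that disallowing a page blocks crawling, not indexing: the URL can still appear in search results if other pages link to it, so use a noindex directive when a page must stay out of the index.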
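You can test how a crawler would interpret a robots.txt file with Python's built-in urllib.robotparser. The sketch below uses a made-up robots.txt with hypothetical rules; to check a live site you would fetch its file first and pass the text in the same way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt):
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Check whether a given user agent may crawl a given URL.
print(parser.can_fetch("Googlebot", "https://example.com/admin/login"))
print(parser.can_fetch("Googlebot", "https://example.com/blog/"))

# Sitemap lines declared in the file (Python 3.8+).
print(parser.site_maps())
```

Disallow rules are prefix matches, so a rule like /cart also blocks any path that starts with /cart.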
Crawl budget is the number of pages Google crawls and indexes on a website within a given timeframe. If the number of pages on your site exceeds its crawl budget, Googlebot will not crawl or index the remainder, which can negatively affect your rankings.
Performing regular log file analysis can provide insights about how Googlebot (and other web crawlers and users) is crawling your website, giving you the information you need to optimize the crawl budget. If your site is large and has crawl budget issues, you can adjust the crawl rate in Google Search Console.
SSL (Secure Sockets Layer)
SSL (secure sockets layer) is a security technology that creates an encrypted link between a web server and a browser. It's clear when a website is using SSL because the URL starts with https (hypertext transfer protocol secure), not http. In 2014, Google called for "HTTPS everywhere" and announced that it would use HTTPS as a ranking signal. Google Chrome now displays a "Not secure" warning whenever a user visits a site that does not use SSL.
You can install an SSL certificate on your website. These days, most top website builders such as Wix include SSL by default.
Schema, also called structured data markup, enhances search results through the addition of rich snippets. For example, you can add star ratings or prices for your products. Adding schema by itself is not an SEO factor, but it is recommended by Google and can indirectly help improve rankings and increase page views.
JSON-LD is the preferred method for adding structured data to your site. It's easy to add schema markup to your site with Google's Structured Data Markup Helper and preview it with the Structured Data Testing Tool to check for any warnings.
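If you prefer to generate the markup yourself, a JSON-LD snippet is just a JSON object inside a script tag. Here is a minimal sketch for a product with a price and a star rating; the function name and the product details are hypothetical, while the property names come from the schema.org Product, Offer, and AggregateRating types.

```python
import json


def product_jsonld(name, price, currency, rating_value, review_count):
    """Build a JSON-LD script tag for a product with a price and star rating."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": str(rating_value),
            "reviewCount": str(review_count),
        },
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical product for illustration.
print(product_jsonld("Example Widget", 19.99, "USD", 4.5, 27))
```

Paste the output into the page's HTML and run it through a validator before publishing, since a missing required property can keep the rich snippet from showing.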
For more information about how to add schema markup, see this blog post.
Canonical Link Element
Even if you don't have multiple parameter-based URLs for each page, different versions of your pages (https, http, www, .html, etc.) can quickly add up. That's where the rel=canonical tag comes in: it lets you manage duplicate content by specifying the canonical, or preferred, version of a page. It flags the duplicates to Google and tells it to consolidate the ranking signals on the preferred URL, so your page won't be disadvantaged.
If you are using a CMS like Wix or Squarespace, the platform might automatically add canonical tags with the clean URL. For example, my homepage already has one:
<link rel="canonical" href="https://www.reimorikawa.com"/>
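If your platform does not add the tag for you, the logic behind a "clean URL" can be sketched in a few lines of Python. The policy here (force https, force one preferred host, drop parameters and the .html suffix) is an assumption for illustration, and www.example.com is a placeholder; adjust both to match how your own site is set up.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, preferred_host="www.example.com"):
    """Collapse http/https, www/non-www, .html and parameter variants into one URL."""
    path = urlsplit(url).path
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    elif path.endswith(".html"):
        path = path[: -len(".html")]
    if not path:
        path = "/"
    # Query string and fragment are dropped on purpose (assumed policy).
    return urlunsplit(("https", preferred_host, path, "", ""))


def canonical_link_tag(url, preferred_host="www.example.com"):
    """Return the <link> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{canonical_url(url, preferred_host)}"/>'


print(canonical_link_tag("http://example.com/about.html?utm_source=newsletter"))
```

Every variant of the same page should produce the same tag, which is exactly what tells Google to consolidate the signals.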
Log File Analysis
The log file is your website's record of every request made to your server. It includes important information such as the URL of the requested page, the HTTP status code, the IP address of the client making the request, the timestamp, the user agent, the request method (GET/POST), and the referrer.
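As a rough illustration of what one entry looks like, the sketch below parses a single line into those fields. It assumes the common "combined" access log format used by Apache and Nginx; your server may be configured differently, and the sample line is made up.

```python
import re

# Pattern for the "combined" access log format (an assumption about your server).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Hypothetical log line for illustration.
SAMPLE_LINE = (
    '66.249.66.1 - - [07/Feb/2022:10:15:32 +0000] '
    '"GET /blog/technical-seo HTTP/1.1" 200 5123 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


record = parse_log_line(SAMPLE_LINE)
print(record["ip"], record["method"], record["url"], record["status"])
```

Lines that do not match the pattern return None, so it is worth counting them to confirm the format assumption holds for your server.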
By performing log file analysis, you can gain insights about how Googlebot (and other web crawlers and users) are crawling your website. The log file analysis will help you answer important technical questions such as:
How frequently is Googlebot crawling your site?
How is the crawl budget being allocated?
How often are new and updated pages being crawled?
You can identify where the crawl budget is being used inefficiently, such as unnecessarily crawling static or irrelevant pages, and make improvements accordingly.
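Those questions can be answered with a simple tally once the log is parsed. The sketch below assumes each log entry is already a dict with url, timestamp, and user_agent keys, and that timestamps use the common access log format; the records are hypothetical. It identifies Googlebot by user agent string only, which can be spoofed, so treat the numbers as an estimate.

```python
from collections import Counter
from datetime import datetime


def googlebot_summary(records):
    """Count Googlebot requests per URL and per day from parsed log records."""
    hits_per_url = Counter()
    hits_per_day = Counter()
    for record in records:
        # User agent matching only; the string can be spoofed.
        if "Googlebot" not in record["user_agent"]:
            continue
        hits_per_url[record["url"]] += 1
        day = datetime.strptime(record["timestamp"], "%d/%b/%Y:%H:%M:%S %z").date()
        hits_per_day[day.isoformat()] += 1
    return hits_per_url, hits_per_day


# Hypothetical parsed records for illustration.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_records = [
    {"url": "/", "timestamp": "06/Feb/2022:08:00:00 +0000", "user_agent": GOOGLEBOT_UA},
    {"url": "/tag/misc", "timestamp": "06/Feb/2022:08:05:00 +0000", "user_agent": GOOGLEBOT_UA},
    {"url": "/tag/misc", "timestamp": "07/Feb/2022:09:00:00 +0000", "user_agent": GOOGLEBOT_UA},
    {"url": "/", "timestamp": "07/Feb/2022:09:30:00 +0000", "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
]

per_url, per_day = googlebot_summary(sample_records)
print(per_url.most_common())  # which URLs use the most crawl budget
print(sorted(per_day.items()))  # how often Googlebot visits each day
```

URLs near the top of the first list that you do not care about ranking are candidates for a robots.txt disallow or better internal linking.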
Obtain the Log File
The log file is stored on your web server, and you can access it via your server control panel's file manager, the command line, or an FTP client (recommended).
The server log file is commonly found in the following locations.
Tools and Software
You can convert your .log file to a .csv and analyze it in Microsoft Excel or Google Sheets, or use a log file analyzer such as Semrush or Screaming Frog Log File Analyser. The best log file analyzer will depend on your website and what tools you might already be using for technical SEO.
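For the spreadsheet route, the conversion can be done with a short Python script. This sketch assumes the common "combined" access log format and uses shlex to split each line while keeping the quoted request and user agent intact; the column names and the sample line are my own choices.

```python
import csv
import io
import shlex

FIELDS = ["ip", "timestamp", "method", "url", "status", "size", "referrer", "user_agent"]


def log_lines_to_csv(lines, out):
    """Write combined-format log lines to `out` as CSV; return the rows written."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    written = 0
    for line in lines:
        try:
            tokens = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes; skip the line
        if len(tokens) != 10:
            continue  # not in the assumed format
        request = tokens[5].split()
        if len(request) != 3:
            continue
        method, url, _protocol = request
        # The timestamp is split across two tokens: "[date:time" and "zone]".
        timestamp = f"{tokens[3]} {tokens[4]}".strip("[]")
        writer.writerow(
            [tokens[0], timestamp, method, url, tokens[6], tokens[7], tokens[8], tokens[9]]
        )
        written += 1
    return written


# Hypothetical log line for illustration.
sample_lines = [
    '66.249.66.1 - - [07/Feb/2022:10:15:32 +0000] "GET /blog/technical-seo HTTP/1.1" '
    '200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
buffer = io.StringIO()
log_lines_to_csv(sample_lines, buffer)
print(buffer.getvalue())
```

To convert a real file, open the log for reading and a .csv file for writing with newline="" and pass both to the function; the resulting file opens directly in Excel or Google Sheets.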
Performing regular log file analysis can be extremely useful for technical SEO, but there are some limitations. Page access and actions that occur via cache memory, proxy servers, and AJAX will not be reflected in the log file. If multiple users access your website with the same IP address, it will be counted as only one user. On the other hand, if one user uses dynamic IP assignment, the log file will show multiple accesses, overestimating the traffic count.
Additional Considerations for Technical SEO
Mobile User Experience: Google has been giving higher ranking to websites that have a responsive or mobile site since April 2015. At the same time, they released the mobile-friendly testing tool to help SEOs ensure that they would not lose rankings after this algorithm update.
AMP (Accelerated Mobile Pages): AMP is an open source framework that aims to speed up the delivery of mobile pages through its own restricted form of HTML. Since Google has a mobile-first approach to indexing and having a responsive site is a significant ranking factor, you should consider enabling AMP.