Regression Analysis for SEO Forecasting

When working on digital marketing strategy, one question that gets asked a lot is how to forecast the business impact of SEO projects. Is the company generating a positive ROI on their investment in a digital marketing team or agency?

It’s certainly an important question - although SEO leads to "free" traffic, SEO projects often consume a lot of time and resources, and results rarely show up in the short term. Unlike sales activities that lead directly to revenue, digital marketing is at least a few steps removed, so some extra calculations are needed.


The cost of SEO projects is straightforward to work out - how much are you paying a digital marketing agency, or if you have an internal digital marketing team, how much are they getting paid? Also consider the opportunity cost - could the time and money spent on SEO projects be put to better use in a different lead generation pipeline such as PPC, digital PR, or trade shows?


Whenever possible, use past data to set realistic expectations and goals. Many marketing teams already do this, using data sources such as Google Analytics and Salesforce to estimate the monetary value of site traffic.
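As a rough illustration of that estimate, the sketch below turns funnel rates into a value per visit and then into an ROI figure. The function names and every number are hypothetical placeholders, not benchmarks; in practice the rates would come from your own analytics and CRM data.

```python
def value_per_visit(lead_rate, close_rate, avg_deal_value):
    """Expected revenue from one organic visit.

    lead_rate:      share of visits that become leads (e.g. from analytics)
    close_rate:     share of leads that become customers (e.g. from the CRM)
    avg_deal_value: average revenue per closed deal
    """
    return lead_rate * close_rate * avg_deal_value


def seo_roi(organic_visits, visit_value, seo_cost):
    """Return on investment as a ratio: (revenue - cost) / cost."""
    revenue = organic_visits * visit_value
    return (revenue - seo_cost) / seo_cost


# Hypothetical inputs, for illustration only.
visit_value = value_per_visit(lead_rate=0.02, close_rate=0.10, avg_deal_value=5000)
roi = seo_roi(organic_visits=12000, visit_value=visit_value, seo_cost=60000)

print(f"Value per visit: {visit_value:.2f}")  # about 10 per visit
print(f"ROI: {roi:.0%}")                      # about a 100% return
```

The cost figure should include agency fees or team salaries, so the ratio reflects the full investment rather than tool spend alone.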

But for SEO projects, we need to take another step backwards and think about the monetary value of increases in factors like domain authority and Google ranking. You can look up general statistics - for example, the #1 ranked page gets about one third of the traffic - but when a forecast could significantly affect your business, you'll want to dig a bit deeper and consider your own industry and competitors.

Forecast Organic Traffic with Regression Analysis

Prepare a competitive analysis of your own site and your competitors' sites. Include the target keywords, domain rating, Google ranking, and site traffic. If you don't already have this data, you can get a quick report from SEO tools like Ahrefs or Semrush.

If you have enough keywords, categorize them by the stage of the marketing funnel the users would be in. If your site is new or you're in a niche market, you might have only 50 keywords or fewer - in that case, just remove branded keywords, as they will likely skew your forecast.
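Removing branded keywords can be as simple as a text filter. The sketch below assumes a hypothetical brand name ("acme") and a made-up keyword list; a real list would come from your SEO tool's export.

```python
# Hypothetical brand terms; replace with your own brand and product names.
BRAND_TERMS = ("acme",)


def is_branded(keyword, brand_terms=BRAND_TERMS):
    """True if the keyword contains any brand term (case-insensitive)."""
    lowered = keyword.lower()
    return any(term in lowered for term in brand_terms)


# Made-up keywords, for illustration only.
keywords = ["acme login", "best crm software", "Acme pricing", "what is a crm"]
non_branded = [kw for kw in keywords if not is_branded(kw)]

print(non_branded)
```

A substring match like this can over-filter if the brand name is also a common word, so it is worth skimming the removed keywords by hand.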

Then, use a multi-factor method to create a formula for forecasting site traffic based on domain rating and Google ranking. You can do this with a BI tool, a programming language, or a simple Excel sheet. You could start with the built-in forecasting tools, and then experiment with other statistical models to see which one fits best. Multiple linear regression (explained in the second part of this article) is one option.
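To make the idea concrete, here is a minimal sketch of multiple linear regression in plain Python, fitting traffic against domain rating and Google position by ordinary least squares. The helper names are hypothetical, and the data is synthetic: it is generated from a made-up linear rule so the fit is easy to check, and it says nothing about how real traffic behaves. In practice a statistics library or a spreadsheet's regression tool would do the same job.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Returns [intercept, slope_1, slope_2, ...].
    """
    # Prepend a constant 1.0 to each row so the model has an intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    aug = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coefficients, row):
    """Apply the fitted formula to one (domain_rating, position) row."""
    intercept, *slopes = coefficients
    return intercept + sum(s * v for s, v in zip(slopes, row))


# Synthetic rows: (domain_rating, google_position) -> monthly organic visits.
# Generated from visits = 100 + 20 * rating - 30 * position, NOT real data.
keyword_rows = [(30, 8), (45, 5), (60, 3), (72, 1), (55, 6), (40, 9)]
monthly_visits = [100 + 20 * dr - 30 * pos for dr, pos in keyword_rows]

coefficients = fit_linear_regression(keyword_rows, monthly_visits)
forecast = predict(coefficients, (50, 4))

print("intercept, rating slope, position slope:",
      [round(c, 2) for c in coefficients])
print("forecast for rating 50 at position 4:", round(forecast))
```

Because the synthetic data follows the rule exactly, the fit recovers it; real data will be noisy, and a straight-line model may fit poorly, which is why it is worth comparing several models.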

Regression analysis is a common technique for understanding how a dependent variable, such as organic traffic, changes in relation to independent variables, such as search volume. It helps businesses decide which aspects of their service to prioritize and improve by answering questions such as "How do changes in price, delivery time, and order size impact NPS?" or "How does social media engagement impact page views?"

If you have trouble formulating a statistically significant model, you can try adding other important factors such as the page's publish date or number of high-quality backlinks.
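One rough way to judge whether an added factor is helping is to compare adjusted R-squared before and after adding it, since adjusted R-squared penalizes extra predictors. This is a sketch under that assumption, not a significance test: p-values for individual factors need a statistics package. The function names and the numbers are hypothetical.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(actual, predicted, n_predictors):
    """R-squared adjusted for the number of predictors in the model."""
    n = len(actual)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    r2 = r_squared(actual, predicted)
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Made-up actual vs. model-predicted traffic, for illustration only.
actual = [10, 20, 30, 40, 50]
predicted = [12, 18, 33, 39, 48]

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(actual, predicted, n_predictors=2)
print(f"R-squared: {r2:.3f}, adjusted: {adj_r2:.3f}")
```

If adding a factor such as publish date raises plain R-squared but lowers the adjusted value, the extra factor is probably not earning its place in the model.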

Limitations to SEO Forecasting

Because so many factors affect SEO, a forecast is a general guide at best. Google's algorithm, which supposedly considers 200+ factors, is constantly updated. Outside factors such as current events can also significantly change users' search activity. Although forecasting is a routine activity in SEO, there is a lot of room for error, and it is often hard to tell where a discrepancy comes from.

Finally, as long as humans are conducting data analysis, there will be some level of cognitive bias involved. One example is the Dunning-Kruger effect, which in layman's terms means that "the less competent you are, the more confident you tend to be." Basically, people who are unskilled suffer a dual burden: they tend to reach erroneous conclusions and miss key insights, and on top of that, their incompetence robs them of the metacognitive ability to realize it.