If you're reaching the limits of what you can do with Microsoft Excel, it may be time to move on to a more powerful tool. R can handle larger datasets and create more detailed visualizations. Working in R is like working in a spreadsheet, but you interact with it through programming functions rather than by clicking around with your mouse.
R is a programming language for statistical computing, developed from the earlier S language. It can be extended with a library of 10,000+ packages, which are add-ons for accomplishing specific tasks - similar to the vast range of WordPress plugins. This GitHub repository has a good list of R packages for digital marketing, so you can interact with familiar tools such as GA, GTM, Search Console, and social media and PPC platforms via R.
When you first install R from CRAN, you get base R, which includes the machinery needed to run R on your computer as well as the standard R packages such as stats, utils, and graphics. On top of that, you can download and install the tidyverse, a collection of packages for data analysis, by running install.packages("tidyverse") in the R console.
Tidyverse was born in 2014, and is becoming increasingly mature. It includes the following core packages:
readr: read data
tidyr: tidy data
dplyr: transform data and work on relational databases
ggplot2: plot data and create static charts for visualization (for rich interactive charts, use htmlwidgets instead)
Fortunately, R is fairly easy to set up, write, and QA. It's a centralized language with only one actively supported version, and there are plenty of resources such as the R Documentation and Stack Overflow.
If you know basic Excel features like VLOOKUP and pivot tables, you can soon become familiar with R. You probably won't miss any important features from Excel, and there is a lot more to gain. For example, the dplyr and ggplot2 packages can mimic VLOOKUP and pivot tables in Excel, while also supporting more advanced data analysis and visualization.
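As a rough sketch of that comparison, the snippet below uses dplyr's left_join() where Excel would use VLOOKUP, and group_by() with summarise() where Excel would use a pivot table. The two small tables and their column names are made up for illustration.

```r
library(dplyr)

# Hypothetical data: conversions logged against a campaign ID
visits <- tibble(
  campaign_id = c(1, 2, 3, 1, 2),
  conversions = c(12, 7, 3, 9, 5)
)

# Hypothetical lookup table mapping campaign IDs to names
campaigns <- tibble(
  campaign_id   = c(1, 2, 3),
  campaign_name = c("Spring Sale", "Newsletter", "Brand Search")
)

# VLOOKUP equivalent: pull the campaign name into every row of visits
joined <- left_join(visits, campaigns, by = "campaign_id")

# Pivot-table equivalent: total conversions per campaign
pivot <- joined %>%
  group_by(campaign_name) %>%
  summarise(total_conversions = sum(conversions), .groups = "drop")

print(pivot)
```

Unlike a VLOOKUP formula, the join is a single step that applies to the whole table, so it does not need to be copied down a column.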
R makes it easy to create statistical models, but background knowledge about statistics is required. Otherwise, you won't know which test to use or how to interpret the results.
R for Digital Marketing
From automating Google Analytics reporting to analyzing A/B tests, R offers a blue ocean of uses for digital marketing.
One of the easiest digital marketing projects to start off in R is web crawling. There are many ways to do this - for example, you can just open a connection to a webpage using the url function, then use readLines to read the page's HTML source line by line.
But there's no need to reinvent the wheel. Rcrawler is a package for web crawling and extracting structured data, which you can then use for a wide range of analyses.
In R, factor analysis is implemented by the factanal() function of the built-in stats package, which performs maximum-likelihood factor analysis on a covariance matrix or data matrix. In marketing, factor analysis is useful for assessing a brand's or product's position relative to competitors and for informing strategic positioning.
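Here is a minimal sketch of that base R approach. The helper names read_page and extract_title are hypothetical, and the sample HTML is an offline stand-in so the code runs without network access; against a real site you would pass url(...) instead.

```r
# Read the raw HTML from a connection, one string per line of source
read_page <- function(con) {
  on.exit(close(con))
  readLines(con, warn = FALSE)
}

# Pull the text of the <title> tag out of the raw lines (NA if none is found)
extract_title <- function(lines) {
  html <- paste(lines, collapse = " ")
  hit <- regmatches(html, regexpr("<title[^>]*>.*?</title>", html,
                                  ignore.case = TRUE, perl = TRUE))
  if (length(hit) == 0) return(NA_character_)
  trimws(gsub("<[^>]+>", "", hit))
}

# Offline stand-in for a real page (made-up content)
sample_html <- c(
  "<html><head>",
  "<title>Spring Sale | Example Shop</title>",
  "</head><body><h1>Spring Sale</h1></body></html>"
)
page <- read_page(textConnection(sample_html))
print(extract_title(page))

# Against a live site the call would look like this (placeholder URL):
# page <- read_page(url("https://www.example.com/"))
```

Regular expressions are fine for a quick look at one tag, but they get fragile fast on real-world HTML, which is one reason to reach for a dedicated package.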
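To give a feel for factanal(), the sketch below runs it on simulated brand ratings. The six rating columns and the two underlying perceptions ("value" and "premium") are assumptions made up for the example, not real survey data.

```r
set.seed(42)
n <- 300

# Two hypothetical latent perceptions that drive the observed ratings
value   <- rnorm(n)
premium <- rnorm(n)

# Simulated survey ratings: three items per latent perception, plus noise
ratings <- data.frame(
  affordable  = value   + rnorm(n, sd = 0.5),
  good_deal   = value   + rnorm(n, sd = 0.5),
  discounts   = value   + rnorm(n, sd = 0.5),
  stylish     = premium + rnorm(n, sd = 0.5),
  durable     = premium + rnorm(n, sd = 0.5),
  prestigious = premium + rnorm(n, sd = 0.5)
)

# Maximum-likelihood factor analysis with two factors and varimax rotation
fit <- factanal(ratings, factors = 2, rotation = "varimax")

# Show only the larger loadings to make the grouping of items easier to read
print(fit$loadings, cutoff = 0.3)
```

Items that load heavily on the same factor are being rated along the same underlying dimension, which is what you would plot when mapping brands against each other.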
Correlation and Regression Analysis
R's plot function can create a scatterplot of any two numeric variables, so you can quickly explore questions such as: how are your PPC and SEO results correlated with sales, and what numerical impact did your brand awareness campaign have on later lead generation? Based on the scatterplot, you can then fit a simple or multiple linear regression to quantify the relationship, and forecast future results if needed.
The survival package, as the name suggests, provides survival analysis. For example, you can compare how long different customer groups stay before churning, which is a key input when comparing LTV between those groups.
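The workflow described above can be sketched with base R alone. The PPC spend and sales figures are simulated, and the 6,000 spend level used for the forecast is an arbitrary example value.

```r
set.seed(1)

# Simulated data: 50 campaigns with PPC spend and resulting sales
campaigns <- data.frame(ppc_spend = runif(50, min = 1000, max = 5000))
campaigns$sales <- 200 + 0.8 * campaigns$ppc_spend + rnorm(50, sd = 300)

# Step 1: eyeball the relationship with a scatterplot
plot(campaigns$ppc_spend, campaigns$sales,
     xlab = "PPC spend", ylab = "Sales")

# Step 2: measure the strength of the linear relationship
r <- cor(campaigns$ppc_spend, campaigns$sales)

# Step 3: fit a linear regression and draw the fitted line on the plot
model <- lm(sales ~ ppc_spend, data = campaigns)
abline(model)
print(summary(model))

# Step 4: forecast sales for a spend level outside the observed data
forecast <- predict(model, newdata = data.frame(ppc_spend = 6000))
print(forecast)
```

For multiple regression, add more predictors to the formula, for example sales ~ ppc_spend + seo_clicks, assuming a seo_clicks column exists in your data.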
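As an illustration, the sketch below compares customer retention between two plans using survfit() and survdiff() from the survival package. The customer data is simulated, and the 24-month observation window is an assumption; customers still active at that point are treated as censored.

```r
library(survival)
set.seed(7)
n <- 200

# Simulated customer lifetimes in months for two hypothetical plans
customers <- data.frame(
  plan   = rep(c("basic", "premium"), each = n / 2),
  months = c(rexp(n / 2, rate = 1 / 10), rexp(n / 2, rate = 1 / 18))
)

# Customers still active after 24 months are censored (churned = 0)
customers$churned <- as.integer(customers$months <= 24)
customers$months  <- pmin(customers$months, 24)

# Kaplan-Meier retention curves, one per plan
fit <- survfit(Surv(months, churned) ~ plan, data = customers)
print(fit)

# Log-rank test: do the two plans differ in retention?
logrank <- survdiff(Surv(months, churned) ~ plan, data = customers)
print(logrank)
```

Calling plot(fit) draws the two retention curves, which is usually the quickest way to show the difference to stakeholders.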
The tidytext and dplyr packages allow you to perform sentiment analysis on your customer reviews, social media mentions, and survey comments, turning free text into useful data.
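A minimal sketch of that idea, assuming the tidytext and dplyr packages are installed: split each review into words, match the words against the Bing sentiment lexicon that ships with tidytext, and count positive and negative words per review. The three reviews are made up.

```r
library(dplyr)
library(tidytext)

# Hypothetical customer reviews
reviews <- tibble(
  review_id = 1:3,
  text = c(
    "Great product, fast delivery and friendly support",
    "Terrible experience, the item arrived broken",
    "Good value but slow shipping"
  )
)

sentiment_counts <- reviews %>%
  unnest_tokens(word, text) %>%                       # one row per word
  inner_join(get_sentiments("bing"), by = "word") %>% # keep words found in the lexicon
  count(review_id, sentiment)                         # positive/negative words per review

print(sentiment_counts)
```

Lexicon matching like this ignores context such as negation ("not good"), so treat the counts as a first pass rather than a final verdict.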
As I mentioned in the previous item about sentiment analysis, a lot of marketing data can come in the form of natural language. If you're dealing with a text corpus such as social media posts or survey answers, you can create a word cloud, a useful descriptive tool for capturing basic qualitative insights. Since word clouds are visually attractive and easy to understand at a glance, you can use them in presentations for stakeholders from any background. This Medium article explains in depth how to generate word clouds in R.
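The first step behind any word cloud, counting word frequencies, needs only base R. In the sketch below the comments are made up, the length-based filter is a crude stand-in for a proper stop-word list, and the drawing step assumes the wordcloud package is installed (it is skipped otherwise).

```r
# Hypothetical survey comments
comments <- c(
  "Fast shipping and great price",
  "Shipping was slow but support was great",
  "Great price, poor packaging",
  "Support resolved my shipping issue"
)

# Lowercase, split on anything that is not a letter or apostrophe
words <- unlist(strsplit(tolower(comments), "[^a-z']+"))

# Crude filter: drop empty strings and short words such as "and", "was", "my"
words <- words[nchar(words) > 3]

# Word frequencies, most common first
freq <- sort(table(words), decreasing = TRUE)
print(freq)

# Draw the cloud only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(freq), freq = as.integer(freq),
                       min.freq = 1)
}
```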
Hierarchical Linear Modeling (HLM)
In most cases, data tends to be clustered. For example, marketing teams often think of their target audiences as belonging to one of several segments. Hierarchical Linear Modeling (HLM), also known as mixed-effects or multilevel modeling, allows you to break down clustered data and analyze it at a subgroup or individual level. Here is a step-by-step guide to using R for mixed model analysis.
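As one possible sketch, the nlme package that ships with R can fit a model where every audience segment gets its own baseline. The choice of nlme is an assumption (lme4 is a common alternative), and the segments, ad exposure, and spend values are simulated.

```r
library(nlme)
set.seed(2024)

# Simulated customers nested in six hypothetical audience segments
segments <- c("students", "families", "retirees",
              "professionals", "freelancers", "smb")
d <- data.frame(segment = factor(rep(segments, each = 40)))
d$ad_exposure <- runif(nrow(d), min = 0, max = 10)

# Each segment has its own baseline spend; ad exposure has a shared effect
segment_baseline <- rnorm(length(segments), mean = 0, sd = 5)
d$spend <- 50 + segment_baseline[as.integer(d$segment)] +
  2 * d$ad_exposure + rnorm(nrow(d), sd = 4)

# Random-intercept model: one overall slope, one baseline per segment
fit <- lme(spend ~ ad_exposure, random = ~ 1 | segment, data = d)

print(fixef(fit))  # overall intercept and ad_exposure effect
print(ranef(fit))  # how far each segment's baseline sits from the overall one
```

The fixed effects describe the audience as a whole, while the random effects show how each segment deviates from it, which is the subgroup-level view described above.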
Structural Equation Modeling
Structural Equation Modeling (SEM), available in R through the lavaan package, is a broad topic that builds on the linear regression described above. One marketing application of SEM is customer survey validation. Survey responses are often interrelated in complex ways, and you can use R to reduce many survey items to a smaller set of underlying latent variables and analyze how those latent variables are related to one another.
Choice models are used to predict the market share for new products. They analyze how customers prioritize product features and pricing when choosing between alternatives, which ultimately informs product profile design. If the product designer is creating multiple products, they can use the choice model to find the lineup that covers as many important customer priorities as possible.
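To make the survey example concrete, here is a small sketch with lavaan, assuming the package is installed. The survey items (sat1 to sat3, loy1 to loy3) and the two latent variables are hypothetical, and the responses are simulated.

```r
library(lavaan)
set.seed(11)
n <- 400

# Simulated latent traits: loyalty partly depends on satisfaction
sat_true <- rnorm(n)
loy_true <- 0.6 * sat_true + rnorm(n, sd = 0.8)

# Three noisy survey items per latent trait
survey <- data.frame(
  sat1 = sat_true + rnorm(n, sd = 0.6),
  sat2 = sat_true + rnorm(n, sd = 0.6),
  sat3 = sat_true + rnorm(n, sd = 0.6),
  loy1 = loy_true + rnorm(n, sd = 0.6),
  loy2 = loy_true + rnorm(n, sd = 0.6),
  loy3 = loy_true + rnorm(n, sd = 0.6)
)

model <- '
  # measurement part: which items reflect which latent variable
  satisfaction =~ sat1 + sat2 + sat3
  loyalty      =~ loy1 + loy2 + loy3

  # structural part: regression between the latent variables
  loyalty ~ satisfaction
'

fit <- sem(model, data = survey)
summary(fit, standardized = TRUE)
```

The "=~" lines are where the six survey fields are reduced to two latent variables, and the "~" line is the familiar regression, now run between those latent variables.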
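One common form of choice model is the conditional logit, which can be sketched with clogit() from the survival package. Everything in the example is an assumption made for illustration: the choice tasks are simulated, products are described only by price and a premium flag, and the "true" preferences exist only to generate the data.

```r
library(survival)
set.seed(123)

n_tasks <- 500  # simulated choice tasks
n_alts  <- 3    # products shown per task

# Each row is one product shown in one task
design <- data.frame(
  task    = rep(seq_len(n_tasks), each = n_alts),
  price   = sample(c(10, 15, 20), n_tasks * n_alts, replace = TRUE),
  premium = sample(c(0, 1), n_tasks * n_alts, replace = TRUE)
)

# Assumed preferences, used only to simulate which product gets picked
utility <- -0.2 * design$price + 1.0 * design$premium
noise   <- -log(-log(runif(nrow(design))))  # Gumbel noise, as in a logit model
total   <- utility + noise

# In each task, the product with the highest total utility is chosen
design$chosen <- as.integer(
  ave(total, design$task, FUN = function(u) u == max(u))
)

# Conditional logit: estimate preferences from the observed choices
fit <- clogit(chosen ~ price + premium + strata(task), data = design)
print(coef(fit))

# Predicted market shares for two hypothetical new products
new_products <- data.frame(price = c(12, 18), premium = c(0, 1))
v <- as.matrix(new_products[, names(coef(fit))]) %*% coef(fit)
shares <- as.vector(exp(v) / sum(exp(v)))
print(round(shares, 3))
```

The predicted shares come from the logit formula: each product's share is its exponentiated utility divided by the sum across all products in the lineup.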
In this workshop by the authors of R for Marketing Research and Analytics, Chris Chapman provides an in-depth example of structural equation modeling in the first half, and Elea McDonnell Feit provides a detailed explanation of choice modeling in the second half.