What does a modern marketing reporting stack look like?

Lessons I learnt from working for startups with strong analytics needs and various data sources.

The main issue with marketing reporting

It is 2019, and reporting still sometimes looks like it is stuck in the 2000s.

Today more than ever, marketers need instant access to their numbers and the ability to build their very own reports, so they can monitor those numbers on a daily basis.

They have never been given so much data, yet they struggle to gather it and turn it into actionable insights.

Indeed, it has now become standard to have your data scattered all over your marketing stack: leads in HubSpot, clients in Salesforce, acquisition spend in both Facebook Ads and Google Ads, payments in Stripe…

And the marketing team no longer has the luxury of waiting for IT to build dashboards they will never actually use, either because the dashboards lack the right information or because the data lacks actionable insights.

Being able to design and build reports autonomously is key for marketing teams, and the days of getting weekly Excel reports for each data source should be over.

Excel no!

Copying and pasting data from several Excel files to get the big picture is a tremendous waste of time, not to mention the master file's size issues and the pain of maintaining its history.

It can take a marketing manager up to one day a week to do this data-gathering job, before even trying to get insights from it.

And we have not even talked about the integrity of the data and its format. One of the major issues in setting up a data pipeline is ensuring the quality of the data itself. If the data gets corrupted along the way, the analysis will be biased and error-prone.

Who has never had issues with weird encoding or missing columns when pasting data from one file to another?

But I have some good news: this is 2019, and there is actually a way of solving this issue.

Building a modern marketing reporting stack

A plea for a marketing database

I strongly advocated in favor of setting up a Customer Data Platform (CDP) in my previous article. A CDP is a game-changing tool in the hands of marketing teams, because it enables them to have a clear vision on each customer journey in real time.

But this kind of database is a fairly advanced use case for mature companies. There is actually a first step on the road to becoming a modern marketing team: setting up a marketing database.

By marketing database, I mean a unified source of truth for your marketing team, gathering data from your acquisition campaigns, leads, sales and clients.

Gathering this data in a single place enables your marketing team to plug a BI tool into it and get a clear overview of its actions and their business impact.

NB: Business Intelligence (BI) and/or Data Visualisation (Dataviz) tools such as Google Data Studio, Tableau or QlikView enable you to build reports and get value from your data.

How to set up your marketing database

While this step may look like the most difficult one, there are actually plenty of very good tools to achieve it, and it is pretty straightforward and only takes a few minutes.

There are two distinct tools you need to use to get your marketing data into a database:

  • The database itself.
  • The tool that will manage the data pipelines to your database for you: an ELT tool (Extract, Load and Transform).

The main advantage of this architecture (ELT + PostgreSQL DB) lies in its ability to answer one of today's most pressing issues: scalability. Indeed, this approach lets you add data sources very easily without having to rethink your entire data architecture (and thus your reporting stack, too).
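To make the division of labour concrete, here is a rough sketch of what an ELT tool does for you behind the scenes. It is a hypothetical example (the ads API endpoint, credentials and table name are made up, and in real life Stitch or Segment handle all of this for you): raw rows are extracted from a source API and loaded as-is into PostgreSQL, leaving the transformation step to your SQL queries or BI tool.

```python
# Conceptual sketch of the Extract and Load steps an ELT tool automates.
# Hypothetical example: the ads API endpoint, credentials and table name are made up.
import requests
import psycopg2

# Extract: pull raw campaign stats from a (hypothetical) ads API
response = requests.get(
    "https://api.example-ads.com/v1/campaign_stats",
    params={"date": "2019-06-01"},
)
rows = response.json()  # e.g. [{"campaign": "brand", "spend": 120.5, "clicks": 340}, ...]

# Load: write the rows as-is into the marketing database,
# without transforming them first (that is the "EL" in ELT)
conn = psycopg2.connect("postgresql://user:password@your-db-host:5432/marketing")
with conn, conn.cursor() as cur:
    cur.execute(
        "CREATE TABLE IF NOT EXISTS raw_ads_spend "
        "(date date, campaign text, spend numeric, clicks integer)"
    )
    for row in rows:
        cur.execute(
            "INSERT INTO raw_ads_spend (date, campaign, spend, clicks) "
            "VALUES (%s, %s, %s, %s)",
            ("2019-06-01", row["campaign"], row["spend"], row["clicks"]),
        )
conn.close()
```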

Set up a database in two minutes

Since you want to set up a database for your marketing team, I strongly advise you to use a managed database solution, so that they do not have to worry about maintenance and backups. AWS RDS and DigitalOcean both have really good offers for this, starting at around $15 a month.

I strongly advise you to choose PostgreSQL, since it is a very popular open-source database which is widely supported by ELT tools.

NB: here is my DigitalOcean referral link if you want to get a free $100 credit to try it out.

Move your data to your database in five minutes

Once your database is set up, you have to choose the pipelines for your data. Segment and Stitch Data (Talend) are very popular solutions for this.

I strongly advise you to start with Stitch, which has a free plan up to 5M records a month (whereas Segment is a more expensive solution aimed at mature companies).

Stitch data sources

Stitch offers a large variety of connectors to plug your favorite data sources into the destination of your choice (in your case, the PostgreSQL database you just set up). The user interface is pretty intuitive and straightforward.

All you have to do is select your source, choose which data you want to replicate to your DB, and set the replication frequency.

And that's it! A few minutes later, all the data from your different sources is synced to your database.
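Once the sync is running, any SQL client (or your BI tool) can join data that used to live in separate silos. Here is the kind of query you could now run; the table and column names below are hypothetical and depend on how your ELT tool names the replicated tables:

```python
# Hypothetical example: table and column names depend on how your ELT tool
# names the replicated tables (check the schemas Stitch creates for you).
import psycopg2

QUERY = """
SELECT
    ads.campaign,
    SUM(ads.spend)           AS total_spend,
    COUNT(DISTINCT leads.id) AS leads_generated,
    SUM(ads.spend) / NULLIF(COUNT(DISTINCT leads.id), 0) AS cost_per_lead
FROM raw_ads_spend AS ads
LEFT JOIN hubspot_leads AS leads
    ON leads.utm_campaign = ads.campaign
GROUP BY ads.campaign
ORDER BY total_spend DESC;
"""

conn = psycopg2.connect("postgresql://user:password@your-db-host:5432/marketing")
with conn, conn.cursor() as cur:
    cur.execute(QUERY)
    for campaign, spend, leads, cost_per_lead in cur.fetchall():
        print(campaign, spend, leads, cost_per_lead)
conn.close()
```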

Setting up modern acquisition reporting with Data Studio

The tool itself

Data Studio yes!

Data Studio is a reporting tool from the Google Marketing Suite.

It has several huge advantages:

  • It is free.
  • It is very intuitive for non-technical people.
  • It offers several native connectors to plug your data sources and start building reports, including Google Analytics, Google Ads, Google Sheets, BigQuery… and PostgreSQL.
  • You can easily join data from different sources to build actionable dashboards.
  • Its dashboards are refreshed live.

If you want to see what Data Studio reports look like, here is a summary of 30 templates for your reports (in French, by Cartelis).

NB: Data Studio has some limitations regarding joins and data export, but this is something we can talk about in another article.

The final step: plug Data Studio to your database

Plugging Google Data Studio into your marketing database is really easy, since Data Studio offers a native connector for this kind of database.

Just click on “Resources > Add a new data source > PostgreSQL”… and that's it! You are now able to query all the marketing data gathered in your very own database and build reports that update in real time.
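One precaution I recommend before pasting credentials into Data Studio: give it a dedicated read-only user rather than your main database credentials. A minimal sketch (role name, password and schema are placeholders to adapt):

```python
# Sketch: create a dedicated read-only user for Data Studio.
# Role name, password and schema are placeholders to adapt.
import psycopg2

conn = psycopg2.connect("postgresql://admin:password@your-db-host:5432/marketing")
with conn, conn.cursor() as cur:
    cur.execute("CREATE ROLE datastudio_reader LOGIN PASSWORD 'choose-a-strong-password'")
    cur.execute("GRANT USAGE ON SCHEMA public TO datastudio_reader")
    cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA public TO datastudio_reader")
    cur.execute(
        "ALTER DEFAULT PRIVILEGES IN SCHEMA public "
        "GRANT SELECT ON TABLES TO datastudio_reader"
    )
conn.close()
```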

The final stack
The marketing reporting stack

Coming soon: OpenClassrooms case study

A case study will follow on how we helped OpenClassrooms set up their marketing reporting stack after they raised 60 million euros. Stay tuned!

OpenClassrooms is a French online education platform for vocational training, providing courses in IT, technology, entrepreneurship, and digital skills. Courses are conducted fully online, through a mix of video resources, online reading, real-life projects and individual mentoring sessions.

How I built a Customer Data Platform based on Google Analytics unsampled data

The goal of this article is to give you a brief overview of how I built a homemade CDP (Customer Data Platform), why I did it, and which technologies I used. Deep dives into key steps and specific technical issues will follow in other articles. I wish you a good read!

Introduction

What is a Customer Data Platform?

 

“A CDP is a marketing system that unifies a company’s customer data from marketing and other channels to enable customer modeling and optimize the timing and targeting of messages and offers.” – Gartner Report

In other words, a Customer Data Platform is a user-centric database handled by the marketing team and updated at each user interaction, based on both cold and behavioural data.

It thus enables a company to get real-time user profiles and address each user at the best moment of their customer journey. This is the first step in setting up an omnichannel marketing strategy.

To push the point further, we could say that the core value of today's data marketing no longer lies in the CRM software but in the CDP.

Why Google Analytics data?

There are many ways and trackers to collect behavioural data on your website. I first tried this with Keen.io, which is a really great but quite expensive tool.

I was then looking into one of the numerous free alternatives when I found an article by Dmitri Ilin explaining how to export data from Google Analytics to Google BigQuery.

This article grabbed my attention because it solves a well-known issue in the analytics world: the sampling of Google Analytics data. Indeed, although Google Analytics is probably the most advanced free analytics tool, it samples the data when the stats you want to access involve a large number of sessions (and you cannot access the detail of each visitor's journey).

Retrieving unsampled Google Analytics data solves this issue and enables you to access the full journey of each visitor. Moreover, and unlike other analytics tools, it natively includes the tracking of Google Analytics events, which are very powerful and easy to set up. This is why I chose this method to build my Customer Data Platform.

The way I built the CDP

Retrieving, storing and processing the data

Dmitri Ilin's article was a great inspiration on how to get unsampled Google Analytics data, but this was only the first step on the road to the business goal I had set for myself: building a Customer Data Platform and being able to plug my CRM into it. Reaching this goal would enable me to have precise information on all my clients and address them at the best time.

Moreover, storing all the logs in Google BigQuery did not look like the best solution for the objective I wanted to reach. It seemed far more appropriate to store raw data in a data lake than in such an expensive database.

Since I am more familiar with the AWS environment, I decided to adapt Dmitri Ilin's tutorial to it. I created and deployed a Flask web application on Elastic Beanstalk which retrieves the logs and stores them in AWS S3 (which plays the role of a data lake here).
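To give you an idea of what this application looks like, here is a heavily simplified sketch of the collector, assuming the Google Analytics hits are duplicated towards a /collect endpoint (the bucket name is a placeholder, and the real application has more validation and error handling):

```python
# Simplified sketch of the log collector deployed on Elastic Beanstalk.
# Assumes the Google Analytics hits are duplicated towards a /collect endpoint;
# the bucket name is a placeholder.
import json
import uuid
from datetime import datetime

import boto3
from flask import Flask, request

application = Flask(__name__)  # Elastic Beanstalk looks for an object named "application"
s3 = boto3.client("s3")
BUCKET = "my-analytics-datalake"


@application.route("/collect", methods=["GET", "POST"])
def collect():
    # Keep the raw hit payload plus a server-side timestamp
    hit = {
        "received_at": datetime.utcnow().isoformat(),
        "params": request.values.to_dict(),
    }
    key = "raw/{}/{}.json".format(datetime.utcnow().strftime("%Y/%m/%d"), uuid.uuid4())
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(hit).encode("utf-8"))
    return "", 204


if __name__ == "__main__":
    application.run()
```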

The global architecture

The logs are concatenated daily at 3am with Pandas (a very popular Python library for data processing), and stored in time- and userID-based directories in my S3 bucket.
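The daily job itself looks roughly like this (a sketch with placeholder bucket and path names, and assuming the userID is available as a "uid" parameter on each hit):

```python
# Sketch of the daily concatenation job (placeholder bucket and paths).
# It reads the raw hits collected the previous day and rewrites them
# as one CSV per user, in date-based directories.
import json
from datetime import datetime, timedelta

import boto3
import pandas as pd

s3 = boto3.client("s3")
BUCKET = "my-analytics-datalake"

day = (datetime.utcnow() - timedelta(days=1)).strftime("%Y/%m/%d")

# Load every raw hit of the day into one DataFrame
rows = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix="raw/{}/".format(day)):
    for obj in page.get("Contents", []):
        hit = json.loads(s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read())
        row = dict(hit["params"])
        row["received_at"] = hit["received_at"]
        rows.append(row)
df = pd.DataFrame(rows)

# Write one concatenated file per user and per day
# ("uid" is assumed to be the userID parameter carried by each hit)
for user_id, user_hits in df.groupby("uid"):
    key = "logs/{}/{}/hits.csv".format(user_id, day)
    s3.put_object(Bucket=BUCKET, Key=key, Body=user_hits.to_csv(index=False).encode("utf-8"))
```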

How do I tie a user to all of their interactions?

This step is key to having a good analytics tool and complete user profiles.

My userID (the Auth0 one) is passed as a custom dimension in Google Analytics, so that I am able to identify a user when they sign in.

In order to have a complete vision of the customer journey, I then match Google Analytics’ ClientID (unique for each client) to the userID.

This way, all interactions before sign-in and after logout are also stored and attributed to the right user.
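In practice, this matching is just a lookup table between the two identifiers, built from the hits where both are present and then applied to the anonymous hits. A simplified Pandas sketch, with placeholder column names:

```python
# Sketch of the ClientID <-> userID matching, with placeholder column names:
# "cid" is Google Analytics' ClientID, "uid" is the Auth0 userID passed
# as a custom dimension.
import pandas as pd

hits = pd.DataFrame([
    {"cid": "GA1.2.111", "uid": None,       "page": "/pricing"},    # anonymous visit
    {"cid": "GA1.2.111", "uid": "auth0|42", "page": "/dashboard"},  # signed-in visit
    {"cid": "GA1.2.999", "uid": None,       "page": "/"},           # never signed in
])

# Build the lookup table from the hits where both identifiers are known
mapping = (
    hits.dropna(subset=["uid"])
        .drop_duplicates(subset=["cid"])
        .set_index("cid")["uid"]
)

# Attribute the anonymous hits (before sign-in, after logout) to the right user
hits["resolved_uid"] = hits["uid"].fillna(hits["cid"].map(mapping))
print(hits)
```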

Calculate the aggregates, score and store the user profiles

Once we have a directory per user on S3, it is quite easy to calculate aggregates and scores for each of them.

Aggregates are global stats computed on the raw logs, and they are crucial pieces of information in marketing. They are very useful when it comes to knowing how many times a user has visited a particular web page, or how many times they have triggered a given web event.

In my case, I set up three aggregates: the list of ClientIDs, the number of login events, and the number of times a user has visited the pricing page.


Preview (in Data Studio) of the ClientID, number of logins and number of times each user visited the pricing page

The scorings are based on the aggregates. Scoring is a way to find out which leads have the best chances of becoming customers, so that you can focus your marketing efforts on them.

For example, I created an “activity score”, based on the number of logins per week. The rules are the following:

  • if “number of logins per week” is 0: score = 0
  • if “number of logins per week” is 1 or 2: score = 1
  • if “number of logins per week” is 3 or more: score = 2
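For the record, here is what this scoring rule looks like in code (a small sketch; the column names are placeholders):

```python
# Sketch of the activity score, applied to the weekly login aggregate.
# Column names are placeholders.
import pandas as pd

def activity_score(logins_per_week: int) -> int:
    if logins_per_week == 0:
        return 0
    if logins_per_week < 3:
        return 1
    return 2

profiles = pd.DataFrame({
    "user_id": ["auth0|42", "auth0|43", "auth0|44"],
    "logins_per_week": [0, 2, 7],
})
profiles["activity_score"] = profiles["logins_per_week"].apply(activity_score)
print(profiles)
```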

 

It then enables me to send a dedicated newsletter to the leads with the best activity score, because they seem very interested in my product.

The aggregates and scores are calculated daily with AWS Athena, which is a query engine for S3 (using SQL syntax). The output is stored as a CSV file on S3, which is then loaded into a PostgreSQL database on RDS (Amazon Relational Database Service).


Calculate the aggregates and load them to the DB
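As an illustration, here is roughly what this daily job looks like with boto3 and psycopg2; the database, table, bucket and column names are placeholders, and the real job has proper error handling:

```python
# Sketch of the daily aggregation job: query the S3 logs with Athena,
# then load the resulting CSV into the PostgreSQL CDP on RDS.
# Database, table, bucket and column names are placeholders.
import io
import time

import boto3
import psycopg2

athena = boto3.client("athena")
s3 = boto3.client("s3")

QUERY = """
SELECT uid,
       count_if(page = '/pricing')      AS pricing_page_views,
       count_if(event_action = 'login') AS login_events
FROM analytics_logs
GROUP BY uid
"""

execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "marketing_datalake"},
    ResultConfiguration={"OutputLocation": "s3://my-analytics-datalake/athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Wait for Athena to finish (simplified polling loop)
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    # Athena writes its result as <query_id>.csv in the output location
    result = s3.get_object(
        Bucket="my-analytics-datalake", Key="athena-results/{}.csv".format(query_id)
    )
    csv_body = io.StringIO(result["Body"].read().decode("utf-8"))

    conn = psycopg2.connect("postgresql://user:password@my-rds-host:5432/cdp")
    with conn, conn.cursor() as cur:
        cur.execute(
            "CREATE TABLE IF NOT EXISTS user_aggregates "
            "(uid text, pricing_page_views bigint, login_events bigint)"
        )
        cur.execute("TRUNCATE user_aggregates")  # the profiles are fully rebuilt every day
        cur.copy_expert("COPY user_aggregates FROM STDIN WITH CSV HEADER", csv_body)
    conn.close()
```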

This database is my CDP: every user has exactly one profile, which is refreshed daily based on behavioural data and contains the marketing aggregates I use to communicate with them more effectively.

Next steps

The use cases described here are really basic. There are so many other things you can do with such a tool: segmenting user profiles based on behavioural data, personalizing the website content, setting up a multi-touch attribution model… I will definitely take a deeper look at these use cases.

Another challenge is to have the user profile refreshed in real time (not only on a daily basis anymore). I have some ideas about this, and I’ll get back to you soon with more news.

Thanks for reading! If you are interested in this topic, please stay in touch.