GA4 & Reverse ETL: activating your behavioral data in BigQuery
Learn how you can start leveraging your data in BigQuery to enrich your customer data in Google Analytics.
Glenn Vanderlinden
April 20, 2022
7 minutes
If you’re in the marketing analytics industry, there’s no denying that it’s an exciting time. The space is going through its most fundamental shift in years. Google Analytics, the best-known marketing analytics platform, is moving to a new version called Google Analytics 4 (GA4). This version is built around an event-based tracking model and measures user-generated events at a more granular level across web and app environments, which GA4 calls data streams.
Event-based tracking is not a new concept: vendors such as Mixpanel and Amplitude have used it for a very long time. Still, Google remains a dominant force in the analytics space, whereas those other tools are adopted and used primarily by product teams and power users. This article doesn’t set out to compare event-based analytics products.
The most obvious implication of GA4 is a mass migration of every analytical and business user currently working in GA, and that’s a heavy lift. It’s commonly referred to as the great analytics reimplementation, and it will affect millions of organizations.
Even though this sounds like a daunting task, it’s worth it, especially if you’re planning a holistic, warehouse-first data approach in which marketing or behavioral data is only one piece of the puzzle.
The Real Superpower of GA4
The new opportunities in GA4 come from three things: its event-based measurement model, the ability to merge data from different streams (app and web) into a single property, and built-in AI capabilities. Marketers have asked for these for years and struggled to get them in GA3 (Universal Analytics).
However, the real superpower of GA4 is its native ability to stream data straight into BigQuery, the analytics-oriented data warehouse hosted in Google Cloud.
Here’s a quick overview of the new infrastructure model:
[Figure: overview of the GA4 infrastructure model. Server-side integrations are deliberately omitted for simplicity.]
Marketers and analysts alike can use the new GA4 user interface to build reports and dissect the data, but only up to a point: the limits depend on each account’s license tier. All data, including raw metadata that isn’t visible in the UI, is exported to BigQuery regardless of how the UI is configured, although license-tier limits still apply. Linking GA4 to BigQuery is straightforward, and the GA4 export schema in BigQuery is well documented.
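In that export, each row is one event, and its parameters live in a repeated `event_params` record whose value is split across typed fields (`string_value`, `int_value`, `float_value`, `double_value`). In SQL you would usually `UNNEST(event_params)`; the minimal Python sketch below does the equivalent on rows that have already been fetched as plain dicts. The helper names (`get_param`, `flatten_event`) and the sample row are hypothetical illustrations, not part of any Google library.

```python
from typing import Any, Dict, List, Optional

# Typed value slots in each event_params entry of the GA4 BigQuery export.
VALUE_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def get_param(event: Dict[str, Any], key: str) -> Optional[Any]:
    """Return the first non-null typed value of an event parameter, or None."""
    for param in event.get("event_params") or []:
        if param.get("key") != key:
            continue
        value = param.get("value") or {}
        for field in VALUE_FIELDS:
            if value.get(field) is not None:
                return value[field]
        return None
    return None


def flatten_event(event: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Project one exported event row onto a flat record with selected params."""
    flat = {
        "event_name": event.get("event_name"),
        "event_timestamp": event.get("event_timestamp"),  # microseconds, UTC
        "user_pseudo_id": event.get("user_pseudo_id"),
        "user_id": event.get("user_id"),
    }
    for key in keys:
        flat[key] = get_param(event, key)
    return flat


# Hypothetical sample row shaped like the export schema.
example_row = {
    "event_name": "page_view",
    "event_timestamp": 1650441600000000,
    "user_pseudo_id": "123456789.1650441600",
    "user_id": None,
    "event_params": [
        {"key": "page_location", "value": {"string_value": "https://example.com/"}},
        {"key": "ga_session_id", "value": {"int_value": 1650441600}},
    ],
}

print(flatten_event(example_row, ["page_location", "ga_session_id"]))
```

Flattening a handful of known parameters like this is usually the first modeling step before joining GA4 events to other warehouse tables.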
Enrich Your Warehouse with Behavioral Data
The native connection between GA4 and BigQuery is a major asset for organizations looking to enrich their 360° customer view in Google Cloud with behavioral data, or for those planning to build that view in the years to come.
In summary, GA4 lets you add a behavioral data stream to your data warehouse. From a more technical perspective, GA4 provides a set of software development kits (SDKs) that collect behavioral data from your sites and apps so it can flow into your data warehouse.
Data Activation: Close the Loop with Reverse ETL
Here’s the big win. Once you’ve enriched your data with a behavioral data stream and unified your customer data in the warehouse, you can put it back to work. With Reverse ETL, that data can be synced to virtually any type of destination. Here’s a quick map of this scenario:
[Figure: map of the data activation scenario with Reverse ETL.]
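One common way for a Reverse ETL sync to avoid re-sending an entire model on every run is to compare the current query result with the previous one and push only the difference. The minimal Python sketch below illustrates that general idea under simple assumptions (every row has a stable primary key, and both snapshots fit in memory). It is not Hightouch’s actual implementation; `diff_snapshots` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def diff_snapshots(
    previous: Dict[str, Record], current: Dict[str, Record]
) -> Tuple[List[Record], List[Record], List[str]]:
    """Compare two snapshots of a warehouse model, keyed by primary key.

    Returns (added, changed, removed) so a sync only has to push the delta
    to the destination instead of re-sending every row.
    """
    added = [row for key, row in current.items() if key not in previous]
    changed = [
        row for key, row in current.items()
        if key in previous and previous[key] != row
    ]
    removed = [key for key in previous if key not in current]
    return added, changed, removed


# Hypothetical model output from two consecutive sync runs.
previous_run = {
    "u1": {"email": "a@example.com", "ltv": 100},
    "u2": {"email": "b@example.com", "ltv": 50},
}
current_run = {
    "u1": {"email": "a@example.com", "ltv": 120},
    "u3": {"email": "c@example.com", "ltv": 10},
}

added, changed, removed = diff_snapshots(previous_run, current_run)
print(added, changed, removed)
```

How the three lists map onto destination API calls (create, update, delete, or ignore) depends on the destination and on how the sync is configured.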
Data activation with Reverse ETL yields multiple opportunities in the context of GA4:
Data Enrichment: Your behavioral data can now be shared with and leveraged by data and marketing destinations like Salesforce, HubSpot, or Slack. For example, you can personalize campaigns in your ESP (e.g., Braze) with the specific products a user viewed while browsing your website.
Better Audiences: Audiences in Hightouch can now be built on more granular behavioral signals. For example, you could build an audience of cart abandoners and sync it to Facebook, Google, and your ESP for multichannel retargeting campaigns. Because audiences are created centrally in Hightouch, you no longer have to create and maintain each audience separately in every destination. In short: richer audiences, managed centrally and activated across your stack.
Resource Savings: Adopting a Reverse ETL solution like Hightouch means you’re outsourcing integrations and API management. This allows your engineering team to focus on what matters for your business rather than maintaining integrations with third-party solutions.
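To make the enrichment and audience examples above concrete, here is a minimal Python sketch of the underlying logic. It assumes events have already been flattened from the GA4 export into dicts with `user_pseudo_id`, `event_name`, `event_timestamp`, and an `items` list, and it uses GA4’s recommended ecommerce event names (`view_item`, `add_to_cart`, `purchase`). The function names, the three-item limit, and the rule "no purchase after the latest add_to_cart" are illustrative assumptions, not how Hightouch defines audiences; in practice this would typically be a SQL model in the warehouse.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set

Event = Dict[str, Any]


def viewed_products(events: List[Event], limit: int = 3) -> Dict[str, List[str]]:
    """Return up to `limit` distinct item_ids each user most recently viewed."""
    per_user: Dict[str, List[str]] = defaultdict(list)
    # Newest events first, so each list starts with the latest view.
    for event in sorted(events, key=lambda e: e["event_timestamp"], reverse=True):
        if event["event_name"] != "view_item":
            continue
        seen = per_user[event["user_pseudo_id"]]
        for item in event.get("items") or []:
            item_id = item.get("item_id")
            if item_id and item_id not in seen and len(seen) < limit:
                seen.append(item_id)
    return dict(per_user)


def cart_abandoners(events: List[Event]) -> Set[str]:
    """Users whose latest add_to_cart is not followed by a later purchase."""
    last_add: Dict[str, int] = {}
    last_purchase: Dict[str, int] = {}
    for event in events:
        user, ts = event["user_pseudo_id"], event["event_timestamp"]
        if event["event_name"] == "add_to_cart":
            last_add[user] = max(ts, last_add.get(user, ts))
        elif event["event_name"] == "purchase":
            last_purchase[user] = max(ts, last_purchase.get(user, ts))
    return {
        user for user, added_at in last_add.items()
        if last_purchase.get(user, -1) < added_at
    }


# Hypothetical flattened events for two users.
sample_events = [
    {"user_pseudo_id": "a", "event_name": "view_item", "event_timestamp": 1, "items": [{"item_id": "SKU1"}]},
    {"user_pseudo_id": "a", "event_name": "add_to_cart", "event_timestamp": 2, "items": [{"item_id": "SKU1"}]},
    {"user_pseudo_id": "b", "event_name": "view_item", "event_timestamp": 3, "items": [{"item_id": "SKU2"}]},
    {"user_pseudo_id": "b", "event_name": "add_to_cart", "event_timestamp": 4, "items": [{"item_id": "SKU2"}]},
    {"user_pseudo_id": "b", "event_name": "purchase", "event_timestamp": 5, "items": [{"item_id": "SKU2"}]},
    {"user_pseudo_id": "a", "event_name": "view_item", "event_timestamp": 6, "items": [{"item_id": "SKU3"}]},
]

print(viewed_products(sample_events))
print(cart_abandoners(sample_events))
```

The per-user product list is the kind of trait you would sync to an ESP, while the abandoner set is the audience membership you would sync to ad platforms.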
How Does This Compare to a CDP?
GA4 now handles collecting behavioral events and syncing them to the data warehouse (BigQuery) for analysis, and from there they can be activated through a data activation platform like Hightouch. Compared with a CDP, though, one component is still missing: identity resolution in the warehouse.
The purpose of identity resolution is to stitch together a customer’s various events and give your teams a single view of that customer. Beyond stitching events to users, there’s a second constraint: Google does not allow PII to be sent to Google Analytics (only identifiers such as user_id), so behavioral data has to be joined to PII once it’s in the data warehouse. The data engineering team that owns the warehouse will need to build this identity resolution, which is a manageable task.
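A deliberately simple, deterministic version of this stitching is sketched below in Python. It assumes the flattened GA4 events carry `user_pseudo_id` (the pseudonymous identifier in the export) and, once a visitor logs in, a `user_id` you set yourself, and that PII lives in a separate CRM table in the warehouse keyed by that same `user_id`. Linking each pseudonymous ID to its most recently observed `user_id` is an assumed rule; real identity graphs often also have to handle shared devices, multiple identifiers, and probabilistic matches. All names and data here are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Event = Dict[str, Any]


def build_identity_map(events: List[Event]) -> Dict[str, str]:
    """Map each user_pseudo_id to the user_id it was most recently seen with."""
    latest: Dict[str, Tuple[int, str]] = {}
    for event in events:
        user_id = event.get("user_id")
        if not user_id:
            continue  # anonymous hit: nothing to learn about this pseudo ID
        pseudo_id, ts = event["user_pseudo_id"], event["event_timestamp"]
        if pseudo_id not in latest or ts > latest[pseudo_id][0]:
            latest[pseudo_id] = (ts, user_id)
    return {pseudo_id: user_id for pseudo_id, (_, user_id) in latest.items()}


def stitch(
    events: List[Event],
    identity_map: Dict[str, str],
    crm: Dict[str, Dict[str, Any]],
) -> List[Event]:
    """Attach a resolved user_id and CRM PII (kept only in the warehouse)."""
    stitched = []
    for event in events:
        user_id: Optional[str] = event.get("user_id") or identity_map.get(
            event["user_pseudo_id"]
        )
        profile = crm.get(user_id, {}) if user_id else {}
        stitched.append(
            {**event, "resolved_user_id": user_id, "email": profile.get("email")}
        )
    return stitched


# Hypothetical events: p1 browses anonymously, then logs in as user 42.
sample_events = [
    {"user_pseudo_id": "p1", "user_id": None, "event_timestamp": 1, "event_name": "view_item"},
    {"user_pseudo_id": "p1", "user_id": "42", "event_timestamp": 2, "event_name": "login"},
    {"user_pseudo_id": "p2", "user_id": None, "event_timestamp": 3, "event_name": "view_item"},
]
crm_table = {"42": {"email": "jane@example.com"}}

identity = build_identity_map(sample_events)
resolved = stitch(sample_events, identity, crm_table)
```

Note that the anonymous `view_item` from `p1` is attributed to user 42 retroactively, which is the main payoff of stitching in the warehouse rather than at collection time.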
Reverse ETL solutions make the most sense in companies that have invested in bringing data to the warehouse and making it the source of truth through enrichment efforts like ID resolution. If that’s the case, Google Analytics 4 now lets you add a new behavioral data stream to further enrich your destinations, your user profiles, and the calculated traits that come with them.
Send Enriched Data Back to Google Analytics 4
Google Analytics 4 makes it easy to extract and enrich your customer data within your BigQuery data warehouse. A common enrichment use case is to marry online behavior like product views with offline events like in-store purchases.
Marketers and business analysts want to see these offline events in the context of their existing reporting tool, Google Analytics. Google provides the Measurement Protocol, an HTTP API that lets developers send events directly to GA4, so analysts can:
- Tie online to offline behavior
- Measure interactions both client-side and server-side
- Send events that happen outside standard user interaction (e.g., offline conversions)
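A minimal Python sketch of sending an offline purchase through the Measurement Protocol follows, using only the standard library. It assumes a GA4 web data stream, where events are POSTed as JSON to the `/mp/collect` endpoint with a `measurement_id` and an `api_secret` (created in the stream’s settings) in the query string, and the body carries a `client_id` plus a list of events. The function name is hypothetical, and the measurement ID, secret, client ID, and transaction values are placeholders.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request

# GA4 Measurement Protocol collection endpoint (web data streams).
MP_ENDPOINT = "https://www.google-analytics.com/mp/collect"


def build_offline_purchase(
    measurement_id: str,
    api_secret: str,
    client_id: str,
    transaction_id: str,
    value: float,
    currency: str,
    user_id: Optional[str] = None,
) -> Request:
    """Build (but do not send) a Measurement Protocol request for one purchase."""
    query = urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    body: Dict[str, Any] = {
        "client_id": client_id,
        "events": [
            {
                "name": "purchase",
                "params": {
                    "transaction_id": transaction_id,
                    "value": value,
                    "currency": currency,
                },
            }
        ],
    }
    if user_id is not None:
        body["user_id"] = user_id  # a pseudonymous ID, never PII
    return Request(
        f"{MP_ENDPOINT}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder credentials; replace with your own stream's values.
request = build_offline_purchase(
    "G-XXXXXXXXXX", "YOUR_API_SECRET", "123456789.1650441600",
    "T-1001", 59.99, "EUR", user_id="42",
)
# Passing it to urllib.request.urlopen(request) would actually send the event.
```

Building the request separately from sending it keeps the payload easy to inspect and test. While developing, Google’s validation endpoint (`/debug/mp/collect`) can be used to check payloads before sending real data.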
Unfortunately, the Measurement Protocol becomes one more API integration for engineers to develop and maintain.
Enter Hightouch’s Google Analytics 4 Integration
In addition to more than 200 marketing, sales, and business applications, Hightouch now supports Google Analytics 4 as a destination for activating your data.
Connect your data source (BigQuery or any other source Hightouch supports) to Google Analytics 4 in minutes, then sync your data as often as you need. Your analysts will thank you.
For more information, check out our Google Analytics 4 Documentation.
About the Author
Glenn Vanderlinden is co-founder and solution architect at Human37. Human37’s mission is to build a world where customers get the best experiences from brands, and it pursues that mission by helping brands build their customer data strategy and the technology stack to activate their data.