Twelve reasons to put the data warehouse at the center of your tech stack

Learn actionable ways to improve your tech stack by sourcing data from your warehouse.

Adam Greco

March 26, 2025

12 minutes

Almost every tool in your tech stack needs data to run. Whether you’re using a CRM like Salesforce, advertising over paid media, performing digital analytics, or sending customer communications, your SaaS tools need data to work, and they offer lots of ways to get data into them.

Companies often send data to each tool via one-off solutions. The one-off approach works fine initially but can quickly turn your tech stack into a complicated spiderweb of interdependent data pipelines and SDKs. You wind up with many tags on your website, sending data to many tools, and the data in one tool never matches the data in another.

A better way to manage data in your tech stack is to adopt a “hub-and-spoke” approach in which you send some or all data events to your data warehouse first and then route data (and audiences) to downstream SaaS products. In this post, I’ll explain this emerging approach and share twelve specific ways it can benefit your organization.

Improve data collection via a hub-and-spoke data warehouse approach

In today’s digital world, data collection has become critical. Understanding customer journeys and experiences is at the top of most organizations' minds, and since most customer interactions occur digitally, this means collecting a lot of data. Organizations want to optimize data collection and provide downstream SaaS products with the best possible data. However, many organizations have deployed multiple SaaS products over the years, each with its own proprietary data collection method. The result is a complex set of disparate data pipelines that can create disjointed data collection. The following are examples of how hydrating SaaS products partially or entirely from your data warehouse can improve data collection.

Data enrichment

When you collect event data, the data exists only in the context of that particular moment. Even if you know who the user is completing the event, you often don’t have access to historical information about that user. The best you can do is pass the event along with any metadata available. But if you send the event data to the data warehouse, millions of rows of customer data are at your disposal. Once you know the user’s unique identifier, you can look up data related to that user and enrich the events they are performing. From the data warehouse, a Composable CDP (like Hightouch) can send enriched events and properties to hundreds of downstream SaaS products.

For example, let’s say that Adam is logged into an eCommerce website and looks at a mobile phone. In a digital analytics context, that would constitute a “Product View” event and have an associated property of the SKU of the product viewed. This data could easily be collected by a digital analytics product tag and sent directly to the digital analytics product. However, if this event were sent to the data warehouse, additional information about Adam could be sent to the digital analytics product. The data warehouse may know where Adam lives, if he is married, how many kids he has, his household income, and his lifetime value. All of this information could be passed with the event to improve the types of digital analyses that can be conducted. And because this information isn’t being exposed to code debuggers, it can even include PII if your privacy and legal teams allow it.
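To make this concrete, here is a minimal Python sketch of warehouse-side enrichment. The profile table, field names, and `enrich_event` helper are hypothetical stand-ins for a real warehouse query and sync.

```python
# Hypothetical warehouse profiles, keyed by the user's unique identifier.
WAREHOUSE_PROFILES = {
    "adam_123": {
        "city": "Chicago",
        "household_income_band": "100-150k",
        "lifetime_value": 2450.00,
    },
}

def enrich_event(event: dict, profiles: dict) -> dict:
    """Merge any known profile attributes into the event's properties."""
    profile = profiles.get(event.get("user_id"), {})
    enriched = dict(event)  # shallow copy; leave the raw event untouched
    enriched["properties"] = {**event.get("properties", {}), **profile}
    return enriched

raw_event = {
    "user_id": "adam_123",
    "event": "Product View",
    "properties": {"sku": "PHONE-42"},
}
enriched = enrich_event(raw_event, WAREHOUSE_PROFILES)
# The enriched event now carries the SKU plus the warehouse profile fields.
```

In practice the lookup would be a warehouse join rather than an in-memory dictionary, but the shape of the result is the same.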

In addition, the data warehouse likely has information about the product SKU Adam viewed from the product catalog, such as inventory levels, brand, etc. This information can also be passed to the digital analytics product instead of analysts relying on lookup files that change retroactively.

If your organization has offline events that might be interesting alongside online activity, instead of sending them to multiple SaaS products, they can be sent to the data warehouse, and a Composable CDP can route them to various SaaS destinations. These offline events could include in-store purchases or in-person conference leads. Metrics from advertising networks, such as impressions, clicks, and costs, can also be sent to SaaS products to help compute return on advertising spend (ROAS).

Computed events and properties

Another advantage of routing event data through the warehouse before sending it to SaaS products is the ability to create new events and properties from existing data. I think of these as “computed” events and properties, since they apply formulas to incoming event data and to data already sitting in the warehouse. In Hightouch, these computed events and properties are called Traits, and they support aggregation functions such as counts, sums, averages, and first, last, or most frequent values.

Using these functions, you can compute new properties, such as the most recent product category viewed, last onsite search term used, number of onsite searches conducted, etc. Once you define the function, data can be sent to downstream SaaS products. While similar functionality may exist in some individual SaaS products, the advantage of using the data warehouse and Composable CDP is that these new events and properties can leverage all information in the warehouse. For example, if you have a physical store, you could identify the last product purchased in person (offline) and send that to a SaaS product as a computed property. The possibilities are infinite.
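As an illustration, here is a Python sketch of trait-style computation over raw events. The event shapes and the `computed_traits` helper are hypothetical; a Composable CDP would express the same logic declaratively rather than in application code.

```python
from datetime import datetime

# Hypothetical event history for one user, as it might sit in the warehouse.
events = [
    {"user_id": "u1", "type": "Product View", "category": "Phones",
     "ts": datetime(2025, 3, 1)},
    {"user_id": "u1", "type": "Site Search", "term": "chargers",
     "ts": datetime(2025, 3, 2)},
    {"user_id": "u1", "type": "Product View", "category": "Accessories",
     "ts": datetime(2025, 3, 3)},
]

def computed_traits(user_id: str, history: list) -> dict:
    """Derive 'computed' properties from a user's full event history."""
    mine = sorted((e for e in history if e["user_id"] == user_id),
                  key=lambda e: e["ts"])
    views = [e for e in mine if e["type"] == "Product View"]
    searches = [e for e in mine if e["type"] == "Site Search"]
    return {
        "last_category_viewed": views[-1]["category"] if views else None,
        "search_count": len(searches),
        "last_search_term": searches[-1]["term"] if searches else None,
    }

traits = computed_traits("u1", events)
# traits: last_category_viewed="Accessories", search_count=1,
#         last_search_term="chargers"
```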

Performance improvements

Event enrichment and computed properties in the data warehouse or Composable CDP can improve digital product performance by reducing the work your website or app performs on the client side. Too many calculations or complex operations on the front end slow your digital products down, and slowness is known to lower engagement and conversion rates. Many digital teams, particularly those that have invested heavily in SEO, where page speed is vital, are eager to remove redundant SDKs and event instrumentation from their marketing sites and web apps. By collecting core events into your data warehouse and routing them to downstream tools, rather than implementing a separate SDK for every tool on your website, you can reduce the number of SaaS tags and drive real performance improvements.

Data backfills

As business priorities change, you may find that data you have been collecting in the warehouse would be helpful to see in a SaaS product but isn’t available there. If you populate data from the warehouse, you can backfill that history into your SaaS products using the original event timestamps. Backfilling lets you use the data immediately instead of waiting weeks for new event data to accumulate from the front end.
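A rough Python sketch of the idea, assuming warehouse rows carry their original event timestamps; the row shapes and `backfill_rows` helper are illustrative.

```python
from datetime import datetime

def backfill_rows(rows: list, start: datetime, end: datetime) -> list:
    """Select historical rows in [start, end) and re-emit each one with
    its original timestamp so the destination orders events correctly."""
    return [{**row, "timestamp": row["ts"].isoformat()}
            for row in sorted(rows, key=lambda r: r["ts"])
            if start <= row["ts"] < end]

# Hypothetical order history already sitting in the warehouse.
warehouse_rows = [
    {"event": "Order Completed", "user_id": "u1", "ts": datetime(2025, 1, 5)},
    {"event": "Order Completed", "user_id": "u2", "ts": datetime(2025, 2, 10)},
    {"event": "Order Completed", "user_id": "u3", "ts": datetime(2025, 3, 20)},
]

# Backfill January and February history into a newly added SaaS tool.
backfilled = backfill_rows(warehouse_rows,
                           datetime(2025, 1, 1), datetime(2025, 3, 1))
# Two events are selected, each stamped with its original timestamp.
```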

Event curation

Many SaaS products charge customers based on the volume of events collected: the more data you send, the more you pay. But not all events are created equal. Some are critical, and others are just noise. In digital analytics, for example, you may not care how often people view your careers page, but if it gets a lot of traffic, you are paying for that traffic.

However, if you route data through the data warehouse, you can choose what data you send to each SaaS product. If certain events are low value, you can skip sending them downstream to save money. This can also be an effective way to prevent "bot" data from polluting your digital analytics and other SaaS products!
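A small Python sketch of per-destination event curation; the allow-lists and bot flag are hypothetical, and a Composable CDP would express the same idea as per-destination sync filters.

```python
# Hypothetical allow-lists: only these events reach each destination.
DESTINATION_ALLOWLISTS = {
    "digital_analytics": {"Product View", "Add to Cart", "Order Completed"},
    "ad_platform": {"Order Completed"},
}

def destinations_for(event: dict, allowlists: dict) -> list:
    """Return the destinations that should receive this event."""
    if event.get("is_bot"):  # drop suspected bot traffic entirely
        return []
    return [dest for dest, allowed in allowlists.items()
            if event["event"] in allowed]

# A low-value page view is billed by no one because it is sent to no one.
destinations_for({"event": "Careers Page View"}, DESTINATION_ALLOWLISTS)  # []
# A purchase fans out to every destination that needs it.
destinations_for({"event": "Order Completed"}, DESTINATION_ALLOWLISTS)
```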

Improve data quality via a hub-and-spoke data warehouse approach

Collecting customer event data is meaningless if the data quality is poor. If your internal stakeholders don’t trust the data, they won’t use it. The following are a few examples of leveraging the data warehouse to improve data quality.

Data governance

Modern data engineering teams often apply software development practices to managing the data warehouse: well-documented models, lineage, dependencies, versioning, and reviews that keep production environments stable and reliable. Few companies apply the same rigor to event instrumentation, resulting in duplicative and conflicting efforts to maintain data dictionaries and track downstream data dependencies. Leveraging the governance investment already made around the data warehouse reduces the reliance on the less robust governance common in tracking implementations that send data directly to SaaS products.

Improved identity resolution

Many SaaS products have some form of identity resolution to recognize if the current user is one they have seen before. Identity resolution can be important for marketing attribution and deduplication. If a SaaS product only has visibility into a subset of the data about a customer, it is nearly impossible for identity resolution to be accurate.

However, if event data is routed through the data warehouse, customer identifiers can be resolved against many more customer data sources to improve identity resolution. Triangulating among multiple customer data sources provides a much more accurate identity resolution. Once identities are resolved, the resolved ID can be sent to downstream SaaS products to improve attribution, de-duplication, etc.
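As a toy illustration of the triangulation idea, here is a union-find sketch in Python that merges identifiers observed together across sources; real identity resolution engines use far more sophisticated matching rules than this.

```python
class UnionFind:
    """Minimal union-find for grouping identifiers into one identity."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Hypothetical records: identifiers observed together in each source.
records = [
    ["email:adam@example.com", "device:abc"],  # web event
    ["device:abc", "loyalty:L42"],             # mobile app login
    ["loyalty:L42", "pos:card-991"],           # in-store purchase
]

uf = UnionFind()
for ids in records:
    for other in ids[1:]:
        uf.union(ids[0], other)

# All four identifiers resolve to a single canonical identity.
resolved = {uf.find(i) for rec in records for i in rec}
```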

Reduce user property identify calls

When events are implemented via SDKs directly against SaaS tools, they are typically accompanied by many identify calls that provide a known user ID and update the SaaS product’s user properties. Many of these updates are redundant: the value may already exist in the SaaS tool, or a property may be overwritten with stale data. Updating core user properties directly from the data warehouse is a more reliable, streamlined way to keep values consistent across your SaaS applications.
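A minimal Python sketch of the diff-before-update pattern; `saas_state` is a hypothetical stand-in for the destination's current view of the user.

```python
def diff_properties(warehouse_row: dict, saas_state: dict) -> dict:
    """Return only the user properties whose values actually changed,
    so the sync sends one targeted update instead of many identify calls."""
    return {key: value for key, value in warehouse_row.items()
            if saas_state.get(key) != value}

# Hypothetical current state in the SaaS tool vs. the warehouse's truth.
saas_state = {"plan": "pro", "city": "Chicago"}
warehouse_row = {"plan": "enterprise", "city": "Chicago", "ltv": 2450.0}

update = diff_properties(warehouse_row, saas_state)
# Only "plan" and "ltv" are sent; "city" is unchanged and skipped.
```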

Improve data activation via a hub-and-spoke data warehouse approach

Many tech stacks exist mainly to turn customer data into value for the organization. Value creation is often done by building customer audiences and activating those audiences through messages, experiences, or offers. The following outlines ways to use your data warehouse to improve customer activation.

Improved digital advertising performance

Most organizations strive to improve the quality and cost-efficiency of digital advertising. One of the main ways to optimize advertising is through advertising network conversion APIs (CAPI). Data warehouses can help your organization improve its use of CAPI and can decrease your cost per acquisition. How? Today, most ad platforms use powerful AI algorithms to determine who to target, what ad variation to show, and how much (or how little) to bid on ad placements. The better these algorithms work, the better your ads perform, meaning you can drive more conversions at a lower cost. The AI algorithms rely heavily on the signals you share through Conversion APIs to perform well. The events you send provide AI training data that helps the algorithms learn and make better decisions on optimizing your campaigns.

The problem, however, is that each signal needs to be matched back to a known user for the AI algorithms to “learn” from the data. Therefore, you need to hydrate your events with customer information they can match back, like hashed emails, phone numbers, and residential addresses. But that data isn’t typically attached to your event tags and may be spread across multiple systems, including your POS, shipping system, loyalty program, etc. But if you use a Composable CDP and pass your events through the warehouse, you can easily join multiple datasets with each event before passing them to the ad platform. You can even use tools like Match Booster to enrich your events with third-party data if you don’t have any of your own to add. Enriching your data with trusted third-party data increases your match rates for every Conversion API event, provides more training data to the AI algorithms, and improves the efficiency and effectiveness of your digital advertising performance.
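To illustrate the matching step, here is a Python sketch that prepares hashed match keys for a conversion API payload. The normalization (trim, lowercase) and SHA-256 hashing follow the pattern ad platforms commonly document for customer-data matching; the exact payload fields vary by platform and are illustrative here.

```python
import hashlib

def hash_match_key(value: str) -> str:
    """Normalize and SHA-256 hash a customer identifier for matching."""
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Hypothetical conversion event, enriched in the warehouse with the
# customer's email and phone from the order and loyalty tables.
payload = {
    "event_name": "Purchase",
    "user_data": {
        "em": hash_match_key("  Adam@Example.com "),
        "ph": hash_match_key("+13125550123"),
    },
    "custom_data": {"value": 499.00, "currency": "USD"},
}
# The raw email never leaves your stack; only the hash is shared.
```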

Another interesting benefit of populating event data from the warehouse is respecting user consent requests. When you have multiple SaaS products, knowing what data can and cannot be collected for each user can be difficult. However, if you store all consent approvals in the data warehouse, a Composable CDP like Hightouch can check those consent flags globally before sending user data to downstream SaaS products. Instead of managing consent separately in each SaaS product, it can be managed in one place, automatically preventing marketers from accidentally using unconsented customer data.
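A minimal Python sketch of a global consent gate; the consent flags, purposes, and destination names are hypothetical.

```python
# Hypothetical consent flags stored centrally in the warehouse.
CONSENT = {
    "adam_123": {"analytics": True, "advertising": False},
}

# Each destination is tied to the purpose it needs consent for.
DESTINATIONS = {"digital_analytics": "analytics", "meta_ads": "advertising"}

def allowed_destinations(user_id: str, destinations: dict,
                         consent: dict) -> list:
    """Return only the destinations this user has consented to."""
    purposes = consent.get(user_id, {})  # unknown users get nothing
    return [dest for dest, purpose in destinations.items()
            if purposes.get(purpose, False)]

# Adam's data may flow to analytics, but never to the ad platform.
allowed = allowed_destinations("adam_123", DESTINATIONS, CONSENT)
```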

As the threat of third-party cookie deprecation waxes and wanes, many companies aim to rely less on client-side tracking, which browser and OS changes and ad-blocking technologies can disrupt, along with the analytics and business use cases that depend on it. By definition, leveraging the data warehouse moves SaaS applications to a more reliable, robust data source that is not directly affected by cookie deprecation.

Improve executive reporting via a hub-and-spoke data warehouse approach

One final advantage of populating SaaS data from the warehouse is improved KPI alignment and executive reporting. A recurring problem with SaaS products is that the KPIs in each one never quite match. If you use a SaaS digital analytics product, for example, it is highly unlikely that the number of orders it reports will match the number of orders in an executive BI dashboard. The difference is often due to different event collection methodologies. Discrepancies like this erode trust in one or both systems, and if they happen too often, people can lose their jobs!

But if you populate your SaaS products (in this example, digital analytics) from the data warehouse, the numbers in the SaaS product and the BI dashboard should match, since both use the warehouse as their source.

Final thoughts

Over the past few decades, organizations have become accustomed to using a separate data collection method for each SaaS product. Over time, this has created issues ranging from incomplete, inaccurate, and disjointed data to performance problems. In this post, I shared a different, hub-and-spoke approach in which multiple SaaS products are populated from a centralized data warehouse. The twelve advantages above are not exhaustive, but as more and more organizations move to this approach, you should consider putting the data warehouse at the center of your stack as you re-platform or rethink future tech stack strategies.
