The Warehouse Sync Logs feature writes sync metadata back into your data warehouse. It makes per-row information from the live debugger available in your warehouse so that you can perform complex analysis, rather than just inspect syncs row-by-row.
When you enable Warehouse Sync Logs, Hightouch logs a corresponding row for every row processed during the sync. This includes the row's status and any errors from processing the row. You can then explore these logs using SQL or BI tools you use on top of your warehouse.
For example, you can:
Categorize all the errors in your sync using regular expressions and find unexpected errors.
Filter out previously failed rows from your model using a JOIN.
Aggregate the sync history to see what rows are changing the most. Flapping rows can be a sign of data integrity issues.
Visualize row changes in your models over time. For example, you may be interested in seeing how targeted users in an ad campaign changed over the campaign duration.
Hightouch writes sync logs into three tables within the hightouch_audit schema:
Changelog: This table contains a row for every operation performed by Hightouch. It includes the result of the operation and any error messages from syncing.
Snapshot: This table contains each row's latest status in your model. The information is similar to the Changelog table, but since it contains the latest status, it's easier to query for some use cases.
Sync runs: This table contains a log of all the sync runs. You can JOIN the changelog and snapshot tables to this table for more information on when the sync occurred and how it was configured.
Information across all syncs is written into these same three tables—you can differentiate which rows were part of which sync using the sync_id column.
See the detailed schema section for a detailed description of the available columns.
Enabling Warehouse Syncs logs requires you to enable the Lightning sync engine first. Hightouch supports using Warehouse Sync Logs with the following sources:
The user you used to connect your source to Hightouch must be able to write into the hightouch_audit schema. You shouldn't require any additional permissions once you've set up the Lightning sync engine.
The following example queries are written for Snowflake, but you could create similar queries for other sources. Check out Hightouch's dbt package for more use cases.
This SQL tracks when users enter and exit a model. It's particularly useful when used with Customer Studio audiences and visualized in a BI tool.
with details as (
select
model_name,
row_id,
op_type as type,
started_at as timestamp,
lag(op_type) over(partition by model_name, row_id orderby started_at) as lag_type
from hightouch_audit.sync_changelog c
join hightouch_audit.sync_runs r on c.sync_id = r.sync_id
where op_type != 'changed'orderby model_name, row_id
)
select
row_id as user_id,
model_name as audience,
type,
timestamp
from details
where (lag_type != type or lag_type is null)
orderby model_name, row_id, timestamp
This SQL finds all current (most recently synced rows that didn't fail) across all models. It's particularly useful to find all memmers for audiences created in Customer Studio.
with model_names as (select distinct
sync_id,
model_name
from hightouch_audit.sync_runs
)
select
model_name,
row_id as user_id
from hightouch_audit.sync_snapshot s
join model_names r on s.sync_id = r.sync_id
where s.status != 'failed'qualify row_number() over (partition by user_id, model_name order bynull) = 1
order by user_id
Hightouch writes to the sync_changelog,sync_snapshot, and sync_runs tables after each sync.
If you've enabled audience snapshotting, you'll also find a hightouch_planner.audience_membership table in your warehouse.
Refer to the audience snapshot docs for its detailed schema.
This hightouch_audit.sync_changelog table is an append-only log of all changes across all sync runs. If the same row is synced in multiple sync runs, it has multiple entries in this table.
COLUMN
DESCRIPTION
sync_id
The ID of the sync
sync_run_id
The ID of the sync run
op_type
Whether the row was added, changed, or removed relative to the last run. This is computed by Hightouch when planning the sync run
row_id
The value of the row's primary key as defined from the model
status
Whether the row was successfully synced into destination. They value may be: succeeded - the row was successfully synced, failed - Hightouch attempted to sync the row, but it failed to sync, or aborted - Hightouch planned to sync the row, but didn't attempt to sync. This may happen if the sync may have been cancelled, or the sync encountered a fatal error that terminated the run early.
failed_reason
If the status is failed, this field contain a string describing why the row failed to sync.
fields
A JSON object of the raw data from the model that is synced into the destination. Note that this is the raw data from the warehouse, not the payload that Hightouch sent to the destination. This column has limitations in Redshift (see FAQ for details, below).
split_group
(Optional) The split group name. If no syncs are using splits, this column isn't created.
This hightouch_audit.sync_snapshot table stores the current status of each row in the most recent sync run, even if the row wasn't synced in the most recent run.
After each run, the old statuses for the sync are dropped and replaced with updated statuses.
COLUMN
DESCRIPTION
sync_id
The ID of the sync
op_type
Whether the row was added, changed, or unchanged relative to the last run
row_id
The value of the row's primary key as defined from the mode
status
The status of the row. See the sync_changelog.status description for a list of possible statuses
failed_reason
If the status is failed, this contains a string describing why the row failed to sync
fields
The fields from the model for this row. See the sync_changelog.fields description for more information
split_group
(Optional) The split group name. If no syncs are using Audience Splits, this column isn't created.
This hightouch_audit.sync_runs table stores general metadata information about each sync run. You can join the sync_changelog and sync_snapshot tables using the sync_id column.
COLUMN
DESCRIPTION
sync_id
The ID of the sync
sync_run_id
The ID of the sync run
primary_key
The primary key column of your sync as defined on the model attached to the sync.
destination
The destination type, for example, Salesforce or Braze
model_name
The name of the model attached to the sync
model_id
The ID of the model attached to the sync
status
The status of the sync run. This will be either succeeded or failed. In general, the per-row results of the sync are a better indication of status.
error
The sync-level error if the sync terminated early
started_at
When the sync run started
finished_at
When the sync run finished
num_planned_add
The number of planned adds.
num_planned_change
The number of planned changes.
num_planned_remove
The number of planned removes.
num_attempted_add
The number of planned adds that were actually attempted.
num_attempted_change
The number of planned changes that were actually attempted.
num_attempted_remove
The number of planned removes that were actually attempted.
num_succeeded_add
The number of planned adds that were successfully synced to the destination
num_succeeded_change
The number of planned changes that were successfully synced to the destination
num_succeeded_remove
The number of planned removes that were successfully synced to the destination
num_failed_add
The number of planned adds that were attempted, but failed to get synced into destination
num_failed_change
The number of planned changes that were attempted, but failed to get synced into destination
num_failed_remove
The number of planned removes that were attempted, but failed to get synced into destination
The performance impact of enabling Warehouse Sync Logs is low since it reuses data already present from the Lightning sync engine. Hightouch only writes the rows after Hightouch syncs to the destination, meaning there is no effect on destination throughput.
Pruning entries in the history tables is safe. Doing so doesn't affect future syncs, though the deleted rows won't be rewritten into the history tables.
Because Redshift does not support strings longer than 65,535 bytes,
it is not possible to store JSON representations of arbitrary length in the warehouse sync logs. Hightouch attempts
to store model data in the fields column of sync_changelog and sync_snapshot, but if the model contains either
long strings or a large number of fields, some fields may be truncated.
Truncated values in sync_changelog.fields and sync_snapshot.fields do not mean that values were truncated
when Hightouch synced data to the destination; it's purely a limitation of the warehouse logs.
Ready to get started?
Jump right in or a book a demo. Your first destination is always free.