Reminds me a lot of the phenomenal Hadoop and Kerberos: The Madness Beyond the Gate[1], which, coincidentally, saved me from madness many times. Thanks Steve, I can't fathom what you had to go through to get that cursed knowledge!
Why? SAP holds the most important data for the companies that use it, but it's notoriously difficult to replicate that data consistently into a data analytics platform (think Snowflake, Redshift, etc.).
A couple of companies specialize in SAP replication, but it's hard to validate the correctness of the replicated data because:
- the SAP data is changing continuously and rapidly
- there are hundreds of tables and TBs of data
Usually it's the consumers of data downstream who notice that the data just "doesn't feel right".
Tracelake adds a validation layer on top of the SAP-to-X replication: it periodically compares the data between source and target and tells you about any missing or incorrect data, so you can tackle data quality issues proactively.
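To make the idea concrete, here is a rough Python sketch of a source-vs-target comparison. It is not how Tracelake actually works; the function names, the `key` column, and the row-as-dict layout are all assumptions made up for illustration. It hashes each row and reports keys that are missing from the target, present only in the target, or present in both with different contents.

```python
import hashlib


def row_digest(row):
    """Hash a row's values in a stable (sorted) column order."""
    payload = "|".join(f"{col}={row[col]!r}" for col in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_snapshots(source_rows, target_rows, key):
    """Compare two lists of row dicts that share a primary-key column.

    `key` is a hypothetical primary-key column name. Both inputs are
    assumed to be snapshots taken at (roughly) the same point in time.
    """
    src = {row[key]: row_digest(row) for row in source_rows}
    tgt = {row[key]: row_digest(row) for row in target_rows}

    return {
        # In the source but never arrived in the target.
        "missing": sorted(src.keys() - tgt.keys()),
        # In the target but no longer (or never) in the source.
        "unexpected": sorted(tgt.keys() - src.keys()),
        # Present on both sides, but the contents differ.
        "mismatched": sorted(
            k for k in src.keys() & tgt.keys() if src[k] != tgt[k]
        ),
    }
```

Because the SAP data keeps changing, a real check would also have to line up the two snapshots in time (or tolerate rows still in flight), and with TBs of data it would compare per-partition aggregates before drilling down to individual rows. The sketch skips both.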
1 - https://steveloughran.gitbooks.io/kerberos_and_hadoop/conten...