As a data engineer, you should consider how changes can be done in a non-breaking way.
A non-breaking change to a data model is something that won't have any down stream impact, like adding a column or re-ordering columns. Adding columns only impacts down stream models when they do SELECT * statements, which is one of the reasons why it's best practice to avoid them.
On the other hand, a breaking change will have significant impact on down stream models and usually requires expensive back-fills. An example of a breaking change is modifying a WHERE statement which changes the cardinality of a table.
If you're working at any significant scale where it's expensive and time consuming to back-fill many tables, consider whether or not a change can be done in a backwards compatible way and how expensive a breaking change would be. If it's not very expensive to make a breaking change, it can be easier to maintain since all models are kept up to date without any legacy, so there's always a trade-off.
Even if it's not too costly to back-fill many tables, it can be time consuming communicating breaking changes to stakeholders or validating all data consumers are up to date. Arguably, this is even more challenging than the technical/compute costs of breaking changes.
As a software engineer, it's commonplace to consider whether changing an API or a database model should be done in a breaking or non-breaking fashion. I believe this best practice should be adopted by data teams as well.
That's why we designed
#SQLMesh to provide automatic detection of breaking and non-breaking changes by analyzing your SQL queries. This allows you to assess the impact of your changes at compile time and understand potential costs (both compute and organizational) before you finalize your changes.
tobikodata.com/automatically…