Enhancing data quality with data contracts: A pragmatic approach

Data quality and data management challenges have remained a constant throughout my career, from my early days as a “rocket scientist” developing image-guided missile systems to my data leadership days building analytics platforms in financial services. We can land a person on the moon, again and again (the next human mission to the moon is scheduled for the Fall of 2025), yet data quality challenges remain prevalent.

There is a particular gravity to that sentiment. What will it take to finally innovate through this challenge that has stumped the data world for so long? As we enter the Data Renaissance, could data contracts finally offer a viable solution?

What are data contracts?

Data contracts are formal agreements that define the expectations and standards for data quality, format, structure, semantics, and usage. They serve as a governance tool, ensuring data meets specific quality criteria irrespective of its source or destination. Quality data is essential in today's data-driven world, where poor data quality can lead to significant operational inefficiencies, LLM hallucinations, and decision-making errors.
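To make the definition concrete, here is a minimal sketch of a contract expressed in code. It covers only structure and types, and every name in it (`FieldRule`, `DataContract`, the `orders` dataset and its fields) is a hypothetical illustration rather than part of any standard or product.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldRule:
    """One field's expectations: name, Python type, and whether it is required."""
    name: str
    dtype: type
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    """A hypothetical, minimal contract: a dataset name, an owner, and field rules."""
    name: str
    owner: str
    fields: tuple[FieldRule, ...]

    def violations(self, record: dict[str, Any]) -> list[str]:
        """Return human-readable problems found in one record (empty if none)."""
        problems = []
        for rule in self.fields:
            value = record.get(rule.name)
            if value is None:
                if rule.required:
                    problems.append(f"{rule.name}: missing required value")
                continue
            if not isinstance(value, rule.dtype):
                problems.append(
                    f"{rule.name}: expected {rule.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
        return problems


# Illustrative contract for an assumed "orders" dataset.
orders_contract = DataContract(
    name="orders",
    owner="sales-data-team",
    fields=(
        FieldRule("order_id", str),
        FieldRule("amount_usd", float),
        FieldRule("coupon_code", str, required=False),
    ),
)

good = {"order_id": "A-100", "amount_usd": 19.99}
bad = {"order_id": 100, "coupon_code": "SPRING"}
print(orders_contract.violations(good))
print(orders_contract.violations(bad))
```

A real contract would also carry semantics, quality thresholds, and usage terms. The point of the sketch is only that expectations become explicit and checkable instead of assumed.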

Why do we need data contracts?

Imagine a scenario where two teams, one from Lockheed Martin in Colorado and the other from NASA’s Jet Propulsion Lab, are working on a groundbreaking project—the Mars Climate Orbiter. The year is 1998. Both teams are experts in their fields. Their mission is to navigate the Orbiter into Mars’ orbit, a feat that requires precision and seamless collaboration.

The big day comes on September 23rd, 1999, when the Orbiter is poised to enter Mars' orbit, a moment expected to be the triumphant culmination of years of hard work. Instead, the spacecraft, now off course, skims too close to the Martian atmosphere, leading to its disintegration. After investing $128 million and countless resources, the mission anticipated to yield groundbreaking scientific data suddenly ended in ruins. What happened?

There was an invisible fault line: assumptions in data communication and service level agreements (SLAs). As the project progressed, a critical but unnoticed error crept in. The Lockheed team's software reported thruster impulse in pound-force seconds. Meanwhile, the NASA team, assuming standard metric units, interpreted the same numbers as newton-seconds, understating each value by a factor of about 4.45. The failure of each team to understand the meaning of that number, coupled with the failure to agree upon a unit of measure, seemed minor, but it was like a hairline crack in a dam: unnoticed and potentially catastrophic.
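A contract can rule out this kind of mismatch by refusing any number that arrives without a unit. The sketch below assumes the exchanged quantity is thruster impulse, with the producer working in pound-force seconds and the consumer expecting newton-seconds. The `Impulse` type, the unit labels, and `to_newton_seconds` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# One pound-force is defined as exactly 4.4482216152605 newtons.
NEWTONS_PER_POUND_FORCE = 4.4482216152605

# Units the (hypothetical) contract allows, with factors into newton-seconds.
TO_NEWTON_SECONDS = {
    "N*s": 1.0,
    "lbf*s": NEWTONS_PER_POUND_FORCE,
}


@dataclass(frozen=True)
class Impulse:
    """A measurement that always travels with its unit."""
    value: float
    unit: str


def to_newton_seconds(measurement: Impulse) -> float:
    """Convert to N*s, rejecting any unit the contract does not list."""
    if measurement.unit not in TO_NEWTON_SECONDS:
        raise ValueError(f"unit {measurement.unit!r} is not allowed by the contract")
    return measurement.value * TO_NEWTON_SECONDS[measurement.unit]


# The producer states its unit, so the consumer converts instead of guessing.
print(to_newton_seconds(Impulse(10.0, "lbf*s")))
```

With the unit declared in the payload, a bare `10.0` can no longer be silently read as newton-seconds, and an unexpected unit fails loudly at the boundary between teams.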

This preventable and unfortunate incident underscores the importance of establishing clear, robust data contracts in collaborative projects. It's a reminder that assumptions, especially in the realm of data and interpretation, can lead to significant failures. That’s why we need data contracts.

Plenty of stories like these impact businesses every day. While the outcomes are rarely as catastrophic, data quality concerns and miscommunications have huge impacts on businesses, from general frustration to significant financial losses. Recent research by Experian makes the eye-opening claim that companies lose 15 to 25 percent of their revenue due to poor-quality data.

Benefits of implementing data contracts

Wherever the accuracy and usability of data and insights are paramount, implementing data contracts should emerge as a key practice. These contracts act as binding agreements on the format, quality, frequency, service level, and management of data exchanged between producers and consumers of data.

Standardization: Data contracts improve standardization and consistency, establishing clear guidelines for formatting and structuring data, leading to standardized and consistent data across various systems and departments.

Integrity: By setting predefined quality benchmarks, data contracts help maintain the integrity and reliability of data. They act as a quality control measure, ensuring that data is fit for its intended use. The Mars Climate Orbiter mission would have benefited greatly from data contracts.

Accountability: Data contracts facilitate greater collaboration between different teams and departments by providing a shared understanding and language around data. This works to hold parties accountable for the quality of data they produce or consume, ensuring compliance with organizational data policies and standards.
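The quality and service-level terms described above can be checked automatically each time a batch is delivered. This is a minimal sketch under assumed terms: a maximum staleness and a maximum share of missing values in one column. `ServiceLevel`, `check_batch`, and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ServiceLevel:
    """Hypothetical contract terms for one dataset."""
    max_staleness: timedelta    # how old the newest batch may be
    max_null_fraction: float    # allowed share of missing values in a column


def check_batch(rows, column, loaded_at, now, terms):
    """Return a list of contract breaches for one delivered batch."""
    breaches = []
    if now - loaded_at > terms.max_staleness:
        breaches.append("freshness: batch is older than the agreed maximum")
    if rows:
        nulls = sum(1 for row in rows if row.get(column) is None)
        if nulls / len(rows) > terms.max_null_fraction:
            breaches.append(f"quality: too many missing values in {column!r}")
    else:
        breaches.append("quality: batch is empty")
    return breaches


terms = ServiceLevel(max_staleness=timedelta(hours=24), max_null_fraction=0.1)
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
loaded_at = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)  # 36 hours earlier
rows = [{"amount": 1.0}, {"amount": None}]

for breach in check_batch(rows, "amount", loaded_at, now, terms):
    print(breach)
```

Because the thresholds live in the contract rather than in someone's head, producers and consumers can both see which term was breached, which is what makes accountability practical.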

3 myths about data contracts

1. Data contracts require data mesh

While data mesh and data contracts complement each other well, it's important to note that data contracts can be implemented independently of a data mesh architecture. They offer a more straightforward approach to improving data quality, focusing on the immediate environment without needing a complete overhaul of your data architecture.

2. There are no standards for data contracts

While it is true that data contracts are an emerging architectural approach, they build on tried-and-true methods. Data quality, schema evolution, SLAs, and data specifications have been around for a long time, and data contracts are a way to better organize and formalize these standards.

In late 2023, AIDA User Group and the Linux Foundation AI & Data joined forces to create Bitol.io. Bitol has defined an open standard for data contracts called the Open Data Contract Standard. 

So, yes, standards exist and continue to evolve. For example, PayPal recently open-sourced its data contract specification, which has evolved into the Open Data Contract Standard hosted by Bitol. Additionally, Google's Protocol Buffers (Protobuf) and Apache Avro are schema formats commonly used in data contract implementations. Chad Sanderson, founder of Gable.ai, has written extensively on using Protobuf.
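As an illustration of how a schema format supports the schema-evolution side of a contract, the sketch below writes two versions of an Avro-style record schema as plain Python dictionaries and flags newly added fields that have no default. In Avro, a reader using the new schema cannot resolve older data for such a field. The `Order` schema and the helper function are hypothetical, and the check is a deliberate simplification, not Avro's full schema resolution rules.

```python
# Version 1 of an assumed "Order" record, in Avro's JSON schema layout.
orders_v1 = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount_usd", "type": "double"},
    ],
}

# Version 2 adds two fields; only one of them declares a default.
orders_v2 = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount_usd", "type": "double"},
        {"name": "currency", "type": "string", "default": "USD"},
        {"name": "channel", "type": "string"},
    ],
}


def added_fields_without_default(old_schema, new_schema):
    """List fields that are new in new_schema and carry no default value."""
    old_names = {field["name"] for field in old_schema["fields"]}
    return [
        field["name"]
        for field in new_schema["fields"]
        if field["name"] not in old_names and "default" not in field
    ]


# Any name listed here would break consumers reading data written with v1.
print(added_fields_without_default(orders_v1, orders_v2))
```

Running a check like this before a producer ships a schema change turns "please don't break downstream" into a rule that can be enforced.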

3. Data contracts are just data specifications and tests

Having published specifications, descriptions, and tests is critical to almost every data initiative. And yes, they are a technical vehicle for data mesh and data testing. Yet, they are often underdeveloped and under-resourced. 

Data contracts do help fill that gap and may entail additional effort. Pay me now or pay me 10x later—ask the Lockheed and NASA teams which they would prefer! Beyond the testing aspect, data contracts provide visibility, management of ongoing data evolution, ownership, and accountability.

The quick-win approach

Implementing data contracts doesn't require an overhaul of your existing data architecture, such as adopting a data mesh. Even a single data contract can be highly beneficial, so consider starting with a quick-win approach.

A quick win is an immediate improvement that delivers tangible business value and is highly visible. The limited scope and low complexity contribute to the ease of implementation while shortening the timeline. 

Considerations for quick wins

When selecting a use case to implement, look for those that have low complexity and high business value. Warning: these may not be easy to identify, in part because others may have already picked the low-hanging fruit. The ideal quick-win use cases go beyond just low complexity and high business value; they can often be extended or adapted to additional lines of business or provide a repeatable process for future use cases. In short, great quick wins contribute to the overall data strategy.

Here’s a summary of practical advice for effective implementation:

1. Start with a measurable, short-term goal. 

Demonstrate the value of data contracts by delivering a measurable result in less than three months. This approach helps gain management support and paves the way for broader implementation and funding. The business value of your initiative is only as good as your ability to earn buy-in from stakeholders.

2. Be conservative on your scope. 

You want to show value, which can tempt you to be overly ambitious. Instead, begin with a manageable scope and focus on specific data challenges or inconsistent results in your organization that map directly to business processes. Target one or two data sources that have not been adequately analyzed or aggregated.

3. Try to stay out of the weeds. 

I often ask myself, “Where is the devil?”  We all know he’s in the details, but while attention to detail is crucial, avoid getting bogged down in minutiae. Delegate tasks effectively and maintain a balance between detail-oriented and big-picture perspectives.

4. Leverage quick wins as soon as possible.  

Early successes with data contracts can earn you credibility, attention, and additional budget. You won't always be able to identify the ideal quick win, so take the good ones you do find. Just don't let these quick wins distract you from the long-term strategic goals.

Pro Tip: Avoid the paradox of quick wins. A series of quick wins is not a data strategy but a good start.

Key takeaways about data contracts

In summary, implementing data contracts can improve analytics by ensuring that self-service analytics tools like ThoughtSpot use the highest quality data and that you maintain your service level agreements with your business stakeholders. For both upstream data producers and downstream data consumers, data contracts are a powerful tool for improving data quality, whether or not they are part of a larger data mesh implementation.

They offer a structured, standardized approach to managing data quality, enhancing collaboration, and ensuring compliance. Organizations can significantly improve the quality and value of their data assets by starting small, focusing on achievable goals, and gradually expanding their application. Remember, the journey to excellent data management is incremental, and each small step can significantly improve data quality and organizational efficiency.