Like many of us did in high school, my kids have recently discovered Pink Floyd’s “Another Brick in the Wall (Part 2).” It’s no surprise that teenagers yearning for adulthood identify with a song that tells them that “we don’t need no education.”
They believe that in a few short years they’ll be free from the restraints and restrictions of an authoritarian system that they see limiting their freedom.
Of course, we all know that’s not really what happens. In the world of data and analytics, control and structure are important for many reasons: enforcing security, maintaining privacy, and ensuring that end users draw meaningful conclusions.
Roger Waters didn’t think he was singing about data governance, but he was.
But the world of data governance is changing, enabled by modern technologies that can put more power in the hands of end users without giving up all control. A few years ago there were only two broad options—control everything in IT, or give away the data to individual departments and users.
Today, we have more options. Let’s look at some of the ways data governance is changing, and a few ways it’s not.
1 - Data Quality and Cleansing
One of the most common misperceptions I see is that self-service analytics means handing over every aspect to end users.
That’s simply not true.
In many ways, the traditional jobs of data cleansing and data quality don’t change significantly in a self-service environment. Data cleansing should still happen prior to sharing data with analysts or end users.
Official definitions of metrics still come from a centrally governed and managed process.
Instead of enforcing data quality on a per-report basis—a process that doesn’t scale well—analysts are increasingly focused on quality on a per-metric or per-attribute basis. This curated information is then shared with end users via the self-service platform.
Making sure that the data is correct, meaningful, and timely is still the purview of analysts. Instead of controlling access, analysts are able to focus on meaning.
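To make the per-attribute idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `AttributeRule` class, the `revenue` and `region` attributes, and the region codes are all hypothetical. The point is only that a quality rule is defined once per attribute, then applied to whatever data is about to be shared, rather than being re-implemented in each report.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class AttributeRule:
    """A centrally governed quality rule for a single attribute (hypothetical)."""
    attribute: str
    description: str
    is_valid: Callable[[Any], bool]


# Hypothetical rules, maintained once by analysts and reused everywhere.
RULES = [
    AttributeRule(
        "revenue",
        "must be a non-negative number",
        lambda v: isinstance(v, (int, float)) and v >= 0,
    ),
    AttributeRule(
        "region",
        "must be a known region code",
        lambda v: v in {"NA", "EMEA", "APAC"},
    ),
]


def curate(rows):
    """Split rows into those passing every rule and those failing at least one.

    Rejected rows are returned with the names of the attributes that failed,
    so they can be fixed at the source instead of in each report.
    """
    clean, rejected = [], []
    for row in rows:
        failures = [r.attribute for r in RULES if not r.is_valid(row.get(r.attribute))]
        if failures:
            rejected.append((row, failures))
        else:
            clean.append(row)
    return clean, rejected
```

Only the `clean` rows would be published to the self-service platform. The rejected rows, along with the attributes that failed, go back to whoever owns the source data.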
2 - Total Control is Not the Solution
As the recognition of the value of enterprise data has grown over the past couple of decades, organizations have developed policies and processes to treat data like other valuable assets. Data was secured, meted out to users on a need-to-know basis, and centrally controlled.
Unlike traditional assets, though, data becomes more powerful when more people have easier access to it.
In fact, tightly controlling access to data often has the effect of reducing the value of data. Access to traditional assets is needed to achieve a goal—try buying hardware without access to money—but data isn’t required to make decisions.
It’s only required to make good decisions.
The trick is to maintain a centralized, organization-wide understanding of the meaning and visibility of information without raising the barrier to access so high that the users who would benefit from it decide to live without it.
Cloud computing, along with user experience lessons learned from consumer products like Google and Amazon, is finally enabling that balance. Policies are starting to catch up: they keep definitions and security controls centralized while significantly lowering the barrier to ad hoc access to information.
3 - Trust
Trust is critical to any self-service data project. One of the most common objections I hear from executives evaluating self-service data products is that they’re hesitant to trust their end users with more open access to information.
But trust goes two ways—end users need to trust that the data represents what they think it does, while the traditional owners of data need to trust that end users will draw the right conclusions from the data.
There’s a common misunderstanding that a self-service data strategy means giving end users access to data earlier in the data lifecycle, potentially before it’s been through a cleansing or quality process. In fact, it’s just a different process. Analysts focus on sharing well-defined data via a self-service platform instead of building one-off reports.
The governance process is largely the same, resulting in more timely, easier access to meaningful information.
4 - Security and Privacy
The interest in self-service access to data and data discovery is growing rapidly. Actually achieving either, though, is as much about processes and policies as it is about technology. Even with self-service solutions in place, people should only have access to the datasets they’re authorized to see (security), and data elements that might identify individuals should be masked from people who shouldn’t be able to see them (privacy).
Traditionally, analysts have given data on request to people with the right to see it. Data discovery technologies merely make this easier. Retaining central visibility into who is accessing the data, and when, allows for centralized governance in the same way that Amazon can see your spending habits without you ever talking to them.
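As a rough illustration of how those three ideas fit together, here is a small Python sketch. It is not taken from any particular product: the role names, the dataset names, the `fetch` function, and the in-memory `ACCESS_LOG` are all hypothetical stand-ins. It checks whether a role may read a dataset (security), masks identifying fields for roles that shouldn’t see them (privacy), and records every request centrally (visibility).

```python
from datetime import datetime, timezone

# Hypothetical policy: which roles may read which datasets (security).
DATASET_ROLES = {
    "sales": {"analyst", "sales_manager"},
    "hr": {"hr_partner"},
}

# Hypothetical policy: fields that could identify an individual, and the
# roles allowed to see them unmasked (privacy).
IDENTIFYING_FIELDS = {"name", "email"}
UNMASKED_ROLES = {"hr_partner"}

# Central record of who asked for what, and when (visibility).
ACCESS_LOG = []


def fetch(user, role, dataset, rows):
    """Return the rows this role may see, logging the request either way."""
    allowed = role in DATASET_ROLES.get(dataset, set())
    ACCESS_LOG.append({
        "user": user,
        "dataset": dataset,
        "allowed": allowed,
        "at": datetime.now(timezone.utc),
    })
    if not allowed:
        raise PermissionError(f"role {role!r} may not read dataset {dataset!r}")
    if role in UNMASKED_ROLES:
        return [dict(row) for row in rows]
    # Return masked copies so the underlying data is never modified.
    return [
        {k: ("***" if k in IDENTIFYING_FIELDS else v) for k, v in row.items()}
        for row in rows
    ]
```

Note that the end user never has to ask anyone for the data. The policy is applied at the moment of access, and the log gives the governance team the same picture they used to get from fielding requests by hand.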
A larger challenge is the meaning of information. Analysts have historically shared information only after looking at raw data and ensuring that the meaning is obvious to end users.
Solving this meaning problem requires a combination of policy (trusting end users with the information they need), process (ensuring that any information an end user sees already has business logic applied), and technology (choosing solutions that allow end users to take independent action without losing centralized visibility into what they’re doing).
5 - Closing the Loop
In a traditional data management program with centralized control of everything, it can be a major challenge connecting the owners of the source systems and data stewards with the end users of the information.
They’re often at opposite ends of the value chain.
It’s the data warehousing, business intelligence, and data governance folks who sit in the middle and translate between the two groups who may not even know each other.
Fortunately, modern tools and processes allow for greater visibility for everyone. This lets analysts and others responsible for governance focus on moderating the conversation instead of controlling it.
So what does it mean?
There’s no question that data governance is changing rapidly, and that understanding the new landscape is critical to a successful data strategy.
It’s been my experience, however, that pretty much everyone is aware of this. The bigger challenge is understanding what’s not changing, and leveraging the lessons we’ve learned the hard way to ensure privacy, security, and quality while taking advantage of modern processes and products to free access to information.
My kids may enjoy listening to Pink Floyd while dreaming of the next stage of their lives, but I’m queuing up “Freedom” by Pharrell Williams.