How can AI & ML strengthen data governance practices?
27 April 2022
Over the past few years, Artificial Intelligence & Machine Learning technologies have been on the rise across many industries. While most of us probably think of smart assistants (Alexa, Siri, or Google Assistant), website chatbots, or (why not?) self-driving cars when talking about AI & ML, we shouldn't lose sight of AI & ML developments in data governance. Which developments can strengthen data governance practices, and how will they most likely do that? And what is already possible today? Let me share some insights from my experience.
First things first: What exactly are AI & ML?
AI, or Artificial Intelligence, is a branch of computer science focused on developing machines that can perform tasks that would typically require some form of human intelligence. "Artificial Intelligence" is a rather vague term that covers many different things, and most of what is dubbed "Artificial Intelligence" actually refers to "advanced analytics."
ML, or Machine Learning, is a branch of AI dedicated to using data and algorithms to mimic how humans learn. What sets ML apart is that its models can automatically improve their accuracy through experience and exposure to more data.
Three practical examples of how AI & ML can strengthen data governance practices
1. Automated data classification
Several data catalogs already provide Machine Learning and Named Entity Recognition (NER) capabilities to identify sensitive information automatically. As ML algorithms learn from human interactions, they help refine data classification processes and make the proposed classifications more accurate. Because the models keep learning, the need for further human intervention also decreases over time.
However, all of this comes with an important caveat: the outcome of the entire process still depends largely on the quality of your data. With success rates that may still dip below 20%, AI and ML have a learning curve ahead of them before automated data classification becomes standard practice. Nevertheless, even though manual validation is still required to accept or reject data classification proposals, I consider this a big step forward compared to the manual classification procedures most companies are using today.
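To make the propose-then-validate workflow concrete, here is a minimal Python sketch. It is not how any particular data catalog works: simple regular expressions stand in for the ML and NER models a real catalog would use, and the names PATTERNS, Proposal, propose_labels, and review are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns: a stand-in for the trained ML/NER models
# a real data catalog would use to recognize sensitive data.
PATTERNS = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "PHONE": re.compile(r"^\+?\d[\d\s().-]{6,}\d$"),
}


@dataclass
class Proposal:
    column: str
    label: str
    confidence: float        # share of sampled values matching the pattern
    status: str = "pending"  # a data steward sets "accepted" or "rejected"


def propose_labels(column, values, threshold=0.8):
    """Return one proposal per label whose match ratio reaches the threshold."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return []
    proposals = []
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        ratio = hits / len(non_empty)
        if ratio >= threshold:
            proposals.append(Proposal(column, label, ratio))
    return proposals


def review(proposal, accept):
    """Record the manual decision on a classification proposal."""
    proposal.status = "accepted" if accept else "rejected"
    return proposal
```

The threshold reflects the data quality caveat above: a column with many dirty or missing values produces a low match ratio and therefore no proposal, and every proposal stays "pending" until a person accepts or rejects it.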
2. Automated extraction of terms, definitions, policies, and controls from documents
AI and ML can be used to automate the creation of a set of business terms that organizations can then import into an enterprise data catalog to govern their data. Today, for example, natural language processing (NLP), the AI discipline that deals with understanding text, makes it feasible to pull insights from texts and convert legal documents into plain language that non-legal professionals can understand.
And while the automated extraction of terms and definitions is possible, I have some doubts about the ability of AI and ML to extract policies and controls, unless these are explicitly and clearly stated in the processed documents. Further technological developments are needed to increase the quality and reliability of such extraction results.
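The point about explicit wording can be illustrated with a small Python sketch. It is a rule-based stand-in, not real NLP: it only finds definitions that follow an assumed pattern (a term in straight double quotes followed by "means" or "refers to"), and the names DEFINITION and extract_glossary are hypothetical.

```python
import re

# Assumption: terms are written in straight double quotes and introduced
# with "means" or "refers to"; the definition runs up to the next period.
DEFINITION = re.compile(
    r'"(?P<term>[^"]+)"\s+(?:means|refers to)\s+(?P<definition>[^.]+)\.'
)


def extract_glossary(text):
    """Return {term: definition} for every explicitly stated definition."""
    glossary = {}
    for match in DEFINITION.finditer(text):
        term = match.group("term").strip()
        # Collapse line breaks and repeated spaces inside the definition.
        definition = " ".join(match.group("definition").split())
        # Keep the first definition if a term is defined more than once.
        glossary.setdefault(term, definition)
    return glossary
```

The resulting dictionary could serve as a draft business glossary for review before import into a catalog. Anything not phrased in the assumed pattern, such as a policy that is only implied by the text, is silently missed, which is exactly the limitation described above.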
3. Automated recommendation of additional data sets of interest
Based on your previous interactions, most data catalogs can predict which data sets you will most likely be looking for. As a result, they can even recommend additional data sets that may be of interest to you.
As data consumers from different communities search the data catalog for assets that meet their needs, ML models use active learning to improve search results and recommendations for individual users. They do so by incorporating previous user selections and actions to refine their predictions iteratively.
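As a rough illustration of recommendations based on previous interactions, the Python sketch below suggests data sets used by other users with overlapping histories. It is a plain co-occurrence count, not the active learning approach described above, and the recommend function and the shape of the history dictionary are assumptions made for this example.

```python
from collections import Counter


def recommend(history, user, top_n=3):
    """Suggest data sets used by users whose history overlaps with `user`'s.

    `history` maps each user name to the data sets that user has opened.
    """
    seen = set(history.get(user, ()))
    scores = Counter()
    for other, datasets in history.items():
        if other == user:
            continue
        overlap = len(seen & set(datasets))
        if overlap == 0:
            continue  # no shared interest, so ignore this user
        for dataset in set(datasets) - seen:
            scores[dataset] += overlap  # weight by similarity to `user`
    # Sort by score (descending), then by name for a deterministic order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [dataset for dataset, _ in ranked[:top_n]]
```

A user who has opened "sales" and "customers" would be pointed to whatever colleagues with the same two data sets also opened. A new user with no history gets no suggestions, which is one reason real catalogs combine interaction data with classification and content signals.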
Where are AI & ML in data governance today?
In my experience, automated data classification and user recommendations represent the most relevant use cases of AI and ML in data governance today. For example, Machine Learning models can recognize which type of data is stored in a specific data column, and automatically schedule and maintain metadata scans. And a data catalog can use Machine Learning algorithms to understand usage patterns, correlations between individual user requests and the selected data assets, and a user’s affinity for specific data sources based on their classification and contents.
And while the widespread adoption of AI and ML in data governance is still in progress, the existing integrations of AI and ML technologies in data catalogs already help automate operational aspects of data governance processes today. Sure, there’s still a way to go in building ML models and exploring additional opportunities where AI or ML could play a role. But make no mistake: both AI & ML are here to stay in data governance. You should, therefore, always take them into account when choosing a data governance tool or developing new data governance processes.