Challenge
Migrating millions of data assets to the cloud is a complex project that involves multiple teams and resources. For a major cloud migration project at one of our clients, a large telecommunications company, the internal project team needed to understand how data-driven teams within the organization depend on the availability of the physical data assets to be migrated, and how, for example, the unavailability of specific data tables or columns would impact the consuming assets (such as BI reports or Marketing campaigns) that rely on them.
Our client considered this cloud migration project the perfect opportunity to demonstrate the value that Data Governance can bring to their entire organization. They had already implemented an operational Data Governance environment, pushing the metadata of millions of assets stored in their data warehouse, data lake or local databases to Collibra. To further boost the adoption of the Collibra platform and facilitate this major cloud migration project, this client decided to reach out to Datashift to take on the challenge together.
Approach
As a first step, we focused on scoping the consuming assets impacted by the cloud migration. For example, which consuming assets depend on data tables or columns that will move to the cloud? Which relevant metadata do we want to govern in Collibra for these consuming assets? The outcome of this scoping exercise was the decision to include all BI reports, Machine Learning and AI models, and Marketing campaigns in Collibra, along with the links to the physical data assets on which those consuming assets depend.
Next, we worked closely with the stakeholders from all impacted teams to decide on various questions. Which metadata, for example, should we include in Collibra for these consuming assets? How should we link those consuming assets to the physical data assets? Should we do this on the level of a data table or a data column? And how should we link these data tables and columns to logical data domains within the company (such as Product, Internet Services, Customer, Legal, IT, …) to assign responsible teams per data domain?
Answering those questions proved challenging, since in several cases the required information had to be gathered manually. Additional complexity arose because it was impossible to link all physical data assets at the same level (table or column). We therefore supported all impacted teams with high-level Collibra training and a series of templates in which they could record the relevant metadata for their consuming assets and the links to the physical data assets.
Finally, we uploaded this metadata into Collibra and linked the consuming assets to the physical data assets. In addition, we created Data Lineage diagrams that visualize how all pieces of the puzzle link to each other. These Data Lineage diagrams enabled everybody on the impacted teams to see precisely which physical data assets are used for their BI dashboards, Machine Learning and AI models, or Marketing campaigns. Maybe even more importantly, these diagrams enabled the cloud migration project team to understand how the unavailability of a specific data table or column impacts downstream data consumers.
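The impact question that these diagrams answer can be thought of as a graph traversal: starting from one physical data asset, follow every link downstream and collect what you reach. The snippet below is a minimal sketch of that idea in Python and not the Collibra implementation. The asset names, the edge list, and the helper functions are all hypothetical and serve only as illustration.

```python
from collections import defaultdict, deque

# Hypothetical lineage links as (upstream asset, downstream asset) pairs.
# All names are invented for illustration; none come from the client.
EDGES = [
    ("dwh.customer", "dwh.customer.email"),            # table -> its column
    ("dwh.customer.email", "Campaign: Spring Promo"),  # column-level link
    ("dwh.customer", "Report: Churn Dashboard"),       # table-level link
    ("dwh.usage", "Model: Churn Prediction"),
    ("Model: Churn Prediction", "Report: Churn Dashboard"),
]


def build_graph(edges):
    """Index the links by upstream asset."""
    graph = defaultdict(set)
    for upstream, downstream_asset in edges:
        graph[upstream].add(downstream_asset)
    return graph


def downstream(graph, asset):
    """Return every asset reachable from `asset` via a breadth-first walk."""
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


graph = build_graph(EDGES)
# Which consumers are affected if the usage table is unavailable?
print(sorted(downstream(graph, "dwh.usage")))
```

Because a table is modelled here as upstream of its own columns, a table-level query also picks up consumers that were linked at column level, which is one simple way to cope with links recorded at mixed levels.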
Impact
Using Collibra to manage the metadata of consuming assets (BI reports, Machine Learning and AI models, Marketing campaigns) and to link those consuming assets to physical data assets and logical data domains (Product, Internet Services, Customer, Legal, IT, …) gave everybody in our client’s organization access to the entire lineage of these assets.
BI product owners could see what data is used to build the BI dashboards they are responsible for and when this data is planned for migration to the cloud. For any data table, data architects could see which consuming assets depend on this table and take this dependency into account when detailing the cloud migration plan. Data producers could get an overview of all physical data assets linked to the logical data domain they are responsible for, and of the consuming assets that depend on those physical data assets. Such transparency is vital in enabling all teams to act upon the available information and prevent issues during cloud migration. It is a prerequisite for a smooth process without unforeseen delays or service interruptions.
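The data producer's view described above amounts to grouping tables by logical data domain and listing the consumers of each table. The following Python sketch illustrates that grouping under simple assumptions. The table names, domain assignments, and the `domain_overview` helper are hypothetical and do not reflect the client's actual Collibra configuration.

```python
from collections import defaultdict

# Hypothetical assignment of physical tables to logical data domains.
TABLE_DOMAIN = {
    "dwh.customer": "Customer",
    "dwh.contract": "Legal",
    "dwh.product_catalog": "Product",
}

# Hypothetical links from consuming assets to the tables they read.
CONSUMER_TABLES = {
    "Report: Churn Dashboard": ["dwh.customer", "dwh.contract"],
    "Campaign: Spring Promo": ["dwh.customer", "dwh.product_catalog"],
}


def domain_overview(table_domain, consumer_tables):
    """Map each domain to its tables, and each table to its consumers."""
    overview = defaultdict(dict)
    # Register every table first so tables without consumers still appear.
    for table, domain in table_domain.items():
        overview[domain][table] = []
    for consumer, tables in consumer_tables.items():
        for table in tables:
            domain = table_domain.get(table)
            if domain is not None:  # skip tables with no domain assigned
                overview[domain][table].append(consumer)
    return {
        domain: {table: sorted(consumers) for table, consumers in tables.items()}
        for domain, tables in overview.items()
    }


overview = domain_overview(TABLE_DOMAIN, CONSUMER_TABLES)
for table, consumers in overview["Customer"].items():
    print(table, "->", consumers)
```

A team responsible for one domain would read only its own entry in `overview`, which lists its tables and, per table, the reports or campaigns that would be affected by a migration.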