Challenge
Our client, a global enterprise embarking on a data strategy project, faced significant challenges approximately a year ago. They recognized the value hidden in their vast amounts of data but lacked a cohesive approach to unlock it. A cross-domain perspective was clearly needed, yet establishing a future-proof framework proved elusive. Managing the immense volume of data coming from diverse partners added further complexity.
Approach
In response to these challenges, Datashift took a systematic approach to reshaping the client's data landscape. The first step was to consolidate data from disparate domains, scattered across various platforms worldwide, into a single environment on AWS. While supporting the migration from Cloudera to AWS, Datashift carried out thorough data checks to improve overall data quality. We also built standardized reports so that business owners can monitor data quality themselves whenever they spot anomalies in their reports.
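The case study does not describe which data checks were run during the migration. As an illustration only, the sketch below shows the kind of source-versus-target reconciliation such checks often include: row counts, key uniqueness, missing keys and empty required columns. The names `run_migration_checks` and `CheckResult`, and the use of in-memory lists of dicts as stand-ins for tables on the old and new platform, are hypothetical assumptions rather than the client's actual setup.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def run_migration_checks(source, target, key, required):
    """Compare a source extract with its migrated copy (hypothetical helper).

    source, target: lists of dicts, one dict per row (an in-memory stand-in
        for tables read from the old and the new platform).
    key: column that should uniquely identify a row.
    required: columns that must never be empty after migration.
    """
    results = []

    # 1. The migration should neither drop nor duplicate rows.
    results.append(CheckResult(
        "row_count", len(source) == len(target),
        f"source={len(source)} target={len(target)}"))

    # 2. The business key must stay unique in the target.
    target_keys = [row[key] for row in target]
    dupes = len(target_keys) - len(set(target_keys))
    results.append(CheckResult("unique_key", dupes == 0, f"{dupes} duplicate key(s)"))

    # 3. Every key present in the source must also exist in the target.
    missing = {row[key] for row in source} - set(target_keys)
    results.append(CheckResult("no_missing_keys", not missing, f"{len(missing)} missing key(s)"))

    # 4. Required columns must be populated in every target row.
    for col in required:
        empty = sum(1 for row in target if row.get(col) in (None, ""))
        results.append(CheckResult(f"not_null:{col}", empty == 0, f"{empty} empty value(s)"))

    return results


if __name__ == "__main__":
    source = [{"id": 1, "shop": "A"}, {"id": 2, "shop": "B"}]
    target = [{"id": 1, "shop": "A"}, {"id": 2, "shop": ""}]
    for r in run_migration_checks(source, target, key="id", required=["shop"]):
        print(f"{'PASS' if r.passed else 'FAIL'} {r.name}: {r.detail}")
```

Returning structured results instead of raising on the first failure makes it straightforward to feed the outcome into a data-quality report, in the spirit of the standardized reports mentioned above.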
Next, we focused on establishing master data to create a standardized vocabulary used consistently across the company. This ongoing process keeps data on client sites and customers consistent. As soon as we see the same concept being used in more than one domain, we move that data into master data.
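The rule described above, promoting a concept to master data once a second domain uses it, can be sketched as follows. This is a minimal illustration under stated assumptions: each domain is represented simply as a list of concept names, and "the same concept" is approximated by a case- and whitespace-insensitive name match. The functions `normalise` and `master_data_candidates` are hypothetical; real master data matching usually also needs business definitions and reviewed mappings.

```python
from collections import defaultdict


def normalise(concept):
    """Treat 'Site', ' site' and 'SITE' as the same concept (simplifying assumption)."""
    return concept.strip().lower()


def master_data_candidates(domains):
    """Return concepts used in two or more domains, with the domains that use them.

    domains: mapping of domain name -> iterable of concept names used there.
    """
    usage = defaultdict(set)
    for domain, concepts in domains.items():
        for concept in concepts:
            usage[normalise(concept)].add(domain)
    # A concept becomes a master data candidate once a second domain uses it.
    return {concept: sorted(users)
            for concept, users in sorted(usage.items())
            if len(users) >= 2}


if __name__ == "__main__":
    domains = {
        "sales": ["Site", "Customer", "Invoice"],
        "logistics": ["site ", "Truck"],
        "finance": ["customer", "Invoice"],
    }
    print(master_data_candidates(domains))
```

In this example, `Truck` stays local to the logistics domain, while `site`, `customer` and `invoice` are flagged because each appears in two domains.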
The final step was building reports on top of the data. The biggest challenge here was the sheer volume of data: the client needs reporting on 24 months of data for all of its shops, which results in a dataset of over 60 million rows. Since we knew we did not yet cover the whole network, and the data volume would therefore grow fivefold within two years, a future-proof way of working was vital.
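To make the scale concrete, here is a back-of-the-envelope calculation that uses only the figures above and treats "over 60 million" as a lower bound. It gives roughly the monthly intake today and the expected total once the whole network is covered:

```latex
\frac{60 \times 10^{6}\ \text{rows}}{24\ \text{months}} = 2.5 \times 10^{6}\ \text{rows per month},
\qquad
5 \times 60 \times 10^{6}\ \text{rows} = 3 \times 10^{8}\ \text{rows}.
```

If the fivefold growth were spread evenly over the two years (an assumption, since the text gives only the overall factor), the volume would grow by a factor of about $\sqrt{5} \approx 2.24$ per year.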
To manage these large datasets, we employed a two-tiered strategy. On the AWS side, we created dedicated aggregated tables to serve reporting needs efficiently. Between the AWS platform and Power BI, the client's reporting tool, we set up a dataflow that generates the calculated columns needed for reporting. Fields that carry business value were created in the data warehouse, while the dataflow contains only the fields essential for reporting, which keeps the setup efficient and future-proof.
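The two tiers can be illustrated with a small Python sketch. In practice the aggregation would run as SQL or a pipeline job on AWS, so this code only demonstrates the principle. The first function collapses line-level records into one row per shop and month, the grain assumed for the aggregated table. The second function keeps only the fields a report needs, mirroring a lean dataflow. All column names (`shop`, `date`, `quantity`, `revenue_cents`) and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate_monthly(lines):
    """Collapse line-level sales records into one row per (shop, month).

    lines: iterable of dicts with hypothetical keys 'shop', 'date' (a datetime.date),
        'quantity' and 'revenue_cents' (integer cents, to avoid float rounding).
    """
    totals = defaultdict(lambda: {"quantity": 0, "revenue_cents": 0, "line_count": 0})
    for line in lines:
        key = (line["shop"], line["date"].strftime("%Y-%m"))
        agg = totals[key]
        agg["quantity"] += line["quantity"]
        agg["revenue_cents"] += line["revenue_cents"]
        agg["line_count"] += 1
    # Sorted output makes the aggregate table deterministic and easy to compare.
    return [{"shop": shop, "month": month, **vals}
            for (shop, month), vals in sorted(totals.items())]


def reporting_view(rows, fields=("shop", "month", "revenue_cents")):
    """Keep only the fields the report needs, mirroring a lean dataflow."""
    return [{f: row[f] for f in fields} for row in rows]


if __name__ == "__main__":
    lines = [
        {"shop": "A", "date": date(2024, 1, 3), "quantity": 2, "revenue_cents": 1000},
        {"shop": "A", "date": date(2024, 1, 20), "quantity": 1, "revenue_cents": 500},
        {"shop": "A", "date": date(2024, 2, 1), "quantity": 4, "revenue_cents": 2000},
        {"shop": "B", "date": date(2024, 1, 9), "quantity": 3, "revenue_cents": 1500},
    ]
    monthly = aggregate_monthly(lines)
    print(reporting_view(monthly))
```

The row count of the aggregate depends on the number of shops and months rather than on the number of sales lines. This property is what keeps such a design manageable as the underlying data grows.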
The dataflow then serves as input for a dataset containing the measures that express real business value, and the reports are built on these datasets. We deliberately split datasets and visuals into separate files, which promotes one version of the truth and allows multiple reports within the same domain to reuse the same dataset.
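In Power BI, measures are normally written in DAX inside the shared dataset. The Python sketch below illustrates only the underlying principle: measures are defined once, alongside the data, and every report filters rows and selects measures without redefining them. The measure names, the `build_report` function and the row layout are hypothetical.

```python
def _total(rows, field):
    """Sum one numeric field over a list of row dicts."""
    return sum(r[field] for r in rows)


def avg_revenue_per_line(rows):
    """Average revenue per sales line; 0.0 when there are no lines."""
    lines = _total(rows, "line_count")
    return _total(rows, "revenue_cents") / lines if lines else 0.0


# Measures are defined once, next to the data, and shared by every report.
MEASURES = {
    "total_revenue_cents": lambda rows: _total(rows, "revenue_cents"),
    "total_lines": lambda rows: _total(rows, "line_count"),
    "avg_revenue_per_line_cents": avg_revenue_per_line,
}


def build_report(rows, measure_names, where=lambda row: True):
    """A report filters rows and picks measures, but never redefines them."""
    selected = [row for row in rows if where(row)]
    return {name: MEASURES[name](selected) for name in measure_names}


if __name__ == "__main__":
    dataset = [
        {"shop": "A", "month": "2024-01", "revenue_cents": 1500, "line_count": 2},
        {"shop": "A", "month": "2024-02", "revenue_cents": 2000, "line_count": 1},
        {"shop": "B", "month": "2024-01", "revenue_cents": 1500, "line_count": 1},
    ]
    # Two reports in the same domain reuse the same dataset and measures.
    network_report = build_report(dataset, ["total_revenue_cents", "avg_revenue_per_line_cents"])
    shop_b_report = build_report(dataset, ["avg_revenue_per_line_cents"],
                                 where=lambda r: r["shop"] == "B")
    print(network_report, shop_b_report)
```

Because the average is computed from totals rather than by averaging per-row averages, every report that reuses the measure gets the same, correctly weighted figure.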
Impact
While our work continues to evolve, its impact is already evident. The emphasis on data quality and the creation of standardized reports have empowered business owners to monitor data quality independently. Agile methodologies, implemented through sprints and sprint reviews, have fostered a collaborative environment across business domains. This has led to informed decision-making based on generated reports and has uncovered new use cases.
The client is already seeing tangible benefits from a robust data strategy, and its further potential is becoming clear. The shift towards a data-driven organization is underway, with Datashift playing a pivotal role in guiding the client toward its data-driven goals. The ongoing collaboration and the emerging culture of cross-domain insights set the client on a path of sustained growth and informed decision-making.