The Problem
The company’s data landscape had grown organically and become highly fragmented. Master and transaction data for customers and suppliers, along with external information on markets and competitors, were spread across a multitude of heterogeneous data sources, including file-based systems, relational SQL databases, and document-oriented stores.
This fragmentation caused significant inefficiencies in data usage. In particular, the data assets lacked consistent and comprehensive documentation, which severely limited transparency and traceability.
In addition, a pronounced silo mentality had developed among the departments and stakeholders involved, further complicating cross-departmental collaboration and data usage.
As a result, much of the available data potential remained untapped, and data-driven decisions could be made only to a limited extent or on an uncertain basis.
The Solution
To improve the data landscape sustainably, a company-wide data governance policy was first designed and implemented. It defined clear responsibilities in the form of data ownership, as well as binding standards for handling data throughout its entire lifecycle.
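Such a policy can be made machine-readable. The following is a minimal sketch of a data-ownership registry; the asset names, owner roles, and lifecycle stages are illustrative assumptions, not the company's actual policy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataAsset:
    name: str
    owner: str       # accountable data owner (a role, not a person)
    lifecycle: str   # e.g. "active", "archived", "scheduled_for_deletion"

# Hypothetical registry entries for illustration only.
REGISTRY = {
    asset.name: asset
    for asset in [
        DataAsset("customer_master", owner="Sales Ops", lifecycle="active"),
        DataAsset("supplier_master", owner="Procurement", lifecycle="active"),
    ]
}

def owner_of(asset_name: str) -> str:
    """Resolve the accountable owner of an asset under the governance policy."""
    return REGISTRY[asset_name].owner
```

Keeping ownership in a single registry like this makes the responsibility chain queryable instead of relying on tribal knowledge.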
A central component of the implementation was the structuring and standardization of incoming data streams and of the ETL (Extract, Transform, Load) processes. This allowed data to be integrated consistently and subjected to quality assurance.
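A standardized ETL step with a built-in quality gate can be sketched as follows. The field names and the quality rule (non-empty supplier ID, normalized country code) are assumptions for illustration, not the company's actual rule set.

```python
import csv
import io

# Hypothetical quality rule: a record must carry these non-empty fields.
REQUIRED_FIELDS = ("supplier_id", "country")

def extract(raw: str) -> list[dict]:
    """Extract: parse an incoming CSV stream into records."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(records: list[dict]) -> list[dict]:
    """Transform: standardize fields and drop records failing quality checks."""
    clean = []
    for rec in records:
        rec = {key: value.strip() for key, value in rec.items()}
        rec["country"] = rec.get("country", "").upper()
        if all(rec.get(field) for field in REQUIRED_FIELDS):
            clean.append(rec)
    return clean

def load(records: list[dict], target: list) -> None:
    """Load: append validated records to the target store."""
    target.extend(records)

store: list[dict] = []
raw = "supplier_id,country\nS-001, de \n,fr\nS-002,US\n"
load(transform(extract(raw)), store)
# The record with the missing supplier_id is rejected by the quality gate.
```

The point of the pattern is that every inbound stream passes through the same transform, so downstream consumers see one canonical shape.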
Building on this foundation, a scalable data lake architecture was established as a central repository for structured and unstructured data, with clear processes and access rules.
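Access rules over lake zones can be expressed very compactly. This sketch assumes a hypothetical two-zone layout (raw and curated) and illustrative role names; the real lake would enforce this in the platform's access-control layer rather than in application code.

```python
# Hypothetical zone layout: which roles may read which lake zone.
ZONES = {
    "raw": {"data_engineers"},                   # unprocessed ingested data
    "curated": {"data_engineers", "analysts"},   # quality-assured data
}

def can_read(role: str, zone: str) -> bool:
    """Check whether a role is permitted to read from a given lake zone."""
    return role in ZONES.get(zone, set())
```

Separating a raw zone from a curated zone keeps unvalidated data out of reach of business users while still retaining it for reprocessing.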
On top of the data lake, a high-performance data warehouse environment was implemented, designed specifically to deliver business-critical metrics. It forms the basis for informed decisions in sales, procurement, and marketing.
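A warehouse metric of this kind is, at its core, an aggregation over a fact table. The following minimal sketch uses an in-memory SQLite database; the table name, columns, and figures are hypothetical and stand in for the real warehouse schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table of sales transactions (illustrative schema).
conn.execute("CREATE TABLE fact_sales (region TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [("EMEA", 1200.0), ("EMEA", 800.0), ("APAC", 500.0)],
)
# Business-critical metric: total revenue by region, for sales steering.
metrics = dict(
    conn.execute("SELECT region, SUM(revenue) FROM fact_sales GROUP BY region")
)
```

In the real environment the same query shape would run against a properly modeled fact table, with dimensions for time, product, and customer joined in as needed.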
