Data Mesh: Three common pitfalls to avoid
Data Mesh is a sociotechnical approach to building a decentralized data architecture. It applies a domain-oriented, self-serve design (borrowed from software development) and builds on Eric Evans’ theory of domain-driven design and Manuel Pais’ and Matthew Skelton’s theory of team topologies. Data Mesh is mainly concerned with the data itself and treats the data lake and the pipelines as secondary concerns. Its main proposition is to scale analytical data through domain-oriented decentralization. With Data Mesh, the responsibility for analytical data shifts from the central data team to the domain teams, supported by a data platform team that provides a domain-agnostic data platform.
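The split of responsibilities described above can be illustrated with a minimal sketch. It is a toy model, not a reference implementation: the names `DataProduct`, `SelfServePlatform`, `publish`, and `products_by_domain`, as well as the example domains and datasets, are hypothetical and chosen only for illustration. The point it tries to show is that the platform offers domain-agnostic capabilities while ownership of the data stays with the domain teams.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """An analytical dataset published by a domain team (hypothetical model)."""
    name: str
    owner_domain: str  # the domain team accountable for this data


@dataclass
class SelfServePlatform:
    """Domain-agnostic platform: it offers capabilities but owns no data."""
    catalog: dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        # The platform only registers the product; ownership stays with the domain.
        self.catalog[product.name] = product

    def products_by_domain(self) -> dict[str, list[str]]:
        # Group product names by owning domain to make the decentralization visible.
        grouped: dict[str, list[str]] = {}
        for product in self.catalog.values():
            grouped.setdefault(product.owner_domain, []).append(product.name)
        return grouped


# Example domains and datasets are assumptions for illustration only.
platform = SelfServePlatform()
platform.publish(DataProduct("orders_daily", owner_domain="sales"))
platform.publish(DataProduct("shipments", owner_domain="logistics"))
platform.publish(DataProduct("returns", owner_domain="sales"))
print(platform.products_by_domain())
```

Note that nothing in the platform class refers to a specific domain. In this sketch, that is what "domain-agnostic" means: any team can publish through the same interface, and accountability is recorded on the product, not on the platform.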
Among others, we see three common pitfalls that you should avoid when running your Data Mesh initiative:
- “Just starting” without tailoring the Data Mesh concept to your organization. Terms and roles might differ in your company, as might culture and other important factors, and not all of these are covered in the literature yet.
- Forgetting the “socio” in socio-technical. Data Mesh is a socio-technical approach to handling (enterprise) data. It is more about decentralizing responsibility to the domains and establishing roles and responsibilities than about technology and architecture.
- Interpreting the data platform as one central system (to rule them all). This often leads to a centralized data lake approach rather than a Data Mesh. Cloud providers like Microsoft often focus on the technical aspects of a Data Mesh and describe it as “an architectural pattern for implementing enterprise data platforms in large, complex organizations. It helps scale analytics adoption beyond a single platform and a single implementation team.”