Data mesh is very similar to cooking at scale (as in Indian-style weddings!), where each ingredient is sourced, sliced, stored, and provided to the chef for creating delicious dishes. The task of finding each ingredient – fruits, vegetables, dairy products, spices – is done by the people most knowledgeable about each one. Team members are given self-service tools and techniques for prepping the ingredients. Finally, the whole process is governed by a set of global standards (e.g., procuring only from top vendors) and local standards (e.g., setting the right cooking temperature). This decentralized approach ensures that every dish is consistent in taste and served at the right time.
Data mesh is a people-, process-, and technology-focused approach to building a decentralized analytical data architecture. It leverages a domain-oriented, self-service design and shifts responsibility for specific datasets to the people closest to that data. While overall governance remains with a central team, the responsibility for providing and managing analytical data moves to the domain teams, supported by a common data platform team.
As agility, intelligence, automation, and real-time decision-making become critical pillars for businesses, data mesh is a great tool to have in your toolbox.
Let’s take a walk through the evolution of analytical data architectures. It all started in the 1960s and ’70s, when the concepts behind data warehouses and data marts first emerged as the solution to organizations’ analytical needs. As technology progressed and computing became more economical, the architecture shifted toward today’s data lake and cloud-based data fabric solutions (Figure 1). Analytical capabilities grew with each iteration of the architecture, from simple SQL-based analysis to more visual and advanced artificial intelligence/machine learning (AI/ML) predictive analytics.
Despite these architectural advancements, the core assumption of a monolithic, technology-driven architecture has gone largely unchallenged. At the proverbial ten-thousand-foot view, the data platform architecture resembles Figure 2: an imposing architecture that aims to source data from everywhere, then clean, enrich, and serve it to various consumers with diverse needs. This massive task is compounded by the fact that the central team of data and ML engineers has limited domain and business understanding yet is responsible for meeting consumption needs. For instance, consider a pharmaceutical company that launches a new “buy-and-bill” drug. Introducing just one new key performance indicator (KPI), such as “conversion rate” (from prescription to administration), requires ingesting new data, then cleansing, preparing, and aggregating it. Components across the whole pipeline must change, limiting the velocity of large-scale and complex analysis. This creates an inflection point where results are no longer proportional to the investments.
A 2022 McKinsey survey showed that high performers engage in “frontier” practices more often, enabling AI development and deployment at scale, or what some call the “industrialization of AI.” For example, leaders are more likely to have a data architecture that is modular enough to accommodate new AI applications rapidly.1
To build a data mesh that is both functional and efficient, follow these four fundamental concepts:

- Domain-oriented ownership: the teams closest to the data own it and serve it to the rest of the organization.
- Data as a product: each dataset is published with the discoverability, documentation, and quality that consumers would expect of any product.
- Self-serve data platform: a common platform team provides the tooling that lets domain teams build, publish, and operate their data products.
- Federated computational governance: a cross-domain body defines global policies and enforces them through automation.
In addition to the core framework described above, the following best practices will ensure the successful implementation of a data mesh:
Implement individual data models in each bounded context (a concept from domain-driven design, discussed below) using established data warehouse modeling techniques. Next, have the platform team standardize the language in which different contexts interact. When defining the contracts between domains, use the “customer-supplier” pattern, wherein the upstream (supplier) and downstream (customer) teams collaborate on the contract so that each can then succeed independently.
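To make the customer-supplier pattern concrete, here is a minimal Python sketch of such a data contract. The pharma-flavored domain, field names, and service-level numbers are hypothetical illustrations, not a prescribed schema.

```python
# Minimal sketch of a data contract between an upstream "prescriptions"
# domain (supplier) and a downstream analytics domain (customer).
# All names, fields, and thresholds are hypothetical.
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PrescriptionRecord:
    """The schema the supplier commits to; changing it is a contract change."""
    prescription_id: str
    product_code: str
    prescriber_id: str
    written_date: date


@dataclass(frozen=True)
class PrescriptionContract:
    """Non-schema guarantees the downstream (customer) team depends on."""
    schema_version: str = "1.0.0"
    freshness_hours: int = 24          # data lands within 24h of the event
    completeness_pct: float = 99.5     # minimum share of populated key fields
    breaking_change_notice_days: int = 30


def is_complete(record: PrescriptionRecord) -> bool:
    """A toy completeness check the supplier runs before publishing."""
    return all([record.prescription_id, record.product_code,
                record.prescriber_id, record.written_date])
```

The essence of the pattern is that the supplier cannot change the schema or service levels unilaterally; the notice period gives the customer team time to adapt.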
When “defining data as a product,” remember that this requires critical mindset changes, such as measuring success through adoption, satisfaction, and trust.
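As a rough illustration of that mindset shift, a data product can carry its success metrics alongside its schema. The fields and thresholds below are assumptions made for the sketch, not a standard.

```python
# Hypothetical sketch: a dataset treated as a product ships with an owner,
# documentation, and success metrics, not just a table name.
from dataclasses import dataclass


@dataclass
class DataProduct:
    name: str
    owner_team: str
    documentation_url: str                  # discoverability
    monthly_active_consumers: int = 0       # adoption
    consumer_satisfaction: float = 0.0      # e.g., survey score out of 5
    quality_checks_passed_pct: float = 0.0  # a simple proxy for trust

    def is_healthy(self) -> bool:
        """A toy success gate a domain team might review each quarter."""
        return (self.monthly_active_consumers > 0
                and self.consumer_satisfaction >= 4.0
                and self.quality_checks_passed_pct >= 99.0)
```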
Finally, your central governance team should include representatives from different stakeholder groups. How these groups define your global policies on interoperability, collaboration, security and access, privacy, compliance, and documentation is critical for the success of a data mesh.
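One way to picture “computational” enforcement of such global policies is as automated checks that run against every data product’s metadata. The required tags and rules below are purely illustrative.

```python
# Hypothetical sketch of automated global-policy checks in a federated
# governance model; the policy names are illustrative only.
REQUIRED_TAGS = {"owner", "domain", "pii_classification", "retention_days"}


def check_global_policies(metadata: dict) -> list[str]:
    """Return the list of policy violations for a data product's metadata."""
    violations = []
    missing = REQUIRED_TAGS - metadata.keys()
    if missing:
        violations.append(f"missing required tags: {sorted(missing)}")
    if (metadata.get("pii_classification") == "restricted"
            and not metadata.get("access_policy")):
        violations.append("restricted PII data requires an access_policy")
    if not metadata.get("documentation_url"):
        violations.append("every data product must publish documentation")
    return violations


# Example: this product would fail two of the three checks.
print(check_global_policies({"owner": "rx-team", "domain": "prescriptions",
                             "pii_classification": "restricted",
                             "retention_days": 365}))
```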
Domain-driven design (DDD) should be at the core of data mesh architecture. The basic premise of DDD, expressed through the concept of the bounded context, is that a business function, like customer relationship management (CRM), can be modeled differently depending on the context. Context mapping, or “data contracts,” defines the relationships and integrations between different contexts.
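The sketch below illustrates the bounded-context premise with a hypothetical example: the same customer is modeled one way in a CRM context and another way in a billing context, with a small translation at the boundary.

```python
# Sketch of the bounded-context idea: the same real-world customer is
# modeled differently in two contexts; a context-mapping function
# translates at the boundary. All names are hypothetical.
from dataclasses import dataclass


@dataclass
class CrmCustomer:
    """CRM context: cares about the relationship and outreach."""
    account_id: str
    name: str
    segment: str
    last_contacted: str


@dataclass
class BillingCustomer:
    """Billing context: cares about payment, not outreach."""
    account_id: str
    payment_terms: str
    outstanding_balance: float


def to_billing(crm: CrmCustomer, terms: str = "NET30") -> BillingCustomer:
    """Context mapping: only the shared key crosses the boundary."""
    return BillingCustomer(account_id=crm.account_id,
                           payment_terms=terms,
                           outstanding_balance=0.0)
```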
Applying DDD results in domain-oriented analytical data, which can be described by three archetypes: source-aligned data products, which mirror a domain’s operational systems; aggregate data products, which compose data from multiple upstream domains; and consumer-aligned data products, shaped for a specific use case.
At Axtria, we integrate data mesh thinking and processes in our product, Axtria DataMAx™, to help our customers close the gap between how data is produced and how it is consumed. We move away from point-to-point extract, transform, and load pipelines, increasing trust and adoption while driving efficiency and bringing significant productivity gains.
The figure below shows a reference architecture for implementing data mesh:
In conclusion, data mesh can make organizations more agile during their analytics-at-scale journey.
We have outlined critical directional guidance for organizations seeking to embark on the data mesh journey. The effort can be costly and brings challenges in culture, change management, governance, skills, and data accessibility. By getting the fundamentals right early on, you will improve your chances of success.