Building a Modern Data Estate
Published 22 July 2021
Data, data, data. Data everywhere. Data from enterprise systems. External data. Customer data. People data. Data from social media channels. Raw data. Unstructured data. Semi-structured data. Is your organization able to unlock value from all these data sources? Can everyone in your organization access the real-time data they need to make data-driven decisions at the right time? How mature and future-proof is your data strategy?
With the help of these questions, our partner TimeXtender has developed eight steps to help you become more data-driven and move from a data warehouse to a modern data estate.
Step 1: Define your goal in terms of data and analytics maturity
How mature is your organization in the area of data consumption? What do you want to achieve with your data estate in terms of analytics maturity? In the article ‘Take your Analytics Maturity to the Next Level’, Gartner states that:
“In a recent Gartner survey, 87.5% of respondents had low data and analytics maturity, falling into ‘basic’ or ‘opportunistic’ categories. Organizations at the basic level have business intelligence (BI) capabilities that are largely spreadsheet-based analyses and personal data extracts. Those in the opportunistic category have individual business units that pursue their own data and analytics initiatives as stand-alone projects, but there is no common structure across them.”
In this step you ask yourself what is currently working and what is not. You describe the bottlenecks, pain points and data silos in your organization. At the end of this step, you have a clear understanding of what you want to achieve, which pain points you are going to solve and where the low-hanging fruit can be found.
Step 2: Define the business needs of today and the future
“In a competitive environment, where data can make or break a businesses’ competitive advantage, corporate success might very well be measured by the maturity of its enterprise data program”, notes an article in Forbes.
In this step, it’s not about IT. It’s not about limitations, restrictions or data silos. It’s all about a laser focus on business requirements. When transforming your data warehouse into a modern data estate, the biggest mistake you can make is to replicate the existing environment in a new one. You need to ask the right questions of every stakeholder involved, from sales to marketing and from HR to operations.
What do they want to measure? Where is their business headed in the future? What trends do they see? How will digital transformation impact data insights? An HR leader wants to measure traditional KPIs today, such as employee engagement, diversity or belonging. Trends in HR, such as the war for talent, will however mean that HR needs insights into the individual skills and competencies of employees so that scarce skills can be quickly allocated to the most strategic projects. Following your tour of the business, it’s time to prioritize these business needs, as you will want to start out small.
Step 3: Describe the core business and data processes
You’ve defined your goal in terms of data and analytics maturity. You’ve prioritized the business needs of all your stakeholders. Now it’s time to identify the data sources you have available, and to decide which data should find a home in your data estate first in order to address the highest priorities on your list from step 2. This step is about both business processes and data processes, since they are connected. You look at a certain data entity, such as customer data, and then you define the relational data models. How is this customer data used in your business processes? The same applies to transactions, products and more.
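As an illustration of what a relational data model for this step could look like, here is a minimal sketch in Python. The entity names, fields and the `revenue_by_customer` helper are assumptions made for this example, not part of the TimeXtender method; the point is only that transactions reference customers and products by key, which lets a business question be answered across entities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str


@dataclass(frozen=True)
class Product:
    product_id: int
    name: str
    unit_price: float


@dataclass(frozen=True)
class Transaction:
    transaction_id: int
    customer_id: int  # references Customer.customer_id
    product_id: int   # references Product.product_id
    quantity: int


def revenue_by_customer(customers, products, transactions):
    """Join the three entities on their keys and total revenue per customer name."""
    prices = {p.product_id: p.unit_price for p in products}
    names = {c.customer_id: c.name for c in customers}
    # Start every known customer at zero so customers without sales still appear.
    revenue = {name: 0.0 for name in names.values()}
    for t in transactions:
        revenue[names[t.customer_id]] += prices[t.product_id] * t.quantity
    return revenue
```

Writing the model down this way, even informally, makes it easier to see which business process touches which entity.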
Step 4: How will the data be accessed and by whom?
Employees differ in the data they can access (security) and in the way they access it (tooling). In this step you describe your security strategy and the tools for analyzing, reporting, and visualizing data. The first set of questions to answer concerns connectivity: How are you going to connect to the data sources? What connectors do you need? How often can you read data? What kind of data do you get? What metadata is available? How often is data updated and how often do you have access to data? The second question to answer: How will you manage security and access rights?
How people access the data
Consider the data consumers you want to serve, and how you want to serve them. You want to provide self-service BI for different types of consumers, ranging from power users such as data scientists, data miners, AI and ML algorithms to business users working ad-hoc with data and creating new reports, and casual users waiting for the routine reports and updated dashboards.
The data people can access
In this step you define roles and groups so that, when building your data estate, you can identify users and their access rights. This ensures that authenticated users only access the data, tables, or columns they are authorized to see.
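To make the idea of roles and column-level access concrete, here is a minimal Python sketch. The role names, users, tables and columns are all hypothetical, and a real data estate would enforce this in the database or data platform rather than in application code; the sketch only shows the logic of mapping users to roles and roles to permitted columns.

```python
# Hypothetical role definitions: role -> table -> columns the role may read.
ROLE_GRANTS = {
    "hr_analyst": {"employees": {"employee_id", "department", "skills"}},
    "sales_user": {"customers": {"customer_id", "region"}},
}

# Hypothetical group membership: user -> list of roles.
USER_ROLES = {
    "alice": ["hr_analyst"],
    "bob": ["sales_user", "hr_analyst"],
}


def allowed_columns(user: str, table: str) -> set[str]:
    """Union of the columns granted on `table` across all roles of `user`."""
    columns: set[str] = set()
    for role in USER_ROLES.get(user, []):
        columns |= ROLE_GRANTS.get(role, {}).get(table, set())
    return columns


def filter_row(user: str, table: str, row: dict) -> dict:
    """Return only the fields of `row` that the user is authorized to see."""
    permitted = allowed_columns(user, table)
    return {key: value for key, value in row.items() if key in permitted}
```

In this sketch an unknown user has no roles and therefore sees nothing, which is the safer default.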
Step 5: Define your architecture
Data needs to be extracted, processed, and refined to be useful. And just as oil can be refined into different types of fuel, data can be prepared for different uses when it comes to analytics and artificial intelligence. In this step you describe how your organization chooses to prepare data for these different uses, from reporting to analytics and artificial intelligence. Most data estates are split into three distinct layers: the data lake, the data warehouse, and the data marts. The result is an integrated architecture that significantly reduces costs, accelerates time-to-value, and supports your data compliance needs.
The data lake is primarily for power users such as data scientists, who perform various types of analysis on raw data to look for anomalies and patterns, and eventually perform machine learning. This layer enables quick ingestion of raw data from all data sources into Azure Data Lake or a SQL Database.
Raw data isn’t the best choice for business users, such as business analysts. These users need data that has been cleansed, enriched, and rationalized – in a modern data warehouse. In a layered data architecture, this data warehouse would be sourced from the data lake – but placed in a SQL-based database with semi-structured data transformed into structured data for analysis.
The data mart supports common users by delivering relevant datasets from the data warehouse, enabling self-service analytics across multiple analytics tools for line of business or function specific views, so that business users can explore data safely and efficiently.
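The three layers can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the sample records, field names and the `to_warehouse` and `to_sales_mart` functions are invented for this example, and in practice each layer would live in a storage platform such as a data lake or SQL database rather than in memory.

```python
from collections import defaultdict

# Data lake layer: raw, semi-structured records kept as ingested (hypothetical sample).
raw_orders = [
    {"id": "1", "customer": {"name": " Acme ", "region": "EU"}, "amount": "120.50"},
    {"id": "2", "customer": {"name": "Globex", "region": "US"}, "amount": "80"},
    {"id": "3", "customer": {"name": "Initech"}, "amount": "n/a"},  # incomplete record
]


def to_warehouse(records):
    """Data warehouse layer: flatten, cleanse and type the raw records."""
    rows = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except (KeyError, ValueError):
            continue  # skip records whose amount cannot be parsed
        customer = rec.get("customer", {})
        rows.append({
            "order_id": int(rec["id"]),
            "customer_name": customer.get("name", "").strip(),
            "region": customer.get("region", "UNKNOWN"),
            "amount": amount,
        })
    return rows


def to_sales_mart(rows):
    """Data mart layer: a function-specific view, here revenue per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


warehouse_rows = to_warehouse(raw_orders)
sales_mart = to_sales_mart(warehouse_rows)
```

The raw records stay untouched in the lake for power users, while business users only see the cleansed rows and the aggregated mart.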
Step 6: Cloud, On Premise or Hybrid
Just as a building depends on its foundation, so does your data estate. You don’t want your data estate to end up like the Leaning Tower of Pisa, a costly affair to maintain. Consider the pros and cons of cloud, on premise and hybrid. There are great cloud solutions available on the market, such as those from Microsoft, but “Cloud should be thought of as a means to an end. The end must be specified first”, says David Smith, Distinguished VP Analyst and Gartner Fellow Emeritus, in the article ‘The Top 10 Cloud Myths’.
Step 7: Selecting your construction partners
In this step you select your construction partners. Which software will you use for your data estate? Who will build the estate? And how will you maintain the estate?
Data management and automation software
You should select the right software platform for today and the future. You will want to ensure that your data estate is built with an integrated data management platform that is completely independent of developers, data sources, data platforms (SQL Server, Azure SQL, Data Lake, Synapse), front-end tooling (Power BI, Qlik) and deployment model (on premise, cloud, hybrid). You should be able to expedite development with automated code generation, freeing data engineers to focus on data quality and business requirements, and to limit the required number and types of highly skilled resources by using a single tool to build your data lake, data warehouse, and data marts. Last, but not least, you will want to ensure your data estate is ‘future-proof’, meaning it is fully scalable and ready to adopt future releases without rebuilding.
Deployment and maintenance partner
Will you deploy and maintain the data estate yourself? Or will you work with a deployment partner and then take on the maintenance yourself? The latter is what many organizations opt for, as they ultimately want to take control of their data into their own hands and not have to depend on a business partner. Whatever you decide, choose a partner with experience and a partner you can trust, as they will be deploying a future-proof foundation for your most valuable asset: data.
Step 8: Think big, start small and act agile
Have you followed all these seven steps? Then it is very likely that your data estate will help drive innovation and that you will be deploying a scalable, future-proof data environment. It is key to start small, for example by developing an estate in the cloud (which you can scale to production) with only a couple of data sources and a few tools, and then testing and experimenting. Step by step, you’ll then help your organization transform into a data-driven organization.
Do you want help with any of these eight steps? Request a meeting today.