Data warehousing is common in the world of tech management today and is an innovation that many organizations are moving towards. A data warehouse architecture is used for overall communication and data management for all end-users within the same enterprise or network. Almost all data warehouses are different in characteristics but are characterized by the standard vital components present inside of them.
All production applications such as inventory control, payable product purchasing and payroll accounting applications are designed for online transaction processing or OLTP. All such applications gather and use data during their day-to-day operations. Data warehousing applications support the ad-hoc requirements of all these datasets and provide forecasting, trend analysis, profiling and summary reporting among other tasks.
A warehouse database is automated from top to bottom and is usually updated during off hours. The data accumulated within the warehouse is sorted, summarized and combined for future use.
Most data warehouses and the architectures used for them at the backend vary based on the elements within an organization’s situation. The common architectures followed by a data scientist include:
- With staging area
- With data marts and staging areas
In this article, we take a look at data warehousing architecture and the various types used within them. We go deeper into the technology and explore options.
Properties of Data Warehouse
A typical data warehouse comes with multiple tools that can be enjoyed by end-users on the warehouse. The access tools include:
- Application development tools
- Querying and Reporting Tools
- Executive Information Tools
- Data Mining Tools
- Online Analytical and Processing Tools
All of these tools can be used by end users to devise key operations within the data architecture. Having understood the key access points, we now shed some light on the properties of a data warehouse. The properties are structured into 5 different heads, including;
- Separation: Separation is the first property of a data warehouse, as all transactional and analytical processing is meant to be kept as apart as possible.
- Extensibility: The architect working on the data warehouse should have the ability to perform new operations and run new technologies without having to work on and revamp the entire system from start to finish.
- Administrability: The administration or the management of the Data Warehouse should be simplified rather than being made more complicated.
- Security: The warehouse should have monitoring protocols and accesses in place to make sure that all endpoints and entrances to the system are heavily guarded at all times.
- Scalability: Growth is something that no data warehouse can negate. The hardware and software protocols in the data warehouse should be extremely simple to upgrade. The data volume on the warehouse should be managed and should be progressively flexible to meet the growth of the organization.
Types of Data Warehouse Structure
While data warehouses can come in varying forms, there are three basic types of data warehouses in the data-centric world today. These types include:
A warehouse with a single-tier architecture is not implemented comprehensively in practice. The single-tier warehouse architecture works with the primary purpose of minimizing redundancies in data. The single-tier warehouse is mostly physical in nature.
As most people would know, the only physical layer in a single-tier warehouse is the source layer. The data warehouse is created through specific middleware and is used for a multidimensional view of all operational data.
The single-tier data warehouse structure is vulnerable in nature because it is unable to understand the requirements for separation between transactional and analytical processing. Queries are agreed towards operational data and affect the transactional workloads governing the warehouse.
The downsides of the single-tier model and the requirement for additional separation are met through the two-tier data warehouse structure studied below:
While this type of data warehouse is known as two-tiered because it separates the physical sources in the data warehouse, there are actually four layers and subsequent stages present in this architecture.
These layers include:
- The Initial Source Layer: Most two-tiered data warehouses use heterogeneous sources of data. The data comes from informational systems within or outside the walls of the organization.
- Data Staging: The data staging layer oversees the extraction, cleansing and structuring of all data. The staging process sees the integration of data along with the removal of all gaps and inconsistencies. Processes such as transformation, extraction and ETL transpire here.
- Data Warehouse: The information staged and structured in the previous layer is now saved in a separate repository known as the data warehouse. The data warehouse can be accessed by itself as well, but is mostly replicated in the form of data marts to be given to separate departments.
- Analysis: This is the layer where all data present within the warehouse is flexibly and efficiently accessed. The analysis will help generate actionable insights and guide business decisions.
The three-tier data warehouse model adds a reconciled layer between the source layer and the data warehouse layer of the two-tier architecture. The reconciled layer is positioned between the source and the warehouse.
The reconciled layer helps create a referencing model for departments across the organization. The reconciled layer can also be used to generate data flows, feeding external processes and providing data integration across the board. The referencing layer can take the analysis layer a little further away from the source, creating a time-lapse of sorts.
Data warehouses and their architectures play an integral role in the data-driven, digital age of today. The three types of data warehouses help run actionable data insights in organizations and provide intelligence.