What is a Data Warehouse?
A data warehouse is a large-capacity repository that sits on top of multiple databases. Whereas the conventional database is optimized for a single data source, such as payroll information, the data warehouse is designed to handle a variety of data sources, such as sales data, data from marketing automation, real-time transactions, SaaS applications, SDKs, APIs, and more.
There are other differences, as well. For example, single-source databases are built for speed, employing online transactional processing (OLTP) to insert and edit small transactions. However, due to their structure, they do not lend themselves to advanced analytics. In contrast, a data warehouse uses online analytical processing (OLAP), which is designed for fast, sophisticated analysis.
Databases and data warehouses do have some similarities, however. Besides the fact that they are both repositories for large amounts of data, both can be queried. And they both have the ability to store data in tables (although databases only store data in two-dimensional tables; data warehouses contain multidimensional tables with layers of columns and rows).
Data warehouse types
Companies are increasingly moving away from on-premise data warehouses to the cloud, leveraging the cost savings and scalability managed services can provide. The architecture of these cloud-enabled data warehouses differs from that of their traditional, on-premise counterparts.
Traditional data warehouse
Traditional data warehouse architecture is separated into three tiers: one for the database server that extracts data from multiple data sources, one for the OLAP server (which transforms the data), and one for the client level.
Cloud data warehouse
Cloud-based data warehouses are an entirely different animal. Their architecture varies tremendously among vendors. For example, Amazon’s Redshift is essentially a cloud-based representation of on-premise data warehouses. BigQuery is serverless so it manages computing resources dynamically and hides resource-management decisions from the user.
The cloud offers some distinct advantages:
- It’s managed. Instead of hiring your own data-warehousing team, a cloud data warehouse lets you outsource the management hassle to professionals who must meet service level agreements (SLAs).
- It outperforms on-premise data warehouses. Cloud-based solutions offer superior reliability and speed. They are generally more secure than on-premise data warehouses, making them a good choice for the enterprise.
- It’s built for scale. Cloud-based data warehouses are elastic, so you can instantly add capacity.
- It’s more cost effective. With cloud, you pay for what you use. Some providers charge by throughput. Others charge per hour per node. In every case, you avoid the mammoth costs incurred by an on-premise data warehouse that runs 24 hours a day, seven days a week.
Do you need a data warehouse?
Some businesses and industries require more data analysis than others. For example, Amazon uses real-time data to adjust prices three or four times a day. Insurance companies track policies, sales, claims, payroll, and more. They also use machine learning to predict fraud. Gaming companies must track and react to user behavior in real time to enhance the player’s experience. Data warehouses make all of these activities possible.
If your organization has or does any of the following, you’re probably a good candidate for a data warehouse:
- Multiple sources of disparate data
- Big-data analysis and visualization — both asynchronously and in real time
- Custom report generation/ad-hoc analysis
- Data mining
- Machine learning/AI
- Data science
These activities and assets require more than the traditional single-source database can provide. They require an “industrial-strength” data warehouse.