DATA-WAREHOUSING
What is Data-warehousing?
A Data warehouse is the collection of the historic data in a multidimensional space for a business or an application or an organization which aids the Business Analyst of an organization in Decision making. With the increase in competition among the companies in any form of business, a platform for data analysis has become very essential in key decision making. A Data warehouse holds the historic information separately from the system; it provides the key business information which are extracted and cleansed from the system database using several methods of ETL(Extract, Transact and Load).In the last couple of decades the data-warehousing has evolved significantly due to millions of dollars invested by several companies around the globe.
As defined by William H. Inmon “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision making process”
The data available in the data warehouse is classified according to the subjects compared to the transaction data available in the system. The data-warehouse normally tend to have n number of sources like relational database, excel or text files or sometimes even another data warehouse. The data from these multiple sources are integrated in the data-warehouse after performing several data cleansing, filtering, processing techniques in the ETL stage. The data is normally presented according to the time scale which is the most important dimension. The data warehouse is separated from the actual database where the daily transactions take place. The main motivation for doing so is to have better performance in both which serves different purpose. The data warehouse is initially loaded with the data on its creation and from then onwards it is periodically updated with the latest data from its source database. The data warehouse has no relation with the transactions and its purely for the data analysis. More often the Data warehouse may not contain the most up to date information.
The organization’s normal day to day information like user registration, bank transactions, and products sold, products bought etc are stored in the database which is referred as on-line transaction processing (OLTP) systems. This is the primary source for the data-warehouse which is also called as the on-line analytical processing (OLAP) systems. There are several distinguishing features between them. The OLTP is used by several thousand compared to OLAP which is used only the executives for decision making. The OLTP stores the current information where transaction takes place where as the OLAP stores the historic information. The OLTP is designed using ER modeling where as the OLAP follows either the snowflake or star schema. The size of the OLAP database is much bigger compared to the OLAP.
OLTP VS OLAP
|
OLTP
|
OLAP
|
Purpose
|
Data Transaction
|
Data Analysis
|
Type of Access
|
Read/Write
|
Read(mostly)
|
Design Type
|
Entity Relationship model
|
Star/Snowflake schema
|
No of Users
|
1000s
|
100s
|
User Type
|
Software Professional, Admin
|
Analysts and Executives
|
Data Type
|
Flat data
|
Multidimensional data
|
Functionality
|
Day to day transaction data
|
Historic data
|
Database size
|
Less than 1 GB
|
More than 100 GB
|
Importance to
|
Quick transaction of data
|
Response time for data retrieval
|
Data is
|
Non summarized
|
summarized
|
CUBE: A MULTIDIMENSIONAL DATA MODEL