This is an introduction to Data Warehouse. In this tutorial you will learn…
- Why Data Warehousing ?
- The Need for a Data Warehouse
- Historical Overview
- Crisis of Credibility
- Difference Between Database Approaches and Data Warehouse Approaches
- What is a Data Warehouse
Why Data Warehousing ?
You as an information technology professional have been working in development of computer systems as a developer , programmer , analyst , database administrator or project manager. You may have developed different systems of order processing, general
ledger , inventory , in-patient billing , checking accounts , insurance claims , and so on. Without these system no modern business can survive because these system are used to process orders , maintain inventory , keep the accounting books , service the clients , receive payments , and process claims.
In the 1990s, as businesses grew more complex, corporations spread globally, and
competition became fiercer, business executives became desperate for information to stay
competitive and top in business. Thus, many countries in the world have
placed the modern information technology into their strategic plans. They regard it as the
most important strategic resource for the development their society, and are trying their
best to reach and occupy the peak of the modern information driven knowledge economy. Following Figure shows what an organizations can achieve by using data warehouse in their business.
The Need for a Data Warehouse
Now our focus is on the most crucial question and that is ” Why Organizations Need Data Warehouse ? “. The answer to this question is really important because if you understand why organizations really need data warehouse then you will be more interested to learn data warehousing.
Consider a bank which is losing customers, for reasons not known. However, one thing is
for sure that the bank is losing business because of lost customers. Therefore, it is
important, actually critical to understand which customers have left and why they have
left. This will give you the ability to predict going forward (in time), to identify which
customers will leave you (i.e. the bank). Operational system is not able to answer these kind of questions because it only saves day to day transactions in the business. We need strategic information to know the actual reason of loosing customers in the bank.
Who needs strategic information in an organization ? What exactly do we mean by strategic information ? The answer is : The executives and managers who are responsible for keeping the business company competitive need information to make proper decisions. They need information to formulate the business strategies, establish goals , set objectives, and monitor results of different things. Here are some examples of business objectives:
- Retain the present customer base
- Increase the customer base by 15% over the next 5 years
- Gain market share by 10% in the next 3 years
- Enhance customer service level in shipments
- Bring three new products to market in 2 years
Strategic information is not for running the day-to-day operations of the business. It is
not intended to produce an invoice, make a shipment, settle a claim, or post a withdrawal
from a bank account. Strategic information is far more important for the continued health
and survival of the corporation. Critical business decisions depend on the availability of
proper strategic information in an enterprise. Following figure shows characteristics of
It is interesting to note that DSS (Decision Support System) processing as we know it
today has reached this point after a long and complex evolution, and yet it continues to
evolve. The origin of DSS goes back to the very early days of computers.
The spreading of master files and massive redundancy of data presented some very serious problems, such as:
• Data coherency i.e. the need to synchronize data upon update.
• Program maintenance complexity.
• Program development complexity.
• Requirement of additional hardware to support many tapes.
Crisis of Credibility
Consider the CEO of an organization who is interested in the financial health of his
company. He asks the relevant departments to work on it and present the results. The
organization is maintaining different legacy systems, employs different extract programs
and uses different external data sources. As a consequence, Department-A which uses a
different set of data sources, external reports etc. as compared to Department-B (as shown
in Figure-1.3) comes with a different answer (say) sales up by 10%, as compared to the
Department-B i.e. sales down by 10%. Because Department-B used another set of
operational systems, data bases and external data sources. When CEO receives the two
reports, he does not know what to do. CEO is faced with the option of making decisions
based on politics and personalities i.e. very subjective and non-scientific. This is a typical
example of the crisis in credibility in the naturally evolving architecture. The question is
which group is right? Going with either of the findings could spell disaster, if the finding
turns about to be incorrect. Hence the secondimportant question, result of which group is
credible? This is very hard to judge, since neither had malicious intensions but both got a
different view of the business using different sources.
Difference Between Database Approaches and Data Warehouse Approaches
Now we will discuss the difference between operational and decision support systems.
Decision support systems
What is a Data Warehouse ?
The data warehouse is an informational environment that
- Provides an integrated and total view of the enterprise
- Makes the enterprise’s current and historical information easily available for decision making
- Makes decision-support transactions possible without hindering operational systems
- Renders the organization’s information consistent
- Presents a flexible and interactive source of strategic information
A Data Warehouse is not something shrink-wrapped i.e. you take a set of CDs and install
into a box and soon you have a Data Warehouse up and running. A Data Warehouse
evolves over time, you don’t buy it. Basically it is about taking/collecting data from
different heterogeneous sources. Heterogeneous means not only the operating system is
different but so is the underlying file format, different databases, and even with same
database systems different representations for the same entity. This could be anything
from different columns names to different data types for the same entity.
Companies collect and record their own operational data, but at the same time they also
use reference data obtained from external sources such as codes, prices etc. This is not the
only external data, but customer lists with their contact information are also obtained
from external sources. Therefore, all this external data is also added to the data
In a decision support environment, the end user i.e. the decision maker is interested
in the big picture. Typical DSS queries do not involve using a primary key or
asking questions about a particular customer or account. DSS queries deal with
number of variables spanningacross number of tables (i.e. join operations) and
looking at lots of historical data. As a result large number of records are processed
and retrieved. For such a case, specialized or different database architectures/topologies
are required, such as the star schema.