BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES DIVISION
FIRST SEMESTER 2017-2018
COURSE HANDOUT
Course No. : CSI ZG515 / SS ZG515
Course Title : DATA WAREHOUSING
Instructors : YASHVARDHAN SHARMA
Course Description
Corporate decision makers require access to all the
organization’s data, wherever it is located. To provide comprehensive
analysis of the organization, its business, its requirements and any trends,
require access to not only the current data in the database but also to
historical data. To facilitate this type of analysis, data warehouses have
been created to contain data drawn from several sources, maintained by
different departments of the organization. This course will involve an
in-depth study of various concepts needed to design, develop, and maintain a
data warehouse. It also provides an introduction to end user access tools
like OLAP and reporting.
|
Text Books
No
|
Author(s),
Title, Edition, Publishing House
|
T1
|
Ponniah P, “Data
Warehousing Fundamentals”, Wiley Student Edition, 2012
|
T2
|
Kimball R, “The
Data Warehouse Toolkit”, 3e, John Wiley, 2013
|
Reference Books
No
|
Author(s),
Title, Edition, Publishing House
|
R1
|
Anahory
S, & Dennis M, “Data Warehousing in the Real World”,
Pearson Education, 2008.
|
R2
|
Kimball
R, Reeves L, Ross M, & Thornthwaite, W, “The Data Warehouse
Lifecycle Toolkit”, John Wiley, 2e, 2012.
|
R3
|
Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”,
Morgan Kaufmann Publishers 2012
|
R4
|
Krish Krishnan, “Data Warehousing in the Age of Big Data”, Morgan Kaufmann Publishers 2013
|
R5
|
William H Inmon, et al., “DW 2.0 : The Architecture for
the Next Generation of Data Warehousing”, Morgan Kaufmann 2012
|
Legend
M = Module, RL
= Recorded Lecture, CS = Contact Session, LE = Lab Exercises, SS
= Self Study, HW = Home Work Problems
RL2.1 =
Recorded Lecture 1 in module 2.
RL2.2.3 =
Third topic recording in recorded lecture RL2.2
Modular Structure
Module Summary
No
|
Title
of the Module
|
M1
|
Introduction
to Data Warehousing
|
M2
|
Introduction
to Dimensional Modelling
|
M3
|
Architectural
Components of a Data Warehouse
|
M4
|
Extraction,
Transformation & Loading
|
M5
|
Advanced
Dimensional Modelling
|
M6
|
Online
Analytical Processing (OLAP) & Multidimensional Databases (MDDB)
|
M7
|
Query
Performance Enhancing Techniques
|
M8
|
Metadata
|
M9
|
Support
for Data Warehousing in RDBMS/SQL
|
M10
|
Real-time
Data Warehousing
|
M11
|
Current
Trends in Data Warehousing
|
Detailed Structure
Video: Introduction
to the course
M1: Introduction to Data Warehousing
Type
|
Description
|
RL1.1
|
Introduction
to Data Warehousing - I
RL1.1.1
= Evolution of Data Warehousing
RL1.1.2
= Operational System Characteristics
RL1.1.3
= Data Warehousing System Characteristics
RL1.1.4
= Data Warehouse Architecture
|
RL1.2
|
Introduction
to Data Warehousing - II
RL1.2.1
= Introduction to ETL, Data Staging Area & Presentation Servers
RL1.2.2
= Introduction to Data Warehouse Design
RL1.2.3
= Continuum of Analysis
|
CS1.1
|
CS1.1.1
= Review of concepts of DW, ETL, Information Delivery (T1, Ch 1)
CS1.1.2
= Examples of Business Intelligence applications (T2, Introduction)
CS1.1.3
= Challenges in designing DW (T2, Ch 1)
|
CS1.2
|
CS1.2.1
= Compare DW with Data Mining (R1, Appendix B)
CS1.2.2
= DW, Unstructured data and Big data (T2, Ch 21)
|
SS1.1
|
Self
Study (T1, Chapter 1; T2, Chapter 1)
|
HW1.1
|
Do
exercises given at the end of chapter 1 of T1
|
M2: Introduction to Dimensional Modeling
Type
|
Description/Plan/Reference
|
RL2.1
|
RL2.1.1
= E R Modelling
RL2.1.2
= Dimensional Modelling Vocabulary
RL2.1.3
= ER Modelling vs. Dimensional Modelling - Qualitative Comparison
|
RL2.2
|
RL2.2.1
= Data Warehouse Design Steps
RL2.2.2
= Grocery Store Case Study
|
CS2.1
|
CS2.1.1
= Review Dimensional Modeling (T1, Ch 10)
CS2.1.2
= Compare ER and Dimensional Modeling (T1, Ch 10)
|
CS2.2
|
CS2.2.1
= Explain Retail Store analysis requirements (T2, Ch 3)
CS2.2.2
= Retail Store Dimensional Model (T2, Ch 3)
|
SS2.1
|
Self
Study (T1, Chapter 10; T2, Chapter 3)
|
HW2.1
|
Do
exercises given at the end of chapter 10 of T1
|
M3: Architectural Components of a Data
Warehouse
Type
|
Description/Plan/Reference
|
RL3.1
|
RL3.1.1
= Data Marts
RL3.1.2
= ODS
RL3.1.3
= Top-down vs. Bottom up approaches to Data Warehousing
|
RL3.2
|
RL3.2.1
= ETL & Data Staging Area
RL3.2.2
= Presentation Servers
RL3.2.3
= OLAP
|
CS3.1
|
CS3.1.1
= Review concepts of data mart (T1, Ch 2)
CS3.1.2
= Explain pros and cons of taking data mart approach towards EDW (T1, Ch 2)
|
CS3.2
|
CS3.2.1
= Explain concepts of staging and OLAP (T1, Ch 7)
CS3.2.2
= Explain desirable features for staging server and OLAP (T1, Ch 7)
|
SS3.1
|
Self
Study (T1, Chapter 7)
|
HW3.1
|
Do
exercises given at the end of chapter 7 of T1
|
M4: Extraction, Transformation, &
Loading (ETL)
Type
|
Description/Plan/Reference
|
RL4.1
|
RL4.1.1
= ETL Overview
RL4.1.2
= Major ETL Tasks
RL4.1.3
= ETL Requirements
RL4.1.4
= Data Extraction
|
RL4.2
|
RL4.2.1
= Data Transformation
RL4.2.2
= Data Loading
RL4.2.3
= Initial Load & Refresh Cycles
|
RL4.3
|
RL4.3.1
= Data Quality
|
CS4.1
|
CS4.1.1
= Explain concepts of ETL (T1, Ch 12)
CS4.1.2
= Why data requires pre-processing (R3, Ch 3)
|
CS4.2
|
CS4.2.1
= Major types of transformations (R3, Ch 3)
CS4.2.2
= Methods of applying data to the warehouse (T1, Ch 12)
|
SS4.1
|
Self
Study (T1, Chapter 12)
|
HW4.1
|
Do
exercises given at the end of chapter 12 of T1
|
M5: Advanced Dimensional Modelling
Type
|
Description/Plan/Reference
|
RL5.1
|
RL5.1.1
= Changing Dimensions, Surrogate keys & Look up tables
RL5.1.2
= Mini-dimensions & Outriggers
|
RL5.2
|
RL5.2.1
= Time Dimension
RL5.2.2
= Conformed Dimensions & Facts
|
RL5.3
|
RL5.3.1
= Multi-valued Dimensions (Bridge & Helper Tables)
RL5.3.2
= Dimension Hierarchies
RL5.3.3
= Role-playing Dimensions
RL5.3.4
= Factless Fact Tables
|
RL5.4
|
RL5.4.1
= Academic Warehouse Case Study
|
RL5.5
|
RL5.5.1
= Bank Data Warehouse Case Study
|
CS5.1
|
CS5.1.1
= Illustrate problems of changing dimensions (T2, Ch 2)
CS5.1.2
= Techniques to handle changing dimensions (T2, Ch 2)
CS5.1.3
= Examples for mini-dimensions and outriggers (T2, Ch 2)
|
CS5.2
|
CS5.2.1
= Illustrate time dimension (T2, Ch 3)
CS5.2.2
= Explain enterprise bus architecture (T2, Ch 4)
CS5.2.3
= Illustrate bridge, role-playing dimensions, factless fact tables (T2, Ch 2)
|
SS5.1
|
Self
Study (T1, Chapter 11)
|
HW5.1
|
Do
exercises given at the end of chapter 11 of T1
|
M6: OLAP
& Multidimensional Databases (MDDB)
Type
|
Description/Plan/Reference
|
RL6.1
|
RL6.1.1
= Introduction to OLAP & Multidimensional Analysis
RL6.1.2
= Limitations of Spreadsheets & SQL
|
RL6.2
|
RL6.2.1
= Major OLAP Features & Functions
RL6.2.2
= Cube Operator
|
RL6.3
|
RL6.3.1
= Introduction to Multidimensional Databases (MDDB)
RL6.3.2
= Multidimensional Analysis & MDDBs
|
RL6.4
|
RL6.4.1
= OLAP operations using MDDBs
|
RL6.5
|
RL6.5.1
= MDDB vs. RDBMS
|
RL6.6
|
Cube
Computation: Complexity & Optimization
|
CS6.1
|
CS6.1.1 = Explain concepts of OLAP
(T1, Ch 15)
CS6.1.2
= Various OLAP operations with examles (R3, Ch 4)
CS6.1.3
= Relative strengths of MOLAP, ROLAP, and HOLAP (R3, Ch 4)
|
SS6.1
|
Self
Study (T1, Chapter 15)
|
HW6.1
|
Do
exercises given at the end of chapter 15 of T1
|
M7: Query Performance Enhancing Techniques
Type
|
Description/Plan/Reference
|
RL7.1
|
RL7.1.1
= Aggregation
RL7.1.2
= Sparsity Failure
|
RL7.2
|
RL7.2.1
= Shrunken, Lost, & Collapsed Dimensions
|
RL7.3
|
RL7.3.1
= Aggregate Navigator
RL7.3.2
= Aggregate Navigation Algorithm
|
RL7.4
|
RL7.4.1
= Partitioning
RL7.4.1
= Partitioning wrt Time
|
RL7.5
|
RL7.5.1
= View Materialization
RL7.5.2
= Selection of Views to Materialize
|
RL7.6
|
RL7.6.1
= View Maintenance Strategies
RL7.6.2
= Incremental Maintenance Algorithms
|
RL7.7
|
RL7.7.1
= Bitmap Indices
RL7.7.2
= Bitmap Compression Strategies
|
CS7.1.
|
CS7.1.1
= Data Warehouse performance challenges(T1, Ch 18)
CS7.1.2
= Concepts of physical design (T1, Ch 18)
CS7.1.3
= Use of aggregates, partitions for performance and operations (R1, Ch 6, Ch
7)
|
SS7.1
|
Self
Study (T1, Chapter 18)
|
HW7.1
|
Do
exercises given at the end of chapter 18 of T1
|
M8:
Metadata
Type
|
Description/Plan/Reference
|
RL8.1
|
RL8.1.1
= Role of Metadata
RL8.1.2
= Types of Metadata
|
RL8.2
|
RL8.2.1
= Metadata Design & Implementation
|
CS8.1
|
CS8.1.1
= Concepts of Metadata (T1, Ch 9)
CS8.1.2
= Why Metadata becomes more important in DW (T1,Ch 9)
CS8.1.3
= Examples of Metadata (T1, Ch 9)
|
SS8.1
|
Self
Study (T1, Chapter 9)
|
HW8.1
|
Do
exercises given at the end of chapter 9 of T1
|
M9:
Support for Data Warehousing in RDBMS/SQL
Type
|
Description/Plan/Reference
|
RL9.1
|
RL9.1.1
= Support for Data Warehousing in RDBMSs
|
RL9.2
|
RL9.2.1
= New Features in SQL
|
CS9.1
|
CS9.1.1
= Commonality between DW and RDBMS(T1, Ch 1)
CS9.1.2
= SQL extensions for analytics (R4,Ch 2, Ch 3, Ch 18)
|
SS9.1
|
Self
Study (R4, Chapters 2, 3, 18)
|
M10:
Real-Time Data Warehousing (RTDWH)
Type
|
Description/Plan/Reference
|
RL10.1
|
RL10.1.1
= Need for Real-time Data Warehousing
RL10.1.2
= Solutions for Real-time Data Warehousing
|
RL10.2
|
RL10.2.1
= Real-time ETL
RL10.2.2
= Role of ODS in RTDWH
|
CS10.1
|
CS10.1.1
= Concepts of real-time solutions(T2, Ch 20)
CS10.1.2
= Challenges with real-time data warehousing (T2, Ch 20)
CS10.1.3
= ETL vs. ELT (R6, Ch 13)
|
SS10.1
|
Self
Study (T1, Chapter 3; T2, Chapter 20)
|
M11:
Current Trends in Data Warehousing
Type
|
Description/Plan/Reference
|
RL11.1
|
RL11.1.1
= Introduction to World's Largest Data Warehouse
|
RL11.2
|
RL11.2.1
= Big Data Analytics
RL11.2.2
= Extended RDBMS Architecture
RL11.2.3
= MapReduce/Hadoop Architecture
|
CS11.1
|
CS11.1.1
= Recent trends in data warehousing (T1, Ch 3)
CS11.1.2
= Place of structured DW vs. Unstructured data, Big Data (T2,Ch 21)
CS11.1.3
= Possibilities of convergence of DW and Big Data (R5, Ch 13)
|
SS11.1
|
Self
Study (T1, Chapter 3)
|
HW11.1
|
Do
exercises given at the end of chapter 3 of T1
|
Instructional Plan – I
There are 10 contact sessions; 5 before
mid-semester and 5 post mid-semester. Each contact session is of two hour;
typical gap between two contact sessions is on average around 2 weeks. In the
tabular cells quote appropriate legends from the modular structure table
Session
|
Pre-contact prep
|
Contact session
|
Post-contact sessions
|
1
|
RL1.1,
RL1.2
|
CS1.1,
CS1.2
|
SS1.1,
HW1.1
|
2
|
RL2.1,
RL2.2
|
CS2.1,
CS2.2
|
SS2.1,
HW2.1
|
3
|
RL3.1,
RL3.2
|
CS3.1,
CS3.2
|
SS3.1,
HW3.1
|
4
|
RL4.1,
RL4.2, RL4.3
|
CS4.1,
CS4.2
|
SS4.1,
HW4.1
|
5
|
RL5.1,
5.2, 5.3, 5.4, 5.5
|
CS5.1,
CS 5.2, Review of all topics covered so far
|
SS5.1,
HW 5.1
|
|
|
Mid-Semester
|
|
6
|
RL6.1,6.2,6.3,6.4,6.5,6.6
|
CS6.1
|
SS6.1,
HW6.1
|
7
|
RL7.1,7.2,7.3,7.4,7.5,7.6,7.7
|
CS7.1
|
SS7.1,
HW7.1
|
8
|
RL8.1,
RL8.2
|
CS8.1
|
SS8.1,
HW8.1
|
9
|
RL9.1,9.2
|
CS9.1
|
SS9.1
|
10
|
RL10.1,
RL10.2, RL11.1, RL11.2
|
CS
10.1, CS 11.1, Review of all topics covered
|
SS11.1,
HW11.1
|
|
|
Comprehensive
Exam
|
|
Evaluation Scheme:
Legend: EC = Evaluation Component; AN =
After Noon Session; FN = Fore Noon Session
No
|
Name
|
Type
|
Duration
|
Weight
|
Day, Date, Session, Time
|
EC-1
|
Quiz-I/ Assignment-I
|
Online
|
-
|
5%
|
August 26 to September
4, 2017
|
|
Quiz-II
|
Online
|
|
5%
|
September 26 to
October 4, 2017
|
|
Quiz-III/ Assignment-II
|
Online
|
|
5%
|
October 20 to 30,
2017
|
EC-2
|
Mid-Semester Test
|
Closed Book
|
2 hours
|
35%
|
24/09/2017 (AN) 2 PM TO 4 PM
|
EC-3
|
Comprehensive Exam
|
Open Book
|
3 hours
|
50%
|
05/11/2017 (AN) 2 PM TO 5 PM
|
Important
Information:
Syllabus for Mid-Semester Test (Closed
Book): Topics in Contact Hours : 1 to 5
Syllabus for Comprehensive Exam (Open
Book): All topics (Session Nos. 1 to 10)
Important links and
information:
Students are expected to visit the Elearn portal on a regular basis and
stay up to date with the latest announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the
schedule provided on the Elearn portal.
Evaluation Guidelines:
1. EC-1 consists of either two
Assignments or three Quizzes. Students will attempt them through the course
pages on the Elearn portal. Announcements will be made on the portal, in a
timely manner.
2. For Closed Book tests: No
books or reference material of any kind will be permitted.
3. For Open Book exams: Use of
books and any printed / written reference material (filed or bound) is
permitted. However, loose sheets of paper will not be allowed. Use of
calculators is permitted in all exams. Laptops/Mobiles of any kind are not
allowed. Exchange of any material is not allowed.
4. If a student is unable to
appear for the Regular Test/Exam due to genuine exigencies, the student should
follow the procedure to apply for the Make-Up Test/Exam which will be made
available on the Elearn portal. The Make-Up Test/Exam will be conducted only at
selected exam centres on the dates to be announced later.
It shall be the responsibility of the individual student to be regular
in maintaining the self study schedule as given in the course handout, attend
the online lectures, and take all the prescribed evaluation components such as
Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the
evaluation scheme provided in the handout
No comments:
Post a Comment