BITS WILP Data Warehousing Handout 2017-H2



BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES DIVISION

                                                            FIRST SEMESTER 2017-2018

COURSE HANDOUT

Course No.   :   CSI ZG515 / SS ZG515
Course Title :   DATA WAREHOUSING
Instructors   :  YASHVARDHAN SHARMA

Course Description
Corporate decision makers require access to all the organization’s data, wherever it is located. To provide comprehensive analysis of the organization, its business, its requirements and any trends, require access to not only the current data in the database but also to historical data. To facilitate this type of analysis, data warehouses have been created to contain data drawn from several sources, maintained by different departments of the organization. This course will involve an in-depth study of various concepts needed to design, develop, and maintain a data warehouse. It also provides an introduction to end user access tools like OLAP and reporting.



Text Books
No
Author(s), Title, Edition, Publishing House
T1
Ponniah P, “Data Warehousing Fundamentals”, Wiley Student Edition, 2012

T2
Kimball R, “The Data Warehouse Toolkit”, 3e, John Wiley, 2013

Reference Books
No
Author(s), Title, Edition, Publishing House
R1
Anahory S, & Dennis M, “Data Warehousing in the Real World”, Pearson Education, 2008.
R2
Kimball R, Reeves L, Ross M, & Thornthwaite, W, “The Data Warehouse Lifecycle Toolkit”, John Wiley, 2e, 2012.
R3
Jiawei Han, Micheline Kamber and Jian Pei,  Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers  2012
R4
Krish Krishnan, “Data Warehousing in the Age of Big Data”, Morgan Kaufmann Publishers  2013
R5
William H Inmon, et al., “DW 2.0 : The Architecture for the Next Generation of Data Warehousing”, Morgan Kaufmann 2012

Legend  
M = Module, RL = Recorded Lecture, CS = Contact Session, LE = Lab Exercises, SS = Self Study, HW = Home Work Problems

RL2.1 = Recorded Lecture 1 in module 2.
RL2.2.3 = Third topic recording in recorded lecture RL2.2


Modular Structure

Module Summary
No
Title of the Module
M1
Introduction to Data Warehousing
M2
Introduction to Dimensional Modelling
M3
Architectural Components of a Data Warehouse
M4
Extraction, Transformation & Loading
M5
Advanced Dimensional Modelling
M6
Online Analytical Processing (OLAP) & Multidimensional Databases (MDDB)
M7
Query Performance Enhancing Techniques
M8
Metadata
M9
Support for Data Warehousing in RDBMS/SQL
M10
Real-time Data Warehousing
M11
Current Trends in Data Warehousing


Detailed Structure

Video: Introduction to the course

M1: Introduction to Data Warehousing
Type
Description
RL1.1
Introduction to Data Warehousing - I
RL1.1.1 = Evolution of Data Warehousing
RL1.1.2 = Operational System Characteristics
RL1.1.3 = Data Warehousing System Characteristics
RL1.1.4 = Data Warehouse Architecture
RL1.2
Introduction to Data Warehousing - II
RL1.2.1 = Introduction to ETL, Data Staging Area & Presentation Servers
RL1.2.2 = Introduction to Data Warehouse Design
RL1.2.3 = Continuum of Analysis
CS1.1
CS1.1.1 = Review of concepts of DW, ETL, Information Delivery (T1, Ch 1)
CS1.1.2 = Examples of Business Intelligence applications (T2, Introduction)
CS1.1.3 = Challenges in designing DW (T2, Ch 1)
CS1.2
CS1.2.1 = Compare DW with Data Mining (R1, Appendix B)
CS1.2.2 = DW, Unstructured data and Big data (T2, Ch 21)
SS1.1
Self Study  (T1, Chapter 1; T2, Chapter 1)
HW1.1
Do exercises given at the end of chapter 1 of T1


M2: Introduction to Dimensional Modeling
Type
Description/Plan/Reference
RL2.1
RL2.1.1 = E R Modelling
RL2.1.2 = Dimensional Modelling Vocabulary
RL2.1.3 = ER Modelling vs. Dimensional Modelling - Qualitative Comparison
RL2.2
RL2.2.1 = Data Warehouse Design Steps
RL2.2.2 = Grocery Store Case Study
CS2.1
CS2.1.1 = Review Dimensional Modeling (T1, Ch 10)
CS2.1.2 = Compare ER and Dimensional Modeling (T1, Ch 10)
CS2.2
CS2.2.1 = Explain Retail Store analysis requirements (T2, Ch 3)
CS2.2.2 = Retail Store Dimensional Model (T2, Ch 3)
SS2.1
Self Study  (T1, Chapter 10; T2, Chapter 3)
HW2.1
Do exercises given at the end of chapter 10 of T1

M3: Architectural Components of a Data Warehouse
Type
Description/Plan/Reference
RL3.1
RL3.1.1 = Data Marts
RL3.1.2 = ODS
RL3.1.3 = Top-down vs. Bottom up approaches to Data Warehousing
RL3.2
RL3.2.1 = ETL & Data Staging Area
RL3.2.2 = Presentation Servers
RL3.2.3 = OLAP
CS3.1
CS3.1.1 = Review concepts of data mart (T1, Ch 2)
CS3.1.2 = Explain pros and cons of taking data mart approach towards EDW (T1, Ch 2)
CS3.2
CS3.2.1 = Explain concepts of staging and OLAP (T1, Ch 7)
CS3.2.2 = Explain desirable features for staging server and OLAP (T1, Ch 7)
SS3.1
Self Study  (T1, Chapter 7)
HW3.1
Do exercises given at the end of chapter 7 of T1


M4: Extraction, Transformation, & Loading (ETL)
Type
Description/Plan/Reference
RL4.1
RL4.1.1 = ETL Overview
RL4.1.2 = Major ETL Tasks
RL4.1.3 = ETL Requirements
RL4.1.4 = Data Extraction
RL4.2
RL4.2.1 = Data Transformation
RL4.2.2 = Data Loading
RL4.2.3 = Initial Load & Refresh Cycles
RL4.3
RL4.3.1 = Data Quality
CS4.1
CS4.1.1 = Explain concepts of ETL (T1, Ch 12)
CS4.1.2 = Why data requires pre-processing (R3, Ch 3)
CS4.2
CS4.2.1 = Major types of transformations (R3, Ch 3)
CS4.2.2 = Methods of applying data to the warehouse (T1, Ch 12)
SS4.1
Self Study  (T1, Chapter 12)
HW4.1
Do exercises given at the end of chapter 12 of T1

M5: Advanced Dimensional Modelling
Type
Description/Plan/Reference
RL5.1
RL5.1.1 = Changing Dimensions, Surrogate keys & Look up tables
RL5.1.2 = Mini-dimensions & Outriggers
RL5.2
RL5.2.1 = Time Dimension
RL5.2.2 = Conformed Dimensions & Facts
RL5.3
RL5.3.1 = Multi-valued Dimensions (Bridge & Helper Tables)
RL5.3.2 = Dimension Hierarchies
RL5.3.3 = Role-playing Dimensions
RL5.3.4 = Factless Fact Tables
RL5.4
RL5.4.1 = Academic Warehouse Case Study
RL5.5
RL5.5.1 = Bank Data Warehouse Case Study
CS5.1
CS5.1.1 = Illustrate problems of changing dimensions (T2, Ch 2)
CS5.1.2 = Techniques to handle changing dimensions (T2, Ch 2)
CS5.1.3 = Examples for mini-dimensions and outriggers (T2, Ch 2)
CS5.2
CS5.2.1 = Illustrate time dimension (T2, Ch 3)
CS5.2.2 = Explain enterprise bus architecture (T2, Ch 4)
CS5.2.3 = Illustrate bridge, role-playing dimensions, factless fact tables (T2, Ch 2)
SS5.1
Self Study  (T1, Chapter 11)
HW5.1
Do exercises given at the end of chapter 11 of T1


M6: OLAP & Multidimensional Databases (MDDB)
Type
Description/Plan/Reference
RL6.1
RL6.1.1 = Introduction to OLAP & Multidimensional Analysis
RL6.1.2 = Limitations of Spreadsheets & SQL
RL6.2
RL6.2.1 = Major OLAP Features & Functions
RL6.2.2 = Cube Operator
RL6.3
RL6.3.1 = Introduction to Multidimensional Databases (MDDB)
RL6.3.2 = Multidimensional Analysis & MDDBs
RL6.4
RL6.4.1 = OLAP operations using MDDBs
RL6.5
RL6.5.1 = MDDB vs. RDBMS
RL6.6
Cube Computation: Complexity & Optimization
CS6.1
CS6.1.1 = Explain concepts of OLAP                          (T1, Ch 15)
CS6.1.2 = Various OLAP operations with examles (R3, Ch 4)
CS6.1.3 = Relative strengths of MOLAP, ROLAP, and HOLAP (R3, Ch 4)
SS6.1
Self Study  (T1, Chapter 15)
HW6.1
Do exercises given at the end of chapter 15 of T1


M7: Query Performance Enhancing Techniques
Type
Description/Plan/Reference
RL7.1
RL7.1.1 = Aggregation
RL7.1.2 = Sparsity Failure
RL7.2
RL7.2.1 = Shrunken, Lost, & Collapsed Dimensions
RL7.3
RL7.3.1 = Aggregate Navigator
RL7.3.2 = Aggregate Navigation Algorithm
RL7.4
RL7.4.1 = Partitioning
RL7.4.1 = Partitioning wrt Time
RL7.5
RL7.5.1 = View Materialization
RL7.5.2 = Selection of Views to Materialize
RL7.6
RL7.6.1 = View Maintenance Strategies
RL7.6.2 = Incremental Maintenance Algorithms
RL7.7
RL7.7.1 = Bitmap Indices
RL7.7.2 = Bitmap Compression Strategies
CS7.1.
CS7.1.1 = Data Warehouse performance challenges(T1, Ch 18)
CS7.1.2 = Concepts of physical design (T1, Ch 18)
CS7.1.3 = Use of aggregates, partitions for performance and operations (R1, Ch 6, Ch 7)
SS7.1
Self Study  (T1, Chapter 18)
HW7.1
Do exercises given at the end of chapter 18 of T1

M8: Metadata
Type
Description/Plan/Reference
RL8.1
RL8.1.1 = Role of Metadata
RL8.1.2 = Types of Metadata
RL8.2
RL8.2.1 = Metadata Design & Implementation
CS8.1
CS8.1.1 = Concepts of Metadata (T1, Ch 9)
CS8.1.2 = Why Metadata becomes more important in DW (T1,Ch 9)
CS8.1.3 = Examples of Metadata (T1, Ch 9)
SS8.1
Self Study  (T1, Chapter 9)
HW8.1
Do exercises given at the end of chapter 9 of T1


M9: Support for Data Warehousing in RDBMS/SQL
Type
Description/Plan/Reference
RL9.1
RL9.1.1 = Support for Data Warehousing in RDBMSs
RL9.2
RL9.2.1 = New Features in SQL
CS9.1
CS9.1.1 = Commonality between DW and RDBMS(T1, Ch 1)
CS9.1.2 = SQL extensions for analytics (R4,Ch 2, Ch 3, Ch 18)
SS9.1
Self Study  (R4, Chapters 2, 3, 18)

M10: Real-Time Data Warehousing (RTDWH)
Type
Description/Plan/Reference
RL10.1
RL10.1.1 = Need for Real-time Data Warehousing
RL10.1.2 = Solutions for Real-time Data Warehousing
RL10.2
RL10.2.1 = Real-time ETL
RL10.2.2 = Role of ODS in RTDWH  
CS10.1
CS10.1.1 = Concepts of real-time solutions(T2, Ch 20)
CS10.1.2 = Challenges with real-time data warehousing (T2, Ch 20)
CS10.1.3 = ETL vs. ELT (R6, Ch 13)
SS10.1
Self Study  (T1, Chapter 3; T2, Chapter 20)


M11: Current Trends in Data Warehousing
Type
Description/Plan/Reference
RL11.1
RL11.1.1 = Introduction to World's Largest Data Warehouse
RL11.2
RL11.2.1 = Big Data Analytics
RL11.2.2 = Extended RDBMS Architecture
RL11.2.3 = MapReduce/Hadoop Architecture
CS11.1
CS11.1.1 = Recent trends in data warehousing (T1, Ch 3)
CS11.1.2 = Place of structured DW vs. Unstructured data, Big Data (T2,Ch 21)
CS11.1.3 = Possibilities of convergence of DW and Big Data (R5, Ch 13)
SS11.1
Self Study  (T1, Chapter 3)
HW11.1
Do exercises given at the end of chapter 3 of T1

Instructional Plan – I
There are 10 contact sessions; 5 before mid-semester and 5 post mid-semester. Each contact session is of two hour; typical gap between two contact sessions is on average around 2 weeks. In the tabular cells quote appropriate legends from the modular structure table
Session
Pre-contact prep
Contact session
Post-contact sessions
1
RL1.1, RL1.2
CS1.1, CS1.2
SS1.1, HW1.1
2
RL2.1, RL2.2
CS2.1, CS2.2
SS2.1, HW2.1
3
RL3.1, RL3.2
CS3.1, CS3.2
SS3.1, HW3.1
4
RL4.1, RL4.2, RL4.3
CS4.1, CS4.2
SS4.1, HW4.1
5
RL5.1, 5.2, 5.3, 5.4, 5.5
CS5.1, CS 5.2, Review of all topics covered so far
SS5.1, HW 5.1


Mid-Semester

6
RL6.1,6.2,6.3,6.4,6.5,6.6
CS6.1
SS6.1, HW6.1
7
RL7.1,7.2,7.3,7.4,7.5,7.6,7.7
CS7.1
SS7.1, HW7.1
8
RL8.1, RL8.2
CS8.1
SS8.1, HW8.1
9
RL9.1,9.2
CS9.1
SS9.1
10
RL10.1, RL10.2, RL11.1, RL11.2
CS 10.1, CS 11.1, Review of all topics covered
SS11.1, HW11.1


Comprehensive Exam






Evaluation Scheme:  
Legend: EC = Evaluation Component; AN = After Noon Session; FN = Fore Noon Session
No
Name
Type
Duration
Weight
Day, Date, Session, Time
EC-1
Quiz-I/ Assignment-I
Online
-
5%
August 26 to September 4, 2017

Quiz-II
Online

5%
September 26 to October 4, 2017

Quiz-III/ Assignment-II
Online

5%
October 20 to 30, 2017
EC-2
Mid-Semester Test
Closed Book
2 hours
35%
24/09/2017 (AN) 2 PM TO 4 PM
EC-3
Comprehensive Exam
Open Book
3 hours
50%
05/11/2017 (AN) 2 PM TO 5 PM


Important Information:
Syllabus for Mid-Semester Test (Closed Book): Topics in Contact Hours : 1 to 5
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 10)
Important links and information:
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.
Evaluation Guidelines:
1.      EC-1 consists of either two Assignments or three Quizzes. Students will attempt them through the course pages on the Elearn portal. Announcements will be made on the portal, in a timely manner.
2.      For Closed Book tests: No books or reference material of any kind will be permitted.
3.      For Open Book exams: Use of books and any printed / written reference material (filed or bound) is permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4.      If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should follow the procedure to apply for the Make-Up Test/Exam which will be made available on the Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on the dates to be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self study schedule as given in the course handout, attend the online lectures, and take all the prescribed evaluation components such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme provided in the handout




No comments:

Post a Comment