BITS WILP Data Warehousing Mid-Sem Exam 2017-H1


BITS WILP Data Warehousing Mid-Sem Exam 2017-H1
Data Warehousing (SSZG515) EC2R SEM2 2016-17 (Closed Book)

Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
Second Semester 2016-2017

Mid-Semester Test
(EC-2 Regular)

Course No.                  : SS ZG515  
Course Title                 : DATA WAREHOUSING  

Nature of Exam           : Closed Book
Weightage                    : 35%
Duration                      : 2 Hours 
Date of Exam              : 26/02/2017    (AN)
No. of pages: 1
No. of quesetions: 6
Note:
1.       Please follow all the Instructions to Candidates given on the cover page of the answer book.
2.       All parts of a question should be answered consecutively. Each answer should start from a fresh page. 
3.       Assumptions made if any, should be stated clearly at the beginning of your answer.

Q.1.  What are the advantages and disadvantages of having finest granularity data in the data warehouse and data marts?    [6]

Q.2.        Why it is recommended that we maintain separate systems for OLTP and Data Warehousing in an organization? Does it make sense to have a single system? Justify your answer.  [6]
 
Q.3.     Design a data mart for analyzing attendance of students in courses in BITS - a multi-campus university. Show all the dimensions involved and the arrange them in a star schema. Show some examples tuples from fact table, assuming finest granularity.               [6]
Answer 3:
Data Warehouse Design Steps:
Step 1: Identify the Business Process
Step 2: Declare the Grain
Step 3: Identify the Dimensions
Step 4: Identify the Facts

Q.4.        (a) Explain how surrogate keys help in saving space in a Data Warehouse.
(b) Why there is a need to store the natural keys along with the surrogate keys in the dimension tables?       [3 + 3 = 6]

Q.5.        Discuss the role of views in a data warehouse environment. Give three situations where we use views. [6]
Answer 5: (From L7-P1-Lec11)
Dimension Role Playing:
l  A single table that plays multiple roles (using views) to create synonym dimension attributes.
l  Most common role playing dimension is the Date Dimension. I.e. separate role playing dimensions for order date and ship date.
l  What to do when a single dimension appears several times in the same fact table?
l  Consider a fact table to record the status and final disposition of a customer order
l  Dimensions of this table could be Order Date, Packaging Date, Shipping Date, Delivery Date, Payment Date, Return Date, Refer to Collection Date, Order Status, Customer, Product, Warehouse, and Promotion.
l  Note that the first 7 dimensions are all time
l  7 FKs from the FT to the time dimension!!
l  We can not join these 7 FKs to the same table
l  SQL would interpret such a seven-way simultaneous join as requiring that all of the dates be the same 
l  Is this what we want?
l  We need to make SQL believe that there are 7 independent time dimension tables
l  The column labels in each of these tables should also be different!
l  WHY?
l  We will not be able to tell the columns apart if several of them have been dragged into a report
l  How can we do this?
l  We cannot literally use a single time table
l  But we still want to build & maintain single time table behind the scenes
l  Create an illusion for the user
l  Make 7 identical physical copies of the time table
l  Make 7 “virtual” copies using the SQL’s SYNONYM command
l  Once these clones are in place, we still need to define a SQL view on each copy in order to make the field names uniquely different.

Use of ‘View’ as ‘Role playing Dimension’:
Another use of ‘Views’:
Figure 20-1 Cube and Views by Different Users

Q.6.        What are mini-dimensions? Give two situations which warrant the use of mini-dimensions. [5]
Answer 6:
Mini-Dimensions
¡  Single technique to handle browsing-performance & change tracking problems
¡  Separate out frequently analyzed or frequently changing attributes into a separate dimension, called mini-dimension
¡   
Demographic Key       AGE    GENDER       INCOME LEVEL
1                                  20-24   M                     < 20000
2                                  20-24   M                     20K-24999
3                                  20-24   M                     25K-29999
18                                25-29   M                     20K-24999
10                                25-29   M                     25K-29999

¡  Mini-dimension can not be itself allowed to grow very large
¡  5 demographic attributes
¡  Each attribute can take 10 distinct values
¡  How many rows in mini-dimension? Ans: 10,000
¡  Separate out a package of demographic attributes into a demographic mini-dimension
¡  Age, gender, marital status, no. of children, income level, etc.
¡  One row in mini-dimension for each unique combination of these attributes.
¡  Example:

• Advantages
            – frequent ‘snapshoting’ of customers profiles with no increase in data storage or data complexity
• Drawbacks
            – the demographic attributes are clumped into banded ranges of discrete values (it is impractical to change the set of value bands at a later time)
            – the demographic dimension itself can not be allowed to grow too large
            – slows down the browsing
• What if the fact table (connecting the demographic mini-dimension with the customer dimension) is sparse?
• What to do if the fact table (connecting the demographic mini-dimension with the customer dimension) is sparse?
            – Define a demographic transaction event, i.e., introduce a new fact table
            or
            – Add a current demographic key to the customer dimension table


********

Tags: BITS Pilani, Work Integrated Learning Program (WILP), Previous Year Question Papers, Data Warehousing

No comments:

Post a Comment