Sunday, June 14, 2020

Creating an OLAP Star Schema using Materialized View (Oracle DB 10g)



Database used for this demo is: Oracle 10g

Before we begin, we need to understand a few things:

1. Materialized View Log

In an Oracle database, a materialized view log is a table associated with the master table of a materialized view. When master table data undergoes DML changes (such as INSERT, UPDATE, or DELETE), the Oracle database stores rows describing those changes in the materialized view log. A materialized view log is similar to an AUDIT table and is the mechanism used to capture changes made to its related master table. Rows are automatically added to the Materialized View Log table when the master table changes. The Oracle database uses the materialized view log to refresh materialized views based on the master table. This process is called fast refresh and improves performance in the source database.

Ref: Materialized View Log (Oracle Docs)


Table 1: "SS_TIME"

CREATE TABLE "SS_TIME" ("ROW_IDENTIFIER" VARCHAR2(500), "TIME_LEVEL" VARCHAR2(500), "END_DATE" DATE, "TIME_SPAN_IN_DAYS" NUMBER); 

We are going to populate this table using a PL/SQL script.

DECLARE
  start_year number;
  end_year number;
  no_of_days_in_year number;
  date_temp date;
BEGIN
  start_year := 1996;
  end_year := 1997;
  no_of_days_in_year := 365;
  
  /* Inserting rows with level 'year' */
  /*
  FOR i IN start_year .. end_year LOOP
    no_of_days_in_year := MOD(i, 4);
    IF  no_of_days_in_year = 0 THEN
      no_of_days_in_year := 366;
    ELSE
      no_of_days_in_year := 365;
    END IF;
    insert into ss_time values(i, 'YEAR', add_months(to_date(start_year || '-JAN-01', 'YYYY-MON-DD'), 12) - 1, no_of_days_in_year);
  END LOOP;
  */
  
  /* Inserting rows with level 'MONTH' */ 
  FOR i IN start_year .. end_year LOOP
    FOR j in 0..11 LOOP
      date_temp := add_months(to_date(i || '-JAN-01', 'YYYY-MON-DD'), j) - 1;
      INSERT INTO ss_time VALUES(to_char(date_temp, 'MON') || i, 'MONTH', date_temp, extract(day from last_day(date_temp)));
    END LOOP;
  END LOOP;
END; 

Once we run the above script, we get a data as shown below:

Next, we have following tables: -- CustomerID CustomerName ContactName Address City PostalCode Country create table ss_customers (customerid number primary key, customername varchar2(1000), contactname varchar2(1000), address varchar2(1000), city varchar2(1000), postalcode varchar2(1000), country varchar2(1000)); -- OrderID CustomerID EmployeeID OrderDate ShipperID create table ss_orders (orderid number primary key, customerid number, employeeid number, orderdate date, shipperid number); -- ProductID ProductName SupplierID CategoryID Unit Price create table ss_products (productid number primary key, productname varchar2(1000), supplierid number, categoryid number, unit varchar2(1000), price number); -- CategoryID CategoryName Description create table ss_categories (categoryid number primary key, categoryname varchar2(1000), description varchar2(1000)); -- OrderDetailID OrderID ProductID Quantity create table ss_orderdetails (orderdetailid number, orderid number, productid number, quantity number); Next, we run insert statements for 'SS_CUSTOMERS' using an SQL file. The file is "Insert Statements for ss_customers.sql", and has contents: SET DEFINE OFF; insert into ss_customers values(1, 'Alfreds Futterkiste', 'Maria Anders', 'Obere Str. 57', 'Berlin', '12209', 'Germany'); insert into ss_customers values (2,'Ana Trujillo Emparedados y helados','Ana Trujillo','Avda. de la Constitucion 2222','Mexico D.F.','05021','Mexico'); insert into ss_customers values (3,'Antonio Moreno Taqueria','Antonio Moreno','Mataderos 2312','Mexico D.F.','05023','Mexico'); ... commmit; This file is ran from Command Prompt as shown below:
You can view the contents of the table in 'SQL Developer' or 'Oracle 10g Apex Application':
Similarly for "ss_orders":
Similarly for 'ss_products':
Similarly for "ss_categories":
Similarly for "ss_orderdetails":
Testing: select * from ss_orders; --GIVES THE NUMBER OF SALES select * from ss_customers; --GIVES THE COUNTRY WHERE SALES WERE MADE select * from ss_products; select * from ss_categories; select * from ss_time; select * from ss_orderdetails; select decode(sum(customerid),null,0,sum(customerid)) as cust_count from ss_customers where city='Delhi'; Our OLAP Star Schema will have: THREE DIMENSIONS: TIME, PRODUCT CATEGORIES (SINGLE VALUED DIMENSION), COUNTRY (SINGLE VALUED DIMENSION) AND ONE MEASURE: NUMBER OF ORDERS Query: select tim.row_identifier as time_interval, cust.country, cat.categoryname, count(ord.orderid) from ss_time tim, ss_customers cust, ss_categories cat, ss_orders ord, ss_orderdetails odd, ss_products prd where ord.customerid = cust.customerid and odd.productid = prd.productid and prd.categoryid = cat.categoryid and tim.end_date >= ord.orderdate and (tim.end_date - tim.time_span_in_days + 1) <= ord.orderdate group by tim.row_identifier, cust.country, cat.categoryname; OUTPUT:
Another Test Query: select tim.row_identifier as time_interval, ord.orderid from ss_time tim, ss_orders ord where tim.end_date >= ord.orderdate and (tim.end_date - tim.time_span_in_days + 1) <= ord.orderdate; Next, we create "Materialized View Log(s)": CREATE MATERIALIZED VIEW LOG ON SS_PRODUCTS WITH PRIMARY KEY, ROWID (categoryid) INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW LOG ON SS_ORDERDETAILS WITH ROWID (productid) INCLUDING NEW VALUES; /* The CREATE MATERIALIZED VIEW LOG command was issued with the WITH PRIMARY KEY option and the master table did not contain a primary key constraint or the constraint was disabled. Reissue the command using only the WITH ROWID option, create a primary key constraint on the master table, or enable an existing primary key constraint. */ CREATE MATERIALIZED VIEW LOG ON SS_ORDERS WITH PRIMARY KEY, ROWID (customerid, orderdate) INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW LOG ON SS_CATEGORIES WITH PRIMARY KEY, ROWID (categoryname) INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW LOG ON SS_CUSTOMERS WITH PRIMARY KEY, ROWID (country) INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW LOG ON SS_TIME WITH ROWID (row_identifier, end_date, time_span_in_days) INCLUDING NEW VALUES; -- DROP MATERIALIZED VIEW LOG ON ss_time; Next, we create the 'Materialized View': CREATE MATERIALIZED VIEW ss_mv_total_sales_fact_table BUILD IMMEDIATE REFRESH FAST ON COMMIT ENABLE QUERY REWRITE AS SELECT tim.row_identifier AS time_interval, cust.country, cat.categoryname, count(ord.orderid) as count_orders FROM ss_time tim, ss_customers cust, ss_categories cat, ss_orders ord, ss_orderdetails odd, ss_products prd WHERE ord.customerid = cust.customerid AND odd.productid = prd.productid AND prd.categoryid = cat.categoryid AND tim.end_date >= ord.orderdate AND (tim.end_date - tim.time_span_in_days + 1) <= ord.orderdate GROUP BY tim.row_identifier, cust.country, cat.categoryname; Testing: SELECT * FROM ss_mv_total_sales_fact_table;
SELECT * FROM ss_mv_total_sales_fact_table where country = 'Germany' order by categoryname;
SELECT * FROM ss_mv_total_sales_fact_table where country = 'Germany' and time_interval = '1996'; -- No data found. SELECT * FROM ss_mv_total_sales_fact_table where country = 'Germany' and time_interval like '%1996%' order by categoryname;
All code files are here: Google Drive

No comments:

Post a Comment