SCD and its implementation in Datasage

Datawarehouse is a large storage of data derived from a multiple or different sources and this stored data is used to guide or make management decisions for a particular company.
Now to make these management decisions, one needs to track the changes in Dimensions attributes in order to report historical data and these dimensions that change slowly over time rather than changing frequently are called slowly changing dimensions i.e. SCD.
Below is the short descriptions of types of SCD.
Type 1 – SCD type 1 methodology is used when there is no need to store historical data in the dimension table. We can say it’s a passive method and it overwrites the old dimension value with new one in database. It is basically used to correct data in dimension table. To delete some characters, to correct spelling.
Type 2 – in SCD type 2, all the history of the dimension table is kept in the database. This history is kept in the format of FLAG, Effective Date or Versions.
Type 3 – In SCD Type 3, only current and previous values of particular column kept in database for the dimension table. This is very rarely used methodology as you need to create extra value for each and every column to store previous value of that particular column.
Type 4 – in SCD type 4, a separate table is used to store the history or to track the change in any attributes of the dimension table.
This is the summary of the SCD types. Now let’s see how to implement this in Datastage step by step.
Step 1 –
Create your job structure like below

Two sequential files, one for Fact data and other for Dimension data which will be in reference Sequential file from above diagram. Targets used here are Dataset files, one for Fact load and one for dimension load
Step – 2
Now let consider the fact and dimension data as below

If you have noticed, location of Ryan is updated in fact table and new record named Robert is inserted in Fact table.
Step 3 – Configure slowly changing dimension stage
Now to configure SCD properties, open the SCD stage and access the Fast path

Step 4 –
Click on next to access 2nd step out of 5 for SCD
In this stage we need to give purpose of each column from the dimension table as shown below.
It will perform the look up on Dimension table on the basis of Customer Key. So map ID column from Fact table to Customer_key from dimension table as shown below.

Also now let’s see the purpose column from below figure in details.
Surrogate key – It’s a key generated or associated for new dimension record
Business Key – This the key column which is used to compare Facts and dimension table or on which look up is performed.
Name & Location – These are the changing dimensions so marked as type 2
Validity – it’s the indicator.
StartDate and EndDate – Effective and expiry date for the changed dimension of the key column.

Step 5 –
In this stage, one need to specify the path of the surrogate file which is created.
I have created it separately and just mentioned the path here.

To generate surrogate key flat file you need to create one using Surrogate Key Generator stage with the below properties configured.

Step 6 –
In next stage, you need to configure DimensionLoad as shown below.
Here expired record Validity indicator will be set to ‘N’ and date will be the current date.

Step 7 – Compile and run the job.
Let’s check the DimensionLoad as I get below one.
If you remember the original dimension table, record associated with Custom Key and Surrogate Key 3 is set to Validity ‘N’ and StartDate, EndDate updated accordingly. Also new surrogate key equal to 1 is assigned to updated records ‘Ryan’.
Robert is added with new customer key and Surrogate key with Validity ‘Y’

Now Fact table as below,
It will be same as original but surrogate key column is added to link it with the dimension table.

So our purpose served.
Hope this will be helpful for you to understand SCD and its implementation. Please comment below for any queries or suggestions

Leave a Reply

© 2018 Database ETL. All rights reserved.