Tuesday 4 March 2014

Scenario 5A: How to Convert Column To Row (2)

Scenario 5A: How to Achieve Target below...

(Source)
ID, STATES
1,BNG,CHE,HYBD
2,PUNE,MUMBAI
3,DELHI

(Target)
1,BNG
1,CHE
1,HYBD
2,PUNE
2,MUMBAI
3,DELHI

Let's Chat: Ya, I know it's look little bit same as Scenario 5, but you can't feel the heat until you are in Hot Seat :-) Whenever you get scenario for converting Row to Column or Column to Row always analyse Source and Target because Solution for this kind of scenarios are always close circuit solution, means solution will work only for given format of source. It will behave differently you have different format of source.

Analyse/Theory:
Here, actually we have total four columns values which are coming in two Columns. Ok No issue we will read complete row as fixed width and we have substr(), instr() function to split second column value to three column value and Normalizer for converting column to row.
Second challenge:
 1st row have Three column value
 2nd row have two columns value
3rd row have only one columns value
So, same substr(), instr() logic to split column value won't work for all rows :-( (No issue we will split source into three pipelines)
Few more small challenges are there that we will see practically.

Practically:

Step 1A: Read your source as fixed width file.

Step 1: 
Expression: (For Splitting in different flow based upon column value available in the row)
Difference= (Output Port) = IIF(LENGTH(ID__STATES)-LENGTH(REPLACECHR(0,ID__STATES,',',''))=3, 3,IIF(LENGTH(ID__STATES)-LENGTH(REPLACECHR(0,ID__STATES,',',''))=2, 2,IIF(LENGTH(ID__STATES)-LENGTH(REPLACECHR(0,ID__STATES,',',''))=1,1)))

Step 2:
Router: Create Three Group
Three_Column--> Difference=3
Two_Column-->Difference=2
One_Column-->Difference=1

Step 3:
Expression 1 (Split records for all four columns, including ID)
ID (Output Port)=SUBSTR(ID__STATES1, 1, INSTR(ID__STATES1, ',',1,1)-1)
State 1 (Output Port)=SUBSTR(ID__STATES1, (INSTR(ID__STATES1, ',',1, 1)+1), ((INSTR(ID__STATES1, ',',1, 2)- INSTR(ID__STATES1, ',',1, 1)-1)))
State 2 (Output Port)=SUBSTR(ID__STATES1, (INSTR(ID__STATES1, ',',1, 2)+1),  ((INSTR(ID__STATES1, ',',1, 3)- INSTR(ID__STATES1, ',',1, 2)-1)))
State 3 (Output Port)=SUBSTR(ID__STATES1, (INSTR(ID__STATES1, ',',1, 3)+1))

Step 4
Expression 2 (Split records for all three columns, including ID)
ID (Output Port)=SUBSTR(ID__STATES3, 1, INSTR(ID__STATES3, ',',1,1)-1)
Sate 1 (Output Port)=SUBSTR(ID__STATES3, (INSTR(ID__STATES3, ',',1, 1)+1), ((INSTR(ID__STATES3, ',',1, 2)- INSTR(ID__STATES3, ',',1, 1)-1)))
State 2 (Output Port)=SUBSTR(ID__STATES3, (INSTR(ID__STATES3, ',',1, 2)+1))
State 3 (Output Port)=NULL

Step 5
Expression 3 (Split records for all two columns, including ID)
ID (Output Port)=SUBSTR(ID__STATES4, 1, INSTR(ID__STATES4, ',',1,1)-1)
Sate 1 (Output Port)=SUBSTR(ID__STATES4, (INSTR(ID__STATES4, ',',1, 1)+1))
State 2 (Output Port)=NULL
State 3 (Output Port)=NULL

Step 6
Union Transformation (To Combine all three pipeline)
Group Port (Create three Group)
Three_Column-->Connect from Expression 1
Two_Column-->Connect from Expression 2
One_Column-->Connect from Expression 3

Step 7
Normalizer Transformation
Normalizer Tab (Create Two Port)
ID       (Set Occurs)=0
State   (Set Occurs)=3

Step 8
Filter Transformation
(We need this because normalizer will return three records for each and every records and due to this we will get null for two/one columns value row)
Filter Condition=NOT ISNULL(STATES)

Step 9
Connect Filter Transformation to your Target

BINGO!!!




Feel free to post your doubt in comment section.

Due to limitation, I can't attach exported .xml file which you can Import in your repository and see or analyse. (Feel free to mail me if you need .xml file with Scenario number and Question.)

If you like the approach and solution feel free to share with others.

Thanks.

hpandey.harsh@gmail.com

1 comment: