Saturday 1 March 2014

Scenario 3: How to load multi-format (Delimited) column Flat File source data into two different targets?

Scenario 3: How to load multi-format (Delimited)  column Flat File source data into two different targets?

Ex:
cust_id,cname,phone,state_id
101,AMITH,9897045679,201
102,JAGAN,9908907869,203
103,MAHESH,890899089,201
state_id||Sname||Region
201||Karntaka||South
202||Delhi||North
203||Maharashtra||West

Theory/Analysis:
Read complete file as fixed width file. Remove first line through Start Importing row at option while Importing Source File metadata to informatica. First use REPLACESTR() function to replace '||' to ',' then use SUBSTR() till first Comma position. Check whether it's is number or not through ISNUMBER().
Assign and id for each and every record; Use Router to Route data according to ID less than/greater than Mapping Variable. Use SUBSTR, INSTR to split each record for columns.

Practical:

Step 1: Create a Mapping Variable $$SPLIT (Integer) Count

Step 2: Read File as Fixed Width (Though it's comma delimited)

Step 3:
Expression
 Create Total Eight Port
            RECORDS(Variable Port) = REPLACESTR(0,RECORDS_IN,'||', ',')
            COLUMN1 (Variable Port) = SUBSTR(RECORDS,1,INSTR(RECORDS, ',',1, 1)-1)
            Rec_ID (Variable Port) = OLD_ID + 1
            OLD_ID (Variable Port) = Rec_ID
            SPLIT_REC (Variable Port) = IS_NUMBER(COLUMN1)
            Var_Assign (Variable Port) = IIF(SPLIT_REC= 0, SETVARIABLE($$SPLIT, Rec_ID))
            Out_Rec_ID (Output Port) = Rec_ID
            Records_OUT(Output Port) = RECORDS

Note: RECORDS_IN is Input Port and coming from Source Qualifier.

Step 4:
Router:
 Create two Group
            Cust     Rec_ID < $$SPLIT
            State     Rec_ID > $$SPLIT

Step 5:
Expression:
Create Four Port
            CUST_ID (Output Port) = SUBSTR(RECORDS1,1,INSTR(RECORDS1, ',',1, 1)-1)
            CNAME (Output Port) = SUBSTR(RECORDS1, (INSTR(RECORDS1, ',',1, 1)+1), ((INSTR(RECORDS1, ',',1, 2)- INSTR(RECORDS1, ',',1, 1)-1)))
            PHONE (Output Port) = SUBSTR(RECORDS1, (INSTR(RECORDS1, ',',1, 2)+1),  ((INSTR(RECORDS1, ',',1, 3)- INSTR(RECORDS1, ',',1, 2)-1)))
            STATE_ID (Output Port) = SUBSTR(RECORDS1, (INSTR(RECORDS1, ',',1, 3)+1))

Step 6:
Expression: 
Create Three Port
            STATE_ID (Output Port) = SUBSTR(RECORDS3,1,INSTR(RECORDS3, ',',1, 1)-1)
            STATE (Output Port) = SUBSTR(RECORDS3, (INSTR(RECORDS3, ',',1, 1)+1),  ((INSTR(RECORDS3, ',',1, 2)- INSTR(RECORDS3, ',',1, 1)-1)))
            REGION (Output Port) = SUBSTR(RECORDS3, (INSTR(RECORDS3, ',',1, 2)+1))

BINGO!!!



Feel free to post your doubt in comment section.

Due to limitation, I can't attach exported .xml file which you can Import in your repository and see or analyse. (Feel free to mail me if you need .xml file with Scenario number and Question.)

If you like the approach and solution feel free to share with others.

Thanks.
hpandey.harsh@gmail.com

14 comments:

  1. I have a doubt that why u have used router and mapping variable .
    Can we derive all the expression in a single expression transformation rather then using router and two different exp transformation.

    ReplyDelete
    Replies
    1. The need of mapping variable and router is to route the data to two different target. If you see the source file we are getting customer and state information in single flat file with different delimiter and with the first expression we finding the row from where we have to split the record for second target and assigning it to Mapping variable and through router we are splitting record flow in two group one for customer and second one for state.

      Hope it would be clear now. If not feel free to ask you doubt.

      Thanks
      hpandey.harsh@gmail.com

      Delete
  2. In 1st expression transformation u have mentioned Var_Assign= IIF(SPLIT_REC= 0, SETVARIABLE($$SPLIT, Rec_ID)) , what is split_rec
    is it a variable port or what please explain me the above expression and also the router condition

    Thanks In Advance ,

    ReplyDelete
    Replies
    1. I am sorry Rajat, My typing mistake creates confusion. I have corrected it now.

      Yes, SPLIT_REC is variable port and in this we evaluating whether first column value is number or not. If first column value is not number then it will store 0 and if SPLIT_REC is 0 then assign that record_id to variable and in router I am routing record like this if rec_id < Variable value then it's customer data. If rec_id > Variable value then State data.

      I would suggest instead of only reading try to implement it practically and use debugger to analyze record. It would be more clear by that way.

      Thanks.

      Delete
  3. HI,

    what about false part of below IIF

    Var_Assign (Variable Port) = IIF(SPLIT_REC= 0, SETVARIABLE($$SPLIT, Rec_ID))

    ReplyDelete
  4. This is brilliant...took me a while to understand that we are just looking for the heading of the 2nd set of recs to set the value of the mapping variable $$SPLIT...got it eventually...Cheers!!!!

    ReplyDelete
  5. How to achieve if we have n number of dynamic files with different file structure and need to load to a single staging table which should be created during the runtime of the workflow.

    ReplyDelete
  6. Thank you very much ! This is very useful for beginners like me in self learning

    ReplyDelete
  7. Hi,

    When we have two delimiters , we can use a pre-session unix command like sed 's/||/,/g' input.txt > tmp.txt


    which standardize the delimiter .

    then we can simply make it a comma delimited file and split columns to multiple targets.

    ReplyDelete
  8. Hi Harsh,
    First of all I am total beginner in Informatica, and I have the following message after saving a mapping: Transformation: EXPTRANS Field: Rec_ID
    <> [+]: operand cannot be converted to a number
    ... >>>>OLD_ID<<<< + 1

    Could You help me with this?
    Many Thanks

    ReplyDelete
    Replies
    1. Hi Karol,

      There are two possibilities for this error.
      1. Check the Data Type of the port. It should be INTEGER.
      2. Check wheather OLD_ID variable port is available or not. OLD_ID port should must be available which is 4th port in Expression and it is also variable port.
      Hope it will help.
      Thanks.

      Delete
    2. Hi Harsh,
      Great, first of Your solution was correct. I've
      also change to INTEGER ports in Router because it was similiar problem. But it is still the same with port Var_Assign though I've change datatype to integer:
      <> [=]: function cannot resolve operands of ambiguously mismatching types.
      ... IIF(SPLIT_REC=>>>>0<<<<, SETVARIABLE($$SPLIT, Rec_ID))
      Thanks

      Delete
  9. Source: Multi record format
    Target: 3 tables Base, J1, J2

    1 row contains multiple fixed width records and dynamic in nature, need to load data into target tables.

    Eg:- (Base Record 100 length [c1 string 50, c2 string 50] +J1 Record 200 length [c1 string 50, c2 string 50, c3 string50, c4 string 50]+J2 Record length 150 [c1 string 50, c2 string 100])

    Base Record + J1 + J2+ J1+ J2+J1 -> 1 Record goes to base table, 3 records goes to J1 table , 2 records goes to J2 table
    Base Record + J2 + J2+ J1+ J2+J1 -> 1 Record goes to base table, 2 records goes to J1 table , 3 records goes to J2 table


    ReplyDelete