Tuesday 24 June 2014

Scenario 25: How to remove duplicate records and load only non duplicate records without using Aggregator Transformation.

Scenario 25: How to remove duplicate records and load only non duplicate records without using Aggregator Transformation.

Source
COL1
1
1
2
3
3
4

Target
COL1
2
4

Theory: The issue is grouping all duplicate records then eliminating. We will split Distinct and Duplicate records and once again we will join those two pipeline and eliminate all records which are matching from both pipeline.

Practically: 
Step 1: 
Drag and Drop your source and target.

Step 2:
Sorter
Sort the data based upon COL1

COL1 (Key Port)

Step 3:
Expression
Create Four Ports

NEW_REC_FLAG (Variable Port)= IIF(OLD_REC=col1, 2, 1)

NEW_REC (Variable Port)= IIF(col1!=OLD_REC, col1, OLD_REC)

OLD_REC (Variable Port)=NEW_REC

OUT_REC_FLAG (Output Port)= NEW_REC_FLAG

Step 4
Router
Create Two Group

New_Rec: OUT_REC_FLAG=1
Dup_Rec: OUT_REC_FLAG=2


Step 5
Sorter
Connect Dup_Rec group.

Select OUT_REC_FLAG3 as KEY Port.

Step 6
Joiner

Connect New_Rec Group to Joiner.
Connect ports from Sorter Transformation to Joiner

Join Type: Master Outer Join (Ports connected from sorter should be master port)

Condition Tab:
MASTER OPERATOR DETAIL
COL11      =                 COL13

Note: In properties tab select Sorted Input as we are joining two pipeline originated from same source qualifier.

Step 7
Filter
Connect all ports from Joiner to Filter

Filter Condition=ISNULL(COL13)

Step 8
Connect COL11 to Target from Filter.

BINGO!!!


Feel free to post your doubt in comment section or though contact form which you can find below the page.

Due to limitation, I can't attach exported .xml file which you can Import in your repository and see or analyse. (Feel free to mail me if you need .xml file with Scenario number and Question.)

If you like the approach and solution feel free to share with others.

Thanks.
hpandey.harsh@gmail.com

9 comments:

  1. Hi Harsh,
    Sorry i am wrong,dont u think the records that you should load have to be 2,4 and not 2,3?
    please correct me if i am wrong!!

    ReplyDelete
    Replies
    1. Thank You very much Sumit for pointing my typo error. Yes, it should be 2 and 4. I am on it will update it soon. Thanks

      Delete
  2. This comment has been removed by the author.

    ReplyDelete
  3. Hi Harsh,
    Also i could not understand one thing.
    We need to copy all the records first from col1 to old_rec then only we can refer to those records..i mean match..am i correct?
    if yes, then if we use the condition which u used above i.e. IIF(col1!=OLD_REC, col1, OLD_REC), old_rec will have all the records which have their presence in col1.but ideally old_rec should be empty for that.
    Please correct me if i am wrong.

    ReplyDelete
    Replies
    1. Frankly, I didn't get your question.

      Same suggestion again: Implement it practically your doubt will be clear. Only reading and coming to conclusion it's too difficult and confusing.

      Let me know if you are getting error in implementation or some other approach you are trying.

      Thanks.

      Delete
  4. Hi Thanks for the nice illustration.

    ReplyDelete
  5. Thank you so much for providing information about informatica and other some practically implied things.

    Informatica Read Json


    ReplyDelete