Scenario 25: How to remove duplicate records and load only non duplicate records without using Aggregator Transformation.
Source
COL1
1
1
2
3
3
4
Target
COL1
2
4
Theory: The issue is grouping all duplicate records then eliminating. We will split Distinct and Duplicate records and once again we will join those two pipeline and eliminate all records which are matching from both pipeline.
Practically:
Step 1:
Drag and Drop your source and target.
Step 2:
Sorter
Sort the data based upon COL1
COL1 (Key Port)
Step 3:
Expression
Create Four Ports
NEW_REC_FLAG (Variable Port)= IIF(OLD_REC=col1, 2, 1)
NEW_REC (Variable Port)= IIF(col1!=OLD_REC, col1, OLD_REC)
OLD_REC (Variable Port)=NEW_REC
OUT_REC_FLAG (Output Port)= NEW_REC_FLAG
Step 4
Router
Create Two Group
New_Rec: OUT_REC_FLAG=1
Dup_Rec: OUT_REC_FLAG=2
Step 5
Sorter
Connect Dup_Rec group.
Select OUT_REC_FLAG3 as KEY Port.
Step 6
Joiner
Connect New_Rec Group to Joiner.
Connect ports from Sorter Transformation to Joiner
Join Type: Master Outer Join (Ports connected from sorter should be master port)
Condition Tab:
MASTER OPERATOR DETAIL
COL11 = COL13
Note: In properties tab select Sorted Input as we are joining two pipeline originated from same source qualifier.
Step 7
Filter
Connect all ports from Joiner to Filter
Filter Condition=ISNULL(COL13)
Step 8
Connect COL11 to Target from Filter.
BINGO!!!
Feel free to post your doubt in comment section or though contact form which you can find below the page.
Due to limitation, I can't attach exported .xml file which you can Import in your repository and see or analyse. (Feel free to mail me if you need .xml file with Scenario number and Question.)
If you like the approach and solution feel free to share with others.
Thanks.
hpandey.harsh@gmail.com
Source
COL1
1
1
2
3
3
4
Target
COL1
2
4
Theory: The issue is grouping all duplicate records then eliminating. We will split Distinct and Duplicate records and once again we will join those two pipeline and eliminate all records which are matching from both pipeline.
Practically:
Step 1:
Drag and Drop your source and target.
Step 2:
Sorter
Sort the data based upon COL1
COL1 (Key Port)
Step 3:
Expression
Create Four Ports
NEW_REC_FLAG (Variable Port)= IIF(OLD_REC=col1, 2, 1)
NEW_REC (Variable Port)= IIF(col1!=OLD_REC, col1, OLD_REC)
OLD_REC (Variable Port)=NEW_REC
OUT_REC_FLAG (Output Port)= NEW_REC_FLAG
Step 4
Router
Create Two Group
New_Rec: OUT_REC_FLAG=1
Dup_Rec: OUT_REC_FLAG=2
Step 5
Sorter
Connect Dup_Rec group.
Select OUT_REC_FLAG3 as KEY Port.
Step 6
Joiner
Connect New_Rec Group to Joiner.
Connect ports from Sorter Transformation to Joiner
Join Type: Master Outer Join (Ports connected from sorter should be master port)
Condition Tab:
MASTER OPERATOR DETAIL
COL11 = COL13
Note: In properties tab select Sorted Input as we are joining two pipeline originated from same source qualifier.
Step 7
Filter
Connect all ports from Joiner to Filter
Filter Condition=ISNULL(COL13)
Step 8
Connect COL11 to Target from Filter.
BINGO!!!
Feel free to post your doubt in comment section or though contact form which you can find below the page.
Due to limitation, I can't attach exported .xml file which you can Import in your repository and see or analyse. (Feel free to mail me if you need .xml file with Scenario number and Question.)
If you like the approach and solution feel free to share with others.
Thanks.
hpandey.harsh@gmail.com
Hi Harsh,
ReplyDeleteSorry i am wrong,dont u think the records that you should load have to be 2,4 and not 2,3?
please correct me if i am wrong!!
Thank You very much Sumit for pointing my typo error. Yes, it should be 2 and 4. I am on it will update it soon. Thanks
DeleteThis comment has been removed by the author.
ReplyDeleteHi Harsh,
ReplyDeleteAlso i could not understand one thing.
We need to copy all the records first from col1 to old_rec then only we can refer to those records..i mean match..am i correct?
if yes, then if we use the condition which u used above i.e. IIF(col1!=OLD_REC, col1, OLD_REC), old_rec will have all the records which have their presence in col1.but ideally old_rec should be empty for that.
Please correct me if i am wrong.
Frankly, I didn't get your question.
DeleteSame suggestion again: Implement it practically your doubt will be clear. Only reading and coming to conclusion it's too difficult and confusing.
Let me know if you are getting error in implementation or some other approach you are trying.
Thanks.
nice approach....
ReplyDeleteThanks Sheetal.
DeleteHi Thanks for the nice illustration.
ReplyDeleteThank you so much for providing information about informatica and other some practically implied things.
ReplyDeleteInformatica Read Json