Learn Informatica Practically : Scenario 16: How to handle comma in a comma delimited file?

Saturday, 22 March 2014

Scenario 16: How to handle comma in a comma delimited file?

Scenario 16: How to handle comma in a comma delimited file?

Theory Analysis: There are multiple cases in this. We will examine each case one by one. First of all, we need to do data profiling for our source file.

What is Data Profiling?
Data Profiling is a validation process through which we are analyzing the data whether it's fit for source or not. If not then what changes needs to be done to make it fit for source or even sometimes we need to push it back to client with reason that data is not in a format which would be treated like a source as per requirement.

CASE 1: More than one Comma [,] is appearing just after delimiter [,]
CASE 2: Comma is [,] appearing in a data itself. Ex: Address column has data like this 'CA,USA'
CASE 3: Any other Char like [+] Char is appearing anywhere in data. [,]

Practically

CASE 1: More than one Comma [,] is appearing just after delimiter [,]

Data
CID, CNAME, CITY, CNTRY, PHONE NUMBER
1234,,,SMITH,CA,USA,,,9845679000
1235,JOHN,NY,USA,9996805944
1236,NORMA,,,CA,USA,,,7804567893
1237,CARL,,,NY,USA,889950493

Solution: It's very easy to handle. Informatica has already provided an option to handle this case.

While Importing File metadata of source file in Flat File Import wizard Step 2 of 3 Check Treat consecutive delimiter as one

Now, it's good to go ahead with source file.

CASE 2: Comma is [,] appearing in a data itself. Ex: Address column has data like this 'CA,USA'

Data
CID, CNAME, ADD, PHONE_NUMBER
1234,SMITH,CA,USA,9845679000
1235,JOHN,NY,USA,9996805944
1236,NORMA,CA,USA,7804567893
1237,CARL,NY,USA,8899504934

Solution: There are two approach to resolve this issue.

Approach 1: Through Script modify your source and put Single Quotes or Double Quotes for each and every value.

Data After Modification
CID, CNAME, ADD, PHONE_NUMBER
'1234','SMITH','CA,USA','9845679000'
'1235','JOHN','NY,USA','9996805944'
'1236','NORMA','CA,USA','7804567893'
'1237','CARL','NY,USA','8899504934'

While Importing File metadata of in Flat File Import wizard Step 2 of 3 select Radio Button of Text Qualifier section Single Quotes. (If you have modified your source with double quotes then please select double quotes)

Approach 2: Read complete file as fixed width and in the expression with the help of SUBSTR() and INSTR() function split into columns. Only one thing we need to take care while splitting third column data we have to start from second occurrence of comma till fouth occurrence of comma.
(In many other scenarios we have already completed this.)

A real time Secret :-) We are creating Reusable transformation for splitting records into column based upon delimited char and we are using that whenever we need.
A second case where we are dependent upon reusable transformation is for removing unwanted space from data while reading from files. Yes, LTRM(RTRIM())

CASE 3: Any other Char like [+] Char is appearing anywhere in data. [,]
CID, CNAME, ADD, PHONE_NUMBER
1234,SMITH,CA,USA,+19845679000
1235,JOHN,NY,USA,+19996805944
1236,NORMA,CA,USA,+17804567893
1237,CARL,NY,USA,+18899504934

Solution: While Importing File metadata of in Flat File Import wizard Step 2 of 3 Check Remove escape character from data and give you Char + in Text Box.

BINGO!!!

Feel free to post your doubt in comment section.

Due to limitation, I can't attach exported .xml file which you can Import in your repository and see or analyse. (Feel free to mail me if you need .xml file with Scenario number and Question.)

If you like the approach and solution feel free to share with others.

Thanks.
hpandey.harsh@gmail.com

12 comments:

Lakshma Reddy24 March 2014 at 13:20
Hi Harsha,
Here in the Case3 we are eliminating (+) symbol but Before phone number we have +19866...........So we should eliminate 1 also here right? How can we achieve this?

Thanks in Advance
ReplyDelete
Replies
chetana25 March 2015 at 16:47
Hi Harsha ,
Your solution for the Delimeter in the data looks fine , but i is a fix for root problem ..what if the situation that we dunno where the delimiter will come as a part of Data. How can we handle this as Generic Solution. Any Idea or suggestions ? Thanks ! Chetana Naidu
ReplyDelete
Replies
chetana27 March 2015 at 08:02
Hi Harsh ,
Thanks for the reply . But again I have doubt like ..how will our script can distinguish ..whether its a data or delimeter ? Thanks! Chetana Naidu
ReplyDelete
Replies
Unknown27 March 2015 at 09:01
Hope you have read the post carefully. In the beginning of the post I have explained little bit about Data Profiling. So, you have to do the data profiling before creating your script.
Thanks.
ReplyDelete
Replies
Unknown15 October 2016 at 21:45
if there is multipal delimetes what will do ? just like |,.,'
ReplyDelete
Replies
nagarjuna1 December 2016 at 13:22
Hello Harsha,

My source table is from data warehouse. My target is a .csv file. The source column data contains Commas and it is resulting in additional fields in my .csv file due to delimiter as comma.

I have created my target flatfile in Informatica and not imported from local. So approach-2 will not be helpful for me.

can you help me out how could i eliminate comma from column data.

I have tried using REPLACECHAR() function in Source qualifier query, but its throwing me a invalid identifier error. Should i use this function in expression transformation instead of SQ query??

Appreciate your quick reply.

Thanks,
Nagarjuna
ReplyDelete
Replies
Deepthi24 March 2020 at 17:28
Source: Multi record format
Target: 3 tables Base, J1, J2

1 row contains multiple fixed width records and dynamic in nature, need to load data into target tables.

Eg:- (Base Record 100 length [c1 string 50, c2 string 50] +J1 Record 200 length [c1 string 50, c2 string 50, c3 string50, c4 string 50]+J2 Record length 150 [c1 string 50, c2 string 100])

Base Record + J1 + J2+ J1+ J2+J1 -> 1 Record goes to base table, 3 records goes to J1 table , 2 records goes to J2 table
Base Record + J2 + J2+ J1+ J2+J1 -> 1 Record goes to base table, 2 records goes to J1 table , 3 records goes to J2 table

ReplyDelete
Replies
Deepthi24 March 2020 at 17:29
Can you please help me on this.I'm new to informatica.
ReplyDelete
Replies
James Zicrov11 March 2021 at 14:14
I feel Informatica is vivid and deep to introspect and hence I genuinely feel one should look up more about it.

Informatica Read JSON

ReplyDelete
Replies

Learn Informatica Practically

Labels

Saturday, 22 March 2014

Scenario 16: How to handle comma in a comma delimited file?

12 comments:

Popular Posts

Contact Form