Wednesday 26 February 2014

Scenario 2: Split the total number of records into two parts and load them based upon the session run.

Ex: The EMP table has 14 records. On the first session run, 7 records should be loaded, and on the second run the remaining 7 records should be loaded.
In other words, on each run only 7 records should be loaded.

Solution

Theoretical

We need:
AA) A Mapping Variable to keep a counter of the session run.
BB) A Workflow Variable to keep a counter of the workflow run.
A) Two instances of the Source: one as the actual source to load data from, and a second source feeding the Aggregator to count the total number of records in the source.
B) In the Aggregator, count the total number of records. Create one dummy port for the joiner condition. Don't select any port as a group-by port.
C) In Expression 1, create one dummy port which we will use in the Joiner to get the total number of records for each and every row.
D) In the Joiner, connect both pipelines based upon the DUMMY column.
E) In Expression 2, assign an ID to each record passed through it and find the half value. Ex: Total_Number_Records (returned by the Aggregator) / 2.
F) In the Router, pass records based upon the Mapping Variable and whether Record_ID is less than or greater than Total_Number_Records/2.
G) Create a non-reusable Session so that you can use the Components tab's Pre-session Variable Assignment option.
H) Assign the Workflow Variable value to the Mapping Variable.
I) Create an Assignment Task and increment or decrement the workflow variable based upon its current value.
J) We also need to set a link condition between the Session and the Assignment Task. (If the previous task's status is SUCCEEDED, then only pass execution; otherwise stop.)
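
To make the run-level logic concrete, below is a minimal Python sketch of what the mapping does across two runs. It is only an illustration of the logic, not Informatica code; the names records, half and run are assumptions.

    # Illustrative only: simulate the two-run split of 14 records
    records = list(range(1, 15))          # 14 source rows, like the EMP table
    half = round(len(records) / 2)        # Half_Rec = ROUND(Total_Record / 2)
    for run in (1, 2):                    # run plays the role of $$TGT_HIT
        if run == 1:
            loaded = [r for r in records if r <= half]   # First_Half group
        else:
            loaded = [r for r in records if r > half]    # Second_Half group
        print("Run", run, "loads records", loaded)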

Practically:

Create one Mapping Variable $$TGT_HIT (Integer), Aggregation = Count. No need to initialize it.
Create one Workflow Variable $$SESS_RUN (Integer, Persistent), Default Value = 1.

Step 1 Drop your source twice to create two pipelines.

Step 2
Aggregator
Create two ports.
     Total_Record (Number) = COUNT(EMPNO)
     DUMMY (Number) = 0
Note: We need this DUMMY port for the Joiner Transformation.
     Don't select any port as a group-by port.

Step 3
 Expression 1
 Create one output port.
     DUMMY (Number) = 0

Step 4
Joiner Transformation
     Join Type: NORMAL
     Condition: DUMMY=DUMMY1
Note: By joining on the DUMMY port, we get the Total_Record count attached to each and every record.
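
In effect, the DUMMY = DUMMY1 condition stamps the single aggregator row onto every source row. A rough Python equivalent of that idea, purely for illustration (rows and agg are assumed names, not repository objects):

    rows = [{"EMPNO": e, "DUMMY": 0} for e in range(7369, 7383)]   # 14 source rows
    agg = {"Total_Record": len(rows), "DUMMY": 0}                  # single Aggregator row
    joined = [dict(r, Total_Record=agg["Total_Record"])
              for r in rows if r["DUMMY"] == agg["DUMMY"]]         # DUMMY = DUMMY1
    # every joined row now carries Total_Record = 14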

Step 5
 Expression 2
 Create four ports.
     Rec_count (Number) (Variable Port) = Old_Count + 1
     Old_Count (Number) (Variable Port) = Rec_count
     Rec_ID (Number) (Output Port) = Rec_count
     Half_Rec (Number) (Output Port) = ROUND(Total_Record / 2)
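
With the 14-record EMP example, this expression numbers the rows 1 through 14 and computes Half_Rec = ROUND(14 / 2) = 7 on every row. A small Python sketch of how the variable ports behave, assuming ports evaluate top to bottom for each row (the names are illustrative only):

    old_count = 0                           # Old_Count, persists across rows
    for total_record in [14] * 14:          # Total_Record arrives on every joined row
        rec_count = old_count + 1           # Rec_count = Old_Count + 1
        old_count = rec_count               # Old_Count = Rec_count
        rec_id = rec_count                  # Rec_ID output port: 1, 2, ..., 14
        half_rec = round(total_record / 2)  # Half_Rec output port: 7
        print("Rec_ID", rec_id, "Half_Rec", half_rec)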

Step 6
 Router 
 Create two groups.
     First_Half: ($$TGT_HIT = 1) AND (Rec_ID <= Half_Rec)
     Second_Half: ($$TGT_HIT = 2) AND (Rec_ID > Half_Rec)

Step 7: Connect both groups to the same target instance or to different targets, according to your requirement. (For better understanding, I have taken two different targets.)

Step 8 
Create a Non-Reusable Session.
     Components Tab --> Pre-session Variable Assignment --> Create one field and set
      Mapping Variable/Parameter = Workflow Variable/Parameter
                 $$TGT_HIT       =   $$SESS_RUN

Step 9
 Create an Assignment Task and develop the expression as below.
     $$SESS_RUN = IIF($$SESS_RUN = 1, $$SESS_RUN + 1, $$SESS_RUN - 1)
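
Because $$SESS_RUN is persistent, this expression simply toggles it between 1 and 2 from one workflow run to the next. An illustrative Python equivalent of the IIF (next_sess_run is an assumed name, not a workflow object):

    def next_sess_run(sess_run):
        # IIF($$SESS_RUN = 1, $$SESS_RUN + 1, $$SESS_RUN - 1)
        return sess_run + 1 if sess_run == 1 else sess_run - 1

    print(next_sess_run(1))   # 2 -> the next run loads the second half
    print(next_sess_run(2))   # 1 -> the run after that loads the first half again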

Step 10 We also need to set the link condition between the Session and the Assignment Task as: PrevTaskStatus = SUCCEEDED

BINGO!!!



Feel free to post your doubts in the comments section.

Due to a limitation, I can't attach the exported .xml file, which you could import into your repository and analyse. (Feel free to mail me with the scenario number and question if you need the .xml file.)

If you like the approach and solution, feel free to share it with others.

Thanks.
hpandey.harsh@gmail.com

Monday 24 February 2014

Scenario 1: How to load data from a single source into four different targets, based upon the session run, through a single mapping.

Ex: One Source (Emp Table)
      1st run of Session-->EMP_TGT1
      2nd run of Session-->EMP_TGT2
      3rd run of Session-->EMP_TGT3
      4th run of Session-->EMP_TGT4
      5th run of Session-->EMP_TGT1

Solution:

Theoretical
We need:
AA) A Mapping Variable to keep a counter of the mapping run.
A)    Two instances of the Source: treat the first instance as the actual source for loading data; the second instance of the source feeds the Aggregator Transformation.
B)    An Aggregator to find the total number of records in the source. As we are not selecting any port as group-by, it will return the last record it reads along with the total number of records.
C)    A Joiner to join both Source Qualifier pipelines based upon EMPNO. As it is a MASTER OUTER join, it will return all records from the detail pipeline and only the matching record from the master.
D)    In the Expression, keep counting each and every record passed through it. When the count in the Expression matches the total number of records, increment the Mapping Variable counter by 1.
E)    Route records based upon the Mapping Variable counter value.
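
Put together, the mapping behaves like a round-robin counter over four targets: each run sends every source row to one target and then advances the counter for the next run. A minimal Python sketch of that idea (targets, counter and rows are assumed names, not repository objects):

    targets = {1: "EMP_TGT1", 2: "EMP_TGT2", 3: "EMP_TGT3", 4: "EMP_TGT4"}
    counter = 1                               # plays the role of $$TGT_HIT
    rows = list(range(14))                    # EMP source rows

    for run in range(1, 6):                   # five session runs
        print("Run", run, "loads", len(rows), "rows into", targets[counter])
        counter = counter + 1 if counter < 4 else 1   # advance after the last row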

Practical
1. Create a Mapping Variable $$TGT_HIT (Integer), Aggregation = Count --> Initialize it with 1 (Initial Value = 1)

Step 1: Drop your Source twice.

Step 2: Create Aggregator and connect Second Source Qualifier to it.
Aggregator
A) Create one port: Total_Record (Output Port) = COUNT(EMPNO)
B) Don't select any port as a Group By port.

Step 3: Create Joiner and connect First Source Qualifier to it.
Joiner
A) Connect all columns from the First Source Qualifier to the Joiner.
B) Connect the Total_Record and EMPNO columns from the Aggregator.
C) Set Join Type = MASTER OUTER JOIN
D) Condition: EMPNO1 = EMPNO
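
Because the Aggregator returns a single row (the last record it reads, plus the count), this master outer join keeps every detail row and attaches Total_Record only to the one row whose EMPNO matches; every other row gets NULL. A rough Python illustration under those assumptions (detail, master and joined are illustrative names, and 7934 is just an example of the last EMPNO):

    detail = [{"EMPNO": e} for e in (7369, 7499, 7521, 7934)]     # detail pipeline rows
    master = {"EMPNO": 7934, "Total_Record": 4}                   # single Aggregator row

    joined = [dict(r, Total_Record=master["Total_Record"]
                   if r["EMPNO"] == master["EMPNO"] else None)    # EMPNO1 = EMPNO
              for r in detail]
    # only the row with EMPNO 7934 carries Total_Record; the rest carry NULL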

Step 4: Create Expression and Connect all columns from Joiner to it.
Expression
A) Create four ports:
v_rec (Number) (Variable Port) --> v_rec_count + 1
v_rec_count (Number) (Variable Port) --> v_rec
Var_Assign (Number) (Variable Port) -->
IIF(Total_Record = v_rec_count AND $$TGT_HIT = 1, SETVARIABLE($$TGT_HIT, 2),
  IIF(Total_Record = v_rec_count AND $$TGT_HIT = 2, SETVARIABLE($$TGT_HIT, 3),
    IIF(Total_Record = v_rec_count AND $$TGT_HIT = 3, SETVARIABLE($$TGT_HIT, 4),
      IIF(Total_Record = v_rec_count AND $$TGT_HIT = 4, SETVARIABLE($$TGT_HIT, 1)))))

NEXTVAL (Number) (Output Port) --> $$TGT_HIT
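
The nested IIF only fires on the row where the running count equals Total_Record (i.e. the last record of the run) and cycles $$TGT_HIT 1 --> 2 --> 3 --> 4 --> 1 for the next run; on all other rows the variable is left alone. An illustrative Python version of that cascade (var_assign and tgt_hit are assumed stand-ins, not repository objects):

    def var_assign(total_record, v_rec_count, tgt_hit):
        # Mirrors the nested IIF/SETVARIABLE: bump the counter on the last row only
        if total_record == v_rec_count:
            return 1 if tgt_hit == 4 else tgt_hit + 1
        return tgt_hit

    print(var_assign(14, 14, 1))   # 2 -> the next run will hit EMP_TGT2
    print(var_assign(14, 13, 1))   # 1 -> not the last row, counter unchanged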

Step 5: Create Router and Connect all columns from Expression.
Router
A) Create four groups:
FIRST_TARGET: NEXTVAL = 1
SECOND_TARGET: NEXTVAL = 2
THIRD_TARGET: NEXTVAL = 3
FOURTH_TARGET: NEXTVAL = 4

Step 6: Connect each group to each target.

Create Session and Workflow and BINGO!!!!


Feel free to post your doubts in the comments section.

Due to a limitation, I can't attach the exported .xml file, which you could import into your repository and analyse. (Feel free to mail me with the scenario number and question if you need the .xml file.)

If you like the approach and solution, feel free to share it with others.

Thanks.
hpandey.harsh@gmail.com



Saturday 22 February 2014

Informatica Installation

Follow the steps in the slides. If you get an error, feel free to mail me and I will try my best to rectify it.
Thanks.
Prerequisites
1. Oracle
2. Identify your Oracle Service Name
3. Keep your Server/Client software and product key ready :-)



Thursday 6 February 2014

What is Informatica and why do we need it?

What is Informatica?

Informatica is an ETL (Extract, Transform, Load) or Data Integration tool, and one of the first ETL tools to come onto the market. It is used to implement a Data Warehouse.
It extracts data from different sources (databases/files), transforms the data according to business requirements, and loads it into the Data Warehouse.

Few more points about Informatica

1. Informatica is a Data Integration tool.
2. It is a product of Informatica Corporation.
3. It is a GUI-based tool.
4. It is developed in the Java language.
5. It works on a client-server architecture.


Why do we need it?

To understand this, let's take a look at the evolution of ETL tools.

Before ETL tools

When there was no tool for the ETL job, how was the ETL job done?
We developed complete procedural code specific to the database in which the data warehouse existed.
Ex: If the Data Warehouse was on Oracle, a complete PL/SQL program; if on SQL Server, a T-SQL program.

Challenges

1) As the code was database-specific, we were not able to read data from another database.
2) For a simple job, we had to write thousands of lines of code.

Solution:

As a solution to the challenges above, ETL tools evolved and the ETL job was simplified. Now, through the Informatica tool, we can read data from any source, transform it, and load it into a target.
As a GUI tool, it is very user-friendly and makes it easy to implement an ETL job.