How to Read Multiple Flat Files Using a Single Datastage Job

Generally we use the Sequential File stage to read a flat file by setting properties such as file name, file pattern, delimiter, etc. We can read multiple files by setting the “Read Method” property to “File pattern”. The main distinguishing criterion here is the metadata of the files: whether all of them share the same column structure or not.

Let’s discuss the different methods with examples.

  1. If the metadata of all the files is the same

Method 1 – Using a valid Unix file pattern

If there are multiple files whose names follow a similar pattern, one can read them by setting “Read Method” to “File pattern” and then entering a valid file pattern in the File pattern field.

For example, /home/Sample_*.txt will pick up all files such as Sample_1234.txt, Sample_Test.txt, Sample_Dev.txt, etc.

Or /home/Sample_?.txt will pick up Sample_1.txt, Sample_2.txt, Sample_3.txt and so on, because ? matches exactly one character.
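To see the difference between the two wildcards, here is a minimal Python sketch. It uses Python's fnmatch module, which follows shell-style wildcard rules, rather than DataStage itself, and the file names in the list are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only to illustrate the two wildcards.
names = ["Sample_1.txt", "Sample_2.txt", "Sample_1234.txt", "Sample_Test.txt", "Other_1.txt"]


def matching(pattern, candidates):
    """Return the candidates that match a shell-style wildcard pattern."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


# * matches any run of characters, ? matches exactly one character.
star_matches = matching("Sample_*.txt", names)
question_matches = matching("Sample_?.txt", names)

print(star_matches)
print(question_matches)
```

Here Sample_1234.txt and Sample_Test.txt match only the * pattern, because more than one character sits between the underscore and the extension.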

If you want to pick up only the first few files, you can use a command like the one below.

ls /home/Sample_*.txt | head -3
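Note that "first" here means first in the sorted listing that ls produces, not the most recent files. The sketch below mimics that behaviour in Python on a made-up list of names. It assumes plain character-code ordering, whereas the real order of ls depends on the locale.

```python
from fnmatch import fnmatchcase


def first_n(names, pattern, n):
    """Mimic `ls <pattern> | head -n`: sort the matches and keep the first n."""
    matches = sorted(name for name in names if fnmatchcase(name, pattern))
    return matches[:n]


# Hypothetical directory listing, deliberately out of order.
listing = ["Sample_5.txt", "Sample_1.txt", "Sample_3.txt", "Sample_2.txt", "Sample_4.txt"]

top_three = first_n(listing, "Sample_*.txt", 3)
print(top_three)
```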

Method 2 – Using specific file names

If the file names do not follow a similar pattern, one can use the Sequential File stage properties to list each file name to be read.

Let’s see how.

In the ‘Properties’ tab of the Sequential File stage, as highlighted with a red box in the diagram, you will notice that the ‘File’ property has multiple blue squares, unlike other properties, which have only one.

These multiple blue squares indicate that this is a repeatable property, i.e. one can add it any number of times. To add another file, click on ‘Source’; the ‘File’ option then becomes available in the “Available properties to add” box in the bottom right corner. Click on it and you will get a new ‘File’ property.

Read Method = ‘Specific File(s)’

Method 3 – Using a multiple-instance job

One needs to enable the “Allow Multiple Instances” property in the job properties. You can parameterize the input file path as well as the input file name, provide these parameter values when running the job, and use a different invocation ID each time.

You can see the log of each job instance in DataStage Director.
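If the job is started from the command line instead of Director, each instance can be addressed as job name, dot, invocation ID in the dsjob command. The sketch below only builds the command lines and does not run them. The project name MyProject, the job name ReadFlatFile, the invocation IDs and the parameter names pFilePath and pFileName are assumptions made for illustration.

```python
def build_dsjob_run(project, job, invocation_id, params):
    """Build the argument list for one run of a multiple-instance job."""
    args = ["dsjob", "-run"]
    for name, value in params.items():
        args += ["-param", f"{name}={value}"]
    # A specific instance is addressed as <job>.<invocation id>.
    args += [project, f"{job}.{invocation_id}"]
    return args


# Hypothetical input files; each one gets its own invocation ID.
files = ["Sample_1.txt", "Sample_2.txt"]

commands = [
    build_dsjob_run(
        "MyProject",
        "ReadFlatFile",
        f"run{i}",
        {"pFilePath": "/home", "pFileName": name},
    )
    for i, name in enumerate(files, start=1)
]

for cmd in commands:
    print(" ".join(cmd))
```

Because every run carries its own invocation ID, the instances show up separately in the log, which is what makes it possible to process several files with one job design.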

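Method 4 – Using a command stage in a job sequence

Another option is to have a command stage (Execute Command activity) in a job sequence that determines the file name, and then to pass the output of this command ($CommandOutput) to the file name parameter of the job that contains the Sequential File stage.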
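One thing to watch for is that the raw output of a command such as ls normally ends with a line break, and may contain more than one line, so it usually needs cleaning before it can serve as a file name. The Python sketch below shows only that cleanup logic, on a made-up output string; inside a sequence the same trimming would be done in an expression.

```python
def first_file_name(command_output):
    """Return the first non-empty line of a command's output, trimmed."""
    for line in command_output.splitlines():
        if line.strip():
            return line.strip()
    return ""


# Hypothetical output of a listing command: two names, each ending in a newline.
raw = "Sample_1.txt\nSample_2.txt\n"

file_name = first_file_name(raw)
print(repr(file_name))
```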


  2. If the metadata of the files is different

If the files have different metadata, the schema file option has to be used. The Schema File option is available in the Sequential File stage in the ‘Properties to add’ box, under the Options menu. It lets the user describe the file metadata, i.e. its column structure and its file structure, in a separate schema file.

We just need to ensure that each file is paired with the schema file that describes it, and that the RCP (Runtime Column Propagation) property is set to ‘True’.
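For illustration, the sketch below generates the text of a small schema file from a list of columns. The layout follows the record schema syntax as I understand it (a record keyword, format properties in braces, then the columns in parentheses), so check it against the documentation for your version. The column names and types are made up.

```python
def build_schema(columns, delim=","):
    """Render a record schema; columns is a list of (name, type) pairs."""
    lines = [
        "record",
        f"{{final_delim=end, delim='{delim}', quote=double}}",
        "(",
    ]
    for name, col_type in columns:
        lines.append(f"  {name}: {col_type};")
    lines.append(")")
    return "\n".join(lines) + "\n"


# Hypothetical columns for a comma-delimited employee file.
schema_text = build_schema(
    [("EmpId", "int32"), ("EmpName", "string[max=50]"), ("JoinDate", "date")]
)
print(schema_text)
```

Each input file with a different layout would get its own schema file of this kind, and the job picks up the column definitions from it at run time.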

Now, using this schema file, one can use the methods below to read multiple files.

Method 1 – Using parameters

Create a parallel job with a Sequential File stage (with the Schema File property active). Add three job parameters: pFilePath, pFileName and pSchemaPath. In the Sequential File stage, use pFilePath and pFileName in the File property (for example #pFilePath#/#pFileName#), and use pSchemaPath in the Schema File property. Then, when running the job, supply appropriate values for the three parameters. In the same job, one can also enable the multiple-instance property and run the job for multiple sets of files and their schemas.

Method 2 – Using a loop

One can use a User Variables activity in a job sequence, hold the list of files in a variable, and pass it to a loop so that the job runs once per file.
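The logic of such a loop can be sketched in Python as below: a delimited list of file names is split into one set of parameter values per iteration. The parameter names are the ones used in Method 1. The naming convention that pairs each data file with a schema file of the same base name, and the paths and file names, are assumptions made for illustration.

```python
import os


def iteration_params(file_list, file_path, schema_dir, delimiter=","):
    """Split a delimited list of file names into one parameter set per iteration."""
    runs = []
    for name in file_list.split(delimiter):
        name = name.strip()
        if not name:
            continue  # skip empty entries such as a trailing delimiter
        base, _ext = os.path.splitext(name)
        runs.append({
            "pFilePath": file_path,
            "pFileName": name,
            # Assumed convention: the schema file shares the data file's base name.
            "pSchemaPath": f"{schema_dir}/{base}.schema",
        })
    return runs


runs = iteration_params("Customers.txt, Orders.txt", "/home", "/home/schemas")
for params in runs:
    print(params)
```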

 


© 2017 Database ETL. All rights reserved.