Batch Processing in MuleSoft | ETL


Let's understand why batch processing is required and how it can be achieved in MuleSoft. Batch processing is particularly useful in the following scenarios:
  • Handling large quantities of incoming data from APIs into legacy systems.
  • Extracting, transforming, and loading (ETL) information into the destination system (i.e., uploading CSV or flat file data into the Hadoop system).
  • Engineering near real-time data integration (i.e., between SaaS applications).
  • Integrating datasets — small or large, streaming or not — to parallel process records.
Batch jobs handle record-level failures gracefully, so that a failed record does not cause the entire batch job to fail.
A batch job is a top-level element in Mule that exists outside all Mule flows. Batch jobs split large messages into records, which Mule processes asynchronously; just as flows process messages, batch jobs process records.
A batch job contains one or more batch steps which, in turn, contain any number of message processors that act upon records as they move through the batch job. During batch processing, you can use record-level variables (recordVars) and MEL expressions to enrich, route or otherwise act upon records.
A batch job executes when triggered either by a Batch Execute element in a Mule flow or by a message source in the batch job's input phase.
When triggered, Mule creates a new batch job instance. When all records have passed through all batch steps, the batch job instance ends and the batch job result is summarized in a report indicating which records succeeded and which failed during processing.
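For orientation, the sketch below shows how a batch job is typically laid out in Mule 3.x configuration XML. The job name, step names, and processors here are placeholders, not the actual flow built in this walkthrough.

<batch:job name="loanBatchJob">
    <batch:input>
        <!-- message source or processors that load the collection of records to process -->
    </batch:input>
    <batch:process-records>
        <batch:step name="Validate_Record">
            <!-- message processors that act on each record, e.g. validation or enrichment -->
        </batch:step>
        <batch:step name="Load_Record">
            <!-- message processors that load each valid record into the target system -->
        </batch:step>
    </batch:process-records>
    <batch:on-complete>
        <!-- payload here is the batch job result (processed/failed counts) -->
        <logger message="#[payload]" level="INFO" doc:name="Log batch result"/>
    </batch:on-complete>
</batch:job>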
In this walkthrough, we will build a Mule application that fetches data from a database, processes the records in batch steps, and loads the valid records into a destination table.

By the end of the walkthrough, we will have built the Mule application shown below.
Now let's bring in a Batch scope and drag a Database connector and a Set Payload transformer into the batch scope's input area; the configuration is shown below.
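As a rough sketch of what the input phase might look like in the XML (assuming a Mule 3.x Database connector configuration named Generic_Database_Configuration and a LoanApp source table with a ProcessFlag column; the query and names are illustrative):

<batch:job name="loanBatchJob" max-failed-records="-1">
    <batch:input>
        <!-- fetch unprocessed loan applications from the source table -->
        <db:select config-ref="Generic_Database_Configuration" doc:name="Select LoanApp">
            <db:parameterized-query><![CDATA[SELECT * FROM LoanApp WHERE ProcessFlag = 'N']]></db:parameterized-query>
        </db:select>
        <!-- make the result set the payload that the batch engine splits into records -->
        <set-payload value="#[payload]" doc:name="Set Payload"/>
    </batch:input>
    <!-- process-records and on-complete phases follow -->
</batch:job>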
To avoid unnecessary changes to the payload, Message Enrichers are used so that the derived information is stored in variables instead. The first enricher reads the loan ID from the payload into a record variable; in the second enricher, the status of the record is updated in the DB table and the result is stored in a record variable.
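A sketch of the two enrichers follows, assuming record variables named loanId and updateStatus and an illustrative update statement; the exact variable names and query are assumptions:

<!-- 1st enricher: copy the loan ID from the payload into a record variable -->
<enricher target="#[recordVars['loanId']]" doc:name="Store LoanID">
    <set-payload value="#[payload.LoanID]" doc:name="Read LoanID"/>
</enricher>
<!-- 2nd enricher: update the record's status in the DB and keep the result in a record variable -->
<enricher target="#[recordVars['updateStatus']]" doc:name="Store update result">
    <db:update config-ref="Generic_Database_Configuration" doc:name="Update status">
        <db:parameterized-query><![CDATA[UPDATE LoanApp SET ProcessFlag = 'Y' WHERE LoanID = #[recordVars.loanId]]]></db:parameterized-query>
    </db:update>
</enricher>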
Let's bring in a Choice router: if the loan type is deemed invalid, we stop processing that record by throwing an exception from a Groovy component. On the default route, we log the LoanID.
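A sketch of the choice routing, assuming the loan type is available on the record payload as LoanType and that the value 'Invalid' marks a bad record (both assumptions for illustration):

<choice doc:name="Check loan type">
    <when expression="#[payload.LoanType == 'Invalid']">
        <!-- throw an exception so the batch engine counts this record as failed -->
        <scripting:component doc:name="Groovy">
            <scripting:script engine="Groovy"><![CDATA[
throw new java.lang.IllegalStateException("Invalid loan type: " + payload.LoanType)
]]></scripting:script>
        </scripting:component>
    </when>
    <otherwise>
        <logger message="Processing LoanID #[recordVars.loanId]" level="INFO" doc:name="Log LoanID"/>
    </otherwise>
</choice>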
As the final batch step, we insert the valid loan records into another table using the DB connector as shown below, and in the On Complete phase we log the payload.
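A sketch of the final step and the On Complete phase, assuming a LoanDetails target table; the column list is illustrative:

<batch:step name="Load_LoanDetails">
    <!-- insert the valid record into the target table -->
    <db:insert config-ref="Generic_Database_Configuration" doc:name="Insert LoanDetails">
        <db:parameterized-query><![CDATA[INSERT INTO LoanDetails (LoanID, LoanType) VALUES (#[recordVars.loanId], #[payload.LoanType])]]></db:parameterized-query>
    </db:insert>
</batch:step>
<batch:on-complete>
    <!-- payload here is the batch job result with processed/failed counts -->
    <logger message="#[payload]" level="INFO" doc:name="Log batch result"/>
</batch:on-complete>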
Now let's build another simple Mule flow that invokes the batch job using a Batch Execute element.
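A sketch of the trigger flow, assuming an HTTP listener configuration named HTTP_Listener_Configuration; the /loanbatch path is illustrative:

<flow name="invokeLoanBatchFlow">
    <http:listener config-ref="HTTP_Listener_Configuration" path="/loanbatch" doc:name="HTTP"/>
    <!-- kick off the batch job defined above -->
    <batch:execute name="loanBatchJob" doc:name="Run loan batch"/>
</flow>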
Let's run the project and test the application. Note that the batch job's Max Failed Records is set to -1, so processing continues no matter how many records fail. Let's load the data into the source LoanApp table as shown; the target LoanDetails table will be populated by the batch job invoked from the Mule application.
Let's send a GET request from Postman as shown; this invokes the job, each record is processed, and the batch result is rendered back to the client.
The flow logs show that, out of 4 records, 2 were processed successfully and 2 failed.
Let's modify the flow to handle failed records: add a batch step before the On Complete phase, configure its accept policy to process only failed records, and place a Set Payload and a For Each scope inside it as shown. To access the failure instances, we set the payload with #[getStepExceptions()] and loop through each failure using #[payload.values()].
Inside the For Each scope, add a Logger to log the LoanID and then, using the DB connector, update the ProcessFlag to 'E' to indicate that an exception occurred.
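Putting it together, a sketch of the failure-handling step with the accept policy set to only failed records; the update statement and names are illustrative:

<batch:step name="Handle_Failures" accept-policy="ONLY_FAILURES">
    <!-- getStepExceptions() returns a map of step name -> exception for the current record -->
    <set-payload value="#[getStepExceptions()]" doc:name="Get step exceptions"/>
    <foreach collection="#[payload.values()]" doc:name="For Each failure">
        <logger message="Failed LoanID: #[recordVars.loanId]" level="WARN" doc:name="Log failed LoanID"/>
        <!-- mark the source record as errored -->
        <db:update config-ref="Generic_Database_Configuration" doc:name="Flag error">
            <db:parameterized-query><![CDATA[UPDATE LoanApp SET ProcessFlag = 'E' WHERE LoanID = #[recordVars.loanId]]]></db:parameterized-query>
        </db:update>
    </foreach>
</batch:step>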
Let's re-run the project and test it. Before placing a request, as you can see, the ProcessFlag is set to 'N' in the LoanApp table and the LoanDetails table has been truncated. From Postman, let's place a GET request; the result is rendered back as shown below, and the batch process updates the ProcessFlag appropriately and loads the LoanDetails table for the valid loan types.
Entries in the source and target tables before and after the process, along with the flow logs, are shown below.
