Fine Tune Your Polling and Batching in Mule ESB

They say it's best to learn from others. With that in mind, let's dive into a use case I recently ran into. We were dealing with a number of legacy systems when our company decided to shift to a cloud-based solution. Of course, we had to prepare for the move, and for all the complications that came with it.

Use Case

We have a legacy system built on an Oracle database, with applications written in Oracle Forms and lots and lots of stored procedures in the database. It has been in use for over 17 years now with no major upgrades or changes. Of course, there have been a lot of development changes over those 17 years, changes that have taken the system close to the breaking point and made it almost impossible to implement anything new. So, the company decided to move to a CRM (Salesforce), and we needed to transfer data from our legacy database to SF. However, we couldn't create any triggers on our database to send real-time data to SF during the transition period.

Solution

We decided to use a Mule poll to query our database, fetch records in bulk, and then send them to SF using the Salesforce Mule connector.

I am assuming that we are all clear about polling in general. If not, please refer to the references at the end. Also, if you are not clear on Mule's polling implementation, there are a few references at the bottom, too. Sounds simple enough, doesn't it? But wait, there are a few things to consider:

- What is the optimum frequency for your polls?
- How many threads do you want each poll to have? How many active or inactive threads do you want to keep?
- How many polls can we write before we break the object store and queue store that Mule uses to maintain your polling?
- What is the impact on the server file system if you use watermark values in the object store?
- How many records can we fetch in one query from the database?
- How many records can we actually send in bulk to Salesforce using the SFDC connector?

These are a few, if not all, of the considerations to weigh before implementation.
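To make the moving parts concrete, here is a minimal sketch of such a poll in Mule 3.x XML. The flow, configuration, table, and column names (`legacyDbToSalesforce`, `Oracle_Configuration`, `CUSTOMER`, `ID`, the Salesforce `Account` type) are all hypothetical placeholders, not taken from the original post:

```xml
<flow name="legacyDbToSalesforce">
    <!-- Hypothetical poll: fires every 60 seconds and keeps a watermark
         of the highest ID seen so far, so each poll fetches only new rows -->
    <poll doc:name="Poll">
        <fixed-frequency-scheduler frequency="60" timeUnit="SECONDS"/>
        <watermark variable="lastProcessedId"
                   default-expression="#[0]"
                   selector="MAX"
                   selector-expression="#[payload.ID]"/>
        <db:select config-ref="Oracle_Configuration" doc:name="Select new rows">
            <db:parameterized-query><![CDATA[
                SELECT * FROM CUSTOMER
                WHERE ID > #[flowVars.lastProcessedId]
                ORDER BY ID
            ]]></db:parameterized-query>
        </db:select>
    </poll>
    <!-- Hand the fetched rows to the Salesforce connector -->
    <sfdc:create config-ref="Salesforce" type="Account" doc:name="Create in Salesforce">
        <sfdc:objects ref="#[payload]"/>
    </sfdc:create>
</flow>
```

The poll frequency, the thread counts, and the watermark's backing store are exactly the knobs the questions above are about, and each of them is tuned in this one block of configuration.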
The major part of polling is the WATERMARK and how Mule implements it on the server.

Polling for Updates Using Watermarks

Rather than polling a resource for all its data with every call, you may want to acquire only the data that has been newly created or updated since the last call. To acquire only new or updated data, you need to keep a persistent record of either the item that was last processed or the time at which your flow last polled the resource. In the context of Mule flows, this persistent record is called a watermark.

To make the watermark persistent, Mule ESB stores it in the object store under the runtime directory of the project on the ESB server. Depending on the type of object store you have implemented, you may have a SimpleMemoryObjectStore or a TextFileObjectStore.

For any kind of object store, Mule ESB creates files on the server, and if the frequency of your polls is not carefully configured, you may run into file storage issues. For example, if you are running your poll every 10 seconds with multiple threads, and your flow takes more than 10 seconds to send data to SF, then a new object store entry is made to persist the watermark value for each flow trigger, and you will end up with too many files in the server object store.

To set these values, we have to consider how many records we are fetching from the database, as SF has a limit of 200 records that you can send in one bulk call. So, if you are fetching 2,000 records, then one batch will call SF 10 times to transfer those 2,000 records.
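Returning to the two object store flavors mentioned above, here is a hedged sketch of how they might be declared in Mule 3.x using Spring beans. The bean ids and the directory value are illustrative assumptions, not taken from the original post:

```xml
<spring:beans>
    <!-- In-memory object store: fast, but watermark values
         are lost whenever the server restarts -->
    <spring:bean id="inMemoryStore"
                 class="org.mule.util.store.SimpleMemoryObjectStore"/>

    <!-- Text-file object store: persists watermark entries to files
         on disk, so they survive a restart (and so careless poll
         frequencies can pile files up, as described above) -->
    <spring:bean id="fileStore"
                 class="org.mule.util.store.TextFileObjectStore">
        <spring:property name="directory"
                         value="${mule.working.dir}/objectstore"/>
    </spring:bean>
</spring:beans>
```

A `<watermark>` element can then be pointed at one of these stores via its `object-store-ref` attribute, which is how you choose where those per-trigger watermark entries end up.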
If your flow takes five seconds to process 200 records, including the network round trip to send data to SF and come back, then your complete poll will take around 50 seconds to transfer 2,000 records. If your polling frequency is 10 seconds, it means we are piling up the object store.

Another issue that will arise is the queue store. Because there is a big gap between the polling frequency and the execution time, the queue store will also keep queuing. Again, you have to deal with too many files.

To resolve this, it's always a good idea to fine-tune the execution time of your flow and the polling frequency to keep the gap small. To manage the threads, you can use Mule's batch threading profile to control how many threads you want to run and how many you want to keep active.

I hope a few of these details help you set up your polling in a better way.

There are a few more things to consider. What happens when an error occurs while sending data? What happens when SF can't process your data and returns an error? What about the types of errors SF will send you? How do you rerun your batch with the watermark value if it failed? What about logging and recovery? I will try to cover these issues in a second blog post.

References: