Updated: 09/10/2018
Is it better to use batch data processing or real-time data integration?
It depends on the use case. Large files that can be processed overnight and have no time constraint might be better handled in batch, whereas small transactions, or transactions that are synchronous in nature, might need real-time integration. This article discusses when to use the HCL Integration Platform (HIP) Command Engine versus the Launcher or Integration Server.
Some companies have time constraints on their processing. One example was an analytics company that received point-of-sale (POS) data from multiple retail stores. They received close to 100 GB of log files per day. The data had to be ingested by the middle of the night so that their application had enough time to run a couple of hours of analytics on it. The analytics were critical to this company’s success, so the results needed to be ready in the morning to offer a competitive advantage. Since the retail POS data arrived intermittently from the stores, this customer could use the HCL Integration Platform (HIP) Launcher to ingest files as they arrived. The Launcher would watch for the presence of a new file and process it as soon as it had been delivered. They did not want to wait for a batch load. Even though there was a significant amount of data (100 GB per night) to be loaded, it was made up of many smaller files that could be ingested independently of one another.
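The event-driven pattern the Launcher implements can be sketched in generic terms. The following is a minimal, hypothetical Python polling loop, not the Launcher’s actual mechanism or API, that watches an inbox directory and hands each new file off for processing as soon as it appears, rather than waiting for a nightly batch window:

```python
import time
from pathlib import Path

def watch_and_process(inbox: Path, process, poll_seconds=5, max_polls=None):
    """Poll an inbox directory and hand each new file to `process`
    as soon as it appears, instead of waiting for a nightly batch."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in sorted(inbox.glob("*.log")):
            if path.name not in seen:
                seen.add(path.name)
                process(path)  # ingest this store's file independently of the others
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
    return seen
```

Because each store’s file is processed independently, the 100 GB nightly volume is spread across the night as many small, immediate ingests instead of one large batch.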
Now, could an ETL tool do this? The challenge here is supporting the various data formats that need to be processed. If there had been only one POS system, that would have simplified things, but there were about 50 formats that needed to be ingested. These formats had many different record types, and even multiple delimiters within the same line of data. This was a very complex data structure to map, and HCL Integration Platform (HIP) is designed for this kind of complex data mapping. Several back-end applications needed to be populated for each file, and data had to be pulled from both the header and the body of the file and combined in a complex manner. ETL tools commonly process one line in and one line out, and don’t necessarily correlate a group of records. ETL tools also tend to be batch oriented, whereas the Launcher is event driven and starts processing on a new event such as a new file, an insert into a database, a new message on a queue, and so on.
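To make “multiple record types, multiple delimiters per line” concrete, here is a small Python sketch of a parser for an invented POS log format. The record types, field names, and delimiters are assumptions for illustration, not one of the real 50 formats: header and transaction records have different layouts, and the transaction’s item field nests a second and third delimiter inside a pipe-delimited line.

```python
def parse_pos_line(line: str) -> dict:
    """Parse one line of a hypothetical POS log where the first field
    is a record type and one field carries nested delimiters."""
    record_type, rest = line.split("|", 1)
    if record_type == "HDR":  # header record: store id and business date
        store_id, business_date = rest.split("|")
        return {"type": "HDR", "store": store_id, "date": business_date}
    if record_type == "TXN":  # body record: items use ';' and ',' inside one field
        txn_id, items_field, total = rest.split("|")
        items = [tuple(item.split(",")) for item in items_field.split(";")]
        return {"type": "TXN", "id": txn_id, "items": items, "total": float(total)}
    raise ValueError(f"unknown record type: {record_type}")
```

Even this toy format needs per-record-type logic and nested splitting; multiply that by 50 formats and the value of a tool purpose-built for complex mapping becomes clear.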
The other complexity in this use case is having to do lookups against other data sources. Reading in cross-reference information from files and databases adds complexity that many ETL tools do not support. HCL Integration Platform (HIP) is designed to read from multiple sources and write to multiple destinations in one pass. Because the POS data comes from different POS systems, the files don’t all carry the same information, so using a cross-reference becomes vital to loading the right data.
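The cross-reference idea can be illustrated with a generic Python sketch (the field names here are hypothetical): a lookup table loaded from one source enriches each transaction, and records are routed to one of two destinations in a single pass.

```python
def enrich_and_route(transactions, store_xref, matched_out, unmatched_out):
    """One pass: enrich each POS transaction with cross-reference data
    (e.g. a store's region) and route it to one of two destinations."""
    for txn in transactions:
        region = store_xref.get(txn["store"])
        if region is None:
            unmatched_out.append(txn)  # no cross-reference hit; set aside for review
        else:
            matched_out.append({**txn, "region": region})
```

In a real integration the cross-reference would be loaded from a file or database and the destinations would be back-end applications, but the single-pass read-many/write-many shape is the same.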
Think about the healthcare industry. Say a large customer sends employee enrollment EDI (834) transactions with thousands of employee names in them. This could happen over the open enrollment timeframe. There is no urgent requirement to load them immediately, as open enrollment tends to have a large time window, with perhaps multiple updates over that window. Once the insurer receives such a file, it might simply use the Command Engine to load the data overnight, since there is no time constraint.
In contrast, if a person is at a drug store and wants to fill a prescription, a real-time eligibility (EDI 270/271) transaction is needed to verify that the insurer will pay for the prescription. This can be handled by the HCL Integration Platform (HIP) Launcher listening for web service or even REST calls. These calls can be made synchronously so that there is an immediate response verifying eligibility. The employee does not want to wait overnight, an hour, or even five minutes to get their prescription. Alternatively, the call could be made to another application, such as Sterling B2B Integrator or IBM Integration Bus (IIB), to handle the web service call, which would then use the Integration Server to call HCL Integration Platform (HIP) to do the back-end transformation of the data from the EDI HIPAA format to whatever the eligibility lookup requires.
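Reduced to its essence, a synchronous eligibility check looks like the following Python sketch. The member and plan fields are invented, and real 270/271 transactions are far richer, but it shows why the synchronous pattern matters: the caller blocks until the 271-style response is built, so the pharmacy gets an immediate yes or no.

```python
def eligibility_check(member_id: str, plan_db: dict) -> dict:
    """Synchronous eligibility lookup: build and return a 271-style
    response immediately, rather than queueing the inquiry for a batch run."""
    plan = plan_db.get(member_id)
    if plan is None:
        return {"transaction": "271", "member": member_id, "eligible": False}
    return {"transaction": "271", "member": member_id, "eligible": True, "plan": plan}
```

Whether the request arrives via the Launcher directly or through a gateway such as Sterling B2B Integrator, the back-end step is the same shape: transform the inquiry, look up the answer, and return it in the same call.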
The final example deals with a company sending in files that need to be compared with a previous file. There could be thousands of records, and the requirement is to write out three separate files: one for records that need to be added, one for records that already exist and need to be updated, and one for deleted records (i.e., records that existed before but are not in the current file). Since the key data were split across different fields in the record, the fields had to be combined to do the comparison. Since the data came in once a month and there was no time constraint, the Command Engine could manage this in batch mode. If the files had come in as individual transactions, and a comparison had to be made immediately, the architecture could have called for the Launcher instead.
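The compare-and-split logic described above can be sketched as follows (the field names are assumptions): a composite key is built from several fields, and the two record sets are compared to produce the add, update, and delete outputs.

```python
def diff_files(previous, current, key_fields):
    """Compare two record sets on a composite key built from several
    fields, producing the add / update / delete splits."""
    def key(rec):
        return tuple(rec[f] for f in key_fields)  # combine fields into one key
    prev_by_key = {key(r): r for r in previous}
    curr_by_key = {key(r): r for r in current}
    adds    = [r for k, r in curr_by_key.items() if k not in prev_by_key]
    deletes = [r for k, r in prev_by_key.items() if k not in curr_by_key]
    updates = [r for k, r in curr_by_key.items()
               if k in prev_by_key and prev_by_key[k] != r]
    return adds, updates, deletes
```

The same function works whether it is run once a month against a full file (batch) or per transaction as records arrive (event driven), which mirrors the engine-choice point below.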
So, different use cases call for different integration methods. With HCL Integration Platform (HIP), you have the flexibility to meet the requirement at hand. The same map works whether you run it with the Command Engine, the Launcher, or the Integration Server. So even if a requirement changes, say moving from a batch integration to a more real-time one, you don’t have to redo the map or look at a different technology; you can just run the same map with a different engine. That’s the power of using HCL Integration Platform (HIP).
ITX Subject Matter Expert
HCL Integration Platform (HIP) is a trademark of IBM Corporation in at least one jurisdiction and is used under license.