Talend DI and Talend ESB are the two best-known Talend integration products. In this post, I'll talk about the differences between these two open source versions and share my personal impressions of each.
Talend Data Integration has helped me a lot over the years: migrating data from ERPs or CRMs, building data warehouses, creating daily batches, reading XML/JSON files and many other tasks, without wasting too much time writing excessive Java source code.
When I dealt with Talend ESB for the first time, I didn't know this version at all. Even though it's much more technical and specialised, it turned out to be the most efficient in terms of integration best practices, and in the end it became my favourite version 😉
Both Talend versions are Java code generators, but the ESB version generates Apache Camel code.
Apache Camel is an open source Java framework that was the fourth most active repository at the Apache Software Foundation in 2018. It is based on Enterprise Integration Patterns (EIP) and follows standard rules.
So yes, this framework still produces Java code in the background, but it deserves more trust than a plain Java batch. The Java code behind it has been thoroughly tested and is more efficient (no offence) than the Java code you could write yourself (e.g. to consume a file) or generate with the Data Integration version. Most current and future needs are already solved and available through a simple configuration option.
You simply build an Apache Camel route from a starting point (consumer) to an ending point (producer) using the FROM and TO keywords.
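To give an idea of what this looks like, here is a minimal route sketch in Camel's XML DSL. The folder names are just examples of my own, not something Talend generates for you:

```xml
<!-- Minimal Camel route: a file consumer (from) reads files
     from an inbox folder and a producer (to) drops them into
     an outbox. Directory names are illustrative. -->
<route>
  <from uri="file:data/inbox?noop=true"/>
  <to uri="file:data/outbox"/>
</route>
```

The `from` endpoint is the consumer and the `to` endpoint is the producer; everything else (polling, file locking, error handling) comes from the framework.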
Batch processing versus consumer/producer
Batch processing can be hell on earth, especially when you run hundreds of jobs in production that are likely to depend on each other; when trouble hits, there's a good chance you'll go crazy very quickly.
And I'm not even talking about error management: when a batch fails during a long bulk load, how do you handle the faulty or remaining records? People often skip them and just let the process continue.
The consumer/producer approach, or real-time processing, takes records one by one by splitting them; obviously you can also aggregate them, much like in SQL. This approach may seem heavy at first glance, but it isn't: remember that a simple configuration option gives you multi-threading, or even better, lets you activate the lazy-loading type converter, avoiding blowing up the Java heap memory.
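As a sketch of that splitting idea, still in Camel's XML DSL (the endpoint names are invented for the example), the Splitter EIP breaks a bulk file into individual records, and both streaming and multi-threading are single options on the element:

```xml
<!-- Split a file line by line into individual records.
     streaming="true" avoids loading the whole file in memory;
     parallelProcessing="true" handles records on multiple threads. -->
<route>
  <from uri="file:data/inbox"/>
  <split streaming="true" parallelProcessing="true">
    <tokenize token="\n"/>
    <to uri="direct:processRecord"/>
  </split>
</route>
```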
Processing records one by one enables effective error management: if a record is rejected, you can temporarily route it to a dead letter queue and deal with it manually later, without any loss of information. Of course, you can also set up management rules to include redelivery, meaning further attempts are triggered before the record falls into the DLQ.
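This redelivery-then-DLQ behaviour is also just configuration in Camel. A sketch in the XML DSL (the queue name is an example of mine):

```xml
<!-- Dead letter channel: retry a failed record 3 times with a
     5 second delay; if it still fails, park it in a DLQ for
     later manual handling instead of losing it. -->
<errorHandler id="dlqHandler" type="DeadLetterChannel"
              deadLetterUri="activemq:queue:orders.dlq">
  <redeliveryPolicy maximumRedeliveries="3" redeliveryDelay="5000"/>
</errorHandler>
```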
Data Integration is great for batch data processing and ESB for real-time data processing. ESB remains my favourite version, but as it is the more technical one, you'll need to read the fresh second edition of the Apache Camel bible to understand this wonderful framework's way of life.