At the Hadoop Summit in Dublin this week, Ted Malaska, Principal Solutions Architect at Cloudera, and I presented "Ingest and Stream Processing - What Will You Choose?", looking at the big data streaming landscape with a focus on ingest. The session closed with a demo of StreamSets Data Collector, the open source graphical IDE for building ingest pipelines.
In the demo, I built a pipeline that reads JSON data from Apache Kafka, augments each record in JavaScript, and writes the resulting records to two destinations: Apache Kudu (incubating) for analysis and Apache Kafka for visualization.
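To give a feel for the augmentation step, here is a minimal sketch in plain JavaScript. It is not the script from the demo: the field names (`temperature_c`, `temperature_f`, `processed_at`) and the helper functions `augmentRecord` and `processBatch` are hypothetical, chosen only to illustrate parsing JSON messages, adding derived fields, and setting aside messages that fail to parse.

```javascript
// Illustrative only: field names and helpers are assumptions, not the demo's script.

// Return a copy of one parsed JSON record with extra, derived fields added.
function augmentRecord(value, now) {
  var out = Object.assign({}, value);   // leave the input record untouched
  out.processed_at = now;               // hypothetical timestamp field
  if (typeof out.temperature_c === 'number') {
    // Hypothetical derived field: Celsius converted to Fahrenheit.
    out.temperature_f = out.temperature_c * 9 / 5 + 32;
  }
  return out;
}

// Parse a batch of JSON strings (as they might arrive from a Kafka topic),
// augment the ones that parse, and collect the rest as errors.
function processBatch(jsonLines, now) {
  var records = [];
  var errors = [];
  jsonLines.forEach(function (line) {
    try {
      records.push(augmentRecord(JSON.parse(line), now));
    } catch (e) {
      errors.push({ line: line, reason: String(e) });
    }
  });
  return { records: records, errors: errors };
}

// Example usage with two sample messages, one of them malformed.
var result = processBatch(
  ['{"sensor":"a","temperature_c":100}', 'not json'],
  Date.now()
);
console.log(JSON.stringify(result.records));
console.log(result.errors.length + ' message(s) failed to parse');
```

In a real pipeline the same batch of augmented records would then be handed to both destinations, while the failed messages would go to whatever error handling the pipeline is configured with.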