Apache Kafka Docker Image Example
Apache Kafka is a fault-tolerant publish-subscribe streaming platform that lets you process streams of records as they occur. This post is a step-by-step guide on how to…
Apache Kafka is a fault-tolerant publish-subscribe streaming platform that lets you process streams of records as they occur. This Kafka Quickstart Tutorial walks through the steps needed to get…
It is generally recommended to always compress intermediate map output. This is because I/O and network transfer are big bottlenecks in Hadoop, and compression can help with both of these…
Decoding URLs and strings is a common task, especially when working with web data. This is easy to do in a language like Java or Python, but what about…
One of the great features of Spark is the variety of data sources it can read from and write to. If you already have a database to write to, connecting…
One of the great features of Spark is the variety of data sources it can read from. Loading data from a database into Spark using JDBC requires three major steps.…
Cloudera’s Quickstart Image is a fantastic way to get started quickly with the big data ecosystem. With software such as Hadoop, Spark, Hive, Pig, Impala, and Hue already set up,…