How to do Total Order Sorting in Hadoop MapReduce
Being able to sort by all keys in a data set is a common need in the world of big data. Those familiar with Hive or relational databases know that…
If you have gone through other Hadoop MapReduce examples, you will have noticed the use of “Writable” data types such as LongWritable, IntWritable, Text, etc. All values used in…
Getting the distinct values from a dataset is a very common task, and actually very easy to do in MapReduce. In pseudocode, your mapper and reducer will look something…
Often when running MapReduce jobs, people prefer setting configuration parameters from the command line. This helps avoid the need to hard-code settings such as number of mappers, number of…
Below is a simple Hadoop MapReduce example. This example is a little different from the standard “Word Count” example in that it takes tab-delimited text, and counts the occurrences…
JSON is a very common way to store data. But JSON can get messy and parsing it can get tricky. Here are a few examples of parsing nested data structures…
Needing to read and write JSON data is a common big data task. Thankfully this is very easy to do in Spark using Spark SQL DataFrames. Spark SQL can automatically…