If you have gone through other Hadoop MapReduce examples, you will have noticed the use of "Writable" data types such as LongWritable, IntWritable, and Text. All values used in Hadoop MapReduce must implement the Writable interface.
Although we can do a lot with the primitive Writables already available in Hadoop, there are often times when we want to transmit a variety of data and/or data types from Mapper to Reducer. Sometimes it is possible to convert all of this data into strings and concatenate them into a single key or single value, but this gets messy quickly and is not recommended.
Implementing Writable requires implementing two methods: readFields(DataInput in) and write(DataOutput out). Writables that are used as keys in MapReduce jobs must also implement Comparable (or simply WritableComparable). Overriding the toString() method is not necessary, but can be very helpful when storing your output data as text in HDFS.
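To illustrate the key requirement, here is a minimal sketch of what a gender key might look like. It is a hypothetical class (not part of the example below), and it is written against plain JDK types so it compiles on its own; a real Hadoop key would implement org.apache.hadoop.io.WritableComparable&lt;GenderKey&gt; rather than just Comparable.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch of the key pattern. In a real job this would
// implement org.apache.hadoop.io.WritableComparable<GenderKey>;
// Comparable is used here only so the example is self-contained.
public class GenderKey implements Comparable<GenderKey> {

    private String gender = "";

    public GenderKey() {}

    public GenderKey(String gender) {
        this.gender = gender;
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(gender);
    }

    public void readFields(DataInput in) throws IOException {
        gender = in.readUTF();
    }

    // compareTo() drives the sort/shuffle: keys that compare equal
    // are grouped into the same reduce() call
    @Override
    public int compareTo(GenderKey other) {
        return gender.compareTo(other.gender);
    }

    // hashCode() is used by the default partitioner to pick a reducer
    @Override
    public int hashCode() {
        return gender.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof GenderKey && gender.equals(((GenderKey) o).gender);
    }

    @Override
    public String toString() {
        return gender;
    }
}
```

Because the framework sorts and partitions by key, a key class must always provide a consistent compareTo() and hashCode(); a value class, like the one below, does not need them.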
Below is an example of a custom Writable that stores both gender and login information; it could be used, for example, to calculate login statistics by gender. Notice that this custom class, GenderLoginWritable, does not implement Comparable or WritableComparable, so it can only be used as a value in the MapReduce framework, not as a key.
package com.bigdatums.hadoop.mapreduce;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class GenderLoginWritable implements Writable {

    private IntWritable male;
    private IntWritable female;
    private IntWritable maleLogins;
    private IntWritable femaleLogins;

    public GenderLoginWritable() {
        male = new IntWritable(0);
        female = new IntWritable(0);
        maleLogins = new IntWritable(0);
        femaleLogins = new IntWritable(0);
    }

    public GenderLoginWritable(IntWritable male, IntWritable maleLogins, IntWritable female, IntWritable femaleLogins) {
        this.male = male;
        this.female = female;
        this.maleLogins = maleLogins;
        this.femaleLogins = femaleLogins;
    }

    public IntWritable getMale() {
        return male;
    }

    public IntWritable getFemale() {
        return female;
    }

    public IntWritable getMaleLogins() {
        return maleLogins;
    }

    public IntWritable getFemaleLogins() {
        return femaleLogins;
    }

    public void setMale(IntWritable male) {
        this.male = male;
    }

    public void setFemale(IntWritable female) {
        this.female = female;
    }

    public void setMaleLogins(IntWritable maleLogins) {
        this.maleLogins = maleLogins;
    }

    public void setFemaleLogins(IntWritable femaleLogins) {
        this.femaleLogins = femaleLogins;
    }

    // Deserialize the fields, in exactly the same order they were written
    @Override
    public void readFields(DataInput in) throws IOException {
        male.readFields(in);
        female.readFields(in);
        maleLogins.readFields(in);
        femaleLogins.readFields(in);
    }

    // Serialize the fields; this order must match readFields()
    @Override
    public void write(DataOutput out) throws IOException {
        male.write(out);
        female.write(out);
        maleLogins.write(out);
        femaleLogins.write(out);
    }

    // Tab-separated output, convenient when writing results as text to HDFS
    @Override
    public String toString() {
        return male.toString() + "\t" + maleLogins.toString() + "\t" + female.toString() + "\t" + femaleLogins.toString();
    }
}
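The key contract here is that write() and readFields() are exact mirrors of each other: Hadoop serializes the object on the map side and reconstructs it on the reduce side by reading the fields back in the same order. The round trip can be demonstrated with plain java.io streams. The sketch below uses a simplified stand-in class with primitive int fields (an assumption, so the example compiles without Hadoop on the classpath); the mechanics are the same as in GenderLoginWritable above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Self-contained sketch of the Writable round trip. A real value class
// would implement org.apache.hadoop.io.Writable (assumed omitted here).
public class GenderLoginRoundTrip {

    static class GenderLogins {
        int male, female, maleLogins, femaleLogins;

        // Mirrors GenderLoginWritable.write(): serialize fields in a fixed order
        void write(DataOutput out) throws IOException {
            out.writeInt(male);
            out.writeInt(female);
            out.writeInt(maleLogins);
            out.writeInt(femaleLogins);
        }

        // Mirrors GenderLoginWritable.readFields(): read back in the same order
        void readFields(DataInput in) throws IOException {
            male = in.readInt();
            female = in.readInt();
            maleLogins = in.readInt();
            femaleLogins = in.readInt();
        }
    }

    // Serialize one instance to bytes and deserialize into a fresh instance,
    // as Hadoop does when shuffling values from mappers to reducers
    public static int[] roundTrip(int male, int female, int maleLogins, int femaleLogins) {
        try {
            GenderLogins original = new GenderLogins();
            original.male = male;
            original.female = female;
            original.maleLogins = maleLogins;
            original.femaleLogins = femaleLogins;

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));

            GenderLogins copy = new GenderLogins();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return new int[]{copy.male, copy.female, copy.maleLogins, copy.femaleLogins};
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen with in-memory streams
        }
    }
}
```

If write() and readFields() ever disagree on field order or types, the deserialized object is silently corrupted, which is one of the most common bugs in custom Writables.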
Documentation for the Writable interface can be found here: Hadoop Writable Interface.