Apache Avro is a popular data serialization system that relies on schemas. The official Avro documentation can be found here: http://avro.apache.org/docs/current/.
This post walks through an example of serializing and deserializing data using Avro in Java. Maven is not necessary for working with Avro in Java, but we will be using Maven in this post.
Step 1 – Update pom.xml
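Add the text below to your Maven pom.xml file (the version numbers might need updating). The dependency element belongs inside your dependencies section and pulls in the Avro library. The build section configures the avro-maven-plugin, which gives us the convenience of code generation (discussed below).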
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.7.7</version>
</dependency>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
            <version>1.7.7</version>
            <executions>
                <execution>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>schema</goal>
                    </goals>
                    <configuration>
                        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.6</source>
                <target>1.6</target>
            </configuration>
        </plugin>
    </plugins>
</build>
Step 2 – Define your schema
You will need to create an Avro schema file in the location shown in the sourceDirectory field in your pom.xml file (src/main/avro/ in this example). Here are the contents of our person.avsc file:
{
    "namespace": "com.bigdatums.avro",
    "type": "record",
    "name": "BdPerson",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "username", "type": ["string", "null"]},
        {"name": "email_address", "type": ["string", "null"]},
        {"name": "phone_number", "type": ["string", "null"]},
        {"name": "first_name", "type": ["string", "null"]},
        {"name": "last_name", "type": ["string", "null"]},
        {"name": "middle_name", "type": ["string", "null"]},
        {"name": "sex", "type": ["string", "null"]},
        {"name": "birthdate", "type": ["string", "null"]},
        {"name": "join_date", "type": ["string", "null"]},
        {"name": "previous_logins", "type": ["int", "null"]},
        {"name": "last_ip", "type": ["string", "null"]}
    ]
}
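Before running code generation, it can help to confirm that the schema parses and that the nullable fields are declared as intended. The following is a minimal sketch using Avro's Schema.Parser. The class name SchemaCheck and its helper methods are hypothetical, and the schema is inlined as a string with only three of the twelve fields so that the example stands on its own.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;

class SchemaCheck {
    // Shortened copy of person.avsc (3 of the 12 fields), inlined for illustration only.
    static final String SCHEMA_JSON =
        "{\"namespace\": \"com.bigdatums.avro\","
        + " \"type\": \"record\","
        + " \"name\": \"BdPerson\","
        + " \"fields\": ["
        + "   {\"name\": \"id\", \"type\": \"int\"},"
        + "   {\"name\": \"username\", \"type\": [\"string\", \"null\"]},"
        + "   {\"name\": \"previous_logins\", \"type\": [\"int\", \"null\"]}"
        + " ]}";

    // Parses the JSON text; Avro throws SchemaParseException if it is not a valid schema.
    static Schema parse() {
        return new Schema.Parser().parse(SCHEMA_JSON);
    }

    // Collects the names of fields whose type is a union with a "null" branch.
    static List<String> nullableFields(Schema record) {
        List<String> names = new ArrayList<String>();
        for (Schema.Field field : record.getFields()) {
            Schema type = field.schema();
            if (type.getType() != Schema.Type.UNION) {
                continue;
            }
            for (Schema branch : type.getTypes()) {
                if (branch.getType() == Schema.Type.NULL) {
                    names.add(field.name());
                    break;
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Schema schema = parse();
        // Namespace plus record name; this is where the generated class will live.
        System.out.println(schema.getFullName());
        System.out.println(nullableFields(schema));
    }
}
```

In a real project the schema would more likely be read from the person.avsc file itself, for which Schema.Parser also offers a parse(File) overload.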
Step 3 – Compile your project
By compiling your project, you will trigger code generation. Code generation automatically creates the classes we need to work with BdPerson records (as defined by the schema in the previous step). After compiling, the BdPerson class will appear in the com.bigdatums.avro package (the namespace defined in the schema). There are various ways to compile the project, including running mvn compile from the command line in the project directory.
Step 4 – Create Schema Objects
Use the setters on the newly generated BdPerson class to create and populate schema objects.
BdPerson p1 = new BdPerson();
p1.setId(1);
p1.setUsername("mrscarter");
p1.setFirstName("Beyonce");
p1.setLastName("Knowles-Carter");
p1.setBirthdate("1981-09-04");
p1.setJoinDate("2016-01-01");
p1.setPreviousLogins(10000);
BdPerson p2 = new BdPerson();
p2.setId(2);
p2.setUsername("jayz");
p2.setFirstName("Shawn");
p2.setMiddleName("Corey");
p2.setLastName("Carter");
p2.setBirthdate("1969-12-04");
p2.setJoinDate("2016-01-01");
p2.setPreviousLogins(20000);
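Both records above leave several fields unset. That is acceptable because those fields are declared as unions that include null, whereas id is a plain int and needs a value. The sketch below illustrates the rule with Avro's generic API and GenericData.validate rather than the generated class. The NullableCheck class, its isValid helper, and the reduced two-field schema are assumptions made to keep the example self-contained.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

class NullableCheck {
    // Reduced two-field version of the BdPerson schema, for illustration only.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\": \"com.bigdatums.avro\", \"type\": \"record\", \"name\": \"BdPerson\","
        + " \"fields\": ["
        + "   {\"name\": \"id\", \"type\": \"int\"},"
        + "   {\"name\": \"username\", \"type\": [\"string\", \"null\"]}"
        + " ]}");

    // Builds a generic record, leaving null arguments unset, and checks it against the schema.
    static boolean isValid(Integer id, String username) {
        GenericRecord record = new GenericData.Record(SCHEMA);
        if (id != null) {
            record.put("id", id);
        }
        if (username != null) {
            record.put("username", username);
        }
        return GenericData.get().validate(SCHEMA, record);
    }

    public static void main(String[] args) {
        System.out.println(isValid(1, "mrscarter")); // true: both fields set
        System.out.println(isValid(1, null));        // true: username may be null
        System.out.println(isValid(null, "jayz"));   // false: id is a plain int, not a union with null
    }
}
```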
Step 5 – Serialize Data to Disk
Use Avro classes and the BdPerson schema to serialize the data and write it to bdperson-test.avro.
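//Serialize sample BdPerson
//Imports needed by this snippet
import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

File avroOutput = new File("bdperson-test.avro");
try {
    DatumWriter<BdPerson> bdPersonDatumWriter = new SpecificDatumWriter<BdPerson>(BdPerson.class);
    DataFileWriter<BdPerson> dataFileWriter = new DataFileWriter<BdPerson>(bdPersonDatumWriter);
    //create() writes the file header, which embeds the schema
    dataFileWriter.create(p1.getSchema(), avroOutput);
    dataFileWriter.append(p1);
    dataFileWriter.append(p2);
    dataFileWriter.close();
} catch (IOException e) {
    System.out.println("Error writing Avro: " + e.getMessage());
}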
Step 6 – Deserialize Avro file and Print Contents
Use Avro classes and the BdPerson schema to deserialize the Avro file written in Step 5 and print its contents.
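//Deserialize sample avro file
//Imports needed by this snippet (in addition to those from Step 5)
import org.apache.avro.file.DataFileReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;

try {
    DatumReader<BdPerson> bdPersonDatumReader = new SpecificDatumReader<BdPerson>(BdPerson.class);
    DataFileReader<BdPerson> dataFileReader = new DataFileReader<BdPerson>(avroOutput, bdPersonDatumReader);
    BdPerson p = null;
    while (dataFileReader.hasNext()) {
        //Passing p lets Avro reuse the same object instead of allocating one per record
        p = dataFileReader.next(p);
        System.out.println(p);
    }
    dataFileReader.close();
} catch (IOException e) {
    System.out.println("Error reading Avro: " + e.getMessage());
}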
Deserialized output:
{"id": 1, "username": "mrscarter", "email_address": null, "phone_number": null, "first_name": "Beyonce", "last_name": "Knowles-Carter", "middle_name": null, "sex": null, "birthdate": "1981-09-04", "join_date": "2016-01-01", "previous_logins": 10000, "last_ip": null}
{"id": 2, "username": "jayz", "email_address": null, "phone_number": null, "first_name": "Shawn", "last_name": "Carter", "middle_name": "Corey", "sex": null, "birthdate": "1969-12-04", "join_date": "2016-01-01", "previous_logins": 20000, "last_ip": null}
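Steps 4 through 6 depend on the generated BdPerson class. For comparison, the same write-then-read round trip can be sketched with Avro's generic API, which works from a schema parsed at runtime instead of generated code. This is an illustration under stated assumptions, not part of the project above: the GenericRoundTrip class, its helper methods, the temporary file, and the reduced two-field schema are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

class GenericRoundTrip {
    // Reduced two-field version of the BdPerson schema, for illustration only.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\": \"com.bigdatums.avro\", \"type\": \"record\", \"name\": \"BdPerson\","
        + " \"fields\": ["
        + "   {\"name\": \"id\", \"type\": \"int\"},"
        + "   {\"name\": \"username\", \"type\": [\"string\", \"null\"]}"
        + " ]}");

    // Builds a record by field name instead of through generated setters.
    static GenericRecord person(int id, String username) {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("id", id);
        record.put("username", username);
        return record;
    }

    // Writes the records to an Avro data file; try/finally keeps this compatible with Java 6.
    static void write(File out, List<GenericRecord> records) throws IOException {
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(SCHEMA));
        try {
            writer.create(SCHEMA, out);
            for (GenericRecord record : records) {
                writer.append(record);
            }
        } finally {
            writer.close();
        }
    }

    // Reads every record back; the reader takes the schema from the file header.
    static List<GenericRecord> read(File in) throws IOException {
        DataFileReader<GenericRecord> reader =
            new DataFileReader<GenericRecord>(in, new GenericDatumReader<GenericRecord>());
        List<GenericRecord> records = new ArrayList<GenericRecord>();
        try {
            while (reader.hasNext()) {
                records.add(reader.next());
            }
        } finally {
            reader.close();
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("bdperson-generic", ".avro");
        tmp.deleteOnExit();
        write(tmp, Arrays.asList(person(1, "mrscarter"), person(2, "jayz")));
        for (GenericRecord record : read(tmp)) {
            System.out.println(record);
        }
    }
}
```

The generic API trades compile-time type safety for flexibility: field names are plain strings, so a typo only surfaces at runtime, whereas the generated BdPerson setters are checked by the compiler.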