Loading Data from a Database into Spark
One of the great features of Spark is the variety of data sources it can read from. Loading data from a database into Spark using JDBC requires 3 major steps.…
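As a quick taste of what a JDBC load can look like, here is a minimal Scala sketch. The database URL, table name, and credentials are made up, and the approach shown (Spark SQL's DataFrame reader) is only one common route, which may differ from the post's own three steps.

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

object JdbcLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-load-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical connection details -- replace with your own database, table, and credentials.
    val props = new Properties()
    props.setProperty("user", "myuser")
    props.setProperty("password", "mypassword")

    // Read the table over JDBC into a DataFrame; the JDBC driver jar must be on the classpath.
    val df = spark.read.jdbc("jdbc:mysql://localhost:3306/mydb", "mytable", props)
    df.show(10)

    spark.stop()
  }
}
```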
Cloudera’s Quickstart Image is a fantastic way to get started quickly with the big data ecosystem. With software such as Hadoop, Spark, Hive, Pig, Impala, and Hue already set up,…
UUID stands for Universally Unique Identifier. UUIDs are used as IDs to identify unique objects or records. They are very common in big data environments, where coordinating unique IDs…
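For a sense of how little coordination is needed, here is a small Scala sketch using the JDK's built-in UUID class; the printed values are illustrative only.

```scala
import java.util.UUID

object UuidSketch {
  def main(args: Array[String]): Unit = {
    // randomUUID() produces a version 4 (random) UUID; the chance of two workers
    // generating the same value is negligible, so no central ID service is needed.
    val id: UUID = UUID.randomUUID()

    println(id.toString)   // 36-character form, e.g. xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
    println(id.version())  // prints 4 for a random UUID
  }
}
```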
The AWS CLI has made working with S3 very easy. Once you have the AWS CLI installed, you might ask, “How do I start copying local files to S3?” The syntax for…
The AWS CLI makes working with files in S3 very easy. However, the file globbing available on most Unix/Linux systems is not quite as easy to use with the AWS…
Being able to sort by all keys in a data set is a common need in the world of big data. Those familiar with Hive or relational databases know that…
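The post may well take a different route (for example Hive or plain MapReduce), but as a quick illustration of a total ordering by key, here is a Spark sketch in Scala with made-up data.

```scala
import org.apache.spark.sql.SparkSession

object SortByKeySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sort-by-key-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Made-up (key, value) pairs; sortByKey() gives a total ordering across all partitions,
    // much like ORDER BY on the key column in Hive or a relational database.
    val pairs = sc.parallelize(Seq(("cherry", 3), ("apple", 1), ("banana", 2)))
    pairs.sortByKey().collect().foreach(println)

    spark.stop()
  }
}
```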
If you have gone through other Hadoop MapReduce examples, you will have noticed the use of “Writable” data types such as LongWritable, IntWritable, Text, etc. All values used in…
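As a minimal sketch of what those wrapper types look like in practice (the values here are arbitrary, and this is just the types in isolation rather than a full MapReduce job):

```scala
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}

object WritableSketch {
  def main(args: Array[String]): Unit = {
    // Writables wrap plain values in Hadoop's own serialization format so they can be
    // shipped efficiently between map and reduce tasks.
    val offset = new LongWritable(1024L)
    val count  = new IntWritable(42)
    val word   = new Text("hadoop")

    // get() / toString() unwrap the underlying value again.
    println(s"${word.toString}: count=${count.get()}, offset=${offset.get()}")
  }
}
```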