How to Decode URLs in Hive
Decoding URLs and strings is a common task, especially when working with web data. This is easy to do in a language like Java or Python, but what about…
One of the great features of Spark is the variety of data sources it can read from and write to. If you already have a database to write to, connecting…
One of the great features of Spark is the variety of data sources it can read from. Loading data from a database into Spark using JDBC requires three major steps.…
Cloudera’s Quickstart Image is a fantastic way to get started quickly with the big data ecosystem. With software such as Hadoop, Spark, Hive, Pig, Impala, and Hue already set up,…
UUID stands for Universally Unique Identifier. UUIDs are used to identify unique objects or records. They are very common in a big data environment where coordinating unique IDs…
The AWS CLI makes working with S3 very easy. Once you have the AWS CLI installed, you might ask “How do I start copying local files to S3?” The syntax for…
The AWS CLI makes working with files in S3 very easy. However, the file globbing available on most Unix/Linux systems is not quite as easy to use with the AWS…