How to Load a Text File into Spark
Loading text files in Spark is a very common task, and luckily it is easy to do. Below are a few examples of loading a text file (located on the…
Getting the hash code of a file is a common programming task. MD5 is a very popular and commonly used hashing algorithm. Getting the MD5 hash code of a file…
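As a rough illustration of the MD5 task described above, here is a minimal sketch using only the Java standard library. The class name Md5Sketch and the method md5Hex are hypothetical names chosen for this example and are not taken from the original post; streaming the file in 8 KB chunks is likewise just one reasonable choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class for illustration only.
public class Md5Sketch {

    // Reads the stream in chunks and returns the MD5 digest as lowercase hex.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            md.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Convenience overload: hash the contents of a file on disk.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(file)) {
            return md5Hex(in);
        }
    }
}
```

Calling Md5Sketch.md5Hex(Path.of("some-file.txt")) would return the digest as a 32-character hex string; the file name here is a placeholder.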
Generating fake data can be a common need when developing applications or loading test data into a database. jFairy is a great fake data generator library built in Java that…
A fat jar or uber jar is a jar that contains the classes of your current project as well as all of the classes on which it depends. For example,…
Often when working with files in S3, you need information about all the items in a particular S3 bucket. Below is an example class that extends the AmazonS3Client class to…
Counting the distinct/unique elements of a text file is a common task. Below is an example of doing this in AWK, using sample_data_1.txt. cat sample_data_1.txt \ | awk 'BEGIN{FS="\t"} NR>1{names[$2]=1} END{print…
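For comparison, the same idea can be sketched in Java. This is not the post's AWK command (which is cut off above) but a rough equivalent under stated assumptions: the input is tab-separated, the first line is a header, and the values of interest are in the second column, as the visible part of the AWK snippet suggests. The class name DistinctSketch and the method distinctSecondColumn are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration only.
public class DistinctSketch {

    // Collects the distinct values of the second tab-separated column,
    // skipping the first line (assumed to be a header, like NR>1 in AWK).
    static Set<String> distinctSecondColumn(List<String> lines) {
        Set<String> names = new HashSet<>();
        for (int i = 1; i < lines.size(); i++) {
            String[] fields = lines.get(i).split("\t", -1);
            // Like AWK, treat a missing second field as the empty string.
            names.add(fields.length > 1 ? fields[1] : "");
        }
        return names;
    }
}
```

The lines could be read with Files.readAllLines on the data file, and the size of the returned set gives the number of distinct values.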