It is often useful to know the name of the input file you are processing in a Hive query. This is common when useful metadata is stored in the file name. For example, logs from many different servers can be stored in S3, and the file names could contain the names or IP addresses of those servers.
Luckily, Hive makes this easy with the INPUT__FILE__NAME “virtual column”, which gives the input file’s name for a mapper task. Here is an example:
SELECT id, INPUT__FILE__NAME filename FROM people_v1;
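In practice the value returned is usually the file's full path rather than just its base name, so you will often want to trim it. The query below is a minimal sketch of one way to do that with Hive's built-in regexp_extract function. It reuses the people_v1 table from above; the example path in the comment and the source_file and row_count aliases are made up for illustration.

```sql
-- Hypothetical layout: one file per server, e.g. s3://my-bucket/logs/web-01.log
-- The pattern '([^/]+)$' captures everything after the last '/' in the path.
SELECT
  regexp_extract(INPUT__FILE__NAME, '([^/]+)$', 1) AS source_file,
  COUNT(*) AS row_count
FROM people_v1
GROUP BY regexp_extract(INPUT__FILE__NAME, '([^/]+)$', 1);
```

The expression is repeated in the GROUP BY clause rather than referenced by its alias, since not every Hive version accepts a SELECT alias there.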