Calculating business days in HiveQL

Posted in: Hadoop, Technical Track

One of the common tasks in data processing is to calculate the number of days between two given dates. You can easily achieve this by using Hive DATEDIFF function. You can also get weekday number by using this more obscure function:

SELECT FROM_UNIXTIME(your_date,'u') FROM some_table;

This will return 1 for Monday, 7 for Sunday and is based on Java SimpleDateFormat —
https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html

It becomes more challenging if you need to calculate the number of business days (excluding Saturdays and Sundays) between two dates. There is no built-in function in Hive to perform such calculation, but fortunately it is rather simple to write your own Hive UDFs. I couldn’t quickly find any existing open source functions to solve this problem, so I wrote my own using Scala — https://github.com/pythian/hive-udfs

There are actually three functions in hive-udfs package: CountBusinessDays, CountSaturdays and CountSundays. These functions accept start date and end date as UNIX_TIMESTAMP and return the count of different types of full days in this interval, excluding start and end points.

Here is how you can use this UDF in Hive:

ADD JAR hdfs:///user/hive/udfs/pythian-hive-udfs-assembly-0.1.jar;
CREATE TEMPORARY FUNCTION count_business_days AS 'com.pythian.udf.CountBusinessDays';
CREATE TEMPORARY FUNCTION count_saturdays AS 'com.pythian.udf.CountSaturdays';
CREATE TEMPORARY FUNCTION count_sundays AS 'com.pythian.udf.CountSundays';

SELECT count_business_days(UNIX_TIMESTAMP(start_date), UNIX_TIMESTAMP(end_date)) FROM some_table;

The code is open source and free to use. Comments and suggestions are always welcome.

email

Interested in working with Danil? Schedule a tech call.

8 Comments. Leave new

I try to apply this code and having this error
( StringIndexOutOfBoundsException String index out of range: 0)
Please help me out

Reply

Hey, I came across this page while trying to solve this problem and I wanted to share that we can achieve the same result by using HiveQL (replace StartDate and EndDate with your date fields)

SELECT DateDiff(StartDate, EndDate)
– (2 * (WEEKOFYEAR(StartDate) – WEEKOFYEAR(EndDate)) )
+ CASE WHEN from_unixtime(unix_timestamp(EndDate, ‘yy-MM-dd’), ‘EEE’) = ‘Sun’ THEN 1 ELSE 0 END
+ CASE WHEN from_unixtime(unix_timestamp(StartDate, ‘yy-MM-dd’), ‘EEE’) = ‘Sat’ THEN 1 ELSE 0 END

I hope some other poor slob finds this helpful – Cheers!

Reply

The ^^ doesn’t work. Try 2018-06-08 and 2018-06-12 for example

Reply

Nice work around, I did a slight change

select DateDiff( ‘2018-06-12’, ‘2018-06-08’) – (WEEKOFYEAR(‘2018-06-12’) – WEEKOFYEAR(‘2018-06-08’)) * 2

( CASE WHEN from_unixtime(unix_timestamp(‘2018-06-12’, ‘yy-MM-dd’), ‘EEE’) = ‘Sun’ THEN 1 ELSE 0 END
+ CASE WHEN from_unixtime(unix_timestamp(‘2018-06-08’, ‘yy-MM-dd’), ‘EEE’) = ‘Sat’ THEN 1 ELSE 0 END) No_Week_days

Reply

This doesn’t work for different year example : 2017-11-20,2018-02-23 >> 69
select
DateDiff(COALESCE (‘2018-02-23’ , TO_DATE(‘9999-12-31′)) ,’2017-11-20’)

(WEEKOFYEAR(COALESCE(‘2018-02-23’)) – WEEKOFYEAR(COALESCE(‘2017-11-20’))) * 2
+
(CASE WHEN dayofweek(COALESCE(‘2018-02-23’)) = 1 THEN 1 ELSE 0 END
+
CASE WHEN dayofweek(COALESCE(‘2017-11-20’)) = 7 THEN 1 ELSE 0 END
)

Reply
Chris Meurer
March 18, 2019 3:29 pm

Believe the following will handle date ranges greater than 1 year, and takes care of an error I made when submitting the initial post (DateDiff parameters were reversed) :

SELECT (DATEDIFF(ENDDATE, STARTDATE) +1 )
– (2* ( ((YEAR(ENDDATE) – YEAR(STARTDATE)) * 52)
+ (WEEKOFYEAR(ENDDATE)-WEEKOFYEAR(STARTDATE)) ) )
– CASE WHEN
FROM_UNIXTIME(UNIX_TIMESTAMP(ENDDATE, ‘YY-MM-DD’), ‘EEE’) = ‘SUN’
THEN 1 ELSE 0 END
– CASE WHEN
FROM_UNIXTIME(UNIX_TIMESTAMP(STARTDATE, ‘YY-MM-DD’), ‘EEE’)=’SAT’
THEN 1 ELSE 0 END
FROM (…)

I’d love to hear additional feedback on this approach!

Reply
sunderarajan s
July 13, 2019 11:39 am

Super Chris…to extend this logic WITHOUT writing a UDFs in HIVE as am NOT that good in java or scala…

? .how to calculate business working days EXCLUDING weekends(saturdays & sundays) and holidays(list of holidays) for which most of the blogs suggest to keep a temp table

Thanks – Sunderarajan

Reply
sunderarajan s
July 13, 2019 11:49 am

a slight correction to my earlier requirement post –
By the way – i have a requirement wherein i need NOT have a range of dates — start date and end date…instead have to just GO with a GIVEN DATE and need to calculate business working days(working days FORWARD FROM GIVEN DATE) EXCLUDING weekends and list of holidays…then add the business logic whatever 12, 15, 30, days etc., using dateadd…

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *