One of the common tasks in data processing is to calculate the number of days between two given dates. You can easily achieve this by using Hive DATEDIFF function. You can also get weekday number by using this more obscure function:
SELECT FROM_UNIXTIME(your_date,'u') FROM some_table;
This will return 1 for Monday, 7 for Sunday and is based on Java SimpleDateFormat —
https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html
It becomes more challenging if you need to calculate the number of business days (excluding Saturdays and Sundays) between two dates. There is no built-in function in Hive to perform such calculation, but fortunately it is rather simple to write your own Hive UDFs. I couldn’t quickly find any existing open source functions to solve this problem, so I wrote my own using Scala — https://github.com/pythian/hive-udfs
There are actually three functions in hive-udfs package: CountBusinessDays, CountSaturdays and CountSundays. These functions accept start date and end date as UNIX_TIMESTAMP and return the count of different types of full days in this interval, excluding start and end points.
Here is how you can use this UDF in Hive:
ADD JAR hdfs:///user/hive/udfs/pythian-hive-udfs-assembly-0.1.jar;
CREATE TEMPORARY FUNCTION count_business_days AS 'com.pythian.udf.CountBusinessDays';
CREATE TEMPORARY FUNCTION count_saturdays AS 'com.pythian.udf.CountSaturdays';
CREATE TEMPORARY FUNCTION count_sundays AS 'com.pythian.udf.CountSundays';
SELECT count_business_days(UNIX_TIMESTAMP(start_date), UNIX_TIMESTAMP(end_date)) FROM some_table;
The code is open source and free to use. Comments and suggestions are always welcome.
8 Comments. Leave new
I try to apply this code and having this error
( StringIndexOutOfBoundsException String index out of range: 0)
Please help me out
Hey, I came across this page while trying to solve this problem and I wanted to share that we can achieve the same result by using HiveQL (replace StartDate and EndDate with your date fields)
SELECT DateDiff(StartDate, EndDate)
– (2 * (WEEKOFYEAR(StartDate) – WEEKOFYEAR(EndDate)) )
+ CASE WHEN from_unixtime(unix_timestamp(EndDate, ‘yy-MM-dd’), ‘EEE’) = ‘Sun’ THEN 1 ELSE 0 END
+ CASE WHEN from_unixtime(unix_timestamp(StartDate, ‘yy-MM-dd’), ‘EEE’) = ‘Sat’ THEN 1 ELSE 0 END
I hope some other poor slob finds this helpful – Cheers!
The ^^ doesn’t work. Try 2018-06-08 and 2018-06-12 for example
Nice work around, I did a slight change
select DateDiff( ‘2018-06-12’, ‘2018-06-08’) – (WEEKOFYEAR(‘2018-06-12’) – WEEKOFYEAR(‘2018-06-08’)) * 2
–
( CASE WHEN from_unixtime(unix_timestamp(‘2018-06-12’, ‘yy-MM-dd’), ‘EEE’) = ‘Sun’ THEN 1 ELSE 0 END
+ CASE WHEN from_unixtime(unix_timestamp(‘2018-06-08’, ‘yy-MM-dd’), ‘EEE’) = ‘Sat’ THEN 1 ELSE 0 END) No_Week_days
This doesn’t work for different year example : 2017-11-20,2018-02-23 >> 69
select
DateDiff(COALESCE (‘2018-02-23’ , TO_DATE(‘9999-12-31′)) ,’2017-11-20’)
–
(WEEKOFYEAR(COALESCE(‘2018-02-23’)) – WEEKOFYEAR(COALESCE(‘2017-11-20’))) * 2
+
(CASE WHEN dayofweek(COALESCE(‘2018-02-23’)) = 1 THEN 1 ELSE 0 END
+
CASE WHEN dayofweek(COALESCE(‘2017-11-20’)) = 7 THEN 1 ELSE 0 END
)
Believe the following will handle date ranges greater than 1 year, and takes care of an error I made when submitting the initial post (DateDiff parameters were reversed) :
SELECT (DATEDIFF(ENDDATE, STARTDATE) +1 )
– (2* ( ((YEAR(ENDDATE) – YEAR(STARTDATE)) * 52)
+ (WEEKOFYEAR(ENDDATE)-WEEKOFYEAR(STARTDATE)) ) )
– CASE WHEN
FROM_UNIXTIME(UNIX_TIMESTAMP(ENDDATE, ‘YY-MM-DD’), ‘EEE’) = ‘SUN’
THEN 1 ELSE 0 END
– CASE WHEN
FROM_UNIXTIME(UNIX_TIMESTAMP(STARTDATE, ‘YY-MM-DD’), ‘EEE’)=’SAT’
THEN 1 ELSE 0 END
FROM (…)
I’d love to hear additional feedback on this approach!
Super Chris…to extend this logic WITHOUT writing a UDFs in HIVE as am NOT that good in java or scala…
? .how to calculate business working days EXCLUDING weekends(saturdays & sundays) and holidays(list of holidays) for which most of the blogs suggest to keep a temp table
Thanks – Sunderarajan
a slight correction to my earlier requirement post –
By the way – i have a requirement wherein i need NOT have a range of dates — start date and end date…instead have to just GO with a GIVEN DATE and need to calculate business working days(working days FORWARD FROM GIVEN DATE) EXCLUDING weekends and list of holidays…then add the business logic whatever 12, 15, 30, days etc., using dateadd…