Cloud migration is hot nowadays. Enterprises are considering options to migrate on-premises data and applications to cloud (AWS/GCP/Azure) to get the benefits of quick deployments, pay-per-use models and flexibility.
Recently, I got a chance to work on data migration from Teradata to Google BigQuery where the task at hand was to convert Teradata BTEQ’s to BigQuery SQLs. SQL, on a higher level, stays the same with few differences.
In this post, I will cover a few generic SQLs (written in Teradata) with a conversion to Google BigQuery just to give a glimpse of syntax differences between two platforms as all the differences cannot be covered in one blog post.
Task | Teradata SQL | BigQuery SQL |
Select Rows with a particular rank |
SELECT col1, col2 FROM <DB>.<TBL> QUALIFY ROW_NUMBER() OVER ( PARTITION BY col1 ORDER BY <DATE> DESC) = 1 |
SELECT * EXCEPT(rank) FROM ( SELECT col1, col2, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY <DATE> DESC) as rank FROM <DB>.<TBL> ) where rank=1 |
Concatenate multiple data types |
select ‘ String 1 ‘ || ‘ String 2 ‘;
–Multiple data types can be concatenated e.g. integer, varchar, decimal etc. select ‘String 1 ‘|| 2 |
select concat(‘ String 1 ‘,’ String 2 ‘);
— with limitation, can only concat STRING or BYTES OR cast data types first and then concatenate select concat(‘String 1 ‘,cast(2 as STRING)); |
Truncate Table | Delete from <DB>.<TBL> OR Delete from <DB>.<TBL> ALL |
Delete from <DB>.<TBL> WHERE TRUE |
Combine two result sets with no duplicate row |
SELECT * from <DB>.<TBL1> union SELECT * from <DB>.<TBL2> |
SELECT * from <DB>.<TBL1> union distinct SELECT * from <DB>.<TBL2> |
Timestamp Difference to get Interval |
select (timestamp ‘2019-08-26 10:50:00.000000’ – timestamp ‘2019-08-25 07:40:00.000000’) DAY(4) to SECONDOutput:1 03:10:00.000000 where: 1 = day(s) |
WITH RESULT_IN_SECONDS AS ( SELECT TIMESTAMP_DIFF(TIMESTAMP ‘2019-08-26 10:50:00.000000’,TIMESTAMP ‘2019-08-25 07:40:00.000000′,SECOND) AS SECONDS ) SELECT CONCAT(CAST(DAYS AS STRING),’ ‘,HOURS,’:’,MINUTES,’:’,SECONDS,’.000000′) FROM ( SELECT CAST(DAYS AS INT64) AS DAYS , FORMAT(“%02d”,CAST(HOURS AS INT64)) AS HOURS, FORMAT(“%02d”,CAST(MINUTES AS INT64)) AS MINUTES, FORMAT(“%02d”,CAST((REM_SEC_2 – (MINUTES * 60)) AS INT64)) AS SECONDS FROM ( SELECT DAYS,HOURS, FLOOR(REM_SEC_2/60) AS MINUTES, REM_SEC_2 FROM ( SELECT DAYS,HOURS, (REM_SEC – (HOURS * 3600 ) ) AS REM_SEC_2 FROM ( SELECT DAYS,FLOOR((SECONDS – (DAYS * 86400))/3600) AS HOURS,SECONDS – (DAYS * 86400) AS REM_SEC FROM ( SELECT SECONDS, FLOOR(SECONDS/86400) AS DAYS FROM RESULT_IN_SECONDS )) ) ) )As there is no interval data type in Google BigQuery so above manipulation can be performed to get same ‘Interval’ output as we get in Teradata |
Search strings with text ‘_AB’ treating ‘_’ as data not a wildcard character |
select * from ( select ‘GGG_ABC’ AS TXT )A where TXT LIKE ‘%\_AB%’ ESCAPE ‘\’ |
select * from ( select ‘GGG_ABC’ AS TXT )A where TXT LIKE ‘%\\_AB%’ |
Date Difference/Subtraction | select date ‘2019-08-26’ – date ‘2019-08-25’
Default behavior is to return difference in days |
select DATE_DIFF(DATE ‘2019-08-26’, DATE ‘2019-08-25’,DAY)
In BigQuery we have to specify DAY as second parameter to get difference in Days |
2 Comments. Leave new
Really Helpful!!
Excited to see more posts on TD to GCP migration.
Thanks Sohail for such a nice explaination.
Really helpful !!!