Examining Teradata To Google BigQuery Migration

Posted in: Big Data, Cloud, Google Cloud Platform, Technical Track

Cloud migration is hot nowadays. Enterprises are considering options to migrate on-premises data and applications to cloud (AWS/GCP/Azure) to get the benefits of quick deployments, pay-per-use models and flexibility.

Recently, I got a chance to work on data migration from Teradata to Google BigQuery where the task at hand was to convert Teradata BTEQ’s to BigQuery SQLs. SQL, on a higher level, stays the same with few differences.

In this post, I will cover a few generic SQLs (written in Teradata) with a conversion to Google BigQuery just to give a glimpse of syntax differences between two platforms as all the differences cannot be covered in one blog post.

 

Task Teradata SQL BigQuery SQL
 Select Rows with a
particular rank
SELECT
col1,
col2
FROM <DB>.<TBL>
QUALIFY ROW_NUMBER()   OVER ( PARTITION BY col1   ORDER BY <DATE> DESC) = 1
SELECT
*
EXCEPT(rank)
FROM
(
SELECT
col1,
col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER         BY <DATE> DESC) as rank
FROM <DB>.<TBL>
)
where rank=1
Concatenate multiple
data types
select ‘ String 1 ‘ || ‘ String 2 ‘;

–Multiple data types can be concatenated e.g. integer, varchar, decimal etc.

select ‘String 1 ‘|| 2

select concat(‘ String 1 ‘,’ String 2 ‘);

— with limitation, can only concat STRING or BYTES

OR cast data types first and then concatenate

select concat(‘String 1 ‘,cast(2 as STRING));

Truncate Table Delete from <DB>.<TBL>
OR
Delete from <DB>.<TBL> ALL
Delete from <DB>.<TBL> WHERE TRUE
Combine two result sets
with no duplicate row
SELECT * from <DB>.<TBL1>
union
SELECT * from <DB>.<TBL2>
SELECT * from <DB>.<TBL1>
union distinct
SELECT * from <DB>.<TBL2>
Timestamp Difference
to get Interval
select
(timestamp ‘2019-08-26 10:50:00.000000’ – timestamp ‘2019-08-25 07:40:00.000000’)
DAY(4) to SECONDOutput:1 03:10:00.000000where:1 = day(s)
03 = Hour(s)
10 = Minute(s)
00 = Second(s)
000000 = Milliseconds
WITH RESULT_IN_SECONDS AS
(
SELECT TIMESTAMP_DIFF(TIMESTAMP ‘2019-08-26 10:50:00.000000’,TIMESTAMP ‘2019-08-25 07:40:00.000000′,SECOND)
AS SECONDS
)
SELECT
CONCAT(CAST(DAYS AS STRING),’ ‘,HOURS,’:’,MINUTES,’:’,SECONDS,’.000000′)
FROM
(
SELECT
CAST(DAYS AS INT64) AS DAYS ,
FORMAT(“%02d”,CAST(HOURS AS INT64)) AS HOURS,
FORMAT(“%02d”,CAST(MINUTES AS INT64)) AS MINUTES,
FORMAT(“%02d”,CAST((REM_SEC_2 – (MINUTES * 60)) AS INT64)) AS SECONDS
FROM
(
SELECT DAYS,HOURS, FLOOR(REM_SEC_2/60) AS MINUTES, REM_SEC_2
FROM
(
SELECT DAYS,HOURS, (REM_SEC – (HOURS * 3600 ) ) AS REM_SEC_2
FROM
(
SELECT DAYS,FLOOR((SECONDS – (DAYS * 86400))/3600) AS HOURS,SECONDS – (DAYS * 86400) AS REM_SEC
FROM
(
SELECT
SECONDS,
FLOOR(SECONDS/86400) AS DAYS
FROM RESULT_IN_SECONDS
))
)
)
)As there is no interval data type in Google BigQuery so above manipulation can be performed to get same ‘Interval’
output as we get in Teradata
Search strings with text
‘_AB’ treating ‘_’ as data
not a wildcard character
select * from
(
select ‘GGG_ABC’ AS TXT
)A
where TXT LIKE ‘%\_AB%’ ESCAPE ‘\’
select * from
(
select ‘GGG_ABC’ AS TXT
)A
where TXT LIKE ‘%\\_AB%’
Date Difference/Subtraction select date ‘2019-08-26’ – date ‘2019-08-25’

Default behavior is to return difference in days

select DATE_DIFF(DATE ‘2019-08-26’, DATE ‘2019-08-25’,DAY)

In BigQuery we have to specify DAY as second parameter to get difference in Days

 

email

Interested in working with Sohail? Schedule a tech call.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *