Oracle GoldenGate Extract Internals, Part I

Posted in: Technical Track

Since GoldenGate has been declared as a strategic direction for replication technology by Oracle, it sounds like it’s time to get up to speed with various aspects of how this technology works and performs.

As many of you are probably aware, up until recently, GoldenGate had been a third-party product. Technology-wise this presents an interesting challenge for the GoldenGate development team as they have to rely on whatever Oracle makes available to the outside world. Let’s see what kind of techniques they were able to utilize in order to achieve their goals.

I did a simple replication setup between two different databases with the Extract, DataPump and Replicat processes. I’m planning to take a look at all of these but today is the Extract’s process turn.

Oracle GoldenGate Extract Process
The main duty of the Extract process is to read and process Oracle redo logs in order to extract relevant changes and write these to a trail.

Reading from the Redo Logs
This is probably the most interesting aspect of the Extract process as this is where various technology stacks are being bridged together. Unless you configure Extract process to read strictly from the archived logs, it will try to read from online redo logs whenever possible. Let’s take a look at an strace of a running Extract process:

lseek(18, 2659328, SEEK_SET)            = 2659328
nanosleep({1, 0}, NULL)                 = 0
...
read(18, "\1\"\0\0J\24\0\0U\0\0\0H\200\301\302\0\0\0\0\0\0\0\0\0\6\0\0\n\0\20\0"..., 1024000) = 1024000
...
lseek(18, 2659328, SEEK_SET)            = 2659328
nanosleep({1, 0}, NULL)                 = 0
...
read(18, "\1\"\0\0J\24\0\0U\0\0\0H\200\301\302\0\0\0\0\0\0\0\0\0\6\0\0\n\0\20\0"..., 1024000) = 1024000
...
lseek(18, 2659328, SEEK_SET)            = 2659328
nanosleep({1, 0}, NULL)

I’ve omitted a bunch of syscalls as I’ll get to these a bit later plus it makes it much easier to see and understand the loop. Judging by the nanosleep syscall you can figure out that this cycle repeats every second. I’ve left file descriptor 18 in place since it’s what pointing out to our current redo log file:

[[email protected] fd]$ ls -l 18
lr-x------ 1 oracle oinstall 64 Jan 19 16:18 18 -> /u02/oradata/GG1/onlinelog/o1_mf_1_5oco39l6_.log

Let’s take a closer look at what happens inside that loop. The loop begins with a lseek syscall which sets offset for the specified file descriptor. After a one second delay the Extract process performs a 1000K read starting at the offset sat by the lseek. You can see multiple lseek calls to set the offset to exactly the same position, one for each time the loop gets executed. This is necessary since the read syscall advances the current position in a file, hence we need an lseek syscall to get us back to where we’ve started.

As you’ve probably already guessed, this is nothing but a constant polling of the redo log file contents to see whether the new data has arrived. You can confirm that 2659328 is the current position in the online redo log by using a query against x$kcccp:

SQL> select cpodr_bno, cpodr_bno*512, (cpodr_bno-1)*512
  2  	from x$kcccp where indx=0;
 CPODR_BNO CPODR_BNO*512 (CPODR_BNO-1)*512
---------- ------------- -----------------
      5195       2659840           2659328

The Extract process starts reading from (and including) the last written block. When the new data arrives, the Extract process advances the offset appropriately:

lseek(18, 2729984, SEEK_SET)            = 2729984
nanosleep({1, 0}, NULL)                 = 0
...
read(18, "\1\"\0\0\324\24\0\0U\0\0\0\20\200b\227\204\0\0\0\5h\0\0\200\321\10\0\1\0w5"..., 1024000) = 1024000
...
lseek(18, 2731008, SEEK_SET)            = 2731008
nanosleep({1, 0}, NULL)                 = 0

The above polling continues until the Extract process detects that there is a supported operation in the redo stream which potentially requires capturing. Remember that redo records contain object ids, not object names. When the object identifier is encountered for the first time, the Extract process will issue a series of statements against the data dictionary to find out what it is. These queries are relatively lightweight, however, if you have a large number of objects inside your database then your data dictionary may became bombarded with these queries upon the Extract process start. The corresponding object name and type will be determined. In case of an index it will be checked whether it’s an IOT in order to resolve the table name properly.

Challenges
There are a number of interesting scenarios which may occur during online redo log reading. To begin with, you’re dealing with a file to which lgwr is actively writing. Processes inside an Oracle database use a vast amount of coordination to make sure they do not step on each other’s toes, however, when you’re a third-party process operating outside of the database then what’s left available to you is to constantly observe the situation and react accordingly while making sure you won’t get in anybody’s way. For example, it would be nice to ask lgwr to notify us when the new data had been written to the online redo log file, but we can’t. So we have to resort to a constant polling to see if something new has appeared. This is one of the areas to keep an eye on as GoldenGate integration with Oracle progresses over time.

Oracle redo logs are being written sequentially in the circular fashion which makes the task of reading these a bit less challenging, however, you may still find yourself in various nasty situations when you think about different possibilities. For example, while you’re reading the next 1000K, the lgwr might be writing to the same area so some (or all) of the data which you read back might be garbage. My guess would be that the Extract process just keeps reading forward until it encounters data that is no longer valid (due to invalid redo record or RBA not from the expected sequence) and then resumes from the last valid record encountered.

Here is something more interesting to think about. What if, while you’re reading the redo log file, lgwr manages to go the complete round (i.e. switch through all redo groups and get to the one you’re currently reading from) and overwrite the data you’re currently working with? This race condition is relatively unlikely to happen in a real life but still a possibility so I became curious to test this out.

I’ve blocked the Extract process, switched the redo logs around and made sure that the current online redo log gets overwritten with the new data past the current extract point. Here is an excerpt of the relevant data form the strace output for the Extract process:

lseek(18, 3027456, SEEK_SET)            = 3027456
nanosleep({1, 0}, NULL)                 = 0
fsync(19)                               = 0
lseek(15, 0, SEEK_SET)                  = 0
write(15, "H\0\0\10R\0\n\0\2\0\4\0A\0\1\0\1\0C\0\4\0\3\0\1\0S\0\30\0Sf"..., 2048) = 2048
fsync(15)                               = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
select(8, [7], NULL, NULL, {0, 0})      = 0 (Timeout)
read(18, "\1\"\0\0\31\27\0\0{\0\0\0\340\201\205N****************"..., 1024000) = 1024000
write(16, "\0$\0\0\6\0\0\0\0\0\3N3\7\0\0\0\2\0\0\0`\0\0\0\0\0\0\0\7\0011"..., 36) = 36
read(17, "\0\254\0\0\6\0\0\0\0\0\4\1\0\0\0002\24\1\0\0\0\0{\5\0\0\0\0\7\0\0\0"..., 8208) = 172
write(4, "2010-01-20 12:49:02.084  Redo th"..., 146) = 146
write(16, "\0.\0\0\6\0\0\0\0\0\3N4\16\0\0\0\1\0\0\0`\0\0\0\0\0\0\0\7\0031"..., 46) = 46
read(17, "\1\2\0\0\6\0\0\0\0\0\6\1\"\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 258
stat("/u02/fra/GG1/archivelog/2010_01_20/o1_mf_1_120_5ogjbvbj_.arc", {st_mode=S_IFREG|0640, st_size=3034112, ...}) = 0
open("/u02/fra/GG1/archivelog/2010_01_20/o1_mf_1_120_5ogjbvbj_.arc", O_RDONLY|O_DIRECT) = 20

Let’s go through what happened and how this situation has been handled. The loop starts with positioning the file descriptor at where we left off which is 3027456. After a one second delay, a 1000K redo log read follows. Upon startup, the Extract process establishes two connections with a database using whatever userid credentials you specified in the parameter file. File descriptors 16 and 17 are pointing out to one of these sessions (a dedicated Oracle processes in my case) which the Extract can use to query various information about the database. The query on line 12 checks whether the sequence and thread from which we’re currently reading got archived:

SELECT DECODE(archived, 'YES', 1, 0),       status  FROM v$log WHERE thread# = :ora_thread AND       sequence# = :ora_seq_no

In our case, that statement will return zero rows since this particular sequence (we’ve been reading seq# 120 when the wrap has occurred) is no longer there. My guess would be that that is one of the clues the Extract process might be using to deal with the situation. Another clue is by looking at what was read from the redo log file itself. Remember that the Extract process read includes the last written redo block which can potentially be compared with the same block from a previous read in order to detect if the wrapping has occurred. After realizing that the log has wrapped, the Extract process will write a warning into it’s report file (line 14):

2010-01-20 12:49:02.084  Redo thread 1: Online log /u02/oradata/GG1/onlinelog/o1_mf_2_5og4zot6_.log on sequence# 120 has missing trailing blocks.

The next step is to do a fall back and figure out where to pick up. This task is being accomplished by another query sent on line 15:

SELECT  name    FROM gv$archived_log   WHERE sequence# = :ora_seq_no AND         thread# = :ora_thread AND         resetlogs_id = :ora_resetlog_id AND         archived = 'YES' AND         deleted = 'NO' AND         name not like '+%'         AND standby_dest = 'NO'

The bind values are:

BINDS #14:
 Bind#0
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=21 fl2=1000000 frm=01 csi=178 siz=96 off=0
  kxsbbbfp=2ba8c340f618  bln=32  avl=03  flg=05
  value="120"
 Bind#1
  oacdty=01 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=21 fl2=1000000 frm=01 csi=178 siz=0 off=32
  kxsbbbfp=2ba8c340f638  bln=32  avl=01  flg=01
  value="1"
 Bind#2
  oacdty=01 mxl=32(09) mxlc=00 mal=00 scl=00 pre=00
  oacflg=21 fl2=1000000 frm=01 csi=178 siz=0 off=64
  kxsbbbfp=2ba8c340f658  bln=32  avl=09  flg=01
  value="708599728"

In other words, we’re looking into gv$archived_log to find out where the required sequence went to (note how ASM files will be bypassed by the means of name not like ‘+%’ condition). Once the name of the archived log file has been returned, the Extract process will proceed with opening and reading from it. After that, the Extract process will check whether the next sequence is available to read from the online log file:

SELECT lf.member,        DECODE(l.status, 'CURRENT', 1, 0),        DECODE(l.archived, 'YES', 1, 0)   FROM v$logfile lf, v$log l  WHERE lf.group# = l.group# AND        l.thread# = :ora_thread AND        l.sequence# = :ora_seq_no AND        lf.member not like '+%' AND        rownum = 1 AND        (lf.status NOT IN        ('INVALID','INCOMPLETE') OR        lf.status is null)

If the sequence is available, the online redo log file will be opened, otherwise the query against v$archived_log will be issued to find out the required archived log file to open. As I’ve already mentioned, the Extract process prefers to read from the online log file whether possible.

It was indeed a pleasure to see that that situation has been handled correctly. The warning emitted into a report file was nice to see as well (now you know one of the situations when this warning may appear). The GoldenGate development team has earned some credit, at least on this particular situation.

Stay tuned for the next part.

email
Want to talk with an expert? Schedule a call with our team to get the conversation started.

22 Comments. Leave new

Excellent Post

Reply

Nice info, thanks. I’m just curious, if anyone knows more:

Does GoldenGate really queries the data dictionary on the fly? How can it than handle dropped/renamed objects and columns, when it queries the dictionary at different SCN than the SCN of the data being mined?

Does GoldenGate support ASM storage? Does it use HTTP access or does it directly read the data on the disk? (And does it then handle properly when the extents move due to rebalance? As with the online redo problem, it cannot get the metadata change announced by ASM directly using Oracle internal communication.)

Reply
Alex Fatkulin
January 22, 2010 9:29 am

Hi Vit,

the first question is a good one. Oracle Streams, for example, is using MVDD (Multi-Version Data Dictionary) to avoid issues when the current state of dictionary does not reflect the point in time Streams are currently working with. As part of DDL support GoldenGate will create a number of tables and these tables will track the metadata history among some other things. Without DDL support configured, GoldenGate will always query online data dictionary and, as you probably already know, that doesn’t play well with object changes. I’ll explain interactions with a database a bit more in a second part since this is where the Extract process can be a bit “chatty” plus there are some implications worth mentioning.

Archive logs on ASM are not supported (at least if you judge by their docs and the way they excluding ASM file names in their queries seems to back this up). I don’t know what mechanics are being used to read online redo from ASM yet. They require access to ASM instance so I won’t be surprised if they (a) query the extents map themselves and/or (b) use ASM internal packages to read from the redo.

Alas, the GoldenGate documentation is somewhat poor when it comes to explaining how the technology work under the hood (and this knowledge is crucial if you want to get beyond the “black box” situation).

Reply
Shahul Mohamed
January 22, 2010 3:00 pm

GooD Posting on the internals. W.r.t to ASM, it queries the instance ASM instance for the redo info as a sysdba user.

Reply
Alex Fatkulin
January 22, 2010 5:37 pm

Shahul,

they use dbms_diskgroup.read in the ASM instance to read from the ASM disk group.

Reply
Oracle GoldenGate Extract Internals, Part II | The Pythian Blog
January 26, 2010 4:01 pm

[…] of the follow up questions to part I was about the way the Extract process reads from ASM storage. I’ve provided the answer, […]

Reply
Log Buffer #176: a Carnival of the Vanities for DBAs | The Pythian Blog
January 29, 2010 2:11 pm

[…] Pythian’s Alex Fatkulin discusses Oracle GoldenGate Extract Internals. […]

Reply
Oracle GoldenGate Extract Internals, Part III | The Pythian Blog
February 3, 2010 5:27 pm

[…] is the third post in Oracle GoldenGate Extract Internals series (links to part I and part […]

Reply

Does GoldenGate support turning off data replication at a session level such as you can do in Oracle Advanced Replication using
DBMS_REPUTIL.REPLICATION_OFF/ON?

Reply

Sam,

there are some options available for doing this. You can exclude updates executed by certain users (which is what they do to exclude cycling changed in case of bi-directional replication). Alternatively there is a legacy feature with using a trace table. Replicat can bypass all changes in the transaction which started by inserting into a trace table.

Of course, this is a bit less elegant than using tags in Streams.

Reply
Blogroll Report 15/01/2009 – 22/01/2010 « Coskan’s Approach to Oracle
February 12, 2010 2:01 pm

[…] 17-Internals of goldengate extract process Alext Fatkulin-Oracle GoldenGate Extract Internals, Part I […]

Reply

Hi Alex,

I have a scenario where i need to migrate my 9i databases from solaris SPARC to 11g on solaris X86 without any downtime. Apart from endian compatibility, i also need to change the character set from western to UTF, take care of columns with XML data and avoid any downtime (which means a near real time migration). Please advise, what should be a better approach – streams or GG??? I am really confused and in process of evaluating options. Available GG documentation is not sufficient :(

I think both GG and streams work by reading the redo logs for the source db changes

Thanks
Rohit

Reply

Hi Alex,
I have a question. Nice Article, For loading balancing, I want to write Extract to multiple machines. The changes are from a single table, e.g; subscriber table of more than 25 million. As changes happen, I want to write the trails across to 4 machines (preferably round robin) or some neat way of load balancing – to ensure the changes are written evenly (or near equal). As the target side is going to be a java process, we dont want 2 or 3 of them doing all the work.

we would use a javaue to process further on the target side.

Any suggestions?

Reply
Irfan Rasheed
April 1, 2011 12:51 am

I am getting error while inserting null in a column at golden gate

2011-03-31 18:30:45 ERROR OGG-01148 programming error, data type not supported for column TXID in table advoss.tblaudittrail.

I am replicating data from MySQL 5.5.9 to Oracle 11g rel 2

can any body tell me where I am going wrong?

Reply

At Ifran… GG does not support MySQL 5.5.9 yet.

Also the reason that GG documentation was ‘poor’ was because it was a 3rd party product it wanted to keep information from competitors.

Reply
Irfan Rasheed
April 7, 2011 5:53 am

OGG 11 is now supports MySQL 5.5 for both capture and delivery.
I am using the latest version of OGG

Reply
subrata saha
June 8, 2011 3:00 pm

How would you compare GG & streams for this scenario.
SOURCE:
DELETE from child-table where parent_id=p1.

When a single DELETE statement at source is actually deleting 1000 records, STREAMS converts this to 1000 DELETE statements at TARGET, each DELETE statement at TARGET deleting a single record (LCR). This makes APPLY extremely slow for these cases.
How does GG handle this?
Does it try to execute the same DML as in the source, or it works like STREAMS?

Thanks

Reply

Can Oracle Golden Gate be installed on a linux server different from where the Oracle database is running?

Reply
Kaustubh Dixit
March 14, 2013 2:06 am

Hi Alex ,
i am undergoing a training in golden gate. The instructor is telling the product is not that stable. Is that true . I doubt that he is telling false as he is not able to debug the issues that we are facing.

Reards,
Kaustubh Dixit

Reply
Prasanna Desai
August 26, 2014 2:31 pm

Could you please explain on how append mode and the overwrite mode works while while writing to trail files ?

Reply

thanks for posting this info, really gets you thinking how complicated CDC really is

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *