Making Oozie Logs A Little Easier On The Eyes

Posted in: Big Data, Technical Track

Today we’re having a quick one.

Earlier during the day, I had to peruse an Oozie log for the first time. And it looked like:


2014-02-11 20:13:14,211  INFO ActionStartXCommand:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]:start:] Start action [[email protected]:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2014-02-11 20:13:14,212  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]:start:] [***[email protected]:start:***]Action status=DONE
2014-02-11 20:13:14,212  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]:start:] [***[email protected]:start:***]Action updated in DB!
2014-02-11 20:13:14,271  INFO ActionStartXCommand:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] Start action [[email protected]] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2014-02-11 20:13:15,079  WARN HiveActionExecutor:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] credentials is null for the action
2014-02-11 20:13:18,306  INFO HiveActionExecutor:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403[email protected]] checking action, external ID [job_201401070500_217582] status [RUNNING]
2014-02-11 20:13:18,408  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] [***[email protected]***]Action status=RUNNING
2014-02-11 20:13:18,409  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] [***[email protected]***]Action updated in DB!
2014-02-11 20:13:34,367  INFO CallbackServlet:539 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] callback for action [[email protected]]
2014-02-11 20:13:34,424  INFO HiveActionExecutor:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] action completed, external ID [job_201401070500_217582]
2014-02-11 20:13:34,443  INFO HiveActionExecutor:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] action produced output
2014-02-11 20:13:34,653  INFO ActionStartXCommand:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] Start action [[email protected]] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2014-02-11 20:13:35,418  WARN HiveActionExecutor:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] credentials is null for the action
2014-02-11 20:13:38,628  INFO HiveActionExecutor:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] checking action, external ID [job_201401070500_217583] status [RUNNING]
2014-02-11 20:13:38,731  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] [***[email protected]***]Action status=RUNNING
2014-02-11 20:13:38,731  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] [***[email protected]***]Action updated in DB!
2014-02-11 20:13:57,659  INFO CallbackServlet:539 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] callback for action [[email protected]]
2014-02-11 20:13:57,712  INFO HiveActionExecutor:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] action completed, external ID [job_201401070500_217583]
2014-02-11 20:13:57,729  WARN HiveActionExecutor:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10044]
2014-02-11 20:13:57,895  INFO ActionEndXCommand:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] ERROR is considered as FAILED for SLA
2014-02-11 20:13:57,964  INFO ActionStartXCommand:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111[email protected]] Start action [[email protected]] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2014-02-11 20:13:57,965  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] [***[email protected]***]Action status=DONE
2014-02-11 20:13:57,965  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[[email protected]] [***[email protected]***]Action updated in DB!
2014-02-11 20:13:58,036  WARN CoordActionUpdateXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[-] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
Finished: FAILURE

wut?

Okay, Java peeps have a predilection for verbose logs, but surely there is a way to make the whole thing a little more readable for poor, poor human eyes… So I quickly hacked the following:


use 5.10.0;

use strict;

my @lines = map { parse_line($_) } <>;

# the job id is big, and doesn't give us much, remove
for my $useless ( map { $_->{JOB} } @lines ) {
    for ( @lines ) {
        $_->{msg} =~ s/\Q$useless//g;
        $_->{ACTION} =~ s/^\Q$useless//;
    }
}

my %previous;
for my $l ( @lines ) {
    # we'll only print metadata that changed
    my @changes = grep { $l->{$_} ne $previous{$_} } 
                       qw/ USER GROUP TOKEN APP JOB ACTION /;

    say join ' ', map { $_ . "[" . $l->{$_} . "] " } @changes if @changes;

    say "\t", $l->{time}, " ", $l->{msg};
    %previous = %$l;
}

sub parse_line {
    my $line = shift;

    # try to parse the line as a typical log line
    my( $time, $info ) = $line =~ 
                         /^\d{4}-\d{2}-\d{2}\s*  # the date. Don't care
                           (\d+:\d\d:\d\d)       # the time, More interesting
                           ,\d+\s*.*?-           # log level and stuff. Meh
                           (.*)                  # the message itself
                         /x
        or return ();

    my %data = ( time => $time );

    # capture some repeated metadata
    for my $k ( qw/ USER GROUP TOKEN APP JOB ACTION / ) {
        $data{$k} = $1 if $info =~ s/$k\[(.*?)\]\s*//;
    }

    # useless and long, scrap it
    $info =~ s/\[\*{3}.*?\*{3}\]//;

    $data{msg} = $info;

    return \%data;
}

And there we go,  a log trace that is a mite easier on the eyes…


$ perl filter oozie_mess.log
USER[running_user]  GROUP[-]  APP[some-big-job-workflow]  JOB[0004636-140111040403753-oozie-W]  ACTION[@:start:] 
    20:13:14  Start action [@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
    20:13:14  Action status=DONE
    20:13:14  Action updated in DB!
ACTION[@a-first-action] 
    20:13:14  Start action [@a-first-action] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
    20:13:15  credentials is null for the action
    20:13:18  checking action, external ID [job_201401070500_217582] status [RUNNING]
    20:13:18  Action status=RUNNING
    20:13:18  Action updated in DB!
USER[-]  TOKEN[-]  APP[-] 
    20:13:34  callback for action [@a-first-action]
USER[running_user]  TOKEN[]  APP[some-big-job-workflow] 
    20:13:34  action completed, external ID [job_201401070500_217582]
    20:13:34  action produced output
ACTION[@some-action] 
    20:13:34  Start action [@some-action] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
    20:13:35  credentials is null for the action
    20:13:38  checking action, external ID [job_201401070500_217583] status [RUNNING]
    20:13:38  Action status=RUNNING
    20:13:38  Action updated in DB!
USER[-]  TOKEN[-]  APP[-] 
    20:13:57  callback for action [@some-action]
USER[running_user]  TOKEN[]  APP[some-big-job-workflow] 
    20:13:57  action completed, external ID [job_201401070500_217583]
    20:13:57  Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10044]
    20:13:57  ERROR is considered as FAILED for SLA
ACTION[@fail] 
    20:13:57  Start action [@fail] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
    20:13:57  Action status=DONE
    20:13:57  Action updated in DB!
ACTION[-] 
    20:13:58  E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
email
Want to talk with an expert? Schedule a call with our team to get the conversation started.

4 Comments. Leave new

None of the “wtf” is Java’s fault. This is how Oozie is logging.
Please open a Jira ticket with how you’d like things to change so we can fix it! A patch would be even nicer :)

Reply
Yanick Champoux
February 13, 2014 10:11 am

[Not Java’s fault] Maybe not technically, but I think it’s hard to argue the fact that the Java sphere and its logging underworld are inclined to, ahem, let’s call it prolixity. ;-)

Also, I have to be fair: I just looked at that specific log. I’m pretty sure there are options and knobs one can adjust to make the logs lean more toward human consumption. … right?

[patches are welcome] … you realize how dangerous it is to say something like this to a guy like me, right? 8-D

Reply
Andrew Tindle
June 24, 2014 4:43 pm

Looks great – but when I ran it against your example oozie.log, it output nothing.

Did a bit of debugging and found that this snippet:

my( $time, $info ) = /^d{4}-d{2}-d{2}s* # the date. Don’t care
(d+:dd:dd) # the time, More interesting
,d+s*.*?- # log level and stuff. Meh
(.*) # the message itself
/x
or return ();

was always doing the “return ();”

Any advice ?

Thanks
Andrew

Reply
Yanick Champoux
June 25, 2014 7:59 am

Gah! I probably went overboard when I cleaned the code for the blog entry. That line you spotted is indeed the culprit, and it’s missing only one tiny thing: the variable on which we do the regex.:

my( $time, $info ) = $line =~ /^d{4}-d{2}-d{2}s*
(d+:dd:dd)
,d+s*.*?-
(.*)
/x
or return ();

That should already work muuuch better.

I’m currently at YAPC (the Perl conference), so my keyboard time is limited, but I’ll try to fix the example in the blog entry as soon as I’m back.

And mucho thanks for the bug report! :-)

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *