Cloudera Challenge 2014


Yesterday, Cloudera released the score reports for their Data Science Challenge 2014 and I was really ecstatic when I received mine with a “PASS” score! This was a real challenge for me and I had to put a LOT of effort into it, but it paid off in the end!

Note: I won’t bother you in this blog post with the technical details of my submission. This is just an account of how I managed to accomplish it. If you want the technical details, you can look here.

Once upon a time… I was a DBA

I first learned about the challenge last year, when Cloudera ran it for the first time. I was intrigued, but after reading more about it I realised I didn’t have what would be required to complete the task successfully.

At the time I was already delving into the Hadoop world, even though I was still happily working as an Oracle DBA at Pythian. I had studied the basics and the not-so-basics of Hadoop and the associated fauna, and had just passed my first Hadoop certifications (CCDH and CCAH). However, there was (and is) still so much to learn! I knew that to take the challenge I would have to invest a lot more time into my studies.

“Data Science” was still a fuzzy buzzword for me. It still is, but at the time, I had no idea about what was behind it. I remember reading this blog post about how to become a data scientist. A quick look at the map in that post turned me off: apart from the “Fundamentals” track in it, I had barely any idea what the rest of the map was about! There was a lot of work to do to get there.

There’s no free lunch

But as I started reading more about Data Science, I started to realise how exciting it was and how interesting the problems it could help tackle were. By now I had already put my DBA career on hold and joined the Big Data team. I felt a huge gap between my expertise as a DBA and my skills as a Big Data engineer, so I put a lot of effort into studying the cool things I wanted to know more about.

The online courses at Coursera, edX, Stanford and the like were a huge help and soon I started wading through courses and more courses, sometimes many at once: Scala, R, Python, more Scala, data analysis, machine learning, and more machine learning, etc… That was not easy and it was a steep learning curve for me. The more I read and studied, the more I realised there was many times more to learn. And there still is…

The Medicare challenge

But when Cloudera announced the 2014 Challenge, early this year, I read the disclaimer and realised that this time I could understand it! Even though I had just scratched the surface of what Data Science is meant to encompass, I actually had the tools to attempt the challenge.

“Studies shall not stop!!!”, I soon found, as I had a lot more to learn to first pass the written exam (DS-200) and then tackle the problem proposed by the challenge: to detect fraudulent claims in the US Medicare system. It was a large undertaking but I took it one step at a time, and eventually managed to complete a coherent and comprehensive abstract to submit to Cloudera, which, as I gladly found yesterday, was good enough to give me a passing score and the “CCP: Data Scientist” certification from Cloudera!

I’m a (Big Data) Engineer

What’s next now? I have only one answer: Keep studying. There’s so much cool stuff to learn. From statistics (yes, statistics!) to machine learning, there’s still a lot I want to know about and that keeps driving me forward. I’m not turning into a Data Scientist, at least not for a while. I am an Engineer at heart; I like to fix and break things at work and Data Science is one more of those tools I want to have to make my job more interesting. But I want to know more about it and learn how to use it properly, at least to avoid my Data Scientist friends cringing every time I tell them I’m going to run an online logistic regression!


About the Author

DBA since 1998, having worked with Oracle from version 7.3.4 to the latest one. Working at Pythian since 2009.

9 Comments

Andrey Goryunov
July 22, 2014 10:44 pm

Congratulations Andre!

André Araújo
July 23, 2014 2:53 am

Thanks, Andrey


How did you prepare for DS-200?

André Araújo
July 24, 2014 4:18 am

Hi, Tim,

I mainly followed the exam study guide.

A few days before the test I also took the practice test, which helped a lot. In hindsight, I should’ve taken the practice test much earlier and used it to guide my studies as well.


Kamran Bakhshandeh
July 24, 2014 7:18 pm

Congratulations Andre, you are the rock man!
Also, that was a very interesting post.


Congratulations Andre

André Araújo
July 29, 2014 4:23 pm

Thanks, Patil!


Congrats Andrey..



