It’s a fact that technology is always evolving, and rapidly. What’s new and hot today may be old news and on its way to becoming obsolete tomorrow. Traditional data warehousing is no exception. We have been seeing that the old school data…
As the Director of Big Data and Data Science at Pythian, I often get questions from clients about the many solutions available to them to address their big data needs. Between Hadoop, cloud-based, and hybrid solutions, finding the best option…
Data comes in different shapes. One of these shapes is called a time series. A time series is a sequence of data points recorded over time. If, for example, you measure the height of the tide every hour for…
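Here is a minimal sketch of the idea, using the tide example. Each reading is paired with the time it was taken, and the readings are kept in chronological order. The heights and the start time are hypothetical values chosen for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical hourly tide heights in metres (illustrative values, not real data).
readings = [1.2, 1.8, 2.3, 2.1, 1.5, 0.9]

# Assumed time of the first measurement.
start = datetime(2019, 1, 1, 0, 0)

# A time series: each data point is paired with its timestamp,
# one point per hour, in chronological order.
series = [(start + timedelta(hours=i), height) for i, height in enumerate(readings)]

for ts, height in series:
    print(ts.isoformat(), height)
```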
One of the common tasks in data processing is calculating the number of days between two given dates. You can easily achieve this by using the Hive DATEDIFF function. You can also get the weekday number by using this more obscure…
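The following sketch shows, in plain Python, what Hive's `DATEDIFF(enddate, startdate)` computes: the number of days from `startdate` to `enddate`. The helper name `datediff` is hypothetical, and this is not Hive code. The sketch handles only `yyyy-MM-dd` strings. Hive also accepts timestamp strings, which are not covered here.

```python
from datetime import date

def datediff(enddate: str, startdate: str) -> int:
    """Hypothetical helper: days from startdate to enddate.

    Argument order mirrors Hive's DATEDIFF(enddate, startdate), so the result
    is negative when enddate is earlier than startdate.
    """
    end = date.fromisoformat(enddate)
    start = date.fromisoformat(startdate)
    return (end - start).days

print(datediff("2009-03-01", "2009-02-27"))  # 2
```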
Building a secure Hadoop cluster requires protecting a number of services that make up the Hadoop infrastructure. If you are using the CDH distribution, then Cloudera Manager (CM) is one of the components that needs to be secured. There is a good step-by-step guide in the CM documentation, and it’s easy to follow for one server, but what about when you have hundreds of them? There are different approaches to the problem of managing server configuration at scale, but I’d like to focus on Ansible, a neat framework for parallel command execution and complex rollouts.
The HDFS authentication model changed in recent releases, but the documentation is stale, which can lead people to think that HDFS uses a very primitive authentication mechanism.
I was presented with test results showing that an IN query was about 100 times faster than the equivalent OR query. Where the OR query took minutes to run, the IN query took seconds! Okay, I said to myself, it is time to start digging. Here are my findings.
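For reference, here are the two query shapes being compared: the same predicate written with a list of `OR` conditions and with `IN`. This sketch uses Python's built-in `sqlite3` module with a hypothetical table `t`, not MySQL. It only demonstrates that the two forms return the same rows. It does not reproduce the timings above, because how fast each form runs depends on how the database's optimizer plans it.

```python
import sqlite3

# In-memory SQLite database with a hypothetical table of 100 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO t (id, val) VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 101)],
)

ids = [3, 17, 42, 99]

# The same filter written two ways, both using bound parameters.
or_sql = "SELECT id FROM t WHERE " + " OR ".join("id = ?" for _ in ids) + " ORDER BY id"
in_sql = "SELECT id FROM t WHERE id IN (" + ", ".join("?" for _ in ids) + ") ORDER BY id"

or_rows = conn.execute(or_sql, ids).fetchall()
in_rows = conn.execute(in_sql, ids).fetchall()

print(or_rows == in_rows)  # True: logically equivalent result sets
```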
I spent last week at Collaborate 2012 in Las Vegas, and it was a really great experience in many ways. I am a MySQL DBA and have been working with MySQL for most of my career, so Collaborate didn’t seem like an obvious choice. It turned out that I had so much to learn from Oracle professionals and the Oracle community that I could apply in the MySQL world. For me, a sign of a good conference is that you come back inspired and full of ideas.
The other day, while debugging activity spikes on a customer’s production system, I had to refresh my knowledge of how the InnoDB thread queue works. I had a general idea about the InnoDB kernel and queue, thread concurrency, and queue join delays, but I didn’t have a complete model of how InnoDB concurrency control works. So I started with the manual…
MySQL replication is a powerful tool, and it’s hard to find a production system that doesn’t use it. On the other hand, debugging replication issues can be very hard and time-consuming, especially if your replication setup is not straightforward and you are using some kind of filtering. Let’s look at an issue I had.
© Copyright 2019 Pythian Group Inc. ® ALL RIGHTS RESERVED.
PYTHIAN®, LOVE YOUR DATA®, and ADMINISCOPE® are trademarks and registered trademarks owned by Pythian in North America and certain other countries, and are valuable assets of our company. Other brands, product and company names on this website may be trademarks or registered trademarks of Pythian or of third parties. Use of trademarks without permission is strictly prohibited.