In the course of a database administrator’s job on a large cluster, there are dozens of times a week when it would be useful to visualise some data. You’re constantly working with machines that have hundreds of databases, directories, and files, and log files that often hold millions of entries each.
Wouldn’t it be nice if you could visualise these situations?
You have dozens of directories full of files. What’s the relative size of each?
You have a logfile with 1.5M entries. You grep for “ERROR” and get 50,000 timestamped entries. What times of day have the most errors?
You have a process list with 30,000 entries, made up of 7 distinct commands. What is the relative frequency of each command?
You have a database with 10,000 tables, each having several million rows. Which are the 30 biggest, and what is their relative size?
You have slow query logs spanning a couple of weeks. You wish to know the days and times-of-day that had the most slow queries.
Typically, you can use awk, sort, grep -c, and uniq -c to produce output made of “keys” and “values”, and then eyeball it to get a general sense of things. In fact, if the values are printed in decimal, the number of digits in each one acts as a crude base-10 logarithmic graph of their relative sizes. But sometimes that isn’t good enough. For example, a 7x difference such as 120 versus 840 will not be obvious from eyeballing a column of numbers, because both values have three digits.
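To make the digit-count point concrete, here is a minimal Python sketch. `tally` roughly mimics a `sort | uniq -c | sort -rn` pipeline using `collections.Counter`, and `digits` counts decimal digits. Both are hypothetical helper names for illustration only. They are not part of any tool mentioned in this post.

```python
import math
from collections import Counter


# Hypothetical helper: count each key, largest count first
# (roughly what `sort | uniq -c | sort -rn` gives you).
def tally(keys):
    return Counter(keys).most_common()


# Hypothetical helper: number of decimal digits in a positive integer.
def digits(n):
    return len(str(n))


# For positive n, the digit count equals floor(log10(n)) + 1: a coarse
# base-10 log scale that only changes when a value crosses a power of ten.
for n in (7, 120, 840, 5000):
    assert digits(n) == math.floor(math.log10(n)) + 1

# A 7x difference can be invisible: 120 and 840 both have three digits.
print(digits(120), digits(840))  # 3 3

# Illustrative keys only.
print(tally(["ERROR", "WARN", "ERROR", "INFO", "ERROR", "WARN"]))
# [('ERROR', 3), ('WARN', 2), ('INFO', 1)]
```

The log scale is useful for spotting orders of magnitude, but it is blind to anything smaller than a factor of ten.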
You could copy the output, paste it into a CSV file, load it into LibreOffice Calc, and generate graphs, if you’re willing to spend another couple of minutes on every command. Wouldn’t it be nicer to do it right there in the terminal, where you already are?
If you had searched the web for this sort of thing until a couple of weeks ago, the best you would have found was a page referencing an amazing awk script that takes a list of numeric keys and prints an ASCII histogram. However, it couldn’t handle non-numeric keys, and it could not help with several of the use cases above.
Enter “distribution”, written by Palomino. It takes a large input, munges it, tokenises it, tallies it, and gives you beautiful in-terminal graphs with ANSI colour coding, Unicode partial-width characters, and more. So head to the linked project page, download it, and play with it. You’ll find plenty of examples there, and with a little imagination you’ll begin to get a different view of the log files you’re constantly working with.
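The general idea of tallying keys and drawing bars with partial-width characters can be sketched in a few lines of Python. This is not distribution’s code or interface. The functions `bar` and `histogram`, their `width`, `top`, and `colour` parameters, and the sample data are all hypothetical and exist only for illustration. The Unicode "left eighth" block characters (U+2589 to U+258F) let a bar end partway through a character cell, which gives eight times the resolution of whole characters.

```python
from collections import Counter

# Left-aligned partial blocks, indexed by eighths of a cell (0 = nothing).
PARTIALS = ["", "\u258f", "\u258e", "\u258d", "\u258c", "\u258b", "\u258a", "\u2589"]
FULL = "\u2588"


# Hypothetical helper: a bar `width` cells long at max_value,
# drawn with 1/8-cell resolution.
def bar(value, max_value, width):
    eighths = round(value / max_value * width * 8)
    full, rem = divmod(eighths, 8)
    return FULL * full + PARTIALS[rem]


# Hypothetical helper: tally keys and render one row per distinct key,
# largest count first. `top` limits the number of rows (None = all).
def histogram(keys, width=40, top=None, colour=False):
    counts = Counter(keys).most_common(top)
    if not counts:
        return []
    max_count = counts[0][1]
    key_width = max(len(k) for k, _ in counts)
    count_width = len(str(max_count))
    rows = []
    for key, count in counts:
        b = bar(count, max_count, width)
        if colour:
            b = "\033[32m" + b + "\033[0m"  # ANSI green, then reset
        rows.append(f"{key:<{key_width}} {count:>{count_width}} {b}")
    return rows


# Demo with made-up data: 5 mysqld, 2 bash and 1 sshd process.
sample = ["mysqld"] * 5 + ["bash"] * 2 + ["sshd"]
for row in histogram(sample, width=12):
    print(row)
```

To feed the sketch from a pipeline such as `ps -eo comm=`, you could add `import sys` and pass `(line.strip() for line in sys.stdin if line.strip())` to `histogram` instead of the sample list.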
Happy administering those database clusters!