Building a secure Hadoop cluster requires protecting a number of services that comprise the Hadoop infrastructure. If you are using the CDH distribution, then Cloudera Manager (CM) is one of the components that needs to be secured. CM consists of a Server, Agents running on all cluster machines, and a web UI. CM provides you with three levels of security for its communication (good/better/best):
- Transport Layer Security (TLS) Encryption for Cloudera Manager
- TLS Authentication of Server to Agent and Users
- TLS Authentication of Agents to Server
There is a good step-by-step guide in CM’s documentation, and it’s easy to follow for one server. But what about when you have hundreds of them? There are different approaches to managing server configuration at scale, but I’d like to focus on Ansible, which is a neat framework for parallel command execution and complex rollouts. And it’s written in Python! Ansible is easy to install and requires only a couple of Python libraries on a “master” node and nothing other than python2.6 on “slaves”. You don’t have to set up ssh-keys and configure passwordless access across all machines, which is important when talking about security. To give you an idea of what an Ansible command looks like, here is how you can restart CM Agents on all Hadoop data nodes:
ansible hadoop_data_nodes -m service -a "name=cloudera-scm-agent state=restarted" -k --sudo
This command will read a list of hosts from your /etc/ansible/hosts file or any other file specified by the ANSIBLE_HOSTS environment variable. It will find the [hadoop_data_nodes] section in the hosts file (which has an ini-like structure) and execute the given command on all servers in this section. Ansible will ask you for both ssh and sudo passwords only once and will use them to execute commands on target servers. Ansible relies on modules (specified by the -m option) to perform specific tasks, like restarting services, executing shell commands, or manipulating text files. A full list of modules can be found in the Ansible documentation. You can write your own, of course, if you need to.
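To make the hosts-file layout concrete, here is a minimal Python sketch that reads an ini-like inventory and lists the hosts in one section. The host names are made up, and configparser is only a stand-in for illustration: it is not how Ansible itself parses inventories, and it ignores per-host variables and nested groups.

```python
import configparser

# Hypothetical inventory contents; real host names will differ.
SAMPLE_HOSTS = """
[hadoop_name_nodes]
nn01.example.com

[hadoop_data_nodes]
dn01.example.com
dn02.example.com
"""


def hosts_in_section(inventory_text, section):
    """Return the host names listed under [section] in an ini-like inventory."""
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(inventory_text)
    # Each bare host line is parsed as an option name with no value.
    return list(parser[section].keys())


data_nodes = hosts_in_section(SAMPLE_HOSTS, "hadoop_data_nodes")
print(data_nodes)
```

The section name passed to the function plays the same role as the first argument of the ansible command above.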
Below is the step-by-step guide on configuring all three levels of TLS for CM. In some aspects, it repeats steps from CM’s documentation guide, but there are some important nuances that I have discovered. This guide is also scalable: you can apply it with very few modifications to clusters of any size.
Level I. TLS Encryption for Cloudera Manager
The following steps assume that you have installed Ansible on the same server as your CM Server and properly configured it to access all nodes in your cluster.
- Create a directory for the CM keystore:
mkdir -p /etc/cloudera-scm-server/keystore
- Generate a certificate for CM. You will be prompted for a new Keystore password. Also make sure that the CN field matches the CM Server hostname. keytool asks for it as “first and last name”, but it’s not *your* name!
keytool -validity 1095 -keystore /etc/cloudera-scm-server/keystore/scm-keystore -alias jetty -genkeypair -keyalg RSA
Note the -validity option: at 1095 days, the new certificate will be valid for three years.
- Restrict permissions to the Keystore:
chown -R cloudera-scm:cloudera-scm /etc/cloudera-scm-server/keystore/
chmod o-r /etc/cloudera-scm-server/keystore/scm-keystore
- Enable TLS Encryption for Agents and provide the path and password to the Keystore in the CM web UI. You can refer to CM’s documentation on how to do this.
- Next, we need to update the CM Agent configuration files to set the use_tls=1 option. The Ansible lineinfile module can be used for this:
ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini state=present regexp='use_tls.*' line='use_tls=1'" -k -K
There are several assumptions made here. First of all, your Ansible host list has a [hadoop] section in it, which covers all Hadoop cluster nodes, or has sections like [hadoop_namenodes], [hadoop_datanodes], etc. Ansible can recognize patterns. Second, your template for the CM Agent config.ini file has all security-related options in it, but they are commented out. Maintaining a template for all configuration files is a good idea, and Ansible can help you here as well, but it’s beyond the scope of this guide.
- Restart CM Server (again, I assume that you run Ansible commands from the same machine):
sudo /etc/init.d/cloudera-scm-server restart
- Restart CM Agents and check their health:
ansible hadoop -m service -a 'name=cloudera-scm-agent state=restarted' -k -K
ansible hadoop -m shell -a "tail /var/log/cloudera-scm-agent/cloudera-scm-agent.log" -k -K
You have already seen an example with the Ansible service module. The new module used here is shell, which allows you to run arbitrary shell commands. It’s a good idea to verify that all agents started fine, so the second command tails the agent log file on every server.
If all is fine, at this point you have encrypted communication between CM Server and CM Agents.
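The lineinfile step in this level is worth a closer look, because it is what turns a commented-out option into an active one. The following Python sketch is a rough approximation of how lineinfile behaves with state=present and a regexp, not the module’s actual code: the last line matching the regexp is replaced as a whole, and if nothing matches, the line is appended. The function name and the sample config lines are hypothetical.

```python
import re


def line_in_file(lines, regexp, line):
    """Approximate lineinfile with state=present on a list of lines.

    The last line matching regexp is replaced by line; if nothing
    matches, line is appended at the end.
    """
    pattern = re.compile(regexp)
    last_match = None
    for index, current in enumerate(lines):
        if pattern.search(current):
            last_match = index
    result = list(lines)
    if last_match is None:
        result.append(line)
    else:
        result[last_match] = line
    return result


# Hypothetical config.ini fragment with the options commented out.
before = ["[Security]", "#use_tls=0", "#verify_cert_file="]
after = line_in_file(before, r"use_tls.*", "use_tls=1")
print(after)
```

Because the whole matching line is replaced, the leading comment character disappears along with the old value, and running the same step a second time leaves the file unchanged.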
Level II. TLS Authentication of Server to Agent and Users
At this level, you will force CM Agents to check a certificate for CM Server to make sure they are talking to the right machine.
- Export the CM Server certificate from the Keystore (keytool writes it in .der format) and convert it to .pem format. Run both commands from /tmp, because the copy step below reads /tmp/scm-server.pem:
keytool -exportcert -keystore /etc/cloudera-scm-server/keystore/scm-keystore -alias jetty -file scm-server.der
openssl x509 -out scm-server.pem -in scm-server.der -inform der
- Create new cert dirs on all agent servers:
ansible hadoop -m shell -a "mkdir /etc/cloudera-scm-agent/cert" -k -K
- Copy server cert to agents:
ansible hadoop -m copy -a "src=/tmp/scm-server.pem dest=/etc/cloudera-scm-agent/cert/ owner=root group=root" -k -K
- Change the verify_cert_file option in the agent’s config.ini:
ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini state=present regexp='verify_cert_file' line='verify_cert_file=/etc/cloudera-scm-agent/cert/scm-server.pem'" -k -K
- Enable Use TLS Encryption for Admin Console in the CM web UI.
- Restart CM server and Agents. See Steps 6-7 in Level I.
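For readers curious about what the .der to .pem conversion in this level actually does, here is a small Python sketch using the standard ssl module. It is an illustration only, not a replacement for the openssl command; the function name is hypothetical.

```python
import ssl


def der_file_to_pem_file(der_path, pem_path):
    """Read a DER-encoded certificate and write it out PEM-encoded."""
    with open(der_path, "rb") as der_file:
        der_bytes = der_file.read()
    # PEM is the same DER data, base64-encoded between BEGIN/END markers.
    pem_text = ssl.DER_cert_to_PEM_cert(der_bytes)
    with open(pem_path, "w") as pem_file:
        pem_file.write(pem_text)
    return pem_text


# Example call, using the file names from the export step:
# der_file_to_pem_file("scm-server.der", "scm-server.pem")
```

The two formats carry the same certificate; PEM is simply the text-friendly encoding that the agent’s verify_cert_file option points at in this guide.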
Level III. TLS Authentication of Agents to Server
This is similar to the previous level, but requires certificates to be generated for all agents so they can authenticate to the CM Server. Here I describe an approach with self-signed certificates, which is fine for development or POC clusters. For production clusters, you may need to comply with your organisation’s standards and obtain properly signed certificates.
- Generate a password for the Agent keys and copy it to all agent machines:
ansible hadoop -m shell -a 'echo PASSWORD > /etc/cloudera-scm-agent/cert/agent_cert.pwd' -k -K
ansible hadoop -m shell -a 'chmod o-rx /etc/cloudera-scm-agent/cert/agent_cert.pwd' -k -K
- The next step is to generate a private key and certificate for each CM Agent. To automate this task I wrote a quick script: https://github.com/dazbur/morecerts. It takes a list of hosts in a plain text file and produces a private key and certificate for each host. The resulting files are named agent_HOSTNAME.key and agent_HOSTNAME.pem. It also takes care of providing Distinguished Name options to the key-generating commands:
- Create a text file with the list of Agent IP addresses. You can just copy it from your Ansible hosts list.
- Create the agent key password and put it into a text file.
- Generate keys:
./morecerts.py -f agentiplist.file -p agent.pass.file gencerts
- Add keys to CM Server keystore:
sudo ./morecerts.py -f agentiplist.file -p agent.pass.file -k /etc/cloudera-scm-server/keystore/scm-keystore -w KEYSTORE_PASSWORD addtokeystore
- Copy keys and certs to agent machines (assuming you have generated keys in your home directory):
ansible hadoop -m copy -a "src=~/agent_$inventory_hostname.key dest=/etc/cloudera-scm-agent/cert/agent.key owner=root group=root" -k -K
ansible hadoop -m copy -a "src=~/agent_$inventory_hostname.pem dest=/etc/cloudera-scm-agent/cert/agent.pem owner=root group=root" -k -K
Here you can see another nice trick Ansible can do: the $inventory_hostname variable expands to the name of the server on which the command is executed. This allows you to copy specific files to specific servers. (Newer Ansible releases write this variable as {{ inventory_hostname }}.)
- Update agent config.ini files to set client_key_file, client_keypw_file, and client_cert_file options:
ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini state=present regexp='client_key_file.*' line='client_key_file=/etc/cloudera-scm-agent/cert/agent.key'" -k -K
ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini state=present regexp='client_keypw_file.*' line='client_keypw_file=/etc/cloudera-scm-agent/cert/agent_cert.pwd'" -k -K
ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini state=present regexp='client_cert_file.*' line='client_cert_file=/etc/cloudera-scm-agent/cert/agent.pem'" -k -K
- Update CM’s configuration via Web UI to set Use TLS Authentication of Agents to Server, Path to Truststore, and Truststore Password. Truststore is the same as Keystore in our case.
- Restart CM Server and Agents.
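The copy step in this level only works if every host in the inventory has a key and certificate named exactly after it. A quick pre-flight check can catch a mismatch (for example, IP addresses in one list and host names in the other) before anything is copied. This is a minimal sketch: the function name and the example hosts are hypothetical, and the file naming follows the agent_HOSTNAME.key and agent_HOSTNAME.pem pattern described above.

```python
import os


def missing_agent_files(hosts, cert_dir):
    """Return the expected per-host key/cert files that are absent in cert_dir."""
    missing = []
    for host in hosts:
        for extension in ("key", "pem"):
            file_name = "agent_{0}.{1}".format(host, extension)
            path = os.path.join(cert_dir, file_name)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


# Example call with hypothetical hosts and the home directory as cert_dir:
# missing_agent_files(["10.0.0.11", "10.0.0.12"], os.path.expanduser("~"))
```

An empty list means every host has both files in place.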
You are done now! Ansible can also wrap individual commands into playbooks, so you can execute them on demand later. In general, I found Ansible to be a great tool to manage and execute commands on many servers, and it is definitely worth exploring.
Thanks for the writeup!
Perhaps even easier and cleaner if converted to a playbook and using the file module versus calling chmod. You’ll also get upgraded output that way too
It’s great to see Pythian using Ansible! :) Such an awesome tool.
In a couple of steps instead of using shell with mkdir/chmod/chown you can use ‘file’ module with appropriate params.
Thanks for the feedback! Next step is definitely converting these into a playbook.
How’s that playbook coming along? ;-)
I’ve written a playbook: https://github.com/analytically/hadoop-ansible