Lately, several of our security-conscious clients have expressed a desire to install and/or upgrade their Hadoop distribution on cluster nodes that do not have access to the internet. In such cases the installation needs to be performed using local repositories. Since I could not find a step-by-step procedure to accomplish this, I thought I would publish one myself.
The procedure below was carried out with the following configuration and specifications:
Existing Version of Cloudera Manager: 5.4.3
Existing Version of CDH: 5.4.2
Upgrade to Version of Cloudera Manager: 5.5.0
Upgrade to Version of CDH: 5.5.0
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.6 (Santiago)
# cat /proc/version
Linux version 2.6.32-504.16.2.el6.x86_64 ([email protected]) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-9) (GCC) ) #1 SMP Tue Mar 10 17:01:00 EDT 2015
Upgrade Steps
We will upgrade the cluster in two stages. In the first stage only Cloudera Manager is upgraded to version 5.5. Once the cluster has been verified to be functional with Cloudera Manager 5.5, we upgrade CDH to version 5.5.
1. Upgrade Cloudera Manager
1. Let’s start by creating the local repository for Cloudera Manager. Download the latest version of Cloudera Manager from the link below onto the Local Repository Host:
2. Copy the downloaded files to the pub/repos/cloudera-manager directory on the Local Repository Host. Then start a local web server with pub/repos as its root directory. Any web server will do, including Python's SimpleHTTPServer or Apache. The following steps use SimpleHTTPServer:
# cd pub/repos
# nohup python -m SimpleHTTPServer 8000 &
Expected output for http://Local Repository Host:8000/pub/repos/cloudera-manager
Expected output for http://Local Repository Host:8000/pub/repos/cloudera-manager/RPMS/x86_64
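One caveat with the command above: the SimpleHTTPServer module exists only in Python 2, and in Python 3 the equivalent module is http.server. The following is a minimal sketch that picks the module based on the interpreter found on the host. The function names and the PYTHON_BIN variable are illustrative assumptions, not part of the original procedure.

```shell
# Map a Python major version to the module that provides the built-in web server.
http_server_module() {
  case "$1" in
    2) echo "SimpleHTTPServer" ;;
    3) echo "http.server" ;;
    *) echo "unsupported Python major version: $1" >&2; return 1 ;;
  esac
}

# Start the server in the background, serving the current directory.
# PYTHON_BIN and the default port are illustrative assumptions.
start_repo_server() {
  python_bin="${PYTHON_BIN:-python}"
  port="${1:-8000}"
  major=$("$python_bin" -c 'import sys; print(sys.version_info[0])') || return 1
  module=$(http_server_module "$major") || return 1
  nohup "$python_bin" -m "$module" "$port" >/dev/null 2>&1 &
}

# Usage (run from the directory that should be the web root):
#   start_repo_server 8000
```

Either module serves plain HTTP only, so a browser or yum client should be pointed at an http:// address when this server is used.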
3. Make sure the local repository for Cloudera Manager is set as:
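As a rough sketch, a yum repository definition pointing at the local web server could be generated as follows. The repository path and the GPG key file name come from this procedure; the function name, the section name, the file name under /etc/yum.repos.d and the gpgcheck setting are assumptions to adapt to your environment.

```shell
# Print a yum repository definition for the local Cloudera Manager repository.
# $1 is the base URL of the local web server (scheme, host and port).
cm_repo_definition() {
  base="${1%/}"   # drop a trailing slash, if any
  cat <<EOF
[cloudera-manager]
name=Cloudera Manager (local repository)
baseurl=$base/pub/repos/cloudera-manager
gpgkey=$base/pub/repos/cloudera-manager/RPM-GPG-KEY-cloudera
gpgcheck=1
enabled=1
EOF
}

# Usage (as root on the Cloudera Manager Server host; repo.example.internal is a placeholder):
#   cm_repo_definition "http://repo.example.internal:8000" > /etc/yum.repos.d/cloudera-manager.repo
#   yum clean all
```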
4. Log on to Cloudera Manager. Stop Cloudera Management Service:
5. Make sure all services are stopped. Sample screens after stopping below:
6. Stop the Hadoop Cluster:
7. SSH to Cloudera Manager Server. Stop Cloudera Manager Service:
# sudo service cloudera-scm-server status
cloudera-scm-server (pid 6963) is running...
# sudo service cloudera-scm-server stop
Stopping cloudera-scm-server: [ OK ]
# sudo service cloudera-scm-server status
cloudera-scm-server is stopped
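If you script this step, the stop-and-verify sequence above can be wrapped in a small helper. This is only a sketch: the function names are made up for illustration, and the polling interval and attempt count are arbitrary. It classifies the status text using the same "is running" and "is stopped" phrases shown in the output above.

```shell
# Classify the text printed by "service cloudera-scm-server status".
scm_server_state() {
  case "$1" in
    *"is running"*) echo "running" ;;
    *"is stopped"*) echo "stopped" ;;
    *)              echo "unknown" ;;
  esac
}

# Stop the server and poll until the status output reports it as stopped.
# The 5-second interval and 12 attempts are arbitrary illustrative values.
stop_scm_server() {
  sudo service cloudera-scm-server stop
  attempts=0
  while [ "$attempts" -lt 12 ]; do
    state=$(scm_server_state "$(sudo service cloudera-scm-server status 2>&1)")
    [ "$state" = "stopped" ] && return 0
    attempts=$((attempts + 1))
    sleep 5
  done
  echo "cloudera-scm-server did not stop in time" >&2
  return 1
}

# Usage:
#   stop_scm_server && echo "safe to continue with the upgrade"
```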
8. Before proceeding with the upgrade, make sure you back up the Cloudera Manager databases, including those used by CDH services such as Hive Metastore, Oozie, and Sentry.
9. When you are ready to upgrade, issue the command to upgrade Cloudera Manager:
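The backup commands depend on which database engine your installation uses, which this procedure does not prescribe. As a hedged sketch that assumes MySQL, the helper below prints one mysqldump command per database so you can review the commands before running them. The function name, BACKUP_DIR, DB_HOST, DB_USER and the database names in the usage line are all hypothetical.

```shell
# Print (do not run) one mysqldump command per database name given as an argument.
# BACKUP_DIR, DB_HOST and DB_USER are illustrative assumptions.
print_backup_commands() {
  backup_dir="${BACKUP_DIR:-/var/backups/cloudera}"
  db_host="${DB_HOST:-localhost}"
  db_user="${DB_USER:-root}"
  stamp=$(date +%Y%m%d)
  for db in "$@"; do
    echo "mysqldump -h $db_host -u $db_user -p --single-transaction --databases $db > $backup_dir/$db-$stamp.sql"
  done
}

# Usage (database names are hypothetical; use the ones configured for your services):
#   print_backup_commands scm metastore oozie sentry
```

For PostgreSQL or Oracle the same idea applies with that engine's own dump tool.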
# service cloudera-scm-server start
Starting cloudera-scm-server: [ OK ]
12. Monitor the Cloudera Manager Server Log for errors. The Cloudera Manager Server console is ready for use once you see the “Started Jetty Server” message in the log:
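Rather than watching the log by eye, you can poll it for the message. The sketch below assumes the log path is passed in by the caller; the function names, attempt count and interval are illustrative defaults, not values from the Cloudera documentation.

```shell
# Return success once the log file contains the "Started Jetty Server" message.
jetty_started() {
  grep -q "Started Jetty Server" "$1" 2>/dev/null
}

# Poll the log until the message appears or the attempts run out.
# $1: log file path, $2: number of attempts (default 60), $3: seconds between attempts (default 5).
wait_for_jetty() {
  log_file="$1"
  attempts="${2:-60}"
  interval="${3:-5}"
  while [ "$attempts" -gt 0 ]; do
    jetty_started "$log_file" && return 0
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  return 1
}

# Usage (this is the usual default log location; adjust if your installation differs):
#   wait_for_jetty /var/log/cloudera-scm-server/cloudera-scm-server.log
```

Because the log is appended to across restarts, a line from an earlier start can also match, so it is worth checking the timestamp of the matching line.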
13. Log on to Cloudera Manager. You should now see the following screen. Note the running version:
14. Choose the option shown below to upgrade the Cloudera Manager Agents. Press Continue:
15. Choose Custom Repository:
In the first box, enter: http://Local Repository Host:8000/pub/repos/cloudera-manager
In the second box, enter: http://Local Repository Host:8000/pub/repos/cloudera-manager/RPM-GPG-KEY-cloudera
Press Continue.
16. Check JDK/Java options as below and press Continue:
17. Provide SSH credentials and Press Continue:
18. Cloudera Manager will now upgrade the Agents:
19. Verify Completion. Press Continue:
20. Inspect Hosts for Correctness. Press Continue:
21. You should now see a Confirmation Screen as below:
2. Upgrade Cloudera Distribution
2. Create/Refresh the local repository for CDH by copying the downloaded files to the pub/repos/cloudera-cdh5/ directory on the Local Repository Host.
Expected output for http://Local Repository Host:8000/pub/repos/cloudera-cdh5/
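Before pointing Cloudera Manager at this directory, a quick sanity check can catch an incomplete copy. The sketch below assumes the downloaded files are parcels, in which case the directory should contain a manifest.json file and at least one .parcel file. The function name is made up for illustration.

```shell
# Check that a directory looks like a parcel repository:
# it should hold manifest.json and at least one *.parcel file.
check_parcel_repo() {
  dir="$1"
  if [ ! -f "$dir/manifest.json" ]; then
    echo "missing $dir/manifest.json" >&2
    return 1
  fi
  for parcel in "$dir"/*.parcel; do
    [ -f "$parcel" ] && return 0
  done
  echo "no .parcel files in $dir" >&2
  return 1
}

# Usage:
#   check_parcel_repo pub/repos/cloudera-cdh5
```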
3. Back up HDFS metadata using the following command:
$ whoami
hdfs
$ hdfs dfsadmin -fetchImage ~
15/11/27 19:23:58 INFO namenode.TransferFsImage: Opening connection to https://ip-10-169-250-118.ec2.internal:50070/imagetransfer?getimage=1&txid=latest
15/11/27 19:23:58 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
15/11/27 19:23:58 INFO namenode.TransferFsImage: Transfer took 0.09s at 2715.91 KB/s
$ ls -l
total 244
-rw-rw-r--. 1 hdfs hdfs 244838 Nov 27 19:23 fsimage_0000000000000015418
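To confirm that the image really landed where you expect, a small check like the following can be run after the fetch. It is a sketch with a made-up function name: it prints the fsimage file with the highest transaction ID in a directory and fails if there is none or if the file is empty. It relies on the transaction ID in the file name being zero-padded, so that alphabetical order matches numeric order.

```shell
# Print the non-empty fsimage_* file with the highest transaction ID in a directory.
# Returns non-zero if no such file exists.
latest_fsimage() {
  latest=""
  for f in "$1"/fsimage_*; do
    case "$f" in *.md5) continue ;; esac   # skip checksum files, if any
    [ -s "$f" ] && latest="$f"             # glob results are sorted, so the last match wins
  done
  [ -n "$latest" ] && echo "$latest"
}

# Usage (as the hdfs user, after running hdfs dfsadmin -fetchImage ~):
#   latest_fsimage ~ || echo "no fsimage backup found" >&2
```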
4. Back up the databases used by the various CDH services. The following screen shows the database details for services such as Oozie, HUE, and Sentry:
5. Log on to Cloudera Manager.
6. Verify the parcel download setting is pointing to the local repository for CDH. Press the Parcels icon on the Cloudera Manager Home Page. Press Edit settings:
7. Choose the following Option to start upgrade of CDH:
8. Choose version 5.5:
9. Make sure you have backed up all databases:
10. The following screen indicates that we are all set to proceed. Press Continue:
11. CDH Version 5.5 parcels will now be downloaded, distributed to all nodes and unpacked. Press Continue:
12. Hosts will be inspected for correctness. Press Continue:
13. Verify that no one is using the HH-TEST cluster. Choose Full Cluster Restart. Press Continue:
14. The HH-TEST cluster will now be stopped, upgraded, and restarted. Press Continue:
15. The confirmation screen should now show the upgraded version of CDH. Press Continue:
4 Comments
Hi Manoj,
Steps for CM upgrade are good.
I generally go through the website on a daily basis. I am looking for the CDH upgrade steps. Do we have those on this site?
Thanks,
Kiran MS
Hi Kiran:
CDH upgrade steps are within the same blog. Please refer to section “Upgrade Cloudera Distribution”.
Manoj
Hi Manoj,
It’s a great guide, very detailed! Thanks a lot for your time and for sharing your knowledge with us!
Kind regards,
Alex
Crystal Clear document … Well Done!