Cloudera Enterprise 5.15.x

Upgrading from a Release Lower than CDH 5.4.0 to the Latest Release

Important: Use the instructions on this page to upgrade only from a CDH 5 release earlier than CDH 5.4.0. To upgrade from other releases, see Upgrading from CDH 5.4.0 or Higher to the Latest Release.

Continue reading:

Step 1: Prepare the Cluster for the Upgrade
Step 2: If Necessary, Download the CDH 5 "1-click" Package on Each of the Hosts in the Cluster
Step 3: Upgrade the Packages on the Appropriate Hosts
Step 4: In an HA Deployment, Upgrade and Start the JournalNodes
Step 5: Upgrade the HDFS Metadata
Step 6: Start MapReduce (MRv1) or YARN
Step 7: Set the Sticky Bit
Step 8: Upgrade Components
Step 9: Apply Configuration File Changes if Required
Step 10: Finalize the HDFS Metadata Upgrade
Troubleshooting: If You Missed the HDFS Metadata Upgrade Steps

Step 1: Prepare the Cluster for the Upgrade

Important: Before you begin, read the following topics, which contain important upgrade information:

Put the NameNode into safe mode and save the fsimage:
1. Put the NameNode (or active NameNode in an HA configuration) into safe mode:
```
$ sudo -u hdfs hdfs dfsadmin -safemode enter
```
2. Run a saveNamespace operation:
```
$ sudo -u hdfs hdfs dfsadmin -saveNamespace 
```
  This results in a new fsimage written with no edit log entries.
Shut down Hadoop services across your entire cluster by running the following command on every host in your cluster:
```
$ for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done
```
As root, check each host to make sure that no processes are running as the hdfs, yarn, mapred, or httpfs users:
```
# ps -aef | grep java
```
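Because `grep java` also matches Java processes that belong to other users, it can help to filter by process owner instead. The following is a minimal sketch rather than part of the documented procedure: `filter_hadoop_procs` is a hypothetical helper, and the usage line assumes a `ps` that supports the POSIX `-e` and `-o` options.

```shell
# Hypothetical helper: keep only the lines whose first field (the process
# owner) is one of the Hadoop service users named in this step.
# It reads "user pid command" lines on stdin, so it can be tried offline.
filter_hadoop_procs() {
    awk '$1 == "hdfs" || $1 == "yarn" || $1 == "mapred" || $1 == "httpfs"'
}

# Assumed usage on a live host, run as root:
#   ps -e -o user,pid,args | filter_hadoop_procs
# No output means none of these users still owns a process.
```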
Ensure that the NameNode service is not running, and then back up the HDFS metadata on the NameNode machine, as follows.
Note: Cloudera recommends backing up HDFS metadata on a regular basis, as well as before a major upgrade.
1. Find the location of your dfs.namenode.name.dir. For example:
```
$ grep -C1 dfs.namenode.name.dir /etc/hadoop/conf/hdfs-site.xml
<property> <name>dfs.namenode.name.dir</name> <value>/mnt/hadoop/hdfs/name</value>
</property>
```
2. Back up the directory. The path inside the <value> XML element is the path to your HDFS metadata. If you see a comma-separated list of paths, you do not need to back up all of them; they store the same data. Back up the first directory by using the following commands:
```
$ cd /mnt/hadoop/hdfs/name
# tar -cvf /root/nn_backup_data.tar .
./ 
./current/
./current/fsimage 
./current/fstime 
./current/VERSION 
./current/edits 
./image/ 
./image/fsimage
```
  Important: If you see a file containing the word lock, the NameNode is probably still running. Re-run the procedure, beginning at step 1, and make sure that the NameNode service is not running before backing up the HDFS metadata.
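
The lock-file check and the backup can be combined so that the archive is never taken from a live NameNode. This is a minimal sketch under stated assumptions: `backup_name_dir` is a hypothetical helper, and the metadata directory and archive path are supplied by the caller rather than read from hdfs-site.xml.

```shell
# Hypothetical helper: backup_name_dir NAME_DIR ARCHIVE
# Refuses to run if any file under NAME_DIR has "lock" in its name, which
# suggests the NameNode is still running; otherwise archives NAME_DIR.
backup_name_dir() {
    name_dir=$1
    archive=$2
    if [ -n "$(find "$name_dir" -name '*lock*' -print)" ]; then
        echo "lock file found under $name_dir; stop the NameNode first" >&2
        return 1
    fi
    # -C makes the archived paths relative to NAME_DIR (./current/..., ./image/...).
    tar -cf "$archive" -C "$name_dir" .
}

# Assumed usage, run as root with the path found in dfs.namenode.name.dir:
#   backup_name_dir /mnt/hadoop/hdfs/name /root/nn_backup_data.tar
```

Keep the archive outside the metadata directory so that the backup does not try to include itself.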

Step 2: If Necessary, Download the CDH 5 "1-click" Package on Each of the Hosts in the Cluster

Before you begin: Check whether you have the CDH 5 "1-click" repository installed, and proceed as indicated.

Table 1. Checking for the 1-click Repository
Operating System	Command to Run	Results and Actions
RHEL-compatible	`rpm -q cdh5-repository`	If the command returns `cdh5-repository-1-0`, the 1-click repository is installed; skip to Step 3. If the command returns `package cdh5-repository is not installed`, go to the 1-click instructions for your OS: RHEL, SLES, or Ubuntu and Debian.
Ubuntu and Debian	`dpkg -l \| grep cdh5-repository`

On RHEL-compatible systems:

Download the CDH 5 "1-click Install" package (or RPM).
Click the appropriate RPM and Save File to a directory with write access (for example, your home directory).

OS Version	Link to CDH 5 RPM
RHEL/CentOS/Oracle 5	RHEL/CentOS/Oracle 5 link
RHEL/CentOS/Oracle 6	RHEL/CentOS/Oracle 6 link
RHEL/CentOS/Oracle 7	RHEL/CentOS/Oracle 7 link

Install the RPM for all RHEL versions:

$ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm

For instructions on how to add a CDH 5 yum repository or build your own CDH 5 yum repository, see Installing CDH 5 On Red Hat-compatible systems.

Note: Clean repository cache.

Before proceeding, clean cached packages and headers to ensure your system repos are up-to-date:

sudo yum clean all

On SLES systems:

Download the CDH 5 "1-click Install" package.
Download the RPM file, choose Save File, and save it to a directory to which you have write access (for example, your home directory).

Install the RPM:

$ sudo rpm -i cloudera-cdh-5-0.x86_64.rpm

Update your system package index by running the following:
```
$ sudo zypper refresh
```

For instructions on how to add a repository or build your own repository, see Installing CDH 5 on SLES Systems.

Note: Clean repository cache.

Before proceeding, clean cached packages and headers to ensure your system repos are up-to-date:

sudo zypper clean --all

On Ubuntu and Debian systems:

Download the CDH 5 "1-click Install" package:

OS Version	Package Link
Jessie	Jessie package
Wheezy	Wheezy package
Precise	Precise package
Trusty	Trusty package

Install the package by doing one of the following:
- Choose Open with in the download window to use the package manager.
- Choose Save File, save the package to a directory to which you have write access (for example, your home directory), and install it from the command line. For example:
```
sudo dpkg -i cdh5-repository_1.0_all.deb
```

For instructions on how to add a repository or build your own repository, see the instructions on installing CDH 5 on Ubuntu and Debian systems.

Important: Clean cached packages and headers to ensure that your system repos are up-to-date: sudo apt-get update

Step 3: Upgrade the Packages on the Appropriate Hosts

Upgrade MRv1, YARN, or both. Although you can install and configure both MRv1 and YARN, you should not run them both on the same set of hosts at the same time.

If you are using HA for the NameNode, do not install hadoop-hdfs-secondarynamenode.

Before upgrading MRv1 or YARN: (Optionally) add a repository key on each system in the cluster, if you have not already done so. Add the Cloudera Public GPG Key to your repository by executing one of the following commands:

For Red Hat/CentOS/Oracle 5 systems:

$ sudo rpm --import https://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera

For Red Hat/CentOS/Oracle 6 systems:

$ sudo rpm --import https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera

For all SLES systems:

$ sudo rpm --import https://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera

For Ubuntu Precise systems:

$ curl -s https://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -

For Debian Wheezy systems:

$ curl -s https://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/archive.key | sudo apt-key add -

Step 3a: If you are using MRv1, upgrade the MRv1 packages on the appropriate hosts.

Skip this step if you are using YARN exclusively.

Install and deploy ZooKeeper as described in ZooKeeper Installation. Cloudera recommends that you install (or update) and start a ZooKeeper cluster; ZooKeeper is required if you are deploying high availability (HA) for the NameNode or JobTracker.

Install each daemon package on the appropriate systems, as follows.

Where to install	Install commands
JobTracker host running:
Red Hat/CentOS compatible	$ sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-jobtracker
SLES	$ sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-jobtracker
Ubuntu or Debian	$ sudo apt-get update; sudo apt-get install hadoop-0.20-mapreduce-jobtracker
NameNode host running:
Red Hat/CentOS compatible	$ sudo yum clean all; sudo yum install hadoop-hdfs-namenode
SLES	$ sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode
Ubuntu or Debian	$ sudo apt-get update; sudo apt-get install hadoop-hdfs-namenode
Secondary NameNode host (if used) running:
Red Hat/CentOS compatible	$ sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode
SLES	$ sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode
Ubuntu or Debian	$ sudo apt-get update; sudo apt-get install hadoop-hdfs-secondarynamenode
All cluster hosts except the JobTracker, NameNode, and Secondary (or Standby) NameNode hosts, running:
Red Hat/CentOS compatible	$ sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
SLES	$ sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
Ubuntu or Debian	$ sudo apt-get update; sudo apt-get install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
All client hosts, running:
Red Hat/CentOS compatible	$ sudo yum clean all; sudo yum install hadoop-client
SLES	$ sudo zypper clean --all; sudo zypper install hadoop-client
Ubuntu or Debian	$ sudo apt-get update; sudo apt-get install hadoop-client

Step 3b: If you are using YARN, upgrade the YARN packages on the appropriate hosts.

Skip this step if you are using MRv1 exclusively.

Install and deploy ZooKeeper as described in ZooKeeper Installation. Cloudera recommends that you install (or update) and start a ZooKeeper cluster; ZooKeeper is required if you are deploying high availability (HA) for the NameNode or JobTracker.

Install each type of daemon package on the appropriate system(s), as follows.

Where to install	Install commands
Resource Manager host (analogous to MRv1 JobTracker) running:
Red Hat/CentOS compatible	`$ sudo yum clean all; sudo yum install hadoop-yarn-resourcemanager`
SLES	`$ sudo zypper clean --all; sudo zypper install hadoop-yarn-resourcemanager`
Ubuntu or Debian	`$ sudo apt-get update; sudo apt-get install hadoop-yarn-resourcemanager`
NameNode host running:
Red Hat/CentOS compatible	`$ sudo yum clean all; sudo yum install hadoop-hdfs-namenode`
SLES	`$ sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode`
Ubuntu or Debian	`$ sudo apt-get update; sudo apt-get install hadoop-hdfs-namenode`
Secondary NameNode host (if used) running:
Red Hat/CentOS compatible	`$ sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode`
SLES	`$ sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode`
Ubuntu or Debian	`$ sudo apt-get update; sudo apt-get install hadoop-hdfs-secondarynamenode`
All cluster hosts except the Resource Manager (analogous to MRv1 TaskTrackers) running:
Red Hat/CentOS compatible	`$ sudo yum clean all; sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce`
SLES	`$ sudo zypper clean --all; sudo zypper install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce`
Ubuntu or Debian	`$ sudo apt-get update; sudo apt-get install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce`
One host in the cluster running:
Red Hat/CentOS compatible	`$ sudo yum clean all; sudo yum install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver`
SLES	`$ sudo zypper clean --all; sudo zypper install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver`
Ubuntu or Debian	`$ sudo apt-get update; sudo apt-get install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver`
All client hosts, running:
Red Hat/CentOS compatible	`$ sudo yum clean all; sudo yum install hadoop-client`
SLES	`$ sudo zypper clean --all; sudo zypper install hadoop-client`
Ubuntu or Debian	`$ sudo apt-get update; sudo apt-get install hadoop-client`

Note:

The hadoop-yarn and hadoop-hdfs packages are installed on each system automatically as dependencies of the other packages.

Step 4: In an HA Deployment, Upgrade and Start the JournalNodes

Install the JournalNode daemons on each of the machines where they will run.
To install JournalNode on RHEL-compatible systems:
```
$ sudo yum install hadoop-hdfs-journalnode
```
To install JournalNode on Ubuntu and Debian systems:
```
$ sudo apt-get install hadoop-hdfs-journalnode 
```
To install JournalNode on SLES systems:
```
$ sudo zypper install hadoop-hdfs-journalnode
```
Start the JournalNode daemons on each of the machines where they will run:
```
sudo service hadoop-hdfs-journalnode start 
```

Wait for the daemons to start before proceeding to the next step.

Important:

In an HA deployment, the JournalNodes must be up and running CDH 5 before you proceed.

Step 5: Upgrade the HDFS Metadata

The steps for upgrading HDFS metadata differ for HA and non-HA deployments.

Section 5a: Upgrade the HDFS Metadata for HA Deployments

Make sure that the JournalNodes have been upgraded to CDH 5 and are up and running.
Run the following command on the active NameNode only:
```
$ sudo service hadoop-hdfs-namenode upgrade
```
Warning:
In an HDFS HA deployment, it is critically important that you do this on only one NameNode.
Monitor the progress of the metadata upgrade by running the following:
```
$ sudo tail -f /var/log/hadoop-hdfs/hadoop-hdfs-namenode-<hostname>.log 
```
Look for a line that confirms the upgrade is complete, such as: /var/lib/hadoop-hdfs/cache/hadoop/dfs/<name> is complete.
The NameNode upgrade process can take a while, depending on the number of files.
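
Rather than watching the log by eye, the check for the completion line can be scripted. This is a minimal sketch, assuming the completion message contains the words `is complete` as in the example line above; `upgrade_complete` is a hypothetical helper, and the use of `$(hostname)` to fill in `<hostname>` is also an assumption.

```shell
# Hypothetical helper: upgrade_complete LOGFILE
# Succeeds once LOGFILE contains a line with the text " is complete".
upgrade_complete() {
    grep -q ' is complete' "$1"
}

# Assumed usage (run as a user who can read the NameNode log):
#   log=/var/log/hadoop-hdfs/hadoop-hdfs-namenode-$(hostname).log
#   until upgrade_complete "$log"; do sleep 10; done
#   echo "metadata upgrade finished"
```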

Wait for NameNode to exit safe mode, and then restart the standby NameNode.

If Kerberos is enabled:

$ kinit -kt /path/to/hdfs.keytab hdfs/<fully.qualified.domain.name@YOUR-REALM.COM> && hdfs namenode -bootstrapStandby

$ sudo service hadoop-hdfs-namenode start

If Kerberos is not enabled:

$ sudo -u hdfs hdfs namenode -bootstrapStandby
$ sudo service hadoop-hdfs-namenode start

Start the DataNodes by running the following command on each DataNode:
```
$ sudo service hadoop-hdfs-datanode start
```

Section 5b: Upgrade the HDFS Metadata for Non-HA Deployments

Run the following command on the NameNode:

$ sudo service hadoop-hdfs-namenode upgrade

Monitor the progress of the metadata upgrade by running the following:
```
$ sudo tail -f /var/log/hadoop-hdfs/hadoop-hdfs-namenode-<hostname>.log 
```
Look for a line that confirms the upgrade is complete, such as: /var/lib/hadoop-hdfs/cache/hadoop/dfs/<name> is complete.
The NameNode upgrade process can take a while, depending on the number of files.
Start the DataNodes by running the following command on each DataNode:
```
$ sudo service hadoop-hdfs-datanode start
```
Wait for NameNode to exit safe mode, and then start the secondary NameNode:
1. To check that the NameNode has exited safe mode, look for messages in the log file, or the NameNode's web interface, that say "...no longer in safe mode."
2. To start the secondary NameNode, enter the following command on the secondary NameNode host:
```
$ sudo service hadoop-hdfs-secondarynamenode start
```
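
The safe-mode check can also be made from the command line. This is a minimal sketch, assuming that `hdfs dfsadmin -safemode get` prints a line containing `Safe mode is OFF` once the NameNode has left safe mode; `safemode_is_off` is a hypothetical helper that reads that report on stdin.

```shell
# Hypothetical helper: succeeds when the report on stdin says safe mode is off.
safemode_is_off() {
    grep -q 'Safe mode is OFF'
}

# Assumed usage: wait for the NameNode, then start the secondary NameNode.
#   until sudo -u hdfs hdfs dfsadmin -safemode get | safemode_is_off; do
#       sleep 5
#   done
#   sudo service hadoop-hdfs-secondarynamenode start
```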

Step 6: Start MapReduce (MRv1) or YARN

You are now ready to start and test MRv1 or YARN and the MapReduce JobHistory Server.

Important:

Do not run MRv1 and YARN on the same set of hosts at the same time. This degrades performance and can result in an unstable cluster deployment. Steps 6a and 6b are mutually exclusive.

Step 6a: Start MRv1

Start each TaskTracker:

$ sudo service hadoop-0.20-mapreduce-tasktracker start

Start each JobTracker:

$ sudo service hadoop-0.20-mapreduce-jobtracker start

Verify that the JobTracker and TaskTracker started properly:
```
$ sudo jps | grep Tracker
```
If the permissions of directories are not configured correctly, the JobTracker and TaskTracker processes start and immediately fail. If this happens, check the JobTracker and TaskTracker logs and set the permissions correctly.

Verify basic cluster operation for MRv1.

Before running production jobs, verify basic cluster operation by running an example from the Apache Hadoop web site.

Important:

For important cluster configuration information, see Deploying MapReduce v1 (MRv1) on a Cluster.

Create a home directory on HDFS for user joe:

$ sudo -u hdfs hadoop fs -mkdir -p /user/joe 
$ sudo -u hdfs hadoop fs -chown joe /user/joe

Perform the remaining steps as user joe.

Make a directory in HDFS called input and copy some XML files into it by running the following commands:

$ hadoop fs -mkdir input 
$ hadoop fs -put /etc/hadoop/conf/*.xml input 
$ hadoop fs -ls input 
Found 3 items: 
-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/core-site.xml
-rw-r--r-- 1 joe supergroup 1913 2012-02-13 12:21 input/hdfs-site.xml 
-rw-r--r-- 1 joe supergroup 1001 2012-02-13 12:21 input/mapred-site.xml

Run an example Hadoop job to grep with a regular expression in your input data:

$ /usr/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar grep input output 'dfs[a-z.]+'

After the job completes, find the output in the HDFS directory named output, which you specified to Hadoop:

$ hadoop fs -ls 
Found 2 items 
drwxr-xr-x - joe supergroup 0 2009-08-18 18:36 /user/joe/input 
drwxr-xr-x - joe supergroup 0 2009-08-18 18:38 /user/joe/output

List the output files:

$ hadoop fs -ls output 
Found 3 items
drwxr-xr-x - joe supergroup 0 2009-02-25 10:33 /user/joe/output/_logs
-rw-r--r-- 1 joe supergroup 1068 2009-02-25 10:33 /user/joe/output/part-00000
-rw-r--r-- 1 joe supergroup 0 2009-02-25 10:33 /user/joe/output/_SUCCESS

Read the results in the output file; for example:

$ hadoop fs -cat output/part-00000 | head 
1 dfs.datanode.data.dir 
1 dfs.namenode.checkpoint.dir 
1 dfs.namenode.name.dir 
1 dfs.replication 
1 dfs.safemode.extension 
1 dfs.safemode.min.datanodes

This confirms your cluster is successfully running CDH 5.

Important:

If you have client hosts, make sure you also update them to CDH 5, and upgrade the components running on those clients.

Step 6b: Start MapReduce with YARN

If you have not already done so, create directories and set the correct permissions.

Note: For more information about YARN configuration and permissions, see Deploying MapReduce v2 (YARN) on a Cluster.

Create a history directory and set permissions; for example:

$ sudo -u hdfs hadoop fs -mkdir -p /user/history 
$ sudo -u hdfs hadoop fs -chmod -R 1777 /user/history  
$ sudo -u hdfs hadoop fs -chown yarn /user/history

Create the /var/log/hadoop-yarn directory and set ownership:
```
$ sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn  
$ sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn 
```
You create this directory because it is the parent of /var/log/hadoop-yarn/apps, which is explicitly configured in the yarn-site.xml.

Verify the directory structure, ownership, and permissions:

$ sudo -u hdfs hadoop fs -ls -R /

You should see:

drwxrwxrwt - hdfs supergroup 0 2012-04-19 14:31 /tmp  
drwxr-xr-x - hdfs supergroup 0 2012-05-31 10:26 /user  
drwxrwxrwt - yarn supergroup 0 2012-04-19 14:31 /user/history  
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /var  
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /var/log  
drwxr-xr-x - yarn mapred 0 2012-05-31 15:31 /var/log/hadoop-yarn
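
To compare a live listing against the expected one without reading it line by line, a small filter can assert the permissions and owner of a single path. This is a minimal sketch: `check_entry` is a hypothetical helper, and it assumes the eight-column `hadoop fs -ls` layout shown above (permissions first, owner third, path last).

```shell
# Hypothetical helper: check_entry PATH PERMS OWNER
# Reads a "hadoop fs -ls" listing on stdin and succeeds only if PATH is
# present with exactly the given permission string and owner.
check_entry() {
    awk -v p="$1" -v perms="$2" -v owner="$3" '
        $NF == p { found = 1; ok = ($1 == perms && $3 == owner) }
        END      { exit !(found && ok) }
    '
}

# Assumed usage:
#   sudo -u hdfs hadoop fs -ls -R / > /tmp/hdfs-listing.txt
#   check_entry /user/history drwxrwxrwt yarn < /tmp/hdfs-listing.txt
#   check_entry /var/log/hadoop-yarn drwxr-xr-x yarn < /tmp/hdfs-listing.txt
```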

Start YARN, and start the ResourceManager and NodeManager services:
Important:
Always start ResourceManager before starting NodeManager services.
1. On the ResourceManager system, run the following command:
```
$ sudo service hadoop-yarn-resourcemanager start 
```
2. On each NodeManager system (typically the same ones where DataNode service runs):
```
$ sudo service hadoop-yarn-nodemanager start 
```
Start the MapReduce JobHistory Server:
1. On the MapReduce JobHistory Server system, run the following command:
```
$ sudo service hadoop-mapreduce-historyserver start 
```
2. For each user who will be submitting MapReduce jobs using YARN, or running Pig, Hive, or Sqoop 1 in a YARN installation, set the HADOOP_MAPRED_HOME environment variable as follows:
```
$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce 
```

Before running production jobs, verify basic cluster operation by running an example from the Apache Hadoop web site.

Note:

For important configuration information, see Deploying MapReduce v2 (YARN) on a Cluster.

Create a home directory for user joe:

$ sudo -u hdfs hadoop fs -mkdir -p /user/joe 
$ sudo -u hdfs hadoop fs -chown joe /user/joe

Perform the remaining steps as the user joe.

Make a directory in HDFS called input and copy XML files to it by running the following commands in pseudo-distributed mode:

$ hadoop fs -mkdir input 
$ hadoop fs -put /etc/hadoop/conf/*.xml input 
$ hadoop fs -ls input 
Found 3 items: 
-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/core-site.xml 
-rw-r--r-- 1 joe supergroup 1913 2012-02-13 12:21 input/hdfs-site.xml 
-rw-r--r-- 1 joe supergroup 1001 2012-02-13 12:21 input/mapred-site.xml

Set HADOOP_MAPRED_HOME for user joe:

$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

Run an example Hadoop job to grep with a regular expression in your input data:

$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'

After the job completes, find the output in the HDFS directory named output23, which you specified to Hadoop:

$ hadoop fs -ls 
Found 2 items 
drwxr-xr-x - joe supergroup 0 2009-08-18 18:36 /user/joe/input 
drwxr-xr-x - joe supergroup 0 2009-08-18 18:38 /user/joe/output23

List the output files:

$ hadoop fs -ls output23 
Found 2 items 
drwxr-xr-x - joe supergroup 0 2009-02-25 10:33 /user/joe/output23/_SUCCESS 
-rw-r--r-- 1 joe supergroup 1068 2009-02-25 10:33 /user/joe/output23/part-r-00000

Read the results in the output file:

$ hadoop fs -cat output23/part-r-00000 | head 
1 dfs.safemode.min.datanodes 
1 dfs.safemode.extension 
1 dfs.replication 
1 dfs.permissions.enabled 
1 dfs.namenode.name.dir 
1 dfs.namenode.checkpoint.dir 
1 dfs.datanode.data.dir

This confirms that your cluster is successfully running CDH 5.

Important:

If you have client hosts, make sure you also update them to CDH 5, and upgrade the components running on those clients as well.

Step 7: Set the Sticky Bit

For security reasons, Cloudera strongly recommends you set the sticky bit on directories if you have not already done so.

The sticky bit prevents anyone except the superuser, directory owner, or file owner from deleting or moving the files within a directory. (Setting the sticky bit for a file has no effect.) Do this for directories such as /tmp. (For instructions on creating /tmp and setting its permissions, see Create the /tmp Directory.)
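
As an illustration, the sticky bit appears as a trailing `t` in the permission string that `hadoop fs -ls` prints (for example, `drwxrwxrwt`). This is a minimal sketch: `has_sticky_bit` is a hypothetical helper, and the commands in the usage comment assume that /tmp already exists in HDFS and that mode 1777 is what you want for it.

```shell
# Hypothetical helper: has_sticky_bit PERMS
# Succeeds if PERMS is a 10-character directory permission string whose
# last character is "t" (or "T" when others lack execute permission).
has_sticky_bit() {
    case $1 in
        d????????[tT]) return 0 ;;
        *)             return 1 ;;
    esac
}

# Assumed usage:
#   sudo -u hdfs hadoop fs -chmod 1777 /tmp
#   perms=$(sudo -u hdfs hadoop fs -ls -d /tmp | awk '$NF == "/tmp" { print $1 }')
#   has_sticky_bit "$perms" && echo "sticky bit is set on /tmp"
```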

Step 8: Upgrade Components

Cloudera recommends that you regularly update the software on each system in the cluster (for example, on a RHEL-compatible system, regularly run yum update) to ensure that all the dependencies for any given component are up to date. If you have not been doing this, the command may take a while to run the first time you use it.

Note:

For important information on new and changed components, see the CDH 5 Release Notes. To see whether there is a new version of a particular component in CDH 5, check the CDH Version and Packaging Information.

CDH 5 Components

Use the following sections to install or upgrade CDH 5 components: See also the instructions for installing or updating LZO.

Step 9: Apply Configuration File Changes if Required

Important: Configuration files

If you install a newer version of a package that is already on the system, configuration files that you have modified will remain intact.
If you uninstall a package, the package manager renames any configuration files you have modified from <file> to <file>.rpmsave. If you then re-install the package (probably to install a new version) the package manager creates a new <file> with applicable defaults. You are responsible for applying any changes captured in the original configuration file to the new configuration file. In the case of Ubuntu and Debian upgrades, you will be prompted if you have made changes to a file for which there is a new version. For details, see Automatic handling of configuration files by dpkg.

For example, if you have modified your zoo.cfg configuration file (/etc/zookeeper/zoo.cfg), the upgrade renames and preserves a copy of your modified zoo.cfg as /etc/zookeeper/zoo.cfg.rpmsave. If you have not already done so, you should now compare this to the new /etc/zookeeper/conf/zoo.cfg, resolve differences, and make any changes that should be carried forward (typically where you have changed property value defaults). Do this for each component you upgrade.
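
To find the saved files that still need to be reconciled, you can list each `.rpmsave` file that differs from its replacement. This is a minimal sketch: `changed_rpmsave` is a hypothetical helper, and it assumes the saved copy sits in the same directory as the newly installed file, which may not hold for every package.

```shell
# Hypothetical helper: changed_rpmsave DIR
# Prints every *.rpmsave file under DIR whose content differs from the
# file of the same name without the .rpmsave suffix.
changed_rpmsave() {
    find "$1" -type f -name '*.rpmsave' | while IFS= read -r saved; do
        current=${saved%.rpmsave}
        if [ -f "$current" ] && ! cmp -s "$saved" "$current"; then
            printf '%s\n' "$saved"
        fi
    done
}

# Assumed usage: review each difference before carrying changes forward.
#   changed_rpmsave /etc/zookeeper | while IFS= read -r f; do
#       diff -u "$f" "${f%.rpmsave}"
#   done
```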

Step 10: Finalize the HDFS Metadata Upgrade

Important: Once you have finalized the upgrade, you cannot roll back to a previous version of HDFS.

To finalize the HDFS metadata upgrade, do the following:

Make sure that the CDH 5 upgrade has succeeded and everything is running as expected. You can wait days or even weeks to verify a successful upgrade before finalizing it.
Before finalizing, run important workloads and ensure that they are successful. Once you have finalized the upgrade, you cannot roll back to a previous version of HDFS without using backups.
Note:
- If you need to restart the NameNode during this period (after having begun the upgrade process, but before you have run finalizeUpgrade), restart your NameNode without the -upgrade option.
- Verifying that you are ready to finalize the upgrade can take a long time. Make sure you have enough free disk space, keeping in mind that the following behavior continues until the upgrade is finalized:
  - Deleting files does not free up disk space.
  - Using the balancer causes all moved replicas to be duplicated.
  - All on-disk data representing the NameNode's metadata is retained, which could more than double the amount of space required on the NameNode and JournalNode disks.
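
Because space is not reclaimed until the upgrade is finalized, it can be useful to check free space on the metadata disks while you wait. This is a minimal sketch: `free_kb` and `has_headroom` are hypothetical helpers, the threshold in the usage comment is purely illustrative, and the parsing assumes the POSIX `df -P` output format with a filesystem name that contains no spaces.

```shell
# Hypothetical helper: free_kb DIR
# Prints the free space, in kilobytes, on the filesystem that holds DIR.
free_kb() {
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: has_headroom DIR REQUIRED_KB
# Succeeds if at least REQUIRED_KB kilobytes are free on DIR's filesystem.
has_headroom() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Assumed usage (10485760 KB is 10 GiB; choose a value that suits your cluster):
#   has_headroom /mnt/hadoop/hdfs/name 10485760 || echo "low on space"
```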
Finalize the HDFS metadata upgrade by using one of the following commands, depending on whether Kerberos is enabled (see Enabling Kerberos Authentication for Hadoop Using the Command Line).
Important: In an HDFS HA deployment, make sure that both the NameNodes and all of the JournalNodes are up and functioning normally before you proceed.
- If Kerberos is enabled:
```
$ kinit -kt /path/to/hdfs.keytab hdfs/<fully.qualified.domain.name@YOUR-REALM.COM> && hdfs dfsadmin -finalizeUpgrade
```
- If Kerberos is not enabled:
```
$ sudo -u hdfs hdfs dfsadmin -finalizeUpgrade
```

After the metadata upgrade completes, the previous/ and blocksBeingWritten/ directories in the DataNode data directories are not cleared until the DataNodes are restarted.

Troubleshooting: If You Missed the HDFS Metadata Upgrade Steps

If you skipped Step 5: Upgrade the HDFS Metadata, HDFS will not start; the metadata upgrade is required for all upgrades to CDH 5.4.0 and higher from any earlier release. You will see errors such as the following:

2014-10-16 18:36:29,112 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException:File system image contains an old layout version -55.An upgrade to version -59 is required.
Please restart NameNode with the "-rollingUpgrade started" option if a rolling upgrade is already started; or restart NameNode with the "-upgrade" option to start a new upgrade.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:231)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:994)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:726)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:529)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:585)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:751)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:735)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1476)
2014-10-16 18:36:29,126 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2014-10-16 18:36:29,127 WARN org.apache.hadoop.http.HttpServer2: HttpServer Acceptor: isRunning is false. Rechecking.
2014-10-16 18:36:29,127 WARN org.apache.hadoop.http.HttpServer2: HttpServer Acceptor: isRunning is false
2014-10-16 18:36:29,127 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2014-10-16 18:36:29,128 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2014-10-16 18:36:29,128 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2014-10-16 18:36:29,128 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: File system image contains an old layout version -55.An upgrade to version -59 is required.
Please restart NameNode with the "-rollingUpgrade started" option if a rolling upgrade is already started; or restart NameNode with the "-upgrade" option to start a new upgrade.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:231)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:994)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:726)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:529)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:585)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:751)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:735)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1476)
2014-10-16 18:36:29,130 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-10-16 18:36:29,132 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

To recover, proceed as follows:

Make sure you have completed all the necessary preceding steps (Step 1: Prepare the Cluster for the Upgrade through Step 4: In an HA Deployment, Upgrade and Start the JournalNodes; or Step 1: Prepare the Cluster for the Upgrade through Step 3: Upgrade the Packages on the Appropriate Hosts if this is not an HA deployment).
Starting with Step 5: Upgrade the HDFS Metadata, complete all the remaining steps through Step 10: Finalize the HDFS Metadata Upgrade.
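
To confirm that a failed NameNode start was caused by the missing metadata upgrade, you can search the NameNode log for the layout-version message. This is a minimal sketch: `old_layout_version` is a hypothetical helper, and it assumes the message wording shown in the errors above.

```shell
# Hypothetical helper: old_layout_version LOGFILE
# Prints the old layout version from the first matching error in LOGFILE
# and fails if the log contains no such error.
old_layout_version() {
    version=$(sed -n 's/.*contains an old layout version \(-\{0,1\}[0-9][0-9]*\).*/\1/p' "$1" | head -n 1)
    [ -n "$version" ] && printf '%s\n' "$version"
}

# Assumed usage (replace <hostname> with the NameNode host name):
#   if v=$(old_layout_version /var/log/hadoop-hdfs/hadoop-hdfs-namenode-<hostname>.log); then
#       echo "fsimage is at layout version $v; run the metadata upgrade in Step 5"
#   fi
```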

Page generated May 18, 2018.