Saturday, November 15, 2014

HADOOP ADMINISTRATION - Backup and Restoration methodologies

Data Backup:

Back up the data using DistCp:

hadoop distcp hdfs://nn1.cluster1.com:9000/ hdfs://nn1.cluster2.com:9000/

DistCp runs as a MapReduce job.
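DistCp also accepts flags to control the copy; a minimal sketch (the hostnames, paths, and map count below are placeholders, not values from a real cluster):

```shell
# Incremental backup: -update copies only files that differ at the
# destination; -m caps the number of map tasks used for the copy.
hadoop distcp -update -m 20 \
    hdfs://nn1.cluster1.com:9000/data \
    hdfs://nn1.cluster2.com:9000/backup/data
```
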


Application Backup: also back up scripts, MapReduce programs, etc.

hadoop dfsadmin -saveNamespace

hdfs dfsadmin -metasave filename.txt

hdfs oiv -i /data/namenode/current/fsimage -o fsimage.txt 
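Note that saveNamespace will only run while the namenode is in safe mode, so a typical manual metadata backup looks like this (Hadoop 1 command names; a sketch, run on the namenode host):

```shell
# Enter safe mode so the namespace is quiescent, checkpoint it, then leave.
hadoop dfsadmin -safemode enter
hadoop dfsadmin -saveNamespace    # writes a fresh fsimage into dfs.name.dir
hadoop dfsadmin -safemode leave
```
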

HADOOP ADMINISTRATION - Checkpoint node / Secondary Name Node configuration

Secondary Name Node or Checkpoint Node:
  • The Secondary Name Node takes metadata/namespace backups of the name node in Hadoop 1.
  • It is not a hot standby of the Name Node.
  • It connects to the Name Node every hour (the interval is configurable).
  • The saved metadata can be used to rebuild a failed name node.

A newly formatted namenode creates the following directory structure:

                          ${dfs.name.dir}/current/VERSION
                                                 /edits
                                                 /fsimage
                                                 /fstime

Filesystem image and edit log : 

When a filesystem client performs a write operation (such as creating or moving a file), it is first recorded in the edit log. The namenode also has an in-memory representation of the filesystem metadata, which it updates after the edit log has been modified. The in-memory metadata is used to serve read requests.

The edit log is flushed and synced after every write before a success code is returned to the client. For namenodes that write to multiple directories, the write must be flushed and synced to every copy before returning successfully. This ensures that no operation is lost due to machine failure.

The fsimage file is a persistent checkpoint of the filesystem metadata. However, it is not updated for every filesystem write operation, since writing out the fsimage file, which can grow to be gigabytes in size, would be very slow. This does not compromise resilience, because if the namenode fails, the latest state of its metadata can be reconstructed by loading the fsimage from disk into memory and then applying each of the operations in the edit log. In fact, this is precisely what the namenode does when it starts up.

The fsimage file contains a serialized form of all the directory and file inodes in the filesystem. Each inode is an internal representation of a file or directory's metadata and contains such information as the file's replication level, modification and access times, access permissions, block size, and the blocks a file is made up of. For directories, the modification time, permissions, and quota metadata is stored.

The fsimage file does not record the datanodes on which the blocks are stored. Instead the namenode keeps this mapping in memory, which it constructs by asking the datanodes for their block lists when they join the cluster and periodically afterward to ensure the namenode’s block mapping is up-to-date.

As described, the edits file would grow without bound. Though this state of affairs would have no impact on the system while the namenode is running, if the namenode were restarted, it would take a long time to apply each of the operations in its (very long) edit log. During this time, the filesystem would be offline, which is generally
undesirable.

The solution is to run the secondary namenode, whose purpose is to produce checkpoints of the primary’s in-memory filesystem metadata.

The checkpointing process proceeds as follows:


1. The secondary asks the primary to roll its edits file, so new edits go to a new file.
2. The secondary retrieves fsimage and edits from the primary (using HTTP GET).
3. The secondary loads fsimage into memory, applies each operation from edits, then creates a new consolidated fsimage file.
4. The secondary sends the new fsimage back to the primary (using HTTP POST).
5. The primary replaces the old fsimage with the new one from the secondary, and the old edits file with the new one it started in step 1. It also updates the fstime file to record the time that the checkpoint was taken.


At the end of the process, the primary has an up-to-date fsimage file and a shorter edits file (it is not necessarily empty, as it may have received some edits while the checkpoint was being taken). It is possible for an administrator to run this process manually while the namenode is in safe mode, 
using the hadoop dfsadmin -saveNamespace command.

This procedure makes it clear that the secondary namenode needs capacity (particularly memory) comparable to the namenode's, since it loads the entire fsimage into memory to merge it.


Secondary namenode directory structure :

The directory structure of the Secondary Name Node will be:
                                    ${fs.checkpoint.dir}/current/VERSION
                                                                /edits
                                                                /fsimage
                                                                /fstime
                                        /previous.checkpoint/VERSION
                                                            /edits
                                                            /fsimage
                                                            /fstime


The layout of this directory and of the secondary’s current directory is identical to the namenode’s. This is by design, since in the event of total namenode failure (when there are no recoverable backups, even from NFS), it allows recovery from a secondary.

hdfs-site.xml

                                 <property>
                                     <name>fs.checkpoint.dir</name>
                                     <value>/data/sec</value>
                                  </property>


                        Making multiple copies of the namespace
                             <property>
                                    <name>dfs.name.dir</name>
                                    <value>/disk1/copy1,/disk2/copy2</value>
                             </property>

If the namenode goes down because one disk has failed, how do we make use of the other copy?

Add

                            <property>
                                   <name>fs.checkpoint.dir</name>
                                   <value>/disk2/copy2</value>
                             </property>

Then, instead of the namenode format command, use the command below to copy the namespace back to disk1:

                      hadoop namenode -importCheckpoint
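A hedged sketch of the recovery, assuming the surviving checkpoint copy lives in the fs.checkpoint.dir configured above:

```shell
# importCheckpoint loads the image from fs.checkpoint.dir and saves it
# into dfs.name.dir. It aborts if dfs.name.dir already holds a namespace
# image, so dfs.name.dir must point at an empty (or new) directory first.
hadoop namenode -importCheckpoint
```
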

To force a checkpoint manually:

                      hadoop secondarynamenode -checkpoint force


Secondary Name Node related configuration parameters : 

fs.checkpoint.period - 3600 (seconds between checkpoints)
fs.checkpoint.dir - /data/sec (where the secondary stores its checkpoints)
fs.checkpoint.edits.dir - /data/scedits (where the secondary stores retrieved edits)
dfs.secondary.http.address - ip:port (HTTP address the secondary listens on)
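Put together, the secondary's checkpoint settings might look like the fragment below (the values are the illustrative ones above; 0.0.0.0:50090 is the usual Hadoop 1 default for the secondary's HTTP address):

```xml
<property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>
</property>
<property>
    <name>fs.checkpoint.dir</name>
    <value>/data/sec</value>
</property>
<property>
    <name>fs.checkpoint.edits.dir</name>
    <value>/data/scedits</value>
</property>
<property>
    <name>dfs.secondary.http.address</name>
    <value>0.0.0.0:50090</value>
</property>
```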
