Installing Hadoop Using Apache Ambari and Amazon EC2 – Part 1

To run Hadoop on Amazon EC2, I used the Apache Ambari installation wizard to install it. According to its documentation, Apache Ambari provides an end-to-end management and monitoring application for Apache Hadoop. Ambari also provides a graphical user interface (GUI) to deploy and operate a complete Hadoop stack, manage configuration changes, monitor services, and create alerts for all the nodes in your cluster from a central point.
My Apache Ambari configuration uses six 64-bit Amazon Linux AMI instances:
– m1.medium ambarimaster, which we will call p1_mar24
– m1.large hdpmaster1, which we will call p2_mar24
– m1.large hdpmaster2, which we will call p3_mar24
– m1.medium hdpslave1, which we will call p4_mar24
– m1.medium hdpslave2, which we will call p5_mar24
– m1.medium hdpslave3, which we will call p6_mar24
Here is what it looks like:

Now let’s configure everything step by step:
1) Connect to your first EC2 instance using ssh. I used the Cygwin terminal on my Windows 7 machine.
First, I downloaded my key file from Amazon EC2 as pankaj_east_hadoop_20130324.pem. Then I copied that key file into the Cygwin home directory I wanted to work in, /home/pankaj/20130324.

$ pwd
/home/pankaj/20130324
$ ls -ltra
total 4
-r-------- 1 pankaj mkgroup 1696 Mar 24 14:19 pankaj_east_hadoop_20130324.pem
drwxr-xr-x+ 1 pankaj mkgroup 0 Mar 24 14:41 ..
drwx------+ 1 pankaj mkgroup 0 Mar 24 14:56 .ssh
drwxr-xr-x+ 1 pankaj mkgroup 0 Mar 24 15:04 .
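If you repeat this setup often, the copy step above can be scripted. The following is a minimal sketch, assuming a POSIX shell with GNU coreutils (as in Cygwin); the function name stage_key is hypothetical. It uses install -m 400 to copy the key and make it owner-read-only in one step, which also heads off the ssh permission error shown next.

```shell
#!/bin/sh
# Hypothetical helper: copy a downloaded EC2 key file into a working
# directory and restrict it to owner-read-only (mode 0400) in one step.

stage_key() {
    src=$1      # path of the downloaded .pem file
    workdir=$2  # working directory, e.g. /home/pankaj/20130324
    if [ ! -f "$src" ]; then
        echo "stage_key: no such file: $src" >&2
        return 1
    fi
    mkdir -p "$workdir" || return 1
    # install copies the file and sets its mode in a single command
    install -m 400 "$src" "$workdir/" || return 1
    # print the path of the staged key
    echo "$workdir/$(basename "$src")"
}
```

For example: stage_key ~/Downloads/pankaj_east_hadoop_20130324.pem /home/pankaj/20130324, where the Downloads path is only an assumption about where your browser saved the file.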

Once this is done, open a new Cygwin terminal window, go to /home/pankaj/20130324 and launch ssh with the following command:
ssh -i pankaj_east_hadoop_20130324.pem ec2-user@ec2-174-129-88-73.compute-1.amazonaws.com
At first, this command fails as shown below:

$ ssh -i pankaj_east_hadoop_20130324.pem ec2-user@ec2-174-129-88-73.compute-1.amazonaws.com
The authenticity of host 'ec2-174-129-88-73.compute-1.amazonaws.com (174.129.88.73)' can't be established.
RSA key fingerprint is eb:e8:f8:35:23:f1:31:cf:29:82:82:fa:eb:4a:3d:b3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ec2-174-129-88-73.compute-1.amazonaws.com,174.129.88.73' (RSA) to the list of known hosts.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: UNPROTECTED PRIVATE KEY FILE! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0755 for 'pankaj_east_hadoop_20130324.pem' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
bad permissions: ignore key: pankaj_east_hadoop_20130324.pem
Permission denied (publickey).

The reason is the permissions on your .pem file: ssh ignores a private key that other users can access. So run the following command to make it readable only by you:

$ chmod 400 pankaj_east_hadoop_20130324.pem
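Before retrying ssh, you can confirm the new mode. This is a hypothetical helper (check_key_perms), sketched under the assumption of GNU stat as found on Linux and Cygwin; it applies the same rule ssh enforced above, namely that the key must grant no access at all to group or others.

```shell
#!/bin/sh
# Hypothetical pre-flight check mirroring ssh's "UNPROTECTED PRIVATE KEY FILE"
# error. Assumes GNU stat; BSD/macOS stat uses different flags.

check_key_perms() {
    key=$1
    perms=$(stat -c %A "$key") || return 2   # symbolic mode, e.g. -r--------
    rest=${perms#????}                       # drop file type + owner bits, keep group/other
    if [ "$rest" = "------" ]; then
        echo "ok: $key is $perms"
        return 0
    fi
    echo "too open: $key is $perms (fix with: chmod 400 $key)"
    return 1
}
```

For example, check_key_perms pankaj_east_hadoop_20130324.pem should report "ok" after the chmod 400 above.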

This fixes the permission problem, and ssh now connects to the remote Linux instance:

$ ssh -i pankaj_east_hadoop_20130324.pem ec2-user@ec2-174-129-88-73.compute-1.amazonaws.com

__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2012.09-release-notes/

There are 13 security update(s) out of 24 total update(s) available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-10-62-97-105 ~]$

You are now connected to the EC2 instance as ec2-user. If you need to run commands as root, use the “sudo su” command to switch to a root shell.
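Typing the full host name and the -i flag every time gets tedious. One option, not part of the original setup, is an entry in ~/.ssh/config on your desktop. The sketch below only prints such an entry; the function name ssh_config_entry is hypothetical, while Host, HostName, User and IdentityFile are standard OpenSSH client keywords.

```shell
#!/bin/sh
# Hypothetical helper: print an OpenSSH client config block so an EC2
# instance can be reached by a short alias. Nothing is written to disk.

ssh_config_entry() {
    alias_name=$1   # short alias, e.g. p1_mar24
    host=$2         # EC2 public DNS name
    key=$3          # path to the .pem key file
    printf 'Host %s\n' "$alias_name"
    printf '    HostName %s\n' "$host"
    printf '    User ec2-user\n'
    printf '    IdentityFile %s\n' "$key"
}
```

For example, appending the output of ssh_config_entry p1_mar24 ec2-174-129-88-73.compute-1.amazonaws.com /home/pankaj/20130324/pankaj_east_hadoop_20130324.pem to ~/.ssh/config should let you connect with just ssh p1_mar24.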
This instance will be used as our Ambari server host.
Next, generate a public/private SSH key pair on this Ambari Server host:

[ec2-user@ip-10-62-97-105 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/ec2-user/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/ec2-user/.ssh/id_rsa.
Your public key has been saved in /home/ec2-user/.ssh/id_rsa.pub.
The key fingerprint is:
2e:14:e4:2d:d2:5b:2f:fa:0b:9d:f1:47:71:bf:00:47 ec2-user@ip-10-62-97-105
The key's randomart image is:
+--[ RSA 2048]----+
| . E |
| + . . |
| . = o .... |
| . = . oo . |
| o S . .. .|
| . + = . . .|
| + + . . . |
| + . |
| o. |
+-----------------+
[ec2-user@ip-10-62-97-105 ~]$ ls -ltra
total 24
-rw-r--r-- 1 ec2-user ec2-user 124 May 22 2012 .bashrc
-rw-r--r-- 1 ec2-user ec2-user 176 May 22 2012 .bash_profile
-rw-r--r-- 1 ec2-user ec2-user 18 May 22 2012 .bash_logout
drwxr-xr-x 3 root root 4096 Feb 15 23:51 ..
drwx------ 3 ec2-user ec2-user 4096 Mar 24 18:28 .
drwx------ 2 ec2-user ec2-user 4096 Mar 24 18:47 .ssh
[ec2-user@ip-10-62-97-105 ~]$ cd .ssh
[ec2-user@ip-10-62-97-105 .ssh]$ ls -ltra
total 20
-rw------- 1 ec2-user ec2-user 409 Mar 24 18:28 authorized_keys
drwx------ 3 ec2-user ec2-user 4096 Mar 24 18:28 ..
-rw-r--r-- 1 ec2-user ec2-user 406 Mar 24 18:47 id_rsa.pub
-rw------- 1 ec2-user ec2-user 1671 Mar 24 18:47 id_rsa
drwx------ 2 ec2-user ec2-user 4096 Mar 24 18:47 .
[ec2-user@ip-10-62-97-105 .ssh]$
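The interactive ssh-keygen session above can also be run non-interactively, which helps when scripting. A minimal sketch, assuming OpenSSH's ssh-keygen: -t rsa -b 2048 matches the RSA 2048 key shown, -N "" gives the empty passphrase used above, -f names the output file and -q suppresses the progress output. The wrapper gen_key_if_missing is hypothetical and deliberately refuses to overwrite an existing key, since other hosts may already trust it.

```shell
#!/bin/sh
# Hypothetical wrapper around ssh-keygen: create an RSA 2048 key pair with
# an empty passphrase, but never overwrite an existing key.

gen_key_if_missing() {
    keyfile=${1:-$HOME/.ssh/id_rsa}
    if [ -e "$keyfile" ] || [ -e "$keyfile.pub" ]; then
        echo "key already exists: $keyfile (not overwriting)"
        return 0
    fi
    keydir=$(dirname "$keyfile")
    mkdir -p "$keydir" || return 1
    chmod 700 "$keydir" || return 1          # ssh expects a private key directory
    ssh-keygen -q -t rsa -b 2048 -N "" -f "$keyfile" || return 1
    echo "created: $keyfile and $keyfile.pub"
}
```

Running gen_key_if_missing with no argument targets ~/.ssh/id_rsa, the same file the transcript created.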

Now download the “id_rsa.pub” file from the “.ssh” directory of your home folder to your laptop or desktop by running the following command in a separate Cygwin terminal window:

$ pwd
/home/pankaj/20130324
$ scp -i pankaj_east_hadoop_20130324.pem ec2-user@ec2-174-129-88-73.compute-1.amazonaws.com:/home/ec2-user/.ssh/id_rsa.pub .
id_rsa.pub 100% 406 0.4KB/s 00:00
$ ls
id_rsa.pub pankaj_east_hadoop_20130324.pem
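Before distributing the file, it is worth a quick check that you downloaded the public key and not id_rsa, the private key. The sketch below (looks_like_pubkey is a hypothetical name) assumes the single-line "ssh-rsa <key> <comment>" format that ssh-keygen produced above; it is a plausibility check, not a full key validator.

```shell
#!/bin/sh
# Hypothetical sanity check: is this file a single-line ssh-rsa public key?
# A private key spans many lines and starts with "-----BEGIN", so it fails.

looks_like_pubkey() {
    f=$1
    [ -f "$f" ] || { echo "missing: $f"; return 1; }
    # grep -c '' also counts a final line without a trailing newline
    lines=$(grep -c '' "$f" || true)
    if [ "$lines" -ne 1 ]; then
        echo "expected 1 line, found $lines: $f"
        return 1
    fi
    case $(cut -d' ' -f1 "$f") in
        ssh-rsa) echo "ok: $f" ;;
        *) echo "not an ssh-rsa public key: $f"; return 1 ;;
    esac
}
```

For example, looks_like_pubkey id_rsa.pub should print "ok: id_rsa.pub".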

Next, upload the downloaded public key file “id_rsa.pub” to each of the remaining instances, one at a time, with the following command (changing the host name for each instance):

$ scp -i pankaj_east_hadoop_20130324.pem ./id_rsa.pub ec2-user@ec2-204-236-208-203.compute-1.amazonaws.com:/home/ec2-user/.ssh/
id_rsa.pub 100% 406 0.4KB/s 00:00
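Uploading to each host by hand works, but a loop is less error-prone with five target instances. This is a sketch, not the original procedure: push_pubkey is a hypothetical function, the host names you pass (other than the ones shown above) are yours to fill in, and by default it only prints the scp commands (DRY_RUN=1) so you can review them before running them for real with DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical loop: copy the Ambari server's public key to several hosts.
# DRY_RUN defaults to 1, which only prints the scp commands.

push_pubkey() {
    pem=$1; pub=$2; shift 2          # remaining arguments are EC2 host names
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "scp -i $pem $pub ec2-user@$host:/home/ec2-user/.ssh/"
        else
            scp -i "$pem" "$pub" "ec2-user@$host:/home/ec2-user/.ssh/" || return 1
        fi
    done
}
```

For example: push_pubkey pankaj_east_hadoop_20130324.pem ./id_rsa.pub ec2-204-236-208-203.compute-1.amazonaws.com, followed by the public DNS names of the other instances.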

———————————————————————————————————————————–
Now ssh into the second instance from a new Cygwin terminal window and check its “.ssh” directory. Then append the uploaded key to authorized_keys and tighten the file permissions, as below:

$ ssh -i pankaj_east_hadoop_20130324.pem ec2-user@ec2-204-236-208-203.compute-1.amazonaws.com
The authenticity of host 'ec2-204-236-208-203.compute-1.amazonaws.com (204.236.208.203)' can't be established.
RSA key fingerprint is 16:cf:09:4f:f2:0b:d2:8c:76:66:5c:76:33:eb:d0:df.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ec2-204-236-208-203.compute-1.amazonaws.com,204.236.208.203' (RSA) to the list of known hosts.

__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2012.09-release-notes/

There are 13 security update(s) out of 24 total update(s) available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-10-118-74-223 ~]$ pwd
/home/ec2-user
[ec2-user@ip-10-118-74-223 ~]$ cd .ssh
[ec2-user@ip-10-118-74-223 .ssh]$ ls -ltra
total 16
drwx------ 3 ec2-user ec2-user 4096 Mar 24 18:28 ..
drwx------ 2 ec2-user ec2-user 4096 Mar 24 19:13 .
-rw-r--r-- 1 ec2-user ec2-user 406 Mar 24 19:13 id_rsa.pub
[ec2-user@ip-10-118-74-223 .ssh]$ cat id_rsa.pub >> authorized_keys
[ec2-user@ip-10-118-74-223 .ssh]$ chmod 640 authorized_keys
[ec2-user@ip-10-118-74-223 .ssh]$ chmod 640 id_rsa.pub
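The cat >> authorized_keys step above appends the key again every time it is run. A re-runnable variant is sketched below; authorize_key is a hypothetical name. It sets 700 on ~/.ssh and 600 on authorized_keys, the modes those files already had on the first instance. The chmod 640 used above should also be accepted, since sshd's StrictModes check objects to files writable by group or others, not to group-readable ones.

```shell
#!/bin/sh
# Hypothetical, re-runnable version of "cat id_rsa.pub >> authorized_keys":
# append the key only if that exact line is not already present.

authorize_key() {
    pub=$1                          # uploaded public key, e.g. ~/.ssh/id_rsa.pub
    sshdir=${2:-$HOME/.ssh}
    auth="$sshdir/authorized_keys"
    key=$(cat "$pub") || return 1
    mkdir -p "$sshdir" && chmod 700 "$sshdir" || return 1
    if [ -f "$auth" ] && grep -qxF "$key" "$auth"; then
        echo "already authorized"
    else
        # make sure an existing file ends with a newline before appending
        if [ -s "$auth" ] && [ -n "$(tail -c 1 "$auth")" ]; then
            echo >> "$auth"
        fi
        printf '%s\n' "$key" >> "$auth" || return 1
        echo "key added"
    fi
    chmod 600 "$auth"
}
```

On each target instance, authorize_key ~/.ssh/id_rsa.pub prints "key added" the first time and "already authorized" afterwards.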

——————————————————————————————————————————————
Once the above configuration is done on the second instance, go back to the Cygwin terminal window for the first instance.

[ec2-user@ip-10-62-97-105 .ssh]$ cd ..
[ec2-user@ip-10-62-97-105 ~]$ chmod 700 .ssh
[ec2-user@ip-10-62-97-105 ~]$ cd .ssh
[ec2-user@ip-10-62-97-105 .ssh]$ ls -lta
total 24
drwx------ 2 ec2-user ec2-user 4096 Mar 24 19:15 .
-rw-r--r-- 1 ec2-user ec2-user 884 Mar 24 19:15 known_hosts
-rw------- 1 ec2-user ec2-user 815 Mar 24 18:50 authorized_keys
-rw------- 1 ec2-user ec2-user 1671 Mar 24 18:47 id_rsa
-rw-r--r-- 1 ec2-user ec2-user 406 Mar 24 18:47 id_rsa.pub
drwx------ 3 ec2-user ec2-user 4096 Mar 24 18:28 ..
[ec2-user@ip-10-62-97-105 .ssh]$ chmod 640 id_rsa.pub
[ec2-user@ip-10-62-97-105 .ssh]$ chmod 640 authorized_keys
[ec2-user@ip-10-62-97-105 .ssh]$ ls -ltra
total 24
drwx------ 3 ec2-user ec2-user 4096 Mar 24 18:28 ..
-rw-r----- 1 ec2-user ec2-user 406 Mar 24 18:47 id_rsa.pub
-rw------- 1 ec2-user ec2-user 1671 Mar 24 18:47 id_rsa
-rw-r----- 1 ec2-user ec2-user 815 Mar 24 18:50 authorized_keys
-rw-r--r-- 1 ec2-user ec2-user 884 Mar 24 19:15 known_hosts
drwx------ 2 ec2-user ec2-user 4096 Mar 24 19:15 .
[ec2-user@ip-10-62-97-105 ~]$ ssh ec2-user@ec2-204-236-208-203.compute-1.amazonaws.com
Last login: Sun Mar 24 19:11:27 2013 from bas1-malton23-2925222199.dsl.bell.ca

__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2012.09-release-notes/

There are 13 security update(s) out of 24 total update(s) available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-10-118-74-223 ~]$
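The chmod 700 and chmod 640 commands on the first instance relate to sshd's StrictModes check: if ~/.ssh or authorized_keys is writable by group or others, key-based login is refused. The following sketch (audit_ssh_dir is a hypothetical name; it assumes GNU stat) reports those two problems and prints nothing when the permissions are acceptable. sshd also checks the home directory itself, which this sketch does not.

```shell
#!/bin/sh
# Hypothetical audit: flag ~/.ssh or authorized_keys if writable by group
# or others. Assumes GNU stat. Prints nothing and returns 0 when all is well.

audit_ssh_dir() {
    dir=${1:-$HOME/.ssh}
    rc=0
    for p in "$dir" "$dir/authorized_keys"; do
        [ -e "$p" ] || continue
        perms=$(stat -c %A "$p")                 # e.g. drwx------
        # characters 6 and 9 are the group and other write bits
        gw=$(printf '%s' "$perms" | cut -c6)
        ow=$(printf '%s' "$perms" | cut -c9)
        if [ "$gw" = w ] || [ "$ow" = w ]; then
            echo "writable by group/others: $p ($perms)"
            rc=1
        fi
    done
    return $rc
}
```

Running audit_ssh_dir on each instance after the chmod steps should print nothing.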

You are now connected to the second instance from the first instance without providing a password.
Repeat the same configuration for each of the remaining instances (p3_mar24 through p6_mar24).
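To confirm that the Ambari server can reach every host without a password, you can loop over them with ssh's BatchMode option, which makes ssh fail rather than prompt. This is a sketch with a hypothetical function name (check_passwordless); with DRY_RUN=1 (the default) it only prints the commands. Because BatchMode also disables the host key confirmation prompt, each host should already be in known_hosts (for example, after one manual login as shown above), otherwise it will be reported as FAIL.

```shell
#!/bin/sh
# Hypothetical check, run on the Ambari server: does each host accept
# key-based login with no prompt? DRY_RUN defaults to 1 (print only).

check_passwordless() {
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "ssh -o BatchMode=yes -o ConnectTimeout=5 ec2-user@$host true"
        elif ssh -o BatchMode=yes -o ConnectTimeout=5 "ec2-user@$host" true; then
            echo "OK   $host"
        else
            echo "FAIL $host"
        fi
    done
}
```

For example, DRY_RUN=0 check_passwordless ec2-204-236-208-203.compute-1.amazonaws.com should print "OK" for the second instance once the steps above are done.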

I will cover the rest of the installation in Part 2.
Credits: Adam Muise from the Toronto Hadoop User Group, who works as a Solutions Engineer at Hortonworks.
