Set up a multi-node Hadoop Cluster using Ansible | by Gaurav Gupta

Gaurav Gupta
Apr 1, 2021

Ansible is a radically simple IT automation engine that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs.

Provisioning a Hadoop cluster manually takes a lot of time and is prone to errors, so we need a configuration management tool that does this job more reliably than a human.

I assume you have Ansible installed on either your workstation or an Amazon EC2 instance. Ansible has great documentation for installation and getting started:

http://docs.ansible.com/intro_installation.html

http://docs.ansible.com/intro_getting_started.html

Task:

  • Create an Ansible collection for Hadoop configuration.

Prerequisites:

  • Ansible installed on your machine.

Tested On:

  • Red Hat
  • Amazon Linux 2

Steps:

1. Define Hosts in Ansible Inventory file:

In your inventory file, define two host groups:

  • master
  • slave
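As a rough sketch of what such an inventory could look like, the snippet below writes a small INI-style file with the two groups named above. The file name `inventory.ini` and the IP addresses are placeholders I made up for illustration; replace them with your own hosts, and add connection settings (SSH user, key file) as your environment requires.

```shell
# Write a hypothetical inventory file with the two host groups.
# The addresses are documentation placeholders, not real hosts.
cat > inventory.ini <<'EOF'
[master]
192.0.2.10

[slave]
192.0.2.11
192.0.2.12
EOF

# Print the group headers to confirm both groups are defined.
grep -E '^\[' inventory.ini
```

With Ansible installed, `ansible-inventory -i inventory.ini --graph` should then show both groups and the hosts under them.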

2. Download Ansible collection from Ansible Galaxy for Cluster Configuration:

To check the content of the collection, see the links at the end of this post.

Run this command to download the collection on the Ansible control node (the machine where Ansible is installed):

ansible-galaxy collection install gaurav_gupta_gtm.ansible_hadoop

The downloaded collection contains multiple files…
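To look at those files yourself, something like the following may help. It assumes the collection was installed to the default per-user location that `ansible-galaxy` uses when no custom collections path is configured; if you installed it elsewhere (for example with `-p`), adjust the path accordingly.

```shell
# Default per-user install location (assumes no custom collections path is set).
COLLECTION_DIR="$HOME/.ansible/collections/ansible_collections/gaurav_gupta_gtm/ansible_hadoop"

# List the collection's top-level files, or report that it is not there.
if [ -d "$COLLECTION_DIR" ]; then
  ls "$COLLECTION_DIR"
else
  echo "Collection not found at $COLLECTION_DIR"
fi
```

On Ansible 2.10 or later, `ansible-galaxy collection list` also prints the installed collections together with the path they were found in.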

3. Run the playbook:

To run the playbook, go to the directory where the collection was downloaded and then run:

ansible-playbook playbooks/deploy-hadoop.yml
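If your hosts live in a separate inventory file rather than the default one, a slightly more defensive invocation might look like this. The inventory name `inventory.ini` is a hypothetical placeholder; the playbook path is the one from the command above. The sketch checks the playbook's syntax first and only then runs it.

```shell
# Hypothetical inventory file name; the playbook path comes from the collection.
INVENTORY="inventory.ini"
PLAYBOOK="playbooks/deploy-hadoop.yml"

if command -v ansible-playbook >/dev/null 2>&1 && [ -f "$PLAYBOOK" ]; then
  # Validate the playbook first, then run it only if the check passes.
  ansible-playbook -i "$INVENTORY" "$PLAYBOOK" --syntax-check &&
    ansible-playbook -i "$INVENTORY" "$PLAYBOOK"
else
  echo "ansible-playbook or $PLAYBOOK not found; run this from the collection directory"
fi
```

The syntax check is optional, but it catches YAML mistakes before any change is made on the nodes.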

And finally, our Hadoop Cluster is up and running…

We can check by connecting to the master node over SSH…

On the master node, run the command below to see the number of connected nodes…

hadoop dfsadmin -report
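The report is fairly long, so if you only want the node count, a small helper like the one below can pull it out. The function name `count_live_datanodes` is my own, and the two line formats it looks for are assumptions: as far as I know, Hadoop 2.x and later print `Live datanodes (N):`, while 1.x releases print `Datanodes available: N (...)`. Check the report from your own cluster and adjust the patterns if it differs.

```shell
# Extract the DataNode count from a dfsadmin report read on stdin.
# Handles two assumed formats: "Live datanodes (N):" and "Datanodes available: N (...)".
count_live_datanodes() {
  sed -n \
    -e 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p' \
    -e 's/^Datanodes available: \([0-9][0-9]*\) .*/\1/p' | head -n 1
}

# Run the live check only where the hadoop command is available.
if command -v hadoop >/dev/null 2>&1; then
  hadoop dfsadmin -report | count_live_datanodes
fi
```

The printed number should match the number of hosts in your slave group once every DataNode has registered with the master.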

GitHub Link for Content:

Ansible Galaxy Link:

https://galaxy.ansible.com/gaurav_gupta_gtm/ansible_hadoop

Do clap if you found this worthwhile…👏🤗

Feel free to connect on LinkedIn…😊

If you have any issue related to this task, please DM me…
