A common compute and data transformation service is Apache Spark. We have set up a development cluster in the NDS-Labs project under Nebula to help test interactions between the NDS Workbench and this technology.


For this test I used the Hortonworks Data Platform (HDP) to create a simple three-node cluster: one master and two slave nodes.

Preparing the Cluster

In Nebula I provisioned three m4.large instances from the latest CentOS 7 image. I named them:

  • nds-spark-master
  • nds-spark-slave-1
  • nds-spark-slave-2

I assigned a floating IP address to the master and added it to the following security groups (an equivalent OpenStack CLI sketch follows the list):

  • Default
  • SSH
  • Extended HTTPS
  • Zeppelin (which opens up port 9995)
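
These steps were done through the Nebula (OpenStack Horizon) dashboard. If you prefer the OpenStack CLI, a rough equivalent is sketched below; the flavor, image, key pair, and external network names are assumptions and will likely differ in your Nebula project.

# provision the three instances
% openstack server create --flavor m4.large --image "CentOS 7 Latest" --key-name my-keypair nds-spark-master
% openstack server create --flavor m4.large --image "CentOS 7 Latest" --key-name my-keypair nds-spark-slave-1
% openstack server create --flavor m4.large --image "CentOS 7 Latest" --key-name my-keypair nds-spark-slave-2
# give the master a floating IP and add it to the security groups listed above
% openstack floating ip create ext-net
% openstack server add floating ip nds-spark-master <floating-ip>
% openstack server add security group nds-spark-master default
% openstack server add security group nds-spark-master SSH
% openstack server add security group nds-spark-master "Extended HTTPS"
% openstack server add security group nds-spark-master Zeppelin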

On each host I ran

% sudo yum update

to bring the image up to the latest releases of everything.

I copied my private key file to /home/centos/.ssh so I could use this publicly accessible node as a bastion server to SSH into the slave nodes, which are not connected to the internet.
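
A minimal sketch of that workflow, assuming the key file is named nds-spark.pem (yours will be named differently) and the slaves resolve via the /etc/hosts entries described in the next section:

# from your workstation: copy the key up and hop onto the master
% scp -i nds-spark.pem nds-spark.pem centos@<master-floating-ip>:.ssh/
% ssh -i nds-spark.pem centos@<master-floating-ip>
# from the master: reach the internal-only slaves
% ssh -i ~/.ssh/nds-spark.pem centos@nds-spark-slave-1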

Preparing for Installation of Hortonworks

I basically followed the HDP 2.6 installation instructions to

  • Create custom /etc/hosts entries to give the servers friendly names
  • Install the network time daemon (ntpd) to keep the clocks synchronized
  • Configure the network security settings (SELinux and the firewall) so the install can complete
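
A rough sketch of these preparation steps, run on each node. The internal IP addresses below are placeholders; use the actual private addresses Nebula assigned.

# friendly names for all three hosts
% sudo tee -a /etc/hosts <<'EOF'
10.0.0.10 nds-spark-master
10.0.0.11 nds-spark-slave-1
10.0.0.12 nds-spark-slave-2
EOF
# keep the clocks in sync
% sudo yum install -y ntp
% sudo systemctl enable ntpd
% sudo systemctl start ntpd
# relax SELinux and the firewall so the Ambari agents can communicate during the install
% sudo setenforce 0
% sudo systemctl stop firewalld
% sudo systemctl disable firewalld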

Installing Hortonworks Data Platform 2.6.2

HDP is installed using the Ambari server. I followed the instructions for installing Ambari to set up the server and ran the cluster install wizard to provision the cluster.
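
On the master node the Ambari installation boils down to something like the following sketch. The repository URL is only a placeholder pattern; use the exact URL for your Ambari/HDP versions from the Hortonworks documentation.

% sudo curl -o /etc/yum.repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/<ambari-version>/ambari.repo
% sudo yum install -y ambari-server
% sudo ambari-server setup
% sudo ambari-server start

Once the server is running, the cluster wizard is reached through the Ambari web UI described below.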

You will need to make sure that the two slave nodes are assigned to host DataNodes and NameNodes. The master node needs to have all of the client software installed and host all of the other services since it is acting as your edge node with access to the internet.

You can log into your Ambari instance at the IP address of your master, on port 8080. Log in as admin/admin.

Once Ambari has created the cluster you will need to go through the services that are highlighted in red to set passwords for the service accounts.

Securing the Cluster

Your cluster is up, with the Ambari and Zeppelin web interfaces open and still using their default passwords! Let's lock these consoles down.

Secure Ambari Admin User

Log into Ambari with admin/admin and select "Manage Ambari" from the admin user menu.

Click on the "Users" button in "Manage Users + Groups"

and edit the admin user. Click on "Change Password" to give the admin account a secure password.


Securing Zeppelin User Accounts

Zeppelin has several ways to manage users and passwords. The default is to use locally configured Shiro security. To clear out the default users and create your own secure users, go to the Ambari dashboard and select the Zeppelin service in the left-hand menu:

Select the "Configs" tab at the top of the Zeppelin page and expand "Advanced zeppelin-shiro-ini" section of the page

There are three rows in this config file that define entries for an admin user and two end users. Delete these rows and add new rows of the form

<username>=<password>, role1
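
For example, the [users] block might end up looking something like this; the usernames, passwords, and role assignments here are purely illustrative, not the values used on the real cluster:

[users]
admin = <a-strong-admin-password>, admin
nds = <another-strong-password>, role1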

Click on the "Save" button at the top of the page. Enter an explanation of what you did and "Save" to dismiss the dialog. Ambari will notifiy you that you need to restart Zeppelin for the change to take effect. Allow it to restart all impacted services. Once the restart is complete, you should be able to access Zeppelin on the master's IP address on port 9995.


Using the Cluster

You can interact with the cluster by ssh'ing into the master node as user centos.

To manage HDFS it's best to su to the hdfs user and issue the management command-line statements. I created an nds user and a home directory for it in HDFS.
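
Creating that user and its HDFS home directory looks roughly like this (the nds username is the one mentioned above; the group assignment is a reasonable default, not a requirement):

# create the Linux account on the master
% sudo useradd nds
# switch to the hdfs superuser to manage the filesystem
% sudo su - hdfs
% hdfs dfs -mkdir /user/nds
% hdfs dfs -chown nds:hdfs /user/nds
% exit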

You can interact directly with Spark using the % spark-shell command.
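
As a quick sanity check, a short spark-shell session like the one sketched below confirms that jobs actually run (the output line is approximate):

% spark-shell
scala> val rdd = sc.parallelize(1 to 1000)
scala> rdd.sum()
res0: Double = 500500.0
scala> :quit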



