# "Hadoop as a Service" #
I have started a Hadoop as a Service project as a proof of concept that a cluster for a distributed computing framework can be deployed dynamically by [hurtle](http://hurtle.it). There are several things to take into account in order to make this work. Here, I'm going to share some notes and experiences from the development. Anyone who would like to reproduce my results can use this page as a guide.
A major shortcoming of the current implementation is that the master node does not wait for all slaves to be set up and ready. Even though the instances are certainly created, it is not assured that the SSH keys have already been inserted into them by the time the master node tries to connect. It is implicitly assumed that the slave nodes will be ready by the time the download of the required frameworks on the master has completed. This problem could be circumvented by having the master node verify each slave node's readiness before setup, for example by using OpenStack's SSH key provisioning or a similar mechanism.
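One possible readiness check is sketched below: before starting its setup, the master polls each slave's SSH port until it accepts connections. This is a minimal sketch and not part of the current SO; the host names, port and timeout are assumptions.

```python
import socket
import time

def wait_for_ssh(hosts, port=22, timeout=600, interval=5):
    """Block until every host accepts TCP connections on `port` (i.e. sshd is up)."""
    deadline = time.time() + timeout
    pending = set(hosts)
    while pending and time.time() < deadline:
        for host in list(pending):
            try:
                # A completed TCP handshake on the SSH port is a good sign
                # that cloud-init has finished and the keys are in place.
                conn = socket.create_connection((host, port), timeout=interval)
                conn.close()
                pending.discard(host)
            except socket.error:
                pass  # not ready yet, try again on the next round
        if pending:
            time.sleep(interval)
    if pending:
        raise RuntimeError('slaves not reachable: %s' % ', '.join(sorted(pending)))

# hypothetical usage on the master before setup starts:
# wait_for_ssh(['slave-0', 'slave-1', 'slave-2'])
```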
First of all, the Git repository with my code can be found at https://gitlab.switch.ch/Verbaryerba/disco.git . The Service Orchestrator (SO) for hurtle (including its Heat Orchestration Template (HOT) files) and the configuration for its Service Manager (SM) are both included in the repository.
For the sake of simplicity, there are several simplified ways to run the system:
1. The HOT can be generated manually with the SO's functions and deployed manually on OpenStack. I won't cover this method, as it's only necessary if you don't have admin/owner access to your tenant on OpenStack.
2. The SO can be run locally without an SM. In this case, the commands to the SO have to be issued manually from the terminal. For further reference, look at the [sample SO](https://github.com/icclab/hurtle_sample_so).
3. The SM can be started locally. In this case, a server has to be provided in the SM configuration on which the SO instances can be deployed. More information about this is given on hurtle's [SM page](https://github.com/icclab/hurtle_sm). TCP calls still have to be issued from the terminal, but the orchestration is handled by hurtle.
4. I am developing a PHP application which proxies the TCP calls and therefore hides the terminal interface from the end user.
What all cases (except for case 1) have in common: on the machine where hurtle is being executed, OpenStack needs to be accessible for the heatclient. A quick way to check this is sketched below.
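The snippet below uses the heatclient to list the stacks of your tenant as a sanity check; it fails immediately if OpenStack is not reachable. The environment variable names are assumptions - the Heat endpoint and a valid Keystone token have to come from your own installation.

```python
import os

from heatclient.client import Client

# Assumptions: HEAT_URL points at the Heat API endpoint of your tenant and
# OS_AUTH_TOKEN holds a valid Keystone token for it.
heat = Client('1', endpoint=os.environ['HEAT_URL'],
              token=os.environ['OS_AUTH_TOKEN'])

# Listing the existing stacks fails right away if OpenStack is unreachable.
for stack in heat.stacks.list():
    print(stack.stack_name)
```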
## SO local deployment ##
For testing purposes, it's easiest to deploy the SO locally. This means that all state transitions (i.e. initialisation, deployment, provisioning and deletion) have to be triggered manually with the corresponding TCP commands. The SO is in this case just a Python program that performs the requested commands. In this repository, the application can be found at bundle/wsgi/application; it uses the program so.py in the same directory, which executes the requested actions.
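For illustration, triggering the transitions could look like the sketch below, using plain HTTP-over-TCP calls. The port, endpoint path, action parameters and headers are assumptions modelled on the sample SO linked above; check its README for the exact interface.

```python
import requests

# All names below are assumptions for illustration; the real endpoint, port
# and headers are documented in the sample SO.
SO_URL = 'http://localhost:8051/orchestrator/default'
HEADERS = {'X-Auth-Token': '<keystone token>',
           'X-Tenant-Name': '<tenant name>'}

requests.put(SO_URL, headers=HEADERS)                        # initialisation
requests.post(SO_URL + '?action=deploy', headers=HEADERS)    # deployment
requests.post(SO_URL + '?action=provision', headers=HEADERS) # provisioning
requests.delete(SO_URL, headers=HEADERS)                     # deletion
```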
To start the SO, you first have to set the environment variable DESIGN_URI to the OpenStack installation where the cluster is to be deployed. At the moment, the SO accepts no parameters; the number of slaves to be created is hard-coded in the SO (slaveCount in the file bundle/wsgi/so.py). You will see that many other important cluster settings are defined in the same file as well - you should change those accordingly. For the rest of the setup, refer to the "sample SO" link above.
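As an illustration of what to look for, the sketch below shows the kind of settings meant here. Only DESIGN_URI and slaveCount are taken from the text above; the other names and all values are hypothetical placeholders.

```python
import os

# The SO reads the target OpenStack installation from the environment.
DESIGN_URI = os.environ.get('DESIGN_URI')  # e.g. the Keystone endpoint of your tenant

# Hard-coded cluster settings in bundle/wsgi/so.py; slaveCount is the one
# named in the text, the others are hypothetical examples of similar settings.
slaveCount = 3               # number of slave instances to create
masterFlavor = 'm1.medium'   # hypothetical: flavor of the master node
slaveFlavor = 'm1.small'     # hypothetical: flavor of the slave nodes
```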
## SM local deployment ##
If you want to go one step further, you can have your SO deployed by the SM. For this, you'll have to edit the file etc/sm.cfg. Here, it's the location of the service_manifest.json and the design_uri that matter. The bundle_location is set to the location of the current SO, deployed as a Docker image on the Docker Hub. You can modify the SO and create your own Docker image. If you don't run the Cloud Controller (CC) locally (which is usually the case), you have to upload the Docker image to the Docker Hub. The "SM page" link above can lead you through the handling of the SM.
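For orientation, a hypothetical excerpt of etc/sm.cfg could look like this. Only the three keys mentioned above come from the text; the section name and all values are assumptions, so compare with the sm.cfg shipped in the repository:

```ini
; hypothetical excerpt of etc/sm.cfg - section name and values are assumptions
[service_manager]
manifest = /path/to/service_manifest.json
design_uri = http://keystone.example.com:5000/v2.0
bundle_location = mydockeruser/hadoop-as-a-service-so
```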
## Web interface ##
I'm developing a web interface for the service manager which offers an easy-to-understand front end, so that no terminal interaction is needed any more. It needs a set-up SM plus a keystone client on the local machine in order to get a token. As soon as it's usable, I'm going to include it in the above Git repository.
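Getting that token programmatically might look like the following minimal sketch, assuming the python-keystoneclient (v2.0 API) is installed; the credentials and endpoint are placeholders.

```python
from keystoneclient.v2_0 import client as keystone_client

# Placeholder credentials - substitute your own OpenStack account details.
keystone = keystone_client.Client(
    auth_url='http://keystone.example.com:5000/v2.0',
    username='<username>',
    password='<password>',
    tenant_name='<tenant>')

token = keystone.auth_token  # pass this token on to the SM/SO calls
print(token)
```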