OpenShift Origin Installation

Learn how to install an OpenShift cluster on VPS infrastructure like DigitalOcean.

When I wanted to window-shop OpenShift, I had 2 options: either set it up locally using Minishift (previously an all-in-one VM) or use the “oc cluster up” method. While both are good options, you don’t get the real feel of OpenShift without deploying it on a real, cloud-ish setup.

I had a few hiccups with cloud-based setups: they were too big for my needs, came with components I didn’t really need, and were sometimes way over budget. If you check the hardware requirements, you’ll know what I’m talking about.

My aim was to set up a production-ish cluster: not too brittle, but at the same time not big enough to power the backend of a bank. I went with a VPS provider, DigitalOcean, so that I would know fairly well what costs I would incur.

Before I jump into the cost calculation, let’s do a quick run-through of the OpenShift architecture. There are 3 kinds of nodes in an OpenShift cluster:

  1. The master node: One or more nodes which run the OpenShift master components, which process API requests and do the actual infrastructure management. If multiple masters are involved, they are fronted by a load balancer. Masters can also host etcd, which acts as the key-value store for the whole cluster.
  2. The infrastructure node: Does the housekeeping needed for the functioning of the cluster, such as running the internal Docker registry.
  3. The compute node: Houses your app and service containers.

To ensure high availability, you need more than one master node. If you are running a single master and it goes down, your applications will still run, but you won’t be able to control or manage any of them since the master is down.

Also, you can have the same node act as both the master and infrastructure node by tagging it with the appropriate roles, although this is not recommended for a true production setup.

The number of compute nodes depends on the volume of applications and deployments you are going to have. You can always start with 1 and scale up, i.e. add more nodes to the cluster when you need them.
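
Scaling up later doesn’t require a reinstall; the openshift-ansible repo we clone further below ships a scale-up playbook for exactly this. A sketch of the invocation, assuming the new node has been added to a new_nodes group in the inventory (playbook path as in the release-3.10 repo):

$ ansible-playbook -i ../inventory.cfg playbooks/openshift-node/scaleup.yml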

The next piece of infrastructure in your cluster is storage. You need storage if your app needs some form of persistence (think a MySQL DB) and for other things, like storing your container images. OpenShift comes with a multitude of options for setting up storage for the cluster. You can either wire it up with the de facto storage for that cloud (like EBS for AWS) or use something like NFS or GlusterFS.

For our cluster, I chose 3 VMs with 4 GB of RAM each (way less than the recommended hardware). For persistent storage I went with GlusterFS, and because it needs dedicated block devices with plenty of space, I created 3 volumes, 100 GB each, one attached to each VM.

OpenShift 3 cluster architecture

OpenShift runs on Red Hat-flavoured OSes (like RHEL, CentOS or Atomic) and has quite a few prerequisites. So, I chose CentOS 7 x64 as the VM OS.

Red Hat ships elaborate and well-maintained Ansible playbooks to create a cluster and manage it. But they assume you have already booted the infrastructure, set up the DNS, installed the prerequisites and so on. I wanted this setup to be consistent and reproducible, so that anyone wanting to try OpenShift could follow the same set of instructions and get the exact same results. So I automated the infrastructure creation and prerequisites as well.

Enter Terraform

I’m not sure how many engineers it takes to change a light bulb, but creating any amount of complex infrastructure in a predictable, consistent and repeatable fashion takes only 1 engineer, if they are using Terraform.

Terraform helps you create the VMs, alter their DNS settings, create volumes and attach them to the VMs, all using a single command. I could use Ansible to get the same thing done, but Terraform has a slight edge over Ansible here: it stores the previous state of the system. It takes your configuration as the desired end state and alters the system until that state is achieved. Ansible, being procedural, doesn’t do this out of the box. I still have a few gripes with Terraform and miss Ansible’s templating, but it is good enough™ to get the job done.
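
That state-driven workflow shows up in the two commands you run the most: plan shows the diff between the recorded state and your configuration, and apply makes only the changes needed to close that gap.

$ terraform plan   # preview the changes Terraform would make
$ terraform apply  # converge the infrastructure to the desired state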

In our case, Terraform does 2 things: it creates the infrastructure and generates an inventory file for the OpenShift playbooks to consume. Why would I use Terraform to generate my inventory? 2 reasons (with a sketch of the mechanism after this list):

  1. The inventory file for the OpenShift Ansible-based installation is complex. OpenShift contains a lot of moving parts, which translates to a lot of configuration to make the whole thing work. Part of that configuration depends on the infrastructure you choose to run your cluster on. Because Terraform creates the infrastructure for you, it makes sense for Terraform to generate the inventory for you too. You can still tweak the dials, but only at the Terraform level, not at the inventory level.
  2. I wanted the installation to be predictable, repeatable and consistent. Hand-coding the inventory is the best way to not meet any of these criteria. On the other hand, if you give a fixed set of inputs to Terraform and install the cluster from the inventory file it generates for you, you get the same result every time you run it, anywhere you run it (provided you are using the same version of OpenShift).
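
A rough sketch of how such generation can be wired up, in Terraform 0.11-style HCL. The template path, variable names and droplet resource here are hypothetical, not the exact ones from this setup:

data "template_file" "inventory" {
  # Render the Ansible inventory from a template, filling in the VM IP.
  template = "${file("templates/inventory.cfg.tpl")}"

  vars {
    master_ip = "${digitalocean_droplet.master.ipv4_address}"
  }
}

resource "local_file" "inventory" {
  # Write the rendered inventory alongside the Terraform config.
  content  = "${data.template_file.inventory.rendered}"
  filename = "${path.module}/inventory.cfg"
}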

You first initialize the Terraform context by running:

$ terraform init

But before that, take a moment to inspect the variables which you can configure for your cluster, most notably,

The SSH key pair path you will use for this cluster.

variable "public_key_path" {
  default = "~/.ssh/"
  description = "The local public key path, e.g. ~/.ssh/"

variable "private_key_path" {
  default = "~/.ssh/tf"
  description = "The local private key path, e.g. ~/.ssh/id_rsa"

I recommend using a key pair exclusive to this setup.
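
If you don’t have a dedicated key pair yet, generating one is quick; the file name tf below matches the private_key_path default above:

$ ssh-keygen -t rsa -b 4096 -f ~/.ssh/tf -C "openshift-cluster"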

Also, the VM size and region,

variable "master_size" {
  default = "4gb"
  description = "Size of the master VM"

variable "node_size" {
  default = "4gb"
  description = "Size of the Node VMs"
variable "region" {
  default = "blr1"
  description = "The digitalOcean region where the cluster will be spun."

Also, you need to supply Terraform with the domain you are going to use for OpenShift. You can either edit the variables file or pass it as an environment variable; Terraform picks up any variable prefixed with TF_VAR_ (the variable name domain below is an assumption, check the variables file for the actual name).

$ export TF_VAR_domain=example.com

Note that this must be a domain you own.

Most of these are sensible defaults.

Then, you supply Terraform with the DigitalOcean API token you generated in your settings.

Generate a DigitalOcean API token
$ export DIGITAL_OCEAN_TOKEN=abcxyz123

You can generate the infrastructure and inventory file by running:

$ terraform apply

Terraform does the grunt work of creating the VMs, mapping domains, creating volumes and attaching them, and finally generating the inventory files. The first inventory file, preinstall-inventory.cfg, contains the master and compute nodes. We prepare the nodes by running the pre-install playbook against it:

$ ansible-playbook -u root --private-key=~/.ssh/tf -i preinstall-inventory.cfg ./pre-install/playbook.yml

The private key passed here is the same one you specified in the Terraform variables section.

Install OpenShift Origin using Ansible

Now is the time to fetch the Ansible scripts to setup the cluster.

$ git clone --branch release-3.10 --depth 1 https://github.com/openshift/openshift-ansible

Note the release-3.10 branch: the playbooks’ branching convention follows the cluster version closely. Also, any Ansible command from here on must be run with the Ansible version specified in this repo; I hit issues numerous times when I used any other version. Let’s install the recommended Ansible in a virtualenv.

$ cd openshift-ansible
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ which ansible # ensure that this is from the virtualenv

Let’s take a look at the second inventory file generated by Terraform.

Before you set out to install the cluster, you have to ensure that the hardware and software setup so far meets the prerequisites recommended by OpenShift. This is not set in stone; we can take the liberty of tweaking it to our needs, as done in the inventory:

openshift_disable_check = disk_availability,docker_storage,memory_availability,docker_image_availability

We run the prerequisite check.

$ ansible-playbook -i ../inventory.cfg playbooks/prerequisites.yml

This will take around 5-10 minutes.

By default, this setup installs an all-in-one master node (which acts as master, infrastructure node and compute node), as you can see in the generated inventory:

[masters]
openshift_ip= openshift_schedulable=true

[etcd]
openshift_ip=

[nodes]
openshift_ip= openshift_schedulable=true openshift_node_group_name='node-config-all-in-one'

The openshift_node_group_name is a way to tag an OpenShift node. In addition to the master node, we add 2 compute nodes:

[nodes]
openshift_ip= openshift_schedulable=true openshift_node_group_name='node-config-all-in-one'
openshift_ip= openshift_schedulable=true openshift_node_group_name='node-config-compute'
openshift_ip= openshift_schedulable=true openshift_node_group_name='node-config-compute'

Persistent storage is managed by GlusterFS. OpenShift comes with a ton of persistent storage options, but at the time of writing, I found GlusterFS to strike the best balance between ease of configuration and utility on DigitalOcean.
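
For reference, the GlusterFS wiring in an openshift-ansible inventory looks roughly like this; the device path is an assumption (DigitalOcean block volumes typically appear under /dev/disk/by-id/), so check your VMs for the actual path:

[OSEv3:children]
masters
etcd
nodes
glusterfs

[glusterfs]
openshift_ip= glusterfs_devices='[ "/dev/disk/by-id/scsi-0DO_Volume_openshift-vol" ]'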

Now for the final playbook run to install the cluster.

$ ansible-playbook -i ../inventory.cfg playbooks/deploy_cluster.yml

Be warned that this takes a good 30 minutes to finish. Ideal time for a coffee break or to read the rest of this post 🙂

OpenShift Origin installation using Ansible

This cluster is configured with the HTPasswd identity provider; in other words, a plain and simple username/password based identity system.

openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]

You can configure your cluster to include multiple identity providers. This is useful if you are installing it for an organization, for example.
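
If you would rather have users ready the moment the cluster comes up, openshift-ansible can also seed the htpasswd file from the inventory; a sketch with a hypothetical user and a pre-hashed password (generate the hash with htpasswd -nb):

openshift_master_htpasswd_users={'admin': '$apr1$example$hash'}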

You can log in to the master node using the command:

$ ssh -i ~/.ssh/tf root@console.<whatever domain name you gave>

Let’s create an admin user by logging into the master node.

$ htpasswd -cb /etc/origin/master/htpasswd admin <your choice of password>
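
The htpasswd entry only lets the user authenticate; to let them actually administer the cluster, you will most likely also want to grant the cluster-admin role (run on the master, for the admin user we just created):

$ oc adm policy add-cluster-role-to-user cluster-admin admin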

The next important thing is to configure a default storage class for the cluster. Although we install GlusterFS as part of our setup, we don’t make it the default storage option. Let’s do that.

$ kubectl get storageclass
NAME                PROVISIONER               AGE
glusterfs-storage   kubernetes.io/glusterfs   16m

$ oc patch storageclass glusterfs-storage -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
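
Listing the storage classes again should confirm the change:

$ kubectl get storageclass   # glusterfs-storage should now be marked "(default)"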

Verifying the OpenShift installation

Now that we’ve installed the cluster, let’s verify whether our setup works as intended. Your first check will be to ensure that OpenShift lists all the nodes.

$ oc get nodes

The next check is to ensure that oc status shows nothing alarming.

$ oc status

Ensure that the registry and router pods are up and running.

$ oc get pods -n default

While we are on the master node, let’s create an actual application and deploy it. OpenShift comes with a set of default templates. We can list them using:

$ oc get templates -n openshift

First, we create a new project, or a namespace which will hold all our apps and services.

$ oc new-project myns

Then, we create a new app from the Node.js + MongoDB template, so that we can verify that features like persistence and creating and attaching volumes work.

$ oc new-app --template=nodejs-mongo-persistent -n myns

You can stream the build logs in real time using this command:

$ oc logs -f nodejs-mongo-persistent-1-build

Once the application is deployed successfully, verify the pod and grab the app URL:

$ oc get pods -n myns # ensure that we have a pod which begins with the same name as our app
$ oc get routes # to get the app URL

Hit the URL in your browser. It will be of the pattern nodejs-mongo-persistent-myns.<your-app-domain>. In the next posts, we will look at how to create applications on top of this newly minted OpenShift cluster.


If you are a video person, check out the video accompanying the blog post.
