The Challenge of Continuous Delivery in Distributed Environments – The New Stack

Nati Shalom

Nati Shalom is the founder and CTO of Cloudify.io. He is a serial entrepreneur who is also a well-respected thought leader and speaker on open source, multicloud orchestration, network virtualization and DevOps. Nati has received numerous recognitions, including YCombinator. He is also one of the leaders in the DevOps Israel Meetups and Cloud Native.

Modern software applications are distributed across a multitude of environments, which means they can run on many different kinds of infrastructure simultaneously.

These highly distributed applications often share data that is scattered geographically. Email, the internet, telephone and cell networks, aircraft control systems and ride-share dispatch systems are all well-known high-end use cases.

Many modern SaaS-based offerings today are highly distributed. They need to ensure low latency access across global distributions (as with Zoom and Netflix) or comply with regulations such as GDPR, which prohibit customer data from crossing certain geographic boundaries.

This makes distributed architecture more mainstream and common. Most SaaS companies now manage their data across multiple regions and sites. They also use hybrid SaaS architectures, where some services run in shared clouds and others on-premise.

Unfortunately, many of the current DevOps automation tools are not designed to support hybrid and distributed architectures. This leads many software companies and enterprises to create custom frameworks and processes to address the unique challenges presented by these architectures.

You have fewer moving parts when all of your IT infrastructure runs in one place than you do in a distributed system. In a distributed system, you may have hundreds, thousands or millions of endpoints on which your software must run correctly. The complexity of IT operations in distributed systems increases not only because of the sheer number of endpoints, but also because everything must work together.

Each environment in which an application runs can present its own operational challenges. These challenges can be both common and extreme. Common problems include system drift beyond the original configurations due to slight changes in infrastructure. Even something as simple and straightforward as a change to a security group can cause problems, rendering a previously accessible port unresponsive.

A distributed system is more susceptible to infrastructure drift than a centralized one because it has more moving parts and each part evolves independently. Each endpoint in a distributed network can also face extreme challenges such as severe weather events, which can cause major power outages or damage to physical equipment and facilities.

How can you manage continuous deployments to multiple environments, even though those environments are constantly changing? This is the reality of distributed systems and it is why it is so difficult to deploy and update highly distributed applications.

Day 2 Challenges in Managing Distributed Environments at Scale

Let’s look at a simple example. Imagine that your company has two offices, one in the east and the other in the west. Each runs your customer relationship management (CRM) software on its own servers. This reduces lag and ensures compliance with data sovereignty regulations.

Now imagine your IT team wants to roll out a software update for the CRM application. The update must be sent to both the east and the west. What happens if the update fails in the east but succeeds in the west? You will need to push the update again in the east. This partial failure scenario will require a completely new deployment process.

The example given above is quite simple. Let’s take this example and make it more practical for today’s multisite enterprises.

The DevOps team runs a CI/CD pipeline. An update is ready to deploy to 10 to 50 Kubernetes clusters in multiple regions. The deployment process within the CI/CD pipeline uses a task-based system, where code is written and deployed on the assumption that everything downstream is running as expected.

In the ideal scenario, you create tasks and the system executes them according to your instructions. But what if something isn’t right? Then the process won’t work. How do you know whether the update succeeded at each of the 10 Kubernetes clusters? Is it running or not? Which part is running? Which part failed?

Let’s say that the update fails at three of these clusters. How can you find out why it failed? (The cause of failure could be different at each location.) Is there drift? How can you successfully push an update to an environment whose state you don’t know? How can you continue the update process after a failure? You don’t want to have to update all 10 sites over again; you only want to update the three that failed. How can you roll back a site if the update has a major flaw?
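One way to picture the "only update the sites that failed" requirement is to record an outcome per site and feed just the failures into the next pass. The following is a minimal Python sketch, not how any particular tool does it. The names `deploy_to_sites`, `failed_sites` and `fake_deploy`, the cluster names and the simulated error are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Iterable, List


def deploy_to_sites(sites: Iterable[str], deploy: Callable[[str], None]) -> Dict[str, str]:
    """Run `deploy` on each site and record "ok" or the failure reason per site."""
    results: Dict[str, str] = {}
    for site in sites:
        try:
            deploy(site)
            results[site] = "ok"
        except Exception as exc:
            # Keep the cause, since it may differ from site to site.
            results[site] = f"failed: {exc}"
    return results


def failed_sites(results: Dict[str, str]) -> List[str]:
    """Return the sites whose last recorded status is not "ok"."""
    return [site for site, status in results.items() if status != "ok"]


# Hypothetical stand-in for a real deployment call: three of ten clusters fail.
broken = {"cluster-3", "cluster-7", "cluster-9"}


def fake_deploy(site: str) -> None:
    if site in broken:
        raise RuntimeError("port 443 unreachable")


sites = [f"cluster-{i}" for i in range(1, 11)]
first_pass = deploy_to_sites(sites, fake_deploy)
to_retry = failed_sites(first_pass)

broken.clear()  # simulate an operator fixing the underlying problem
second_pass = deploy_to_sites(to_retry, fake_deploy)

print(to_retry)                   # the three clusters that failed the first pass
print(failed_sites(second_pass))  # empty once the retry succeeds
```

The second pass touches only the three failed clusters, so the seven healthy sites are not redeployed.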

Complexity Driven by Edge-First Environments

As you can see, the challenges mount as DevOps teams try to innovate rapidly and push software upgrades multiple times per day in highly distributed environments.

Most teams don’t have enough insight into the environment at each endpoint. This means that failures take time to examine and can require unique tweaks or fixes to deal with each change in the system’s state.

That’s why DevOps engineers are doing so much hand-coding. Engineers have to stop the normal CI/CD flow and investigate what part is not running. Then, they can manually adjust the software and deploy code to make up the difference.

The truth is that there will always be changes in the system. Continuous deployment systems rely on infrastructure environments that are not always stable. DevOps engineers do not always know the status of every endpoint environment in a distributed network, so the CI/CD pipeline can’t be adaptive enough.

The process of ensuring continuous deployment within distributed environments can prove to be very complicated and burdensome, slowing down business innovation.

How can DevOps teams manage continuous software updates and deployment across distributed environments?

As edge-first approaches become more common, forward-thinking organisations should look at open source solutions that can be used in distributed environments.

Cloudify is one such open-source project.

Remove Complexity from Deployment in Distributed Environments

Open source Cloudify can help DevOps teams offload the complexity involved in managing distributed environments. In Figure 1 (below), you can see two parts of a pipeline. One is for application development and testing, while the other is responsible for creating the environments that will allow the software to run.

Figure 1. Distributed architecture.

Cloudify takes care of the second part, which is provisioning and maintaining the environments. DevOps teams simply set up their environments as they wish them to be, and then the software manages them continuously (as shown in Figure 1).

A Closer Look: Environment-as-a-Service in Your CI/CD Pipeline

CI/CD workflows are usually written with the assumption that the infrastructure will always be in the same state it was in before. But the state of the infrastructure can change, and when it does, the workflow is affected. Users must then update their workflow to deal with the change in state. When this has to be done regularly, the CI/CD workflow becomes more manual and less automated, which defeats the purpose of CI/CD.

Cloudify assumes instead that infrastructure is subject to entropy, which means that it will drift over time, particularly in highly distributed environments.

Cloudify solves the problem by using a declarative approach that separates the environment’s state from the workflow. Environment-as-a-Service (EaaS) technology keeps track of the state of each environment and how it changes over time. It knows the configurations of all components, including storage, networking and computing, as well as how they relate to each other.

The software detects drift over time and offers built-in workflows that can automatically correct some common drift situations. It feeds information about each environment back into the workflow, allowing the workflow to adapt to changing conditions.
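To make the idea of drift detection concrete, here is a minimal Python sketch that compares a declared (desired) environment state with an observed one and reports every setting that differs. It is an illustration under stated assumptions, not Cloudify's implementation: the function `find_drift`, the dictionary layout and the field names (`security_group`, `open_ports`, `replicas`) are all hypothetical.

```python
from typing import Any, Dict, Tuple


def find_drift(
    desired: Dict[str, Any], observed: Dict[str, Any], prefix: str = ""
) -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted.path: (desired_value, observed_value)} for every difference."""
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(desired) | set(observed)):
        path = f"{prefix}.{key}" if prefix else key
        want = desired.get(key)
        have = observed.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            # Recurse into nested sections such as networking or storage.
            drift.update(find_drift(want, have, path))
        elif want != have:
            drift[path] = (want, have)
    return drift


# Hypothetical example: someone removed port 443 from a security group.
desired = {"security_group": {"open_ports": [22, 443]}, "replicas": 3}
observed = {"security_group": {"open_ports": [22]}, "replicas": 3}

drift = find_drift(desired, observed)
for path, (want, have) in drift.items():
    print(f"{path}: expected {want}, found {have}")
```

Because the desired state is kept separate from the workflow, a report like this can be handed to a corrective workflow or surfaced to an operator without rewriting the deployment logic itself.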

Cloudify also uses a transactional workflow mechanism that continuously monitors the execution status and can resume failed workflows or trigger a rollback workflow.
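The general pattern behind resuming or rolling back a workflow can be sketched in a few lines of Python: each step carries an undo action, and the runner remembers which steps have completed. This is a simplified illustration of the pattern, not Cloudify's actual mechanism. The `Step` class, the `run` and `rollback` functions and the step names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    apply: Callable[[], None]
    undo: Callable[[], None]


def run(steps: List[Step], completed: List[str]) -> bool:
    """Apply every step not yet completed; stop at the first failure."""
    for step in steps:
        if step.name in completed:
            continue  # resuming: skip work that already succeeded
        try:
            step.apply()
        except Exception:
            return False
        completed.append(step.name)
    return True


def rollback(steps: List[Step], completed: List[str]) -> None:
    """Undo completed steps in reverse order."""
    for step in reversed(steps):
        if step.name in completed:
            step.undo()
            completed.remove(step.name)


events: List[str] = []
flaky = {"fail": True}  # simulates a temporary outage during "configure"


def configure() -> None:
    if flaky["fail"]:
        raise RuntimeError("network outage")
    events.append("configure")


steps = [
    Step("provision", lambda: events.append("provision"), lambda: events.append("undo provision")),
    Step("configure", configure, lambda: events.append("undo configure")),
    Step("deploy", lambda: events.append("deploy"), lambda: events.append("undo deploy")),
]

completed: List[str] = []
first_ok = run(steps, completed)   # stops at "configure"
flaky["fail"] = False
second_ok = run(steps, completed)  # resumes without repeating "provision"
rollback(steps, completed)         # later, undo everything in reverse order
print(events)
```

Keeping the completed-step record outside the workflow code is what allows the same run to be resumed after an outage rather than restarted from the beginning.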

This mechanism has been extended to support bulk operations in distributed environments with version 6. The software can run workflows simultaneously in hundreds of sites or thousands of environments. It also ensures that the workflows are executed successfully even in the event of a network failure or outage.
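As a rough illustration of what running one workflow across many sites at once involves, the Python sketch below fans a workflow out over a thread pool and retries each site a limited number of times when the connection drops. It is an assumption-laden sketch rather than a description of Cloudify's bulk operations: `run_with_retry`, `run_bulk`, `flaky_workflow`, the retry count and the site names are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_with_retry(
    site: str, workflow: Callable[[str], None], attempts: int = 3, delay: float = 0.0
) -> str:
    """Run the workflow on one site, retrying on connection errors."""
    for _ in range(attempts):
        try:
            workflow(site)
            return "ok"
        except ConnectionError:
            time.sleep(delay)  # back off before the next attempt
    return "failed"


def run_bulk(
    sites: List[str], workflow: Callable[[str], None], max_workers: int = 8
) -> Dict[str, str]:
    """Run the workflow on all sites concurrently and collect a status per site."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = pool.map(lambda site: run_with_retry(site, workflow), sites)
        return dict(zip(sites, statuses))


# Hypothetical workflow: site-0 is down, sites ending in 5 fail once, then recover.
calls: Dict[str, int] = {}
lock = threading.Lock()


def flaky_workflow(site: str) -> None:
    with lock:
        calls[site] = calls.get(site, 0) + 1
        attempt = calls[site]
    if site == "site-0":
        raise ConnectionError("site unreachable")
    if site.endswith("5") and attempt == 1:
        raise ConnectionError("transient network failure")


sites = [f"site-{i}" for i in range(20)]
statuses = run_bulk(sites, flaky_workflow)
print([site for site, status in statuses.items() if status != "ok"])
```

Transient failures are absorbed by the retries, while a site that stays unreachable ends up with a clear "failed" status that can be handled separately.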

All of this information is visible in Cloudify’s map view (Figure 2). This allows users to see the services running at each location and the state of the entire cluster at a glance.

The map can be used for thousands of deployments in thousands of locations. A new deployment view is also available, which allows users to switch between the table and map views. From this view, users can execute operations on the cluster, such as Day 2 workflows and deployment (provisioning).

Figure 2. Cloudify map view.

Management of distributed workflows requires new management and monitoring tools. These tools will allow you to quickly see the current state of the system.

These are just a few examples of the capabilities open source Cloudify offers to SaaS businesses looking for a strong enabler of multisite administration and highly distributed computing.

It’s All About Improving the DevOps Experience

Cloudify’s main goal is to remove a lot of the complexity that afflicts IT operators and DevOps engineers. We are specifically trying to make managing distributed systems as easy as running any other cloud service.

DevOps teams should be able to push code into Git to describe the desired end state of the system. This offloads the job of figuring out the delta between the current and end states and doing what is necessary to keep the CI/CD pipeline operating at a rapid pace.

As an added bonus, whether you’re using Cloudify to manage a dozen environments for your in-house development team or billions of endpoints in an advanced edge use case, the environment-as-a-service capability synchronizes the efforts of DevOps and IT management and can help your team break down some of the biggest silos that exist in enterprises today.
