OpenShift Platform Day2 - Capacity Management
Capacity is the provision to handle a simultaneous planned and unplanned outage, without making the user experience unacceptable performance; this often results in an “N + 2” configuration, where peak traffic can be handled by N instances (possibly in degraded mode) while the largest 2 instances are unavailable.
While simple High Availability makes sure that the service is available, Capacity Management makes sure that it performs well. While Scalability ensures that the service can scale out, Capacity Management tells us when it needs to (and how much).
Some basic pattens (or best practices) for Capacity Management include:
- Validate prior demand forecasts against reality until they consistently match. Divergence implies unstable forecasting, inefficient provisioning, and risk of a capacity shortfall.
- Use load testing rather than tradition to establish the resource-to-capacity ratio: a cluster of X machines could handle Y queries per second three months ago, but can it still do so given changes to the system?
- Don’t mistake Day-1 load for steady-state load. Launches often attract more traffic, while they’re also the time you especially want to put the product’s best foot forward.
- Plan ahead to make sure that your existing infrastructure can handle expected future loads (future deployments, seasonal loads)
- Make sure that your Capacity Management methodology matches your deployment plans. For example, if you do blue/green deployment you need to have capacity for unexpected issues in both the blue and green environment.
Capacity Management maps very well to the Golden Signal called “Saturation”, which is commonly defined as:
How “full” your service is. A measure of your system fraction, emphasizing the resources that are most constrained (e.g., in a memory-constrained system, show memory; in an I/O-constrained system, show I/O). Note that many systems degrade in performance before they achieve 100% utilization, so having a utilization target is essential.
(from the Google SRE book)
Measuring your service’s Saturation (whether it be an infrastructure metric such as memory or application metric such as queue throughput) will give you an idea of how close you are to filling your capacity.
In a well configured OpenShift cluster, with spare capacity, we would expect to have a setup resembling the following, with the service distibuted evenly across the hosts, and with the capacity for 2 hosts to fail without affecting the system. For simplicity of this discussion, let us assume a node can run up to 4 pods.
In the following example, we have deployed a new version of the service in the cluster (coloured green) and the cluster has spread the pods across the available hosts:
The cluster is currently functioning with N+1 capacity, since we only have 1 spare server.
If we now experience a failure in one of the hosts, then the cluster will reach full saturation, without capacity for any futher work and no capability to handle any more failures. While from a user’s perspective the cluster and the services running on the cluster are fully functional, there is a high risk that any further failure will cause customer affecting incidents:
At this point the best course of action is to immediately scale out the cluster and add an addition node to handle unexpected failures in the existing nodes. This is above and beyond the actions required to solve the problem with the failed node, of course.
While the examples above are for Compute resources (CPU, Memory), they are equally relevant for resources such as storage, network bandwidth and so on.
The goal of Day-2 Capacity Management is to keep the available resources at an N+2 level, meaning that whenever resources are consumed there is a response which may include any of the following action types:
- Allocate more resources (e.g scale out by adding additional resources).
- Reduce the use of resources(e.g roll back any changes that caused the additional use of resources or throttle the use of the resources).
- Accept the additional risk and plan accordingly (e.g add additional monitors or prepare to allocate more resources).
Monitoring the deployment of services and use of resources will help determine the correct cause of action: For example, if there is a business peak on Thursdays which consumes a lot of resources it may be a legitimate plan of action to simply accept a lower level of capacity at that time, while making sure that no deployments may be scheduled for that time period, thus keeping the risk of capacity lowered to N+1 at a minimum.
On the other hand, if there is a requirement to deploy during peak hours, then a schedule of scaling out the resources before deployment may be a better plan. In the end, a business case must be made to decide which response to lowered capacity is better suited for scenario (lower velocity of deployment by limiting changes or raise costs by scaling out more hardware).
Infrastructure capacity for OpenShift covers the traditional resources of an IaaS environment - Compute, Memory, Storage and other assets such as available network addresses so on.
During Day 1 setup, we should have a set of monitoring dashboards and alerts which will measure the usage and saturation of the resources. Set your monitoring thresholds so that you get capacity notifications when you fail the N + 2 tests (e.g when you no longer have double redundancy or capacity).
A tool called
cluster-capacity can be used to assess how many more pods (of a standard type) can be deployed on the cluster.
During Day 2 operations, we want to allow regular changes to occur in the OpenShift cluster without risking a capacity failure. For example, while executing a canary deployment the cluster will be under a higher level of load because more instances of the deployed service will be running in parallel (the old version and the newer one). This may reduce our capacity to N + 1. If, during that period, one of the cluster nodes fails then our capacity will be further reduced to N. At this point we are at risk that additional application traffic will saturate available resources and the system will enter a failure state.
- Monitor cluster capacity
- Use Metering Reports
- Update Capacity Dashboards and Alerts
- Perform Chaos engineering
While the principles of capacity management for applications are similar to those of infrastructure resources, the implementation is more complex because application capacity does not scale with hardware or “simple” resources.
If the throughput of a queue is X messages per second, adding additional Compute, Network or Storage resources might not add to that value and it will probably not be linear. As part of the preparation to deploy applications on Day-1, the various parameters which are relevant to capacity must be determined.
For example, what are the bottlenecks of the application and how can they be resolved? Should the application be scaled vertically (i.e. more resources for each individual pod) or horizontally (i.e. deploy more pods)? In theory, a 12-factor application should have additional capacity when scaled horizontally, but in reality various bottlenecks such as queues, locks and other dependencies may affect behaviour. End-to-End load testing during application development should expose the bottlenecks, monitors developed to detect them and solutions should be put in place to handle cases where the applications become close to saturated.
During Day-2 operations, our goal is to monitor the behaviour and resource consumption of applications and make sure that they are never close to a saturation point where a single additional failure will cause a client affecting failure.
While the monitoring solution may notify us that either the resource usage of the application is rising or that the available resources are reduced due to a failure, the immediate response will be to lower the saturation level of the appliaction (lower the ratio of resource usage to resource availability). Example of solutions may include:
- Scaling out the application horizontally
- Scaling out the underlying infrastructure (as a pre-requisite to further application scaling)
- Scaling down other applications (thus freeing up resources for the capacity-bound application)
- Throttling access to the application so that less requests are sent and lowering capacity requriements
- Feature toggling the application so that high-capacity requests are not available
Again, the decision as to which solution is the best must be decided as a business case. For example, perhaps only some users should have their requests throttled or features deactivated, due to their roles.
During the lifcycle of an application, design changes should be implemented which will improve the way the application handles capacity problems. For example, if the capacity bottleneck of an application is a global queue (which cannot be scaled horizontally) then perhaps the application can be modified so that each pod has it’s own individual queue as a side-car container. This will mean that scaling out the application pod horizontally will now increase queue capacity. In this case, rasising capacity of the application requires development changes.
- Define saturation monitors as part of application design
- Configure JDK settings for containers
- Configure resource quotas
- Idle inactive applications
|SRE||Monitor Cluster Capacity|
|SRE||Use Metering Reports|
|SRE||Update Capacity Dashboards and Alerts|
|SRE||Perform Chaos Engineering tests|
|SRE & DevOps Engineer||Define saturation monitors as part of application design|
|SRE & DevOps Engineer||Configure JDK settings for containers|
|SRE & DevOps Engineer||Configure resource quotas|
|SRE & DevOps Engineer||Idle inactive applications|
cluster-capacity tool enables you to approximate the number of additional pods that the cluster can allocate.
After installing the tool, you create a template of a “standard” pod. Upon running the tool, it will tell you how many of these “standard” pods can be added to the existing cluster, based on the current load and hardware.
You can run this tool on a schedule, write out the results to a log and send out an alert when the number of additional pods gets too low.
Besides costs and accounting, metering is a useful tool to generate reports regarding the usage patterns of the cluster. Use metering to help plan your capacity needs to see if your assumptions are correct.
Make sure that you are familiar with the monitoring dashboards that represent the current capacity of the cluster infrastructure. Set alerts when thresholds are crossed (the specific thresholds will depend on the way you define your capacity planning).
Since you are operating at a capacity of N+2, performing periodic chaos engineering tests which will ensure that these assumptions are correct.
As part of the design of the application, find out which resources represent the bottlenecks and saturation metrics for the application (is it number of records written per second? number of items in queue? number of responses per second?) and try to figure out:
- How to measure these metrics
- Develop a runbook to detail what needs to be done when a threshold is met (scale out resources, throttle requests, etc…)
Note that both the metrics and the runbooks may change during the lifecycle of the application.
Without proper planning, the JDK running inside application containers may attempt to allocate a larger than necessary slice of the node’s memory. The default OpenJDK settings do not work well with containerized environments. As a result, some additional Java memory settings must always be provided whenever running the OpenJDK in a container.
- Configure OpenJDK settings
- Override JVM maximum heap size
- Configure memory release on node
- Configure JVM resource limits in pod
By setting quotas and limits, you can make sure that applications will not attempt to allocate too many resources and pose capacity problems for the entire cluster. Note however that by creating hard resource limits for applications you are limiting their capability to scale-out automatically and putting application performance at risk. A high quality of monitoring is necessary to detect when applications are nearing their quotas. Take care to balance the necessity of application performance vs keeping spare capacity for emergencies.
If you know that an application can be idled (perhaps it’s only needed on specific days of the month), then you can scale them down and free up resources. Note that these applications will scale back up as soon as network traffic is directed back to them.
All of the tasks detailed above are relevant to vanilla Kubernetes, but the specific tools may not be available out of the box. One exception is the
cluster-capacity tool which is available through the Kubernetes Special Interest Group:
OpenShift comes with a series of tools which help SREs manage the cluster capacity - both measuring it and freeing up available resources. The monitoring stack which comes with OpenShift is also useful in detecting potential capacity issues. The following series of articles How Full is my Cluster describe a Capacity Management in OpenShift 3.x, which is relevant for 4.x too:
- How Full is My Cluster? Capacity Management and Monitoring on OpenShift
- How Full is My Cluster – Part 2: Protecting the Nodes
- How Full is My Cluster – Part 3: Capacity Management
- How Full is My Cluster – Part 4: Right-Sizing Pods with Vertical Pod Autoscaler
- How Full is My Cluster – Part 5: A Capacity Management Dashboard
In addition to the capabilities in OpenShift, OpenShift on IBM Cloud which is IBM’s managed offering has the additional monitoring service with SysDig which can present potential capacity problems in infrastructure and applications.
In addition to the capabilities in OpenShift, IBM Cloud Pak for Multicloud Management has the additional monitoring tool IBM Cloud App Management (ICAM) which can present potential capacity problems in infrastructure and applications. ICAM specializes in monitoring and alerting against the golden signals, of which Saturation reflects capacity problems.