IBM Cloud Pak Playbook

OpenShift Platform Day2 - Security

Security Overview

In the security lifecycle of Assess, Prevent, Detect, and Respond, Day 2 operational security generally focuses on Detect and Respond. It does, however, also depend on strong preventative measures being in place from Day 1.

Throughout the infrastructure (the operating system, container platform, pods, containers, images, and so on), a Zero Trust model must be adopted. This means that the most restrictive security controls are in place by default, and access is only granted explicitly when needed.

In some cases, where higher-level administrative access is required, extended user privileges are necessary to allow administrators to perform their jobs. In these cases it is critical that all access and actions performed by these administrators are logged and monitored for bad behavior, either malicious or accidental.

Day 1 Platform (Pre-requisites)

There is sometimes an overlap between what constitutes Day 1 versus Day 2 activities. It is generally assumed that the following tasks have been performed as Day 1 activities, and are therefore considered pre-requisites. If they have not been completed, they should be considered Day 2 activities.

Day 2 Platform

Day 1 Application

  • Secure Coding Practices (Application Code scanning. Security TDD. Validating Input/Outputs etc.)
  • Application Hardening – ensure all backdoor accesses are removed! (local users / default passwords etc.)
  • External Application Penetration Testing
  • Threat Modeling
  • Application Continuous Delivery Pipeline setup

Day 2 Application

Mapping to Personas

Persona | Task
SRE | Rotate Administrative credentials
SRE | Update Platform (and Host) security patches (Rolling Updates)
SRE | Create new Namespaces for new application(s)
Security Expert, SRE | Container security (signing container images etc.)
Security Expert, SRE | Manage certificates
Security Expert | Perform regular (daily/weekly) vulnerability scans
Security Expert | Repeat Pen testing (6 months - one year)
Security Expert | Security Threat Monitoring with integrated Incident Response
Security Expert, SRE | Application Security Policy
Security Expert, SRE | Validate onboard application to threat platform
Security Expert, SRE | Setup external Certificates (application)
Security Expert, SRE | Application Users access channels policies setup: authentication, Firewalls, NATs, Load-balance, Network security, routes

Rotate Administrative Credentials: [ SRE ]

The shorter the lifetime of a secret or credential, the harder it is for an attacker to make use of it. Set short lifetimes on certificates and automate their rotation. Use an authentication provider that can control how long issued tokens remain valid, and use short lifetimes where possible. If you use service account tokens in external integrations, plan to rotate those tokens frequently. For example, once the bootstrap phase is complete, a bootstrap token used for setting up nodes should be revoked or have its authorization removed. It is also recommended to rotate ownership of resources: when no single person keeps access to a resource indefinitely, a compromise becomes harder.
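As a minimal sketch of how automated rotation could decide when to act, the helper below flags a credential once it approaches its maximum lifetime. The function name, the 30-day lifetime, and the 5-day safety margin are illustrative assumptions, not values prescribed by this playbook.

```python
from datetime import datetime, timedelta, timezone


def rotation_due(issued_at, max_lifetime, now=None, safety_margin=timedelta(0)):
    """Return True once a credential is within safety_margin of its maximum lifetime."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= issued_at + max_lifetime - safety_margin


# Hypothetical example: a token issued on 1 January with a 30-day lifetime,
# checked on 28 January with and without a 5-day safety margin.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
check_time = datetime(2024, 1, 28, tzinfo=timezone.utc)
print(rotation_due(issued, timedelta(days=30), now=check_time))
print(rotation_due(issued, timedelta(days=30), now=check_time,
                   safety_margin=timedelta(days=5)))
```

A safety margin lets the rotation job replace the credential before it actually expires, so that consumers are never left holding an invalid secret.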

Perform Regular Vulnerability Scans: [ Security Expert ]

As more companies adopt Kubernetes, the risk of exposing hosted business applications grows. The increasing number of complex configuration controls also raises the chance of human error in administration, which can leave an organization's entire environment vulnerable to cybersecurity incidents. With that in mind, the Center for Internet Security (CIS) provides companies with a set of best-practice guidelines for securing their Kubernetes environments. One of them is to run regular platform scans to map where the platform is vulnerable and needs to be patched. Tools that can be used with OpenShift to map vulnerabilities include OpenSCAP, an open-source tool that maps container vulnerabilities, and Black Duck. Defining a schedule for this task can be a challenge, and the work can become overwhelming if it is compressed into a short period of time. The recommended approach is to use an automated tool to run at least a weekly scan and to build a prioritized pipeline of vulnerabilities to fix, so that high-score vulnerabilities are addressed sooner than low-score ones while all of them are tackled in future releases.
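The prioritized pipeline described above can be sketched as a simple sort by severity score. This is only an illustration: the `Finding` type, the identifiers, and the scores are hypothetical, and a real scanner report would carry many more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    identifier: str    # hypothetical finding ID taken from a scan report
    cvss_score: float  # severity score, 0.0 (lowest) to 10.0 (highest)


def prioritize(findings):
    """Order findings so that the highest-scoring ones are fixed first."""
    return sorted(findings, key=lambda f: f.cvss_score, reverse=True)


backlog = prioritize([
    Finding("finding-a", 4.3),
    Finding("finding-b", 9.8),
    Finding("finding-c", 7.5),
])
for finding in backlog:
    print(finding.identifier, finding.cvss_score)
```

Every finding stays in the backlog; the ordering only decides which release picks it up first.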

Update Platform (and Host) Security Patches (Rolling Updates): [ SRE ]

A major step in preventing vulnerabilities from compromising your system is to map the current vulnerabilities and, based on them, apply updates that address them. The update policy shifts from purely feature-based updates to security-based updates that keep your system healthier from a risk perspective. Evaluating which update makes sense is also critical during this phase, because a newer version might introduce a higher-risk vulnerability to your system. The recommendation is that each version is evaluated from a risk perspective and that updates, once vetted, are applied as rolling updates. Rolling updates allow a Deployment to be updated with zero downtime by incrementally replacing Pod instances with new ones.

Create and Manage Namespaces for application(s): [ SRE ]

To improve access control for applications, a namespaced access control setup can be used so that policies are applied per namespace, allowing or denying a given user or application a certain action, such as viewing secrets or updating a deployment.
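To make the rolling update mechanism concrete, the sketch below builds a Deployment manifest with an explicit `RollingUpdate` strategy and prints it as JSON. The application name, the image reference, and the `maxUnavailable`/`maxSurge` values are assumptions chosen for illustration.

```python
import json


def rolling_update_deployment(name, image, replicas=3):
    """Build a Deployment manifest that replaces Pods one at a time."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "strategy": {
                "type": "RollingUpdate",
                # Never drop below the desired replica count; add one new Pod at a time.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical application name and image reference.
print(json.dumps(rolling_update_deployment("web", "registry.example.com/web:1.2.3"), indent=2))
```

Changing the image in such a manifest and re-applying it is what triggers the incremental replacement of Pods.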
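A namespaced policy of this kind could look like the following sketch, which builds a Role and a RoleBinding as plain dictionaries. The namespace, role name, and user are hypothetical; the example grants read access to ConfigMaps and deliberately leaves Secrets out.

```python
import json


def namespaced_role(namespace, name, resources, verbs, api_group=""):
    """Build a Role that allows the given verbs on resources in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"namespace": namespace, "name": name},
        "rules": [{"apiGroups": [api_group], "resources": list(resources), "verbs": list(verbs)}],
    }


def role_binding(namespace, role_name, user):
    """Bind a Role to a single user inside the same namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"namespace": namespace, "name": f"{role_name}-{user}"},
        "subjects": [{"kind": "User", "name": user, "apiGroup": "rbac.authorization.k8s.io"}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": "rbac.authorization.k8s.io"},
    }


# Hypothetical names: user "alice" may read ConfigMaps in "team-a", but not Secrets.
role = namespaced_role("team-a", "config-reader", ["configmaps"], ["get", "list", "watch"])
binding = role_binding("team-a", "config-reader", "alice")
print(json.dumps([role, binding], indent=2))
```

Because both objects are scoped to one namespace, the same user has no access in any other namespace unless another binding grants it.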

Container security (signing container images): [ Security Expert, SRE ]

When building images that will ultimately run in a production Kubernetes cluster, be sure to follow best practices for packaging and distributing applications. For example, don’t build containers with passwords baked in—and this includes not just in the final layer, but any layers in the image. One of the counterintuitive problems introduced by container layers is that deleting a file in one layer doesn’t delete that file from preceding layers. It still takes up space, and it can be accessed by anyone with the right tools—an enterprising attacker can simply create an image that only consists of the layers that contain the password.
Secrets and images should never be mixed. If you do so, you will be hacked, and you will bring shame to your entire company or department.
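The layer behavior described above can be illustrated with a toy model. This is not how an image is actually stored on disk; it only models each layer as a dictionary and a deletion as a marker, to show why a file removed in a later layer is still present in the earlier one.

```python
# Marker standing in for a "file deleted in this layer" entry (toy model only).
WHITEOUT = object()


def merged_view(layers):
    """Return the file system a running container would see."""
    files = {}
    for layer in layers:
        for path, content in layer.items():
            if content is WHITEOUT:
                files.pop(path, None)
            else:
                files[path] = content
    return files


def paths_recoverable_from_layers(layers):
    """Return every path whose content is still stored in at least one layer."""
    return {path for layer in layers
            for path, content in layer.items() if content is not WHITEOUT}


# Hypothetical image: layer 1 adds a password file, layer 2 "deletes" it.
layers = [
    {"/app/server": "binary", "/app/password.txt": "not-a-real-password"},
    {"/app/password.txt": WHITEOUT},
]
print(sorted(merged_view(layers)))
print(sorted(paths_recoverable_from_layers(layers)))
```

The merged view no longer lists the password file, yet the first layer still contains it, which is the reason secrets must never enter any layer in the first place.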

Just like RPM signing, container image signing can be used to verify the authenticity of a container image created by users or developers throughout the image life cycle. Container image signing on Red Hat Enterprise Linux (RHEL) systems provides a means of:

  1. Validating where a container image came from,
  2. Checking that the image has not been tampered with, and
  3. Setting policies to determine which validated images can be pulled to a host.

We explain Images in more detail in here.

More on how to sign container images in here and here.

Security Threat Management: [ Security Expert ]

Threat management can be defined by three capabilities:

  1. Monitor: Enable log and flow (network) data generation from applications and feed it into a centralized system that can correlate the information and provide actionable insight and context. Such a system is often called a SIEM (Security Information and Event Management). This platform can be a log aggregator or a more robust platform that allows for detection of threats, such as IBM QRadar.
  2. Respond: A system with a pre-configured set of actions to be executed when an incident is detected. Such a system can automate specific tasks, collect artifacts, and help with incident response. A system with such capabilities is IBM Resilient.
  3. Investigate: Capabilities provided by a combination of the Monitor and Respond systems that allow a user to search and investigate data, providing insight into a particular possible threat through behavior analytics, AI, reporting, and more.

Execute periodic penetration testing: [ Security Expert ]

Although scans provide a holistic view of the current security issues that need to be addressed in the platform, there also needs to be proof that a vulnerable system or piece of code can be exploited and cause harm to a company, such as leaking sensitive content. In any scenario, a threat exploits a vulnerability, which leads to a risk of exposure. The way to prevent it is to apply a countermeasure, and the way to identify the most effective countermeasure is to have your own penetration testing team regularly exploit the vulnerabilities found by the scanners to determine which carries the highest risk of exposure. Penetration testing has to produce accurate reports on how the exploits took place and which vulnerabilities were exploited, so that countermeasures can be defined effectively afterwards. The recommended approach is to run penetration tests at least once a quarter, as well as after the release of a major version, so that responses and hardening policies are applied more frequently and more tactically.

Application Security Policy: [ Security Expert, SRE ]

In this activity we set up the application's user ID and the application's administrative user.

You would provide an application with a unique identity in order to control access to resources on a fine-grained level. You can create a service account and use it in a pod specification.

We explain Service Account in here.
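As a hedged illustration of giving an application its own identity, the sketch below builds a ServiceAccount and a Pod that references it through `serviceAccountName`. The namespace, names, and image are hypothetical.

```python
import json


def service_account(namespace, name):
    """Build a ServiceAccount manifest for one application."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"namespace": namespace, "name": name},
    }


def pod_using_service_account(namespace, pod_name, image, service_account_name):
    """Build a Pod manifest that runs under the given service account."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"namespace": namespace, "name": pod_name},
        "spec": {
            "serviceAccountName": service_account_name,
            "containers": [{"name": pod_name, "image": image}],
        },
    }


# Hypothetical names for illustration.
sa = service_account("team-a", "orders-app")
pod = pod_using_service_account("team-a", "orders", "registry.example.com/orders:1.0", "orders-app")
print(json.dumps([sa, pod], indent=2))
```

Roles can then be bound to this service account rather than to a shared default identity, which keeps access fine-grained per application.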

Cluster administrators can use cluster roles and bindings to control who has various levels of access to OpenShift Container Platform itself and to all projects. Developers can use local roles and bindings to control who has access to their projects. Note that authorization is a separate step from authentication, which is about determining the identity of whoever is taking the action and keeping track of that user.

More on how to set up RBAC in here and here.

We also explain Role-Based Access Control in here.

Setup external Certificates (application): [ Security Expert, SRE ]

When setting up applications that face the public web, it is recommended that a sound PKI (Public Key Infrastructure) is in place to guarantee that only trusted certificates and certificate authorities are used, strengthening trust in the application through digital certificates. To set up certificates in a Kubernetes cluster, Kubernetes provides the certificates.k8s.io API, which lets you provision TLS certificates signed by a Certificate Authority (CA) that you control. These CAs and certificates can be used by your workloads to establish trust. To get an external certificate, it is first necessary to contact a valid external certificate authority. Each customer tends to use a particular certificate authority, so the application expert needs to ask the customer which team handles certificate requests and then create a CSR (Certificate Signing Request). The CSR allows the certificate authority to sign the certificate for the cluster and then return the certificate to be installed in the application.
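For the certificates.k8s.io API mentioned above, a request object could be assembled as in the following sketch. It assumes the PEM-encoded CSR has already been generated with a separate tool; the object name, the signer name, the usages, and the placeholder PEM bytes are all hypothetical. An external certificate authority would normally receive the PEM file itself rather than this Kubernetes object.

```python
import base64
import json


def certificate_signing_request(name, csr_pem, signer_name, usages):
    """Wrap an existing PEM-encoded CSR in a CertificateSigningRequest object."""
    return {
        "apiVersion": "certificates.k8s.io/v1",
        "kind": "CertificateSigningRequest",
        "metadata": {"name": name},
        "spec": {
            # The API expects the PEM document base64-encoded.
            "request": base64.b64encode(csr_pem).decode("ascii"),
            "signerName": signer_name,
            "usages": list(usages),
        },
    }


# Placeholder bytes only; a real CSR comes from your key-generation tooling.
placeholder_pem = (b"-----BEGIN CERTIFICATE REQUEST-----\n"
                   b"...\n"
                   b"-----END CERTIFICATE REQUEST-----\n")
csr = certificate_signing_request(
    "orders-app-tls",
    placeholder_pem,
    "example.com/internal-ca",  # hypothetical signer name
    ["digital signature", "key encipherment", "server auth"],
)
print(json.dumps(csr, indent=2))
```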

Application Users access channels policies setup: authentication, Firewalls, NATs, Load-balance, Network security, routes: [ Security Expert, SRE ]

Let’s define channel first:

  • It is a component in a medium that a user has to cross in order to reach a resource.

For this particular example, to make it clearer, we will use the following topology:

|Web-application: Port A| ----|
                              |--|Load-balancer|--|Switch|--|Firewall|--|Switch|--|User|
|Web-application: Port B| ----|

The topology above has the following components:

  • Application web cluster: A number of application instances that are a part of a load-balancer farm
  • Load-Balancer: A device responsible for managing multiple requests and balancing them across the configured instances in a farm
  • Farm: A combination of application instances
  • Switch: Physical connection (network) in an infrastructure
  • Firewall: Controls inbound/outbound traffic through filters; it can be stateful or stateless
  • User: Individual trying to access a resource

In summary, each of these is a medium with its own channel policies. For example, the application that resides in the OpenShift container will have its own firewall rules, the load balancer will have its own load-balancing and access control policies, as will the switches, and even the user's machine has its own firewalls and access policies.

It is imperative that each channel has its access policies well defined and applied, creating what is called a layered security approach, which protects the application in the last layer of the channels from being compromised. Each component in the channel can act as a compensating control for a vulnerability if necessary, so policies also have to be updated as the need arises from a vulnerability scan or a penetration test. Keeping those channel policies up to date, well defined, and documented helps secure the application even more.

The recommendation is that each of the policies is tested by a security professional during Day 2 to guarantee that there are no open policies that would allow a compromise, and that Day 2 teams are ready to update policies in the necessary channels as needs arise.

Validate Onboard Application to Threat Management Platform: [ Security Expert, SRE ]

During the deployment phase, applications have to be onboarded into the monitoring component of the Threat Management system, allowing analysts to monitor their behavior and patterns and to report security incidents when they are detected. It is recommended that all applications are onboarded through a secure connection and send syslog data to the monitoring system. Once the application is deployed, the onboarding must be validated on both sides: the monitoring system and the application itself. Start by checking that the application has this configuration. Then access the monitoring system and validate that it has syslog data for the onboarded application by creating a filter in the monitoring application. In this case we will use the example of the IBM QRadar filter demonstrated in here.

RBAC: [ ]

Default security policies should give administrators just enough access to do their job. Cluster administrators can use cluster roles and bindings to control who has various levels of access to OpenShift Container Platform itself and to all projects.

More on how to set up RBAC in here and here.

We also explain Role-Based Access Control in here.

Create Cluster-wide Pod Security Policy

Securing Pods: [ ]

You can define the Security Context for an application at the pod level. For example, you may want to run the application as a non-privileged process or restrict the types of volumes the application can access.
You can use the securityContext field in a pod specification to enforce policies at the pod level in an OpenShift cluster.

PodSecurityPolicy: [ ]

A Pod's SecurityContext can enhance Pod security. However, there is no guarantee that the SecurityContext is correctly specified for every pod deployed in the cluster. PodSecurityPolicy defines the policy that a Pod's settings must satisfy and prohibits the deployment of any Pod that does not satisfy it. Multiple policies can be defined, and different policies can be applied depending on the user who creates the pod. By using this feature, you can maintain a consistent quality of Pod security settings across the cluster.
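A pod specification using the securityContext field might look like the sketch below. The pod name and image are hypothetical, and the chosen settings (non-root, no privilege escalation, all capabilities dropped) are one reasonable baseline rather than a complete hardening profile.

```python
import json


def hardened_pod(name, image):
    """Build a Pod manifest that runs as a non-privileged process."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pod-level setting: refuse to start containers that would run as root.
            "securityContext": {"runAsNonRoot": True},
            "containers": [{
                "name": name,
                "image": image,
                # Container-level settings: no privilege escalation, no extra capabilities.
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "capabilities": {"drop": ["ALL"]},
                },
            }],
        },
    }


# Hypothetical pod name and image reference.
pod_manifest = hardened_pod("orders", "registry.example.com/orders:1.0")
print(json.dumps(pod_manifest, indent=2))
```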

Securely Manage All Platform Secrets

Encrypting Secrets: [ ]

You can encrypt data stored in etcd. Enabling etcd encryption for your cluster provides an additional layer of data security. When you enable etcd encryption, the following OpenShift API server and Kubernetes API server resources are encrypted:

  • Secrets
  • ConfigMaps
  • Routes
  • OAuth access tokens
  • OAuth authorize tokens
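As a sketch of what enabling etcd encryption could involve, the snippet below builds a merge-patch body for the cluster-scoped APIServer configuration object. It assumes an OpenShift 4 cluster where the `aescbc` encryption type is available; check the documentation for your version before applying anything like it.

```python
import json


def etcd_encryption_patch(encryption_type="aescbc"):
    """Build a merge-patch body that sets the etcd encryption type."""
    return {"spec": {"encryption": {"type": encryption_type}}}


# The patch targets the APIServer resource named "cluster" (config.openshift.io/v1).
patch_body = json.dumps(etcd_encryption_patch())
print(patch_body)
```

The resulting JSON could, for example, be passed to `oc patch apiserver cluster --type=merge -p '<patch>'`; encryption of existing data then happens in the background and can take some time on larger clusters.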

Set minimum privilege with seccomp profiles: [ ]

Seccomp is a system call filtering facility in the Linux kernel which lets applications define limits on system calls they may make, and what should happen when system calls are made. Seccomp is used to reduce the attack surface available to applications.

More on Seccomp on here and here.
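To show what a system call filter looks like in practice, the sketch below generates a small seccomp profile in the JSON format used by container runtimes: every call is denied by default and only an explicit allow-list passes. The allow-list here is purely illustrative and far too short for a real workload.

```python
import json


def seccomp_profile(allowed_syscalls):
    """Build a deny-by-default seccomp profile with an explicit allow-list."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",  # any call not listed fails with an error
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [{
            "names": sorted(allowed_syscalls),
            "action": "SCMP_ACT_ALLOW",
        }],
    }


# Illustrative allow-list only; real applications need many more system calls.
profile = seccomp_profile({"read", "write", "exit_group", "futex"})
print(json.dumps(profile, indent=2))
```

Starting from a deny-by-default profile and adding calls as needed keeps the attack surface limited to what the application actually uses.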

NetworkPolicy: [ ]

If a container is taken over by an attacker, limiting in advance which containers it can communicate with is an effective way to prevent the damage from spreading to other containers deployed in the cluster. OpenShift provides a feature called NetworkPolicy to restrict communication between pods.

By default, all Pods in a project are accessible from other Pods and network endpoints. To isolate one or more Pods in a project, you can create NetworkPolicy objects in that project to indicate the allowed incoming connections. Project administrators can create and delete NetworkPolicy objects within their own project.
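The isolation described above is commonly built from two NetworkPolicy objects: one that denies all incoming connections to every Pod in the project, and one that re-allows traffic between Pods of the same project. The sketch below builds both; the project name is hypothetical.

```python
import json


def deny_all_ingress(namespace):
    """Select every Pod in the namespace and allow no incoming connections."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"namespace": namespace, "name": "deny-by-default"},
        "spec": {"podSelector": {}, "ingress": []},
    }


def allow_same_namespace(namespace):
    """Allow incoming connections only from Pods in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"namespace": namespace, "name": "allow-same-namespace"},
        "spec": {"podSelector": {}, "ingress": [{"from": [{"podSelector": {}}]}]},
    }


# Hypothetical project name.
policies = [deny_all_ingress("team-a"), allow_same_namespace("team-a")]
print(json.dumps(policies, indent=2))
```

NetworkPolicy rules are additive, so the second object opens only the traffic it names while everything else stays blocked by the first.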

Setup Platform Certificates

The Ignition config files that the installation program generates contain certificates that expire after 24 hours. You must complete your cluster installation and keep the cluster running for 24 hours in a non-degraded state to ensure that the first certificate rotation has finished.

Certificate Management: [ SRE ]

OpenShift Container Platform creates certificates by default for the Ingress Operator, the API server, and for services needed by complex middleware applications that require encryption. At some point, you may need to change, add, and rotate these certificates.

Certificates are used to provide secure connections to masters and nodes, the Ingress controller, the registry, and etcd.

Automated service CA rotation is available. In previous versions, the service CA did not automatically renew, leading to service disruption and requiring manual intervention. The service CA and signing key now auto-rotate before expiration. This allows administrators to plan for their environments in advance, avoiding service disruption.

Optionally configure external endpoints to use custom certificates.
You cannot replace the Root CA of an OpenShift 4 cluster. There are currently no plans to change this.

Here is a list of tasks for the Day 2 Security domain:

Understanding and replacing the default ingress certificate: [ SRE ]

By default OpenShift Container Platform uses the Ingress Operator to create an internal CA and issue a wildcard certificate that is valid for applications under the .apps sub-domain. Both the web console and CLI use this certificate as well.

The internal infrastructure CA certificates are self-signed. While this process might be perceived as bad practice by some security or PKI teams, any risk here is minimal. The only clients that implicitly trust these certificates are other components within the cluster.

Replacing the default wildcard certificate with one that is issued by a public CA already included in the CA bundle as provided by the container userspace allows external clients to connect securely to applications running under the .apps sub-domain.
After you replace the certificate, all applications, including the web console and CLI, will have encryption provided by the specified certificate.

Adding API server certificates: [ SRE ]

The default API server certificate is issued by an internal OpenShift Container Platform cluster CA. Clients outside of the cluster will not be able to verify the API server’s certificate by default. This certificate can be replaced by one that is issued by a CA that clients trust.
You can add additional certificates to the API server to send based on the client’s requested URL, such as when a reverse proxy or load balancer is used.

Securing service traffic using service serving certificate secrets: [ DevOps Engineer ]

Service serving certificates are intended to support complex middleware applications that require encryption (in other words, applications that need out-of-the-box certificates). These certificates are issued as TLS web server certificates, which means that they have the same settings as the server certificates generated by the administrator tooling for nodes and masters.

The service-ca controller uses the x509.SHA256WithRSA signature algorithm to generate service certificates.

The generated certificate and key are in PEM format, stored in tls.crt and tls.key respectively, within a created secret.

The service CA certificate, which signs the service certificates, is only valid for one year after OpenShift Container Platform is installed.
The certificate and key are automatically replaced when they get close to expiration.
You can rotate the service certificate by deleting the associated secret. Deleting the secret results in a new one being automatically created, resulting in a new certificate.
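When validating a generated secret, it can help to confirm that both entries are present and PEM-encoded without ever printing the key. The sketch below does that for a Secret already loaded as a dictionary (for example from JSON output); the sample secret is a hypothetical placeholder, not real key material.

```python
import base64


def has_pem_pair(secret):
    """Check that tls.crt and tls.key exist and decode to PEM documents."""
    data = secret.get("data", {})
    if "tls.crt" not in data or "tls.key" not in data:
        return False
    # Values in a Secret's data field are base64-encoded.
    cert = base64.b64decode(data["tls.crt"])
    key = base64.b64decode(data["tls.key"])
    return cert.startswith(b"-----BEGIN CERTIFICATE-----") and key.startswith(b"-----BEGIN ")


# Placeholder content only, for illustration.
sample_secret = {
    "kind": "Secret",
    "data": {
        "tls.crt": base64.b64encode(b"-----BEGIN CERTIFICATE-----\n...\n").decode("ascii"),
        "tls.key": base64.b64encode(b"-----BEGIN RSA PRIVATE KEY-----\n...\n").decode("ascii"),
    },
}
print(has_pem_pair(sample_secret))
```

The check returns only a boolean, so it can run in a pipeline log without exposing the private key.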

Setting up additional trusted certificate authorities for builds: [ DevOps Engineer ]

You can add certificate authorities (CA) to be trusted by builds when pulling images from an image registry.

Creating a secret from a .gitconfig file for secured git: [ DevOps Engineer ]

If your Git server is secured with two-way SSL and user name with password, you must add the certificate files to your source build and add references to the certificate files in the .gitconfig file.

Creating a secret from source code trusted certificate authorities: [ ]

The set of TLS certificate authorities (CA) that are trusted during a git clone operation are built into the OpenShift Container Platform infrastructure images. If your Git server uses a self-signed certificate or one signed by an authority not trusted by the image, you can create a secret that contains the certificate or disable TLS verification.

If you create a secret for the CA certificate, OpenShift Container Platform uses it to access your Git server during the git clone operation. Using this method is significantly more secure than disabling Git’s SSL verification, which accepts any TLS certificate that is presented.

Adding TLS certificates for authenticating DataVolume imports: [ ]

TLS certificates for registry or HTTPS endpoints must be added to a ConfigMap in order to import data from these sources. This ConfigMap must be present in the namespace of the destination DataVolume.

Adding TLS certificates for sending logs to external devices: [ ]

You can configure Fluentd to send a copy of its logs to an external log aggregator, and not the default Elasticsearch, using the forward plug-in.
You can add any certificates required by your external devices (external log aggregator) to a secret, called secure-forward, which is mounted to the Fluentd Pods.

Certificate signing requests management: [ ]

Because your cluster has limited access to automatic machine management when you use infrastructure that you provision, you must provide a mechanism for approving cluster certificate signing requests (CSRs) after installation. The kube-controller-manager only approves the kubelet client CSRs. The machine-approver cannot guarantee the validity of a serving certificate that is requested by using kubelet credentials because it cannot confirm that the correct machine issued the request. You must determine and implement a method of verifying the validity of the kubelet serving certificate requests and approving them.
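One possible building block for such a mechanism is a helper that lists the requests still waiting for a decision, so that each one can be verified before approval. The sketch below works on a CSR list already parsed from JSON (for example the output of `oc get csr -o json`); the sample data and names are hypothetical, and the helper deliberately does not approve anything by itself.

```python
def pending_csr_names(csr_list):
    """Return the names of CSRs that are neither approved nor denied yet."""
    pending = []
    for item in csr_list.get("items", []):
        conditions = item.get("status", {}).get("conditions") or []
        decided = any(c.get("type") in ("Approved", "Denied") for c in conditions)
        if not decided:
            pending.append(item["metadata"]["name"])
    return pending


# Hypothetical sample: one approved request and one pending request.
sample_csrs = {
    "items": [
        {"metadata": {"name": "csr-aaaaa"}, "status": {"conditions": [{"type": "Approved"}]}},
        {"metadata": {"name": "csr-bbbbb"}, "status": {}},
    ]
}
print(pending_csr_names(sample_csrs))
```

After a request has been verified against the expected machine, it could be approved with `oc adm certificate approve <csr_name>`.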

Implementing Security Management

TBD

Kubernetes

TBD

OpenShift

TBD

On IBM Cloud

Security for Red Hat OpenShift on IBM Cloud:
https://cloud.ibm.com/docs/openshift?topic=openshift-security

With IBM Cloud Pak for MCM

TBD

Others

Other consideration

Reference

9 Kubernetes Security Best Practices Everyone Must Follow
As organizations accelerate their adoption of containers and container orchestrators, they will need to take necessary steps to protect such a critical part of their compute infrastructure. To help in this endeavor, check out these nine Kubernetes security best practices, based on customer input, you should follow to help protect your infrastructure.

  1. Upgrade to the Latest Version
  2. Enable Role-Based Access Control (RBAC)
  3. Use Namespaces to Establish Security Boundaries
  4. Separate Sensitive Workloads
  5. Secure Cloud Metadata Access
  6. Create and Define Cluster Network Policies
  7. Run a Cluster-wide Pod Security Policy
  8. Harden Node Security
  9. Turn on Audit Logging

Securing a Cluster

  • Controlling access to the Kubernetes API
  • Controlling access to the Kubelet
  • Controlling the capabilities of a workload or user at runtime
  • Protecting cluster components from compromise

Overview of Cloud Native Security
Kubernetes Security (and security in general) is an immense topic that has many highly interrelated parts. In today’s era where open source software is integrated into many of the systems that help web applications run, there are some overarching concepts that can help guide your intuition about how you can think about security holistically.

6 Kubernetes Security Best Practices: Secure Your Workloads

  1. Enable Role-Based Access Control (RBAC)
  2. Keep Your Secrets Secret
  3. Restrict Pod-to-Pod Traffic With a Kubernetes Network Policy
  4. Enhance Pod Security
  5. Kubernetes Rolling Updates
  6. Define Audit Policies