
Best Practices

Best Practices when running Capsule in production

1 - Architecture

Architecture references and considerations

Ownership

In Capsule, we introduce a new persona called the Tenant Owner. The goal is to enable Cluster Administrators to delegate tenant management responsibilities to Tenant Owners. Here’s how it works:

  • Tenant Owners: They manage the namespaces within their tenants and perform administrative tasks confined to their tenant boundaries. This delegation allows teams to operate more autonomously while still adhering to organizational policies.
  • Cluster Administrators: They provision tenants, essentially determining the size and resource allocation of each tenant within the entire cluster. Think of it as defining how big each piece of cake (Tenant) should be within the whole cake (Cluster).

Capsule provides robust tools to strictly enforce tenant boundaries, ensuring that each tenant operates within its defined limits. This separation of duties promotes both security and efficient resource management.

Key Decisions

Introducing a new separation of duties can lead to a significant paradigm shift. This has technical implications and may also impact your organizational structure. Therefore, when designing a multi-tenant platform pattern, carefully consider the following aspects. As Cluster Administrator, ask yourself:

  • 🔑 How much ownership can be delegated to Tenant Owners (Platform Users)?

The answer to this question may be influenced by the following aspects:

  • Are the Cluster Administrators willing to grant permissions to Tenant Owners?

    • If not, there may be a know-how gap, and your organisation is probably not yet positioning Kubernetes itself as a key strategic platform. The key here is enabling Platform Users through good UX and know-how transfer.
  • Who is responsible for the deployed workloads within the Tenants?

    • If Platform Administrators are still handling this, a true “shift left” has not yet been achieved.
  • Who gets paged during a production outage within a Tenant’s application?

    • You’ll need robust monitoring that enables Tenant Owners to clearly understand and manage what’s happening inside their own tenant.
  • Are your customers technically capable of working directly with the Kubernetes API?

    • If not, you may need to build a more user-friendly platform with better UX — for example, a multi-tenant ArgoCD setup, or UI layers like Headlamp.

Layouts

Let’s discuss different Tenant Layouts that could be used. These are just approaches we have seen; you might also find that a combination of them fits your use case.

Tenant As A Service

With this approach you essentially just provide your customers with a Tenant on your cluster. The rest is their responsibility. This results in a shared responsibility model: the Tenant Owners are responsible for everything they provision within their Tenant’s namespaces.

Resourcepool Dashboard

Scheduling

Workload distribution across your compute infrastructure can be approached in various ways, depending on your specific priorities. Regardless of the use case, it’s essential to preserve maximum flexibility for your platform administrators. This means ensuring that:

  • Nodes can be drained or deleted at any time.
  • Cluster updates can be performed at any time.
  • The number of worker nodes can be scaled up or down as needed.

If your cluster architecture prevents any of these capabilities, or if certain applications block the enforcement of these policies, you should reconsider your approach.

Dedicated

Dedicated node pools provide strong tenant isolation, ensuring that any noisy-neighbor effects remain confined within individual tenants (tenant responsibility). This approach may involve higher administrative overhead and costs compared to shared compute, but it also provides enhanced security by dedicating nodes to a single customer/application. At a minimum, it is recommended to separate the cluster’s operator workload from customer workloads.

Dedicated Nodepool
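
A dedicated layout can be implemented with Capsule by pinning a tenant to its own node pool. A minimal sketch, assuming the worker nodes carry a hypothetical pool: tenant-solar label (consult the Tenant reference for the exact field name):

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  # Hypothetical node label identifying the dedicated pool; Capsule enforces
  # this selector on every namespace of the tenant
  nodeSelector:
    pool: tenant-solar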

Shared

With this approach you share the nodes amongst all Tenants, therefore giving you more potential for optimizing resources at the node level. It’s a common pattern to separate the controllers needed to power your distribution (operators) from the actual workload. This ensures smooth operation of the cluster.

Overview:

  • ✅ Designed for cost efficiency.
  • ✅ Suitable for applications that typically experience low resource fluctuations and run with multiple replicas.
  • ❌ Not ideal for applications that are not cloud-native ready, as they may adversely affect the operation of other applications or the maintenance of node pools.
  • ❌ Not ideal if strong isolation is required.

Shared Nodepool

We provide the concept of ResourcePools to manage resources across namespaces. There are some further aspects you must think about with shared approaches.

2 - General Advice

This is general advice you should consider before choosing a Kubernetes distribution

This is general advice you should consider before choosing a Kubernetes distribution. It is partly relevant for multi-tenancy with Capsule.

Authentication

User authentication for the platform should be handled via a central OIDC-compatible identity provider system (e.g., Keycloak, Azure AD, Okta, or any other OIDC-compliant provider). The rationale is that other central platform components — such as ArgoCD, Grafana, Headlamp, or Harbor — should also integrate with the same authentication mechanism. This enables a unified login experience and reduces administrative complexity in managing users and permissions.

Capsule relies on native Kubernetes RBAC, so it’s important to consider how the Kubernetes API handles user authentication.
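
As an illustration, a minimal sketch of the corresponding kube-apiserver OIDC flags; the issuer URL and client ID are placeholders, and the groups claim should match the groups you reference as Capsule tenant owners:

kube-apiserver \
  --oidc-issuer-url=https://keycloak.company.internal/realms/platform \
  --oidc-client-id=kubernetes \
  --oidc-username-claim=preferred_username \
  --oidc-groups-claim=groups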

OCI Pull-Cache

By default, Kubernetes clusters pull images directly from upstream registries like docker.io, quay.io, ghcr.io, or gcr.io. In production environments, this can lead to issues — especially because Docker Hub enforces rate limits that may cause image pull failures with just a few nodes or frequent deployments (e.g., when pods are rescheduled).

To ensure availability, performance, and control over container images, it’s essential to provide an on-premise OCI mirror. This mirror should be configured via the CRI (Container Runtime Interface) by defining it as a mirror endpoint in registries.conf for default registries (e.g., docker.io). This way, all nodes automatically benefit from caching without requiring developers to change image URLs.
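
A minimal sketch of such a mirror configuration in the containers-registries.conf v2 format used by CRI-O (containerd uses its own hosts.toml mechanism instead); the mirror hostname is a placeholder for your on-premise cache:

# /etc/containers/registries.conf
[[registry]]
prefix = "docker.io"
location = "docker.io"

# Pulls for docker.io are attempted against the mirror first,
# falling back to the upstream registry if the mirror is unavailable.
[[registry.mirror]]
location = "mirror.company.internal/dockerhub-cache"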

Secrets Management

In more complex environments with multiple clusters and applications, managing secrets manually via YAML or Helm is no longer practical. Instead, a centralized secrets management system should be established — such as Vault, AWS Secrets Manager, Azure Key Vault, or the CNCF project OpenBao (formerly the Vault community fork).

To integrate these external secret stores with Kubernetes, the External Secrets Operator (ESO) is a recommended solution. It automatically syncs defined secrets from external sources as Kubernetes secrets, and supports dynamic rotation, access control, and auditing.
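
As an illustration, a minimal ExternalSecret sketch; the store name vault-backend, the namespace, and the remote key path are assumptions:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: solar-production
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend              # assumed store name
  target:
    name: database-credentials       # Kubernetes Secret created and kept in sync by ESO
  data:
    - secretKey: password
      remoteRef:
        key: tenants/solar/database  # assumed path in the external store
        property: password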

If no external secret store is available, there should at least be a secure way to store sensitive data in Git. In our ecosystem, we provide a solution based on SOPS (Secrets OPerationS) for this use case.

👉 Demonstration

3 - Pod Security Standards

Control the security of the pods running in the tenant namespaces

In Kubernetes, by default, workloads run with administrative access, which might be acceptable if there is only a single application running in the cluster or a single user accessing it. This is seldom the case in practice; consequently you’ll suffer noisy-neighbour effects along with a large security blast radius.

Many of these concerns were addressed initially by PodSecurityPolicies which have been present in the Kubernetes APIs since the very early days.

Pod Security Policies were deprecated in Kubernetes 1.21 and removed entirely in 1.25. As a replacement, the Pod Security Standards and the Pod Security Admission controller have been introduced. Capsule supports the new standard for tenants under its control as well as the older approach.

Pod Security Standards

One of the issues with Pod Security Policies is that it is difficult to apply restrictive permissions on a granular level, increasing security risk. Also, Pod Security Policies are applied only when the request is submitted; there is no way of applying them to pods that are already running. For these, and other reasons, the Kubernetes community decided to deprecate the Pod Security Policies.

With Pod Security Policies deprecated and removed, the Pod Security Standards are used in their place. They define three different policies to broadly cover the security spectrum. These policies are cumulative and range from highly permissive to highly restrictive:

  • Privileged: unrestricted policy, providing the widest possible level of permissions.
  • Baseline: minimally restrictive policy which prevents known privilege escalations.
  • Restricted: heavily restricted policy, following current Pod hardening best practices.

Kubernetes provides a built-in Admission Controller to enforce the Pod Security Standards at either:

  1. cluster level which applies a standard configuration to all namespaces in a cluster
  2. namespace level, one namespace at a time

For the first case, the cluster admin has to configure the Admission Controller and pass the configuration to the kube-apiserver by means of the --admission-control-config-file extra argument, for example:

apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1beta1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "baseline"
      enforce-version: "latest"
      warn: "restricted"
      warn-version: "latest"
      audit: "restricted"
      audit-version: "latest"
    exemptions:
      usernames: []
      runtimeClasses: []
      namespaces: [kube-system]

For the second case, the cluster admin can simply assign labels to the specific namespace where the policy should be enforced, since the Pod Security Admission Controller is enabled by default starting from Kubernetes 1.23:

apiVersion: v1
kind: Namespace
metadata:
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
  name: development

Capsule

According to the regular Kubernetes segregation model, the cluster admin has to operate either at cluster level or at namespace level. Since Capsule introduces a further segregation level (the Tenant abstraction), the cluster admin can implement Pod Security Standards at tenant level by simply forcing specific labels on all the namespaces created in the tenant.

You can distribute these profiles on a per-namespace basis. Here’s how this could look:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  namespaceOptions:
    additionalMetadataList:
    - namespaceSelector:
        matchExpressions:
          - key: projectcapsule.dev/low_security_profile
            operator: NotIn
            values: ["system"]
      labels:
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/warn: restricted
        pod-security.kubernetes.io/audit: restricted
    - namespaceSelector:
        matchExpressions:
          - key: company.com/env
            operator: In
            values: ["system"]
      labels:
        pod-security.kubernetes.io/enforce: privileged
        pod-security.kubernetes.io/warn: privileged
        pod-security.kubernetes.io/audit: privileged

All namespaces created by the tenant owner will inherit the Pod Security labels:

apiVersion: v1
kind: Namespace
metadata:
  labels:
    capsule.clastix.io/tenant: solar
    kubernetes.io/metadata.name: solar-development
    name: solar-development
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
  name: solar-development
  ownerReferences:
  - apiVersion: capsule.clastix.io/v1beta2
    blockOwnerDeletion: true
    controller: true
    kind: Tenant
    name: solar

and the regular Pod Security Admission Controller does the magic:

kubectl --kubeconfig alice-solar.kubeconfig apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: solar-production
spec:
  containers:
  - image: nginx
    name: nginx
    ports:
    - containerPort: 80
    securityContext:
      privileged: true
EOF

The request gets denied:

Error from server (Forbidden): error when creating "STDIN":
pods "nginx" is forbidden: violates PodSecurity "baseline:latest": privileged
(container "nginx" must not set securityContext.privileged=true)

If the tenant owner tries to change or delete the above labels, Capsule will reconcile them to the original tenant manifest set by the cluster admin.

As an additional security measure, the cluster admin can also prevent the tenant owner from making improper use of the above labels:

kubectl annotate tenant solar \
  capsule.clastix.io/forbidden-namespace-labels-regexp="pod-security.kubernetes.io\/(enforce|warn|audit)"

In that case, the tenant owner gets denied if she tries to use the labels:

kubectl --kubeconfig alice-solar.kubeconfig label ns solar-production \
    pod-security.kubernetes.io/enforce=restricted \
    --overwrite

Error from server (Label pod-security.kubernetes.io/audit is forbidden for namespaces in the current Tenant ...

User Namespaces

A process running as root in a container can run as a different (non-root) user in the host; in other words, the process has full privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace. Read More
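
A minimal sketch of a Pod opting into its own user namespace; the hostUsers field requires the feature gates listed below:

apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  # Run the pod in a new user namespace instead of the host's
  hostUsers: false
  containers:
    - name: nginx
      image: nginx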

Kubelet

On your kubelet you must enable the following feature gates (see the sketch after this list):

  • UserNamespacesSupport
  • UserNamespacesPodSecurityStandards (Optional)
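
A minimal sketch of enabling these gates through the kubelet configuration file; note that UserNamespacesPodSecurityStandards should be enabled consistently across the control plane and nodes for the Pod Security relaxation to take effect:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  UserNamespacesSupport: true
  # Optional: relaxes the "restricted" Pod Security checks for pods that set hostUsers: false
  UserNamespacesPodSecurityStandards: true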

Sysctls

user.max_user_namespaces: "11255"
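
For example, this could be shipped as a sysctl drop-in on the worker nodes (the file name is an assumption):

# /etc/sysctl.d/99-userns.conf (apply with: sysctl --system)
user.max_user_namespaces = 11255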

Admission (Kyverno)

To make sure all the workloads are forced to use dedicated user namespaces, we recommend mutating pods at admission. See the following example.

Kyverno

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-hostusers-spec
  annotations:
    policies.kyverno.io/title: Add HostUsers
    policies.kyverno.io/category: Security
    policies.kyverno.io/subject: Pod,User Namespace
    kyverno.io/kubernetes-version: "1.31"
    policies.kyverno.io/description: >-
      Do not use the host's user namespace. A new userns is created for the pod. 
      Setting false is useful for mitigating container breakout vulnerabilities even allowing users to run their containers as root
      without actually having root privileges on the host. This field is
      alpha-level and is only honored by servers that enable the
      UserNamespacesSupport feature.
spec:
  rules:
  - name: add-host-users
    match:
      any:
      - resources:
          kinds:
          - Pod
          namespaceSelector:
            matchExpressions:
            - key: capsule.clastix.io/tenant
              operator: Exists
    preconditions:
      all:
      - key: "{{request.operation || 'BACKGROUND'}}"
        operator: AnyIn
        value:
          - CREATE
          - UPDATE
    mutate:
      patchStrategicMerge:
        spec:
          hostUsers: false

Pod Security Policies

As stated in the documentation, “PodSecurityPolicies enable fine-grained authorization of pod creation and updates. A Pod Security Policy is a cluster-level resource that controls security sensitive aspects of the pod specification. The PodSecurityPolicy objects define a set of conditions that a pod must run with in order to be accepted into the system, as well as defaults for the related fields.”

Using Pod Security Policies, the cluster admin can impose limits on pod creation, for example the types of volume that can be consumed, or the Linux user that the process runs as in order to avoid running things as root, and more. From a multi-tenancy point of view, the cluster admin has to control how users run pods in their tenants, with a different level of permission on a per-tenant basis.

Assume the Kubernetes cluster has been configured with the Pod Security Policy Admission Controller enabled in the API server: --enable-admission-plugins=PodSecurityPolicy

The cluster admin creates a PodSecurityPolicy:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp:restricted
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false

Then create a ClusterRole granting the use of the said PodSecurityPolicy:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: psp:restricted
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  resourceNames: ['psp:restricted']
  verbs: ['use']

The cluster admin can assign this role to all namespaces in a tenant by setting the tenant manifest:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  additionalRoleBindings:
  - clusterRoleName: psp:restricted
    subjects:
    - kind: "Group"
      apiGroup: "rbac.authorization.k8s.io"
      name: "system:authenticated"

With the given specification, Capsule will ensure that all tenant namespaces will contain a RoleBinding for the specified Cluster Role:

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: 'capsule-solar-psp:restricted'
  namespace: solar-production
  labels:
    capsule.clastix.io/tenant: solar
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: 'system:authenticated'
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: 'psp:restricted'

With this RoleBinding in place, the tenant owner is forbidden from running privileged pods and from performing privilege escalation in the solar-production namespace, as declared by the PodSecurityPolicy granted through the above ClusterRole psp:restricted.

As the tenant owner, create a namespace:

kubectl --kubeconfig alice-solar.kubeconfig create ns solar-production

and create a pod with privileged permissions:

kubectl --kubeconfig alice-solar.kubeconfig apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: solar-production
spec:
  containers:
  - image: nginx
    name: nginx
    ports:
    - containerPort: 80
    securityContext:
      privileged: true
EOF

Since the assigned PodSecurityPolicy explicitly disallows privileged containers, the tenant owner will see her request rejected by the Pod Security Policy Admission Controller.

4 - Networking

Multi-Tenant Networking considerations

Network-Policies

It’s a best practice not to allow any traffic outside of a tenant (or a tenant’s namespaces). For this we can use Tenant Replications to ensure that NetworkPolicies are in place for every namespace.

The following NetworkPolicy is distributed to all namespaces which belong to a Capsule tenant:

apiVersion: capsule.clastix.io/v1beta2
kind: GlobalTenantResource
metadata:
  name: default-networkpolicies
  namespace: solar-system
spec:
  resyncPeriod: 60s
  resources:
    - rawItems:
        - apiVersion: networking.k8s.io/v1
          kind: NetworkPolicy
          metadata:
            name: default-policy
          spec:
            # Apply to all pods in this namespace
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress
            ingress:
              # Allow traffic from the same namespace (intra-namespace communication)
              - from:
                  - podSelector: {}

              # Allow traffic from all namespaces within the tenant
              - from:
                  - namespaceSelector:
                      matchLabels:
                        capsule.clastix.io/tenant: "{{tenant.name}}"

              # Allow ingress from other namespaces labeled (System Namespaces, eg. Monitoring, Ingress)
              - from:
                  - namespaceSelector:
                      matchLabels:
                        company.com/system: "true"

            egress:
              # Allow DNS to kube-dns service IP (might be different in your setup)
              - to:
                  - ipBlock:
                      cidr: 10.96.0.10/32
                ports:
                  - protocol: UDP
                    port: 53
                  - protocol: TCP
                    port: 53

              # Allow traffic to all namespaces within the tenant
              - to:
                  - namespaceSelector:
                      matchLabels:
                        capsule.clastix.io/tenant: "{{tenant.name}}"

Deny Namespace Metadata

In the above example we allow traffic from namespaces with the label company.com/system: "true". This is meant for Kubernetes operators to, for example, scrape the workloads within a tenant. However, without further enforcement any namespace can set this label and therefore gain access to any tenant namespace. To prevent this, we must restrict who can declare this label on namespaces.

We can deny such labels on a per-tenant basis. In this scenario, every tenant should disallow the use of these labels on namespaces:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  namespaceOptions:
    forbiddenLabels:
      denied:
          - company.com/system

Or you can implement a Kyverno policy which solves this.
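
A minimal sketch of such a policy; the admin groups listed under value are assumptions you should adjust to your environment:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-system-namespace-label
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: deny-system-label-for-non-admins
      match:
        any:
          - resources:
              kinds:
                - Namespace
              # Only evaluate namespaces that carry the trusted label
              selector:
                matchLabels:
                  company.com/system: "true"
      validate:
        message: "Only cluster administrators may set the company.com/system label on namespaces."
        deny:
          conditions:
            all:
              # Deny unless the requesting user belongs to one of the admin groups
              - key: "{{ request.userInfo.groups || `[]` }}"
                operator: AllNotIn
                value:
                  - system:masters
                  - cluster-admins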

Non-Native Network-Policies

The same principle can be applied with alternative CNI solutions. In this example we are using Cilium:

apiVersion: capsule.clastix.io/v1beta2
kind: GlobalTenantResource
metadata:
  name: default-networkpolicies
  namespace: solar-system
spec:
  resyncPeriod: 60s
  resources:
    - rawItems:
        - apiVersion: cilium.io/v2
          kind: CiliumNetworkPolicy
          metadata:
            name: default-policy
          spec:
            endpointSelector: {}  # Apply to all pods in the namespace
            ingress:
              # Allow traffic from pods in the same namespace (intra-namespace)
              - fromEndpoints:
                  - matchLabels: {}

              # Allow traffic from all namespaces within the tenant
              # (Cilium exposes namespace labels as io.cilium.k8s.namespace.labels.<key>)
              - fromEndpoints:
                  - matchLabels:
                      k8s:io.cilium.k8s.namespace.labels.capsule.clastix.io/tenant: "{{tenant.name}}"

              # Allow ingress from system namespaces (eg. Monitoring, Ingress)
              - fromEndpoints:
                  - matchLabels:
                      k8s:io.cilium.k8s.namespace.labels.company.com/system: "true"

              # For completeness: allowing the "cluster" entity would admit all
              # in-cluster traffic and bypass tenant isolation, so enable it only if needed.
              # - fromEntities:
              #     - cluster

            egress:
              # Allow DNS to the kube-dns service IP (might be different in your setup)
              - toCIDR:
                  - 10.96.0.10/32
                toPorts:
                  - ports:
                      - port: "53"
                        protocol: UDP
                      - port: "53"
                        protocol: TCP

              # Allow traffic to all namespaces within the tenant
              - toEndpoints:
                  - matchLabels:
                      k8s:io.cilium.k8s.namespace.labels.capsule.clastix.io/tenant: "{{tenant.name}}"

5 - Container Images

Multi-Tenant Container Images considerations

Until this issue is resolved (might be in Kubernetes 1.34), it’s recommended to use the image pull policy Always for private registries on shared nodes. This ensures that a workload cannot reuse an image that was already pulled to the node without being authorized to pull it itself.
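
This can be enforced cluster-wide with the AlwaysPullImages admission controller, or scoped to tenant namespaces with a mutation policy. A minimal Kyverno sketch under that assumption (the policy name is hypothetical):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: always-pull-images
spec:
  rules:
    - name: set-image-pull-policy
      match:
        any:
          - resources:
              kinds:
                - Pod
              # Only mutate pods inside Capsule tenant namespaces
              namespaceSelector:
                matchExpressions:
                  - key: capsule.clastix.io/tenant
                    operator: Exists
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # Apply to every container; initContainers can be handled the same way
              - (name): "?*"
                imagePullPolicy: Always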