Enforcement

Configure policies and restrictions on a per-Tenant basis.

Scheduling

LimitRanges

This feature will be deprecated in a future release of Capsule. Use TenantReplications instead.

Bill, the cluster admin, can also set Limit Ranges for each Namespace in Alice’s Tenant by defining limits for pods and containers in the Tenant spec:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
...
  limitRanges:
    items:
      - limits:
          - type: Pod
            min:
              cpu: "50m"
              memory: "5Mi"
            max:
              cpu: "1"
              memory: "1Gi"
      - limits:
          - type: Container
            defaultRequest:
              cpu: "100m"
              memory: "10Mi"
            default:
              cpu: "200m"
              memory: "100Mi"
            min:
              cpu: "50m"
              memory: "5Mi"
            max:
              cpu: "1"
              memory: "1Gi"
      - limits:
          - type: PersistentVolumeClaim
            min:
              storage: "1Gi"
            max:
              storage: "10Gi"

Limits will be inherited by all the Namespaces created by Alice. In our case, when Alice creates the Namespace solar-production, Capsule creates the following:

apiVersion: v1
kind: LimitRange
metadata:
  name: capsule-solar-0
  namespace: solar-production
spec:
  limits:
    - max:
        cpu: "1"
        memory: 1Gi
      min:
        cpu: 50m
        memory: 5Mi
      type: Pod
---
apiVersion: v1
kind: LimitRange
metadata:
  name: capsule-solar-1
  namespace: solar-production
spec:
  limits:
    - default:
        cpu: 200m
        memory: 100Mi
      defaultRequest:
        cpu: 100m
        memory: 10Mi
      max:
        cpu: "1"
        memory: 1Gi
      min:
        cpu: 50m
        memory: 5Mi
      type: Container
---
apiVersion: v1
kind: LimitRange
metadata:
  name: capsule-solar-2
  namespace: solar-production
spec:
  limits:
    - max:
        storage: 10Gi
      min:
        storage: 1Gi
      type: PersistentVolumeClaim

Note: because a LimitRange applies to individual resources, there is no aggregate to count.

According to the assigned RBAC profile, Alice doesn’t have permission to change or delete these resources:

kubectl -n solar-production auth can-i patch resourcequota
no
kubectl -n solar-production auth can-i delete resourcequota
no
kubectl -n solar-production auth can-i patch limitranges
no
kubectl -n solar-production auth can-i delete limitranges
no

LimitRange Distribution with TenantReplications

In the future, cluster administrators must distribute LimitRanges via TenantReplications. This is a more flexible and powerful approach, as it allows distributing any kind of resource, not only LimitRanges. Here’s an example of how to distribute a LimitRange to all the Namespaces of a Tenant:

apiVersion: capsule.clastix.io/v1beta2
kind: TenantResource
metadata:
  name: solar-limitranges
  namespace: solar-system
spec:
  resyncPeriod: 60s
  resources:
    - namespaceSelector:
        matchLabels:
          capsule.clastix.io/tenant: solar
      rawItems:
        - apiVersion: v1
          kind: LimitRange
          metadata:
            name: cpu-resource-constraint
          spec:
            limits:
            - default: # this section defines default limits
                cpu: 500m
              defaultRequest: # this section defines default requests
                cpu: 500m
              max: # max and min define the limit range
                cpu: "1"
              min:
                cpu: 100m
              type: Container

PriorityClasses

Pods can have priority. Priority indicates the importance of a Pod relative to other Pods. If a Pod cannot be scheduled, the scheduler tries to preempt (evict) lower priority Pods to make scheduling of the pending Pod possible. See Kubernetes documentation.

In a multi-tenant cluster, not all users can be trusted: a Tenant owner could create Pods at the highest possible priorities, causing other Pods to be evicted or to remain unscheduled.

To prevent misuse of Pod PriorityClasses, Bill, the cluster admin, can enforce the allowed PriorityClasses at the Tenant level:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  priorityClasses:
    matchLabels:
      env: "production"

With the said Tenant specification, Alice can create a Pod resource if spec.priorityClassName equals:

  • Any PriorityClass which has the label env with the value production

If a Pod is going to use a non-allowed PriorityClass, it will be rejected by the Validation Webhook enforcing it.
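
As an illustration, the following sketch pairs a PriorityClass carrying the env=production label with a Pod that references it. The names production-high and app, the value 1000, and the image are assumptions made for this example only.

```yaml
# Hypothetical PriorityClass; the env=production label makes it
# selectable by the Tenant's priorityClasses.matchLabels.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production-high        # assumed name
  labels:
    env: "production"
value: 1000                    # assumed priority value
globalDefault: false
---
# Pod created by Alice in one of her Namespaces.
apiVersion: v1
kind: Pod
metadata:
  name: app                    # assumed name
  namespace: solar-production
spec:
  priorityClassName: production-high
  containers:
  - name: app
    image: docker.io/library/nginx:latest   # assumed image
```

A Pod referencing a PriorityClass without the env=production label would be expected to be rejected under the same Tenant specification.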

Assign Pod PriorityClass as tenant default

Note: This feature supports type PriorityClass only on API version scheduling.k8s.io/v1

This feature allows specifying a custom default value on a Tenant basis, bypassing the global cluster default (globalDefault=true) that acts only at the cluster level.

It’s possible to assign each Tenant a PriorityClass which will be used if no PriorityClass is set on the Pod:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  priorityClasses:
    default: "tenant-default"
    matchLabels:
      env: "production"

Let’s create a PriorityClass which is used as the default:

kubectl apply -f - << EOF
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-default
  labels:
    env: "production"
value: 1313
preemptionPolicy: Never
globalDefault: false
description: "This is the default PriorityClass for the solar-tenant"
EOF

Note the globalDefault: false setting, which is important to prevent the PriorityClass from being used as the default for all the Tenants. If a Pod has no value for spec.priorityClassName, the Tenant default PriorityClass (tenant-default) will be used.

RuntimeClasses

Pods can be assigned different RuntimeClasses. With the assigned runtime you can control which Container Runtime Interface (CRI) configuration is used for each Pod. See Kubernetes documentation for more information.

To prevent misuse of Pod RuntimeClasses, Bill, the cluster admin, can enforce the allowed Pod RuntimeClasses at the Tenant level:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  runtimeClasses:
    matchLabels:
      env: "production"

With the said Tenant specification, Alice can create a Pod resource if spec.runtimeClassName equals:

  • Any RuntimeClass which has the label env with the value production

If a Pod is going to use a non-allowed RuntimeClass, it will be rejected by the Validation Webhook enforcing it.
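
A minimal sketch of a matching pair follows. The RuntimeClass name sandboxed, the handler runsc, the Pod name, and the image are assumptions; the handler in particular must correspond to a runtime actually configured on the nodes.

```yaml
# Hypothetical RuntimeClass labeled so that the Tenant's
# runtimeClasses.matchLabels selects it.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: sandboxed              # assumed name
  labels:
    env: "production"
handler: runsc                 # assumed handler, must exist on the nodes
---
# Pod created by Alice that requests the allowed RuntimeClass.
apiVersion: v1
kind: Pod
metadata:
  name: app                    # assumed name
  namespace: solar-production
spec:
  runtimeClassName: sandboxed
  containers:
  - name: app
    image: docker.io/library/nginx:latest   # assumed image
```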

Assign Runtime Class as tenant default

This feature allows specifying a custom default value on a Tenant basis. It’s possible to assign each Tenant a RuntimeClass which will be used if no RuntimeClass is set on the Pod:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  runtimeClasses:
    default: "tenant-default"
    matchLabels:
      env: "production"

Let’s create a RuntimeClass which is used as the default:

kubectl apply -f - << EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: tenant-default 
  labels:
    env: "production"
handler: myconfiguration 
EOF

If a Pod has no value for spec.runtimeClassName, the default value for RuntimeClass (tenant-default) will be used.

NodeSelector

Bill, the cluster admin, can dedicate a pool of worker nodes to the solar Tenant, to isolate the tenant applications from other noisy neighbors.

These nodes are labeled by Bill as pool=renewable:

kubectl get nodes --show-labels

NAME                      STATUS   ROLES             AGE   VERSION   LABELS
...
worker06.acme.com         Ready    worker            8d    v1.25.2   pool=renewable
worker07.acme.com         Ready    worker            8d    v1.25.2   pool=renewable
worker08.acme.com         Ready    worker            8d    v1.25.2   pool=renewable

PodNodeSelector

This approach requires the PodNodeSelector Admission Controller plugin to be active. If the plugin is not active, the Pods will be scheduled to any node. If your distribution does not support this feature, you can use Expression Node Selectors.

The label pool=renewable is defined as .spec.nodeSelector in the Tenant manifest:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  nodeSelector:
    pool: renewable
    kubernetes.io/os: linux

The Capsule controller makes sure that any Namespace created in the Tenant has the annotation: scheduler.alpha.kubernetes.io/node-selector: pool=renewable. The PodNodeSelector admission plugin reads this annotation and assigns the node selector pool=renewable to all the Pods deployed in the Tenant. The effect is that all the Pods deployed by Alice are placed only on the designated pool of nodes.

Multiple node selector labels can be defined as in the following snippet:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  nodeSelector:
    pool: renewable
    kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    hardware: gpu

Any attempt by Alice to change the selector on the Pods will result in an error from the PodNodeSelector Admission Controller plugin. She also cannot edit the Namespace itself:

kubectl auth can-i edit ns -n solar-production
no
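
As a sketch of the expected effect, assuming the PodNodeSelector plugin is active and the Tenant uses the four labels from the snippet above, a Pod that Alice submits without any nodeSelector should end up with the Tenant's selector merged into its spec. The Pod name and image are assumptions.

```yaml
# Pod as it would be expected to look after admission (sketch).
# Alice did not set spec.nodeSelector herself.
apiVersion: v1
kind: Pod
metadata:
  name: app                    # assumed name
  namespace: solar-production
spec:
  nodeSelector:                # expected to be merged in by PodNodeSelector
    pool: renewable
    kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    hardware: gpu
  containers:
  - name: app
    image: docker.io/library/nginx:latest   # assumed image
```

If Alice sets a conflicting value for one of these keys, the plugin is expected to reject the Pod rather than override the Namespace selector.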

Dynamic resource allocation (DRA)

Dynamic Resource Allocation (DRA) is a Kubernetes capability that allows Pods to request and use shared resources, typically external devices such as hardware accelerators. See Kubernetes documentation for more information.

Bill can assign a set of dedicated DeviceClasses to tell the solar Tenant what devices they can request.

apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu.example.com
  labels:
    env: "production"
spec:
  selectors:
    - cel:
        expression: device.driver == 'gpu.example.com' && device.attributes['gpu.example.com'].type
          == 'gpu'
  extendedResourceName: example.com/gpu
---
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  deviceClasses:
    matchLabels:
      env: "production"

With the said Tenant specification, Alice can create a ResourceClaim or ResourceClaimTemplate resource if spec.devices.requests[].deviceClassName (ResourceClaim) or spec.spec.devices.requests[].deviceClassName (ResourceClaimTemplate) equals:

  • Any DeviceClass, which has the label env with the value production

If any of the devices in the ResourceClaim or ResourceClaimTemplate spec is going to use a non-allowed DeviceClass, the entire request will be rejected by the Validation Webhook enforcing it.

Alice now can create a ResourceClaim using only an allowed DeviceClass:

apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: example-resource-claim
  namespace: solar-production
spec:
  devices:
    requests:
      - name: gpu-request
        exactly:
          deviceClassName: 'gpu.example.com'

Connectivity

Services

ExternalIPs

Specifies the external IPs that can be used in Services with type ClusterIP. An empty list means no IPs are allowed, which is recommended in multi-tenant environments, because external IPs can be misused for traffic hijacking:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  serviceOptions:
    externalIPs:
      allowed: []
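
For illustration, a Service like the following sets spec.externalIPs and would therefore be expected to be rejected while the allowed list is empty. The Service name, selector, and the documentation-range address 203.0.113.10 are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web                    # assumed name
  namespace: solar-production
spec:
  type: ClusterIP
  selector:
    app: web                   # assumed selector
  ports:
  - port: 80
    targetPort: 8080
  externalIPs:
  - 203.0.113.10               # not in the (empty) allowed list
```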

Deny labels and annotations

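By default, Capsule allows Tenant owners to add and modify any label or annotation on their Services. Bill, the cluster admin, can forbid specific labels and annotations, either by exact name or by regular expression: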

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  serviceOptions:
    forbiddenAnnotations:
      denied:
        - loadbalancer.class.acme.net
      deniedRegex: .*.acme.net
    forbiddenLabels:
      denied:
        - loadbalancer.class.acme.net
      deniedRegex: .*.acme.net
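
As a sketch, a Service carrying one of the denied keys would be expected to be rejected under this Tenant specification. The Service name, selector, and annotation value are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web                    # assumed name
  namespace: solar-production
  annotations:
    # Listed under forbiddenAnnotations.denied, so the request should be denied.
    loadbalancer.class.acme.net: "internal"   # assumed value
spec:
  selector:
    app: web                   # assumed selector
  ports:
  - port: 80
```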

Deny Service Types

Bill, the cluster admin, can prevent the creation of Services with specific Service types.

NodePort

In a shared multi-tenant scenario, multiple NodePort Services can become cumbersome to manage. Tenant owners may have overlapping needs, since a NodePort is opened on all nodes and, when using hostNetwork=true, is accessible to any Pod regardless of any NetworkPolicy.

Bill, the cluster admin, can block the creation of Services with NodePort service type for a given Tenant:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  serviceOptions:
    allowedServices:
      nodePort: false

With the above configuration, any attempt by Alice to create a Service of type NodePort is denied by the Validation Webhook enforcing it. The default value is true.

ExternalName

Services of type ExternalName have been found subject to many security issues. The cluster admin can prevent TenantOwners from creating Services of type ExternalName:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  serviceOptions:
    allowedServices:
      externalName: false

With the above configuration, any attempt by Alice to create a Service of type ExternalName is denied by the Validation Webhook enforcing it. The default value is true.

LoadBalancer

As with the previous types, Services of type LoadBalancer could be blocked for various reasons. The cluster admin can prevent TenantOwners from creating these kinds of Services:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  serviceOptions:
    allowedServices:
      loadBalancer: false

With the above configuration, any attempt of Alice to create a Service of type LoadBalancer is denied by the Validation Webhook enforcing it. Default value is true.
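
The three switches live under the same allowedServices key, so they can presumably be combined in a single Tenant. The following sketch assumes that combination works as the individual examples suggest; it would leave only ClusterIP Services available to Alice.

```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  serviceOptions:
    allowedServices:
      nodePort: false          # deny Services of type NodePort
      externalName: false      # deny Services of type ExternalName
      loadBalancer: false      # deny Services of type LoadBalancer
```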

GatewayClasses

Note: This feature is offered only by API type GatewayClass in group gateway.networking.k8s.io version v1.

GatewayClass is a cluster-scoped resource defined by the infrastructure provider. This resource represents a class of Gateways that can be instantiated.

Bill can assign a set of dedicated GatewayClasses to the solar Tenant to force the applications in the solar Tenant to be published only by the assigned Gateway Controller:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  gatewayOptions:
    allowedClasses:
      matchLabels:
        env: "production"

With the said Tenant specification, Alice can create a Gateway resource if spec.gatewayClassName equals:

  • Any GatewayClass which has the label env with the value production

If a Gateway is going to use a non-allowed GatewayClass, it will be rejected by the Validation Webhook enforcing it.

Alice can create a Gateway using only an allowed GatewayClass:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
  namespace: solar-production
spec:
  gatewayClassName: customer-class
  listeners:
  - name: http
    protocol: HTTP
    port: 80

Any attempt by Alice to use an invalid GatewayClass, or to omit it, is denied by the Validation Webhook enforcing it.

Assign GatewayClass as tenant default

Note: The Default GatewayClass must have a label which is allowed within the tenant. This behavior is only implemented this way for the GatewayClass default.

This feature allows specifying a custom default value on a Tenant basis. Currently there is no global default feature for a GatewayClass. Each Gateway must have a spec.gatewayClassName set.

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  gatewayOptions:
    allowedClasses:
      default: "tenant-default"
      matchLabels:
        env: "production"

Here’s what the Tenant default GatewayClass could look like:

kubectl apply -f - << EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: tenant-default
  labels:
    env: "production"
spec:
  controllerName: example.com/gateway-controller
EOF

If a Gateway has no value for spec.gatewayClassName, the tenant-default GatewayClass is automatically applied to the Gateway resource.

Ingresses

Assign Ingress Hostnames

Bill can control ingress hostnames in the solar Tenant to force the applications to be published only using the given hostname or set of hostnames:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  ingressOptions:
    allowedHostnames:
      allowed:
        - solar.acmecorp.com
      allowedRegex: ^.*acmecorp.com$

The Capsule controller ensures that all Ingresses created in the Tenant can use only one of the valid hostnames. Alice can create an Ingress using any allowed hostname:

kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-production
spec:
  ingressClassName: solar
  rules:
  - host: web.solar.acmecorp.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80
EOF

Any attempt by Alice to use an invalid hostname is denied by the Validation Webhook enforcing it.
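
For contrast, the following sketch uses a hostname that is neither in the allowed list nor matched by the allowedRegex above, so it would be expected to be denied. The host web.example.org is an assumption chosen for this example.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-production
spec:
  ingressClassName: solar
  rules:
  - host: web.example.org      # assumed host, does not end in acmecorp.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80
```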

Control Hostname collision in Ingresses

In a multi-tenant environment, as more and more ingresses are defined, there is a chance of collision on the hostname leading to unpredictable behavior of the Ingress Controller. Bill, the cluster admin, can enforce hostname collision detection at different scope levels:

  • Cluster
  • Tenant
  • Namespace
  • Disabled (default)

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  - name: joe
    kind: User
  ingressOptions:
    hostnameCollisionScope: Tenant

When a TenantOwner creates an Ingress resource, Capsule checks whether the hostname in the current Ingress collides with any hostname already in use, depending on the defined scope.

For example, Alice, one of the TenantOwners, creates an Ingress:

kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-production
spec:
  rules:
  - host: web.solar.acmecorp.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80
EOF

Another user, Joe, creates an Ingress having the same hostname:

kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-development
spec:
  rules:
  - host: web.solar.acmecorp.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80
EOF

When a collision is detected at the scope defined by spec.ingressOptions.hostnameCollisionScope, the creation of the Ingress resource is rejected by the Validation Webhook enforcing it. When spec.ingressOptions.hostnameCollisionScope=Disabled (default), no collision detection is performed at all.

Deny Wildcard Hostname in Ingresses

Bill, the cluster admin, can deny the use of wildcard hostname in Ingresses. Let’s assume that Acme Corp. uses the domain acme.com.

As a TenantOwner of solar, Alice creates an Ingress with a host like - host: "*.acme.com". This can lead to problems for the water Tenant, because Alice can deliberately create an Ingress with host: water.acme.com.

To avoid this kind of problem, Bill can deny the use of wildcard hostnames in the following way:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  ingressOptions:
    allowWildcardHostnames: false

With this setting, Alice, being the TenantOwner of solar and green only, will not be able to use *.water.acme.com.
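
As a sketch, an Ingress such as the following uses a wildcard host and would be expected to be rejected once allowWildcardHostnames is set to false. The Ingress name and backend Service are assumptions.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wildcard               # assumed name
  namespace: solar-production
spec:
  rules:
  - host: "*.acme.com"         # wildcard hostname, denied by the Tenant setting
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx        # assumed backend Service
            port:
              number: 80
```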

IngressClasses

An Ingress Controller is used in Kubernetes to publish services and applications outside of the cluster. An Ingress Controller can be provisioned to accept only Ingresses with a given IngressClass.

Bill can assign a set of dedicated IngressClasses to the solar Tenant to force the applications in the solar Tenant to be published only by the assigned Ingress Controller:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  ingressOptions:
    allowedClasses:
      matchLabels:
        env: "production"

With the said Tenant specification, Alice can create an Ingress resource if spec.ingressClassName or metadata.annotations."kubernetes.io/ingress.class" equals:

  • Any IngressClass which has the label env with the value production

If an Ingress is going to use a non-allowed IngressClass, it will be rejected by the Validation Webhook enforcing it.

Alice can create an Ingress using only an allowed IngressClass:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-production
spec:
  ingressClassName: legacy
  rules:
  - host: solar.acmecorp.com
    http:
      paths:
      - backend:
          service:
            name: nginx
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific

Any attempt by Alice to use an invalid IngressClass, or to omit it, is denied by the Validation Webhook enforcing it.

Assign Ingress Class as tenant default

Note: This feature is offered only by API type IngressClass in group networking.k8s.io version v1. However, resource Ingress is supported in networking.k8s.io/v1 and networking.k8s.io/v1beta1.

This feature allows specifying a custom default value on a Tenant basis, bypassing the global cluster default (with the annotation metadata.annotations.ingressclass.kubernetes.io/is-default-class=true) that acts only at the cluster level. More information: Default IngressClass

It’s possible to assign each Tenant an IngressClass which will be used if a class is not set on the Ingress:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  ingressOptions:
    allowedClasses:
      default: "tenant-default"
      matchLabels:
        env: "production"

Here’s what the Tenant default IngressClass could look like:

kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  labels:
    env: "production"
    app.kubernetes.io/component: controller
  name: tenant-default
  annotations:
    ingressclass.kubernetes.io/is-default-class: "false"
spec:
  controller: k8s.io/customer-nginx
EOF

If an Ingress has no value for spec.ingressClassName or metadata.annotations."kubernetes.io/ingress.class", the tenant-default IngressClass is automatically applied to the Ingress resource.
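
The following sketch shows an Ingress that relies on the Tenant default: it sets neither spec.ingressClassName nor the kubernetes.io/ingress.class annotation. The Ingress name and backend Service are assumptions.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: defaulted              # assumed name
  namespace: solar-production
spec:
  # No ingressClassName here; tenant-default is expected to be applied.
  rules:
  - host: solar.acmecorp.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx        # assumed backend Service
            port:
              number: 80
```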

NetworkPolicies

Kubernetes network policies control network traffic between Namespaces and between pods in the same Namespace. Bill, the cluster admin, can enforce network traffic isolation between different Tenants while leaving to Alice, the TenantOwner, the freedom to set isolation between Namespaces in the same Tenant or even between pods in the same Namespace.

To meet this requirement, Bill needs to define network policies that prevent pods belonging to Alice’s Namespaces from accessing pods in Namespaces belonging to other Tenants, e.g. Bob’s Tenant water, or in system Namespaces, e.g. kube-system.

Keep in mind that, because of how the NetworkPolicy API works, users can still add a policy which contradicts what the Tenant has set, allowing them to circumvent the initial limitation set by the Tenant admin. Two options can be put in place to mitigate this potential privilege escalation:

  1. Providing a restricted role rather than the default admin one.
  2. Using Calico’s GlobalNetworkPolicy or Cilium’s CiliumClusterwideNetworkPolicy, which are defined at the cluster level, thus creating an order of packet filtering.

Also, Bill can make sure pods belonging to a Tenant Namespace cannot access other network infrastructures like cluster nodes, load balancers, and virtual machines running other services.

Bill can set network policies in the Tenant manifest, according to the requirements:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  networkPolicies:
    items:
    - policyTypes:
      - Ingress
      - Egress
      egress:
      - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 192.168.0.0/16
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              capsule.clastix.io/tenant: water
        - podSelector: {}
        - ipBlock:
            cidr: 192.168.0.0/16
      podSelector: {}

The Capsule controller, watching for Namespace creation, creates the Network Policies for each Namespace in the Tenant.

Alice has access to network policies:

kubectl -n solar-production get networkpolicies
NAME              POD-SELECTOR   AGE
capsule-solar-0   <none>         42h

Alice can create, patch, and delete additional network policies within her Namespaces:

kubectl -n solar-production auth can-i get networkpolicies
yes

kubectl -n solar-production auth can-i delete networkpolicies
yes

kubectl -n solar-production auth can-i patch networkpolicies
yes

For example, she can create:

kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: production-network-policy
  namespace: solar-production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
EOF

Check all the network policies:

kubectl -n solar-production get networkpolicies
NAME                          POD-SELECTOR   AGE
capsule-solar-0               <none>         42h
production-network-policy     <none>         3m

And delete the Namespace network policies:

kubectl -n solar-production delete networkpolicy production-network-policy

Any attempt by Alice to delete the Tenant network policy defined in the Tenant manifest is denied by the Validation Webhook enforcing it. Any deletion by a cluster administrator will cause the network policy to be recreated by the Capsule controller.

NetworkPolicy Distribution with TenantReplications

In the future, cluster administrators must distribute NetworkPolicies via TenantReplications. This is a more flexible and powerful approach, as it allows distributing any kind of resource. Here’s an example of how to distribute a CiliumNetworkPolicy to all the Namespaces of a Tenant:

apiVersion: capsule.clastix.io/v1beta2
kind: TenantResource
metadata:
  name: solar-networkpolicies
  namespace: solar-system
spec:
  resyncPeriod: 60s
  resources:
    - namespaceSelector:
        matchLabels:
          capsule.clastix.io/tenant: solar
      rawItems:
        - apiVersion: "cilium.io/v2"
          kind: CiliumNetworkPolicy
          metadata:
            name: "l3-rule"
          spec:
            endpointSelector:
              matchLabels:
                role: backend
            ingress:
            - fromEndpoints:
              - matchLabels:
                  role: frontend

Storage

PersistentVolumes

Any Tenant owner is able to create a PersistentVolumeClaim that, backed by a given StorageClass, will provide volumes for their applications.

In most cases, once a PersistentVolumeClaim is deleted, the bound PersistentVolume will be recycled too.

However, in some scenarios, the StorageClass or the provisioned PersistentVolume itself could change the retention policy of the volume, keeping it available for recycling and being consumable for another Pod.

In such a scenario, Capsule enforces the Volume mount only to the Namespaces belonging to the Tenant on which it’s been consumed, by adding a label to the Volume as follows.

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: rancher.io/local-path
  creationTimestamp: "2022-12-22T09:54:46Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    capsule.clastix.io/tenant: solar
  name: pvc-1b3aa814-3b0c-4912-9bd9-112820da38fe
  resourceVersion: "2743059"
  uid: 9836ae3e-4adb-41d2-a416-0c45c2da41ff
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: melange
    namespace: caladan
    resourceVersion: "2743014"
    uid: 1b3aa814-3b0c-4912-9bd9-112820da38fe

Once the PersistentVolume becomes available again, it can be referenced by any PersistentVolumeClaim in the Namespaces of the solar Tenant.

If another Tenant, like green, tries to use it, it will get an error:

$ kubectl describe pv pvc-9788f5e4-1114-419b-a830-74e7f9a33f5d
Name:              pvc-9788f5e4-1114-419b-a830-74e7f9a33f5d
Labels:            capsule.clastix.io/tenant=solar
Annotations:       pv.kubernetes.io/provisioned-by: rancher.io/local-path
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      standard
Status:            Available
...

$ cat /tmp/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: melange
  namespace: green-energy
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  volumeName: pvc-9788f5e4-1114-419b-a830-74e7f9a33f5d

$ kubectl apply -f /tmp/pvc.yaml
Error from server: error when creating "/tmp/pvc.yaml": admission webhook "pvc.capsule.clastix.io" denied the request: PeristentVolume pvc-9788f5e4-1114-419b-a830-74e7f9a33f5d cannot be used by the following Tenant, preventing a cross-tenant mount

StorageClasses

Persistent storage infrastructure is provided to Tenants. Different types of storage, with different levels of QoS (e.g. SSD versus HDD), are available for different Tenants according to the Tenant’s profile. To meet these different requirements, Bill, the cluster admin, can provision different StorageClasses and assign them to the Tenant:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  storageClasses:
    matchLabels:
      env: "production"

With the said Tenant specification, Alice can create a PersistentVolumeClaim if spec.storageClassName equals:

  • Any StorageClass which has the label env with the value production

Capsule ensures that all PersistentVolumeClaims created by Alice will use only one of the valid storage classes. Assume the StorageClass ceph-rbd has the label env: production:

kubectl apply -f - << EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc
  namespace: solar-production
spec:
  storageClassName: ceph-rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 12Gi
EOF

If a PersistentVolumeClaim is going to use a non-allowed Storage Class, it will be rejected by the Validation Webhook enforcing it.

Assign Storage Class as tenant default

Note: This feature supports type StorageClass only on API version storage.k8s.io/v1

This feature allows specifying a custom default value on a Tenant basis, bypassing the global cluster default (.metadata.annotations.storageclass.kubernetes.io/is-default-class=true) that acts only at the cluster level. See the Default Storage Class section on Kubernetes documentation.

It’s possible to assign each Tenant a StorageClass which will be used if no value is set on the PersistentVolumeClaim:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  storageClasses:
    default: "tenant-default"
    matchLabels:
      env: "production"

Here’s what the new StorageClass could look like:

kubectl apply -f - << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: tenant-default
  labels:
    env: production
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF

If a PersistentVolumeClaim has no value for spec.storageClassName, the tenant-default value will be used on new PersistentVolumeClaim resources.
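
As a sketch, a claim like the following omits spec.storageClassName and would be expected to receive the tenant-default StorageClass. The claim name and requested size are assumptions.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                   # assumed name
  namespace: solar-production
spec:
  # No storageClassName here; tenant-default is expected to be applied.
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi             # assumed size
```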

Images

PullPolicy

Bill is a cluster admin providing a Container as a Service platform using shared nodes.

Alice, a TenantOwner, can start containers from private images: following the Kubernetes architecture, the kubelet downloads the image layers into its local cache.

Bob, an attacker, could try to schedule a Pod on the same node where Alice is running her Pods backed by private images: they could start new Pods using ImagePullPolicy=IfNotPresent and be able to start them, even without required authentication since the image is cached on the node.

To avoid this kind of attack, Bill, the cluster admin, can force Alice, the TenantOwner, to start her Pods using only the allowed values for ImagePullPolicy, forcing the kubelet to check the authorization first.

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  imagePullPolicies:
  - Always

Allowed values are Always, IfNotPresent, and Never, as defined by the Kubernetes API.

Any attempt by Alice to use a disallowed imagePullPolicy value is denied by the Validation Webhook enforcing it.
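
A minimal sketch of a compliant Pod follows, with the pull policy set explicitly on the container. The Pod name and image reference are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app                    # assumed name
  namespace: solar-production
spec:
  containers:
  - name: app
    image: quay.io/example/app:1.0.0   # assumed private image
    # Set explicitly: when omitted, Kubernetes defaults to IfNotPresent
    # for tags other than :latest, which this Tenant does not allow.
    imagePullPolicy: Always
```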

Image Registries

Bill, the cluster admin, can set a strict policy on the applications running in Alice’s Tenant: he’d like to allow running only images hosted on a list of specific container registries.

The spec.containerRegistries field addresses this task: it enforces a list of allowed registries, optionally combined with a regular expression.

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  containerRegistries:
    allowed:
    - docker.io
    - quay.io
    allowedRegex: 'internal\.registry\.\w+\.tld'

If a Pod runs containers with a non-FQCI (non fully qualified container image) name, the container registry enforcement disallows the execution. To run a busybox:latest container, which is commonly hosted on Docker Hub, the TenantOwner has to specify its name explicitly, like docker.io/library/busybox:latest.

A Pod running internal.registry.foo.tld/capsule:latest will be allowed, as will images from internal.registry.bar.tld, since both registries match the regular expression.

A catch-all regex entry such as .* allows every kind of registry, which has the same result as not setting .spec.containerRegistries at all.

Any attempt by Alice to use a registry not allowed by .spec.containerRegistries is denied by the Validation Webhook enforcing it.
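
As a sketch, the Pod below uses the fully qualified busybox reference mentioned above, which names docker.io explicitly. The Pod name and command are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shell                  # assumed name
  namespace: solar-production
spec:
  containers:
  - name: shell
    # Fully qualified: registry (docker.io), repository, and tag.
    # The short form busybox:latest would be expected to be rejected.
    image: docker.io/library/busybox:latest
    command: ["sleep", "3600"]
```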