Enforcement
Scheduling
LimitRanges
This feature will be deprecated in a future release of Capsule. Use TenantReplications instead.
Bill, the cluster admin, can also set Limit Ranges for each Namespace in Alice’s Tenant by defining limits for pods and containers in the Tenant spec:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  ...
  limitRanges:
    items:
      - limits:
          - type: Pod
            min:
              cpu: "50m"
              memory: "5Mi"
            max:
              cpu: "1"
              memory: "1Gi"
      - limits:
          - type: Container
            defaultRequest:
              cpu: "100m"
              memory: "10Mi"
            default:
              cpu: "200m"
              memory: "100Mi"
            min:
              cpu: "50m"
              memory: "5Mi"
            max:
              cpu: "1"
              memory: "1Gi"
      - limits:
          - type: PersistentVolumeClaim
            min:
              storage: "1Gi"
            max:
              storage: "10Gi"
Limits will be inherited by all the Namespaces created by Alice. In our case, when Alice creates the Namespace solar-production, Capsule creates the following:
apiVersion: v1
kind: LimitRange
metadata:
  name: capsule-solar-0
  namespace: solar-production
spec:
  limits:
    - max:
        cpu: "1"
        memory: 1Gi
      min:
        cpu: 50m
        memory: 5Mi
      type: Pod
---
apiVersion: v1
kind: LimitRange
metadata:
  name: capsule-solar-1
  namespace: solar-production
spec:
  limits:
    - default:
        cpu: 200m
        memory: 100Mi
      defaultRequest:
        cpu: 100m
        memory: 10Mi
      max:
        cpu: "1"
        memory: 1Gi
      min:
        cpu: 50m
        memory: 5Mi
      type: Container
---
apiVersion: v1
kind: LimitRange
metadata:
  name: capsule-solar-2
  namespace: solar-production
spec:
  limits:
    - max:
        storage: 10Gi
      min:
        storage: 1Gi
      type: PersistentVolumeClaim
Note: since a LimitRange applies to individual resources, there is no aggregate count across the Tenant.
Alice doesn’t have permission to change or delete the resources according to the assigned RBAC profile.
kubectl -n solar-production auth can-i patch resourcequota
no
kubectl -n solar-production auth can-i delete resourcequota
no
kubectl -n solar-production auth can-i patch limitranges
no
kubectl -n solar-production auth can-i delete limitranges
no
LimitRange Distribution with TenantReplications
Going forward, cluster administrators should distribute LimitRanges via TenantReplications. This is a more flexible and powerful approach, as it allows distributing any kind of resource, not only LimitRanges. Here’s an example of how to distribute a LimitRange to all the Namespaces of a Tenant:
apiVersion: capsule.clastix.io/v1beta2
kind: TenantResource
metadata:
  name: solar-limitranges
  namespace: solar-system
spec:
  resyncPeriod: 60s
  resources:
    - namespaceSelector:
        matchLabels:
          capsule.clastix.io/tenant: solar
      rawItems:
        - apiVersion: v1
          kind: LimitRange
          metadata:
            name: cpu-resource-constraint
          spec:
            limits:
              - default: # this section defines default limits
                  cpu: 500m
                defaultRequest: # this section defines default requests
                  cpu: 500m
                max: # max and min define the limit range
                  cpu: "1"
                min:
                  cpu: 100m
                type: Container
PriorityClasses
Pods can have priority. Priority indicates the importance of a Pod relative to other Pods. If a Pod cannot be scheduled, the scheduler tries to preempt (evict) lower priority Pods to make scheduling of the pending Pod possible. See Kubernetes documentation.
In a multi-tenant cluster, not all users can be trusted, as a tenant owner could create Pods at the highest possible priorities, causing other Pods to be evicted/not get scheduled.
To prevent misuses of Pod PriorityClass, Bill, the cluster admin, can enforce the allowed Pod PriorityClass at tenant level:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  priorityClasses:
    matchLabels:
      env: "production"
With this Tenant specification, Alice can create a Pod resource only if spec.priorityClassName refers to:
- Any PriorityClass which has the label env with the value production
If a Pod requests a non-allowed PriorityClass, it will be rejected by the Validation Webhook enforcing it.
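For instance, assuming a PriorityClass named production-high labeled env: production already exists in the cluster (a hypothetical name, not defined in the manifests above), Alice could create a compliant Pod like this minimal sketch:

```yaml
# Hypothetical sketch: assumes a PriorityClass "production-high"
# labeled env=production exists in the cluster.
apiVersion: v1
kind: Pod
metadata:
  name: web
  namespace: solar-production
spec:
  priorityClassName: production-high  # allowed: matches the Tenant's label selector
  containers:
    - name: nginx
      image: docker.io/library/nginx:latest
```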
Assign Pod PriorityClass as tenant default
Note: This feature supports the type PriorityClass only on API version scheduling.k8s.io/v1.
This feature allows specifying a custom default value on a per-Tenant basis, bypassing the global cluster default (globalDefault=true), which acts only at the cluster level.
It’s possible to assign each Tenant a PriorityClass which will be used if no PriorityClass is set on a per-Pod basis:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  priorityClasses:
    default: "tenant-default"
    matchLabels:
      env: "production"
Let’s create a PriorityClass which is used as the default:
kubectl apply -f - << EOF
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-default
  labels:
    env: "production"
value: 1313
preemptionPolicy: Never
globalDefault: false
description: "This is the default PriorityClass for the solar-tenant"
EOF
Note globalDefault: false, which is important to avoid this PriorityClass becoming the default for the whole cluster. If a Pod has no value for spec.priorityClassName, the Tenant default PriorityClass (tenant-default) will be used.
RuntimeClasses
Pods can be assigned different RuntimeClasses. With the assigned RuntimeClass you can control which container runtime configuration (via the Container Runtime Interface, CRI) is used for each Pod. See Kubernetes documentation for more information.
To prevent misuses of Pod RuntimeClasses, Bill, the cluster admin, can enforce the allowed Pod RuntimeClasses at Tenant level:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  runtimeClasses:
    matchLabels:
      env: "production"
With this Tenant specification, Alice can create a Pod resource only if spec.runtimeClassName refers to:
- Any RuntimeClass which has the label env with the value production
If a Pod is going to use a non-allowed RuntimeClass, it will be rejected by the Validation Webhook enforcing it.
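For example, assuming a RuntimeClass named production-runtime labeled env: production already exists in the cluster (a hypothetical name), Alice could create:

```yaml
# Hypothetical sketch: assumes a RuntimeClass "production-runtime"
# labeled env=production exists in the cluster.
apiVersion: v1
kind: Pod
metadata:
  name: web
  namespace: solar-production
spec:
  runtimeClassName: production-runtime  # allowed: matches the Tenant's label selector
  containers:
    - name: nginx
      image: docker.io/library/nginx:latest
```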
Assign Runtime Class as tenant default
This feature allows specifying a custom default value on a per-Tenant basis. It’s possible to assign each Tenant a RuntimeClass which will be used if no RuntimeClass is set on a per-Pod basis:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  runtimeClasses:
    default: "tenant-default"
    matchLabels:
      env: "production"
Let’s create a RuntimeClass which is used as the default:
kubectl apply -f - << EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: tenant-default
  labels:
    env: "production"
handler: myconfiguration
EOF
If a Pod has no value for spec.runtimeClassName, the default RuntimeClass (tenant-default) will be used.
NodeSelector
Bill, the cluster admin, can dedicate a pool of worker nodes to the solar Tenant, to isolate the tenant applications from other noisy neighbors.
These nodes are labeled by Bill with pool=renewable:
kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
...
worker06.acme.com Ready worker 8d v1.25.2 pool=renewable
worker07.acme.com Ready worker 8d v1.25.2 pool=renewable
worker08.acme.com Ready worker 8d v1.25.2 pool=renewable
PodNodeSelector
This approach requires the PodNodeSelector Admission Controller plugin to be active. If the plugin is not active, Pods will be scheduled on any node. If your distribution does not support this plugin, you can use Expression Node Selectors instead.
The label pool=renewable is defined as .spec.nodeSelector in the Tenant manifest:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  nodeSelector:
    pool: renewable
    kubernetes.io/os: linux
The Capsule controller makes sure that any Namespace created in the Tenant has the annotation: scheduler.alpha.kubernetes.io/node-selector: pool=renewable. This annotation tells the scheduler of Kubernetes to assign the node selector pool=renewable to all the Pods deployed in the Tenant. The effect is that all the Pods deployed by Alice are placed only on the designated pool of nodes.
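As a sketch, a Namespace created by Alice would then carry an annotation like the following (fields trimmed for brevity; with multiple selectors, the PodNodeSelector plugin expects a comma-separated label list):

```yaml
# Sketch of the Namespace as mutated by Capsule
apiVersion: v1
kind: Namespace
metadata:
  name: solar-production
  labels:
    capsule.clastix.io/tenant: solar
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: pool=renewable,kubernetes.io/os=linux
```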
Multiple node selector labels can be defined as in the following snippet:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  nodeSelector:
    pool: renewable
    kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    hardware: gpu
Any attempt of Alice to change the selector on the Pods will result in an error from the PodNodeSelector Admission Controller plugin.
kubectl auth can-i edit ns -n solar-production
no
Dynamic resource allocation (DRA)
Dynamic Resource Allocation (DRA) is a Kubernetes capability that allows Pods to request and use shared resources, typically external devices such as hardware accelerators. See Kubernetes documentation for more information.
Bill can assign a set of dedicated DeviceClasses to tell the solar Tenant what devices they can request.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu.example.com
  labels:
    env: "production"
spec:
  selectors:
    - cel:
        expression: device.driver == 'gpu.example.com' && device.attributes['gpu.example.com'].type == 'gpu'
  extendedResourceName: example.com/gpu
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  deviceClasses:
    matchLabels:
      env: "production"
With this Tenant specification, Alice can create a ResourceClaim or ResourceClaimTemplate resource only if spec.devices.requests[].deviceClassName (ResourceClaim) or spec.spec.devices.requests[].deviceClassName (ResourceClaimTemplate) refers to:
- Any DeviceClass which has the label env with the value production
If any of the devices in the ResourceClaim or ResourceClaimTemplate spec is going to use a non-allowed DeviceClass, the entire request will be rejected by the Validation Webhook enforcing it.
Alice now can create a ResourceClaim using only an allowed DeviceClass:
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: example-resource-claim
  namespace: solar-production
spec:
  devices:
    requests:
      - name: gpu-request
        exactly:
          deviceClassName: 'gpu.example.com'
Connectivity
Services
ExternalIPs
Specifies the external IPs that can be used in Services with type ClusterIP. An empty list means no IPs are allowed, which is recommended in multi-tenant environments (can be misused for traffic hijacking):
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  serviceOptions:
    externalIPs:
      allowed: []
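Conversely, when some external addresses are legitimate, they can be listed explicitly. The following is a sketch under the assumption that entries are expressed in CIDR notation; the range below is a documentation placeholder:

```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  serviceOptions:
    externalIPs:
      allowed:
        - 198.51.100.0/24   # placeholder range; only external IPs within
                            # this list will be admitted on Services
```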
Deny labels and annotations
By default, Capsule allows Tenant owners to add and modify any label or annotation on their Services. Bill can forbid specific labels and annotations, either by exact name or by regular expression:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  serviceOptions:
    forbiddenAnnotations:
      denied:
        - loadbalancer.class.acme.net
      deniedRegex: .*.acme.net
    forbiddenLabels:
      denied:
        - loadbalancer.class.acme.net
      deniedRegex: .*.acme.net
Deny Service Types
Bill, the cluster admin, can prevent the creation of Services with specific Service types.
NodePort
When dealing with a shared multi-tenant scenario, many NodePort Services can become cumbersome to manage. The reason is the overlapping needs of the Tenant owners, since a NodePort is opened on all nodes and, when using hostNetwork=true, is accessible by any Pod regardless of any NetworkPolicy.
Bill, the cluster admin, can block the creation of Services with the NodePort service type for a given Tenant:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  serviceOptions:
    allowedServices:
      nodePort: false
With the above configuration, any attempt of Alice to create a Service of type NodePort is denied by the Validation Webhook enforcing it. Default value is true.
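For example, a Service like the following hypothetical manifest would now be rejected in any Namespace of the solar Tenant:

```yaml
# Hypothetical sketch: rejected by the webhook, since the Tenant
# disallows Services of type NodePort.
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: solar-production
spec:
  type: NodePort
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
```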
ExternalName
Services of type ExternalName have been found to be subject to many security issues. To prevent Tenant owners from creating Services of type ExternalName, the cluster admin can disable them for a Tenant:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  serviceOptions:
    allowedServices:
      externalName: false
With the above configuration, any attempt of Alice to create a Service of type externalName is denied by the Validation Webhook enforcing it. Default value is true.
LoadBalancer
As before, Services of type LoadBalancer could be blocked for various reasons. To prevent Tenant owners from creating these kinds of Services, the cluster admin can disable them for a Tenant:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  serviceOptions:
    allowedServices:
      loadBalancer: false
With the above configuration, any attempt of Alice to create a Service of type LoadBalancer is denied by the Validation Webhook enforcing it. Default value is true.
GatewayClasses
Note: This feature is offered only by the API type GatewayClass in group gateway.networking.k8s.io, version v1.
GatewayClass is a cluster-scoped resource defined by the infrastructure provider. This resource represents a class of Gateways that can be instantiated. Read More
Bill can assign a set of dedicated GatewayClasses to the solar Tenant to force the applications in the solar Tenant to be published only by the assigned Gateway Controller:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  gatewayOptions:
    allowedClasses:
      matchLabels:
        env: "production"
With this Tenant specification, Alice can create a Gateway resource only if spec.gatewayClassName refers to:
- Any GatewayClass which has the label env with the value production
If a Gateway is going to use a non-allowed GatewayClass, it will be rejected by the Validation Webhook enforcing it.
Alice can create a Gateway using only an allowed GatewayClass:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
  namespace: solar-production
spec:
  gatewayClassName: customer-class
  listeners:
    - name: http
      protocol: HTTP
      port: 80
Any attempt of Alice to use a non-valid GatewayClass, or missing it, is denied by the Validation Webhook enforcing it.
Assign GatewayClass as tenant default
Note: The default GatewayClass must have a label which is allowed within the Tenant. This behavior is implemented this way only for the GatewayClass default.
This feature allows specifying a custom default value on a per-Tenant basis. Currently there is no global default mechanism for GatewayClasses; each Gateway must have a spec.gatewayClassName set.
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  gatewayOptions:
    allowedClasses:
      default: "tenant-default"
      matchLabels:
        env: "production"
Here’s what the Tenant default GatewayClass could look like:
kubectl apply -f - << EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: tenant-default
  labels:
    env: "production"
spec:
  controllerName: example.com/gateway-controller
EOF
If a Gateway has no value for spec.gatewayClassName, the tenant-default GatewayClass is automatically applied to the Gateway resource.
Ingresses
Assign Ingress Hostnames
Bill can control ingress hostnames in the solar Tenant to force the applications to be published only using the given hostname or set of hostnames:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  ingressOptions:
    allowedHostnames:
      allowed:
        - solar.acmecorp.com
      allowedRegex: ^.*acmecorp.com$
The Capsule controller assures that all Ingresses created in the Tenant can use only one of the valid hostnames. Alice can create an Ingress using any allowed hostname:
kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-production
spec:
  ingressClassName: solar
  rules:
    - host: web.solar.acmecorp.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx
                port:
                  number: 80
EOF
Any attempt of Alice to use a non-valid hostname is denied by the Validation Webhook enforcing it.
Control Hostname collision in Ingresses
In a multi-tenant environment, as more and more ingresses are defined, there is a chance of collision on the hostname leading to unpredictable behavior of the Ingress Controller. Bill, the cluster admin, can enforce hostname collision detection at different scope levels:
- Cluster
- Tenant
- Namespace
- Disabled (default)
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
    - name: joe
      kind: User
  ingressOptions:
    hostnameCollisionScope: Tenant
When a Tenant owner creates an Ingress resource, Capsule checks the hostname of the new Ingress against all hostnames already in use, within the defined scope.
For example, Alice, one of the TenantOwners, creates an Ingress:
kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-production
spec:
  rules:
    - host: web.solar.acmecorp.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx
                port:
                  number: 80
EOF
Another user, Joe, creates an Ingress with the same hostname:
kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-development
spec:
  rules:
    - host: web.solar.acmecorp.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx
                port:
                  number: 80
EOF
When a collision is detected at scope defined by spec.ingressOptions.hostnameCollisionScope, the creation of the Ingress resource will be rejected by the Validation Webhook enforcing it. When spec.ingressOptions.hostnameCollisionScope=Disabled (default), no collision detection is made at all.
Deny Wildcard Hostname in Ingresses
Bill, the cluster admin, can deny the use of wildcard hostname in Ingresses. Let’s assume that Acme Corp. uses the domain acme.com.
As the Tenant owner of solar, Alice could create an Ingress with a wildcard host such as host: "*.acme.com". This can cause problems for the water Tenant, because the wildcard would also capture traffic intended for hosts like water.acme.com.
To avoid this kind of problem, Bill can deny the use of wildcard hostnames in the following way:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  ingressOptions:
    allowWildcardHostnames: false
With this configuration, Alice will no longer be able to use wildcard hostnames such as *.acme.com, being the Tenant owner of solar and green only.
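With that option set, an Ingress like the following hypothetical manifest would be rejected by the Validation Webhook:

```yaml
# Hypothetical sketch: rejected because the host uses a wildcard
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wildcard
  namespace: solar-production
spec:
  rules:
    - host: "*.acme.com"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx
                port:
                  number: 80
```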
IngressClasses
An Ingress Controller is used in Kubernetes to publish services and applications outside of the cluster. An Ingress Controller can be provisioned to accept only Ingresses with a given IngressClass.
Bill can assign a set of dedicated IngressClass to the solar Tenant to force the applications in the solar tenant to be published only by the assigned Ingress Controller:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  ingressOptions:
    allowedClasses:
      matchLabels:
        env: "production"
With this Tenant specification, Alice can create an Ingress resource only if spec.ingressClassName or metadata.annotations."kubernetes.io/ingress.class" refers to:
- Any IngressClass which has the label env with the value production
If an Ingress is going to use a non-allowed IngressClass, it will be rejected by the Validation Webhook enforcing it.
Alice can create an Ingress using only an allowed IngressClass:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-production
spec:
  ingressClassName: legacy
  rules:
    - host: solar.acmecorp.com
      http:
        paths:
          - backend:
              service:
                name: nginx
                port:
                  number: 80
            path: /
            pathType: ImplementationSpecific
Any attempt of Alice to use a non-valid Ingress Class, or missing it, is denied by the Validation Webhook enforcing it.
Assign Ingress Class as tenant default
Note: This feature is offered only by the API type IngressClass in group networking.k8s.io, version v1. However, the Ingress resource is supported in both networking.k8s.io/v1 and networking.k8s.io/v1beta1.
This feature allows specifying a custom default value on a Tenant basis, bypassing the global cluster default (with the annotation metadata.annotations.ingressclass.kubernetes.io/is-default-class=true) that acts only at the cluster level. More information: Default IngressClass
It’s possible to assign each Tenant an IngressClass which will be used if a class is not set on a per-Ingress basis:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  ingressOptions:
    allowedClasses:
      default: "tenant-default"
      matchLabels:
        env: "production"
Here’s what the Tenant default IngressClass could look like:
kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: tenant-default
  labels:
    env: "production"
    app.kubernetes.io/component: controller
  annotations:
    ingressclass.kubernetes.io/is-default-class: "false"
spec:
  controller: k8s.io/customer-nginx
EOF
If an Ingress has no value for spec.ingressClassName or metadata.annotations."kubernetes.io/ingress.class", the tenant-default IngressClass is automatically applied to the Ingress resource.
NetworkPolicies
Deprecated
This feature will be deprecated in a future release of Capsule. Use TenantReplications instead. This also applies if you would like to use another NetworkPolicy implementation, such as Cilium.
Kubernetes network policies control network traffic between Namespaces and between pods in the same Namespace. Bill, the cluster admin, can enforce network traffic isolation between different Tenants while leaving to Alice, the TenantOwner, the freedom to set isolation between Namespaces in the same Tenant or even between pods in the same Namespace.
To meet this requirement, Bill needs to define network policies that deny pods belonging to Alice’s Namespaces to access pods in Namespaces belonging to other Tenants, e.g. Bob’s Tenant water, or in system Namespaces, e.g. kube-system.
Keep in mind that, because of how the NetworkPolicy API works, users can still add a policy which contradicts what the Tenant has set, allowing them to circumvent the initial limitation set by the Tenant admin. Two options can be put in place to mitigate this potential privilege escalation:
1. providing a restricted role rather than the default admin one
2. using Calico’s GlobalNetworkPolicy or Cilium’s CiliumClusterwideNetworkPolicy, which are defined at the cluster level, thus creating an order of packet filtering.
Also, Bill can make sure pods belonging to a Tenant Namespace cannot access other network infrastructures like cluster nodes, load balancers, and virtual machines running other services.
Bill can set network policies in the Tenant manifest, according to the requirements:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  networkPolicies:
    items:
      - policyTypes:
          - Ingress
          - Egress
        egress:
          - to:
              - ipBlock:
                  cidr: 0.0.0.0/0
                  except:
                    - 192.168.0.0/16
        ingress:
          - from:
              - namespaceSelector:
                  matchLabels:
                    capsule.clastix.io/tenant: solar
              - podSelector: {}
              - ipBlock:
                  cidr: 192.168.0.0/16
        podSelector: {}
The Capsule controller, watching for Namespace creation, creates the Network Policies for each Namespace in the Tenant.
Alice has access to network policies:
kubectl -n solar-production get networkpolicies
NAME POD-SELECTOR AGE
capsule-solar-0 <none> 42h
Alice can create, patch, and delete additional network policies within her Namespaces:
kubectl -n solar-production auth can-i get networkpolicies
yes
kubectl -n solar-production auth can-i delete networkpolicies
yes
kubectl -n solar-production auth can-i patch networkpolicies
yes
For example, she can create:
kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: production-network-policy
  namespace: solar-production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
Check all the network policies
kubectl -n solar-production get networkpolicies
NAME POD-SELECTOR AGE
capsule-solar-0 <none> 42h
production-network-policy <none> 3m
And delete the Namespace network policies:
kubectl -n solar-production delete networkpolicy production-network-policy
Any attempt of Alice to delete the Tenant network policy defined in the tenant manifest is denied by the Validation Webhook enforcing it. Any deletion by a cluster-administrator will cause the network policy to be recreated by the Capsule controller.
NetworkPolicy Distribution with TenantReplications
Going forward, cluster administrators should distribute NetworkPolicies via TenantReplications. This is a more flexible and powerful approach, as it allows distributing any kind of resource. Here’s an example of how to distribute a CiliumNetworkPolicy to all the Namespaces of a Tenant:
apiVersion: capsule.clastix.io/v1beta2
kind: TenantResource
metadata:
  name: solar-network-policies
  namespace: solar-system
spec:
  resyncPeriod: 60s
  resources:
    - namespaceSelector:
        matchLabels:
          capsule.clastix.io/tenant: solar
      rawItems:
        - apiVersion: "cilium.io/v2"
          kind: CiliumNetworkPolicy
          metadata:
            name: "l3-rule"
          spec:
            endpointSelector:
              matchLabels:
                role: backend
            ingress:
              - fromEndpoints:
                  - matchLabels:
                      role: frontend
Storage
PersistentVolumes
Any Tenant owner is able to create a PersistentVolumeClaim that, backed by a given StorageClass, will provide volumes for their applications.
In most cases, once a PersistentVolumeClaim is deleted, the bound PersistentVolume is recycled or deleted according to its reclaim policy.
However, in some scenarios, the StorageClass or the provisioned PersistentVolume itself could change the retention policy of the volume, keeping it available for recycling and consumable by another Pod.
In such a scenario, Capsule enforces that the volume can be mounted only in Namespaces belonging to the Tenant in which it has been consumed, by adding a label to the volume as follows.
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: rancher.io/local-path
  creationTimestamp: "2022-12-22T09:54:46Z"
  finalizers:
    - kubernetes.io/pv-protection
  labels:
    capsule.clastix.io/tenant: solar
  name: pvc-1b3aa814-3b0c-4912-9bd9-112820da38fe
  resourceVersion: "2743059"
  uid: 9836ae3e-4adb-41d2-a416-0c45c2da41ff
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: melange
    namespace: caladan
    resourceVersion: "2743014"
    uid: 1b3aa814-3b0c-4912-9bd9-112820da38fe
Once the PersistentVolume becomes available again, it can be referenced by any PersistentVolumeClaim in Namespaces belonging to the solar Tenant.
If another Tenant, like green, tries to use it, it will get an error:
$ kubectl describe pv pvc-9788f5e4-1114-419b-a830-74e7f9a33f5d
Name: pvc-9788f5e4-1114-419b-a830-74e7f9a33f5d
Labels: capsule.clastix.io/tenant=solar
Annotations: pv.kubernetes.io/provisioned-by: rancher.io/local-path
Finalizers: [kubernetes.io/pv-protection]
StorageClass: standard
Status: Available
...
$ cat /tmp/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: melange
  namespace: green-energy
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  volumeName: pvc-9788f5e4-1114-419b-a830-74e7f9a33f5d
$ kubectl apply -f /tmp/pvc.yaml
Error from server: error when creating "/tmp/pvc.yaml": admission webhook "pvc.capsule.clastix.io" denied the request: PeristentVolume pvc-9788f5e4-1114-419b-a830-74e7f9a33f5d cannot be used by the following Tenant, preventing a cross-tenant mount
StorageClasses
Persistent storage infrastructure is provided to Tenants. Different types of storage requirements, with different levels of QoS, e.g. SSD versus HDD, are available for different Tenants according to the Tenant’s profile. To meet these requirements, Bill, the cluster admin, can provision different StorageClasses and assign them to the Tenant:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  storageClasses:
    matchLabels:
      env: "production"
With this Tenant specification, Alice can create a PersistentVolumeClaim only if spec.storageClassName refers to:
- Any StorageClass which has the label env with the value production
Capsule assures that all PersistentVolumeClaims created by Alice will use only one of the valid storage classes. Assume the StorageClass ceph-rbd has the label env: production:
kubectl apply -f - << EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc
  namespace: solar-production
spec:
  storageClassName: ceph-rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 12Gi
EOF
If a PersistentVolumeClaim is going to use a non-allowed Storage Class, it will be rejected by the Validation Webhook enforcing it.
Assign Storage Class as tenant default
Note: This feature supports the type StorageClass only on API version storage.k8s.io/v1.
This feature allows specifying a custom default value on a Tenant basis, bypassing the global cluster default (.metadata.annotations.storageclass.kubernetes.io/is-default-class=true) that acts only at the cluster level. See the Default Storage Class section on Kubernetes documentation.
It’s possible to assign each Tenant a StorageClass which will be used if no value is set on a per-PersistentVolumeClaim basis:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  storageClasses:
    default: "tenant-default"
    matchLabels:
      env: "production"
Here’s what the new StorageClass could look like:
kubectl apply -f - << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: tenant-default
  labels:
    env: production
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF
If a PersistentVolumeClaim has no value for spec.storageClassName, the tenant-default value will be used on new PersistentVolumeClaim resources.
Images
PullPolicy
Bill is a cluster admin providing a Container as a Service platform using shared nodes.
Alice, a Tenant owner, can start containers from private images: according to the Kubernetes architecture, the kubelet downloads the image layers into its local cache.
Bob, an attacker, could try to schedule a Pod on the same node where Alice is running her Pods backed by private images: by starting new Pods with imagePullPolicy=IfNotPresent, he would be able to run them even without the required authentication, since the image layers are already cached on the node.
To avoid this kind of attack, Bill, the cluster admin, can force Alice, the Tenant owner, to start her Pods using only the allowed values for imagePullPolicy, forcing the kubelet to check the authorization first:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  imagePullPolicies:
    - Always
Allowed values are Always, IfNotPresent, and Never, as defined by the Kubernetes API.
Any attempt of Alice to use a disallowed imagePullPolicies value is denied by the Validation Webhook enforcing it.
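With the Tenant above, a compliant Pod sets the policy explicitly on each container. A minimal sketch, where the image name is a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-app
  namespace: solar-production
spec:
  containers:
    - name: app
      image: registry.acme.com/solar/app:1.0  # placeholder private image
      imagePullPolicy: Always                 # the only value the Tenant allows
```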
Images Registries
Bill, the cluster admin, can set a strict policy on the applications running in Alice’s Tenant: he’d like to allow only images hosted on a specific list of container registries.
The spec.containerRegistries field addresses this task, combining hard enforcement through a list of allowed values with a regular expression:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
    - name: alice
      kind: User
  containerRegistries:
    allowed:
      - docker.io
      - quay.io
    allowedRegex: 'internal.registry.\\w.tld'
When a Pod runs non-FQCI (non fully qualified container image) containers, the container registry enforcement will disallow the execution. If you would like to run a busybox:latest container, which is commonly hosted on Docker Hub, the Tenant owner has to specify its name explicitly, like docker.io/library/busybox:latest.
A Pod running internal.registry.foo.tld/capsule:latest will be allowed, as well as internal.registry.bar.tld, since both match the regular expression.
A catch-all regex entry such as .* allows every kind of registry, which has the same effect as leaving .spec.containerRegistries unset.
Any attempt of Alice to use a not allowed .spec.containerRegistries value is denied by the Validation Webhook enforcing it.
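For example, with the Tenant above Alice can run fully qualified images from the allowed registries. A minimal sketch, where the image names are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: registry-demo
  namespace: solar-production
spec:
  containers:
    - name: allowed
      image: docker.io/library/busybox:latest  # allowed: docker.io is listed
    # an image such as ghcr.io/acme/app:latest would be denied,
    # since ghcr.io is neither listed nor matched by the regex
```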