Debugging FAQ

Question: Every Pod, Job, Deployment, and any other resource type that runs a container must have the resources attribute set; otherwise the deployment fails with an error similar to:

Pods "master-685f855ff-gg9sr" is forbidden: failed quota: default-w2qv7: must specify limits.cpu,limits.memory,requests.cpu,requests.memory:Deployment does not have minimum availability.

Answer: The Pod, Job, or Deployment is missing a resources section. Add the following section to every container, with values appropriate for your workload:

resources:
  requests:
    cpu: 1
    memory: 512Mi
  limits:
    cpu: 1
    memory: 512Mi
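
To check a manifest before deploying it, the rule can be sketched in Python. This is a minimal illustration, not the quota admission logic itself: the function name missing_resources is hypothetical, it expects the Pod spec as a plain dict (for example, already loaded from YAML), and it ignores defaults that a LimitRange in the namespace might supply. It does take into account that Kubernetes copies a limit into the matching request when the request is omitted.

```python
def missing_resources(pod_spec: dict) -> dict[str, list[str]]:
    """Map container name -> required resource fields that are not set."""
    problems = {}
    # Init containers are checked together with the regular containers.
    containers = (pod_spec.get("containers") or []) + (
        pod_spec.get("initContainers") or []
    )
    for container in containers:
        resources = container.get("resources") or {}
        limits = resources.get("limits") or {}
        # A request that is omitted defaults to the limit of the same resource.
        requests = {**limits, **(resources.get("requests") or {})}

        missing = [f"limits.{key}" for key in ("cpu", "memory") if key not in limits]
        missing += [
            f"requests.{key}" for key in ("cpu", "memory") if key not in requests
        ]
        if missing:
            problems[container.get("name", "<unnamed>")] = missing
    return problems


if __name__ == "__main__":
    spec = {
        "containers": [
            {"name": "ok", "resources": {"limits": {"cpu": 1, "memory": "512Mi"}}},
            {"name": "bare"},
        ]
    }
    print(missing_resources(spec))
```

An empty result means every container carries the four values named in the error message.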
 

Question: The Deployment returns a message similar to:

Error creating: pods "app-cronjob-28837272-7k594" is forbidden: exceeded quota: default-kcq58, requested: limits.memory=8Gi, used: limits.memory=41872Mi, limited: limits.memory=45000Mi

Answer: This error means the Pod you are trying to create asks for a memory limit (limits.memory=8Gi) that does not fit into the remaining quota. The namespace has a memory limit quota of 45000Mi (about 43.95Gi), defined in the ResourceQuota object default-kcq58 (see kubectl get resourcequota -n [your-namespace] default-kcq58). Of that, 41872Mi is already in use, so only 3128Mi remains, and the additional 8Gi (8192Mi) would exceed the quota.

Options for Resolving:

  • Reduce the memory limit of the Pod:

    Lower limits.memory in the Pod spec so that it fits within the remaining quota (3128Mi, or about 3Gi).

  • Request an Increase in Quota (if you have control over the cluster settings):

    Contact us at k8s@ics.muni.cz to increase the memory quota for the namespace, or to create a dedicated project namespace with higher quotas than the personal namespace.
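
The arithmetic behind the error message can be reproduced with a few lines of Python. This is only a sketch for checking your own numbers: the helper names to_mi, headroom_mi, and fits are made up for this example, and only the binary suffixes Ki, Mi, Gi, and Ti are handled, not the full Kubernetes quantity syntax.

```python
# Binary suffixes used in Kubernetes memory quantities, expressed in Mi.
UNITS_IN_MI = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024, "Ti": 1024 * 1024}


def to_mi(quantity: str) -> float:
    """Convert a quantity such as '8Gi' or '41872Mi' to mebibytes."""
    for suffix, factor in UNITS_IN_MI.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity!r}")


def headroom_mi(limited: str, used: str) -> float:
    """Memory still available under the quota, in Mi."""
    return to_mi(limited) - to_mi(used)


def fits(requested: str, limited: str, used: str) -> bool:
    """True if the requested limit fits into the remaining quota."""
    return to_mi(requested) <= headroom_mi(limited, used)


if __name__ == "__main__":
    # Values taken from the error message above.
    print(headroom_mi("45000Mi", "41872Mi"))   # 3128.0
    print(fits("8Gi", "45000Mi", "41872Mi"))   # False
    print(fits("3Gi", "45000Mi", "41872Mi"))   # True
```

With the values from the message, 8Gi (8192Mi) does not fit into the remaining 3128Mi, while a limit of 3Gi (3072Mi) would.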


Question: No GPU found, nvidia-smi returns command not found.

Answer: The Deployment is missing a GPU request in the resources section of the container, such as:

resources:
  limits:
    nvidia.com/gpu: 1

or

resources:
  limits:
    cerit.io/gpu-mem: 1

Question: The Deployment returns a message similar to:

CreateContainerConfigError (container has runAsNonRoot and image will run as root (pod: "mongo-db-846b7bfc7-qrlqt_namespace-ns(5d3538ab-7493-41ab-bd94-a4256c236f6f)", container: mongo-db-test))

Answer: The Deployment is missing the securityContext section, and the container image (in this case mongo) does not declare a numeric USER. To fix this, extend the container definition in the Deployment with a securityContext, as in this example:

image: cerit.io/nextflowproxy:v1.2
imagePullPolicy: Always
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
resources:
  limits:
    cpu: 4
    memory: 8192Mi

The runAsUser and runAsGroup lines are important.

See full security context settings here.


Question: Helm deployment returns error code 413

Answer: HTTP status code 413 means that the request entity is too large. Helm stores the whole release (all local files in the chart, whether or not they have a .yaml suffix, together with the values) in a Secret object. Kubernetes limits an individual Secret to 1 MiB, and etcd by default rejects requests larger than about 1.5 MiB. Check the whole chart for large files that do not need to be part of it.
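
One way to find the offending files is to list the largest files in the chart directory. The following Python sketch does this with the standard library only; the function name largest_files is hypothetical. It reports raw file sizes, so it is only an indicator: Helm compresses the release before storing it, and files matched by .helmignore are not packaged at all, which this sketch does not take into account.

```python
import os


def largest_files(chart_dir: str, top: int = 10) -> list[tuple[int, str]]:
    """Return (size_in_bytes, relative_path) for the biggest files in chart_dir."""
    sizes = []
    for root, _dirs, files in os.walk(chart_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):  # skips broken symlinks
                sizes.append((os.path.getsize(path), os.path.relpath(path, chart_dir)))
    # Largest first; ties are broken by path so the order is stable.
    sizes.sort(key=lambda item: (-item[0], item[1]))
    return sizes[:top]


# Example usage (the chart path is a placeholder):
#     for size, path in largest_files("./my-chart"):
#         print(f"{size / 1024:10.1f} KiB  {path}")
```

Large files that the templates do not need can be moved out of the chart directory or listed in .helmignore.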


Question: How can I fix the following type of error:

Error creating: pods "gmx-18dafd65-49d5-4263-8da2-e7d574b69930-nrtb9" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "gmx-18dafd65-49d5-4263-8da2-e7d574b69930" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "gmx-18dafd65-49d5-4263-8da2-e7d574b69930" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "gmx-18dafd65-49d5-4263-8da2-e7d574b69930" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "gmx-18dafd65-49d5-4263-8da2-e7d574b69930" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

Answer: The error message is due to the Pod not meeting the requirements of the restricted PodSecurity standard in Kubernetes. To fix this, you need to add a securityContext to the Pod and container specification. Here’s how you can address each issue:

  • allowPrivilegeEscalation: Set this to false.
  • Capabilities: Drop all capabilities to meet the restricted policy.
  • runAsNonRoot: Ensure the Pod or container is set to run as a non-root user.
  • seccompProfile: Set the seccompProfile.type to "RuntimeDefault" or "Localhost".

Example Pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: gmx-pod
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: gmx-container
    image: your-image
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - "ALL"

Explanation of the Changes:

  • allowPrivilegeEscalation: false: Prevents the container from gaining additional privileges.
  • capabilities.drop: ["ALL"]: Drops all Linux capabilities, which is required under the restricted policy.
  • runAsNonRoot: true: Ensures the container doesn’t run as the root user.
  • seccompProfile.type: "RuntimeDefault": Enforces the default seccomp profile for additional security.

Applying the Updated Spec:

Replace your-image with the appropriate container image name, and apply the updated configuration. This should resolve the error and allow the Pod to pass the restricted PodSecurity admission.
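
Before applying, the four settings named in the error message can be checked locally. The Python sketch below is a simplified illustration under stated assumptions: the function name restricted_violations is hypothetical, the Pod spec is passed in as a plain dict, and only the four checks from the message above are covered. The restricted standard contains further rules, and the authoritative check is the PodSecurity admission in the cluster.

```python
def restricted_violations(pod_spec: dict) -> list[str]:
    """List violations of the four checks named in the error message."""
    pod_sc = pod_spec.get("securityContext") or {}
    violations = []
    for container in pod_spec.get("containers") or []:
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}

        # Must be set explicitly to false on every container.
        if sc.get("allowPrivilegeEscalation") is not False:
            violations.append(f"{name}: allowPrivilegeEscalation != false")

        dropped = (sc.get("capabilities") or {}).get("drop") or []
        if "ALL" not in dropped:
            violations.append(f"{name}: capabilities.drop must include ALL")

        # A container-level value takes precedence over the pod-level one.
        run_as_non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
        if run_as_non_root is not True:
            violations.append(f"{name}: runAsNonRoot != true")

        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            violations.append(f"{name}: seccompProfile.type is not allowed")
    return violations


if __name__ == "__main__":
    # The spec section of the example Pod above, written as a dict.
    spec = {
        "securityContext": {
            "runAsNonRoot": True,
            "seccompProfile": {"type": "RuntimeDefault"},
        },
        "containers": [
            {
                "name": "gmx-container",
                "image": "your-image",
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "capabilities": {"drop": ["ALL"]},
                },
            }
        ],
    }
    print(restricted_violations(spec))  # []
```

An empty list means the spec sets all four fields; a container without any securityContext is reported with four violations.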