
Running Nextflow Pipelines in Kubernetes

The following guide explains how to run Nextflow pipelines on the CERIT-SC Kubernetes cluster.

Nextflow Overview

To run a Nextflow pipeline using this guide, you will start it from your local machine or any other accessible machine. However, the pipeline itself will run on the CERIT-SC Kubernetes cluster, not on the machine you start it from. That machine only serves to launch the pipeline and does not need to stay online once the pipeline is running.

Running pipelines on a Kubernetes cluster is not a general feature of Nextflow by default. It works this way only when you follow this specific guide and use the provided tools and configuration for the CERIT-SC Kubernetes environment.

Alternatively, you can also start the pipeline from an already running Pod within the Kubernetes cluster.

Starting Nextflow

The previously used method involved installing the official Nextflow (a Java application) and using the kuberun driver. However, this method is now deprecated.

We recommend using our custom-built binary, nextflow-go, which is available at https://github.com/CERIT-SC/nextflow-go under the Releases section.

This binary is self-contained and works on Ubuntu Linux 22.04 or later, as well as on similar Debian/RedHat-based systems.

Why nextflow-go?

There are several advantages of using nextflow-go over the official kuberun method:

  • The kuberun driver requires direct access to shared storage, which is not possible from most local machines.
  • nextflow-go allows you to start a pipeline from any machine, even without access to shared storage.
  • The kuberun driver requires explicit support in each pipeline, meaning some pipelines may not work at all unless kuberun is specifically adapted.
  • kuberun has compatibility issues with some pipeline configurations, particularly those using functions inside the nextflow.config file, which may lead to errors or unexpected behavior.
  • nextflow-go removes these limitations and provides a more robust and flexible way to launch pipelines in the CERIT-SC Kubernetes environment.

Installation

Download and make the binary executable:

wget https://github.com/CERIT-SC/nextflow-go/releases/download/v0.1/nextflow-go-linux-amd64 -O nextflow-go
chmod a+x nextflow-go

Running the Binary

Simply run the binary:

./nextflow-go

Running a Simple hello Pipeline

To run a simple example pipeline, such as the built-in hello pipeline, you start it using the nextflow-go command along with a minimal configuration.

It is assumed that you are starting the pipeline either from your local machine or from a Pod already running inside the Kubernetes cluster, as described in the Nextflow Overview above.

1. Create the nextflow.config File

Save the following configuration file as nextflow.config in the same directory where you will run nextflow-go:

k8s {
   namespace        = '[your-namespace]'
   runAsUser        = 1000
   storageClaimName = '[your-pvc]'
   storageMountPath = '/mnt'
   launchDir        = '${k8s.storageMountPath}'
   workDir          = '${k8s.storageMountPath}/tmp'
}

process {
   executor = 'k8s'
}

Replace the placeholders with the appropriate values for your Kubernetes environment:

  • [your-namespace] — your Kubernetes Namespace, which determines where the workflow will run. You can find your namespace in the Rancher UI or follow the instructions here to look it up.
  • [your-pvc] — the name of your PersistentVolumeClaim (PVC), which defines the shared storage used by the workflow. You can view existing PVCs in the Rancher UI or refer to the guide here to create or identify one.

These values are essential for ensuring that both the workflow controller and task workers can access the same storage and run in the correct Kubernetes environment.
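If you have kubectl configured for this cluster, you can also list the PVCs available in your namespace from the command line (the namespace value is a placeholder):

kubectl get pvc -n [your-namespace]   # lists PersistentVolumeClaims and their status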

2. Run the Pipeline

Once the configuration file is ready, launch the hello pipeline using:

./nextflow-go run hello

This command starts the standard hello pipeline, which is a simple built-in test workflow available from a public GitHub repository. You do not need to download the pipeline manually—Nextflow will fetch it automatically.

How It Works

When the pipeline is launched, it runs in two main components:

  • Workflow Controller – responsible for managing the execution of the pipeline.
  • Workers – individual jobs that perform the specific tasks defined in the pipeline.

Both the controller and the workers must have access to a shared storage volume. This volume is defined by your PVC ([your-pvc]) and is mounted into the Kubernetes pods at /mnt (as specified by storageMountPath in the config). This shared storage is essential for data exchange between tasks during execution.

Make sure you have a valid and accessible PVC, and that your Kubernetes namespace is correctly set up with permissions to use it.
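While a pipeline is running, you can list the pods it creates in your namespace; the controller has a human-readable name and the workers have hashed names (the listing below is illustrative, using names from the hello example):

kubectl get pods -n [your-namespace]
# NAME                                  READY   STATUS    RESTARTS   AGE
# himalayan-fact-bx8d5                  1/1     Running   0          2m     <- workflow controller
# nf-81dae79db8e5e2c7a7c3ad5f6c7d59c6   1/1     Running   0          30s    <- worker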

Expected Output

Expected output of the hello run:

Running Nextflow K8s Job...
computeResourceType not defined in configuration, defaulting to Job
--- Output from pod himalayan-fact-bx8d5 ---
N E X T F L O W  ~  version 25.04.4
Pulling nextflow-io/hello ...
 downloaded from https://github.com/nextflow-io/hello.git
Launching `https://github.com/nextflow-io/hello` [elegant_panini] DSL2 - revision: 2be824e69a [master]
[81/765d22] Submitted process > sayHello (2)
[9f/c26cb7] Submitted process > sayHello (1)
[c7/1e9f5c] Submitted process > sayHello (3)
[9a/11f0e3] Submitted process > sayHello (4)
Ciao world!

Bonjour world!

Hello world!

Hola world!

Kubernetes Job 'himalayan-fact' created successfully.

This output confirms that the pipeline ran successfully on the Kubernetes cluster. You can see that the workflow was pulled from the GitHub repository (nextflow-io/hello) and that multiple processes (sayHello) were submitted and executed as Kubernetes jobs.

Each sayHello process returns a greeting in a different language, indicating that multiple tasks ran independently and in parallel, as expected in a Nextflow workflow.

The final message, Kubernetes Job 'himalayan-fact' created successfully., indicates that the main workflow controller job completed its setup and coordination of worker pods. You can use this kind of output to verify that the basic setup (config, namespace, PVC, and connectivity) is working correctly.

Advanced Nextflow Configuration for Kubernetes

When running advanced pipelines in Kubernetes, more fine-grained control over the execution environment is often needed.

In this setup, the workflow controller is executed as a Kubernetes Pod. It launches additional worker pods based on the process definitions in your pipeline. The controller pod is typically given a randomly generated human-readable name (e.g., naughty-williams), while worker pods have hashed names (e.g., nf-81dae79db8e5e2c7a7c3ad5f6c7d59c6).

Configuration 🧩

Here is an extended example of a nextflow.config file with more advanced settings:

k8s {
   namespace           = '[your-namespace]'
   runAsUser           = 1000
   computeResourceType = 'Job' // explicitly use Jobs instead of Pods
   cpuLimits           = true  // needed for correct cpu resource settings
   storageClaimName    = '[your-pvc]'
   storageMountPath    = '/mnt'
   launchDir           = '/mnt/path/to/launch'
   workDir             = '/mnt/path/to/work'
}

executor {
  queueSize = 30 // Maximum number of tasks running in parallel
}

process {
   executor = 'k8s'
}

  • Use a unique launchDir and workDir for each pipeline run if running multiple workflows in parallel to avoid file conflicts (see the sketch after these notes).
  • The shared storage defined by storageClaimName (PVC) must be writable and accessible by all pods.
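For example, a minimal sketch of giving each run its own directories; the run-001 path is only an illustration:

k8s {
   // other k8s settings as in the example above
   launchDir = '/mnt/runs/run-001'        // unique per run to avoid file conflicts
   workDir   = '/mnt/runs/run-001/work'   // unique per run
}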

Customization Options

Nextflow allows further customization at both the process and pod level (a combined sketch follows this list):

  • Mount additional volumes: such as other PVCs or Kubernetes secrets.
  • Set resource limits: including CPU and memory requirements per process.
  • Attach metadata: like Kubernetes labels or annotations to pods.
  • Use selective configuration: through labels (withLabel) or process names (withName).
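A hedged sketch of what such customizations can look like in nextflow.config; the PVC name, secret name, label, and process selectors below are placeholders, and the full list of pod options is described in the Nextflow documentation:

process {
   executor = 'k8s'

   // attach metadata and mount an additional PVC and a Kubernetes secret into every task pod
   pod = [
      [label: 'project', value: 'my-project'],
      [volumeClaim: 'other-pvc', mountPath: '/data'],
      [secret: 'my-credentials', mountPath: '/secrets']
   ]

   // resource limits applied selectively
   withLabel:bigmem {
      memory = '16 GB'
   }
   withName:align {
      cpus   = 8
      memory = '8 GB'
   }
}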

For more information, consult the official Nextflow documentation on process configuration and the Kubernetes executor.

Priority of Configuration

When applying configuration settings, Nextflow uses the following order of precedence (from lowest to highest):

  1. Generic process configuration in nextflow.config
  2. Process-specific directives in the workflow script
  3. withLabel selector configuration
  4. withName selector configuration

This means that more specific settings (e.g., using withName) will override general defaults.
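For instance, in the following sketch (the label and process names are hypothetical), a process labelled bigmem gets 8 GB instead of the 2 GB default, and a process named align gets 16 GB even if it also carries the bigmem label:

process {
   memory = '2 GB'            // generic default for all processes (lowest precedence)

   withLabel:bigmem {
      memory = '8 GB'         // overrides the generic default
   }

   withName:align {
      memory = '16 GB'        // highest precedence of the selectors
   }
}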

Run ⏱

To start a pipeline, use the run subcommand of nextflow-go with optional flags to customize the execution environment:

-head-image 'cerit.io/nextflow/nextflow:25.04.4' -head-memory 4096Mi -head-cpus 1

These options are not mandatory—they simply override the default settings. The default container image used by nextflow-go (as of version v0.3) is:

cerit.io/nextflow/nextflow:25.04.4

The -head-memory and -head-cpus flags define the memory and CPU resources allocated to the workflow controller pod. For pipelines that generate thousands of tasks, consider increasing these values to ensure stability and performance.

Example

To run the basic hello pipeline:

nextflow-go run hello -head-image 'cerit.io/nextflow/nextflow:25.04.4' -head-memory 4096Mi -head-cpus 1 -v PVC:/mnt

This mounts the specified PersistentVolumeClaim (PVC) to the controller pod at /mnt.

Running DSL 1 Pipelines

If you are using an older DSL 1 Nextflow pipeline, use an appropriate image like:

cerit.io/nextflow/nextflow:22.10.8

Example command:

nextflow-go run hello -head-image 'cerit.io/nextflow/nextflow:22.10.8' -head-memory 4096Mi -head-cpus 1 -v PVC:/mnt

DSL 1 pipelines typically require a more detailed configuration. Here’s an example:

k8s {
   namespace           = '[your-namespace]'
   runAsUser           = 1000
   computeResourceType = 'Job'
   cpuLimits           = true
   storageClaimName    = '[your-pvc]'
   storageMountPath    = '/mnt'
   launchDir           = '/mnt/data1'
   workDir             = '/mnt/data1/tmp'
}

executor {
  queueSize = 30
}

process {
   executor = 'k8s'
   memory   = '500M' // Default for all workers unless overridden
   pod = [
      [securityContext:
          [fsGroupChangePolicy:'OnRootMismatch', 
           runAsUser:1000, 
           runAsGroup:1, 
           fsGroup:1, 
           seccompProfile:
           [type:'RuntimeDefault']]], 
      [automountServiceAccountToken:false]]

   withLabel:VEP {
       memory = { check_resource(14.GB * task.attempt) } // Applied only to processes with label VEP
   }
}

process mdrun {
  cpus = 20 // Applied only to 'mdrun' process if not set in script
}

This example demonstrates how to:

  • Apply default settings to all processes (e.g., memory, Pod securityContext).
  • Customize resources per label (withLabel:VEP).
  • Target individual processes by name (process mdrun).

Such flexibility is especially important when working with older pipelines or those that require fine-tuned resource control.

Debug 🐞

We recommend watching your namespace in the Rancher GUI or on the command line when you submit a pipeline. Not all problems are propagated to the terminal, especially errors reported by Kubernetes itself, such as exceeded quotas. You can open the Jobs tab in the Rancher GUI and watch for jobs that stay In progress for too long or end up in an Error state. Useful commands include:

kubectl get jobs -n [namespace]                   # list all jobs in the namespace
kubectl describe job [job_name] -n [namespace]    # show details and events for a job

kubectl get pods -n [namespace]                   # list all pods in the namespace
kubectl describe pod [pod_name] -n [namespace]    # show details and events for a pod
kubectl logs [pod_name] -n [namespace]            # show pod logs (if available)

If a job waits too long to start, try describing it. The events may reveal an exceeded quota in your namespace:

  Warning  FailedCreate  18m    job-controller  Error creating: pods "nf-5dd9dc33d33c729b5cd57c818bafba86-lk4tl" is forbidden: exceeded quota: default-kbz9v, requested: requests.cpu=8, used: requests.cpu=16, limited: requests.cpu=20

If this happens to you, consider lowering the resource requests of the workflow controller or of processes that demand more than your quota allows (a sketch follows below). If you are unsure what to change, contact us and we will find a solution together.
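A minimal sketch of such an adjustment in nextflow.config, reusing the mdrun process from the earlier example; the value is an assumption, pick one that fits your quota:

process {
   withName:mdrun {
      cpus = 4    // request fewer CPUs so the worker pods fit within the namespace quota
   }
}

You can also lower the controller resources on the command line with the -head-cpus and -head-memory flags described in the Run section above.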

Caveats

  • If the pipeline runs for a long time (not the case for the hello pipeline), the nextflow-go command eventually ends with a terminated connection. This is normal and does not mean that the pipeline has stopped; only the logging to your terminal stops. You can still find the workflow controller logs in the Rancher GUI.

  • A running pipeline can be terminated from the Rancher GUI; pressing Ctrl-C in your terminal does not terminate it.

  • The pipeline debug log can be found on the PVC in launchDir/.nextflow.log. Consecutive runs rotate the logs so that they are not overwritten.

  • If the pipeline fails, you can try to resume it with the -resume command line option; this creates a new run but skips tasks that already finished (see the example after this list). See the Nextflow documentation for details.

  • All runs (successful or failed) keep the workflow controller pod visible in the Rancher GUI, and failed workers are kept as well. You can delete them from the GUI as needed.

  • For some workers, logs are not available in the Rancher GUI, but they can be watched using the command:

kubectl logs POD -n NAMESPACE 

where POD is the name of the worker (e.g., nf-81dae79db8e5e2c7a7c3ad5f6c7d59c6) and NAMESPACE is the namespace in use.
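For example, to retry the hello pipeline while skipping tasks that already finished (assuming the same nextflow.config and the same launch and work directories are used):

./nextflow-go run hello -resume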

nf-core/sarek Pipeline

nf-core/sarek is a comprehensive analysis pipeline for detecting germline or somatic variants from whole genome sequencing (WGS) or targeted sequencing data. It includes steps for pre-processing, variant calling, and annotation.

Kubernetes Run

To run sarek on Kubernetes, you need to provide a custom configuration to ensure the pipeline executes correctly and reliably:

  • Use a specific nextflow.config file that increases memory allocation for the VEP process, which is part of the pipeline (a sketch of this override follows below). Without this adjustment, the VEP step is likely to be killed due to insufficient memory.
  • Use a patched custom.config, as the public GitHub version of the sarek pipeline contains a known bug that causes output statistics to be written to incorrect files.

Additionally, the sarek pipeline uses functions inside its configuration. These are not supported by the standard kuberun executor in Nextflow, but are supported by the nextflow-go binary. This means that if you’re following this guide and using nextflow-go, no workaround is necessary.
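A minimal sketch of such a memory override for the VEP step, modelled on the withLabel:VEP example shown earlier; the 16 GB value is an assumption and may need tuning for your data:

process {
   executor = 'k8s'

   withLabel:VEP {
      memory = '16 GB'   // give the VEP annotation step enough memory so it is not killed
   }
}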

Once your input data is available on the shared PVC, you can launch the pipeline with the following command:

nextflow-go run nf-core/sarek -v PVC:/mnt --input /mnt/test.tsv --genome GRCh38 --tools HaplotypeCaller,VEP,Manta

Here, replace PVC with your actual PersistentVolumeClaim name, and make sure test.tsv is present on the PVC. This TSV should contain your input metadata.

The nf-core/sarek pipeline version 3.x supports DSL 2 and is compatible with Nextflow 25.04.4, making it suitable for nextflow-go. If you need to run an older version of Sarek that uses DSL 1, ensure the configuration matches the DSL 1 setup described earlier.

Caveats

  • Download igenome locally: It’s highly recommended to download the igenome data from Amazon S3 to your PVC in advance. This significantly improves performance when using the -resume option after a failed run and avoids issues caused by Amazon S3 throttling or network interruptions. After downloading, specify the path with:

    --igenomes_base /mnt/igenome
  • Expected error at end of run: The pipeline may end with a stacktrace like No signature of method: java.lang.String.toBytes(). This occurs when no email is specified for notifications. It is harmless and can be safely ignored.

  • Work directory cleanup: The sarek pipeline does not automatically delete its workDir. You are responsible for manually cleaning it up after the run.

  • Manual resume with different input: You can resume a failed run using a modified --input specification. Refer to the official documentation for guidance.

vib-singlecell-nf/vsn-pipelines pipeline

vsn-pipelines contains multiple workflows for analyzing single-cell transcriptomics data and depends on a number of tools.

Kubernetes Run

You need to download the pipeline-specific nextflow.config and put it into the current directory from which you start Nextflow. This pipeline uses the -entry parameter to specify the workflow entry point. Until issue #2397 is resolved, a patched version of Nextflow is needed; to deal with this bug, use Nextflow version 22.06.1-edge or later.

On the PVC, you need to prepare the data in the directories specified in the nextflow.config: see all occurrences of /mnt/data1 in the config and change them accordingly.

Consult the pipeline documentation for further config options.

You can run the pipeline with the following command:

nextflow-go -C nextflow.config run vib-singlecell-nf/vsn-pipelines -head-image 'cerit.io/nextflow/nextflow:24.04.4' -head-cpus 1 -head-memory 4096Mi -v PVC:/mnt -entry scenic

where PVC is the PVC mentioned above, scenic is the pipeline entry point, and nextflow.config is the downloaded nextflow.config.

Caveats

  • For a parallel run, you need to set maxForks in the nextflow.config together with the params.sc.scenic.numRuns parameter (a sketch follows this list). Consult the pipeline documentation.

  • The NUMBA_CACHE_DIR environment variable must point to /tmp or another writable directory; otherwise execution fails with a permission-denied error, because the tool tries to write into read-only parts of the running container.
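A hedged sketch of these two settings in nextflow.config; both values are assumptions, and the vsn-pipelines documentation should be consulted for values that match your data:

params.sc.scenic.numRuns = 2   // number of independent SCENIC runs (assumed value)

process {
   executor = 'k8s'
   maxForks = 2                 // upper bound on parallel instances of each process (assumed value)
}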

Using GPUs

Using GPUs in containers is straightforward; just add:

  accelerator = 1

into the process section of nextflow.config, e.g.:

process {
   executor = 'k8s'

   withLabel:VEP {
      accelerator = 1
   }
}

Run from Jupyter Notebook

The nextflow-go binary can be used directly from within a Jupyter Notebook environment, provided it’s placed in the home directory of the Jupyter user. This setup allows users to launch and monitor Nextflow pipelines from notebooks running inside the Kubernetes cluster.
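If the binary is not yet present in your home directory, you can download it from a Jupyter terminal in the same way as in the Installation section above:

wget https://github.com/CERIT-SC/nextflow-go/releases/download/v0.1/nextflow-go-linux-amd64 -O nextflow-go
chmod a+x nextflow-go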

Setup Instructions

First, open a terminal inside the Jupyter Notebook environment and determine your home PVC name:

echo $JUPYTERHUB_PVC_HOME

Example output:

jovyan@jupyter-xhejtman--ai---11fb682b:~$ echo $JUPYTERHUB_PVC_HOME
xhejtman-home-ai

Next, get your service account (SA) name:

echo sa-$JUPYTERHUB_USER

Example output:

jovyan@jupyter-xhejtman--ai---11fb682b:~$ echo sa-$JUPYTERHUB_USER
sa-xhejtman

Now create the following nextflow.config file in the same directory where your nextflow-go binary is located:

k8s {
   storageClaimName      = '[PVC]'
   storageMountPath      = '/home/jovyan'
   serviceAccount        = '[SA]'
   launchDir             = '/home/jovyan'
   workDir               = '/home/jovyan/tmp'
   computeResourceType   = 'Job'
   runAsUser             = 1000
}

executor {
  queueSize = 10
}

process {
   executor = 'k8s'
}

Replace the [PVC] placeholder with the output of $JUPYTERHUB_PVC_HOME (e.g., xhejtman-home-ai), and replace [SA] with the value of your service account (e.g., sa-xhejtman).
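With the example values shown above, the two lines to change would look like this:

storageClaimName      = 'xhejtman-home-ai'   // value of $JUPYTERHUB_PVC_HOME
serviceAccount        = 'sa-xhejtman'        // value of sa-$JUPYTERHUB_USER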

Once this file is saved, you can run any supported Nextflow pipeline directly from your notebook environment using the same commands as in the general instructions, such as:

./nextflow-go run hello

This approach enables seamless experimentation and workflow execution from interactive Jupyter environments, fully utilizing the underlying Kubernetes cluster.
