AI as a Service (AIaaS)

OpenAI API

API Introduction

This documentation provides detailed instructions for generating and using API keys to access locally running Large Language Models (LLMs) on the e-INFRA CZ infrastructure, specifically through the Open WebUI interface https://chat.ai.e-infra.cz. The guide is designed for researchers and scientists who want to integrate these models into their applications, scripts, or AI workflows via API.

To access the Open WebUI service and generate an API key, you must meet the following prerequisites:

  • A valid MetaCentrum account (for Czech research institutions), or
  • An active Masaryk University account (if affiliated).

For a comprehensive description of available models, see the Chat AI documentation.

Creating an API Key

API keys serve as authentication tokens to securely access Open WebUI’s API endpoint. Follow these steps to generate and use your API key:

Step-by-Step Instructions

  1. Go to the Settings section of the Open WebUI interface https://chat.ai.e-infra.cz.
  2. Navigate to the Account (Účet) tab.
  3. Click API keys to display the key section.
  4. Ignore the JWT token; under API key, either generate a new key or show the existing one.
  5. Copy the generated API key and store it securely.
  6. Use this key in API requests to authenticate against Open WebUI services.

The base endpoint for the Open WebUI API is https://llm.ai.e-infra.cz/v1/. It follows the OpenAI API specification, which makes it compatible with many existing LLM frameworks and applications.

Using the API Key

For detailed API specifications, refer to the API reference. Note that not all endpoints are supported.

For client access and development, we recommend using either the LiteLLM SDK or the OpenAI Client Libraries, which provide well-maintained, production-ready integrations.

Listing Available Models

Before querying a model, make sure you use the correct model name. To retrieve a list of all models available on the e-INFRA CZ infrastructure, run the following curl command, piping the response through jq. Replace ${E_INFRA_API_TOKEN} with your actual token.

curl -s -H "Authorization: Bearer ${E_INFRA_API_TOKEN}" https://llm.ai.e-infra.cz/v1/models | jq '.data[].id'

Expected Output (example):

"llama3.3:latest"
"llama3.3:70b-instruct-fp16"
"deepseek-r1:32b-qwen-distill-fp16"
"qwen2.5-coder:32b-instruct-q8_0"
"aya-expanse:latest"

This list reflects the model identifiers (id) that can be queried via the API. The identifiers follow a naming convention, typically including:

  • The model name (e.g., llama3.3, qwen2.5-coder).
  • A quantization tag (e.g., fp16, q8_0) that indicates how the model is optimized for inference (important for performance and resource consumption).
  • A variant or revision (e.g., :latest, :70b-instruct-fp16).
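The same listing can be done from Python. Below is a minimal sketch using only the standard library; the endpoint and the E_INFRA_API_TOKEN variable name follow this guide, and the jq-style field extraction is done in Python instead:

```python
import json
import urllib.request

API_BASE = "https://llm.ai.e-infra.cz/v1"

def build_models_request(token: str) -> urllib.request.Request:
    """Prepare the authenticated GET /models request (not yet sent)."""
    return urllib.request.Request(
        f"{API_BASE}/models",
        headers={"Authorization": f"Bearer {token}"},
    )

def list_model_ids(token: str) -> list[str]:
    """Fetch the model list and return the id of every entry."""
    with urllib.request.urlopen(build_models_request(token)) as resp:
        return [m["id"] for m in json.load(resp)["data"]]
```

Calling list_model_ids(os.environ["E_INFRA_API_TOKEN"]) should return the same identifiers as the curl example above.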

Model Aliases

To simplify model references, we provide category aliases so that users do not have to update specific model names in their tools. Calling specific models by their exact names continues to work as before.

alias name    model
mini          gpt-oss-120b
coder         qwen3.5
agentic       qwen3-coder-next
thinker       kimi-k2.5

Currently available models

Example API Request

Below is an example of how to use the API key to query the LLaMA 3.3 model (llama3.3:latest) with a chat completions request:

curl https://llm.ai.e-infra.cz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${E_INFRA_API_TOKEN}" \
  -d '{
    "model": "llama3.3:latest",
    "messages": [
      {
        "role": "user",
        "content": "Explain the impact of machine learning on climate research in 100 words or less."
      }
    ]
  }'

Expected Output (example):

{
  "id": "chatcmpl-XYZ123",
  "object": "chat.completion",
  "model": "llama3.3:latest",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Machine learning (ML) is revolutionizing climate research by unlocking unprecedented insights..."
      },
      "index": 0,
      "finish_reason": "length"
    }
  ]
}
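The same request can be issued from Python without third-party dependencies. This is a sketch using only the standard library, with the model name, endpoint, and header names taken from the curl example above:

```python
import json
import urllib.request

API_BASE = "https://llm.ai.e-infra.cz/v1"

def build_chat_request(token: str, model: str, prompt: str) -> urllib.request.Request:
    """Prepare an authenticated POST /chat/completions request (not yet sent)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )

def chat(token: str, model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(token, model, prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

In production code you would normally use the OpenAI client library or LiteLLM, as recommended above, rather than raw HTTP.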

Framework Integration

The e-INFRA CZ LLM API is designed to be compatible with the OpenAI API specification, meaning it can be integrated into frameworks originally built for OpenAI’s services.

PydanticAI Integration Example

PydanticAI is a Python agent framework that works with OpenAI-compatible models. Use the following configuration to authenticate against the e-INFRA CZ LLM endpoint (similar settings apply to other frameworks):

import os

from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

model = OpenAIModel(
    'deepseek-r1',
    provider=OpenAIProvider(
        base_url="https://llm.ai.e-infra.cz/v1",
        api_key=os.getenv("E_INFRA_API_TOKEN"),
    ),
)

Beyond PydanticAI, similar configurations can be applied to:

  • LangChain
  • LlamaIndex
  • FastAPI-based applications
  • Any other framework or client that supports custom OpenAI-compatible endpoints.

Reasoning Models in the API

Some models are hybrid, supporting both reasoning (thinking) and non-reasoning modes.

In the chat UI, most hybrid models are preconfigured to run in reasoning mode, which means you may see intermediate thinking output before the final response is returned. In the API, however, the default behavior depends on the specific model.

DeepSeek v3.2

By default, DeepSeek v3.2 runs without reasoning enabled. To enable reasoning, pass additional parameters via chat_template_kwargs:

curl https://llm.ai.e-infra.cz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer xxx" \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [
      {
        "role": "user",
        "content": "What are 5 creative things I could do with my kids'' art? I don''t want to throw them away, but it''s also so much clutter."
      }
    ],
    "chat_template_kwargs": {
      "thinking": true
    }
  }'
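When building the request body programmatically, the reasoning flag is just an extra top-level field in the JSON payload. Here is a minimal sketch (field names exactly as in the curl example above; with the official OpenAI Python client, such non-standard fields can typically be passed via the extra_body parameter):

```python
def deepseek_chat_payload(prompt: str, thinking: bool = False) -> dict:
    """Build a deepseek-v3.2 chat body; reasoning off by default, toggled
    via the chat_template_kwargs field described in this guide."""
    return {
        "model": "deepseek-v3.2",
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"thinking": thinking},
    }
```

The resulting dictionary can be JSON-encoded and sent to /chat/completions exactly like the curl payload above.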

For convenience, and for environments where chat_template_kwargs cannot be used (for example, certain AI agents), we also provide a dedicated reasoning variant named deepseek-v3.2-thinking, which is permanently forced into thinking mode.

GLM-4.7

In contrast, the GLM-4.7 model enables reasoning mode by default. To disable reasoning and return only the final response, explicitly turn it off using chat_template_kwargs:

curl https://llm.ai.e-infra.cz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer xxx" \
  -d '{
    "model": "glm-4.7",
    "messages": [
      {
        "role": "user",
        "content": "What are 5 creative things I could do with my kids'' art? I don''t want to throw them away, but it''s also so much clutter."
      }
    ],
    "chat_template_kwargs": {
      "enable_thinking": false
    }
  }'

Use these options to control whether intermediate reasoning is included in API responses, depending on your application’s needs.
