
Chat AI

Introduction

Open-WebUI is an AI-powered chatbot interface that allows users to interact with various models for text generation and image creation. It can also be used via its API and connected to other applications such as Visual Studio; see the section Creating an API key below. This guide covers how to log in, use models, generate images, and create an API key.

Accessing Open-WebUI

Open-WebUI is accessible at https://chat.ai.e-infra.cz. To use the platform, you need a valid Metacentrum account; see How to get Access.

Logging In

  1. Open your web browser and navigate to https://chat.ai.e-infra.cz.
  2. Click on the Login button.
  3. Select the option to log in with e-INFRA CZ.
  4. Once logged in, you will be redirected to the Open-WebUI dashboard.

Using AI Models

Open-WebUI provides access to various AI models for text generation. To use them:

  1. After logging in, navigate to the chat interface.
  2. Select a model from the available options in the dropdown menu.
  3. Type your query or request in the input field.
  4. Press Enter or click Submit to receive a response from the selected model.
  5. Do not hesitate to scroll through the model list! More models are available than are shown at first. 👇⬇️


Currently Available Models (as of 04/29/2025)

We categorize our models into two groups:

  1. Guaranteed Models – These are stable and expected to remain available long-term. Any replacements or updates will be announced here and via WebUI banners.
  2. Experimental Models – These are subject to change as we optimize resources or test new capabilities.

For exact model names, query the model list as shown below. When accessing externally, you must use the exact model names, such as llama3.3:latest.

Model names are case-sensitive and may include version tags like :latest or specific quantization formats.

Guaranteed Models

| Model | Description |
| --- | --- |
| LLaMA 3.3 | A 70B language model from Meta. Its efficiency and versatility make it well-suited for a wide range of natural language processing tasks. |
| DeepSeek R1 | A 32B Qwen-distilled model from DeepSeek (China), designed with a focus on reasoning. It excels at complex tasks such as mathematics and code generation. |
| Qwen 2.5 Coder | A 32B Q8 variant specialized in code understanding and generation, ideal for developers needing programming assistance. |
| Gemma 3 | A 27B FP16 language model from Google, part of the lightweight Gemma family built on Gemini technology. |
| Command-A | A 111B model from Cohere. Though slower in response time, it is particularly strong in programming tasks and manifest generation. |

Experimental Models

| Model | Description |
| --- | --- |
| Aya Expanse | A 32B Q8 multilingual model from Cohere, trained to perform well across 23 languages, including Czech. |
| Phi-4 | A 14B Q8 model from Microsoft, trained on a mix of synthetic datasets, filtered public web content, academic books, and Q&A datasets. It represents the current state of the art in open-access models. |
| LLaMA 4 | Meta’s Scout 17B-16E variant. While intended for general NLP tasks, it currently underperforms compared to LLaMA 3.3. |
| Mistral-Small 3.1 | A 24B language model from Mistral AI, featuring built-in vision capabilities. |

Embedding Models

Open WebUI currently does not support API access to embedding models (even those compatible with the OpenAI API). As a result, no embedding models are available through the Open WebUI API.

However, users may request IP-based access to the vLLM system, which includes support for embedding models.
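Once IP-based access is granted, a vLLM deployment can typically be queried through its OpenAI-compatible embeddings endpoint. The sketch below only builds the request; the host name and model name are hypothetical placeholders, and the exact path and model depend on the deployment you are given access to.

```python
# Sketch of an OpenAI-compatible embeddings request for a vLLM deployment.
# VLLM_URL and the model name are hypothetical placeholders, not real values.
import json
import urllib.request

VLLM_URL = "http://vllm.example.internal/v1/embeddings"

def build_embedding_request(text, model="embedding-model"):
    """Return a POST request for one input text; sending it is up to the caller."""
    payload = {"model": model, "input": text}
    return urllib.request.Request(
        VLLM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embedding_request("Ahoj světe")
# urllib.request.urlopen(req)  # only works from an allow-listed IP
```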

Generating Images

Open-WebUI also allows users to generate images using AI.

  1. Select a text model for prompt generation (such as LLaMA 3.3).
  2. Click the Image icon (Generate an Image).
  3. Enter the text prompt, e.g., Four horsemen of the apocalypse. Click Send or press Enter.
  4. The image will be generated and displayed.

Creating an API Key

To use Open-WebUI’s API, you need to generate an API key.

  1. Go to the Settings section of the Open-WebUI interface.
  2. Navigate to the Account (Účet) section.
  3. Click API keys (display).
  4. Ignore the JWT token; select API key and either generate a new one or display the existing one.
  5. Copy the generated API key and store it securely.
  6. Use this key in API requests to authenticate and access Open-WebUI services.
  7. The API endpoint is https://chat.ai.e-infra.cz/api/.
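As a quick sketch of using the key, the snippet below builds an authenticated request against the OpenAI-compatible chat completions path that Open WebUI conventionally exposes. The /api/chat/completions path and payload shape are assumptions based on that convention, so verify them against your deployment; YOUR_API_KEY is a placeholder.

```python
# Sketch: build an authenticated chat request for the Open WebUI API.
# Assumptions: the /api/chat/completions path and the OpenAI-style payload.
import json
import urllib.request

API_URL = "https://chat.ai.e-infra.cz/api/chat/completions"
API_KEY = "YOUR_API_KEY"  # replace with the key generated above

def build_chat_request(prompt, model="llama3.3:latest"):
    """Return a ready-to-send POST request carrying the Bearer token."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Hello!")
# urllib.request.urlopen(req)  # uncomment to actually send the request
```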

Models for API Interface

For API access, query the exact model names using, e.g., the curl and jq commands below, replacing TOKEN with your real token.

curl -H "Authorization: Bearer TOKEN" https://chat.ai.e-infra.cz/api/models | jq .data[].id

You will see output similar to:

"llama3.3:latest"
"llama3.3:70b-instruct-fp16"
"deepseek-r1:32b-qwen-distill-fp16"
"qwen2.5-coder:32b-instruct-q8_0"
"aya-expanse:latest"

Then use, e.g., llama3.3:latest as the model name in API calls.
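The same extraction can be done in code. The snippet below parses a trimmed, hard-coded sample of the /api/models response body purely to illustrate the parsing; the "data" and "id" field names match the curl output above, but a real call returns the full list.

```python
import json

def extract_model_ids(models_json):
    """Return the exact model IDs from an /api/models response body."""
    return [m["id"] for m in json.loads(models_json)["data"]]

# Trimmed sample of the response shape; a real call returns the full list.
sample = '{"data": [{"id": "llama3.3:latest"}, {"id": "qwen2.5-coder:32b-instruct-q8_0"}]}'
print(extract_model_ids(sample))
```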

Knowledge Function (Alpha)

Open WebUI includes an experimental Knowledge Function, which is essentially a RAG (Retrieval-Augmented Generation) system. This allows users to upload custom texts and query a model that generates answers based on that content.

Currently, the Knowledge Function only supports global configuration, meaning that a single embedding model is used for all stored texts. We are still evaluating which embedding models are best suited for this purpose. A key limitation is that changing the embedding model requires all previously uploaded texts to be reprocessed, as RAG relies on consistent embeddings to function correctly. For this reason, the feature is not recommended for production use and is intended primarily for testing and preview.

Another challenge lies in finding suitable embedding models that support the Czech language and large input contexts. Most available models are limited to a 512-token input, which is suboptimal for longer texts, as it requires splitting the content into small fragments—often too small to provide high-quality answers.

While we could integrate external embedding models like OpenAI’s text-embedding-3-small, this approach would compromise data privacy, as it involves sending data to a third-party service.

Data Privacy

Open WebUI displays model labels as either internal or external. This labeling is a feature of WebUI itself. Models are marked as internal if inference is handled via the Ollama API, and external if inference is done through an OpenAI-compatible API (in our case, the vLLM system).

All models accessible to users run on our infrastructure, and inference-related data does not leave our systems. However, we can provide access to truly external models, such as GPT-4o, upon special agreement. In such cases, data is necessarily transmitted to a third party.

The Ollama inference system has all request/response logging completely disabled. In contrast, the vLLM system logs requests and responses only while the associated Pod is running.

Open WebUI itself logs at the INFO level by default, meaning request/response data is not logged. Occasionally, when troubleshooting is required, the log level may be temporarily raised to DEBUG. In such cases, request/response data is stored until the Pod is restarted—typically when a new version is deployed.

Only system administrators have access to these logs, and they are not transmitted anywhere else.

Saved conversations are stored in a PostgreSQL database hosted within our infrastructure. When a conversation is deleted, it should also be removed from the database—though we have not independently verified whether WebUI fully implements this behavior. The database is backed up to an S3 storage on CESNET with a 30-day retention period, meaning older backups are deleted after 30 days. Theoretically, a deleted conversation will be completely erased within that window. Again, access to this system is restricted to administrators only.

All administrators have signed an NDA.
