Chat AI
Introduction
Open-WebUI is an AI-powered chatbot interface that allows users to interact with various models for text generation and image creation. It can also be used via an API and connected to other applications such as Visual Studio; see the section Creating an API key below. This guide provides instructions on how to log in, use models, generate images, and create an API key.
Accessing Open-WebUI
Open-WebUI is accessible at https://chat.ai.e-infra.cz. To use the platform, you need a valid Metacentrum account, see How to get Access.
Logging In
- Open your web browser and navigate to https://chat.ai.e-infra.cz.
- Click on the Login button.
- Select the option to log in with e-INFRA CZ.
- Once logged in, you will be redirected to the Open-WebUI dashboard.
Using AI Models
Open-WebUI provides access to various AI models for text generation. To use them:
- After logging in, navigate to the chat interface.
- Select a model from the available options in the dropdown menu.
- Type your query or request in the input field.
- Press Enter or click Submit to receive a response from the selected model.
- Do not hesitate to scroll through the model list! There are more models available. 👇⬇️
Currently Available Models (as of 04/29/2025)
We categorize our models into two groups:
- Guaranteed Models – These are stable and expected to remain available long-term. Any replacements or updates will be announced here and via WebUI banners.
- Experimental Models – These are subject to change as we optimize resources or test new capabilities.
For exact model names, query the model list as shown below. When accessing the service externally, you must use the exact model name, such as `llama3.3:latest`. Model names are case-sensitive and may include version tags like `:latest` or specific quantization formats.
Guaranteed Models
| Model | Description |
|---|---|
| LLaMA 3.3 | A 70B language model from Meta. Its efficiency and versatility make it well-suited for a wide range of natural language processing tasks. |
| DeepSeek R1 | A 32B Qwen-distilled model from DeepSeek (China), designed with a focus on reasoning. It excels at complex tasks such as mathematics and code generation. |
| Qwen 2.5 Coder | A 32B Q8 variant specialized in code understanding and generation, ideal for developers needing programming assistance. |
| Gemma 3 | A 27B FP16 language model from Google, part of the lightweight Gemma family built on Gemini technology. |
| Command-A | A 111B model from Cohere. Though slower in response time, it is particularly strong in programming tasks and manifest generation. |
Experimental Models
| Model | Description |
|---|---|
| Aya Expanse | A 32B Q8 multilingual model from Cohere, trained to perform well across 23 languages, including Czech. |
| Phi-4 | A 14B Q8 model from Microsoft, trained on a mix of synthetic datasets, filtered public web content, academic books, and Q&A datasets. It represents the current state of the art in open-access models. |
| LLaMA 4 | Meta’s Scout 17B-16E variant. While intended for general NLP tasks, it currently underperforms compared to LLaMA 3.3. |
| Mistral-Small 3.1 | A 24B language model from Mistral AI, featuring built-in vision capabilities. |
Embedding Models
The Open WebUI currently does not support API access to embedding models (even if compatible with the OpenAI API). As a result, no embedding models are currently available through the Open WebUI API.
However, users may request IP-based access to the vLLM system, which includes support for embedding models.
Generating Images
Open-WebUI also allows users to generate images using AI.
- Select a text model for prompt generation (such as LLaMA 3.3).
- Click the Image icon (Generate an Image).
- Enter a text prompt, e.g., Four horsemen of apocalypse, and press Enter or click Send.
- The image will be generated and displayed.
Creating an API Key
To use Open-WebUI’s API, you need to generate an API key.
- Go to the Settings section of the Open-WebUI interface.
- Navigate to the Account (Účet) tab.
- Click API keys to display them.
- Ignore the JWT token; select the API key and either generate a new one or display the existing one.
- Copy the generated API key and store it securely.
- Use this key in API requests to authenticate and access Open-WebUI services.
- The API endpoint is https://chat.ai.e-infra.cz/api/.
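As an illustration, the steps above can be exercised programmatically. The sketch below is an assumption based on Open-WebUI's OpenAI-compatible API: it uses Bearer-token authentication and a hypothetical `/chat/completions` route under the documented endpoint. Verify the exact routes against your deployment before relying on them.

```python
import json
import urllib.request

# Base endpoint from this guide; the /chat/completions route below is an
# assumption based on Open WebUI's OpenAI-compatible API.
API_BASE = "https://chat.ai.e-infra.cz/api"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an authenticated chat-completion request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("YOUR_API_KEY", "llama3.3:latest", "Hello!")
# urllib.request.urlopen(req) would send the request; it is omitted here
# so the sketch stays offline.
```

The key never appears in the URL, only in the Authorization header, which is the usual pattern for Bearer-token APIs.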
Models for API Interface
For the API interface, query the exact model names using, e.g., the `curl` and `jq` commands, replacing TOKEN with your real token. The output lists the available model identifiers; use one of them, e.g., `llama3.3:latest`, as the model name in API calls.
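A minimal sketch of extracting the exact model names from the JSON the model-list endpoint returns. The `/api/models` route, the Bearer header, and the sample response below are assumptions based on the OpenAI-compatible format, not verified output:

```python
import json

# Hypothetical shell equivalent (TOKEN = your API key; route is an assumption):
#   curl -s -H "Authorization: Bearer TOKEN" \
#        https://chat.ai.e-infra.cz/api/models | jq -r '.data[].id'

def model_names(response_text: str) -> list[str]:
    """Extract model identifiers from an OpenAI-style model list response."""
    return [entry["id"] for entry in json.loads(response_text)["data"]]

# Illustrative response fragment (not real output):
sample = '{"data": [{"id": "llama3.3:latest"}, {"id": "example-model:latest"}]}'
print(model_names(sample))  # ['llama3.3:latest', 'example-model:latest']
```

Whatever identifiers the real endpoint returns are the strings to pass verbatim as the model name, since names are case-sensitive and include their version tags.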
Knowledge Function (Alpha)
Open WebUI includes an experimental Knowledge Function, which is essentially a RAG (Retrieval-Augmented Generation) system. This allows users to upload custom texts and query a model that generates answers based on that content.
Currently, the Knowledge Function only supports global configuration, meaning that a single embedding model is used for all stored texts. We are still evaluating which embedding models are best suited for this purpose. A key limitation is that changing the embedding model requires all previously uploaded texts to be reprocessed, as RAG relies on consistent embeddings to function correctly. For this reason, the feature is not recommended for production use and is intended primarily for testing and preview.
Another challenge lies in finding suitable embedding models that support the Czech language and large input contexts. Most available models are limited to a 512-token input, which is suboptimal for longer texts, as it requires splitting the content into small fragments—often too small to provide high-quality answers.
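To illustrate the fragmentation problem, here is a rough sketch of splitting a long text to fit a fixed input limit. Whitespace splitting only approximates real tokenization (embedding models use their own tokenizers), so the numbers are illustrative:

```python
def chunk_text(text: str, max_tokens: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping word chunks of at most max_tokens words.

    Overlap preserves some context across fragment boundaries, but short
    fragments still lose the surrounding context needed for
    high-quality answers.
    """
    words = text.split()
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break
    return chunks

document = "word " * 1200          # a ~1200-word document
print(len(chunk_text(document)))   # 3 fragments at a 512-word limit
```

A model with a larger input context would need fewer, larger fragments, which is why context length matters as much as language coverage when choosing an embedding model.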
While we could integrate external embedding models like OpenAI’s text-embedding-3-small
, this approach would compromise data privacy, as it involves sending data to a third-party service.
Data Privacy
Open WebUI displays model labels as either internal or external. This labeling is a feature of WebUI itself. Models are marked as internal if inference is handled via the Ollama API, and external if inference is done through an OpenAI-compatible API (in our case, the vLLM system).
All models accessible to users run on our infrastructure, and inference-related data does not leave our systems. However, we can provide access to truly external models—such as GPT-4o—upon special agreement. In such cases, data is understandably transmitted to a third party.
The Ollama inference system has all request/response logging completely disabled. In contrast, the vLLM system logs requests and responses only while the associated Pod is running.
Open WebUI itself logs at the INFO level by default, meaning request/response data is not logged. Occasionally, when troubleshooting is required, the log level may be temporarily raised to DEBUG. In such cases, request/response data is stored until the Pod is restarted—typically when a new version is deployed.
Only system administrators have access to these logs, and they are not transmitted anywhere else.
Saved conversations are stored in a PostgreSQL database hosted within our infrastructure. When a conversation is deleted, it should also be removed from the database—though we have not independently verified whether WebUI fully implements this behavior. The database is backed up to an S3 storage on CESNET with a 30-day retention period, meaning older backups are deleted after 30 days. Theoretically, a deleted conversation will be completely erased within that window. Again, access to this system is restricted to administrators only.
All administrators have signed an NDA.