
Nebius is launching an AI platform called Token Factory, designed to help companies bring open-source and customized language models into production. Token Factory combines the various components of the AI production process, such as inferencing, fine-tuning, and access management, within a single environment. It supports dozens of open models, including DeepSeek, Llama, GPT-OSS from OpenAI, NVIDIA Nemotron, and Qwen, and companies can also run their own models on it. The service runs on Nebius' existing AI Cloud infrastructure.
The introduction comes at a time when many organizations are moving from experimental AI projects to practical applications, which is increasing demand for open models that offer more freedom than commercial alternatives. Using such models brings its own challenges, however, for example around security, scalability, and cost control. Token Factory aims to address these issues in part by automating management and monitoring.
The underlying infrastructure, Nebius AI Cloud 3.0 (Aether), offers monitoring, security, and performance that has been tested against industry benchmarks such as MLPerf Inference.

Transparent costs per token

Token Factory focuses on optimizing models after the training phase. Users can convert open model weights into production-ready systems with transparent costs per token. Fine-tuning and distillation are built in, allowing models to be adapted to business data while, according to Nebius, response times and costs can be reduced by tens of percent.
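To make the per-token pricing model concrete, here is a minimal sketch of the arithmetic. The rates and the 30% savings figure are made-up placeholders chosen only to illustrate a reduction "by tens of percent"; they are not actual Token Factory prices.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Cost of one request in dollars, given per-million-token rates.

    Rates here are hypothetical placeholders, not real Nebius pricing.
    """
    return (input_tokens / 1e6) * in_rate_per_m + (output_tokens / 1e6) * out_rate_per_m

# A request with a 2,000-token prompt and a 500-token response:
base = request_cost(2_000, 500, in_rate_per_m=0.50, out_rate_per_m=1.50)

# A distilled or fine-tuned variant that cuts cost by, say, 30%
# (an assumed figure in the "tens of percent" range Nebius cites):
distilled = base * (1 - 0.30)

print(f"base: ${base:.6f}  distilled: ${distilled:.6f}")
```

Because billing is linear in token counts, per-token transparency makes such before/after comparisons straightforward: halving output length or switching to a cheaper distilled model translates directly into a proportional cost change.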
Read at Techzine Global