A high-throughput, memory-efficient serving engine for LLMs using PagedAttention.
A high-performance C++ implementation for running LLMs on consumer hardware.
The fastest generative media platform for real-time image and video generation.
A lightning-fast cloud platform for building and running open-source AI models.
The leading European AI company producing efficient, open, and high-performance LLMs.
An advanced AI research company providing high-performance open-source coding and chat models.
Get up and running with large language models locally on macOS, Linux, and Windows.
The premier community platform for Stable Diffusion models, LoRAs, and creative assets.
The leading Chinese open-source Model-as-a-Service platform by Alibaba Group.
Run and deploy open-source machine learning models with a simple cloud API.