The best LLM inference performance on AMD GPUs

Moreh’s optimized version of vLLM improves inference throughput and latency with our proprietary libraries, highly tuned for AMD GPUs, while retaining the broad model support and optimization techniques of the original open-source vLLM.

Showcase

DeepSeek-R1 671B Inference

Moreh vLLM delivers industry-leading performance for the DeepSeek-R1 model on AMD Instinct MI300 series GPUs.

Chart: Normalized output tokens per second (ROCm vLLM = 1)

All numbers were measured on an 8x AMD Instinct MI300X GPU server using vLLM’s benchmark_serving tool.

Performance Evaluation Reports

3 Ways to Get Started

Install on Existing AMD GPU Servers

Moreh vLLM is packaged in a Docker container and can be easily deployed to your server.
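Once the container is running, it can be queried like a standard vLLM server. Below is a minimal sketch assuming the container exposes vLLM’s usual OpenAI-compatible API on port 8000 and serves DeepSeek-R1; the host, port, and model name are illustrative assumptions, not taken from the original text.

```python
# Minimal sketch: query a running Moreh vLLM container via the
# OpenAI-compatible API that vLLM normally exposes.
# Assumptions (not from the original text): the server listens on
# localhost:8000 and serves the DeepSeek-R1 model under this name.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "deepseek-ai/DeepSeek-R1",
        "messages": [{"role": "user", "content": "Summarize what vLLM does."}],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
# Print the generated reply from the first choice.
print(response.json()["choices"][0]["message"]["content"])
```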

Request On-Demand Optimization for Private AI Models

Moreh can deliver on-demand vLLM optimization so that customers’ proprietary AI models can be optimized and served on AMD GPUs.

Try on Moreh Cloud