The best LLM inference performance on AMD GPUs
Moreh’s optimized version of vLLM improves inference throughput and latency through proprietary libraries highly tuned for AMD GPUs, while retaining the broad model support and optimization techniques of the original open-source vLLM.
Showcase
DeepSeek-R1 671B Inference
Moreh vLLM delivers industry-leading performance for the DeepSeek-R1 model on AMD Instinct MI300 series GPUs.
[Chart: Normalized output tokens per second (ROCm vLLM = 1)]
All numbers were measured on an 8x AMD Instinct MI300X GPU server using vLLM’s benchmark_serving tool.
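For reference, the sketch below shows one way to collect a raw output-tokens-per-second figure with vLLM’s offline Python API on an 8-GPU node. The reported numbers above come from the serving benchmark instead, and the model identifier, prompt set, and sampling parameters here are illustrative assumptions, not the exact benchmark configuration.

```python
import time
from vllm import LLM, SamplingParams

# Load the model with tensor parallelism across 8 GPUs (one rank per MI300X).
# Model name is illustrative; an optimized vLLM build would be a drop-in replacement.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1",
    tensor_parallel_size=8,
)

# Illustrative sampling settings and a synthetic prompt batch.
sampling = SamplingParams(temperature=0.6, max_tokens=512)
prompts = ["Explain mixture-of-experts routing."] * 64

start = time.perf_counter()
outputs = llm.generate(prompts, sampling)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests and report throughput.
out_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"Output tokens/s: {out_tokens / elapsed:.1f}")
```

Unlike benchmark_serving, which drives a running server with timed requests and also reports latency percentiles, this offline measurement captures batch throughput only, so it is useful for quick comparisons rather than serving-level results.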