Moreh Reduces HBM Costs with GPU–Tenstorrent Heterogeneous Distributed Serving
First unveiled at Tenstorrent’s launch event, TT-Deploy, in San Francisco on May 1
SANTA CLARA, Calif., May 2, 2026 /PRNewswire/ — Moreh, an AI infrastructure software company led by CEO Gangwon Jo, announced that it has successfully validated LLM inference performance on the Tenstorrent Galaxy Wormhole system using its proprietary ‘MoAI Inference Framework.’
In tests across leading Mixture-of-Experts (MoE) models—including GPT-OSS, Qwen, GLM, and DeepSeek—Moreh achieved LLM inference performance on Tenstorrent Galaxy Wormhole matching or surpassing that of NVIDIA DGX A100-class systems, demonstrating a compelling alternative to conventional GPU-centric AI infrastructure.
Moreh also improved cost efficiency by implementing a disaggregated serving architecture that combines GPUs with Tenstorrent Wormhole chips. By utilizing Tenstorrent processors as dedicated prefill accelerators, the company reduced reliance on high-cost HBM and lowered overall infrastructure costs.
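The prefill/decode split described above can be illustrated with a minimal sketch. This is not Moreh's actual implementation or API; all class and function names below are hypothetical, chosen only to show why disaggregation helps: prefill is compute-bound (a good fit for accelerators without large HBM), while decode is memory-bandwidth-bound and keeps the KV cache resident on HBM GPUs.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    kv_cache: list = field(default_factory=list)
    output: str = ""

class PrefillWorker:
    """Compute-heavy prompt processing; runs on the prefill pool
    (e.g. Tenstorrent Wormhole cards in the architecture described)."""
    def prefill(self, req: Request) -> Request:
        # Stand-in for one attention pass over the full prompt,
        # producing the KV cache that decode will consume.
        req.kv_cache = req.prompt.split()
        return req

class DecodeWorker:
    """Token-by-token generation against the transferred KV cache;
    runs on the decode pool (e.g. HBM GPUs)."""
    def decode(self, req: Request, max_tokens: int = 3) -> Request:
        for _ in range(max_tokens):
            # Stand-in for sampling one token and extending the cache.
            req.output += f"<tok:{len(req.kv_cache)}>"
            req.kv_cache.append(req.output[-1])
        return req

def serve(prompt: str) -> str:
    req = Request(prompt)
    req = PrefillWorker().prefill(req)   # prefill pool
    req = DecodeWorker().decode(req)     # decode pool
    return req.output

print(serve("explain disaggregated serving"))
```

In a real deployment the KV-cache handoff between the two pools is the critical step; here it is simply the `Request` object passed from one worker to the other.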
The results were first unveiled at Tenstorrent’s launch event, TT-Deploy, held on May 1 in San Francisco.
As a strategic partner of Tenstorrent and a major external contributor to Metalium, Moreh showcased a live LLM inference demo at the event. Building on its experience operating AMD GPU-based production environments in real-world data centers, the company presented its latest technical achievements in a session titled ‘Production-Ready LLM Inference on Tenstorrent Galaxy.’
MoAI Inference Framework is a disaggregated inference solution that enables unified operation of heterogeneous GPUs and NPUs—including NVIDIA, AMD, and Tenstorrent—within a single cluster. This allows enterprises to build flexible AI infrastructure strategies without vendor lock-in.
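One way to picture vendor-agnostic operation is a scheduler that dispatches work by role (prefill vs. decode) rather than by vendor. The sketch below is purely illustrative and assumes nothing about the MoAI Inference Framework's real configuration format or device names.

```python
# Hypothetical heterogeneous device pool: NVIDIA, AMD, and Tenstorrent
# devices coexist in one cluster, distinguished only by serving role.
CLUSTER = [
    {"vendor": "nvidia",      "role": "decode"},
    {"vendor": "amd",         "role": "decode"},
    {"vendor": "tenstorrent", "role": "prefill"},
    {"vendor": "tenstorrent", "role": "prefill"},
]

def workers_for(role: str) -> list[dict]:
    """Select devices by role; the vendor is irrelevant to routing."""
    return [d for d in CLUSTER if d["role"] == role]
```

Because routing never inspects the vendor field, swapping one accelerator family for another changes the cluster inventory but not the serving logic, which is the essence of avoiding vendor lock-in.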
Moreh CEO Gangwon Jo stated, “Achieving production-grade LLM inference performance and stability on Tenstorrent-based systems marks a significant milestone,” and added, “We will continue to enhance performance through deeper optimization across heterogeneous architectures and closer integration with Tenstorrent NPUs.”
Moreh is developing its own core AI infrastructure engine and, through its foundation LLM subsidiary Motif Technologies, is building end-to-end capabilities spanning both the infrastructure and model domains. At the same time, the company is expanding its presence in the global market through collaborations with key partners such as AMD, Tenstorrent, and SGLang.
