Unveiled at TT-Deploy in San Francisco, the demonstration highlighted Moreh’s proprietary MoAI Inference Framework, tested across leading Mixture-of-Experts models like GPT-OSS, Qwen, GLM, and DeepSeek. The results signal a strong alternative to traditional GPU-centric architectures.
A key innovation lies in its heterogeneous distributed serving architecture, which pairs GPUs with Tenstorrent processors acting as dedicated prefill accelerators. This approach reduces reliance on high-cost HBM, delivering greater scalability and lower infrastructure costs without compromising performance.
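The split described above follows the general pattern of disaggregated serving: the compute-bound prompt prefill runs on one hardware tier while the memory-bound, token-by-token decode runs on another. The following is a minimal illustrative sketch of that routing pattern, not Moreh's actual framework; all class and function names here are hypothetical.

```python
# Toy sketch of prefill/decode disaggregation (hypothetical names,
# not the MoAI Inference Framework API): prompt prefill is routed to
# one accelerator pool (e.g. Tenstorrent), decode to another (e.g. GPUs).
from dataclasses import dataclass, field


@dataclass
class Request:
    prompt: str
    kv_cache: list = field(default_factory=list)  # stands in for transferred KV state
    output: list = field(default_factory=list)


class Pool:
    """A hypothetical accelerator pool identified by name."""

    def __init__(self, name: str):
        self.name = name

    def prefill(self, req: Request) -> Request:
        # Process the whole prompt in one pass and populate the KV cache.
        req.kv_cache = req.prompt.split()
        return req

    def decode(self, req: Request, max_tokens: int) -> Request:
        # Generate tokens one at a time, reading from the prefilled KV cache.
        for i in range(max_tokens):
            req.output.append(f"tok{i}")
        return req


def serve(req: Request, prefill_pool: Pool, decode_pool: Pool,
          max_tokens: int = 4) -> Request:
    # Route the compute-bound prefill phase and the memory-bound decode
    # phase to different hardware pools, as in disaggregated serving.
    req = prefill_pool.prefill(req)
    return decode_pool.decode(req, max_tokens)


tenstorrent = Pool("tenstorrent-prefill")  # hypothetical prefill tier
gpu = Pool("gpu-decode")                   # hypothetical decode tier
result = serve(Request("hello world"), tenstorrent, gpu)
```

The design benefit sketched here is that each phase can be sized and provisioned independently: the prefill tier needs raw compute, while the decode tier needs memory bandwidth, so neither pool has to be over-provisioned for the other's bottleneck.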
By enabling unified operation across NVIDIA, AMD, and Tenstorrent systems, Moreh’s platform supports flexible, vendor-agnostic AI infrastructure strategies — a critical advantage for enterprises scaling large language models.
This milestone reinforces Moreh's growing role in the global AI ecosystem, alongside partners such as AMD and Tenstorrent, pushing the boundaries of efficient, next-generation AI deployment.
