BEIJING, Nov. 6, 2025 /PRNewswire/ — HiDream.ai has received the Best Demo award at the 33rd ACM International Conference on Multimedia (ACM MM 2025), becoming the first Chinese startup team in multimodal generative AI to claim this honor and underscoring the company’s top-tier research prowess and innovation capabilities in the field. The award recognizes the company’s unified multimodal agent, HiDream-Agent, which transforms complex visual content creation into an intuitive conversational experience.
ACM MM, organized by the Special Interest Group on Multimedia (SIGMM) of the Association for Computing Machinery (ACM), is a top-tier academic event in the global multimedia field. Dedicated to advancing research innovation and industrial application of multimedia technologies, it is widely regarded as one of the most authoritative and influential conferences in the industry, attracting leading scholars and tech giants worldwide. The Best Demo award reflects both the high international recognition of the research outcomes and the research team’s outstanding competence in multimedia technology innovation and application.
HiDream-Agent’s core strength lies in overcoming the fragmentation of today’s multimodal tools. It integrates text-to-image generation, instruction-based image editing, and text- or image-to-video generation within a single interface, addressing the industry-wide challenge of cross-modal semantic alignment. Built on the 17-billion-parameter HiDream-I1 model, which pairs a sparse Diffusion Transformer (DiT) backbone with a dynamic Mixture-of-Experts (MoE) design, the agent delivers strong performance on international benchmarks such as HPS and GenEval. For instruction-based image editing, the team extended HiDream-I1 with robust in-context visual conditioning, enabling precise image modifications.
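As a rough, hypothetical illustration of the general pattern the release names (a DiT-style transformer block whose feed-forward layer is a sparsely routed Mixture-of-Experts), the following PyTorch sketch shows how such a block could be wired together. It is not HiDream.ai’s implementation; all module names, expert counts, and dimensions are assumptions for illustration only.

```python
# Conceptual sketch only (not HiDream.ai's code): a DiT-style block with a
# top-k Mixture-of-Experts feed-forward layer. Sizes are illustrative.
import torch
import torch.nn as nn

class SparseMoEFeedForward(nn.Module):
    """Token-wise top-k routing over a small pool of expert MLPs."""
    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim). Each token is routed to its top-k experts.
        scores = self.gate(x)                           # (B, T, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # (B, T, k)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        # For brevity every expert runs densely on all tokens; the routing
        # weights zero out the contribution for tokens not assigned to it.
        for e, expert in enumerate(self.experts):
            w = (weights * (idx == e)).sum(dim=-1, keepdim=True)  # (B, T, 1)
            out = out + w * expert(x)
        return out

class DiTBlockWithMoE(nn.Module):
    """Pre-norm self-attention followed by the MoE feed-forward."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.moe = SparseMoEFeedForward(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.moe(self.norm2(x))

if __name__ == "__main__":
    tokens = torch.randn(2, 64, 256)        # a batch of 64 latent-patch tokens
    print(DiTBlockWithMoE()(tokens).shape)  # torch.Size([2, 64, 256])
```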
This agent ushers in a new paradigm for accessible, interactive visual storytelling and collaborative content creation in multimodal generative AI. By merging generation and editing into a dialogue-driven experience, it lowers the barrier to high-quality visual content creation, drastically shortens iteration cycles, and enables a “one-conversation” creative loop from idea to polished output. The prototype has since been productized as the Chat Generation feature of HiDream.ai’s flagship product, vivago, delivering more natural, personalized multimodal interaction for users.
Additionally, at ACM MM 2025, HiDream.ai hosted the Identity-Preserving Video Generation (IPVG) Challenge, which attracted numerous top-tier research teams worldwide. Featuring two tracks, Facial Identity-Preserving Video Generation and Full-Body Identity-Preserving Video Generation, the competition required participants to maintain the consistency of a given identity throughout video generation, and it released a new dataset supporting the task of identity-preserving text-to-video generation.
HiDream.ai was founded in 2023 by Dr. Mei Tao, an Academician of the Canadian Academy of Engineering, a Fellow of the IEEE, IAPR, and CAAI, and a former Senior Researcher at Microsoft Research Asia. The team he leads has over a decade of experience and is dedicated to the innovative exploration and commercialization of generative AI technologies. HiDream.ai focuses on visual multimodal foundation models, aiming to empower the creative industry through generative AI. Notably, its HiDream-I1 model, launched in April this year, topped the authoritative Artificial Analysis ranking within 24 hours of release, becoming the first Chinese self-developed generative AI model to enter the global top tier, a position it has maintained ever since. Moving forward, the team will deepen multimodal technology innovation, accelerate the industrialization of its technologies, expand core application scenarios in digital creation and film/television post-production, foster global tech collaboration and academic exchange, and deliver more intelligent, efficient AI creative solutions for creators worldwide.
Paper: https://doi.org/10.1145/3746027.3754467
