REDWOOD CITY, Calif., April 4, 2024 /PRNewswire/ — FriendliAI, a frontrunner in inference serving for generative AI, is thrilled to announce Friendli Dedicated Endpoints, which offers the capabilities of Friendli Container as a managed service. This latest addition to the Friendli Suite eliminates the complexities of containerization and development, providing customers with automated, cost-effective, and high-performance custom model serving.
Friendli Dedicated Endpoints is the managed cloud service alternative to Friendli Container. Friendli Container, already adopted by startups and enterprises alike to deploy Large Language Models (LLMs) at scale within private environments, delivers significant GPU cost reductions through the highly GPU-optimized Friendli Engine, which also powers Friendli Dedicated Endpoints.
In addition to leveraging the Friendli Engine, Friendli Dedicated Endpoints streamlines the process of building and serving LLMs through automation, making it more cost- and time-efficient. Friendli Dedicated Endpoints manages and operates generative AI deployments end to end, from custom model fine-tuning to provisioning cloud resources to automatically monitoring deployments. For instance, users can fine-tune and deploy a quantized Llama 2 or Mixtral model on the powerful Friendli Engine in just a few clicks, bringing cutting-edge GPU-optimized serving to users of all technical backgrounds.
Byung-Gon Chun, CEO of FriendliAI, highlighted the importance of democratizing generative AI, emphasizing its role in driving innovation and organizational productivity.
“With Friendli Dedicated Endpoints, we’re eliminating the hassle of infrastructure management so that customers can unlock the full potential of generative AI with the power of Friendli Engine. Whether it’s text generation, image creation, or beyond, our service opens the doors to endless possibilities for users of all backgrounds.”
Key features of Friendli Dedicated Endpoints:
Dedicated GPU Instances: Users can reserve entire GPUs for serving their custom generative AI models, ensuring consistent and reliable access to high-performance GPU resources.

Custom Model Support: Users can upload, fine-tune, and deploy models, enabling tailored solutions for diverse AI applications.

Superior Performance and Efficiency: A single GPU with the optimized Friendli Engine delivers results equivalent to up to seven GPUs with vLLM. Friendli Engine saves 50% to 90% on GPU costs and boasts up to 10x faster query response times.

Intelligent Operation: Friendli Dedicated Endpoints seamlessly adapts to fluctuating workloads and failures with automated failure management and auto-scaling that adjusts resource allocation based on traffic patterns, ensuring uninterrupted operations and resource efficiency during peak demand periods.
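As a rough sketch of how a deployed dedicated endpoint might be queried, the snippet below assembles a chat-completion style HTTP request. The URL shape, field names, endpoint ID, and token here are illustrative assumptions, not documented FriendliAI API values; consult the Friendli Suite documentation for the actual contract.

```python
import json

# Hypothetical placeholders -- substitute values obtained from the
# Friendli Suite dashboard; these are not real identifiers.
ENDPOINT_ID = "my-endpoint-id"
API_TOKEN = "my-api-token"

def build_completion_request(prompt: str, max_tokens: int = 128) -> dict:
    """Assemble a request description for a chat-style completion call.

    The URL and JSON schema are assumptions for illustration only.
    """
    return {
        "url": f"https://api.example.com/v1/endpoints/{ENDPOINT_ID}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        }),
    }

request = build_completion_request("Summarize the benefits of dedicated GPU serving.")
```

The request could then be sent with any HTTP client; keeping the construction separate makes the payload easy to inspect and test.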
By eliminating technical barriers and optimizing GPU usage, FriendliAI aims to ensure that infrastructure constraints no longer hinder innovation in generative AI.
Chun says, “We’re thrilled to welcome new users on our journey to make generative AI models fast and affordable.”
For more information about Friendli Dedicated Endpoints or Friendli Container, please visit https://friendli.ai/
About FriendliAI:
FriendliAI is a leader in inference serving for generative AI, committed to democratizing access to cutting-edge generative AI technologies. By providing accessible generative AI infrastructure services for developers, FriendliAI aims to accelerate innovation in the field of generative AI.
For media inquiries or interview requests, please contact Sujin Oh at press@friendli.ai