NVIDIA has introduced four new NeMo Retriever NIM inference microservices designed to improve the accuracy and throughput of large language models by efficiently fetching relevant proprietary data for AI applications.
The four new NeMo Retriever NIM (NVIDIA Inference Microservices) offerings are aimed at enhancing the accuracy and performance of large language models (LLMs) in enterprise applications, and are designed to enable efficient retrieval-augmented generation (RAG) by connecting custom models to diverse business data.
The lineup comprises three embedding models, NV-EmbedQA-E5-v5, NV-EmbedQA-Mistral7B-v2, and Snowflake-Arctic-Embed-L, and one reranking model, NV-RerankQA-Mistral4B-v3.
Used together, the embedding and reranking models pair fast candidate retrieval with more precise rescoring; NVIDIA reports roughly 30% fewer inaccurate answers on enterprise question answering compared with alternative models.
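To make that division of labor concrete, the sketch below shows one way a retrieve-then-rerank step could be wired up: an embedding model (such as NV-EmbedQA-E5-v5) produces a fast candidate shortlist, and the reranking model (such as NV-RerankQA-Mistral4B-v3) rescores that shortlist before passages are handed to the LLM. The `embed` and `rerank` callables are hypothetical wrappers around the respective microservices rather than NVIDIA's API; treat this as an illustrative sketch, not official sample code.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np


def retrieve_then_rerank(
    query: str,
    passages: Sequence[str],
    embed: Callable[[List[str]], np.ndarray],          # hypothetical wrapper around an embedding NIM
    rerank: Callable[[str, List[str]], List[float]],   # hypothetical wrapper around a reranking NIM
    shortlist_k: int = 20,
    final_k: int = 5,
) -> List[Tuple[str, float]]:
    """Two-stage retrieval: cheap embedding recall, then precise reranking of the shortlist."""
    # Stage 1: embed the query and passages, shortlist candidates by cosine similarity.
    q_vec = embed([query])[0]
    p_vecs = embed(list(passages))
    q_vec = q_vec / np.linalg.norm(q_vec)
    p_vecs = p_vecs / np.linalg.norm(p_vecs, axis=1, keepdims=True)
    sims = p_vecs @ q_vec
    shortlist_idx = np.argsort(sims)[::-1][:shortlist_k]
    shortlist = [passages[i] for i in shortlist_idx]

    # Stage 2: rescore the shortlist with the reranker and keep only the best passages,
    # which is where the reported accuracy gain over embedding-only retrieval comes from.
    scores = rerank(query, shortlist)
    order = np.argsort(scores)[::-1][:final_k]
    return [(shortlist[i], float(scores[i])) for i in order]
```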
These microservices support a wide range of applications, including chatbots, security analysis, and supply chain insights. They integrate with platforms from Cohesity, DataStax, NetApp, and Snowflake, and are compatible with NVIDIA Riva NIM microservices for speech AI applications.
Developers can deploy the microservices in cloud, on-premises, or hybrid environments, providing flexibility to suit various needs. They are available through the NVIDIA API catalog, accessible at ai.nvidia.com.
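For developers starting from the hosted catalog, a first call might look like the sketch below, which requests embeddings over HTTPS with an API key. The endpoint URL, model identifier, and payload fields follow the OpenAI-compatible pattern NVIDIA's hosted NIM endpoints have generally exposed, but they are assumptions here; check the API catalog entry for the current details.

```python
import os

import requests

# Assumptions: the hosted NeMo Retriever embedding NIMs expose an OpenAI-compatible
# /v1/embeddings endpoint and accept an "input_type" field that distinguishes queries
# from passages. Verify the URL, model identifier, and payload fields in the API catalog.
API_KEY = os.environ["NVIDIA_API_KEY"]          # key issued through the NVIDIA API catalog
EMBED_URL = "https://integrate.api.nvidia.com/v1/embeddings"


def embed(texts, input_type="passage"):
    """Return embedding vectors for a list of strings ('query' or 'passage' mode)."""
    response = requests.post(
        EMBED_URL,
        headers={"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"},
        json={
            "model": "nvidia/nv-embedqa-e5-v5",  # assumed catalog identifier for NV-EmbedQA-E5-v5
            "input": list(texts),
            "input_type": input_type,
            "encoding_format": "float",
        },
        timeout=30,
    )
    response.raise_for_status()
    return [item["embedding"] for item in response.json()["data"]]


if __name__ == "__main__":
    vectors = embed(["What does the new NeMo Retriever NIM lineup include?"], input_type="query")
    print(len(vectors), "embedding(s) of dimension", len(vectors[0]))
```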
NVIDIA is collaborating with global system integrators and service delivery partners to help enterprises incorporate these microservices into their AI pipelines. The company is also working with major cloud providers, including AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure, to ensure compatibility with their platforms.
With the launch of NeMo Retriever NIM microservices, NVIDIA is addressing the critical need for accuracy in generative AI applications. By enabling enterprises to leverage their proprietary data more effectively, these microservices have the potential to significantly enhance the performance and reliability of AI-powered solutions.