NVIDIA has unveiled four new NeMo Retriever NIM (NVIDIA Inference Microservices) inference microservices, aimed at improving the accuracy and throughput of large language models (LLMs) in enterprise applications. By efficiently fetching relevant proprietary data, the microservices enable retrieval-augmented generation (RAG), connecting custom models to diverse business data.
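In practice, the retrieval side of a RAG pipeline starts with an embedding call. The sketch below shows roughly what that looks like against a NIM-style, OpenAI-compatible embeddings endpoint; the base URL, model identifier, and the `input_type` field are illustrative assumptions, so check the documentation for your specific deployment.

```python
# Minimal RAG embedding sketch against a NeMo Retriever embedding NIM.
# Base URL, model id, and payload fields are assumptions, not confirmed API.
import os
import requests

BASE_URL = os.environ.get("NIM_BASE_URL", "http://localhost:8000/v1")  # e.g. a self-hosted container
API_KEY = os.environ.get("NIM_API_KEY", "")

def embed(texts, input_type="passage"):
    """Request embeddings from an OpenAI-compatible /v1/embeddings endpoint."""
    resp = requests.post(
        f"{BASE_URL}/embeddings",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "nvidia/nv-embedqa-e5-v5",  # illustrative model id
            "input": texts,
            "input_type": input_type,  # retrieval models often distinguish "query" vs "passage"
        },
        timeout=30,
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]

# Example: embed a user question for similarity search over indexed passages.
# query_vec = embed(["What were Q3 supply chain delays?"], input_type="query")[0]
```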
The four microservices comprise three embedding models, NV-EmbedQA-E5-v5, NV-EmbedQA-Mistral7B-v2, and Snowflake-Arctic-Embed-L, and one reranking model, NV-RerankQA-Mistral4B-v3. Combining the embedding and reranking models yields the best retrieval quality: NVIDIA reports 30% fewer inaccurate answers for enterprise question answering compared with alternative models.
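The embed-then-rerank pattern that claim describes can be sketched as a two-pass pipeline: a cheap first pass ranks candidates by embedding similarity (standing in here for a real vector database), and a second pass sends the survivors to a reranking model. This builds on the `embed()` helper above; the `/v1/ranking` route, model id, and request/response shapes are assumptions to verify against the reranking NIM's documentation.

```python
# Two-pass retrieval: embedding similarity first, then reranking.
# Reuses BASE_URL, API_KEY, and embed() from the previous sketch.
import math
import requests

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query, passages, top_k=10):
    """First pass: rank passages by embedding similarity to the query."""
    q_vec = embed([query], input_type="query")[0]
    p_vecs = embed(passages, input_type="passage")
    scored = sorted(zip(passages, p_vecs),
                    key=lambda pv: cosine(q_vec, pv[1]), reverse=True)
    return [p for p, _ in scored[:top_k]]

def rerank(query, passages):
    """Second pass: ask a reranking NIM to reorder the candidates."""
    resp = requests.post(
        f"{BASE_URL}/ranking",  # assumed route; confirm in the NIM docs
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "nvidia/nv-rerankqa-mistral-4b-v3",  # illustrative model id
            "query": {"text": query},
            "passages": [{"text": p} for p in passages],
        },
        timeout=30,
    )
    resp.raise_for_status()
    rankings = resp.json()["rankings"]  # assumed response field
    return [passages[r["index"]] for r in rankings]

# Example: feed the top reranked passages to an LLM as grounding context.
# context = rerank(question, retrieve(question, corpus))[:3]
```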
These microservices support a wide range of applications, including chatbots, security analysis, and supply chain insights. They integrate seamlessly with platforms from Cohesity, DataStax, NetApp, and Snowflake, and are compatible with NVIDIA Riva NIM microservices for enhanced speech AI applications.
Developers can deploy the microservices in cloud, on-premises, or hybrid environments, so the same retrieval pipeline can run wherever an organization's data lives.
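One consequence of that portability: because a NIM container exposes the same API wherever it runs, a client can switch deployment targets through configuration alone. A minimal sketch; both endpoints below are placeholders, not real addresses.

```python
# Switch between deployments without changing the retrieval code above.
import os

DEPLOYMENTS = {
    "cloud": "https://nim.example-cloud.com/v1",       # placeholder hosted endpoint
    "on_prem": "http://nim.internal.example:8000/v1",  # placeholder in-datacenter container
}

# Select the target at runtime; embed() and rerank() stay unchanged.
BASE_URL = DEPLOYMENTS[os.environ.get("NIM_TARGET", "on_prem")]
```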
NVIDIA is collaborating with global system integrators and service delivery partners to help enterprises incorporate these microservices into their AI pipelines. The company is also working with major cloud providers like AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure to ensure compatibility with their platforms.
With the launch of NeMo Retriever NIM microservices, NVIDIA is addressing the critical need for accuracy in generative AI applications. By enabling enterprises to leverage their proprietary data more effectively, these microservices have the potential to significantly enhance the performance and reliability of AI-powered solutions across various industries.