Brazilian healthcare AI company Sofya has successfully deployed Meta's Llama large language models (LLMs) to streamline administrative workflows for clinicians, demonstrating significant efficiency gains in a high-stakes enterprise environment. The implementation addresses the critical challenge of administrative burden in healthcare settings, allowing medical professionals to dedicate more time to direct patient care.
"Our use of Llama aligns with Sofya's mission as an expert in medical AI serving as a reasoning engine for precision health by streamlining data structuring and supporting clinical excellence," explained Marcelo Mearim, CEO at Sofya, who highlighted the strategic fit between the organisation's healthcare focus and Llama's capabilities.
The enterprise implementation delivers measurable business outcomes, with Sofya reporting a reduction of up to 30% in time spent on documentation and administrative tasks per consultation. This efficiency gain translates directly to organisational value by enabling healthcare providers to see more patients while maintaining quality standards, as evidenced by the company's 90% customer satisfaction (CSAT) rating.
Sofya's technical deployment strategy prioritised performance requirements essential for clinical environments. The company hosts its models on Oracle Cloud instances in Brazil, leveraging frameworks including Sglang and VLLM for model serving. This infrastructure choice provides enhanced security through data localisation while delivering the millisecond-level latency required for real-time clinical applications.
To optimise for both performance and accuracy, Sofya implemented a sophisticated model adaptation approach. The team employed knowledge distillation techniques with Llama 405B, alongside self-reflection prompt engineering methods, to create high-quality synthetic training data. This data was then used to fine-tune smaller, more efficient models including the 70B, 8B, and 3B variants, balancing computational requirements with clinical accuracy needs.
The enterprise implementation spans multiple healthcare workflow functions, with Llama automating data structuring, named entity recognition, and clinical question answering. These capabilities directly address documentation bottlenecks in healthcare settings, improving operational efficiency while reducing error rates.
Sofya's selection process for an enterprise AI model prioritised three key criteria: capability, transparency, and performance. The company's leadership cited Llama's flexibility as particularly valuable for healthcare applications, noting that its "high adaptability for different use cases makes it a robust choice for companies with similar challenges."
The implementation required integration with Sofya's existing enterprise technology stack, incorporating tools including Oracle Cloud, Hugging Face, LangSmith, and Sglang. The company also leveraged support from the open-source community to optimise their deployment.
Looking forward, Sofya plans to expand its Llama implementation by deploying the 70B model in an agent workflow that combines multiple tools with retrieval-augmented generation capabilities. This expansion will support the company's growth trajectory toward processing one million consultations monthly.
"Sofya.ai is all about making it easier to blend tech with personal care," noted Mearim, emphasising the alignment between AI implementation and the company's mission to enhance healthcare delivery. "We're creating a future where healthcare professionals can spend more time with patients, all thanks to automation and AI."
Sofya's implementation demonstrates how AI can deliver tangible business value in specialised enterprise environments like healthcare. The 30% reduction in administrative time directly addresses a critical pain point for healthcare organisations while maintaining high customer satisfaction levels. The implementation's ability to scale toward one million monthly consultations highlights enterprise-grade reliability and performance, establishing a blueprint for similar healthcare organisations seeking to reduce administrative burden while improving care delivery efficiency.