OpenAI has unveiled a new approach to AI image generation that produces high-quality images approximately 50 times faster than current methods while maintaining comparable quality.
The new system, called sCM (simplified Continuous-time consistency Models), reduces image generation to just two processing steps, enabling creation of a single image in 0.11 seconds on a single A100 GPU. This advancement represents a significant step forward for real-time AI applications across image, audio, and video generation.
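The sampling procedure itself is simple to state: the model maps noise directly to an image estimate, then re-noises and denoises once more. Below is a minimal sketch of such two-step consistency sampling, assuming a trained consistency model `f` with signature `f(x, sigma)`; the function name, noise levels, and interface are illustrative, not OpenAI's actual implementation.

```python
import torch

@torch.no_grad()
def two_step_sample(f, shape, sigma_max=80.0, sigma_mid=0.8, device="cuda"):
    # Step 1: map pure noise straight to a clean-image estimate.
    x = torch.randn(shape, device=device) * sigma_max
    x0 = f(x, torch.full((shape[0],), sigma_max, device=device))

    # Step 2: re-noise to an intermediate level and denoise once more,
    # refining details that the single-step estimate misses.
    x = x0 + torch.randn_like(x0) * sigma_mid
    x0 = f(x, torch.full((shape[0],), sigma_mid, device=device))
    return x0
```

Only two network evaluations occur, which is where the claimed 0.11-second latency comes from.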
The research team has successfully scaled the training of continuous-time consistency models to 1.5 billion parameters on ImageNet at 512×512 resolution. This new approach addresses a fundamental limitation of current diffusion models, which require dozens to hundreds of sequential steps to generate a single sample.
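To see why step count dominates latency, consider a minimal Euler sampler for a conventional diffusion model, sketched below in the common EDM-style convention. Here `denoise` is a hypothetical trained network, and each of its `num_steps` evaluations must run sequentially, so wall-clock time grows linearly with the step count.

```python
import torch

@torch.no_grad()
def euler_sample(denoise, shape, num_steps=100,
                 sigma_max=80.0, sigma_min=0.002, device="cuda"):
    # Geometric noise schedule from sigma_max down to zero.
    sigmas = torch.cat([
        sigma_max * (sigma_min / sigma_max)
        ** (torch.arange(num_steps, device=device) / (num_steps - 1)),
        torch.zeros(1, device=device),
    ])
    x = torch.randn(shape, device=device) * sigmas[0]
    for i in range(num_steps):
        t = torch.full((shape[0],), sigmas[i].item(), device=device)
        d = (x - denoise(x, t)) / sigmas[i]        # probability-flow ODE derivative
        x = x + d * (sigmas[i + 1] - sigmas[i])    # one sequential Euler step
    return x
```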
Performance testing demonstrates that sCM produces images with quality comparable to leading diffusion models, while using less than 10% of the effective sampling compute. The research team evaluated sample quality using standard Fréchet Inception Distance (FID) scores, comparing against other state-of-the-art generative models.
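FID compares the mean and covariance of Inception-v3 features extracted from real and generated images; lower scores indicate closer distributions. The snippet below computes the standard formula from those statistics (this reflects the usual definition of the metric, not OpenAI's evaluation code).

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(mu1, sigma1, mu2, sigma2):
    # mu/sigma are the mean vector and covariance matrix of pooled
    # Inception features for each image set.
    diff = mu1 - mu2
    # Matrix square root of the product of the two covariances.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # discard tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```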
Research shows that sCMs improve proportionally with teacher diffusion models as both scale up, with two-step samples already achieving quality comparable to samples from teacher diffusion models that require hundreds of steps to generate.
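For intuition about the teacher-student relationship, the sketch below shows a heavily simplified, discrete-time version of consistency distillation: the student is trained so that nearby points along the teacher's ODE trajectory map to the same clean image. The actual sCM method uses a continuous-time objective, and every name and noise level here is illustrative rather than taken from the paper.

```python
import torch

def distillation_step(student, student_ema, teacher, optimizer, x0,
                      sigma_hi=2.0, sigma_lo=1.5):
    # student / student_ema / teacher are hypothetical denoisers with
    # signature model(x, sigma); x0 is a batch of clean training images.
    noise = torch.randn_like(x0)
    x_hi = x0 + noise * sigma_hi                 # noisier point on the trajectory
    with torch.no_grad():
        # One teacher Euler step along the probability-flow ODE,
        # from sigma_hi down to sigma_lo.
        d = (x_hi - teacher(x_hi, sigma_hi)) / sigma_hi
        x_lo = x_hi + d * (sigma_lo - sigma_hi)
        # Target from an EMA copy of the student enforces self-consistency:
        # both trajectory points should map to the same clean image.
        target = student_ema(x_lo, sigma_lo)
    loss = torch.mean((student(x_hi, sigma_hi) - target) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```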
However, the researchers acknowledge limitations. The best sCMs still rely on pre-trained diffusion models for initialisation and distillation, resulting in a small but consistent gap in sample quality compared to the teacher diffusion model.