University of Virginia researchers have created an artificial intelligence system that can detect and interpret human actions in video footage in real time.
The system, named the Semantic and Motion-Aware Spatiotemporal Transformer Network (SMAST), analyses live video streams with what the researchers describe as unprecedented precision.
The SMAST system employs two innovative components: a multi-feature selective attention model and a motion-aware 2D positional encoding algorithm. The first lets the AI focus on the crucial elements of a scene while filtering out irrelevant details; the second tracks and remembers movement patterns over time.
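To give a flavour of the second component, here is a minimal, hypothetical sketch of a motion-aware 2D positional encoding. It is not the paper's actual algorithm: it simply combines a standard sinusoidal encoding of a token's spatial location (x, y) with an encoding of its frame-to-frame displacement (dx, dy), so that each token carries both "where it is" and "how it is moving". All function names, the embedding size, and the additive combination are illustrative assumptions.

```python
import numpy as np

def sinusoidal_encoding(pos, dim):
    # Standard sinusoidal encoding of a scalar position into `dim` values
    # (interleaved sine/cosine at geometrically spaced frequencies).
    i = np.arange(dim // 2)
    freqs = 1.0 / (10000.0 ** (2 * i / dim))
    angles = pos * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def motion_aware_2d_encoding(x, y, dx, dy, dim=64):
    # Hypothetical sketch (not SMAST's published method): encode the
    # current 2D location and the recent displacement separately, then
    # sum them so downstream attention can use both cues.
    pos_part = np.concatenate([sinusoidal_encoding(x, dim // 2),
                               sinusoidal_encoding(y, dim // 2)])
    motion_part = np.concatenate([sinusoidal_encoding(dx, dim // 2),
                                  sinusoidal_encoding(dy, dim // 2)])
    return pos_part + motion_part  # shape: (dim,)

# A token at pixel grid cell (12, 7) that moved by (+2, -1) since the last frame.
enc = motion_aware_2d_encoding(x=12, y=7, dx=2, dy=-1)
print(enc.shape)  # (64,)
```

The key design idea this sketch illustrates is that a purely spatial positional encoding cannot distinguish a stationary object from a moving one at the same location, whereas folding displacement into the encoding makes motion visible to the attention layers.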
In benchmark testing, SMAST outperformed existing methods on several public datasets, including AVA, UCF101-24, and EPIC-Kitchens, setting new marks for accuracy and efficiency.
The work is described in a paper published earlier this year, "A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection," in IEEE Transactions on Pattern Analysis and Machine Intelligence.