What Are Large Vision Models?
A large vision model (LVM) is an AI model that analyzes visual data, such as still images and videos. Trained on massive data sets, LVMs learn to recognize objects and patterns, which makes them useful for tasks such as object detection and image classification.
Based on complex machine learning models, LVMs “analyze visual inputs and generate a likely set of outputs,” says Adam Rabin, senior product marketing manager for video security at Verkada.
“You can think of them as the visual counterpart to large language models, which process vast amounts of text-based data,” says Matthew Dietz, global AI leader at Cisco. Unlike LLMs, which focus on words, LVMs are “trained specifically to process and understand visual data like images, videos and diagrams.”
This capability has significant potential for state and local organizations.
Key Use Cases for State and Local Agencies
Public agencies can leverage LVMs across a range of use cases.
In public safety and emergency response, for example, “the LVMs can analyze disaster imagery to assist emergency responders, to support search-and-rescue operations, and provide actionable insights in real time,” Dietz says.
Law enforcement can benefit as well.
“Suppose there’s a suspicious vehicle on the loose,” Rabin says. “Someone caught a glimpse of what they would describe as a 1970s red muscle car with a white stripe on it. Those are a handful of very specific attributes.”
Given that description, “and perhaps even getting even more granular, say there’s a bumper sticker or a dent on the side,” a local police department could leverage an LVM-supported tool to search traffic camera videos and other sources, he says.
LVMs likewise can help with routine traffic management. Planners could use them to understand “what the best routes are that we should take as we design our roadways, and also to make sure that we’re reducing the amount of wear and tear along these roadways,” Dietz says. “We’ve all dealt with traffic congestion in our lives. LVMs could potentially impact the traffic flow.”
State and local agencies and their partners can also tap into LVMs for infrastructure and asset management. For example, “satellite imagery or drone imagery is helping to inspect the critical infrastructure, monitor road conditions and assess damage after things like natural disasters,” Dietz says.
“Imagine a utility company that is responsible for thousands of miles of power lines or pipelines or infrastructure spread across a vast area,” he says. “With LVMs, drones equipped with high-resolution cameras could fly over all of these assets, capture all of these images in minutes, and then analyze that visual data in real time to detect things like cracks in pipelines or corroded power lines.”
Technology Solutions for Leveraging LVMs
To harness the power of LVMs, state and local agencies likely will take advantage of commercial products that, in turn, depend on open-source models.
“It wouldn’t be practical for state and local agencies to build their own LVMs, because there are already so many great open-source ones that have been trained on billions of inputs, with billions of parameters,” Rabin says.
Commercial products use open-source LVMs to enable end users to search for and analyze video footage. Verkada, for example, combines LVMs with LLMs, enabling users to enter text that describes what they are seeking, whether that’s a suspect’s vehicle or a downed power line.
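For readers curious how a text description might be matched against footage, here is a minimal sketch of one common approach: ranking video frames by how closely their embeddings match the embedding of a text query. It is not a description of how Verkada or any other product works. The frame IDs, the three-number vectors and the function names are all hypothetical, and the sketch assumes that some vision-language model has already converted both the frames and the query into comparable numeric vectors.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_frames(query_embedding, frame_embeddings, top_k=3):
    """Return the top_k (frame_id, score) pairs, best match first."""
    scored = [
        (frame_id, cosine_similarity(query_embedding, embedding))
        for frame_id, embedding in frame_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: in practice these vectors would come from a
# vision-language model and would have hundreds of dimensions.
FRAME_EMBEDDINGS = {
    "cam12_frame0481": [0.9, 0.1, 0.8],
    "cam07_frame1130": [0.1, 0.9, 0.2],
    "cam03_frame0042": [0.7, 0.2, 0.1],
}

# Hypothetical embedding of a query such as "red muscle car with a white stripe".
QUERY_EMBEDDING = [1.0, 0.0, 0.9]

if __name__ == "__main__":
    for frame_id, score in rank_frames(QUERY_EMBEDDING, FRAME_EMBEDDINGS):
        print(f"{frame_id}: {score:.3f}")
```

In this toy example, the frame whose vector points in nearly the same direction as the query ranks first. A real system would also need to sample frames from video, store embeddings at scale and leave the final judgment to a human reviewer.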