Artificial intelligence is driving super surveillance - Hangzhou Dingbiao Technology Co., Ltd.

We usually think of surveillance cameras as digital eyes, watching over us or watching out for us, depending on your point of view. But in reality, they are more like portholes: only useful when someone looks through them. Sometimes this means a person monitoring the live video, usually several feeds at the same time. However, most surveillance cameras are passive. They are there to act as a deterrent, or to provide evidence when something goes wrong.

However, this is changing, and changing fast. Artificial intelligence gives surveillance cameras a brain to match their eyes, allowing them to analyze live video without human intervention. This may be good news for public safety, helping police and emergency responders to detect crimes and accidents more easily, and it has a range of scientific and industrial applications. But it also raises serious questions about the future of privacy and poses new risks to social justice.

What would happen if the government could use CCTV video surveillance to track a large number of people? What if the police could upload a photo of your face to a database and track you digitally across the city? Or what if a biased algorithm running on the cameras in your local mall alerted the police simply because it did not like the look of a certain group of teenagers?

Although it will take some time for these scenarios to materialize, we have already seen the first results of combining surveillance and artificial intelligence. One example is IC Realtime. The company's flagship product, launched last December, was billed as a Google for CCTV video surveillance. It is an application and web platform called Ella that uses artificial intelligence to analyze the content of video streams and make it instantly searchable. Ella can recognize thousands of natural language queries, allowing users to search through footage to find clips that show specific animals, people wearing clothes of a specific color, or even a specific car make or model.

In a web demonstration, Matt Sailor, CEO of IC Realtime, showed The Verge a version of Ella connected to about 40 surveillance cameras monitoring an industrial park. He entered various searches, such as "a man in red", "UPS truck" and "police car", all of which pulled up relevant footage within seconds. He then narrowed the results by time and location, and pointed out how users can rate clips with a thumbs-up or thumbs-down to improve results, just like Netflix.

AI surveillance starts with searchable videos

Sailor said: "Say there is a robbery and you don't really know what happened. But afterward a Jeep Wrangler sped off eastward. So we search for 'Jeep Wrangler' and there it is." On the screen, video clips started to appear, showing different Jeep Wranglers gliding past the camera. This would be the first big advantage of combining artificial intelligence and CCTV video surveillance, Sailor explained: it makes it very easy to find what you are looking for. He said: "Without this technology, you would know no more than your camera does, and you would have to sift through hours and hours and hours of video."

Ella runs on Google Cloud and can search footage from almost any CCTV video surveillance system. Sailor said: "It works well from a single camera system, such as a nanny cam or dog cam, all the way up to an enterprise system with tens of thousands of cameras." Users pay a monthly fee, starting at around $7 per month, with the total price rising with the number of cameras.

IC Realtime hopes to target businesses of all sizes, but the company also believes that its technology can appeal to individual consumers. These customers are already well served by the fast-growing market for "smart" home security cameras, made by companies such as Amazon, Logitech, Netgear, and Google's Nest. But Sailor said that this technology is crude compared to IC Realtime's. These cameras connect to home Wi-Fi and provide real-time video streaming through an app. When they detect movement, they automatically record video. However, Sailor said that they cannot distinguish between an intruder and a bird, which leads to many false positives. "They are very basic technologies that have been around for many years," he said. "There is no artificial intelligence and no deep learning here."

This situation will not last long. Although IC Realtime's cloud-based analytics can upgrade existing, basic cameras, other companies are building artificial intelligence directly into their hardware. Boulder AI is one such startup, selling "vision as a service" with its own standalone AI cameras. A big advantage of integrating artificial intelligence into the device is that it does not require an internet connection to work. Boulder sells to a wide range of industries, tailoring its machine vision systems to each customer.

The company's founder Darren Odom told The Verge: "The applications are really very wide-ranging." He said: "Our platform is sold to banks and energy companies. We even have an application that looks at pizzas and makes sure they are the right size and shape."

"We now recognize 100% of Idaho trout."

Odom cites the example of a client building a dam in Idaho. To comply with environmental regulations, the client is monitoring the number of fish that make it over the top of the structure. Odom said: "They used to have a person sit at a window and watch the fish ladder, counting how many trout swam through." (A fish ladder is a stepped waterway that fish use to travel upstream.) "Then they moved to video technology and someone (remotely) monitored it." Finally, they contacted Boulder, which built them a customized CCTV surveillance system to classify the fish passing up the fish ladder. Odom said proudly: "We really use computer vision for fish species identification. We are now 100% able to identify Idaho trout."

If IC Realtime represents the general-purpose end of this market, Boulder shows what a boutique contractor can do. However, in both cases, what these companies currently offer is only the tip of the iceberg. Just as machine learning has made rapid progress in its ability to recognize objects, its ability to analyze scenes, activities, and actions is expected to improve rapidly. Everything is in place, including basic research, computing power, and training datasets, a key component of creating capable artificial intelligence.

The two largest datasets for video analytics come from YouTube and Facebook, both of which have said they hope artificial intelligence will help them moderate the content on their platforms (though both companies also acknowledge that it is not ready yet). YouTube's dataset, for example, contains more than 450,000 hours of tagged video, which the company hopes will stimulate "innovation and advancement in video understanding." The breadth of organizations involved in building such datasets gives some idea of the importance of the field. Google, MIT, IBM and DeepMind have all started similar projects of their own.

IC Realtime is already developing advanced tools such as facial recognition. After that, it wants to be able to analyze what is happening on screen. Sailor said he has talked to potential customers in the education sector who want surveillance that can recognize when students get into trouble at school. "For example, they are interested in quick notifications of fights," he said. All the system needs to do is watch for students clumping together and then alert a human, who can check the video feed to see what is happening or go and investigate in person.

Boulder is also exploring this kind of advanced analysis. One prototype system the company is developing is meant to analyze the behavior of people in a bank. Odom said: "We are looking specifically for bad guys and exploring the differences between the behavior of a normal person and a person who crosses the line." To do this, they are training their system to spot abnormal behavior using video from old security cameras. However, many of these videos are of very low quality, so they will also hire actors to shoot their own training clips. Odom did not go into details, but said the system would look for specific facial expressions and behaviors. "Our actors will do things like crouching, shoving and looking back over their shoulders," he said.

For experts in surveillance and artificial intelligence, the introduction of these features is fraught with potential difficulties, both technical and ethical. And, as is often the case with artificial intelligence, the two kinds of difficulty are intertwined. Machines cannot understand the world the way humans do. This is a technical issue, but when we assume that they can and let them make decisions for us, it becomes a moral issue.

Alex Hauptmann, a professor at Carnegie Mellon University who specializes in this kind of computer analysis, said that although artificial intelligence has made great progress in this area in recent years, there are still very fundamental problems in getting computers to understand video. The biggest is a camera problem we no longer often think about: resolution.

The biggest obstacle is very common: low-resolution video

Take, for example, a neural network trained to analyze human behavior in videos. Such networks work by subdividing the human body into multiple parts (arms, legs, shoulders, head, and so on) and then watching how these small parts change from frame to frame in the video. In this way, artificial intelligence can tell you if someone is running or combing their hair. Hauptmann told The Verge: "But it depends on the resolution of the video you have. If I point a camera at the far end of a parking lot, I am lucky if I can tell whether someone has opened a car door. If you stand right in front of (the camera) and play the guitar, it can track the movement of each of your fingers."
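The frame-to-frame idea, and the way resolution undermines it, can be illustrated with a toy sketch. This is not how any system mentioned here is built: the input format (one dictionary of body-part positions per frame, as a pose estimator might produce), the function names, and the 5-pixel threshold are all assumptions made for illustration, and a real network learns far richer patterns than a single threshold.

```python
from math import hypot

# Hypothetical input format: one dict per video frame, mapping a body-part
# name to its (x, y) pixel position, as a pose estimator might produce.
Frame = dict[str, tuple[float, float]]


def mean_joint_motion(frames: list[Frame]) -> float:
    """Average distance in pixels that tracked body parts move per frame."""
    total, count = 0.0, 0
    for prev, curr in zip(frames, frames[1:]):
        # Only compare parts that were detected in both frames.
        for part in prev.keys() & curr.keys():
            (x0, y0), (x1, y1) = prev[part], curr[part]
            total += hypot(x1 - x0, y1 - y0)
            count += 1
    return total / count if count else 0.0


def label_activity(frames: list[Frame], threshold: float = 5.0) -> str:
    """Toy rule: a lot of joint movement means 'moving', otherwise 'still'.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return "moving" if mean_joint_motion(frames) > threshold else "still"


def downscale(frames: list[Frame], factor: float) -> list[Frame]:
    """Simulate a lower-resolution or more distant view of the same scene."""
    return [
        {part: (x / factor, y / factor) for part, (x, y) in frame.items()}
        for frame in frames
    ]
```

In this sketch, a person whose joints shift 10 pixels per frame close to the camera is labeled "moving", but the same motion seen at one tenth of the scale shifts only 1 pixel per frame and falls below the threshold, which is the resolution problem Hauptmann describes in miniature.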

For CCTV surveillance systems, this is a big problem. The footage is often grainy, and the camera angles are often odd. Hauptmann cites the example of a convenience store camera that is set up to monitor the cash register but also takes in the street-facing windows. If there is a robbery outside and part of the camera's view is blocked, the artificial intelligence may be stumped. "But as humans, we can imagine what is happening and piece it together. Computers can't do that," he said.

Similarly, although artificial intelligence is good at identifying relevant events in a video (for example, someone brushing their teeth, looking at a mobile phone, or playing football), it still cannot extract important causal relationships. Take the neural network for analyzing human behavior as an example. It may look at the footage and say "this person is running", but it cannot tell you whether they are running because they are about to miss the bus or because they have stolen someone's mobile phone.

These questions about accuracy should make us think hard about the claims of some artificial intelligence startups. We are still far from the point where computers can gain the same insights as humans by watching video. (Researchers may tell you that this is very difficult to do, because it is basically synonymous with "solving" intelligence.) But things are moving very fast.

Hauptmann said that tracking vehicles by their license plates is "a practical problem that has been solved", as is facial recognition in a controlled setting. (Facial recognition on low-quality CCTV surveillance video is another matter entirely.) Recognition of items such as cars and clothing is also very reliable. It is also possible to track a person automatically across multiple cameras, but only if the conditions are right. Hauptmann said: "The effect of tracking a person in a non-crowded scene may be very good, but in a crowded scene, forget it." He said that this is particularly difficult if the person is wearing inconspicuous clothing.

Some artificial intelligence monitoring tasks have been solved; others require continued efforts

But even these very basic tools can produce very powerful results. In Moscow, for example, such an infrastructure is being assembled, with facial recognition software plugged into a centralized system of more than 100,000 high-resolution cameras that cover more than 90% of apartment entrances in the city.

In this case, there may be a virtuous circle. As the software gets better, the system will collect more data to help the software become better. "I think all this will improve," Hauptmann said. "This is happening."

If these systems are already working, then we already have problems such as algorithmic bias. This is not a hypothetical challenge. Research shows that machine learning systems absorb the racism and sexism of the societies that program them, from image recognition software that always places women in the kitchen to criminal justice systems that always say black people are more likely to reoffend. If we train artificial intelligence surveillance systems on old video clips, such as footage from CCTV video surveillance or cameras worn by police, then the prejudices that exist in society are likely to be perpetuated.

Meredith Whittaker, co-director of NYU's ethics-focused "AI Now" institute, said this is already happening in law enforcement and will spread to the private sector. Whittaker cites Axon (formerly known as Taser), which has bought several artificial intelligence companies to help it integrate video analytics into its products. Whittaker said: "The data they get comes from the cameras worn by the police, which tells us a lot about who an individual police officer pays attention to, but does not give us the full picture." She said: "This is a real of