Carnegie Mellon Robot Uses Non-Visual Data To Identify Objects


A robot can struggle to discover objects in its surroundings when it relies on computer vision alone. But by taking advantage of all the information available to it – an object’s location, size, shape and even whether it can be lifted – a robot can continually discover and refine its understanding of objects, say researchers at Carnegie Mellon University’s Robotics Institute.

The Lifelong Robotic Object Discovery (LROD) process developed by a research team enabled a two-armed, mobile robot to use color video, a Kinect depth camera and non-visual information to discover more than 100 objects in a home-like laboratory, including items such as computer monitors, plants and food items.

Normally, the CMU researchers build digital models and images of objects and load them into the memory of HERB – the Home-Exploring Robot Butler – so the robot can recognize objects that it needs to manipulate. Virtually all roboticists do something similar to help their robots recognize objects. With the team’s implementation of LROD, called HerbDisc, the robot can now discover these objects on its own.

The robot’s ability to discover objects on its own sometimes takes even the researchers by surprise, says Siddhartha Srinivasa, associate professor of robotics and head of the Personal Robotics Lab, where HERB is being developed. In one case, some students left the remains of lunch – a pineapple and a bag of bagels – in the lab when they went home for the evening. The next morning, they returned to find that HERB had built digital models of both the pineapple and the bag and had figured out how it could pick up each one.

Discovering and understanding objects in places filled with hundreds or thousands of things will be a crucial capability once robots begin working in the home and expanding their role in the workplace.  

Object recognition has long been a challenging area of inquiry for computer vision researchers. Recognizing objects based on vision alone quickly becomes an intractable computational problem in a cluttered environment, but humans don’t rely on sight alone to understand objects; babies will squeeze a rubber ducky, beat it against the tub, dunk it – even stick it in their mouth. Robots, too, have a lot of “domain knowledge” about their environment that they can use to discover objects. 

Depth measurements from HERB’s Kinect sensors proved particularly important, providing three-dimensional shape data that is highly discriminative for household items. Other domain knowledge available to HERB includes location – whether something is on a table, on the floor or in a cupboard. The robot can see whether a potential object moves on its own, or is movable at all. It can note whether something is in a particular place at a particular time. And it can use its arms to see if it can lift the object – the ultimate test of its “objectness.”
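
One way to picture how such cues might be combined is as a simple weighted “objectness” score. The sketch below is purely illustrative and is not the published LROD/HerbDisc algorithm; the field names, weights and thresholds are all hypothetical.

```python
# Illustrative sketch only (not CMU's HerbDisc code): combining non-visual
# domain-knowledge cues into a single "objectness" score. All field names,
# weights, and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class CandidateSegment:
    height_above_floor_m: float   # from the depth camera and robot pose
    on_support_surface: bool      # e.g. resting on a detected tabletop or shelf plane
    moved_between_visits: bool    # position changed across repeated observations
    robot_could_lift: bool        # outcome of an arm interaction, if one was attempted
    shape_compactness: float      # 0..1, derived from the 3-D point cloud

def objectness_score(seg: CandidateSegment) -> float:
    """Weighted vote over domain-knowledge cues; higher means more object-like."""
    score = 0.0
    score += 0.3 * seg.shape_compactness                   # 3-D shape is highly discriminative
    score += 0.2 if seg.on_support_surface else 0.0        # household items tend to rest on surfaces
    score += 0.2 if 0.3 < seg.height_above_floor_m < 2.0 else 0.0  # plausible height for graspable items
    score += 0.1 if seg.moved_between_visits else 0.0      # things that move are likely objects
    score += 0.2 if seg.robot_could_lift else 0.0          # lifting is the strongest evidence
    return score

# Example: a compact segment on a table that the robot managed to lift.
mug = CandidateSegment(0.75, True, False, True, 0.8)
print(f"objectness = {objectness_score(mug):.2f}")  # -> 0.84
```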

“The first time HERB looks at the video, everything ‘lights up’ as a possible object,” Srinivasa said. But as the robot uses its domain knowledge, it becomes clearer what is and isn’t an object. The team found that adding domain knowledge to the video input almost tripled the number of objects HERB could discover and reduced computer processing time by a factor of 190.  
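
A hypothetical way to see where such a speedup could come from: inexpensive non-visual checks can discard most candidate segments before any expensive 3-D modeling is attempted. The sketch below reuses the objectness-score idea from above; none of these function names come from HerbDisc.

```python
# Hypothetical illustration (not HerbDisc code): cheap domain-knowledge checks
# prune most candidate segments before the costly step of building a visual model.
def discover_objects(candidates, objectness_score, build_visual_model, threshold=0.5):
    """Run the expensive modeling step only on candidates that pass the cheap cues."""
    models = []
    for segment in candidates:
        if objectness_score(segment) < threshold:   # fast, non-visual filter
            continue                                # most clutter is rejected here
        models.append(build_visual_model(segment))  # slow: segmentation, meshing, matching
    return models
```

Because the filter runs before the costly modeling step, rejecting even a large fraction of clutter up front translates directly into less overall processing.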

Though this capability has not yet been implemented, HERB and other robots could use the Internet to build an even richer understanding of objects. Earlier work by Srinivasa showed that robots can use crowdsourcing via Amazon Mechanical Turk to help understand objects. Likewise, a robot might access image sites such as RoboEarth, ImageNet or 3D Warehouse to find the name of an object, or to get images of parts of the object it can’t see.

Bo Xiong, a student at Connecticut College, and Corina Gurau, a student at Jacobs University in Bremen, Germany, also contributed to this study.

HERB is a project of the Quality of Life Technology Center, a National Science Foundation engineering research center operated by Carnegie Mellon and the University of Pittsburgh. The center focuses on developing intelligent systems that improve quality of life for everyone, particularly older adults and people with disabilities.

[Image courtesy: Carnegie Mellon University]
