Cyrus Bamji had encountered a challenge. Luckily for him, Microsoft Research had just the solution.
Bamji, partner hardware architect for Microsoft’s Silicon Valley-based Architecture and Silicon Management group, and members of his team were trying to incorporate a time-of-flight camera into Xbox One, the successor to the wildly popular Xbox 360.
A time-of-flight camera emits light signals and then measures how long they take to return. That measurement needs to be accurate to 1/10,000,000,000 of a second; remember, we’re talking the speed of light here. With such measurements, the camera can differentiate light reflected by objects in a room from light reflected by the surrounding environment, yielding accurate depth estimates from which the shapes of those objects can be computed.
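To put that precision in perspective, a quick back-of-the-envelope calculation (ours, not a figure quoted by the Xbox team): light travels at roughly 3 × 10^8 meters per second, and the reflected signal covers the distance twice, so

```latex
d = \frac{c\,\Delta t}{2}
\qquad\Longrightarrow\qquad
\Delta d \approx \frac{(3\times 10^{8}\ \mathrm{m/s})\,(10^{-10}\ \mathrm{s})}{2} = 1.5\ \mathrm{cm}
```

In other words, a timing error of a tenth of a nanosecond corresponds to roughly a centimeter and a half of depth error.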
That speed-of-light capability would be a major advancement for the Kinect sensor portion of Xbox One, being released to 13 launch markets next month. The new Kinect, a key differentiator for Xbox One against its competition, needed to capture a larger field of view with greater accuracy and higher resolution. An infrared sensor enables object identification with little to no light, and improved hand-pose recognition gives gamers and more casual users the ability to control the console with their hands.
But Bamji still had a challenge. The sensor was great, yet it left those working on it eager to do even more with it.
“When we take a relatively new technology, such as time-of-flight, and put it into a commercial product, there are a whole bunch of things that happen,” he says. “There are things that we didn’t know how important they were until the product was made. For example, we know theoretically that motion blur in time of flight is a big problem, but just how important it is only becomes clear when you’re building a product with it and that product needs to deliver an excellent experience.”
The new camera’s high resolution and wider field of view delivered more versatile device performance, but they also created user-experience issues of their own in real-life scenarios. Accurate depth measurement across diverse scenes is hard at that resolution, and small objects, such as a finger, can fade into the background. Solving those problems, along with motion blur, required clean data, and it was needed quickly: Xbox One had to be ready for the 2013 holiday season.
“We knew our time was limited,” Bamji recalls. “But we also had the advantage of being able to tap into Microsoft Research’s deep reservoir of technical expertise to get expert advice and help solve the various problems we encountered with new, cutting-edge solutions.”
Eyal Krupka, principal applied researcher with Microsoft Research Advanced Technology Lab, was up to the challenge.
“I was in Redmond last summer, working on hand-pose-recognition research, also for Xbox One,” Krupka says, “and Mark Plagge, a principal program-manager lead on the Xbox One team, approached me about the ongoing work in solving some issues with the camera. They had made huge progress, but the progress had not come quickly enough, and there were not any clear solutions yet. He asked me to check to see if I could help.”
Travis Perry, a senior system architect lead with the Architecture and Silicon Management team, says that things took off from there.
“Eyal and I had many meetings discussing the various tradeoffs of the sensor and discussing the problem statements,” says Perry, who worked with Krupka on algorithm and parameter optimization. “Our team supported Eyal and his team with data and software for the existing depth calculations, and Eyal and I worked together to achieve better edge and motion-blur performance.”
That sort of engagement and teamwork was key to the project’s success.
“It wasn’t like consultant mode, where we asked something and the researchers gave us an opinion,” Bamji says. “They really took charge of the project. They did all the tests. They built a whole infrastructure of software to deliver us a complete solution. Essentially, they took charge. We’re really grateful for that.”
Krupka, with a few Microsoft Research colleagues making contributions of their own, worked closely with his Xbox partners, and their combined domain knowledge meshed well.
“The reason Eyal and I were successful,” Perry says, “was because of his extensive knowledge of computer vision, signal processing, and machine learning, along with my knowledge of time-of-flight technology and the system tradeoffs, allowing us to make the right decisions in a short amount of time and keep to the tight schedule.”
Krupka also worked diligently to gain a deep understanding of how the system worked.
“Eyal was curious from the beginning to understand how the technology works, the underlying mechanism, and the various noise models that go into the system,” says Sunil Acharya, senior director of engineering for the Architecture and Silicon Management team. “His team was helping the software team with face- and hand-recognition algorithms when he found out about the time-of-flight challenges we were facing. He jumped in and worked with us very closely, and his team started working on solutions that mapped directly into the product timeline.”
For Bamji, it was an a-ha moment.
“We had researchers who understood that time was of the essence,” he says. “We could ask them about a problem, and they would get on it and essentially come up with solutions that were technically challenging, but not in a vacuum. And they delivered solutions in a timeframe that was something that could be of use to us.
“The success story is a rapid response and the solving of difficult problems.”
That is a concise summation of the value of Microsoft Research, a unique asset to Microsoft developers of devices and services. For Krupka, this is significant.
“The research aspects of what we delivered for Xbox One did not start on the day we started working with the product team,” he says. “It started years before we learned about any specific project or problem. It is based on accumulating a wide range of research expertise through exploration on multiple research projects, and on accumulating engineering and research tools and practices, including rapid research methods.
“This is achieved by rotating cycles of working on long-term research problems, then switching to short-term research tasks. This is critical to the success. If we did only short-term, on-demand research, we wouldn’t have the critical assets when we work on a product’s problems. If we worked only on long-term research, we would have a harder time switching gears to deliver solutions on a product group’s timeline.”
The analog nature of the time-of-flight data posed challenges to delivering such a solution.
“The time-of-flight data coming out of our sensor is per pixel, per frame, and there is a lot more analog information,” Acharya says. “Another issue was that the foreground objects close to the background objects would melt into the background—again, due to the analog nature of how our sensor provides the depth data for pixels that land on edges.”
“This resulted in a lot of information, and to make it easier for foreground/background extraction and scene segmentation, and for use by software and game developers, the requirement was to clean up this data by adding software algorithms in the pipe, yet without incurring a performance hit. This was crucial. We started with various work streams and, in the end, settled on optimizing the parameters in the system to overcome the issue.”
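To make the edge problem concrete, here is a minimal sketch, in Python, of how such mixed “flying” edge pixels might be flagged in a depth frame. It is illustrative only: the function name, neighborhood, and threshold are placeholders of ours, not the algorithm the Xbox and Microsoft Research teams actually shipped.

```python
import numpy as np

def flag_flying_pixels(depth, grad_thresh=0.05):
    """Illustrative detector for 'flying' edge pixels in a time-of-flight depth map.

    Pixels that straddle a foreground/background edge integrate light from both
    surfaces, so their reported depth lands somewhere in between, and a finger can
    appear to melt into the wall behind it. This sketch flags pixels whose depth
    differs strongly from the neighbors on opposite sides, a classic signature of
    an edge-mixed measurement. The 5 cm threshold is a placeholder, not a product value.
    """
    d = depth.astype(np.float32)
    flags = np.zeros_like(d, dtype=bool)
    # Differences between each interior pixel and its four neighbors (meters).
    dx_l = np.abs(d[1:-1, 1:-1] - d[1:-1, :-2])
    dx_r = np.abs(d[1:-1, 1:-1] - d[1:-1, 2:])
    dy_u = np.abs(d[1:-1, 1:-1] - d[:-2, 1:-1])
    dy_d = np.abs(d[1:-1, 1:-1] - d[2:, 1:-1])
    # A flying pixel disagrees with the neighbors on both sides of the edge.
    flags[1:-1, 1:-1] = ((dx_l > grad_thresh) & (dx_r > grad_thresh)) | \
                        ((dy_u > grad_thresh) & (dy_d > grad_thresh))
    return flags

# Usage: invalidate flagged pixels so thin foreground objects stay separated
# from the background (load_depth_frame is a hypothetical loader, in meters).
# depth = load_depth_frame()
# clean = np.where(flag_flying_pixels(depth), np.nan, depth)
```

In a real pipeline such pixels would more likely be corrected or filtered rather than simply dropped, and the work done per pixel would have to be far cheaper than this, which is exactly the constraint described below.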
The collaborators wanted to deliver a clear separation of foreground and background even when objects were close to each other. That, too, proved difficult. And then there was motion blur.
“Motion blur,” Acharya explains, “is a parameter that needs to be minimized and is not technology-specific. The time-of-flight camera uses global shutter, which has helped reduce motion blur significantly—from 65 milliseconds in the original Kinect to fewer than 14 milliseconds now.”
Other challenges presented themselves. For one thing, processing time became an issue, and it is one the academic literature about time-of-flight systems rarely confronts: in a laboratory environment, the technology works fine. But Xbox One needs to process a whopping 6.5 million pixels per second, and only a small part of the console’s computing power could be harnessed for the task. The lion’s share is reserved, understandably, for essentials such as gaming, skeleton tracking, face recognition, and audio.
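For scale, that figure matches a depth stream at the resolution widely reported for the new Kinect sensor (the 512 × 424 figure is an assumption on our part, not a number quoted by the team):

```latex
512 \times 424\ \text{pixels/frame} \;\times\; 30\ \text{frames/s} \;\approx\; 6.5 \times 10^{6}\ \text{pixels/s}
```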
“You need to do very, very light computation for each pixel,” Krupka says, “and this is one of the things that made the problem challenging and different from the typical approach in the academic literature in this field.”
And then there was that tight timeframe for delivery. Benchmark numbers in multiple domains had to be attained to make the Xbox One perform at the highest level.
“The fact that we needed to hit all the benchmark performance numbers in all these multiple, different domains was a challenge,” Bamji confirms. “Many of these things were heavy-duty theoretical stuff. That’s why we reached out to Microsoft Research and asked for their help.”
Remarkably, it all came together. While entertainment lovers worldwide will soon find themselves delighted by the Xbox One experience, so, too, will those eager to develop for the platform. Reducing the noise in edge data makes the data developer-ready, and clean segmentation of foreground from background solves a complex computational problem, leaving data that game developers can absorb more easily.
Another fascinating feature of the Kinect sensing device in Xbox One stems from its infrared sensor, which can identify objects in a completely darkened room. It can recognize people and track bodies even without any light visible to the naked eye. It can identify a hand pose from four meters away, see the fingers of a child, and remember your identity even without room illumination.
The wider field of view makes it possible for more players to play an Xbox One game at the same time. With the new console, as many as six players can crowd into one scene. A tall adult can play with a small child without either being squeezed out of the picture. Users get a better experience whether they’re standing close by, farther away, or on the periphery of the room.
And the improved hand-pose recognition enables users to interact with Xbox One using just their hands, no controller necessary. Thanks to the infrared camera, hand activities can be identified in any illumination, or with none at all. Prior hand-pose solutions were able to deliver speed or accuracy, but not both. The hand-pose solution jointly devised by the Xbox team and Microsoft Research can do both.
Based on his experience with both hand-pose and facial recognition, Krupka was able to find solutions that take into account system-level concerns.
“We ourselves were customers of the camera improvements, as part of our research for accurate hand pose,” he says. “This helped us to determine the necessary tradeoffs. This also is a good example of Microsoft Research interdisciplinary work to deliver system-level optimization, not just optimization of separate parts.”
The aforementioned Xbox One enhancements resulted from an intensive collaborative effort by a number of individuals and teams. On the hardware side, Mirko Schmidt worked closely with Krupka and his team, and Arrigo Benedetti did likewise during investigations of the time-of-flight camera conducted with Philip A. Chou and Alex Acero of Microsoft Research Redmond, along with Shahram Izadi of Microsoft Research Cambridge. Krupka says he found those investigations seminal to his crash-course study of how the camera functions, enabling him to gain a quick understanding of which were the more promising directions—and which were not.
Two other contributors came from Perry’s team. Abdelrehim Ahmed worked with Simon Baker and Rick Szeliski of Microsoft Research Redmond on the camera’s filter, which has proved instrumental in detecting erroneous pixels caused by edges and motion. And Vishali Mogallapu supported all the above efforts.
Other Microsoft Research contributions to the Xbox One include Fang Wen and her team at Microsoft Research Asia, who contributed their state-of-the-art technologies for high-accuracy face recognition. Ivan Tashev of Microsoft Research Redmond brought his audio expertise, demonstrated during the development of the Xbox 360, to optimize the sound experience enabled by the Xbox One. Ranveer Chandra contributed a wireless-controller protocol. And, from Krupka’s team, Ido Leichter wrote valuable simulations of the camera and environment, and Daniel Freedman provided innovative solutions to improve the robustness of depth images.
“We found people eager to help us solve our problems,” says Bamji of his collaboration with Microsoft Research. “The researchers were very interested in making an impact by working on products that were going to market.”
Krupka found the collaboration challenging—and, ultimately, rewarding.
“For me,” he says, “it was the most intensive three months of work in my whole career. It seemed like I was working 24 hours a day, and the response from the Silicon Valley Xbox team was similar.
“We had a clear deadline, and we understood what was at stake. We understood that this was very critical for the project. We were aware of it from the beginning. Everyone was on board, and now, we’re excited to see how Xbox One customers respond later this fall. It was a great example of teamwork that, hopefully, translates into a ‘just works’ experience that delights Xbox One customers and developers.”