On 3D Scanners and Giving Kinects a New Purpose In Life
The concept of a 3D scanner can seem rather simple in theory: point a camera at the physical object you wish to scan, move around the object to capture it from all angles, and stitch the images together into a 3D model, with textures created from the same photos. This photogrammetry approach is definitely viable, but also limited in the sense that you’re inferring three-dimensional geometry from a set of 2D images, and you depend on suitable lighting.
To get more detailed depth information from a scene you need to perform direct measurements, which can be done physically or through e.g. time-of-flight (ToF) measurements. Since contact-free measurement methods tend to be preferred, ToF makes a lot of sense, but it comes with the disadvantage of measuring only a single spot at a time. When the target is actively moving, you can fall back on photogrammetry or use an approach called structured-light (SL) scanning.
SL is what consumer electronics like the Microsoft Kinect popularized, using the combination of a visible-light and a near-infrared (NIR) camera to record a pattern projected onto the subject, similar to how face-based login systems like Apple’s Face ID work. Considering how often Kinects have been used as general-purpose 3D scanners, this raises many questions about today’s crop of consumer 3D scanners, such as whether they’re all basically just Kinect clones.
The Successful Kinect Failure
Although Microsoft’s Kinect flopped as a gaming accessory despite an initially successful run for the 2010 version released for the XBox 360, it does give us a good look at what it takes to make real-time 3D scanning work for the consumer market. SL-based scanning was the obvious choice for the original Kinect, as it was a mature technology that was also capable of tracking in real time where a player’s body parts are in space.
Hardware-wise, the Kinect features a color camera, an infrared laser projector, and a monochrome camera capable of capturing the scene including the projected IR pattern. The simple act of adding a known visual element to a scene allows a subsequent algorithm to derive fairly precise shape information based on where the pattern can be seen and how it was distorted. As this can all be derived from a single image frame, with the color camera providing any color information, the limiting factor then becomes the processing speed of this visual data.

PrimeSense diagram of their reference depth sensor platform. (Source: iFixit)
After the relatively successful original Kinect for the XBox 360, the XBox One saw the introduction of a refreshed Kinect, which kept the same rough layout and functionality, but used significantly upgraded hardware, including triple NIR laser projectors, as can be seen in the iFixit teardown of one of these units.

The naked front of the XBox One Kinect, featuring the same RGB and NIR camera setup alongside an NIR projector. (Credit: iFixit)
In both cases much of the processing is performed by the controller IC inside the Kinect, which in the case of the original Kinect was made by PrimeSense, while the XBox One version uses a Microsoft-branded chip, presumably manufactured by ST Microelectronics.
The NIR pattern projected by the PrimeSense system consists of a static, pseudorandom dot pattern that is projected onto the scene and captured as part of the scene by the NIR-sensitive monochrome camera. Since the system knows the pattern that it projects and its divergence in space, it can apply a stereo triangulation algorithm to the expected and observed patterns. The calculated deviations from the expected pattern thus create a depth map, which can subsequently be used for limb and finger tracking in video games.
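At its core, this triangulation boils down to the classic depth-from-disparity relation: the further a projected dot appears displaced from where it would land on a surface at infinity, the closer that surface is. Here is a minimal sketch in C++ of that relation; the focal length and baseline used are commonly cited community estimates rather than official specs, and the real PrimeSense pipeline is proprietary and far more involved:

```cpp
#include <cstdio>

// Depth from disparity by similar triangles: a camera with focal length f
// (in pixels) and a projector/camera baseline b (in meters) sees a dot
// shifted by d pixels; the depth is then z = f * b / d.
double depth_from_disparity(double focal_px, double baseline_m, double disparity_px) {
    if (disparity_px <= 0.0) return -1.0;  // dot not matched / out of range
    return (focal_px * baseline_m) / disparity_px;
}

int main() {
    // Hypothetical numbers: ~580 px focal length and a 75 mm baseline are
    // estimates often quoted for the original Kinect, not official figures.
    double z = depth_from_disparity(580.0, 0.075, 20.0);
    std::printf("Estimated depth: %.3f m\n", z);  // prints ~2.175 m
    return 0;
}
```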
Here it’s interesting to note that for the second generation of the Kinect, Microsoft switched from SL to ToF, with both approaches compared in this 2015 paper by Hamed Sarbolandi et al., published in Computer Vision and Image Understanding.

The ToF phase-measurement principle. (Credit: Sarbolandi et al., 2015)
Perhaps the biggest difference between the SL and ToF versions of the Kinect is that the former can suffer quite significantly from occlusion, with up to 20% of the projected pattern obscured, versus up to 5% occlusion for ToF. The ToF version of the Kinect has much better low-light performance as well. Thus, as long as you can scan a scene quickly enough with the ToF sensor configuration, it should theoretically perform better.
Instead of the singular scanning beam that you might expect with a ToF approach, the 2013 Kinect for the XBox One and subsequent Kinect hardware use Continuous Wave (CW) Intensity Modulation, which effectively blasts the scene with periodic, intensity-modulated NIR light, thus illuminating the NIR CMOS sensor with the scene’s response pretty much continuously.
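The phase shift between the emitted and returned modulated light is what encodes distance. A minimal sketch of the common four-bucket demodulation scheme follows, assuming a sensor that samples the return signal at phase offsets of 0, 90, 180 and 270 degrees; the modulation frequency below is a hypothetical value, and the real Kinect interleaves multiple frequencies to extend its unambiguous range:

```cpp
#include <cmath>
#include <cstdio>

// Four-bucket CW-ToF demodulation: a0..a3 are brightness samples taken at
// phase offsets of 0, 90, 180 and 270 degrees of the modulation period.
double tof_depth(double a0, double a1, double a2, double a3, double f_mod_hz) {
    const double c  = 299792458.0;                // speed of light, m/s
    const double pi = std::acos(-1.0);
    double phase = std::atan2(a3 - a1, a0 - a2);  // phase shift of the return
    if (phase < 0.0) phase += 2.0 * pi;           // wrap into [0, 2*pi)
    return (c * phase) / (4.0 * pi * f_mod_hz);   // 4*pi: light goes out and back
}

int main() {
    // Hypothetical samples at a 16 MHz modulation frequency: a phase shift
    // of pi/4 corresponds to roughly 1.17 m.
    std::printf("Estimated depth: %.2f m\n", tof_depth(0.7, 0.3, 0.3, 0.7, 16e6));
    return 0;
}
```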
Both the SL and ToF approaches suffer when there is significant ambient background light, which requires the use of bandpass filters. Similarly, semi-transparent and scattering media pose a significant challenge for both approaches. Finally, there is motion blur: the Kinect’s SL approach has the benefit of requiring only a single image, whereas the ToF version requires multiple captures and is thus more likely to suffer from motion blur when capturing at the same rate.
What the comparison by Sarbolandi et al. makes clear is that, at least between 2010-era consumer-level SL hardware and 2013-era ToF hardware, there are wins and losses on both sides, making it hard to pick a favorite. Of note is that the monochrome NIR cameras in both Kinects have roughly the same resolution, with the ToF depth sensor even slightly lower at 512 x 424 versus the 640 x 480 of the original SL Kinect.
Kinect Modelling Afterlife
Over the years the proprietary Kinect hardware has been dissected to figure out how to use it for purposes other than making playing XBox video games use more energy than fondling a hand-held controller. A recent project by [Stoppi] (in German, see the English-language video below) is a good example: it uses an original Kinect with the official Microsoft SDK and drivers, along with the Skanect software, to create 3D models.
This approach is reminiscent of the photogrammetry method, but provides a depth map for each angle around the scene being scanned, which helps immensely when later turning separate snapshots into a coherent 3D model.
In this particular project a turntable is made using an Arduino board and a stepper motor, which allows for precise control over how much the object being scanned rotates between snapshots. This control feature is then combined with the scanning software – here Skanect – to create the 3D model, along with textures created from the Kinect’s RGB camera.
youtube.com/embed/-ywSxDIv-pY?…
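For those curious what such a turntable controller might look like, here is a minimal Arduino sketch in the same spirit; the stepper type, pin assignments and timing are hypothetical rather than taken from [Stoppi]’s actual build:

```cpp
#include <Stepper.h>

// Hypothetical turntable: advance a fixed angle, then hold still long
// enough for the scanning software to grab a snapshot.
const int stepsPerRevolution = 2048;   // e.g. a geared 28BYJ-48 stepper
const int snapshotsPerTurn   = 32;     // 11.25 degrees per snapshot

Stepper turntable(stepsPerRevolution, 8, 10, 9, 11);

void setup() {
  turntable.setSpeed(5);               // slow RPM to avoid shaking the object
}

void loop() {
  // Rotate one increment, then pause while the Kinect captures the scene.
  turntable.step(stepsPerRevolution / snapshotsPerTurn);
  delay(3000);                         // dwell time for the capture
}
```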
Here it should be noted that Skanect has recently been phased out and replaced with an Apple mobile app, though you can still find official download links from Structure for now. This is unfortunately a recurring problem with relying on commercial options, whether free or not, as Kinect hardware ages out of the market.
Fortunately we can fall back on libfreenect for the original SL Kinect and libfreenect2 for the ToF Kinect. These are userspace drivers that provide effectively full support for all features on these devices. Unfortunately, these projects haven’t seen significant activity over the past years, with the OpenKinect domain name lapsing as well, so before long we may have to resort to purchasing off-the-shelf hardware again, rather than hacking Kinects.
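As a small taste of what working with libfreenect looks like, here is a minimal sketch based on its C API, modeled on the library’s own examples, with error handling trimmed for brevity:

```cpp
#include <libfreenect.h>
#include <cstdint>
#include <cstdio>

// Called by libfreenect once per depth frame: a 640x480 array of raw
// 11-bit depth values, one per pixel.
static void depth_cb(freenect_device *dev, void *depth, uint32_t timestamp) {
    uint16_t *d = static_cast<uint16_t *>(depth);
    std::printf("frame %u, center raw depth: %u\n", timestamp, d[240 * 640 + 320]);
}

int main() {
    freenect_context *ctx;
    freenect_device *dev;
    if (freenect_init(&ctx, nullptr) < 0) return 1;
    if (freenect_open_device(ctx, &dev, 0) < 0) return 1;  // first Kinect on the bus

    freenect_set_depth_callback(dev, depth_cb);
    freenect_set_depth_mode(dev, freenect_find_depth_mode(
        FREENECT_RESOLUTION_MEDIUM, FREENECT_DEPTH_11BIT));
    freenect_start_depth(dev);

    // Pump USB events; depth_cb fires once per frame until interrupted.
    while (freenect_process_events(ctx) >= 0) {
    }

    freenect_stop_depth(dev);
    freenect_close_device(dev);
    freenect_shutdown(ctx);
    return 0;
}
```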
On which note, how different are those commercial consumer-oriented 3D scanners from Kinects, exactly?
Commercial Scanners
The Creality CR-Scan Ferret Pro 3D scanner, with iPhone in place. (Credit: Creality)
It should probably not come as a massive surprise that the 3D scanners that you can purchase for average consumer levels of money are highly reminiscent of the Kinect. If we ogle the approximately $350 Creality CR-Scan Ferret Pro, for example, we’d be excused for thinking at first glance that someone stuck a tiny Kinect on top of a stick.
When we look at the user manual for this particular 3D scanner, however, we can see that it’s got one more lens than a Kinect. This is because it uses two NIR cameras for stereoscopic imaging, while keeping the same NIR projector and single RGB camera that we are used to seeing on the Kinect. A similar 30 FPS capture rate is claimed as for the Kinect, with a 1080p resolution for the RGB camera and ‘up to 0.1 mm’ resolution within its working distance of 150 – 700 mm.
The fundamental technology has of course not changed since the Kinect days, so we’re likely looking at SL-based depth sensors for these commercial offerings. Improvements will be found in the number of NIR cameras used to get more depth information, higher-resolution NIR and RGB sensors, and better algorithms to derive the depth map. Exact details are of course scarce, barring someone tearing one of these units down for a detailed analysis. Unlike the Kinect, modern-day 3D scanners are much more niche and less generalized, which makes them far less attractive to hack than the cheap-ish devices that flooded the market alongside ubiquitous XBox consoles, with all of Microsoft’s mass-production muscle behind them.
When looking at the demise of the Kinect in this way, it is somewhat sad to see that the most accessible and affordable 3D scanner option available to both scientists and hobbyists is rapidly becoming a fading memory. Currently available commercial options don’t quite hit the same buttons – or price point – and open source options are apparently falling back to the excitingly mediocre option of RGB photogrammetry.
Featured image: still from “Point Cloud Test6” by [Simon].