Introducing GStreamer VR Plug-ins and SPHVR

This article sums up some VR R&D work I have been doing lately at Collabora, so thanks for making this possible! 🙂

Previously on GStreamer

Three Years ago in 2013 I released an OpenGL fragment shader you could use with the GstGLShader element to view Side-By-Side stereoscopical video on the Oculus Rift DK1 in GStreamer. It used the headset only as a stereo viewer and didn’t provide any tracking, it was just a quick way to make any use of the DK1 with GStreamer at all. Side-by-side stereoscopic video was becoming very popular, due to “3D” movies and screens gaining popularity. With its 1.6 release GStreamer added support for stereoscopic video, I didn’t test Side-By-Side stereo with that though.

Why is planar stereoscopic video not 3D?

Stereoscopic video does not provide the full 3D information, since the perspective is always given for a certain view, or parallax. Mapping the stereo video onto a sphere does not solve this, but at least it stores color information independent of view angle, so it’s way more immersive and gets described as telepresence experience. A better solution for “real 3D” video would be of course capturing a point cloud with as many sensitive sensors as possible, filter it and construct mesh data out of it for rendering, but more on that later.

A brief history of mapping imagery on anything different than a plane

Nowadays mankind projects its imagery mostly onto planes, as seen on most LCD Screens, canvases and Polaroids. Although this seems to be a physical limitation, there are some ways to overcome it, in particular with Curved LCD screens, fancy projector setups rarely seen in art installations and of course most recently: Virtual Reality Head Mounted Displays.

Projecting our images on different shapes than planes in virtual space is not new at all though. Still panoramas have been very commonly projected onto cylinders, not only in modern photo viewer software, but also in monumental paintings like the Racławice Panorama, which is housed inside a cylinder shaped building.

(Source)

But to store information from each angle in 3D space we need a different geometric shape.

The sphere

Sperical projection is used very commonly for example in Google Street View and of course in VR video.

As we are in 3D, a regular angle wouldn’t be enough to describe all directions on a sphere, since 360° can only describe a full circle, a 2D shape. In fact we have 2 angles, θ and φ, also called inclination and azimuth.

(Source)

radius r, inclination θ, azimuth φ

You can calculate Cartesian coordinates from spherical coordinates like this

$x=r \, \sin\theta \, \cos\varphi$
$y=r \, \sin\theta \, \sin\varphi$
$z=r \, \cos\theta$

For describing the angle on a sphere we can use the solid angle Ω, calculated by integrating over two angles θ and φ.

$d\Omega = \sin\theta\,d\theta\,d\varphi$
$\Omega = \iint\limits_S \sin\theta\,d\theta\,d\varphi$

The unit for the solid angle is steradian, where the full sphere is 4π sr, hence the hemisphere 2π sr.

(Source)

This is why the term 360° video does not suite to describe spherical video, since there are other popular shapes for projecting video having φ = 360°, like cylinders. 180° video also uses usually a half cylinder. With video half spheres, or hemispheres, you could for example texture a skydome.

So did you implement this sphere?

In Gst3D, which is a small graphics library currently supporting OpenGL 3+ I am providing a sphere built with one triangle strip, which has an equirectangular mapping of UV coordinates, which you can see in yellow to green. You can switch to the wireframe rendering with Tab in the vrtestsrc element.

$gst-launch-1.0 uridecodebin uri=file:///home/bmonkey/Videos/elephants.mp4 ! \ glupload ! glcolorconvert ! videorate ! vrcompositor ! \ video/x-raw$$memory:GLMemory$$, width=1920, height=1080, framerate=75/1 ! \ hmdwarp ! gtkglsink In this example pipeline I use GtkGLSink which works fine, but only provides a refresh rate of 60Hz, which is not really optimal for VR. This restriction may reside inside Gtk or window management, still need to investigate it, since the restriction also appears using the Gst Overlay API with Gtk. Viewing equirectangular projected photos You can just do an image search of equirectangular and will get plenty of images to view. Using imagefreeze in front of vrcompositor makes this possible. Image support is not implemented in SPHVR yet, but you can just run this pipeline: $ gst-launch-1.0 uridecodebin uri=http://4.bp.blogspot.com/_4ZFfiaaptaQ/TNHjKAwjK6I/AAAAAAAAE30/IG2SO24XrDU/s1600/new.png ! \
imagefreeze ! glupload ! glcolorconvert ! vrcompositor ! \
video/x-raw$$memory:GLMemory$$, width=1920, height=1080, framerate=75/1 ! \
hmdwarp ! glimagesink

Multiple outputs?

In most VR applications a second output window is created to spectate the VR experience on the desktop. In SPHVR I use the tee element for creating 2 GL sinks and put them in 2 Gtk windows via the GStreamer Overlay api, since GtkGLSink still seems to have it’s problems with tee.

Sensors for spherical video

Spherical video sensors range from consumer devices like the Ricoh Theta for $300 to the professional Nokia Ozo for$60,000. But in general you can just use 2 wide angle cameras and stitch them together correctly. This functionality is mostly found in photography software like Hugin, but will need to find a place in GStreamer soon. GSoC anyone?

Why is sphere + stereo still meh?

The other difficulty besides stitching in creating spherical video is of course stereoscopy. The parallax being different for every pixel and eye makes it difficult to be losslessly transformed from the sensor into the intermediate format and to the viewer. Nokia’s Ozo records stereo with 8 stereo camera pairs in each direction, adjusted to a horizontal default eye separation assumption for the viewer. This means that rotating your head around the axis you are looking along (for example tilting the head to the right) will still produce a wrong parallax.

John Carmack stated in a tweet that his best prerendered stereo VR experience was with renderings from Octane, a renderer from OTOY, who also created the well known Brigate, a path-tracer with real time capabilities. You can find the stereoscopic cube maps on the internet.

So it is apparently possible to encode correct projection in a prerendered stereoscopic cube map, but I still assume that the stereo quality would be highly isotropic. Especially when translating the viewer position.

With stereoscopic spherical video also no real depth information is stored, but  we could encode our depth information projected spherically around the viewer if you like, so a spherical video + spherical depth texture constructed from whatever sensory, would be more immersive / correct than having 3D information as a plain stereo image. But this solution would lack the ability to move in the room.

I think we should use a better format for storing 3D video.

Room-scale VR Video

If you want to walk around the stuff in your video, maybe one could call this holographic, you need a point cloud with absolute world positions. This of course could be converted into a vertex mesh with algorithms like Marching Tetrahedra, compressed and sent over the network.

Real 3D sensors like laser scanners, or other time-of-flight cameras like the Kinect v2 are a good start. You can of course reconstruct 3D positions from a stereoscopic camera, and calculate a point cloud out of it, but this will also result in a point cloud.

Point clouds?

In a previous post I was describing how to stream point clouds over the network with ROS, which I also did in HoloChat. Porting this functionality to GStreamer was always something that teased me, so I implemented a source for libfreenect2. This work is still pretty unfinished, since I need to implement 16bit float buffers in gst-video to transmit the full depth information from libfreenect2. So the point cloud projection is currently wrong, the color buffer is not currently mapped onto it, also no mesh is currently constructed. The code could also get some performance improving attention as well, but here are my first results.

The freenect2src and pointcloudbuilder elements.

Show Kinectv2 infrared image

$gst-launch-1.0 freenect2src sourcetype=2 ! glimagesink or color image $ gst-launch-1.0 freenect2src sourcetype=1 ! glimagesink

View a point cloud from the Kinect

$gst-launch-1.0 freenect2src sourcetype=0 ! glupload ! glcolorconvert ! \ pointcloudbuilder ! video/x-raw$$memory:GLMemory$$, width=1920, height=1080 ! \ glimagesink Since the point cloud is also a Gst3D scene, it can be already viewed with a HMD, and since it’s part of GStreamer, it can be transmitted over the network for telepresence, but there is currently no example doing this yet. More to see in the future. Distribution You can find the source of gst-plugins-vr on Github. An Arch Linux package is available on the AUR. In the future I plan distribution via flatpak. Future In the future I plan to implement more projections, for example 180° / half cylinder stereo video and stereoscopic equirectangular spherical video. OSVR support, improving point cloud quality and point cloud to mesh construction via marching cubes or similar are also possible things to do. If you are interested to contribute, then feel free to clone. Got Mandelbrot? A Mandelbrot set test pattern for the GStreamer OpenGL Plugins I wanted to introduce you into the next station in my Mandelbrot world domination tour. Using C and GLSL this Mandelbrot set visualization can be rendered to a video file. $ gst-launch-1.0 gltestsrc pattern=13 ! video/x-raw, width=1920, height=1080 ! glimagesink

To run this command you need the patches for gst-plugins-bad I proposed upstream. After they get merged you can try this in your distribution’s GStreamer version.

This is the short patch required for the Mandelbrot set, made possible with the patch adding a generic shader pipeline for gltestsrc patterns.

Transforming Video on the GPU

OpenGL is very suitable for calculating transformations like rotation, scale and translation. Since the video will end up on one rectangular plane, the vertex shader only needs to transform 4 vertices (or 5 with GL_TRIANGLE_STRIP) and map the texture to it. This is a piece of cake for the GPU, since it was designed to do that with many many more vertices, so the performance bottleneck will be uploading the video frame into GPU memory and downloading it again.

The transformations

GStreamer already provides some separate plugins that are basically suitable for doing one of these transformations.

Translation

videomixer: The videomixer does translation of the video with the xpos and ypos properties.

frei0r-filter-scale0tilt: The frei0r plugin is very slow, but it has the advantage of doing scale and tilt (translate) in one plugin. This is why i used it in my 2011 GSoC. It also provides a “clip” propery for cropping the video.

Rotation

rotate: The rotate element is able to rotate the video, but it has to be applied after the other transformations, unless you want borders.

Scale

videoscale: The videoscale element is able to resize the video, but has to be applied after the translation. Additionally it resizes the whole canvas, so it’s also not perfect.

frei0r-filter-scale0tilt: This plugin is able to scale the video, and leave the cansas size as it is. It’s disadvantage is being very slow.

So we have some plugins that do transformation in GStreamer, but you can see that using them together is quite impossible and also slow. But how slow?

Let’s see how the performance of gltransformation compares to the GStreamer CPU transformation plugins.

Benchmark time

All the commands are measured with “time”. The tests were done on the nouveau driver, using MESA as OpenGL implementation. All GPUs should have simmilar results, since not really much is calculated on them. The bottleneck should be the upload.

Pure video generation

gst-launch-1.0 videotestsrc num-buffers=10000 ! fakesink

CPU 3.259s

gst-launch-1.0 gltestsrc num-buffers=10000 ! fakesink

OpenGL 1.168s

Cool the gltestsrc seem to run faster than the classical videotestsrc. But we are not uploading real video to the GPU! This is cheating! Don’t worry, we will do real world tests with files soon.

Rotating the test source

gst-launch-1.0 videotestsrc num-buffers=10000 ! rotate angle=1.1 ! fakesink

CPU 10.158s

gst-launch-1.0 gltestsrc num-buffers=10000 ! gltransformation zrotation=1.1 ! fakesink

OpenGL 4.856s

Oh cool, we’re as twice as fast in OpenGL. This is without uploading the video to the GPU though.

Rotating a video file

In this test we will rotate a HD video file with a duration of 45 seconds. I’m replacing only the sink with fakesink. Note that the CPU rotation needs  videoconverts.

gst-launch-1.0 filesrc location=/home/bmonkey/workspace/ges/data/hd/fluidsimulation.mp4 ! decodebin ! videoconvert ! rotate angle=1.1 ! videoconvert ! fakesink

CPU 17.121s

gst-launch-1.0 filesrc location=/home/bmonkey/workspace/ges/data/hd/fluidsimulation.mp4 ! decodebin ! gltransformation zrotation=1.1 ! fakesink

OpenGL 11.074s

Doing all 3 operations

Ok, now lets see how we perform in doing translation, scale and rotation. Note that the CPU pipeline does contain the problems described earlier.

gst-launch-1.0 videomixer sink_0::ypos=540 name=mix ! videoconvert ! fakesink filesrc location=/home/bmonkey/workspace/ges/data/hd/fluidsimulation.mp4 ! decodebin ! videoconvert ! rotate angle=1.1 ! videoscale ! video/x-raw, width=150 ! mix.

CPU 17.117s

gst-launch-1.0 filesrc location=/home/bmonkey/workspace/ges/data/hd/fluidsimulation.mp4 ! decodebin ! gltransformation zrotation=1.1 xtranslation=2.0 yscale=2.0 ! fakesink

OpenGL 9.465s

No surprise, it’s still faster and even correct.

frei0r-filter-scale0tilt

Let’s be unfair and benchmark the frei0r plugin. There is one advantage, that it can do translation and scale correctly, but rotation can only be applied at the end. So no rotation at different pivot points is possible.

gst-launch-1.0 filesrc location=/home/bmonkey/workspace/ges/data/hd/fluidsimulation.mp4 ! decodebin ! videoconvert ! rotate angle=1.1 ! frei0r-filter-scale0tilt scale-x=0.9 tilt-x=0.5 ! fakesink

CPU 35.227s

Damn, that is horribly slow.

The gltransformation plugin is up to 3 times faster than that!

Results

The gltransformation plugin does all 3 transformations together in a correct fashion and is fast in addition. Furthermore threedimensional transformations are possible, like rotating around the X axis or translating in Z. If you want, you can even use orthographic projection.

I also want to thank ystreet00 for helping me to get into the world of the GStreamer OpenGL plugins.

To run the test yourself, check out my patch for gst-plugins-bad:

https://bugzilla.gnome.org/show_bug.cgi?id=731722

Also don’t forget to use my python testing script:

https://github.com/lubosz/gst-gl-tests/blob/master/transformation.py

Graphene

gltransformation utilizes ebassi’s new graphene library, which implements linear algebra calculations needed for new world OpenGL without the fixed function pipeline.

Alternatives worth mentioning for C++ are QtMatrix4x4 and of course g-truc’s glm. These are not usable with GStreamer, and I was very happy that there was a GLib alternative.

After writing some tests and ebassi’s wonderful and quick help, Graphene was ready for usage with GStreamer!

Implementation in Pitivi

To make this transformation usable in Pitivi, we need some transformation interface. The last one I did was rendered in Cairo. Mathieu managed to get this rendered with the ClutterSink, but using GStreamer OpenGL plugins with the clutter sink is currently impossible. The solution will either be to extend the glvideosink to draw an interface over it or to make the clutter sink working with the OpenGL plugins. But I am rather not a fan of the clutter sink, since it introduced problems to Pitivi.