Godot uses a considerably different approach to rendering (and rendering abstraction) than other, popular, game engines. The motivation behind it was not to achieve the maximum performance in extreme use cases, but accomodate better to most user's needs.
This document was written in hopes to find more developers that would like to help us write rendering code, as it explains the overall design. Without it it's quite difficult to get into the internals. Rendering engineers are rare to find, so this can be a starting point. If you are not an existing or aspiring rendering engineer and you still want to help, please donate to our Patreon :) so we can eventually hire intern work for this.
Please also ask for anything that might still be unclear to be explained further if needed. As always, feedback is very welcome as this is an open project.
As mentioned in other articles, the whole rendering process of Godot happens in a server. The server completely abstracts the rendering internals and provides access handles to creating and manipulating objects via opaque IDs.
This design makes some things easy:
Of course it also has drawbacks:
In any case, the advantages outweight the drawbacks for our use case.
The default VisualServer implementation in Godot is VisualServerRaster, which is a Visual Server designed for rastering. This implementation handles spatial indexing of objects and, every frame, building a render list of objects drawn. It also handles the pairing of objects (lights to geometry, GI to geometry and lights, etc.).
Rastering is done via another opaque interface, Rasterizer, which, in Godot 3.0, was further optimized to take batches of objects instead of having to call one by one to render (which hurts performance).
Godot's rasterizer implementation is also considerably high level. We know it would probably be easier to just abstract a low level drawing API like OpenGL or Direct3D (or use open APIs, such as BGFX). However, this makes our work more complex in the long run.
The reason for this is that, when supporting multiple backends, different techniques may also need to be supported. For example:
glVertexAttribPointermust be used.
Given GLES2 isn't going to disappear any time soon due to low end mobile devices, we still need to support it for a few more years to come. In the future, this might be the same case with newer rendering technologies. Godot's high level abstraction of rastering allows us to focus on using the rendering techniques that best fit each technology.
Godot uses OpenGL ES 3.0 for high-end rendering (and OpenGL 3.3 on PC). This ensures great compatibility with all desktop PCs, mobile devices and WebGL 2. This may seem like we are restricting ourselves on purpose but, truth is, OpenGL ES 3.0 is already a very good and modern API. We can also access newer GL features via extensions, so we don't miss much.
We often are asked questions about why Vulkan or DirectX12 are not used instead. Truth is, the main advantage of using them is the ability to build render command lists using multiple CPU cores. This allows to render hundreds of thousands of objects at lower cost for the CPU. As great as this is, though, only very high-end and high-budget games will really make use of it.
The reasons may not seem that obvious, but:
At the end of the day, the use case where Vulkan and DirectX12 make the most sense is when you have hundreds of thousands of objects, which are all different (different geometry, textures, etc.), and which move. This is a difficult use case to achieve, even willingly.
Added to that fact, Vulkan still has years to go until it's properly supported in most desktop and mobile platforms, which makes it unattractive to implement for us (as it means considerably more effort to write, debug and maintain). As for DirectX12, it's only relevant for Windows/UWP, so there is no strong incentive for us to support it as a cross-platform engine.
I'm not trying to say that these APIs are useless, though. They allow very high-budget games to run better and, in the long run, make the job of writing drivers MUCH simpler to hardware manufacturers. They are simply not good enough for our use case.
Most game engines nowadays start with HLSL or CG as a base and offer it to the user directly. This may sound like the obvious way of doing things but, in truth, it creates a big bottleneck on allowing users to write shaders.
This is because:
The advantage may be that users can freely modify the full engine shaders, but at the end of the day this modification ability is limited to not breaking how the engine interacts with them. To make matters worse, it just excludes a huge amount of users from writing their own shaders due to the high complexity in this process.
In Godot, we took a different route: we wrote our own shader compiler that is mostly based on GLSL ES 3.0 (in only 4k lines of code!) and restricts the environment only to the engine internals. We also made the syntax for interacting with Godot (exposing parameters) really simple.
Added to that, several elements make shaders in Godot easy and a joy to write:
Shaders are, then, translated to native language (real GLSL) and fitted inside the engine's main shader.
This approach also has several more advantages. Given the high control we have over the shader compiler, we can:
And all the above happens transparently to the user! With Godot 3, we expect far more users will be writing shader code in proportion to other game engines given how easy it is.
For Godot 3, after long discussions, we decided that we won't use a deferred renderer. Forward rendering-based optimizations for high amount of lights are now mature and well tested in production. This also makes Godot depart from the way that popular game engines set to follow many years ago.
The most obvious advantages of leaving deferred rendering are:
Then there are also less obvious advantages, but which are high priority for us:
The main disadvantages are:
Temporal antialiasing (TAA) exists mostly as a solution to the deferred rendering problem and its inability to use multisample antialiasing (MSAA).
In Godot, MSAA is supported so this is not really a problem. Still, TAA has a few more advantages:
Unfortunately, it has drawbacks too:
Balancing pros and cons, it's probably not worth implementing it, but the door is open for it.
Godot uses the following framebuffer layout:
For MSAA rendering:
|Half-float||Diffuse + ambient red||Diffuse + ambient green||Diffuse + ambient blue||Ambient scale|
|Half-float||Specular red||Specular green||Specular blue||Metallic|
|8-bits||Normal X||Normal Y||Normal Z||Roughness|
Normally, only the first two buffers are rendered. The third is only used when Screen Space Reflections and SSAO are anbled. The fourth is used only for materials with subsurface scattering.
For effect processing:
Godot efficiently adds or removes steps according to how rendering is configured, so the list presented below should be interpreted as "worst case scenario". As you will probably notice, given effects are so harcoded into the rendering pipeline, they are extremely efficient. Godot with all the effects enabled runs nicely on low-end hardware.
The list of geometry to be rendered is evaluated once, and the opaque render list is filled once. As most geometry is just opaque, Godot optimizes by assigning the same material and shader for all objects that don't use alpha scissor or write to VERTEX.
All lights visible in a frame are configured and set up into UBO/TBO. Indices are assigned to each. In pure forward rendering, these indices are passed per object drawn. In clustered rendering, they are obtained via 3D texture + indices.
Godot uses a 64-bits integer to identify drawing order. This integer mixes render priority, flags, material index, geometry index, and many other variables. When sorted, it minimizes state changes in the render pass.
Opaque objects are rendered in a way where state changes (and calls to OpenGL) are minimized. In this pass, the MSAA buffers are used.
If a panorama sky is present, it's rendered in this step.
The depth and normal buffers are resolved and set for reading. Godot processes full-screen SSAO, which is later saved to an R8 buffer, then ping-pong blurred using separable convolution.
Finally, the diffuse + ambient buffer is resolved, and ambient occlusion is applied based on the "ambient scale" component, which stores the ratio between diffuse and ambient light. On objects that provide their own baked AO, this value is harcoded to 0, to avoid SSAO from interfering with them. This allows beautiful mixing between baked and real-time.
Up to this point, keep in mind that only diffuse + ambient light has been resolved.
A separable convolution is performed to blur the diffuse + ambient buffers, using the Subsurface Scattering buffer as reference of where this effect is applied. This effect looks correct in Godot, given it must only be applied to diffuse and ambient light.
Again, keep in mind that only diffuse + ambient light has been resolved here.
Screen space reflection is performed at half-resolution and stored at a half-resolution buffer. Godot uses a very correct raymatching approach for this, using Bresenham to draw lines and not process redundant pixels. For roughness, a blurred version of the source scene is used. This approach is not correct, but it's pretty fast.
Keep in mind that only diffuse lighting is used as source for SSR, otherwise specular blobs are wrongly reflected... which looks really bad.
Finally, the specular buffer is resolved and mixed with SSR depending on the metallic amount. This is not completely correct, as it can make some specular blobs lose strength, but in practice it looks fine.
Both specular and diffuse are mixed back, obtaining the final image for opaque materials. If Godot detects that any shader reads from the screen, it will generate mipmaps and put this composition in the effect buffer, while applying gaussian blur to each step. This allows to read from the screen blurred, and works great for materials with fake rough refraction.
Finally, everything is also copied back into the diffuse + ambient buffer to continue rendering.
Objects that were sent to the transparency render list are sorted by depth, from back to front. This rendering step happens on the diffuse + ambient MSAA buffer alone.
Finally, objects that use transparent materials are all rendered to the diffuse + ambient MSAA buffer. More state changes happen, so this is obviously a less efficient pass.
The final image is again resolved into the effects buffer.
A two-pass separable convolution takes place to blur objects beyond a specific depth, with interpolation during the transition area.
Again, a two-pass separable convolution takes place to blur objects closer to a specific depth, with interpolation during the transition area. The interesting part of this algorithm is that it is actually done in two passes, without any extra buffers... in a somewhat novel and unique way. The quality is not the best, but the difference is hardly noticeable and performance gains are big.
The image is converted to luminance and shrunk by sizes of 3x3 until it's a single pixel. For auto exposure, this value is used as interpolation target.
The image in the effect buffer is also gaussian blurred using the mipmap buffers. Godot allows to exchange up to 8 mipmaps to go from thick outlines to broad blurred bloom. Mipmaps are also shown using bicubic upscaling for better (less blocky) quality.
In a single, final compositing stage, the following happens:
And that's it!
We have seen the general rendering workflow used in Godot 3. The next step is to detail the 3D material shader steps.
First of all, user code runs, setting up all the parameters that will be used for shading.
As decals can change the material properties, they need to happen before anything else.
Sky contribution consists of two accumulators: ambient and specular light. Specular light is read directly from the sky, but ambient can be configured to range from a constant color to the blurred version of the sky.
Godot stores sky textures as a texture array, with each reflection in an array slice. This method is preferred over storing blurred versions in mip-maps (which is used anyway on mobile) simply due to increased performance and much less aliasing.
On a texture, a dual paraboloid is used. Clever readers might point out that an equirectangular panorama is better. The problem is that, for a texture array, the discontinuity at the poles screw up the derivatives in the
atan2() call. In any case, it does look good anyway.
In Godot each directional light means a new whole scene render pass, so use those with care. It's hard to need more than one anyway.
Godot uses regular cascaded shadow maps, with 4 cascades and optional blending between cascades. For performance, all cascades are stored in a single shadow texture.
All lights in Godot support contact shadow computation, to compensate potential bias problems. When using vertex lighting, only this shadow type is supported.
All lights also support transmittance (to use with subsurface scattering) if shadows are enabled, but this will force the object to perform reverse culling when rendering to a shadow.
Godot uses voxel cone tracing (VCT) for global illumination (GI). It uses it in a way similar to reflection probes, so they must be configured over specific areas to work. As less resolution probes are used than in the original NVidia paper (to compensate and gain considerably more performance), this also compensates back in quality.
Probes can be set as interior or exterior. If set as exterior, they will blend with the sky for the remaining cone trace alpha. If not, the sky will be ignored.
Up to two GI probes can be blended for a specific object, but this actually works pretty nicely. It's possible to have a large probe for a general area in low resolution and then smaller probes for specific places (e.g. a house).
Up to any amount of reflection probes can be mixed per object. Godot uses a really nice automatic weighted blending algorithm for this. Like GI probes, reflection probes can be set as interior or exterior (to determine blending with sky), as well as have their own ambient lighting to better tweak artistic look.
When a reflection probe is used together with a GI probe, the reflection is overridden by the one from the reflection probe (but still weight-blended properly).
All reflections are read from a single atlas texture (though this will be moved to a texture array soon).
After those, omni and spot lights are processed. All shadows are read from a single atlas texture.
Finally, diffuse and specular lights are weighted by their metallic parameter in order to ensure energy conservation.
Godot uses an approximation of the BRDF function in order to save a precious texture slot.
Fog is applied here instead of in a post-process. That's another nice advantage of not using deferred.
Godot uses atlases for most data read in the material shader in order to save on texture slots.
Godot uses a dynamic shadow (omni and spot) atlas per viewport. It allocates and reallocates shadows in the atlas depending on the view distance. This atlas is persistent, meaning that if nothing forces the shadow to redraw (and assuming the shadow did not become too close or too far away), it is kept as is. This allows having a very large amount of shadows on-screen.
Shadows are stored as dual paraboloid (DP) for omni and normally for spot. Even though dual paraboloid is used, lights are previously rendered to a cubemap and then converted to avoid artifacts. This conversion is very quick, and has the huge advantage of optionally allowing DP to be rendered directly on some lights (e.g. smaller ones) for better performance.
The shadow atlas divides the shadow texture in 4 equal squares. Each square can be further subdivided to create more or less shadow maps. The default subdivision is:
Users can configure this subdivision to fit less generic types of games, but the default works pretty well. Shadows will normally bounce between subdivisions depending on their distance to the camera. The effect is pretty much unnoticeable.
Directional shadow uses a single texture that is not kept between frames (although this might change in the future). If more than one directional light is used per scene, less space is available for each light.
Godot uses a simple reflection atlas supporting 64 reflections of equal size. Reflections can be computed on load, or on every frame (slower). Reflections are stored as dual paraboloid.
Decal images and projectors are stored into a single atlas. If images are too big to fit the atlas they will be shrunk and appear with less quality.
For Godot, we aimed to have a really simple 3D renderer (all the above, shaders included, amount to only around 10k lines of code) that is powerful, yet very easy to maintain (we don't have millions of dollars to pour into development, so we are forced to understand our priorities well).
If you are a render engineer, feel free to take any ideas you have read from here for your own renderers (in fact, feel free to take any code, as Godot is MIT licensed).
Thanks for reading, and any feedback is very welcome!