This is my 2nd blog post on using spherical harmonics for depth based lighting effects in Unreal 4.
The first blog post focused on generating the spherical harmonics data in Houdini; this post focuses on the Unreal 4 side of things.
I’m going to avoid posting much code here, but I will try to provide enough information to be useful if you choose to do similar things.
SH data to base pass
The goal was to look up the depth of the object from each light in my scene, and see if I could do something neat with it.
In UE4 deferred rendering, that means that I need to pass my 16 coefficients from the material editor -> base pass pixel shader -> lighting pass.
First up, I read the first two SH coefficients out of the red and green vertex colour channels, and the rest out of my UV sets (remembering that I kept the default UV set 0 for actual UVs):
Vertex colour complications
You notice a nice little hardcoded multiplier up there… This was one of the annoyances with using vertex colours: I needed to scale the value of the coefficients in Houdini to 0-1, because vertex colours are 0-1.
This is different to the normalization part I mentioned in the last blog post, which was scaling the depth values before encoding them in SH. Here, I’m scaling the actual computed coefficients. I only need to do this with the vertex colours, not the UV data, since UVs aren’t restricted to 0-1.
The 4.6 was just a value that worked, using my amazing scientific approach of “calculate SH values for half a dozen models of 1 000 – 10 000 vertices, find out how high and low the final sh values go, divide through by that number +0.1”. You’d be smarter to use actual math to find the maximum range for coefficients for normalized data sets, though… It’s probably something awesome like 0 –> 1.5 pi.
Material input pins
Anyway, those values just plug into the SH Depth Coeff pins, and we’re done!!
That was a lie.
Those pins don’t exist usually… And neither does this shading model:
So, that brings me to…
C++ / shader side note
To work out how to add a shading model, I searched the source code for a different shading model (hair I think), and copied and pasted just about everything, and then went through a process of elimination until things worked.
I took very much the same approach to the shader side of things.
This is why I’m a Tech Artist, and not a programmer… Well, one of many reasons 😉
Seriously though, being able to do this is one of the really nice things about having access to engine source code!
The programming side of this project was a bunch of very simple changes across a wide range of engine source files, so I’m not going to post much of it:
There is an awful lot of this code that really should be data instead. But Epic gave me an awesome engine and lets me mess around with source code, so I’m not going to complain too much 😛
Material pins (continued…)
So I added material inputs for the coefficients, plus some absorption parameters.
The SH Coeffs material pins are new ones, so I had to make a bunch of changes to material engine source files to make that happen.
Be careful when doing this: Consistent ordering of variables matters in many of these files. I found that out the easy way: Epic put comments in the code about it 🙂
Each of the SH coeffs material inputs is a vector with 4 components, so I need 4 of these to send my 16 coefficients through to the base pass.
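The grouping itself is just a matter of slicing the 16 values into four 4-component vectors. A minimal C++ sketch, where `Float4` and `SplitIntoPins` are hypothetical names and the in-order layout is an assumption:

```cpp
#include <array>
#include <cstddef>

// Stand-in for a 4-component material input (hypothetical type).
struct Float4 { float x, y, z, w; };

// Splits 16 coefficients into four 4-component groups, in order:
// pin 0 holds coefficients 0-3, pin 1 holds 4-7, and so on.
std::array<Float4, 4> SplitIntoPins(const std::array<float, 16>& sh) {
    std::array<Float4, 4> pins{};
    for (std::size_t i = 0; i < pins.size(); ++i) {
        pins[i] = Float4{sh[i * 4 + 0], sh[i * 4 + 1], sh[i * 4 + 2], sh[i * 4 + 3]};
    }
    return pins;
}
```

Whatever order is chosen here has to be mirrored exactly when the lighting pass reassembles the coefficients.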
Custom data (absorption)
The absorption pins you might have noticed from my material screenshot are passed as “custom data”.
Some of the existing lighting models (subsurface, etc) pass additional data to the base pass (and also through to lighting, but more on that later).
These “custom data” pins can be renamed for different shading models. So you can use these if you’d rather not go crazy adding new pins, and you’re happy with passing through just two extra float values.
Have a look at MaterialGraph.cpp, and GetCustomDataPinName if that sounds like a fun time 🙂
Base pass to lighting
At this point, I’d modified enough code that I could start reading and using my SH values in the base pass.
A good method for testing if the data was valid was using the camera vector to look up the SH depth values. I knew things were working when I got similar results to what I was seeing in Houdini when using the same approach:
That’s looking at “Base Color” in the buffer visualizations.
I don’t actually want to do anything with the SH data in the base pass, though, so the next step is to pass the SH data through to the lighting pass.
You can have a giant parameter party, and read all sorts of fun data in the base pass.
However, if you want to do per-light stuff, at some point you need to write all that data into a handful of full screen buffers that the lighting pass uses. By the time you get to lighting, you don’t have per object data, just those full screen buffers and your lights.
These gbuffers are lovingly named GBufferA, GBufferB, GBuffer… You get the picture.
You can visualize them in the editor by using the various buffer visualizers, or explicitly using the “vis” command, e.g: “vis gbuffera”:
There are some other buffers being used (velocity, etc), but these are the ones I care about for now.
I need to pass an extra 16 float values through to lighting, so surely I could just add 4 new gbuffers?
Apparently not: the limit for simultaneous render targets is 8 🙂
I started out by creating 2 new render targets, so that covers half of my SH values, but what to do with the other 8 values?
Attempt 1 – Packing it up
To get this working, there were things that I could sacrifice from the above existing buffers to store my own data.
For example, I rarely use Specular these days, aside from occasionally setting it to a constant, so I could use that for one of my SH values, and just hard code Specular to 1 in my lighting pass.
With this in mind, I overwrote all the things I didn’t think I cared about for stylized translucent meshes:
- Static lighting
- Distance field anything (I think)
Attempt 2 – Go wide!
This wasn’t really ideal. I wasn’t very happy about losing static lighting.
That was about when I realized that although I couldn’t add any more simultaneous render targets, I could change the format of them!
The standard g-buffers are 8 bits per channel, by default. By going 16 bit per channel, I could pack two SH values into each channel, and store all my SH data in my two new g-buffers without the need for overwriting other buffers!
Well, I actually went with PF_A32B32G32R32F, so 32 bits per channel because I’m greedy.
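The packing code isn't shown in this post, so here is one way the "two values per channel" idea could look, as a C++ sketch rather than the shader that was actually used. All the function names are hypothetical, and the `[lo, hi]` quantisation range is an assumption that would need to cover the real coefficient range.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantises v from the range [lo, hi] to 16 bits, rounding to nearest.
std::uint16_t QuantiseTo16(float v, float lo, float hi) {
    const float t = std::min(std::max((v - lo) / (hi - lo), 0.0f), 1.0f);
    return static_cast<std::uint16_t>(std::lround(t * 65535.0f));
}

// Inverse of QuantiseTo16, up to quantisation error.
float DequantiseFrom16(std::uint16_t q, float lo, float hi) {
    return lo + (static_cast<float>(q) / 65535.0f) * (hi - lo);
}

// Packs two 16-bit values into one 32-bit channel: a in the high half, b in the low half.
std::uint32_t PackPair(std::uint16_t a, std::uint16_t b) {
    return (static_cast<std::uint32_t>(a) << 16) | static_cast<std::uint32_t>(b);
}

// Splits a packed 32-bit channel back into its two 16-bit halves.
void UnpackPair(std::uint32_t packed, std::uint16_t& a, std::uint16_t& b) {
    a = static_cast<std::uint16_t>(packed >> 16);
    b = static_cast<std::uint16_t>(packed & 0xFFFFu);
}
```

Four channels with two values each gives 8 coefficients per render target, so two targets cover all 16.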
It’s probably worth passing out in horror at the cost of all this at this point: 2 * 128 bit buffers is something like 250 MB of data. I’m going to talk about this a little later 🙂
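The arithmetic behind that kind of number is easy to check. This is a small C++ sketch with hypothetical helper names; the resolutions are assumptions, since the post doesn't say what the buffers were sized to, and it ignores any padding or extra allocation the engine might do.

```cpp
#include <cstdint>

// Bytes needed for `targetCount` full-screen render targets.
constexpr std::uint64_t RenderTargetBytes(std::uint64_t width, std::uint64_t height,
                                          std::uint64_t bitsPerPixel,
                                          std::uint64_t targetCount) {
    return width * height * (bitsPerPixel / 8) * targetCount;
}

// Converts a byte count to mebibytes.
constexpr double ToMiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

For two 128 bit targets, `RenderTargetBytes(1920, 1080, 128, 2)` comes to 66,355,200 bytes (about 63 MiB), and `RenderTargetBytes(3840, 2160, 128, 2)` comes to 265,420,800 bytes (about 253 MiB), so the ballpark figure lines up with a 3840 x 2160 target.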
I created a few different procedural test assets in Houdini with low complexity as test cases, including one which I deleted all but one polygon as a final step, so that I could very accurately debug the SH values 🙂
On top of that, I had a hard coded matrix in the shaders that I could use to check, component by component, that I was getting what I expected when passing data from the base pass to lighting, with packing/unpacking, etc:
const static float4x4 shDebugValues =
{
    0.1, 0.2, 0.3, 0.4,
    0.5, 0.6, 0.7, 0.8,
    0.9, 1.0, 1.1, 1.2,
    1.3, 1.4, 1.5, 1.6
};
It seems like an obvious and silly thing to point out, but it saved me some time 🙂
Here are some of my beautiful procedural test assets (one you might recognize from the video at the start of the post):
“PB-nah”, the lazy guide to not getting the most out of my data
Ok, SH data is going through to the lighting pass now!
This is where a really clever graphics programmer could use it for some physically accurate lighting work, proper translucency, etc.
To be honest, I was pleasantly surprised that anything was working at this stage, so I threw in a very un-pbr scattering, and called it a day! 🙂
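float3 SubsurfaceSHDepth( FGBufferData GBuffer, float3 L, float3 V, half3 N )
{
    // The two "custom data" pins carry the absorption parameters.
    // V is unused: the scattering is not view dependent.
    float AbsorptionDistance = GBuffer.CustomData.x;
    float AbsorptionPower = lerp(4.0f, 16.0f, GBuffer.CustomData.y);
    // Depth through the model towards the light, looked up from the SH data.
    float DepthFromPixelToLight = Get4BandSH(GBuffer.SHCoeffs, L);
    float absorptionClampedDepth = saturate(1.0f / AbsorptionDistance * DepthFromPixelToLight);
    // Falloff by face angle, softened by a wrap factor.
    float SSSWrap = 0.3f;
    float frontFaceFalloff = pow(saturate(dot(-N, L) + SSSWrap), 2);
    // More depth means more absorption, so less light gets through.
    float Transmittance = pow(1 - absorptionClampedDepth, AbsorptionPower);
    Transmittance *= frontFaceFalloff;
    return Transmittance * GBuffer.BaseColor;
}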
It’s non-view-dependent scattering, using the SH depth through the model towards the light, then dampened by the absorption distance.
The effect falls off by face angle away from the light, but I put a wrap factor on that because I like the way it looks.
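For poking at the falloff curve outside the engine, the same math can be ported to plain C++. This is a sketch with hypothetical names (`ShDepthTransmittance`, `Vec3` and the helpers); it takes the SH depth as an input instead of evaluating it, and returns only the scalar transmittance, leaving out the multiply by base colour.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
float Saturate(float v) { return std::min(std::max(v, 0.0f), 1.0f); }
float Lerp(float a, float b, float t) { return a + (b - a) * t; }

// depthToLight: the SH depth evaluated towards the light.
// customDataX / customDataY: the two "custom data" absorption values.
// N and L are assumed to be unit length.
float ShDepthTransmittance(float depthToLight, float customDataX, float customDataY,
                           const Vec3& N, const Vec3& L) {
    const float absorptionDistance = customDataX;
    const float absorptionPower = Lerp(4.0f, 16.0f, customDataY);
    const float clampedDepth = Saturate(depthToLight / absorptionDistance);
    // Wrapped falloff by face angle, squared.
    const float wrap = 0.3f;
    const float s = Saturate(Dot(Vec3{-N.x, -N.y, -N.z}, L) + wrap);
    const float frontFaceFalloff = s * s;
    return std::pow(1.0f - clampedDepth, absorptionPower) * frontFaceFalloff;
}
```

With a depth of zero the result is just the wrapped falloff term, and once the depth reaches the absorption distance nothing gets through at all.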
For all the work I’ve put into this project, probably the least of it went into the actual lighting model, so I’m pretty likely to change that code quite a lot 🙂
What I like about this is that the scattering stays fairly consistent around the model from different angles:
So as horrible and inaccurate and not PBR as this is, it matches what I see in SSS renders in Modo a little better than what I get from standard UE4 SSS.
- I can’t rotate my translucent models at the moment 😛
- Shadows don’t really interact with my model properly
I can hopefully solve both of these things fairly easily (store data in tangent space, look at shadowing in other SSS models in UE4), I just need to find the time.
I could actually rotate the SH data, but apparently that’s hundreds of instructions 🙂
Cost and performance
- 8 uv channels
- 2 * 128 bit buffers
Not really ideal from a memory point of view.
The obvious optimization here is to drop down to 3 band spherical harmonics.
The quality probably wouldn’t suffer, and that’s 9 coefficients rather than 16, so I could pack them into one of my 128 bit gbuffers instead of two (with one spare coefficient left over that I’d have to figure out).
That would help kill some UV channels, too.
Also, using 32 bits per channel (so 16 bits per SH coeff) is probably overkill. I could swap over to a uint buffer with 16 bits per channel and pack two coefficients into each channel at 8 bits per coeff, which would halve the memory usage again.
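The counting behind those options can be written down in a few lines. A small C++ sketch, with hypothetical helper names, that only does the bookkeeping and says nothing about quality:

```cpp
// How many coefficients fit in one render target with the given layout.
constexpr int CoefficientsPerTarget(int channels, int bitsPerChannel, int bitsPerCoefficient) {
    return channels * (bitsPerChannel / bitsPerCoefficient);
}

// Render targets needed to hold `coefficientCount` coefficients, rounded up.
constexpr int TargetsNeeded(int coefficientCount, int channels, int bitsPerChannel,
                            int bitsPerCoefficient) {
    return (coefficientCount
            + CoefficientsPerTarget(channels, bitsPerChannel, bitsPerCoefficient) - 1)
           / CoefficientsPerTarget(channels, bitsPerChannel, bitsPerCoefficient);
}
```

The current setup is `TargetsNeeded(16, 4, 32, 16)`, which is 2. With 3 bands, `TargetsNeeded(9, 4, 32, 16)` is still 2 unless the ninth coefficient finds a home somewhere else, which is the spare coefficient mentioned above. `CoefficientsPerTarget(4, 16, 8)` is also 8, so the 8 bit option holds the same number of coefficients in a target half the size.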
As for performance, presumably evaluating 3 band spherical harmonics would be cheaper than 4 band. Well, especially because then I could swap to using the optimized UE4 functions that already exist for 3 band sh 🙂
To get away from needing extra buffers and having a constant overhead, I probably should have tried out the new Forward+ renderer:
Since you have access to per object data, presumably passing around sh coefficients would also be less painful.
Rendering is not really my strong point, but my buddy Ben Millwood has been nagging me about Forward+ rendering for years (he’s writing his own renderer http://www.lived3d.com/).
There are other alternatives to deferred, or hybrid deferred approaches (like Doom 2016’s clustered forward, or Wolfgang Engel’s culled visibility buffers) that might have made this easier too.
I very much look forward to the impending not-entirely-deferred future 🙂
I learnt some things about Houdini and UE4, job done!
Not sure if I’ll keep working on this at all, but it might be fun to at least fix the bugs.