The final version of the dissertation can be viewed here. I am happy that it is over but have enjoyed the whole process a lot. I feel I have improved the way in which I write about technical subjects which should be of use when I start working.
It has been a long while since my last journal entry, mostly because I have been working hard on the dissertation. A couple of days ago I submitted my first full rough draft to my supervisor for some advice before I submit a final draft. At the moment I feel like it is in a good place. I have tried to include as many diagrams and pictures as possible to help describe the development process and accurately present the final results. Currently, it is sitting at 16,000 words, which is 6,000 more than the upper end of the 7-10k word guideline. I am not sure if this is a hard limit. There isn’t any mention of marks being deducted for going over the word limit, so hopefully this doesn’t cause any issues. I will double check with my supervisor to ensure that this isn’t a problem.
I have also packaged and submitted the final version of my artefact. This included two apps: one to demonstrate the indirect lighting and AO, and the other to demonstrate the depth of field. It also included the full source with instructions so that someone can build the project if they want to. The code is not all that tidy. There are a lot of TODOs sprinkled around that could be removed. However, it is quite a large codebase and it would take a very long time to sweep some things away. I also want to continue work on the project, which will be easier in its current state.
I am pretty happy with how the project ended up. I think I produced something that looks good and is also functional. My main gripe, as I have already stated, is the state of the code. It is definitely not as tidy or efficient as it could be. But it is a big project so this was bound to happen. I think the depth of field effect could definitely do with some optimisations, as well as some additional time spent improving the quality of the blurring. There has been a lot of recent research regarding accurate Bokeh DoF effects that could be used to improve the appearance: Seen here.
I think overall I am happy with the way the project went. I was definitely ambitious with what I wanted to achieve and this meant I rushed some sections of the work. However, I think I managed to achieve all of the major goals I had set out in the beginning.
- I was able to get the deep G-Buffer generation working relatively quickly. It took a few iterations to get the reprojection working correctly but once complete it didn’t need to be touched again.
- Implemented both AO and indirect lighting. These seemed like pretty intimidating challenges as I had never dealt with either of them before. However, they actually ended up being a lot simpler than I had initially thought. This was mostly just down to doing the research and spending the time trying to understand how the algorithms work. I especially think trying to describe the effects in this journal helped my understanding as well.
- Failed to get ray traced reflections implemented. Actually, I didn’t even try, mostly because I felt I was short on time, and that time was better spent working on something different as I already had plenty of data from the AO and indirect lighting. Spending the time working on the reflections wouldn’t have impacted the project as much as attempting to implement one of the alternative applications of deep G-Buffers.
- And I think the big one was actually getting the partial occlusion depth of field working. I knew that I at least needed to attempt something so I could write something about it in the discussion so the fact that it worked felt like a real accomplishment. Even though it isn’t great and to be honest I don’t feel like it has much use.
I feel like I have learned a lot during this process. I have improved my understanding of some screen-space rendering stuff and deep G-Buffers. But, more importantly, I have been able to implement something from scratch that I was interested in just by reading the available research. As well as adapting information that I have read to achieve something new. I feel like this will be important as I enter the working world as I feel like I am more confident in my ability to work independently.
This is likely the final journal entry as the submission for this module is only a few days away. I didn’t like the idea of spending time writing these journals every week (well, nearly every week). But in the end, I feel like spending the time putting my thoughts into written work helped me to achieve more than if I hadn’t spent the time doing it. So I am happy that I put the effort into writing detailed entries. I just need to find a way to save these journals for future reference.
So last week after finishing off the indirect lighting I decided the practical work was finished and that the rest of my time would be spent writing the dissertation. I stuck with this plan for the first couple of days, getting the first draft of the methodology and literature review finished. However, after that I found myself a little bit bored and a little bit stuck. I was finding it a little hard to write the other sections as it required an explanation of relevance and importance of my research. Although I’m sure it would be OK I felt a little cheap trying to justify the work already carried out in another paper. So I thought instead of spending time trying to reason why I hadn’t done my own thing I might as well just give it a go.
The paper mentions four other applications where deep G-Buffers could be useful. Those areas were:
- Order-independent transparency
  - Order-independent transparency (OIT) allows for a scene of transparent objects to be rendered without any prior
- Stereoscopic re-projection
  - Stereo image reprojection is used to generate a stereoscopic image from a 2D image using the available
- Motion blur
  - Motion blur approximates the slight blurring of objects during movements
- Depth of field
  - Depth of field simulates the focus of a lens in a camera.
Out of all of these, depth of field (DoF) was the one I was most familiar with as I created a DoF effect for a previous module in third year. It also creates the most visually appealing results, which is always at the top of my criteria, so it seemed like the perfect choice.
One DoF effect seen in the real world that is yet to be seen in games (as far as I can tell) is partial occlusion: out-of-focus objects near the camera are semi-transparent, resulting in partially visible background objects. This is due to wide aperture lenses on cameras, where light from parts of the background object will hit the outer parts of the lens and thus contribute to the final image, as shown in the below diagram (taken from here).
I had a slight inkling as to how I could do this so thought it was worth a shot. Anyway, in the end, no matter if it works or not at least it gives me a little more to talk about in the dissertation.
There isn’t really much theory to be discussed since, unlike the other techniques discussed, DoF techniques generally use methods that produce the best-looking results, not the most physically plausible ones. So, instead, we will get stuck into the implementation details.
The last time I implemented a DoF effect I used the methods displayed here in GPU Gems. The technique worked but is quite outdated compared to more modern techniques. So I instead opted to go for a technique I had recently read about, here. It turns out that this is another piece of work by Morgan McGuire so I owe him a lot for the amount of his work I have used in this project.
The technique works by separating the scene into two layers based on each pixel’s Circle of Confusion (CoC). The CoC is the projected circle that an out-of-focus cone of rays makes when it hits the camera. The diagram below hopefully does a better job of explaining how this looks. For us the CoC is a value in the range [-1, 1] that tells us how focused the point is and where it sits in the range of focus of the camera: -1 is the most blurry far point from the camera, 0 is in focus and 1 is the most blurry near point to the camera.
As you can probably guess the scene is split into a near region for CoC values between 1 and 0 and a far region for CoC values between 0 and -1. This allows for blurred edges on near field objects composited on top of crisp in-focus far field objects. This step gives us two textures that look something like this. The near field is on the left and the far field is on the right.
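To make the CoC range and the split a little more concrete, here is a minimal sketch in Python rather than shader code. It assumes a simple linear ramp between a focus range and two "fully blurred" distances, and it puts perfectly in-focus pixels in the far field. Those choices, and all of the function and parameter names, are assumptions made for illustration and are not taken from the project or the paper.

```python
def signed_coc(depth, focus_near, focus_far, blur_near, blur_far):
    """Signed circle of confusion in [-1, 1] for a positive camera-space depth.

    Hypothetical parameters: anything between focus_near and focus_far is
    sharp (0). The value ramps linearly to +1 at blur_near (closest to the
    camera) and to -1 at blur_far (furthest from the camera).
    """
    if depth < focus_near:
        t = (focus_near - depth) / (focus_near - blur_near)
        return min(t, 1.0)
    if depth > focus_far:
        t = (depth - focus_far) / (blur_far - focus_far)
        return -min(t, 1.0)
    return 0.0


def split_layers(colours, cocs):
    """Split pixels into a near field (CoC > 0) and a far field (CoC <= 0).

    Pixels that do not belong to a layer are left empty (None).
    """
    near = [c if coc > 0.0 else None for c, coc in zip(colours, cocs)]
    far = [c if coc <= 0.0 else None for c, coc in zip(colours, cocs)]
    return near, far
```

With a focus range of 4 to 6 units, a point 2.5 units away gets a CoC of 0.5 and lands in the near field, while a point 13 units away gets -0.5 and lands in the far field.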
The next step of the process is blurring both layers. However, as we blur the near field the areas on the edges like the pillars will become slightly transparent. If we blur like this when we composite the layers back together we will get dark edges as there is no data in the near field to fill those transparent areas. So often you just include the near field values in with the far field texture. However, what we can do instead is include some data from the second layer of the deep G-Buffer to fill in the dark areas. This gives us the two potential far-field textures.
This means that when we blur the near layer it will be blended with the occluded object behind it. For example, in the top you can see that the back wall is filled in seamlessly where the pillars previously were.
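The difference between the two far-field textures can be sketched per pixel as below. This is only an illustration in Python: the names are hypothetical, and it assumes the second layer can be used directly wherever the first layer is in the near field, without checking whether the second layer is itself in focus.

```python
def build_far_field(layer0, layer1, cocs0, use_deep_gbuffer=True):
    """Build the far-field texture from two G-Buffer layers.

    layer0 / layer1: per-pixel colours of the first and second layer
                     (None where the second layer has no surface).
    cocs0:           signed CoC of the first layer, positive = near field.
    """
    far = []
    for c0, c1, coc in zip(layer0, layer1, cocs0):
        if coc <= 0.0:
            far.append(c0)  # first layer already belongs to the far field
        elif use_deep_gbuffer and c1 is not None:
            far.append(c1)  # fill with the surface hidden behind the near object
        else:
            far.append(c0)  # standard approach: keep the near-field colour
    return far
```

Calling it with `use_deep_gbuffer=False` gives the standard far-field texture, so the two results can be compared side by side.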
The next blurring step is not all that interesting. We downsample to half resolution and perform a basic blur depending on CoC at the given pixel. The blurring algorithm needed quite a lot of tweaking to create the right effect (To be honest it’s still not perfect, it could definitely use some extra time). But eventually, after blurring has finished the two layers are composited back together to create the below image. On the left with standard far-field texture and on the right with deep G-Buffer filled far field texture.
Looking at these two images, it is impossible to tell the difference as the blur is only very slight. However, if we look at a more extreme example the difference becomes very noticeable.
Hopefully, you can see on the right the edges on the close dragon are slightly softer and there is a little bit more of the background visible in some areas (The red curtain is more visible through the jaw). Stupidly I picked two objects of the same colour so it is a little hard to make out where they cross over. But one very obvious spot is the tongue on the close dragon. It almost disappears totally revealing some of the dragon behind.
It works! Well, sort of anyway. There are still some little snags here and there that I believe could be worked out with some more work. However, the deadlines are getting closer and I feel I have done enough to at least prove that the effect is possible with the help of deep G-Buffers. The one key issue here is the importance of the minimum separation between layers. With other effects, the issues aren’t obvious as we are not looking directly at second layer information. However, with this effect the second layer data is clearly visible, so if the minimum separation is too big and selects the wrong object there will be some clear discontinuities in the result. In the same way, the minimum separation may catch internal faces; for example, the inside faces of a cup could blend with the outer face as it is blurred. This is an issue faced by other methods that have looked at partial occlusion, so it is not really something to worry about too much, but it is worth noting.
Anyway, this time I am certain that this is the last piece of practical work I will do. From now on I will be focusing purely on the written work. Now, at least, a little happier that I am not just copying someone else’s paper.
At the beginning of last week, I analysed a single frame from the demo application. This past week and a bit I have been using that information to implement the indirect lighting effect in my own application. I will now discuss the theory and implementation details of the algorithm before finishing with some discussion and analysis. And as always will finish with some screen shots and gifs.
The indirect lighting algorithm used in the deep G-Buffer paper is a modification of the paper “A Deferred Shading Pipeline for Real-Time Indirect Illumination” by Cyril Soler, Olivier Hoel and Frank Rochet. The full paper can be found here
In practice, the algorithm is very similar to the Scalable Ambient Obscurance (SAO) algorithm discussed in a previous post. It makes sense that the algorithms are similar as they are both estimating global lighting effects. The only difference is that one is estimating the amount of light that won’t reach a point while the other estimates the amount of light that will reach a given point.
The above image is a visualisation of how the core of the algorithm works. We estimate the amount of outgoing radiance from point X based on incoming radiance from point Y along vector w. The equation to calculate this contribution is shown below.
Where outgoing radiance E at point X is the sum of the incoming radiance B from point Y multiplied by the clamped dot product between vector w and the normal at point X. It is a relatively intuitive algorithm. We just need to sample a number of points around our current pixel and calculate their world-space positions so we can calculate vector w, which we then dot with the normal at our current pixel that we can sample from our G-Buffer. This is extended to two layers by calculating the result for both layers of the G-Buffer and then taking the values where w · Nx > 0 and w · Ny < 0, where Nx and Ny are the normals at X and Y. This is to ensure that incoming radiance from point Y could be directed towards point X. A note made in the paper is that the second check can be omitted to improve performance at the loss of accuracy. This can prove to be quite a substantial performance improvement as we don’t need to sample the normal for every Y, which reduces the total bandwidth requirements.
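The gather described above can be sketched on the CPU as follows. This is a rough Python illustration rather than the shader: the sample list is assumed to already contain candidates from both G-Buffer layers, the normalisation (a plain average over the valid samples) is an assumption of this sketch rather than the paper's exact weighting, and the returned confidence is simply the fraction of samples that passed the tests. All names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def gather_radiosity(x_pos, x_normal, samples, test_sample_normal=True):
    """Estimate bounced light arriving at X from a list of sample points Y.

    Each sample is (position, normal, radiance_rgb).
    Returns (rgb, confidence).
    """
    total = [0.0, 0.0, 0.0]
    valid = 0
    for y_pos, y_normal, radiance in samples:
        offset = tuple(y - x for y, x in zip(y_pos, x_pos))
        if dot(offset, offset) == 0.0:
            continue  # sample coincides with X, no direction to use
        w = normalise(offset)
        cos_x = dot(w, x_normal)
        if cos_x <= 0.0:
            continue  # w . Nx > 0 failed: Y is behind the surface at X
        if test_sample_normal and dot(w, y_normal) >= 0.0:
            continue  # w . Ny < 0 failed: the surface at Y faces away from X
        valid += 1
        for i in range(3):
            total[i] += radiance[i] * cos_x
    if valid == 0:
        return (0.0, 0.0, 0.0), 0.0
    rgb = tuple(c / valid for c in total)
    return rgb, valid / len(samples)
```

Passing `test_sample_normal=False` skips the second check, which is the cheaper but less accurate variant mentioned in the paper.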
In the previous posts, I broke down the work into a schedule that looked something like the one displayed below.
- Manual mipmap generation for colour and normals
- Scene lighting computed in HDR
- Generate radiosity from the lit scene.
- Shade scene using calculated radiosity (Excluding env map)
- Add env map to shading
- Bloom (Optional)
This actually turned out to be a pretty good task estimate based on what I could decipher from the frame analysis. I missed out the radiosity filtering but that was about it. I didn’t end up completing the work in the given order above so I will instead continue the explanation in the order the work was carried out. Starting with Pre-lighting.
We start with pre-lighting the scene as it is the first step in the process and one of the simplest to implement. This step calculates basic diffuse lighting for both layers of the G-Buffer, which will be used as the outgoing radiance B in the radiosity calculation. We include the previous frame’s calculated radiosity to approximate multiple bounces of indirect lighting. This was a little complex as it requires temporal reprojection between frames to ensure a correct result when moving. Luckily I have gotten pretty familiar with reprojection methods as I have had to use them for both G-Buffer generation and temporal anti-aliasing. One interesting addition which I might touch on in another post is the inclusion of emissive lighting in this pass, which allows emissive objects to contribute dynamic lighting to the rest of the scene.
At the same time as calculating the diffuse lighting for both layers, we pack the normal for each layer into a single buffer. This helps to reduce the required texture reads in the radiosity calculation as we can get both normals from a single texture. However, since we only have 4 channels we can only store the X and Y components of each normal. We then reconstruct the Z component in the shader. This comes with a reduction in accuracy, although when looking at the original and reconstructed normals I couldn’t really tell the difference, so I doubt the implications are anything to worry about.
We use multiple render target output to write all values to three targets at the same time. This avoids having to set up multiple passes and simplifies the pipeline a little bit, which is helpful as more stages are added to the algorithm.
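The packing and reconstruction can be sketched like this. It is a minimal Python illustration with made-up function names, and it assumes unit-length normals whose Z component is non-negative (the sign of Z is lost when only X and Y are stored).

```python
import math


def pack_normals(n0, n1):
    """Store the XY of two unit normals in one 4-channel value."""
    return (n0[0], n0[1], n1[0], n1[1])


def unpack_normals(packed):
    """Rebuild both normals, assuming unit length and z >= 0."""
    def rebuild(x, y):
        # x^2 + y^2 + z^2 = 1, clamped so rounding error cannot go negative
        z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
        return (x, y, z)

    return rebuild(packed[0], packed[1]), rebuild(packed[2], packed[3])
```

Round-tripping a normal such as (0.6, 0.0, 0.8) through `pack_normals` and `unpack_normals` returns it to within floating point error.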
Custom MIP-MAP Generation
This works in exactly the same way as was carried out for the SAO algorithm. Since we are sampling many points in a large radius, the radiosity algorithm benefits from the same cache miss reduction we discussed in the AO post. Since we had already done this for downsampling the camera-space depth buffer, it was easy to apply the same process to the diffuse and normals rendered in the previous frame. The only additional work required was the inclusion of a RenderTargetBundleMipView class so that we could access a specific MIP level of all 3 targets and submit them together.
This completes all of the prep work required for the radiosity algorithm so we can now pass this downsampled data to the radiosity shader for computation.
The radiosity shader is actually quite simple. It shares a lot of the same functions that were used in the AO algorithm so not much new had to be added to get the final result. Once the sample point Y has been chosen we just sample the data from the textures and pipe that data into the equation. The only difference is that each result includes a confidence weight that tells us how many correct samples were calculated for the given point. This is important as it tells us how accurate the calculated value is likely to be. Later this confidence value is used to mix the calculated radiosity with an ambient term taken from a static environment map.
The radiosity algorithm has a few parameters we can modify, including the world-space sample radius and the total number of samples. These two values have the greatest impact on the quality of the final calculation. As you would imagine with any Monte-Carlo style calculation, the more samples we can take the more accurate the final result will be. In the same way, a smaller sample radius will create a more accurate calculation as it creates a denser area of samples. However, a smaller radius misses out on contribution from other parts of the scene and thus only creates local radiosity effects, which we don’t always want. In practice there is no perfect radius; it really depends on the effect that is required.
As discussed in the theory section we can omit the second normal test which reduces the performance cost substantially allowing for greater sample numbers. This, however, does reduce the accuracy of the final result so you would need to decide if a less noisy result is preferred over accuracy. In most cases, you may decide to take this approach as you are more likely to notice the noise over the reduction in accuracy.
To help reduce some of the noise we put the raw radiosity through some filtering. This includes temporal accumulation with the previous result as well as the same bilateral blur we used to reduce the noise in the AO.
The temporal step works in the same way as the TSAA algorithm, which requires that we add some jittering to the radiosity calculation. This is done by adding the current scene time to the calculated sampling angle, which results in a rotated sampling pattern in screen space, effectively including samples that were missed in the previous frame. We then average these new values with the previous frame, gradually accumulating more and more samples and effectively increasing the perceptual sample count over multiple frames. This works great to reduce the noise in the final result but requires more temporal reprojection to ensure we are accumulating the correct samples when the camera is moving. Just as before, since I have had to do this a few times this was relatively easy to add.
Now that we have computed and filtered the indirect lighting we can apply it in the final shading pass. As discussed previously we use the confidence value that was calculated with the radiosity to mix the result with a static environment map. At this point, I didn’t have an environment map available so I instead just mixed it with a constant ambient term, which worked in a similar fashion. Even at this stage, I found that I was able to produce some nice screenshots for maybe a couple of days’ work.
The result looks a little flat as there is no shadowing and only the constant environment term. However, I think the indirect lighting adds a nice softness to the scene, which you can see on the underside of the arches.
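A small sketch of the two pieces of the temporal step, in Python for brevity. The spiral sample angle follows the general shape used for SAO-style sampling, and the blend is written as an exponential moving average with a hypothetical `blend` weight; both are assumptions for illustration, and reprojection of the history is left out.

```python
import math


def sample_angle(sample_index, sample_count, spiral_turns, pixel_hash, time):
    """Angle of one sample on the spiral, rotated by the current scene time."""
    alpha = (sample_index + 0.5) / sample_count
    return alpha * spiral_turns * 2.0 * math.pi + pixel_hash + time


def accumulate(history, current, blend=0.1):
    """Blend the new radiosity into the (already reprojected) history."""
    return tuple(h + (c - h) * blend for h, c in zip(history, current))
```

Because the time term shifts every angle by the same amount, each frame samples a rotated copy of the pattern, and repeatedly calling `accumulate` moves the stored value towards the average of those frames.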
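The mix itself is just a linear interpolation driven by the confidence value. A minimal sketch, with hypothetical names and assuming the confidence is already in the range [0, 1]:

```python
def apply_indirect(radiosity, confidence, ambient):
    """Blend screen-space radiosity with a fallback ambient colour.

    confidence = 1 uses the radiosity as-is, confidence = 0 falls back
    entirely to the ambient term (constant colour or environment sample).
    """
    c = min(max(confidence, 0.0), 1.0)
    return tuple(a + (r - a) * c for r, a in zip(radiosity, ambient))
```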
High dynamic range is important to add contrast to the scene, as it lets us store the high-intensity direct lighting alongside the ambient environment and indirect lighting. This will help to make some of the colour bleeding stand out more. In the above image there is not much colour bleeding from the curtains onto other surfaces, unlike the frame we looked at in the previous post.
The addition of HDR was relatively simple. I moved most of the buffers to floating point formats and included a final tonemapping shader before presentation as well as including an intensity on the lights so that we could produce direct lighting outside of the [0, 1] range. I ended up using a filmic tonemapping curve that I strangely found on Twitter here. I don’t know if it is any good but includes exposure control and seemed to get a lot of likes from other developers so I just went with it.
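The filmic curve I used isn't reproduced here. As a stand-in, the sketch below uses the much simpler Reinhard operator, just to show where the exposure control and the tonemapping sit relative to the HDR values. The function name and parameters are made up for this example.

```python
def tonemap_reinhard(hdr_rgb, exposure=1.0):
    """Map HDR colour values into [0, 1) using exposure and Reinhard.

    This is NOT the filmic curve used in the project, only a simple
    substitute that behaves similarly at a high level.
    """
    exposed = [max(c, 0.0) * exposure for c in hdr_rgb]
    return tuple(c / (1.0 + c) for c in exposed)
```

Values well above 1 are compressed towards 1 instead of being clipped, so a light intensity of 10 still ends up as a displayable colour.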
One of the most important parts that is not a core step in the algorithm is shadows. They create the contrast between the direct and indirect lighting, as shadowed areas can only be lit by ambient and indirect lighting. I have pretty much already limited the project to the use of directional lights, so this limits the shadow maps to using orthographic projection only, which simplifies some potential problems. The lights in the engine are components in the scene just as any mesh or camera would be, so they already come with their own transform that we can simply apply to a camera to render our shadow map.
Since we are using orthographic shadow maps we can also compute the ideal projection matrix to reduce the amount of wasted space in the final shadow map. We do this by taking the bounding box computed for the scene and taking dot products with the camera’s Forward, Up and Right vectors. The resulting values tell us where along the light’s direction we want to place the camera and where we should place the far clipping plane. They also tell us what the orthographic height and width of the scene are from the light’s perspective.
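One way to sketch that fitting step is shown below, in Python rather than engine code. It projects the eight corners of the scene bounding box onto the light camera's axes; the function name, the use of the box corners and the choice to put the near plane at the camera position are all assumptions of this example. The three axes are assumed to be orthonormal.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_ortho_to_bounds(box_min, box_max, right, up, forward):
    """Fit an orthographic shadow camera around an axis-aligned box.

    Returns (camera_position, width, height, far_plane).
    """
    corners = [(x, y, z)
               for x in (box_min[0], box_max[0])
               for y in (box_min[1], box_max[1])
               for z in (box_min[2], box_max[2])]
    r = [dot(c, right) for c in corners]
    u = [dot(c, up) for c in corners]
    f = [dot(c, forward) for c in corners]

    width = max(r) - min(r)
    height = max(u) - min(u)
    far_plane = max(f) - min(f)  # distance from the camera to the furthest corner

    # Centre the camera on the box in the light's right/up plane and pull it
    # back along the light direction to the nearest corner.
    centre_r = 0.5 * (max(r) + min(r))
    centre_u = 0.5 * (max(u) + min(u))
    position = tuple(right[i] * centre_r + up[i] * centre_u + forward[i] * min(f)
                     for i in range(3))
    return position, width, height, far_plane
```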
Although this doesn’t end up producing fantastic shadows it at least ensures that we make as much use of the data that we have available. Finally, in the shader we apply a brute force PCF filter to slightly blur shadow edges. This again isn’t perfect but there isn’t really the time to investigate more complex methods. But for most cases, this technique produces acceptable shadows.
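A brute force PCF filter boils down to repeating the shadow test over a small neighbourhood and averaging the results. The sketch below shows the idea on a 2D list standing in for the shadow map; the kernel size, the bias value and the names are placeholders chosen for this example.

```python
def pcf_shadow(shadow_map, x, y, receiver_depth, bias=0.005, radius=1):
    """Fraction of taps around (x, y) that are lit, in [0, 1].

    shadow_map is a 2D list of depths seen from the light. Coordinates
    outside the map are clamped to the edge.
    """
    height = len(shadow_map)
    width = len(shadow_map[0])
    lit = 0
    taps = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            # Lit if nothing closer to the light was stored at this texel
            if receiver_depth - bias <= shadow_map[sy][sx]:
                lit += 1
            taps += 1
    return lit / taps
```

With `radius=1` this takes 9 taps per pixel, so pixels near a shadow edge receive a value between 0 and 1 instead of a hard step.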
The effect that both HDR and shadows have on the resulting image can be seen in the below screenshot.
Hopefully, you can see the slight red colour bleeding into the shadow just to the right of the red curtain, as well as the nice soft lighting on the shader ball in the middle.
Environment Map Lighting
One of the main issues with the above image is the darkness of the background. This is due to the fact that the further from the camera we get, the lower the sample confidence falls. To combat this we want to mix the indirect result with a sample taken from an environment map. Environment maps or cube maps are just ways of storing the 360 degrees of lighting around a single point. You can see examples of these cubemaps in the previous post, where we could see the maps that the demo app was using. I ended up using a couple of tools to filter and finally save cube maps into a dds file. Namely, I used both cmft and AMD’s CubeMapGen. These were ideal since they allowed for saving to dds, making it easy to load them into the engine using the DirectX Tool Kit, which creates our shader resource views for us. Super simple.
The lighting is relatively simple. Morgan McGuire (one of the authors of the deep G-Buffer paper) explains a technique where we can produce reasonably accurate diffuse and specular environment lighting using the standard MIP maps generated for the cube texture. For ambient diffuse lighting, we sample from the lowest MIP level using the world-space normal. The lowest MIP level gives us a single pixel which effectively stores the average colour for that direction, which is something like what we would expect from proper environmental diffuse lighting. For specular environment lighting we reflect the view vector about the normal to give us a ray in the direction of the light that would be reflected into our eye. We select a MIP level based on the roughness value at the current point. McGuire uses the specular exponent for calculating the MIP level. Since we are using roughness, I instead opted for a linear progression between MIP levels using alpha (roughness squared), the same value we used for the specular lighting calculation. This gives us the most detailed MIP level at roughness 0 and the lowest MIP level at roughness 1.
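The MIP selection and the reflection vector can be sketched as follows. This is Python standing in for the shader, with hypothetical function names; `mip_count` is assumed to be the number of MIP levels in the cube texture, and the incident vector is assumed to point from the eye towards the surface.

```python
def reflect(incident, normal):
    """Reflect an incident direction about a unit normal."""
    d = sum(i * n for i, n in zip(incident, normal))
    return tuple(i - 2.0 * d * n for i, n in zip(incident, normal))


def specular_mip_level(roughness, mip_count):
    """Linear progression through the MIP chain using alpha = roughness^2."""
    alpha = roughness * roughness
    return alpha * (mip_count - 1)


def diffuse_mip_level(mip_count):
    """Ambient diffuse always reads the smallest MIP level."""
    return mip_count - 1
```

For a cube map with 9 MIP levels, a roughness of 0.5 gives alpha 0.25 and therefore MIP level 2, while a roughness of 1 reads the same level as the diffuse lookup.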
With all of this added together, we are at a point where we can compare results between projects.
Below I have included the comparison used in the deep G-Buffer paper and one that I created using my own application.
As you can hopefully see, the final results are very similar. There are some minor differences: I used a dragon (definitely cooler), and I think I went a little OTT with the colour bleeding, but this can be dialed in with a few parameters. I think they also used a roughness texture on the curtain as my curtains look a little more uniform than theirs. The light direction is a little different, as I think they have the light pointed slightly towards the camera, and finally I think the slightly different tonemapping produces slightly different colours. However, all of that aside I would say the core aspects of the algorithm are successfully reproduced: the soft lighting on the underside of the dragon, the colour bleeding at extreme angles and the general soft lighting that you don’t get when using just direct lights.
Overall I am VERY happy with the final result. The main reason I picked the project was because of the results that they showed in the original paper, without really reading about the deep G-Buffer. I didn’t think that I would be able to produce results comparable to what they had shown so I am happy that I have managed to produce something that looks even a little bit like what they had originally shown.
I can now see how I totally underestimated the amount of work that was required for this project. I would say that bar the screen-space reflections all of the project work is complete. There are still some final improvements I can make to improve the quality. However, I will be leaving this for the showcase and will now be spending most of the time working on the dissertation, which does mean that I won’t implement the screen-space reflections. I just can’t seem to justify the additional work as I feel like I already have plenty to write about. Any additional work will likely be time that I could have spent writing the dissertation.
In the following posts I may include some additional details about the implementation, however, I will mostly focus on adding small progress results as I continue to add sections to the dissertation. So far I have most of the methodology I just need to selectively copy & paste some of this information over. Currently, with just the deep G-Buffer and AO it already hits the word count so there shouldn’t be too many issues finishing that off. I think I have quite a lot of images to include in the results, I will just need to take plenty of performance measurements to compare to those shown in the original paper. I feel like the discussion could be a little difficult to write. At the moment I am not really sure how I am meant to process the results and what my arguments are. I will need to further discuss this with my supervisor to see what they think.
I am amazed at how quickly the deadlines have been creeping up. As of this day, there are only ~4.5 weeks left before the dissertation deadline. I think this gives me plenty of time to write the dissertation. I am just worried about how much spare time I will have to discuss it with my supervisor to go over it and make improvements. Luckily I enjoy writing about the project so it should all go by quite fast.
We are now moving on to begin work on the screen-space radiosity effect. Now to help best understand how the effect works I am going to break down a single frame in the same way we did when looking at how AO worked.
This is the frame we are going to break down.
This is a much more complex example than last time. It includes everything we covered last time plus lighting and radiosity. So we will start where we did last time and have another look at the G-Buffer
So as with last time the first step is generating the deep G-Buffer. The demo is using a 2-pass depth peeling method over the single pass prediction method. However the results are still the same so this doesn’t cause any issues.
The first 2 render targets are screen-space normals and the base colour. These look as follows for each layer.
So nothing different so far as you would expect.
Now, the last time the following two render targets appeared to be empty. However, this time they have been written to and we can see that we now have both a glossiness/shininess factor and emissive.
The gloss target stores specular colour in the red, green and blue channels and smoothness in the alpha. Below you can see these terms separated, with alpha on the left and RGB on the right. Note: the levels were modified slightly for the RGB channels; the actual values are close to 0.
In the below emissive texture I again had to modify the levels as the real values were close to zero. You can see that between the tops of the building it draws a sky cubemap into the emissive target.
The cubemap used is displayed below
This information isn’t required for the second layer as it isn’t properly shaded. Most of this is already available in my own G-Buffer so not much needs to be changed. All I am missing is the emissive skybox which I should be able to add quite easily.
We covered this in plenty of detail last time around so we will skip over exactly how the algorithm works.
It still performs the custom mip generation for the camera space depth and then generates and blurs the AO producing the following texture.
Now that all of the resources have been created the scene is lit with diffuse lighting from a single directional light. This data will later be used to calculate the radiosity. The previous radiosity is included to simulate multiple indirect bounces allowing for the previous frames values to affect the current frames values.
The normals, current depth buffer, previous depth buffer, diffuse colour, previous radiosity (I’ll come to this shortly), a shadow map (that appears to be empty) and the screen-space velocity are all passed into the shader.
Looking at the uniforms passed into the shader the MVP for the light is empty so we can assume that shadows are not currently enabled (It doesn’t look like they are) however we will eventually want them in our solution.
Looking at the shader, it just performs the direct lighting calculation and then applies the indirect lighting by sampling from the previous buffer, using the screen-space velocities to reproject the correct sample. Later the specular and ambient lighting will be calculated using AO and gloss/spec. The HDR values are stored in an R11_G11_B10_Float buffer so as to avoid using any additional space. This saves space as the alpha channel wasn’t required, so its bits can be shared amongst the other channels.
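The reprojected lookup can be sketched as below. This is a simplified Python illustration, not the demo's shader: it assumes the velocity is stored in pixels and points from the previous position to the current one, it uses a nearest-pixel lookup, and it leaves out the depth comparison that would normally reject disoccluded samples. The names are hypothetical.

```python
def reproject(previous, x, y, velocity, fallback=(0.0, 0.0, 0.0)):
    """Fetch last frame's value for the surface now visible at pixel (x, y).

    previous: 2D list holding the previous frame's radiosity.
    velocity: (dx, dy) screen-space motion of this pixel, in pixels.
    """
    px = int(round(x - velocity[0]))
    py = int(round(y - velocity[1]))
    if 0 <= py < len(previous) and 0 <= px < len(previous[0]):
        return previous[py][px]
    return fallback  # the surface was off screen last frame
```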
In the following pass the second layer is shaded in the exact same fashion. However, it uses the same radiosity buffer used by the first layer.
The final shaded scenes for both layers appear as follows.
Everything looks saturated because the scene is stored in High Dynamic Range (HDR). The values in this image range as high as 10 or more, so when clamped to [0, 1] it looks a little blown out. This is sorted in a tonemapping pass before presentation.
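To illustrate why a plain clamp looks blown out, here is a minimal Python sketch comparing a hard clamp with a simple Reinhard curve. The Reinhard operator is an assumption chosen for illustration (the operator the demo actually uses was not inspected here), and the intensity values are made up.

```python
def clamp01(x):
    """Hard clamp to the displayable [0, 1] range."""
    return max(0.0, min(1.0, x))


def reinhard(x):
    """Simple Reinhard curve x / (1 + x); an assumed operator, not the demo's."""
    return x / (1.0 + x)


# Made-up HDR intensities; the last one is the "10 or more" case.
hdr_values = [0.25, 1.0, 4.0, 10.0]

clamped = [clamp01(v) for v in hdr_values]
tonemapped = [reinhard(v) for v in hdr_values]

for hdr, c, t in zip(hdr_values, clamped, tonemapped):
    print(f"hdr={hdr:5.2f}  clamped={c:.3f}  tonemapped={t:.3f}")
```

With the clamp, every value of 1.0 or above collapses to the same white, whereas the curve keeps the bright values distinct while still fitting them into the displayable range.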
In preparation for the radiosity calculation the scene data is downsampled in exactly the same way as was performed for the AO. This is also likely for the same reasons as it improves cache locality when using wide sample areas.
The downsampled data includes both shaded layers, both sets of normals and the camera-space depth again. The depth does not need to be downsampled again but I presume it was left in so that it doesn’t have a dependency on the AO being calculated previously.
One interesting point is that the normals for both layers are packed into a single RGBA8 texture. Here the Z component is dropped, so layer 1 XY are stored in RG while layer 2 XY are stored in BA. This reduces memory costs at only the added cost of reconstructing the Z component (and a minor loss of precision).
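The packing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the normals are unit length, the dropped Z is non-negative (the surface faces the camera), and the helper names are hypothetical rather than taken from the demo.

```python
import math


def encode_unorm8(v):
    """Map a value in [-1, 1] to an 8-bit integer in [0, 255]."""
    return int(round((v * 0.5 + 0.5) * 255.0))


def decode_unorm8(b):
    """Map an 8-bit integer in [0, 255] back to [-1, 1]."""
    return (b / 255.0) * 2.0 - 1.0


def pack_two_normals(n1, n2):
    """Store XY of layer 1 in RG and XY of layer 2 in BA; Z is dropped."""
    return (encode_unorm8(n1[0]), encode_unorm8(n1[1]),
            encode_unorm8(n2[0]), encode_unorm8(n2[1]))


def unpack_normal(bx, by):
    """Rebuild a unit normal from XY, assuming Z >= 0 (camera facing)."""
    x = decode_unorm8(bx)
    y = decode_unorm8(by)
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)


rgba = pack_two_normals((0.6, 0.0, 0.8), (0.0, 0.0, 1.0))
layer1 = unpack_normal(rgba[0], rgba[1])
layer2 = unpack_normal(rgba[2], rgba[3])
print(rgba, layer1, layer2)
```

The reconstructed normals differ slightly from the inputs, which is the minor loss of precision mentioned above.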
No images have been included as it doesn’t help to describe the process. Just imagine the pictures above but smaller. 🙂
Calculating Screen-Space Radiosity
The prepped data is now sent to a shader to calculate the indirect lighting. The data is written out to two RGBA16_Float buffers for storage. Here all four channels are required. RGB store the calculated bounced lighting and A stores “Ambient visibility”. I think this means how confident the sample coverage is, i.e. how visible the sample is to the indirect lighting. Areas of low confidence will be filled with static environment map samples. We will see this value being used later as the indirect lighting is used to shade the scene. The first render target stores the calculated values for the first layer while the second target is supposed to store the values for the peeled layer. However, it seems as though this is disabled here as both outputs are identical and only show the first layer.
The generated texture looks as follows
You can see that the texture is very noisy for now. This will be fixed up in the next couple of passes. You can also see that the alpha approaches 1 at edges and depth discontinuities, which is because the sample count is low in these areas. Interestingly, it looks a little bit like AO. Everything is very purple due to light bouncing off the curtains and hitting other points. Since most other areas are white, this purple colour stands out.
In the next pass the currently calculated radiosity is merged with the previous frame’s. This works almost identically to temporal AA. The last frame’s value is reprojected and blended with the new values, perceptually increasing the sample count over a few frames.
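The blend itself can be sketched as an exponential moving average. This is a minimal Python illustration for a single pixel, assuming a fixed blend factor of 0.1 (a made-up value; the weight used by the demo was not checked) and ignoring the reprojection step.

```python
def accumulate(history, current, alpha=0.1):
    """Blend the new value into the history; alpha is an assumed weight."""
    return history + (current - history) * alpha


# One pixel whose true radiosity is 1.0, starting from an empty history.
history = 0.0
for frame in range(50):
    history = accumulate(history, 1.0)

print(history)  # approaches 1.0 as more frames are accumulated
```

A small alpha smooths noise over more frames but reacts more slowly to changes in the scene, which is the usual trade-off with this kind of accumulation.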
The accumulated frame can be seen below. It still looks noisy but this will be reduced later.
To reduce the noise in the resulting radiosity it is put through a bilateral filter in the exact same way we saw with the AO. This reduces noise while preserving edges. At the same time the guard band is removed avoiding the incorrectly computed values that naturally occur at screen edges.
You can see the final blurred indirect lighting below.
This is no different from the bilateral filter used for AO so will be easy to integrate into the current engine.
In a final pass at the end motion blur is applied. Since we are not moving nothing changes and since it isn’t directly related to the project we are going to skip it.
Now that the radiosity has finally been calculated, the scene is shaded using the calculated data, including Gloss/Spec, Emissive and AO.
As previously mentioned, the radiosity alpha channel includes a confidence value. We can see in this pass that an environment map is included that will be used to fill areas of low confidence.
I’m not sure if this is created on startup or was pre-generated and is loaded from disk. Either way we will want to find a way to generate some form of cubemap using the local scene to ensure filler values are accurate. This one is rendered from the perspective of the angel in the middle of the scene. This ensures lighting is accurate for it but would be wrong for objects in different locations. This is not too much of an issue as it is only a filler. The fact that the main values are purple helps to ensure some continuity between screen-space and cubemap radiosity.
The lighting is very similar to what we are using now except for the environment mapping. This should not be too hard to include. Morgan McGuire has written a paper about simple estimation of correct diffuse and glossy ambient lighting which is likely the same as he is using here.
The final shaded scene now looks almost identical to the capture at the top of this page. All data is still stored in HDR using the R11_G11_B10_Float buffers.
The blue border you can see is the guard band which is cropped out later.
Finally, tonemapping and a little bit of bloom are applied to the final image.
The bloom works by first tonemapping the image into the correct range before downsampling to half size along both axes, a total 4x shrinkage.
Finally with the bloom added back in we get the resulting image we saw at the start of the frame. Very nice!
The final result looks pretty cool. Especially when you compare it to static environment maps.
There is a lot of work to do. But it is not all that bad. As far as I can see these are the only tasks that need to be completed.
- Manual mipmap generation for colour and normals
- Scene lighting computed in HDR
- Generate radiosity from the lit scene.
- Shade scene using calculated radiosity (Excluding env map)
- Add env map to shading
- Bloom (Optional)
Quite a lot of work, but a lot of stuff is already in place. Mipmap generation will run the same shader as used for AO. HDR lighting can be achieved by changing a few parameters. And shading the scene with the calculated radiosity should just be an extension of what we are already doing.
The tricky bit will be getting the radiosity calculation correct and properly accumulating it. The environment mapping should also be easy. If worst comes to worst I can just nick the cubemap used for this demo.
Next entry I believe I should have most of the work done. I will give an explanation of how the indirect lighting works and hopefully demonstrate some screen shots.
A bit of a weird title, but smoothing is a theme of the work I have been doing this week. At the end of last week’s entry I said that I would improve the quality and performance of the AO as well as begin work on an antialiasing solution for the project.
And as promised I have managed to get both tasks completed, so I wanted to briefly discuss some changes and show off some results. So without further ado, I will begin with a look at the changes made to the AO pass.
The generated ambient occlusion I showed in the last post was accurate but very noisy. This is in part due to a low sample count (only 6 samples at the time) and that there was no smoothing applied. With any AO algorithm, there will always be some noise as to get complete convergence would require really high sample counts which we can’t afford for real-time applications. So, to combat this we apply a blur to the calculated output to smooth out the AO into undersampled areas. For this, we use a bilateral blur filter. A bilateral filter is one that smooths noise while preserving hard edges. This is important as although we want our AO to be smooth we also want it to be crisp. We don’t want occlusion leaking between foreground and background objects as this is physically incorrect and would cause temporal instability as different objects overlapped. Since this project is all about screen-space temporal stability we want to ensure that we can avoid this at all costs.
The bilateral filter works by comparing the depth of each sampled value in our kernel with the depth of the central pixel we are calculating the blur for. We then scale the contribution that the sample has depending on how similar the depths are. For major depth discontinuities close to edges, the background pixels will have zero contribution to the final value, avoiding leaking between foreground and background objects. The blur used is a separable Gaussian blur with a width and height of 15 pixels. This is quite a wide kernel, so it smooths the noise out really well. On top of this, to improve total quality, the number of samples has been pushed to 20 instead of 6. This takes the total evaluation time for the AO up to 1.3ms from 0.43ms (including the two blur passes the total time is roughly 1.6ms). These values are still with optimisations disabled, so they could improve marginally in a release build.
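One pass of such a filter can be sketched in Python as follows. This is a simplified illustration rather than the shader from the project: it runs over a single row, uses a radius of 7 to give the 15 pixel width mentioned above, and the exponential depth weight along with the `sigma` and `depth_sharpness` values are assumptions.

```python
import math


def bilateral_blur_1d(ao, depth, radius=7, sigma=3.0, depth_sharpness=100.0):
    """One pass of a depth-aware Gaussian blur over a row of AO values."""
    n = len(ao)
    result = []
    for i in range(n):
        total = 0.0
        weight_sum = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), n - 1)  # clamp to the row edges
            spatial = math.exp(-(k * k) / (2.0 * sigma * sigma))
            dz = depth[j] - depth[i]
            # Samples at a very different depth get a weight close to zero.
            similarity = math.exp(-depth_sharpness * dz * dz)
            weight = spatial * similarity
            total += ao[j] * weight
            weight_sum += weight
        result.append(total / weight_sum)
    return result


# Noisy foreground (depth 0.1) next to a clean background (depth 0.9).
ao_row = [0.0, 0.4] * 4 + [1.0] * 8
depth_row = [0.1] * 8 + [0.9] * 8
print([round(v, 3) for v in bilateral_blur_1d(ao_row, depth_row)])
```

Running the same function over rows and then over columns gives the two separable passes. In the example the noisy half is smoothed while the clean half stays at 1.0, because samples across the depth edge contribute almost nothing.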
Below are captures of a frame showing the difference between each blur step.
I suggest opening the picture in a new window to see the full res image. Hopefully, you can see how the AO has been smoothed out substantially by the final image. Below is a close-up comparison of one of the leaves. Showing how the sharp edge has been preserved through the blur.
It looks a little blurrier than it should as the image has been scaled up. However, you should be able to see that the crisp edge of the leaf has been preserved by the end of the blur while the noise on its surface has been smoothed out.
You may also be saying to yourself “Damn those are some crisp edges!”. This is due to the Temporal Supersampled Antialiasing (TSAA), which is the topic of the next section.
Just as a quick little extra piece of info before moving on to the next section: I was finding that there were some ugly artefacts produced when blurring the AO on first implementation. I knew that this had something to do with a lack of depth precision giving us slightly odd blur weights in some areas. Since the AO is stored in a four channel texture where Red is the AO term and Blue is our normalised depth, we have only given 8 bits of precision to our depth, giving us only 256 discrete values. For large scenes, this is nowhere near enough and thus produces these weird artefacts. So after reading a bit, specifically here in the algorithm overview, I found out that we could pack 16-bit precision depth into two channels of an 8 bit per channel texture. This gives us 65,536 potential values and we don’t use up any additional memory as those channels were there but never used. Below is a comparison of 8-bit precision and 16-bit precision respectively.
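The idea behind the packing can be sketched in Python. This is an integer version written for clarity; a shader would do the equivalent with floating point arithmetic, and the function names here are hypothetical.

```python
def pack_depth16(depth):
    """Split a normalised depth in [0, 1] into a high and a low byte."""
    clamped = max(0.0, min(1.0, depth))
    quantised = int(round(clamped * 65535.0))
    return quantised >> 8, quantised & 0xFF


def unpack_depth16(high, low):
    """Rebuild the normalised depth from the two 8-bit channels."""
    return ((high << 8) | low) / 65535.0


depth = 0.123456
high, low = pack_depth16(depth)
error_16bit = abs(unpack_depth16(high, low) - depth)
error_8bit = abs(round(depth * 255.0) / 255.0 - depth)
print(high, low, error_16bit, error_8bit)
```

For this sample value the 16-bit round trip is off by only a few millionths, while the single 8-bit channel is off by roughly two thousandths, which is the kind of jump that shows up as streaks in the blur weights.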
In example 1 there is a streak on the left pillar due to a large jump in depth value on a smooth surface giving us unwanted contrast.
In example 2 there is a similar streak.
In example 3 you can see that with greater precision we get greater edge preservation. Giving us overall crisper AO at edges.
These are great improvements for no more than 3 minutes work.
Because we are using a deferred renderer hardware Multisampled Antialiasing (MSAA) is not feasible due to the additional memory requirements. So instead we need to apply our own AA as a post process.
Temporal Supersampled Antialiasing works by applying sub-pixel offsets to the camera projection and then accumulating the current frame with the previous frame. This allows us to spread the cost over every frame so the additional overhead per frame is tiny. This makes it both simple to implement and easy to integrate into any pipeline. This simple average between frames works well for static scenes however when the camera begins to move we get what’s called “Ghosting”.
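A sub-pixel offset sequence can be sketched in Python as below. The Halton (2, 3) sequence and the 8 frame period are assumptions picked for illustration, since the pattern actually used is not stated here, and `jitter_offset` is a hypothetical helper. The returned offset is in normalised device coordinates, where one pixel is 2 / width wide, and would be added as a translation to the projection.

```python
def halton(index, base):
    """Return element `index` (1-based) of the Halton sequence for `base`."""
    result = 0.0
    fraction = 1.0 / base
    i = index
    while i > 0:
        result += fraction * (i % base)
        i //= base
        fraction /= base
    return result


def jitter_offset(frame, width, height, period=8):
    """Sub-pixel jitter for a frame, expressed in NDC units."""
    i = (frame % period) + 1
    offset_x = halton(i, 2) - 0.5  # in pixels, within [-0.5, 0.5)
    offset_y = halton(i, 3) - 0.5
    return (2.0 * offset_x / width, 2.0 * offset_y / height)


for frame in range(4):
    print(frame, jitter_offset(frame, 1920, 1080))
```

Because the offsets never exceed half a pixel, the image stays in place while each frame samples a slightly different point inside every pixel.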
This happens because the colour for the current frame is in a different position than it was in the previous frame, so we accumulate with the wrong colour. To overcome this we need to know where the current pixel was in the last frame. Well, luckily for us, we have already calculated the previous location when sampling the depth for generating the deep G-Buffer. So in our final accumulation stage, we just need to offset our sample location by the screen-space velocity of the current pixel that we calculated earlier, making this an even simpler task.
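The effect of the velocity offset can be sketched in Python on a tiny image. This is an illustration under assumptions: velocity is measured in whole pixels as current position minus previous position, the blend weight of 0.5 is made up, and pixels whose history falls off-screen simply keep the current colour.

```python
def reproject_and_blend(current, history, velocity, alpha=0.5):
    """Blend each pixel with the history sample it came from last frame."""
    height = len(current)
    width = len(current[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            vx, vy = velocity[y][x]
            prev_x = x - int(round(vx))
            prev_y = y - int(round(vy))
            if 0 <= prev_x < width and 0 <= prev_y < height:
                output[y][x] = (current[y][x] * alpha
                                + history[prev_y][prev_x] * (1.0 - alpha))
            else:
                output[y][x] = current[y][x]  # no valid history
    return output


# A bright pixel moves one step to the right between frames.
history = [[0.0, 1.0, 0.0, 0.0]]
current = [[0.0, 0.0, 1.0, 0.0]]
moving = [[(1.0, 0.0)] * 4]
static = [[(0.0, 0.0)] * 4]

print(reproject_and_blend(current, history, moving))  # no ghost
print(reproject_and_blend(current, history, static))  # ghost left behind
```

With the correct velocity the bright pixel stays sharp, while ignoring the velocity leaves a half-intensity copy at its old position, which is the ghosting described above.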
The results are also very impressive. Below is a comparison of the same scene with AA disabled and AA enabled. (It’ll be easier to see if you open the image in another tab)
The edge of the teapot is a lot smoother, as are the edges of the fabric in the background. Although aliasing is not always obvious, AA helps to improve the smoothness of the overall image.
I didn’t want to go into too much detail about TSAA as it is not a core part of this research but is interesting to look at. The results are pretty impressive. However, as the camera projection jitters, it can make some parts of the image flicker as pixels change colour. A more complex version of the algorithm has been used in Unreal Engine 4 which can help to avoid this flickering as well as improve the general quality of the final image. At this time it is not all that important, but if there is time to spare towards the end of the project I hope to look further at adopting UE4’s technique to get the best possible quality.
One remaining issue with the AO that I would like to remedy is the banding that you can see on smooth surfaces, such as the pillars on either side of the lion’s head.
This happens because we are using face normals for the AO pass rather than smooth interpolated normals. I will switch to using interpolated normals and see how it performs. We were using face normals as they give more accurate normals than can be stored in an RGBA8 texture; however, smooth normals would be preferable over accuracy.
I am now in a position where I feel ready to start work on the indirect illumination. This will be a lot of work but I am excited to get some pretty looking images. I am already quite impressed by how good the current pictures look so I am looking forward to seeing how far I can push it. This will also include finding even more assets to include other than the teapot and Sponza.
The indirect illumination work will start in a similar way to how I did AO. I will first capture a frame from the demo to see how it works in the demo. I then hope to get it working for single bounce illumination before moving on to multiple bounces. These topics will likely be the themes for the next three entries in this journal.
So as promised I will now go over how the Ambient Obscurance (AO) algorithm works, its advantages compared to other algorithms, and how it is implemented in my engine. And as usual, I will follow up with some screenshots.
As I briefly mentioned in a previous post where I did a frame breakdown of the deep G-Buffer demo, the chosen algorithm is the Scalable Ambient Obscurance (SAO) algorithm. The researchers who wrote the deep G-Buffers also happened to write the SAO paper and some even worked on the paper SAO was based on (AlchemyAO). For additional details, you can find all papers at the following links: Deep G-Buffers, SAO, AlchemyAO.
NOTE: All following images and diagrams are taken from these papers or presentations unless otherwise specified.
It is best to define what we are talking about before getting into the details. So, what is AO? Well, Ambient Occlusion or Ambient Obscurance (I’ll shortly get into the difference) is a global illumination effect that approximates how exposed a point in the scene is from ambient or environmental lighting. In practice, it darkens cracks, corners and any surface that is in some way sheltered. For example less light will reach the area under your feet, creating a darkened area where you stand. This is often called a contact shadow and ensures that objects in the scene look grounded and not floating.
Here is an example from a presentation by John Hable about the AO used in Uncharted 2.
It is hard to place exactly where the guard is standing without the contact shadows under foot. It almost looks like he is floating.
So what is the difference between Ambient Occlusion and Ambient Obscurance? To be honest, I am not 100% sure that I understand the difference but I will attempt my best to explain the difference as I understand it.
Ambient Occlusion is the calculated occlusion over the hemisphere at a single point. Effectively giving us a value from 0 to 1 describing how difficult it is for ambient light to reach that point. A value of 1 means no ambient light will reach that point and a value of 0 means that the point is in no way occluded and all ambient light can reach that point from any direction within the local hemisphere. The image below taken from the Alchemy AO paper helps to explain this.
Here C is the point we are calculating occlusion for. n is the normal for the point and you can see the hemisphere we are testing that expands perpendicular to the surface normal. As this point is in a deep crevice the calculated occlusion will likely be quite high.
Ambient Obscurance is a modern form of ambient occlusion. The Alchemy paper gives the definition as “Ambient Obscurance: is a modern form of ambient occlusion, where the impact of the term falls off with distance so as to smoothly blend high-frequency occlusion discovered by the ambient obscurance algorithm with the occlusion discovered by a lower-frequency global illumination algorithm” I can’t totally decipher exactly what this means, but I believe it is saying that Ambient Obscurance is a more complex modification of Ambient Occlusion to allow for greater artistic control and easier integration with existing global illumination solvers.
Anyway, now that we have some of the definitions out of the way, let’s get onto the algorithms.
Since a lot of the work presented in the SAO paper relies on an understanding of the Alchemy AO algorithm I will briefly explain the aims and techniques discussed in the paper.
The technique was published by Vicarious Visions for one of their games in development at the time. They needed an AO algorithm that was able to produce robust occlusion and contact shadows for large and small scale occluders and one that was easily integrated with a deferred renderer. They found that most AO algorithms available at the time either failed for fine details as they were required to work at a lower resolution before upsampling. Or, that they failed to produce accurate contact shadows that stuck to the outline of the object. Below you can see a comparison of the Volumetric Obscurance algorithm and the Alchemy AO algorithm.
Comparison of volumetric obscurance (left) and Alchemy AO (right). Notice the floating shadows at a, whereas at b they contour to the shape of the object.
The developers derived their own occlusion estimator to calculate the occlusion at a sample point. They included modifiable parameters to allow for artistic control of the final result. The final version of the equation is shown below.
To calculate A, we take a sum of the calculated occlusion for a number of points Q within a world space radius of the point we are calculating for C. Here s is the number of samples and v is the vector from our point C to the randomly selected point Q. Taking the dot product with the normal at point C (n) will tell us where the point sits on the tangent plane perpendicular to n. For example if we imagine a point sitting on the same plane as C, vector v will be perpendicular to the normal and thus return a dot product of zero, meaning no occlusion. Intuitively we can understand this as a point on the same plane as C will not be able to occlude it. The diagram below helps to explain this further.
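The sum described above can be sketched in Python. The formula in the docstring is the estimator as I recall it from the AlchemyAO paper, so treat it as an assumption and check it against the paper; the parameter names `sigma`, `beta`, `epsilon` and `k` and their default values are illustrative. Note that in this form A is a visibility term, so 1 means unoccluded and 0 means fully occluded.

```python
def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def alchemy_obscurance(c, n, samples, sigma=1.0, beta=1e-4,
                       epsilon=1e-2, k=1.0):
    """A = max(0, 1 - (2*sigma/s) * sum(max(0, v.n + z_C*beta) / (v.v + eps)))^k

    c is the camera-space point (z is negative in front of the camera),
    n its unit normal and samples the nearby camera-space points Q.
    """
    total = 0.0
    for q in samples:
        v = tuple(qi - ci for qi, ci in zip(q, c))
        total += max(0.0, dot(v, n) + c[2] * beta) / (dot(v, v) + epsilon)
    s = len(samples)
    return max(0.0, 1.0 - (2.0 * sigma / s) * total) ** k


c = (0.0, 0.0, -5.0)
n = (0.0, 0.0, 1.0)

# Points on the same plane as C: v is perpendicular to n, so no occlusion.
on_plane = [(1.0, 0.0, -5.0), (0.0, 1.0, -5.0)]
# Points raised above the tangent plane occlude C.
raised = [(0.5, 0.0, -4.5), (-0.5, 0.0, -4.5)]

print(alchemy_obscurance(c, n, on_plane))
print(alchemy_obscurance(c, n, raised, sigma=0.25))
```

The `sigma` and `k` parameters are the sort of artistic controls mentioned earlier: they scale and shape the result without changing which samples count as occluders.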
So now that we understand some of the mathematics, at least on a basic level, let’s move on to discuss the implementation of the algorithm.
After the G-Buffer is generated, the depth buffer and camera-space normals are passed to a full-screen shader. For each pixel the shader projects the world-space sample radius (aka the size of the hemisphere aligned with the tangent plane) into a screen-space radius in pixels. Using a hash function of the pixel coordinates, the shader selects a random rotation angle that is used to select random sampling points within the projected screen-space disk. For each sampled point it extracts the camera-space Z from the depth buffer and runs the equation to calculate the occlusion. Finally the sum is divided by the total number of samples and the normalised occlusion value is returned. And that’s it really. In reality it is fairly simple; it just takes time to modify the parameters to get the desired artistic result.
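The projection of the world-space radius into pixels can be sketched in Python. This assumes a standard perspective projection described by a vertical field of view; the function names are hypothetical and the numbers are made up.

```python
import math


def projection_scale(image_height_px, vertical_fov_rad):
    """Pixels covered by an object 1 unit tall at 1 unit from the camera."""
    return image_height_px / (2.0 * math.tan(vertical_fov_rad * 0.5))


def screen_space_radius(world_radius, camera_z, proj_scale):
    """Radius in pixels of a world-space sample radius at depth camera_z."""
    return world_radius * proj_scale / abs(camera_z)


scale = projection_scale(1000, math.radians(90.0))
print(screen_space_radius(1.0, -5.0, scale))   # about 100 pixels
print(screen_space_radius(1.0, -10.0, scale))  # about 50 pixels
```

The pixel radius grows as surfaces get closer to the camera, which is why nearby geometry ends up sampling a wide area of the depth buffer.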
The algorithm scales with the number of samples, the sample radius and the screen resolution. They found that as the sample radius increased the number of texture cache misses increased, thus increasing the render time. This is one of the main drawbacks of the algorithm, which is where the need for our next algorithm, SAO, came about.
Scalable Ambient Obscurance
SAO builds upon the work produced for the Alchemy AO algorithm, leaving all the maths the same just making some tweaks to the way in which the algorithm is implemented. As discussed previously the main drawback of Alchemy AO was the limit on the sample radius, really only allowing for calculation of local occlusion effects. SAO presents a set of techniques that allow the algorithm to work for both forward and deferred renderers, as well as guarantee a fixed execution time independent of scene complexity and sample radius.
There are 3 core optimisations presented in the paper. First, they discuss methods to ensure accuracy of reconstructed depths is maintained throughout a frame. Second, they show that by pre-filtering the depth into a mip chain larger radii can be used with reduced cache misses by selecting the most efficient mip level. Finally they show that more accurate results can be generated when calculating face normals in the shader from the depth buffer. They show that the error in calculated normal is 0.2 degrees which is less inaccuracy than you get from reading normals from an RGBA8 texture.
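The third point can be sketched in Python: once camera-space positions have been reconstructed from depth, a face normal is the normalised cross product of the position differences to the neighbouring pixels. A shader would typically use screen-space derivatives for this; the grid layout and helper names below are assumptions for illustration.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / length, v[1] / length, v[2] / length)


def face_normal(positions, x, y):
    """Normal at pixel (x, y) from differences to the right and upper pixels."""
    p = positions[y][x]
    dp_dx = sub(positions[y][x + 1], p)
    dp_dy = sub(positions[y + 1][x], p)
    return normalize(cross(dp_dx, dp_dy))


# Camera-space positions for a wall facing the camera and a tilted wall.
flat = [[(float(x), float(y), -5.0) for x in range(3)] for y in range(3)]
tilted = [[(float(x), float(y), -5.0 + x) for x in range(3)] for y in range(3)]

print(face_normal(flat, 1, 1))    # points straight back at the camera
print(face_normal(tilted, 1, 1))  # tilted away along x
```

Because the result is constant across each triangle, this gives face normals rather than smooth interpolated ones.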
As mentioned above these optimisations together manage to ensure a constant execution time independent of scene complexity or sample radius. As shown below.
Deep G-Buffer AO
SAO still suffers from common issues that occur with any screen-space technique. Specifically, they are not temporally stable and generally miss information for partially occluded surfaces. Below is an example taken from the deep G-Buffer paper. You can see where AO has been missed by the single layer on the left and corrected by the 2-layer G-Buffer on the right.
As you can see partially occluded surfaces can be prone to “ghosting” where you find halos around foreground objects. The toaster is a good example as you see that the oven is white at the edges of the toaster as the depth information behind it is not available.
The modifications to the technique are relatively minor. When sampling our test point Q it samples from depth for both the first and second layer. It then calculates occlusion due to these two points and takes whichever contributes the most. If you can imagine the example above, when sampling for the edge of the oven the shader will end up sampling from the toaster in the foreground which doesn’t contribute to the occlusion. While in the second example it will have access to the position behind the toaster, likely close to the oven which will contribute to the occlusion.
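The per-sample change can be sketched in Python. The `contribution` function below is an assumed stand-in for the per-sample occlusion term (a max(0, v.n + bias) / (v.v + epsilon) form with made-up constants), and the positions are invented to mimic the toaster and oven example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contribution(c, n, q, beta=1e-4, epsilon=1e-2):
    """Occlusion contributed by sample point q to point c (assumed form)."""
    v = tuple(qi - ci for qi, ci in zip(q, c))
    return max(0.0, dot(v, n) + c[2] * beta) / (dot(v, v) + epsilon)


def two_layer_contribution(c, n, q_layer1, q_layer2):
    """Evaluate both layers at the sample and keep the larger contribution."""
    return max(contribution(c, n, q_layer1), contribution(c, n, q_layer2))


c = (0.0, 0.0, -5.0)            # point on the oven being shaded
n = (0.0, 0.0, 1.0)
toaster = (0.2, 0.0, -2.0)      # first layer: far in front of c
behind = (0.2, 0.0, -4.9)       # second layer: just in front of c

print(contribution(c, n, toaster))
print(two_layer_contribution(c, n, toaster, behind))
```

The first-layer sample is far from the shaded point and adds little, while the second-layer sample right next to it dominates, so taking the maximum recovers the occlusion that a single layer misses.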
Using multiple layers will affect the run-time costs but the benefits are quite impressive. Although it may appear as only some minor differences between the two images, what you don’t see is that the inaccurate areas will change as the camera moves, creating temporal inconsistency as areas flash dark and bright as objects become occluded.
Integration with my framework
Finally now that we understand how it works it’s time to look at some examples taken from my own implementation. As well as some discussion on problem areas and where the project progresses from here.
Below are some examples of the difference between single and double layer AO.
From the above you can see that we are missing some AO around the edges of the pillar in the first picture and similarly from inside the bowl in the second picture.
The final result is very noisy and that is due to a low sample count. At the moment I am using 6, whereas the deep G-Buffer demo is using 50. It is also because I have yet to implement the final depth aware blur that diffuses some of the noise. I just wanted to pick something simple for now to display the results. I will follow up with some additional screen shots of the final version with higher samples and blurring. This example ran the whole AO pass in 0.43 ms, for just 6 samples that is quite slow. However, the shader is running in debug mode and relies on a few runtime parameters. Eventually I will run a test with optimisations enabled and reduce the reliance on runtime variables allowing the shader compiler to make further optimisations.
Just to show off a little. I have included a little gif of the fully shaded scene using the calculated AO to modulate an ambient term. You can see as the scene switches to AO enabled and disabled, the impact that it has on the visual quality.
Again, please excuse the noise. That will be fixed in the next set of changes.
I am happy with the final implementation; I feel it produces some appealing results. I am also happy with how I have managed to grasp the concept and hope that I have managed to give an understandable explanation of how the algorithm works. From here I will make some modifications to ensure performance and quality are up to standard. I will then move on to look at Temporal Supersampling Antialiasing (TSAA) before moving on to looking at implementing indirect lighting. In a similar way to what I did for AO, I will break down a frame from the demo to get a better understanding before re-reading the papers and attempting an implementation. I have fallen behind where I really should be with the written work so I hope to make some progress there in the next couple of weeks. This has ended up as a longer post than I had intended (2057 words) so I will likely only post a small update for next week unless something interesting happens.
In the last post, I said that the next post would be a thorough explanation of the AO algorithm. I am afraid I have lied to you as this post is not about AO. I will, however, eventually follow up with more details on my previous post just not yet. For now, I would just like to quickly give an update on my progress over the last week and how the project is shaping up.
Looking at the estimated project schedule I submitted as part of the proposal, as of the 24th of Feb I should have finished the full implementation of AO, indirect lighting and ray traced reflections. Since I haven’t even properly started the AO implementation it is quite easy to see that my initial estimates were a tad ambitious. I think when planning this schedule I totally disregarded the fact that I would have other modules to focus on. I also hadn’t really thought about finding a job, which has been where quite a lot of my time has been spent this semester. However, I feel like things are starting to fall into place, and I believe that I should be able to make rapid progress through these tasks if I apply myself properly.
I will now sum up the progress I have made over the last couple of days. Giving insight into how I think the remainder of the work will play out.
So far everything I have demonstrated has used the Utah teapot. Proper testing of my implementation will require a scene far more complex than a single teapot. So, I have managed to convert and import the Crytek Sponza test model. This is important as it is the scene used in the demo provided with the paper. This will allow direct comparisons between the two projects.
On top of getting the model imported, I have made some good progress with the deferred shading. Taking the same approach as I had achieved with the forward lighting. I have attached some images below to demonstrate these accomplishments.
Here you can see two different viewpoints. The first image is the scene shaded, while the two images after are the camera-space normals for the first and second layer respectively. The lighting isn’t all that great and I am still pretty sure there are some errors in the shading. However, I am pleased that everything is working and we are getting some interesting images. Seeing two-layer separation working for complex models is exciting. I am interested to see how the project progresses from here.
Everything is now in a good place to begin work on the implementation of Ambient Occlusion. I have all necessary data stored in the G-Buffer, ready to start post-processing. So as promised I will follow this up with some additional information about the AO algorithm.
Included with the deep G-Buffer research paper is a demo along with some source code. Working through the source code is a little tricky as it is quite a complicated framework. So instead I have decided to use RenderDoc graphics debugger to analyse a single frame of the application. I will list my findings here out of interest and for future reference. Understanding how the technique was originally implemented will help me in implementing it in my own framework.
The frame I am going to analyse is shown below.
It is just a simple display of the dual layer ambient occlusion. I will follow the frame through step by step to understand exactly how the image was produced. I have started with a simple example and will eventually move on to cover the whole process including AO, reflections and indirect lighting.
As is normal with many applications it first begins by clearing all render targets and depth buffers. In this case, there are 9 different render targets and 2 depth buffers that are all cleared in preparation for the frame. A lot of these targets are currently redundant. Potentially because we are just looking at AO on its own. Or it’s just the nature of the engine that it binds unused render targets.
Anyway, after clearing all resources it generates the G-Buffer. Here the application is using the 2 pass depth peeling method rather than the single pass reprojection method mentioned in the previous post. This is likely used as there is no prediction involved in generating the second layer, thus avoiding potential artefacts. This doesn’t affect how we look at the rest of the frame as we would have the same information either way, just generated using different methods.
For the first layer generation, there are 6 total render targets bound. Render target 0 contains the view space normals while render target 1 contains diffuse colour. All other render targets appear to be empty bar render target 5 which is a uniform brownish colour.
Only 2 render targets are bound for the second layer, in the same layout as before. 0 contains view space normals while 1 contains diffuse colour. This tells us that the uniform brown from the first generation is likely of little importance.
After looking at the shader bound to the pipeline it tells us that render target 5 contains screen-space motion vectors. We can understand why this is brown by looking at the given colour (0.5, 0.5, 0.0) and deducing that it is this way because we are looking at a stationary image. Under movement, we would get varying values depending on how much the objects moved in the frame. The values for x and y are 0.5 because they have been remapped from the [-1, 1] range to fit into a normalised texture. On calculation, the values would have been 0, denoting no movement.
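The remapping is easy to sketch in Python. The function names are hypothetical, and the blue channel is simply left at zero to match the colour seen in the capture.

```python
def encode_motion(vx, vy):
    """Remap a motion vector from [-1, 1] to a normalised [0, 1] colour."""
    return (vx * 0.5 + 0.5, vy * 0.5 + 0.5, 0.0)


def decode_motion(colour):
    """Recover the motion vector from the stored colour."""
    return (colour[0] * 2.0 - 1.0, colour[1] * 2.0 - 1.0)


print(encode_motion(0.0, 0.0))                   # (0.5, 0.5, 0.0): the brown
print(decode_motion(encode_motion(0.25, -0.5)))  # (0.25, -0.5)
```

A stationary pixel therefore encodes to exactly the uniform colour observed in render target 5.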
Now that the G-Buffer has been generated we can use it to calculate the AO.
Calculating Ambient Occlusion
What’s interesting is that pretty much everything I just explained is pointless as none of it is required for calculating ambient occlusion (Sometimes the normals are used in AO calculation). However, what was needed was the depth buffers for each of the layers which were automatically generated as part of the rasterization process. The depths of each of the layers are used to generate the camera-space Z coordinates for each pixel. In a simple pre-pass both depth buffers are taken as input and are used to reconstruct the camera-space Z coordinate for each layer which are then saved into both channels of an RG16F render target. (RG16F meaning a 16 bit per channel floating point format with only a red and green channel)
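The reconstruction can be sketched in Python for one common setup. This assumes a standard OpenGL-style perspective projection with the depth buffer in [0, 1] and the camera looking down negative Z; the demo may use different conventions, and the near and far values below are made up.

```python
def camera_z_from_depth(depth, near, far):
    """Convert a [0, 1] depth buffer value to camera-space Z (negative)."""
    return -(near * far) / (far - depth * (far - near))


def depth_from_camera_z(camera_z, near, far):
    """Inverse mapping, useful for checking the reconstruction."""
    distance = -camera_z
    return far * (distance - near) / (distance * (far - near))


near, far = 0.1, 100.0
print(camera_z_from_depth(0.0, near, far))  # about -0.1, the near plane
print(camera_z_from_depth(1.0, near, far))  # about -100, the far plane
print(camera_z_from_depth(depth_from_camera_z(-10.0, near, far), near, far))
```

Depth buffer values are heavily skewed towards 1.0 (a point 10 units away already stores roughly 0.99 here), which is one reason for converting to linear camera-space Z before doing any distance comparisons.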
The depths for both the first and second layer are displayed below. The values have been scaled slightly to make the visualisation easier. Here darker values are closer to the camera.
Visualising the generated camera space Z buffer would look similar to this in principle so is not shown.
Now that all required information is available to the application the ambient occlusion can be calculated.
The application uses a modified version of the Scalable Ambient Obscurance algorithm explained here. Funnily enough published by the same authors. The algorithm is complex and thus explanation of how it works will be left for another post.
The generated AO can be seen below. Everything looks red since it is just stored in a single channel of a texture. The final black and white image is just a visualisation of this calculated AO term.
As you can see the generated image is quite noisy. To reduce the noisiness of the image it is put through a separable Gaussian blur.
As you can see from this trio of images the noise is reduced quite heavily.
And finally this blurred AO is converted to greyscale to generate the final image. (Plus some UI)
The process is relatively simple. The most complex part is the AO generation. I do feel that by taking a look at how the application works I understand more clearly what I need to add to my own engine. In simple terms, the process can be summarised in the following steps.
- Generate deep G-Buffer
- Convert depth to camera-space Z
- Generate AO
After the final blur, the AO texture is ready for use. In the following post, I will go into a deeper explanation of how the AO generation works and what was modified to make it work with deep G-Buffers.