Tracing
Gregory Pakosz reminded me to write a follow up on my path tracing efforts since my last post on the subject.
It's good timing because the friendly work-place competition between Tom and me has been in full swing. The great thing about ray tracing is that there are many opportunities for optimisation at all levels of computation. This keeps you "hooked" by constantly offering decent speed increases for relatively little effort.
Tom had an existing BIH (bounding interval hierarchy) implementation that was doing a pretty good job, so I had some catching up to do. Previously I had a positive experience using a BVH (AABB tree) in a games context so decided to go that route.
Our benchmark scene was Crytek's Sponza with the camera positioned in the center of the model looking down the z-axis. This might not be the most representative case but was good enough for comparing primary ray speeds.
Here's a rough timeline of the performance progress (all timings were taken from my 2.6ghz i7 running 8 worker threads):
Optimisation | Rays/second |
---|---|
Baseline (median split) | 91246 |
Tweak compiler settings (/fp:fast /sse2 /Ot) | 137486 |
Non-recursive traversal | 145847 |
Traverse closest branch first | 146822 |
Surface area heuristic | 1.27589e+006 |
Surface area heuristic (exhaustive) | 1.9375e+006 |
Optimized ray-AABB | 2.14232e+006 |
VS2008 to VS2010 | 2.47746e+006 |
You can see the massive difference tree quality has on performance. What I found surprising though was the effect switching to VS2010 had, 15% faster is impressive for a single compiler revision.
I played around with a quantized BVH which reduced node size from 32 bytes to 11 but I couldn't get the decrease in cache traffic to outweigh the cost in decoding the nodes. If anyone has had success with this I'd be interested in the details.
Algorithmically it is a uni-directional path tracer with multiple importance sampling. Of course importance sampling doesn't make individual samples faster but allows you to take less total samples than you would have to otherwise.
So, time for some pictures:
Despite being the lowest poly models, Sponza (200k triangles) and the classroom (250k triangles) were by far the most difficult for the renderer; they both took 10+ hours and still have visible noise. In contrast the gold statuette (10 million triangles) took only 20 mins to converge!
This is mainly because the architectural models have a mixture of very large and very small polygons which creates deep trees with large nodes near the root. I think a kd-tree which splits or duplicates primitives might be more effective in this case.
A fun way to break your spatial hierarchy is simply to add a ground plane. Until I performed an exhaustive split search adding a large two triangle ground plane could slow down tracing by as much as 50%.
Of course these numbers are peanuts compared to what people are getting with GPU or SIMD packet tracers, Timo Aila reports speeds of 142 million rays/second on similar scenes using a GPU tracer in this paper.
Writing a path tracer has been a great education for me and I would encourage anyone interested in getting a better grasp on computer graphics to get a copy of PBRT and have a go at it. It's easy to get started and seeing the finished product is hugely rewarding.
Model credits:
Sponza - Crytek
Classroom - LuxRender
Thai Statuette, Dragon, Bunny, Lucy - Stanford scanning repository