Branch free Clamp()

Posted: January 9th, 2009

One of my workmates had some code with a lot of floating-point clamps in it the other day, so I wrote this little branch-free version using the PS3's floating-point select intrinsic:

float Clamp(float x, float lower, float upper)
{
	float t = __fsels(x-lower, x, lower);	// t = max(x, lower)
	return __fsels(t-upper, upper, t);	// min(t, upper)
}

__fsels basically does this:

float __fsels(float x, float a, float b)
{
	return (x >= 0.0f) ? a : b;
}
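
For anyone who wants to try the idea away from a PS3, here is a minimal portable sketch. The names `fsels_emulated`, `ClampSelect` and `ClampBranch` are made up for illustration, and the branching version is only an assumption about what a "standard implementation" looks like. Off the PPU the compiler decides whether the ternary becomes a select or a branch, so this shows the logic rather than the performance.

```c
/* Portable stand-in for the select intrinsic (hypothetical name).
   Mirrors the description above: a when x >= 0, otherwise b. */
static float fsels_emulated(float x, float a, float b)
{
    /* A NaN x fails the comparison, so b is returned. */
    return (x >= 0.0f) ? a : b;
}

/* Same two-select structure as Clamp() above. */
static float ClampSelect(float x, float lower, float upper)
{
    float t = fsels_emulated(x - lower, x, lower);  /* max(x, lower) */
    return fsels_emulated(t - upper, upper, t);     /* min(t, upper) */
}

/* A conventional branching clamp, assumed here for comparison. */
static float ClampBranch(float x, float lower, float upper)
{
    if (x < lower) return lower;
    if (x > upper) return upper;
    return x;
}
```

Both versions agree for ordinary inputs with lower <= upper, for example clamping -2.0f, 0.5f and 3.0f to the range [0, 1].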

I measured it to be 8% faster than a standard implementation; not a whole lot, but it was quite fun to write. The SPUs have more general selection functionality, which is more useful. There is some stuff about it here:

(Not sure about this free WordPress code formatting, I may have to move it to my own host soon)
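
To give a rough idea of what "more general selection" means, here is a scalar sketch of a mask-based select in plain C. This is not SPU code: the real SPU select works on 128-bit vectors (via intrinsics such as `spu_sel`, as far as I know), and every name below is hypothetical. The point is only that a comparison produces a bit mask and the select then picks bits from one of two inputs, so any comparison can drive it, not just "greater than or equal to zero".

```c
#include <stdint.h>
#include <string.h>

/* Bitwise select: bits come from a where mask is 0, from b where mask is 1. */
static uint32_t select_bits(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* Reinterpret a float as its raw bits and back, without aliasing problems. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* All ones when a > b, all zeros otherwise, like a SIMD compare result. */
static uint32_t greater_mask(float a, float b)
{
    return (uint32_t)0 - (uint32_t)(a > b);
}

static float ClampMask(float x, float lower, float upper)
{
    /* t = (lower > x) ? lower : x */
    float t = bits_float(select_bits(float_bits(x), float_bits(lower),
                                     greater_mask(lower, x)));
    /* result = (t > upper) ? upper : t */
    return bits_float(select_bits(float_bits(t), float_bits(upper),
                                  greater_mask(t, upper)));
}
```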


3 Comments on “Branch free Clamp()”

  1. Jaymin K. said at 8:35 am on January 13th, 2009:

    On the PPU there is no real fsels instruction. It's just fsel with some casts to float done for you for convenience. This MAY (and I say may because I haven't tested it out yet) be faster if you make t a double and change your first __fsels to an __fsel. That way you can avoid one double-to-float cast and another float-to-double cast.

  2. Jaymin K. said at 8:46 am on January 13th, 2009:

    I haven't looked at the generated assembly, but if you are calling this in a loop it may also help to do a bit of "pipelining,"
    i.e. do the conversions/casts for iteration n+1 before you do the clamp for iteration n.

    Anyway, it's one thing to look into, and you can tell really easily if it helps (or makes it slower!)

  3. mmack said at 10:34 pm on January 13th, 2009:

    Thanks, I should have realised that from looking at the disassembly.

    I did some more timing today and am less convinced it's always faster than a standard implementation, although as you suggested converting some of the parameters to doubles did seem to help.

    I'll post some more detailed stats when I get a chance.
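
The first suggestion in the comments (keep the intermediate as a double and use `__fsel` for the first select) could be sketched roughly as follows. The two helpers are portable stand-ins with hypothetical names, not the real intrinsics, and the explicit casts exist only to keep a desktop compiler quiet. Whether this actually saves any conversions on the PPU is exactly the part the comments leave untested, so treat it as a shape to experiment with rather than a measured improvement.

```c
/* Portable stand-ins for the double and float select intrinsics
   (hypothetical names): a when x >= 0, otherwise b. */
static double fsel_double_emulated(double x, double a, double b)
{
    return (x >= 0.0) ? a : b;
}

static float fsels_float_emulated(float x, float a, float b)
{
    return (x >= 0.0f) ? a : b;
}

/* Clamp with a double intermediate, following the comment's suggestion. */
static float ClampDoubleTemp(float x, float lower, float upper)
{
    /* First select stays in double, so t is not narrowed to float here. */
    double t = fsel_double_emulated(x - lower, x, lower);  /* max(x, lower) */
    /* Second select returns float; t holds an exact float value already. */
    return fsels_float_emulated((float)(t - upper), upper, (float)t);
}
```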
