So after running my Metaballs demo on my girlfriends laptop it appears to be GPU limited due to fillrate. This is mainly due to the overkill number of particles in my test setup and the fact they hang round for so long, but the technique is fillrate heavy so it might be a problem.
It'd be nice to do a multithreaded CPU implementation to see how that compares, but the advantage of the current method is that it keeps the CPU free to do other things.
You could probably get more performance in some cases by uploading all the metaballs as shader parameters (or as a texture) and evaluating them directly in the pixel shader. Also I just realised I could probably render the 'density' texture at a lower resolution for a cheap speedup.