If it is a different VM each time, then you appear to be having a networking or STAF issue rather than a problem with any one VM.
Questions:
Are you rebooting between runs or just running back to back?
If you cut the number of tiles in half, does it suddenly start running?
Have you compared your STAF.cfg to what is listed in the benchmarking guide?
What version of STAF are you running? Is it the same version across all the VMs and clients?
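
If it helps with that last question, here is a rough sketch for collecting the STAF version from every machine in one go. It assumes the `staf` command is on the PATH of the machine you run it from and that each target accepts `staf <host> misc version`. The host names are made-up placeholders, and the parsing (taking the last non-empty line of the output) is my assumption about the usual "Response" layout, so adjust it if your output looks different.

```python
import subprocess
from collections import defaultdict


def staf_version(host, run=subprocess.run):
    """Return the STAF version reported by `host`, or None if the query fails."""
    try:
        result = run(
            ["staf", host, "misc", "version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # staf binary missing, or the host never answered
        return None
    if result.returncode != 0:
        return None
    # Assumption: the version is the last non-empty line of the output,
    # after the "Response" header and the dashed rule.
    lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
    return lines[-1] if lines else None


def group_by_version(hosts, run=subprocess.run):
    """Map each reported version (or None for failures) to the hosts reporting it."""
    groups = defaultdict(list)
    for host in hosts:
        groups[staf_version(host, run=run)].append(host)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical host names; replace with your own VMs and clients.
    hosts = ["client0", "mailserver0", "dbserver0"]
    for version, members in group_by_version(hosts).items():
        label = version if version is not None else "NO RESPONSE"
        print(f"{label}: {', '.join(members)}")
```

More than one version in the output, or anything under NO RESPONSE, would be the first thing to chase.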