We do not use the RTT numbers for scoring. An RTT value can be negative only in very rare cases, when we miss a watermark on the client side; we use those watermarks to measure client-side performance.
Could you run the report.pyc command on the harness and attach the resulting tarball? (Details on using report.pyc are in the user guide.) I can then take a look and see why the tests are failing.
I also see some ballooning activity going on. Are you overcommitting memory heavily?