The most reliable way of running benchmarks is to do it in an otherwise idle
system. On a busy system, the results will vary according to the other tasks
demanding attention in the system.

We have managed to obtain quite reliable results by doing the following on
Linux (and you need root):

- switching the scheduler to a Real-Time mode
- setting the processor affinity to one single processor
- disabling the other thread of the same core

This should work rather well for CPU-intensive tasks. A task that is in
Real-Time mode will not be preempted by ordinary (non-real-time) tasks. But if
you make blocking OS syscalls, especially I/O ones, your task will be
de-scheduled while it waits. Note that this includes page faults, so if you
can, make sure your benchmark's warmup code paths touch most of the data.

To do this you need a tool called schedtool (package schedtool), from
http://freequaos.host.sk/schedtool/

From this point on, we are using CPU0 for all tasks.

If you have a Hyperthreaded multi-core processor (Core-i5 and Core-i7), you
have to disable the other thread of the same core as CPU0. To discover which
one it is:

$ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

This will print something like 0,4, meaning that CPUs 0 and 4 are sibling
threads on the same core. So we'll turn CPU 4 off:

(as root)
# echo 0 > /sys/devices/system/cpu/cpu4/online

To turn it back on, echo 1 into the same file.
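The sibling lookup and the offlining step can be combined in a small script.
What follows is only a sketch: the function name other_siblings is made up for
this example, and it assumes the sibling list uses the comma form (0,4) or the
range form (0-1).

```shell
# Hypothetical helper (not part of any tool): print every CPU named in a
# thread_siblings_list value except the one kept for the benchmark.
# Handles both the "0,4" and the "0-1" notation.
other_siblings() {
    local keep=$1    # CPU that stays online, e.g. 0
    local list=$2    # contents of thread_siblings_list, e.g. "0,4"
    local range first last cpu
    for range in $(printf '%s\n' "$list" | tr ',' ' '); do
        first=${range%-*}    # "2-3" -> 2, "4" -> 4
        last=${range#*-}     # "2-3" -> 3, "4" -> 4
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            if [ "$cpu" -ne "$keep" ]; then
                echo "$cpu"
            fi
            cpu=$((cpu + 1))
        done
    done
}

# Usage (as root), keeping CPU 0 for the benchmark:
#   list=$(cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list)
#   for cpu in $(other_siblings 0 "$list"); do
#       echo 0 > /sys/devices/system/cpu/cpu$cpu/online
#   done
```

The function only prints CPU numbers. The writes to the online files are left
to the caller, so the list can be checked before anything is switched off.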

To run a task on CPU 0 exclusively, using FIFO RT priority 10, you run the
following:

(as root)
# schedtool -F -p 10 -a 1 -e ./taskname

For example:
# schedtool -F -p 10 -a 1 -e ./tst_bench_qstring -tickcounter

Warning: if your task livelocks or takes far too long to complete, your system
may be unusable for a long time, especially if you don't have other cores to
run stuff on. To prevent that, first run it without schedtool and time it.

You can also limit the CPU time that the task is allowed to take. Run this in
the same shell in which you'll run schedtool:

$ ulimit -t 300
to limit it to 300 seconds (5 minutes).

If your task runs away, it will be killed by a signal (SIGXCPU at the soft
limit, SIGKILL at the hard limit) after consuming 5 minutes of CPU time
(5 minutes running at 100%).
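A CPU-time limit set with ulimit -t stays in force for the rest of the shell
session. One way to confine it to a single run is to set it in a subshell. The
wrapper below is a sketch only, and the name run_with_cpu_limit is made up for
this example.

```shell
# Hypothetical wrapper: run a command under a CPU-time limit (in seconds)
# without changing the limit of the calling shell.
run_with_cpu_limit() {
    local seconds=$1
    shift
    (
        # The limit applies to this subshell and to whatever it execs.
        ulimit -t "$seconds" || exit 1
        exec "$@"
    )
}

# Usage (as root):
#   run_with_cpu_limit 300 schedtool -F -p 10 -a 1 -e ./taskname
```

If the command is stopped by the limit, the wrapper's exit status is the usual
shell encoding of death by signal (128 plus the signal number).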

If your app is multithreaded, you may want to give it more CPUs, like CPU0 and
CPU1 with -a 3 (it's a bitmask).
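The bitmask has bit N set for every CPU N the task may run on, so CPU0 and CPU1
give 1 + 2 = 3. A small sketch that computes the value from a list of CPU
numbers (the function name cpus_to_mask is made up for this example):

```shell
# Hypothetical helper: turn a list of CPU numbers into the decimal bitmask
# form used with "schedtool -a" (bit N set means CPU N is allowed).
cpus_to_mask() {
    local mask=0
    local cpu
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    echo "$mask"
}

# Usage: CPU0 and CPU1 -> 3, CPU0 and CPU2 -> 5
#   schedtool -F -p 10 -a "$(cpus_to_mask 0 1)" -e ./taskname
```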

For best results, you should disable ALL other cores and threads of the same
processor. The new Core-i7 have one processor with 4 cores, and each core can
run 2 threads; the older Mac Pros have two processors with 4 cores each. So on
those Mac Pros, you'd disable cores 1, 2 and 3, while on the Core-i7, you'll
need to disable all other CPUs.

However, disabling just the sibling thread seems to produce very reliable
results for me already, with variance often below 0.5% (even though there are
some measurable spikes).

Other things to try:

Running the benchmark with a very high priority, i.e. "sudo nice -n -19",
usually produces stable results on some machines. If the benchmark also
involves displaying something on the screen (on X11), running it with
"-sync" is a must. In that case, though, the "real" cost is not correct,
but it is still useful for discovering regressions.

Also, not many people know about ionice(1):
ionice - get/set program io scheduling class and priority