
RAMspeed -- a cache & memory benchmarking tool

for DOS environments

v2.3.3

May, 2006


This command-line utility measures effective bandwidth of both cache & memory
subsystems. It is written entirely in C for portability purposes, though
benchmark routines are also coded in assembly language for performance reasons.
It should run under all DOS and Windows systems (versions for uniprocessor and
multiprocessor UNIX-like operating systems are distributed separately).


GENERAL INFORMATION

The software consists of two major components:

1) INTmark and FLOATmark measure the maximum achievable cache and memory
performance when reading and writing blocks of data (starting from 1Kb and
doubling in size at each step) continuously through the ALU and FPU,
respectively. All data streams are linear, so the hardware should reach its
peak performance. In other words, these benchmarks let you determine how fast
your hardware actually is, rather than taking the advertised figures on trust
(real bandwidth versus theoretical bandwidth). Feel the difference!

2) INTmem and FLOATmem are synthetic simulations, but they are closely tied to
real-world computing. Each consists of four subtests (Copy, Scale, Add, Triad)
which measure certain aspects of memory performance. A CPU has only a limited
number of registers, so data loaded into them for processing will most likely
be written back afterwards, and probably to a different memory location than
it was read from. Performance of such simultaneous read/write operations can
tell a lot about a particular chipset (where the controllers driving the
system and memory buses usually reside), though the CPU still matters.
Machines that show quite good linear memory performance often degrade
noticeably under this kind of load.

A few words about these subtests, to give a clear picture of what's going on.

Copy is the simplest among them. It just transfers data from one memory
location to another, i.e. copies it (A = B).

Scale is a little more advanced: it modifies the data before writing by
multiplying it by a constant value, i.e. scales it (A = m*B).

Add reads data from the first memory location, then reads from the second, adds
them together, and writes to the third (A = B + C).

Triad is a merge of Add and Scale. It reads data from the first memory
location, scales it, adds data from the second one, and writes to the third
(A = m*B + C).
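
The four subtests can be sketched in plain C as follows. This is only an
illustration of the formulas above, not the actual RAMspeed source: the
function names, the use of double (as FLOATmem would) and the argument order
are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical kernels illustrating the four subtests.
 * a, b and c are separate arrays of n elements; m is the scale constant. */

/* Copy: A = B */
void copy_kernel(double *a, const double *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = b[i];
}

/* Scale: A = m*B */
void scale_kernel(double *a, const double *b, double m, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = m * b[i];
}

/* Add: A = B + C */
void add_kernel(double *a, const double *b, const double *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Triad: A = m*B + C */
void triad_kernel(double *a, const double *b, const double *c,
                  double m, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = m * b[i] + c[i];
}
```

Copy and Scale touch two arrays per pass, while Add and Triad touch three,
which is why the latter move more data for the same array size.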

There are also MMXmark with MMXmem, and SSEmark with SSEmem, serving the same
purpose as explained above but utilising MMX and SSE instructions. There is no
plan to add support for the 3DNow! instruction set, because it contains no
instructions for load/store operations (MOVQ and MOVD from the MMX instruction
set are used instead).

INTmark and INTmem transfer data in doublewords (32 bits) or quadwords (64
bits), depending on the capabilities of the CPU they are running on. 64-bit
transfers should be chosen where supported. FLOATmark and FLOATmem always use
quadwords and require an FPU. If a particular CPU has no FPU and no math
coprocessor is installed, an FPU emulator is required, if only to calculate
the results. MMXmark and MMXmem are meant to be run on CPUs supporting MMX
technology. They use quadwords for load/store operations and packed words
(64 bits, too) for calculations. SSEmark and SSEmem require SSE support from
both the CPU and the operating system. They use octawords for load/store
transfers and packed doublewords (128 bits as well) for calculations.

For *mark benchmarks, the Time-Stamp Counter (TSC) is used for the highest
possible timing resolution, unless it is unsupported or broken. In that case,
the standard gettimeofday() call is used instead.
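
As a rough illustration of the gettimeofday() fallback only, the helpers
below turn two timestamps into elapsed seconds. The helper names are
hypothetical and the code is not taken from RAMspeed; the TSC path is left
out because it needs CPU-specific assembly.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: seconds elapsed between two gettimeofday() samples.
 * The microsecond difference may be negative; the sum is still correct. */
double elapsed_seconds(const struct timeval *start, const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) / 1000000.0;
}

/* Hypothetical helper: current wall-clock time in seconds. */
double wall_clock_seconds(void)
{
    struct timeval now;
    gettimeofday(&now, NULL);
    return (double)now.tv_sec + (double)now.tv_usec / 1000000.0;
}
```

A benchmark loop would sample the clock before and after a pass and divide
the amount of data moved by the elapsed time. The resolution is at best one
microsecond, which is far coarser than a cycle counter.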

There is also the BatchRun mode (*mem benchmarks only), formerly known as the
LongRun mode but renamed to avoid possible confusion with the power-saving
technology of Transmeta. This mode is designed for high-precision benchmarking
and hardware stressing. In this mode, benchmarks are run a defined number of
times, and the average results are calculated and displayed.
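
The averaging step can be pictured with the small sketch below. It assumes a
plain arithmetic mean over the per-run results; the function name and the
array-based interface are hypothetical and not taken from RAMspeed.

```c
/* Hypothetical helper: arithmetic mean of per-run results (e.g. Mb/s).
 * Returns 0.0 when there are no runs to average. */
double batch_average(const double *results, int runs)
{
    double sum = 0.0;
    int i;

    if (runs <= 0)
        return 0.0;
    for (i = 0; i < runs; i++)
        sum += results[i];
    return sum / (double)runs;
}
```

Averaging over several runs smooths out the effect of occasional background
activity on any single run.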


RUN-TIME OPTIONS

USAGE: ramspeed -b ID [-g size] [-m size] [-l runs]
-b  runs a specified benchmark (by an ID number):
    1 -- INTmark (writing)      4 -- FLOATmark (writing)
    2 -- INTmark (reading)      5 -- FLOATmark (reading)
    3 -- INTmem                 6 -- FLOATmem
-g  specifies a # of Gbytes per pass (default is 4)
-m  specifies a # of Mbytes per array (default is 16)
-l  enables the BatchRun mode (for *mem benchmarks only),
    and specifies a # of runs (suggested is 5)

If compiled with the assembly sources, the following ID numbers will appear:

    7 -- MMXmark (writing)     10 -- SSEmark (writing)
    8 -- MMXmark (reading)     11 -- SSEmark (reading)
    9 -- MMXmem                12 -- SSEmem

The -b option is required; the others are recommended.

There are no built-in logging capabilities, but you may redirect output to a
file instead of stdout:

ramspeed [options] > yourcomp.log

The default memory array size and pass size do well for a wide range of
computer hardware, but you may need to decrease them when testing something
pretty old or, conversely, increase them for a modern, powerful machine.

Note that the *mark benchmarks require [by default] 4Mb of memory array space
as mentioned above, but the *mem ones demand two to three times more. The
same applies to pass size.


COMPILATION

The software is intended to be compiled by the GNU C Compiler for DOS (DJGPP)
and GAS only. Take a look into the Makefile, and edit if necessary.

It's recommended to use the assembly sources (launch "make asmcode" to build);
otherwise, just "make" will do the job with the C sources only.

The optimisation flags may be changed or new ones added, but this is not
really needed. If you use the assembly sources, they matter little or not at
all.

RAMspeed needs a 32-bit DPMI server. If you intend to run it under Windows, you
already have one. If you intend to run it under any DOS, I recommend getting
acquainted with CWSDPMI first (cwsdpmi.exe -p -s-).

If you try to run the pre-compiled ramspeed.exe on a machine without a hardware
FPU and without third-party FPU emulation software loaded, it will soon fail
with a SIGNOFP error, because an FPU is needed to calculate the results.

If the SSE benchmarks fail with a SIGILL error, it probably means that your CPU
has SSE support, but neither your BIOS nor your operating system has enabled
it.

Reduce background activity before running, and preferably disable power
management (APM or ACPI).


RESULTS AND COMPARISONS

The results shown are real and may be compared with those obtained from other
benchmarking titles. There are many of them, and they measure performance in
different ways using different algorithms. The most notable among them are
the open-source STREAM by John D. McCalpin and the commercial SiSoft Sandra by
Catalin-Adrian Silasi.

STREAM is a very good benchmark. I used it as a reference when writing INTmem
and FLOATmem, and though I have rewritten almost everything, the idea remains
the same. Nevertheless, STREAM is written in C only. It employs a low pass
size, displays only the highest results, operates through the FPU only,
doesn't accept command-line parameters, and is overall much less accurate.

According to the help file supplied, Sandra uses the same algorithms as
STREAM. I have not much to say about Sandra because I don't know exactly how
it works, and thus I'm not sure that the results it shows are fully
comparable: I suspect that the approaches implemented in Sandra aren't
completely equivalent to those implemented in RAMspeed. To be absolutely
correct, I'm sure they aren't.

Note that the software defines a megabyte as 2^20 bytes (1048576 bytes), while
other benchmarking titles (such as Sandra) may define it as a million bytes,
thus showing slightly "better" performance. The decision to count in "real"
megabytes (and gigabytes as well) was taken because I consider cache and
memory performance to be a matter of data volume rather than of frequency. In
other words, it is not of key importance how many mega- and gigahertz live
inside a particular machine, but it matters very much how many mega- and
gigabytes per second it can push in each direction.
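
To see how much the two definitions differ, the sketch below converts the
same transfer into megabytes per second both ways. The function and macro
names are made up for this example and are not part of RAMspeed.

```c
/* Bytes per megabyte under the two competing definitions. */
#define BYTES_PER_BINARY_MB  1048576.0   /* 2^20, as used by RAMspeed */
#define BYTES_PER_DECIMAL_MB 1000000.0   /* 10^6, as some other tools use */

/* Hypothetical helper: bandwidth in 2^20-byte megabytes per second. */
double mb_per_sec_binary(double bytes, double seconds)
{
    return bytes / BYTES_PER_BINARY_MB / seconds;
}

/* Hypothetical helper: bandwidth in million-byte megabytes per second. */
double mb_per_sec_decimal(double bytes, double seconds)
{
    return bytes / BYTES_PER_DECIMAL_MB / seconds;
}
```

For example, 104857600 bytes moved in one second is 100 Mb/s with the 2^20
definition but about 104.86 Mb/s with the million-byte one, so the decimal
figure is roughly 4.9% higher for identical hardware.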


ISSUES

Sometimes the write performance of FLOATmark may be better than the read
performance. That's not a bug but a consequence of how i387-compatible FPUs
work: a data store requires one instruction, whereas a data load requires one
instruction for the actual load and another to flush a register.

On some CISC CPUs (Intel 386 to Pentium, AMD 386 to 5x86, Cyrix 486), the write
performance of *mark benchmarks is very strange: it is constant all the way,
regardless of cache levels and their write policies. Those writes may
sometimes be even faster than reads, which is quite surprising.

Not really an issue, but the results shown will differ when obtained under
different operating systems, sometimes significantly.


WINDOWS-SPECIFIC NOTES

If you have ever used DOS, you may skip the following. On the other hand, if
you have no idea what a command prompt is or how to get to it, keep reading.

To launch the command prompt under Windows 9x/NT/ME/2000/XP/2003, click on the
"Start" button and choose "Run". In the dialogue box that appears, type
"command" (all quotation marks here and below are for formatting purposes only
and optional; all names are case insensitive), and press Enter. You should see
a black screen with a blinking cursor: that is the command prompt. If you
have unpacked RAMspeed to disk D:, for example, then type "d:", press Enter,
type "cd ramspeed", press Enter. Here you are. You may view the directory
contents with the "dir" command, view the documentation with
"type readme.txt | more" (what you're reading now), or just run RAMspeed as
explained in RUN-TIME OPTIONS. It's recommended to switch to the full-screen
mode with Alt-Enter; to return to the graphical mode just type "exit"; to
return without closing, press Alt-Tab and choose a program to switch to.


FINAL NOTES

If you like this program, you're encouraged to send me a message with your
results attached. I'm just curious to know where it runs, and how well.

The latest version can always be downloaded from http://www.alasir.com/ramspeed


Greetings from the world of free software!

RMH
