date 23.Jun.2013

■ Optimal buffer size for the Windows ReadFile API


Everybody knows that reading and writing data from/to file storage (e.g. a hard disk) is much slower than accessing RAM. That's why higher-level file management through C++ streams or the C fprintf() function uses memory buffering behind the scenes, so as to minimize disk access and improve I/O performance. What criteria these buffered functions use, and how they adapt to disk cluster sizes, is anybody's guess.
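You can even see that buffering in action: the C runtime lets you pick the buffer it uses through setvbuf(). Here is a minimal sketch; the 64KB buffer size is just an illustration, not a recommendation:

  #include <stdio.h>

  int main()
  {
      FILE* f = fopen("pic.jpg", "rb");
      if (!f) return 1;

      // replace the default CRT buffer (typically a few KB) with a 64KB one;
      // fread() then refills this buffer in big gulps and rarely hits the disk
      static char crtBuffer[65536];
      setvbuf(f, crtBuffer, _IOFBF, sizeof(crtBuffer));

      char byte;
      while (fread(&byte, 1, 1, f) == 1)
          ;  // even byte-at-a-time reads are served from the big buffer

      fclose(f);
      return 0;
  }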

In Windows programming, for maximum efficiency folks use the native file API like ReadFile, where there is no fixed buffer size. The question is: what is the best buffer size for reading a file sequentially from beginning to end? This calls for a controlled experiment. I used a JPG file 1.13 MB in size and read it in with various buffer sizes. Reading a 1MB file nowadays takes no time at all, so to obtain meaningful results I read the file many times over. The file was opened like this:

  HANDLE hFile = CreateFile("pic.jpg", GENERIC_READ, FILE_SHARE_READ, NULL,
                            OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
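
For reference, here is a minimal sketch of the kind of timing loop involved. This is my reconstruction, not the exact test code; the TimeReads() name, the error handling and the loop bounds are mine:

  #include <windows.h>
  #include <stdio.h>

  // time how long it takes to read the whole file 'reps' times,
  // using a read buffer of 'bufSize' bytes
  DWORD TimeReads(LPCSTR path, DWORD bufSize, int reps)
  {
      char* buffer = new char[bufSize];
      DWORD start = GetTickCount();
      for (int i = 0; i < reps; i++) {
          HANDLE hFile = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ,
              NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
          if (hFile == INVALID_HANDLE_VALUE)
              break;
          DWORD bytesRead;
          // ReadFile returns TRUE with bytesRead == 0 at end of file
          while (ReadFile(hFile, buffer, bufSize, &bytesRead, NULL) && bytesRead > 0)
              ;
          CloseHandle(hFile);
      }
      DWORD elapsed = GetTickCount() - start;
      delete[] buffer;
      return elapsed;
  }

  int main()
  {
      for (DWORD bufSize = 16; bufSize <= 1048576; bufSize *= 2)
          printf("%7lu bytes: %lu ms\n", bufSize, TimeReads("pic.jpg", bufSize, 10));
      return 0;
  }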

Buffer (bytes) | NTFS (ms) | NTFS (ms) | FAT (ms) | FAT (ms)  | NO_BUFFERING NTFS (ms)
               | 10 reps   | 1000 reps | 10 reps  | 1000 reps | 10 reps
---------------+-----------+-----------+----------+-----------+-----------------------
            16 |      5242 |           |     1778 |           |
            32 |      2621 |           |      889 |           |
            64 |      1310 |           |      453 |           |
           128 |       640 |           |      218 |           |
           256 |       327 |           |      125 |           |
           512 |       172 |           |       62 |           |
          1024 |        78 |           |       47 |           |
          2048 |        47 |      1966 |       16 |      2262 |                   1622
          4096 |        31 |      1107 |       15 |      1514 |                    936
          8192 |        15 |       702 |       16 |      1170 |                    499
         16384 |         0 |       515 |        0 |      1060 |                    421
         32768 |         0 |       390 |       15 |       936 |                    281
         65536 |        16 |       343 |       16 |       858 |                    234
        131072 |         0 |       312 |        0 |       843 |                    203
        262144 |         0 |       281 |       16 |       780 |                    218
        524288 |         0 |       297 |        0 |       811 |                    203
       1048576 |        16 |       312 |       15 |       920 |                    234

Table 1. Time (ms) to read a 1MB file repeatedly, depending on the buffer size used

The timings in table 1 depend on the disk format: the first two timing columns are for an NTFS disk (4096-byte clusters) and the next two for a FAT-formatted USB stick (8192-byte clusters). Obviously the hardware is different, so there's not much point comparing NTFS against FAT here. What is clear is that reading the picture with a small buffer, say 16 bytes at a time, is very slow, despite the internal Windows buffering. Increasing the buffer size from 16 to 2048 bytes speeds up reading more than 100-fold!
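If you want to check the cluster size of one of your own volumes, the GetDiskFreeSpace API reports sectors per cluster and bytes per sector (the C:\ root path below is just an example):

  #include <windows.h>
  #include <stdio.h>

  int main()
  {
      DWORD sectorsPerCluster, bytesPerSector, freeClusters, totalClusters;
      // cluster size = sectors per cluster x bytes per sector
      if (GetDiskFreeSpaceA("C:\\", &sectorsPerCluster, &bytesPerSector,
                            &freeClusters, &totalClusters))
          printf("cluster size: %lu bytes\n", sectorsPerCluster * bytesPerSector);
      return 0;
  }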

Once the buffer reaches the disk cluster size (4096 bytes and above), reading is near-instantaneous. Reading the file 10 times takes no measurable time, so to get usable numbers I read the file 1000 times (see the 3rd and 5th columns in table 1). There are still some differences up to a 32KB buffer, but from then on the speed stays the same, within the accuracy offered by GetTickCount.

What do we infer from these timings? If you are reading a file sequentially, the bigger the buffer the merrier. And do not use the FILE_FLAG_NO_BUFFERING flag, which disables the internal Windows buffering: as the last column shows, the performance is dreadful, about 100 times slower than buffered I/O (10 unbuffered reads of the file take as long as 1000 buffered reads).
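Incidentally, FILE_FLAG_NO_BUFFERING is also a hassle to use correctly: the read size and file offset must be multiples of the volume's sector size, and the buffer itself must be sector-aligned. A sketch of what it takes (the 64KB size assumes the usual 512-byte or 4KB sectors; VirtualAlloc returns page-aligned memory, which satisfies the alignment rule):

  HANDLE hFile = CreateFileA("pic.jpg", GENERIC_READ, FILE_SHARE_READ, NULL,
      OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, NULL);
  // sector-aligned buffer; a plain new[] or malloc() buffer would likely fail
  char* buffer = (char*)VirtualAlloc(NULL, 65536, MEM_COMMIT | MEM_RESERVE,
      PAGE_READWRITE);
  DWORD bytesRead;
  ReadFile(hFile, buffer, 65536, &bytesRead, NULL);  // 64KB: a sector multiple
  VirtualFree(buffer, 0, MEM_RELEASE);
  CloseHandle(hFile);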

Here's another twist: say you are scanning a file to find some keyword in it. The keyword may happen to be at the beginning of the file or at the very end, or it may be absent altogether. What's the best reading strategy when we may not need to read the whole file? The maximum sensible chunk size here is 32KB: reading is just as fast as when slurping the whole file, yet if we hit the keyword in the first chunk we save ourselves all the time it would take to read the remaining bytes. In fact, as disk access is orders of magnitude slower than memory access, I recommend reading just 8KB at a time; whatever we lose in raw disk throughput we win back through the chance of finding the keyword early in the file. This opportunistic strategy is used in xplorer².
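Here is a minimal sketch of such an opportunistic scan. This is my illustration, not the xplorer² source; the FileContains() name and the naive memcmp() search are assumptions. Note the small overlap carried between chunks so a keyword straddling a chunk boundary is not missed:

  #include <windows.h>
  #include <string.h>

  // scan a file for 'keyword', reading 'chunkSize' bytes at a time and
  // stopping as soon as a match is found; the early exit saves disk reads
  bool FileContains(LPCSTR path, const char* keyword, DWORD chunkSize)
  {
      const DWORD keyLen = (DWORD)strlen(keyword);
      if (keyLen == 0)
          return false;
      HANDLE hFile = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
          OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
      if (hFile == INVALID_HANDLE_VALUE)
          return false;

      // keep keyLen-1 bytes of overlap between chunks so a match that
      // straddles two chunks is still found
      char* buffer = new char[chunkSize + keyLen];
      DWORD carried = 0, bytesRead = 0;
      bool found = false;

      while (!found &&
             ReadFile(hFile, buffer + carried, chunkSize, &bytesRead, NULL) &&
             bytesRead > 0) {
          const DWORD total = carried + bytesRead;
          for (DWORD i = 0; !found && i + keyLen <= total; i++)
              found = memcmp(buffer + i, keyword, keyLen) == 0;
          carried = keyLen - 1;                  // tail that may start a match
          if (carried > total) carried = total;
          memmove(buffer, buffer + total - carried, carried);
      }
      delete[] buffer;
      CloseHandle(hFile);
      return found;
  }

A call like FileContains("haystack.txt", "needle", 8192) stops reading the moment the keyword turns up, potentially long before the end of a big file.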

And now you know why some file managers are better than others — and that's scientifically proven <g>


