Blazing Disk IO with Java

Posted in: Big Data, Technical Track

It’s well known that mmap improves performance in particular use cases, especially when your working set fits in available memory. This doesn’t mean that the file itself has to fit in memory: mmap provides real benefits over other IO methods if you actively access a limited part of a big file. The OS will eventually flush modified buffers to disk, but you may need to call “force” manually for specific buffers.
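
As an illustration of mapping only a limited part of a file, here is a minimal sketch that uses the public `FileChannel.map` API rather than the undocumented calls discussed later in this post. The class name `MappedWindow` and the method `writeAndReadBack` are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the original test code.
final class MappedWindow {

    /** Maps [offset, offset + data.length) of the file, writes data, flushes it, and reads it back. */
    static byte[] writeAndReadBack(Path file, long offset, byte[] data) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Only this window is mapped; the rest of the file is never touched.
            // With a READ_WRITE mapping the JDK grows the file if the window ends past its current size.
            MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_WRITE, offset, data.length);
            window.put(data);
            // Ask the OS to write the modified pages of this buffer to disk now.
            window.force();
            // Rewind and read the same bytes back through the mapping.
            window.flip();
            byte[] copy = new byte[data.length];
            window.get(copy);
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `MappedWindow.writeAndReadBack(path, 4096, bytes)` touches only the pages that back that window, no matter how large the file is. The mapping stays valid after the channel is closed and is released only when the buffer is garbage-collected.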

How fast is it then? There are many blog posts about how to achieve maximal performance with undocumented functionality: Unsafe.copyMemory and the native FileChannelImpl.map0.

I’ve created a simple test that combines all of this. Some preparation is kept outside the measurement: the arrays with the data to be written into the file are built in advance. Reading is practically zero cost. Technically, the data is already in memory, so calculating the sum is no different from reading bytes from any other memory segment.

The code prepares dummy 128 MB arrays of random bytes, copies them into the file until it reaches the target size, then reads the file back and calculates the sum of its bytes. All of this is done in parallel to maximize performance.
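
The overall flow can be sketched with the public API alone. This is not the benchmark itself, which relies on the undocumented calls mentioned above. It is a simplified approximation under stated assumptions: the class name `ChunkedMmapSketch` and its methods are hypothetical, and each chunk gets its own mapping because `FileChannel.map` limits a single region to `Integer.MAX_VALUE` bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.util.stream.IntStream;

// Hypothetical sketch of the write-then-sum flow, not the original test code.
final class ChunkedMmapSketch {

    /** Fills the file with `copies` copies of `chunk`, one mapped region per copy, in parallel. */
    static void fill(Path file, byte[] chunk, int copies) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            // Size the file up front so parallel map() calls never have to extend it.
            raf.setLength((long) chunk.length * copies);
            IntStream.range(0, copies).parallel().forEach(i ->
                map(ch, MapMode.READ_WRITE, (long) i * chunk.length, chunk.length).put(chunk));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sums all (signed) bytes of the file, mapping and scanning regions in parallel. */
    static long sum(Path file, int regionSize) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel ch = raf.getChannel()) {
            long size = ch.size();
            int regions = (int) ((size + regionSize - 1) / regionSize);
            return IntStream.range(0, regions).parallel().mapToLong(i -> {
                long offset = (long) i * regionSize;
                long len = Math.min(regionSize, size - offset);
                MappedByteBuffer region = map(ch, MapMode.READ_ONLY, offset, len);
                long s = 0;
                while (region.hasRemaining()) {
                    s += region.get();
                }
                return s;
            }).sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Wraps the checked IOException so map() can be called from lambdas.
    private static MappedByteBuffer map(FileChannel ch, MapMode mode, long offset, long len) {
        try {
            return ch.map(mode, offset, len);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 128 MB chunk and 96 copies, `fill` would produce a 12 GB file. The sketch makes no claim about matching the throughput reported below, since it reads byte by byte through `MappedByteBuffer` instead of copying memory directly.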

I tested this code on my laptop (Windows 10, Java 1.8.0_121-b13, 32 GB RAM, 4-core CPU) and it gave me up to 9.8 GB/s with a 12 GB file!

Valentin@7510 MINGW64 ~/workspace/test-nio$ sbt "run 3 4 8 4"
[info] Loading global plugins from C:\Users\Valentin\.sbt\0.13\plugins
[info] Loading project definition from C:\Users\Valentin\workspace\test-nio\project
[info] Set current project to nioTest (in build file:/C:/Users/Valentin/workspace/test-nio/)
[info] Running testnio.MMapBigFile 3 4 8 4
[info] File size = 12Gb
[info] write time = 1.642380621  

Note: there is a known issue with using map0 on Windows with files larger than 4 GB. It may look like a mystery until you check the sources.

    mapAddress = MapViewOfFile(
        mapping,             /* Handle of file mapping object */
        mapAccess,           /* Read and write access */
        highOffset,          /* High word of offset */
        lowOffset,           /* Low word of offset */
        (DWORD)len);         /* Number of bytes to map */

As you can see, len (8 bytes) is cast to DWORD (4 bytes), which keeps only its low 32 bits. It still works for files whose size is a multiple of 4 GB, because the truncated value is then 0, and the MapViewOfFile specification says that 0 means “the mapping extends from the specified offset to the end of the file mapping”.
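
The arithmetic behind this can be checked with a few lines of Java. The class `DwordTruncation` is a hypothetical helper written for this illustration. It only reproduces the effect of the C cast and does not call any Windows API.

```java
// Hypothetical helper that mimics the (DWORD) cast from the native code.
final class DwordTruncation {

    /** Value of a 64-bit length after a C-style cast to an unsigned 32-bit DWORD. */
    static long asDword(long len) {
        // The cast keeps only the low 32 bits of the value.
        return len & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // 12 GB is a multiple of 4 GB, so the low 32 bits are all zero.
        System.out.println(asDword(12 * gb));
        // 5 GB is not, so only the remaining 1 GB would be requested.
        System.out.println(asDword(5 * gb) / gb);
    }
}
```

For the 12 GB file used in the test the truncated length is 0, so the whole file is mapped. For a 5 GB file the same cast would request only 1 GB.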


About the Author

Valentin is a specialist in big data solutions and Oracle RDBMS. He has extensive expertise in Cloudera Hadoop Distribution, as well as a deep insight into internals of various Hadoop services and metastore databases.
