Blazing Disk IO with Java


It’s well known that mmap improves performance in certain use cases, especially when your working set fits in available memory. This doesn’t mean that the file itself has to fit in memory: mmap provides real benefits over other kinds of I/O when you actively access a limited part of a big file. The OS will eventually flush modified buffers to disk, but you may need to call “force” manually on specific buffers.
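To make the idea concrete, here is a minimal sketch using only the documented `FileChannel.map` and `MappedByteBuffer.force` APIs. It maps a small window somewhere inside a file, writes one byte, and forces that window to disk. The class name, method names, and the chosen offset and window size are illustrative assumptions, not part of the benchmark discussed in this post.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: maps only a small window of a (possibly huge) file.
final class MmapWindowSketch {

    /** Maps `length` bytes at `offset`, stores `value` in the first byte of the window, and flushes it. */
    static void writeByteAt(Path file, long offset, int length, byte value) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // A READ_WRITE mapping past the current end of the file grows the file.
            MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_WRITE, offset, length);
            window.put(0, value);
            // Without force() the OS decides when the dirty page reaches the disk.
            window.force();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps the same window read-only and returns its first byte. */
    static byte readByteAt(Path file, long offset, int length) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            return window.get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the mapped window competes for memory here, which is why the file as a whole does not need to fit in RAM.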

How fast is it, then? There are many blog posts about achieving maximal performance with undocumented functionality: Unsafe.copyMemory and the native FileChannelImpl.map0.
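For readers who have not seen the first of these, the following is a rough sketch of a bulk copy through `sun.misc.Unsafe.copyMemory`. It assumes a JDK where `sun.misc.Unsafe` is still reachable through reflection on the `theUnsafe` field (true for Java 8, which this post uses, though newer JDKs warn about or restrict it). The class name and the round-trip helper are illustrative assumptions, and the native mapping call is deliberately left out because its signature varies between JDK versions.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class: unsupported API, use at your own risk.
final class UnsafeCopySketch {
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            // The singleton is hidden in a private static field.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    /** Copies `src` into off-heap memory and back into a new array, with no per-byte loop. */
    static byte[] roundTrip(byte[] src) {
        long address = UNSAFE.allocateMemory(src.length);
        try {
            // heap array -> raw address (a null base object means "absolute address")
            UNSAFE.copyMemory(src, Unsafe.ARRAY_BYTE_BASE_OFFSET, null, address, src.length);
            byte[] dst = new byte[src.length];
            // raw address -> heap array
            UNSAFE.copyMemory(null, address, dst, Unsafe.ARRAY_BYTE_BASE_OFFSET, src.length);
            return dst;
        } finally {
            UNSAFE.freeMemory(address);
        }
    }
}
```

The same call can target the address of a memory-mapped region instead of `allocateMemory` output, which is where the speed-up over `ByteBuffer` bulk operations is usually claimed.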

I’ve created a simple test that combines all of this. I moved some preparation outside the measurement: the arrays with the data to be written into the files are prepared in advance. Reading is practically zero cost. Technically, the data is already in memory, so calculating the sum is no different from reading bytes from any other memory segment.

The code prepares dummy 128 MB arrays of random bytes, copies them into the file repeatedly until it reaches the target size, and then reads the file back while calculating the sum of its bytes. All of this is done in parallel to maximize performance.
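The benchmark source itself is not reproduced here. The following is only a scaled-down sketch of the same write-then-sum pattern, using the documented `FileChannel.map` API and parallel streams rather than the undocumented calls mentioned earlier. It is not the code that produced the numbers in this post, and the class name, method names, and chunk sizes are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;
import java.util.stream.IntStream;

// Hypothetical, scaled-down version of the pattern described above.
final class ParallelMmapSketch {

    /** Prepared outside any measurement: a block of pseudo-random bytes. */
    static byte[] randomBytes(int size, long seed) {
        byte[] data = new byte[size];
        new Random(seed).nextBytes(data);
        return data;
    }

    /** Writes `chunks` copies of `data` into `file` in parallel, then sums all bytes in parallel. */
    static long writeThenSum(Path file, byte[] data, int chunks) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // One mapped region per chunk; a single MappedByteBuffer cannot exceed 2 GB.
            MappedByteBuffer[] regions = new MappedByteBuffer[chunks];
            for (int i = 0; i < chunks; i++) {
                regions[i] = ch.map(FileChannel.MapMode.READ_WRITE, (long) i * data.length, data.length);
            }
            // Write phase: each region is touched by exactly one task.
            IntStream.range(0, chunks).parallel().forEach(i -> regions[i].put(data));
            // Read phase: absolute reads, so the buffer position does not matter.
            return IntStream.range(0, chunks).parallel().mapToLong(i -> {
                MappedByteBuffer region = regions[i];
                long sum = 0;
                for (int pos = 0; pos < region.capacity(); pos++) {
                    sum += region.get(pos);
                }
                return sum;
            }).sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `ParallelMmapSketch.writeThenSum(path, ParallelMmapSketch.randomBytes(1 << 20, 42L), 8)` would write an 8 MB file. Timing the two phases separately with `System.nanoTime()` is left out to keep the sketch short.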

I tested the benchmark on my laptop (Windows 10, Java 1.8.0_121-b13, 32 GB RAM, 4-core CPU) and it gave me up to 9.8 GB/s with a 12 GB file!

MINGW64 ~/workspace/test-nio$ sbt "run 3 4 8 4"
[info] Loading global plugins from C:\Users\Valentin\.sbt\0.13\plugins
[info] Loading project definition from C:\Users\Valentin\workspace\test-nio\project
[info] Set current project to nioTest (in build file:/C:/Users/Valentin/workspace/test-nio/)
[info] Running testnio.MMapBigFile 3 4 8 4
[info] File size = 12Gb
[info] write time = 1.642380621  

Note: there is a known issue with using map0 on Windows with file sizes > 4 GB. It may look like a mystery until you check the sources.

    mapAddress = MapViewOfFile(
        mapping,             /* Handle of file mapping object */
        mapAccess,           /* Read and write access */
        highOffset,          /* High word of offset */
        lowOffset,           /* Low word of offset */
        (DWORD)len);         /* Number of bytes to map */

As you can see, len (8 bytes) is cast to DWORD (4 bytes), so its upper 32 bits are discarded. It still works for files whose size is a multiple of 4 GB, because the truncated value is then 0 and the MapViewOfFile specification says that 0 means “the mapping extends from the specified offset to the end of the file mapping”.
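The effect of that cast is easy to reproduce in plain Java. The sketch below emulates `(DWORD)len` by keeping only the low 32 bits of a `long`. The class and method names are illustrative assumptions, and the sample sizes are just examples.

```java
// Hypothetical demo of what the (DWORD) cast does to a 64-bit length.
final class DwordTruncationSketch {

    /** Emulates the C cast (DWORD)len: keeps the low 32 bits as an unsigned value. */
    static long toDword(long len) {
        return len & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        System.out.println(toDword(3 * gb));   // 3221225472: fits in 32 bits, unchanged
        System.out.println(toDword(4 * gb));   // 0: MapViewOfFile maps to the end of the file mapping
        System.out.println(toDword(5 * gb));   // 1073741824: only 1 GB would be mapped
        System.out.println(toDword(12 * gb));  // 0: again a multiple of 4 GB
    }
}
```

So a length that is an exact multiple of 4 GB works by coincidence, while any other length above 4 GB silently maps fewer bytes than requested.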


About the Author

Valentin is a specialist in Big Data and Cloud solutions. He has extensive expertise in Cloudera Hadoop Distribution and Google Cloud Platform, and is skilled in building scalable, performance-critical distributed systems and data visualization systems.
