Single Node Benchmark

Testing Approach

Perform a single node benchmark on JuiceFS with fio.

Testing Tool

The following tests were performed with fio 2.2.10.
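
fio 2.2.10 is the version shipped in the repositories of the Ubuntu 16.04 system used in the test environment below, so a plain package install should reproduce it (a sketch; any recent fio can also run these jobs):

# Install fio from the distribution repository and confirm the version
sudo apt-get update
sudo apt-get install -y fio
fio --version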

Fio has many parameters that can affect the test results. To provide a fair performance reference, the tests below follow one principle: use the default values of parameters wherever possible, and do not tune for any specific system or hardware. The following parameters apply:

--name: Name of the job.
--directory: Prefix filenames with this directory. In this case it is the mount point of JuiceFS; we use /jfs here.
--rw: Type of I/O pattern. We use read (sequential reads) and write (sequential writes) here.
--bs: The block size in bytes used for I/O units.
--size: The total size of file I/O for each thread of this job.
--filesize: Individual file size; used in the small file sequential read and write tests.
--numjobs: Number of concurrent jobs; fio uses multiple processes by default.
--nrfiles: Number of files to use for this job.
--refill_buffers: By default, fio reuses the I/O buffers generated at the beginning of a job. When this option is given, fio refills the I/O buffers on every submit, which ensures that the generated test files are sufficiently random.
--file_service_type: Defines how fio decides which file of a job to service next; one of random, roundrobin, or sequential. We use it in the small file sequential read and write tests to ensure that fio reads and writes files one after another, without parallel file operations.

Since JuiceFS compresses data with Zstandard when writing it to object storage, the --refill_buffers parameter is used in every test task below to make the data generated by fio as random and irregular as possible, yielding a low compression ratio and simulating a worst-case scenario close to real circumstances. In other words, in a production environment, JuiceFS's performance will mostly be better than the results of this test.

Testing Environment

Performance will vary based on differences in cloud service providers, object storage, virtual machine types, and operating systems.

In the following test results, the JuiceFS file system is created on AWS S3 in the us-west-2 region (see Getting Started for creation), and all fio tests run on a c3.8xlarge EC2 instance (32 vCPUs, 60 GB RAM) with Ubuntu 16.04 LTS (kernel 4.4.0).

The c3.8xlarge instance type was chosen for its 10 Gbit network, which ensures that network bandwidth will not be the bottleneck.

Unless stated otherwise, the following tests mount JuiceFS with the default configuration (see Getting Started).
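
For reference, a minimal mount sketch (the volume name myjfs is a placeholder, and the exact command syntax depends on your JuiceFS client version; see Getting Started for the authoritative steps):

# Mount the JuiceFS volume at /jfs with all options left at their defaults
# ("myjfs" is a placeholder volume name)
sudo juicefs mount myjfs /jfs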

Large File Read and Write Test

In many scenarios, such as log collection, data backup, and big data analysis, large file sequential reading and writing is essential. This is also a typical scenario to use JuiceFS.

The JuiceFS page size is the main factor affecting sequential read and write throughput: the larger the page size, the higher the sequential read and write throughput, as the test results below show.

Note: here you need to create and mount JuiceFS file systems with different page sizes in advance, and replace the --directory parameter in the test script with the corresponding JuiceFS mount point.
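
For example, once file systems with different page sizes are mounted, the sequential read job below can be repeated against each mount point (a sketch; /jfs-1m and /jfs-4m are hypothetical mount points, one per page size):

# Run the same job against each page-size-specific mount point
# (/jfs-1m and /jfs-4m are placeholder mount points)
for dir in /jfs-1m /jfs-4m; do
    fio --name=big-file-sequential-read \
        --directory=$dir \
        --rw=read --refill_buffers \
        --bs=256k --size=4G
done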

Large File Sequential Read

[Chart: large file sequential read throughput at different page sizes]

fio --name=big-file-sequential-read \
--directory=/jfs \
--rw=read --refill_buffers \
--bs=256k --size=4G

Large File Sequential Write

[Chart: large file sequential write throughput at different page sizes]

fio --name=big-file-sequential-write \
--directory=/jfs \
--rw=write --refill_buffers \
--bs=256k --size=4G

Large File Concurrent Read

[Chart: large file concurrent read throughput by number of jobs]

fio --name=big-file-multi-read \
--directory=/jfs \
--rw=read --refill_buffers \
--bs=256k --size=4G \
--numjobs={1, 2, 4, 8, 16}

Large File Concurrent Write

[Chart: large file concurrent write throughput by number of jobs]

fio --name=big-file-multi-write \
--directory=/jfs \
--rw=write --refill_buffers \
--bs=256k --size=4G \
--numjobs={1, 2, 4, 8, 16}
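
The --numjobs values listed above denote a sweep over concurrency levels rather than a literal fio argument. A minimal way to run the sweep, shown here for the concurrent write job (a sketch assuming the /jfs mount point); the same pattern applies to the small file concurrency tests below:

# Run the concurrent write job at each concurrency level in turn
for jobs in 1 2 4 8 16; do
    fio --name=big-file-multi-write \
        --directory=/jfs \
        --rw=write --refill_buffers \
        --bs=256k --size=4G \
        --numjobs=$jobs
done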

Concurrent writes on a single node already reach the bandwidth limit between AWS EC2 and S3, about 600 MB/s. To further exploit the advantages of JuiceFS, a multi-machine parallel approach is recommended, especially in big data computing scenarios.

Note: AWS offers the Elastic Network Adapter (ENA), which increases EC2 to S3 communication bandwidth to 25 Gbps; using it requires a supported EC2 instance type and an HVM AMI.

Small File Read and Write Test

Small File Sequential Read

JuiceFS enables a 1 GB local data cache by default when mounted. The cache can considerably improve the IOPS of small file reads. You can disable the cache by adding the --cache-size=0 parameter when mounting JuiceFS. Here is a performance comparison.
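
For the no-cache side of this comparison, the file system can be remounted with the cache disabled. A minimal sketch, where the volume name myjfs and the exact mount syntax are assumptions (--cache-size=0 is the parameter mentioned above):

# Remount with the local data cache disabled
# ("myjfs" is a placeholder volume name)
sudo juicefs mount --cache-size=0 myjfs /jfs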

[Chart: small file sequential read IOPS, cache enabled vs. disabled]

fio --name=small-file-seq-read \
--directory=/jfs \
--rw=read --file_service_type=sequential \
--bs={file_size} --filesize={file_size} --nrfiles=1000

Small File Sequential Write

Add the --writeback parameter when mounting JuiceFS to enable the client-side write cache (for details, see the client write cache documentation), which can considerably improve small file sequential write performance. See the following comparison test.
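
Correspondingly, a mount sketch with the client write cache enabled (again, the volume name and exact mount syntax are assumptions; --writeback is the parameter mentioned above):

# Mount with the client-side write cache enabled
# ("myjfs" is a placeholder volume name)
sudo juicefs mount --writeback myjfs /jfs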

By default, fio leaves the file close operation to the end of the job, which in a distributed file system may result in data loss due to factors such as network anomalies. Therefore, we use fio's --file_service_type=sequential parameter, so that fio finishes one file completely (executes flush and close, writing all of its data to the object storage) before proceeding to the next file in the test task.

[Chart: small file sequential write IOPS, with and without --writeback]

fio --name=small-file-seq-write \
--directory=/jfs \
--rw=write --file_service_type=sequential \
--bs={file_size} --filesize={file_size} --nrfiles=1000

Small File Concurrent Read

[Chart: small file concurrent read IOPS by number of jobs]

fio --name=small-file-multi-read \
--directory=/jfs \
--rw=read --file_service_type=sequential \
--bs=4k --filesize=4k --nrfiles=1000 \
--numjobs={1, 2, 4, 8, 16}

Small File Concurrent Write

[Chart: small file concurrent write IOPS by number of jobs]

fio --name=small-file-multi-write \
--directory=/jfs \
--rw=write --file_service_type=sequential \
--bs=4k --filesize=4k --nrfiles=1000 \
--numjobs={1, 2, 4, 8, 16}

With the multi-process method used by the fio test jobs, performance scales roughly linearly with the number of concurrent processes, with a small amount of attenuation as the number of processes increases.