
Use JuiceFS on AWS

AWS is the world's leading cloud computing platform, offering almost every type of cloud service. Its broad product line gives users a great deal of flexibility in choosing the components of a JuiceFS deployment.

Preparation

As you can see from the previous documents, JuiceFS consists of the following three components:

  1. A JuiceFS client installed on a server
  2. An object storage used to store data
  3. A database for storing metadata

1. Servers

Amazon EC2 is one of the most basic and widely used cloud services on the AWS platform. It offers more than 400 instance types across 81 Availability Zones in 25 regions around the world, so users can choose and adjust EC2 instance configurations to fit their actual needs.

New users don't need to worry much about JuiceFS's configuration requirements: even a minimally configured EC2 instance can easily create and mount a JuiceFS file system. Usually, the main thing to consider is the hardware, especially the local disk.

By default, the JuiceFS client uses 1 GB of disk space as a local cache. When handling a large number of files, the client caches data on the local disk first and then uploads it to the object storage asynchronously. Choosing a disk with higher I/O performance and reserving a larger cache gives JuiceFS better performance.

2. Object Storage

Amazon S3 is the de facto standard for public cloud object storage, and the object storage services of other major cloud platforms are generally compatible with the S3 API. As a result, applications developed for S3 can switch between object storage services on different platforms with little effort.

JuiceFS fully supports Amazon S3 and other S3-compatible object storage services; see the JuiceFS documentation for the full list of supported object storage services.

Amazon S3 offers a range of storage classes for different use cases, mainly including:

  • Amazon S3 Standard: general-purpose storage for frequently accessed data
  • Amazon S3 Standard-IA: for data that is stored long-term but accessed infrequently
  • Amazon S3 Glacier: for long-term data archiving

The Amazon S3 Standard class is usually recommended for JuiceFS, because the other classes may incur additional fees when data is retrieved.

In addition, accessing the object storage service requires authentication with an Access Key and a Secret Key; see the AWS document Controlling Access to Storage Buckets with User Policies for how to create them. When accessing S3 from an EC2 instance, you can also attach an IAM role to the instance so that the S3 API can be called without access keys.
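As a starting point for the user policy mentioned above, the following sketch writes a minimal IAM policy document scoped to a single bucket. The bucket name my-juicefs-bucket is hypothetical, and the action list (list the bucket; get, put, and delete objects) is an assumption about what JuiceFS needs for everyday reads and writes. It also assumes the bucket already exists, so check the JuiceFS documentation for your version before relying on it.

```shell
# Hypothetical bucket name; replace it with your own.
BUCKET="my-juicefs-bucket"

# Write a minimal policy: list the bucket itself, and read/write/delete
# the objects inside it (assumed sufficient for everyday JuiceFS use).
cat > juicefs-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::${BUCKET}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    }
  ]
}
EOF
```

The file can then be attached to an IAM user with the AWS CLI, for example `aws iam put-user-policy --user-name <user> --policy-name juicefs-s3 --policy-document file://juicefs-s3-policy.json`, or to the IAM role used by the EC2 instance with `aws iam put-role-policy --role-name <role> --policy-name juicefs-s3 --policy-document file://juicefs-s3-policy.json`.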

3. Database

The ability of multiple hosts to access both data and metadata is the key to a distributed file system. For the metadata generated by JuiceFS to be reachable over the network the way S3 data is, the database that stores it must also be network-accessible.

Amazon RDS and ElastiCache are two cloud database services provided by AWS, and both can be used directly as the JuiceFS metadata store. Amazon RDS is a relational database service that supports engines such as MySQL, MariaDB, and PostgreSQL. ElastiCache is an in-memory caching service compatible with Redis and Memcached; for JuiceFS, use the Redis engine.

You can also run your own database on an EC2 instance to store JuiceFS metadata.
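If you use Amazon RDS or a self-hosted database instead of ElastiCache, the metadata address passed to juicefs format and juicefs mount takes a different URL form. The sketch below builds such URLs in shell. The endpoint, user, password, and database name are hypothetical placeholders, the database is assumed to exist already, and PostgreSQL support depends on your JuiceFS version, so confirm the exact formats in the JuiceFS metadata engine documentation.

```shell
# Hypothetical RDS endpoint and credentials; replace them with your own.
DB_HOST="mydb.xxxxxxxx.ap-southeast-1.rds.amazonaws.com"
DB_USER="juicefs"
DB_PASS="secret"
DB_NAME="juicefs"

# MySQL / MariaDB: JuiceFS expects host:port wrapped in parentheses.
MYSQL_META="mysql://${DB_USER}:${DB_PASS}@(${DB_HOST}:3306)/${DB_NAME}"

# PostgreSQL (assumed to be supported by the JuiceFS version in use).
PG_META="postgres://${DB_USER}:${DB_PASS}@${DB_HOST}:5432/${DB_NAME}"

echo "$MYSQL_META"
echo "$PG_META"
```

Always quote these URLs on the command line, for example `juicefs format ... "$MYSQL_META" mystor`, because unquoted parentheses are a shell syntax error.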

4. Cautions

  • Using JuiceFS does not affect your existing systems.
  • When selecting cloud services, it is recommended to put all of them in the same region, so that they share the same internal network, which gives the lowest latency and the fastest access between them. In addition, under AWS billing rules, data transfer between basic cloud services in the same region is generally free (traffic between Availability Zones may still be charged). Conversely, if you choose services in different regions, for example EC2 in ap-east-1, ElastiCache in ap-southeast-1, and S3 in us-east-2, traffic between the services will incur charges.
  • JuiceFS does not require the object storage and the database to come from the same cloud platform, so you can combine services from different platforms as needed. For example, you can run the JuiceFS client on EC2 with an AliCloud Redis database and Backblaze B2 object storage. That said, a JuiceFS deployment built from services on the same platform and in the same region will perform better.
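To catch a cross-region setup before it shows up on the bill, the sketch below compares the EC2 instance's region with the region embedded in a virtual-hosted-style bucket URL of the form used in this guide (https://<bucket>.s3.<region>.amazonaws.com). The function names are hypothetical, and a URL in any other form simply yields an empty region.

```shell
# Extract <region> from a bucket URL like https://<bucket>.s3.<region>.amazonaws.com
s3_url_region() {
  printf '%s\n' "$1" | sed -n 's#^https://.*\.s3\.\([a-z0-9-]*\)\.amazonaws\.com.*$#\1#p'
}

# Print OK when EC2 and the bucket share a region; otherwise warn and fail.
#   $1 = EC2 region, $2 = bucket URL
check_same_region() {
  ec2_region="$1"
  s3_region=$(s3_url_region "$2")
  if [ "$ec2_region" = "$s3_region" ]; then
    echo "OK: EC2 and S3 are both in $ec2_region"
  else
    echo "WARNING: EC2 is in $ec2_region but the bucket is in ${s3_region:-unknown}" >&2
    return 1
  fi
}
```

On the instance itself, the region can be read from the instance metadata service, for example `check_same_region "$(curl -s http://169.254.169.254/latest/meta-data/placement/region)" https://herald-demo.s3.ap-southeast-1.amazonaws.com`. That call assumes IMDSv1 is enabled; IMDSv2 additionally requires a session token.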

Deployment and Usage

Next, we briefly describe how to install and use JuiceFS, using as an example an EC2 instance, an S3 bucket, and an ElastiCache cluster running the Redis engine, all in the same region.

1. Install the client

This example uses a Linux system on the x86-64 (amd64) architecture. Run the following commands to download the latest version of the JuiceFS client.

JFS_LATEST_TAG=$(curl -s https://api.github.com/repos/juicedata/juicefs/releases/latest | grep 'tag_name' | cut -d '"' -f 4 | tr -d 'v')
wget "https://github.com/juicedata/juicefs/releases/download/v${JFS_LATEST_TAG}/juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz"
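The commands above hard-code the linux-amd64 package. EC2 also offers Arm-based (Graviton) instances, so the sketch below maps the output of uname -m to an architecture suffix. The helper name jfs_arch is hypothetical, and it assumes the release page publishes a matching linux-arm64 tarball for the version you download; check the release assets before relying on it.

```shell
# Map the machine name reported by `uname -m` to the architecture
# suffix assumed to be used by JuiceFS Linux release tarballs.
jfs_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print the tarball name for this machine (JFS_LATEST_TAG is set above).
echo "juicefs-${JFS_LATEST_TAG}-linux-$(jfs_arch "$(uname -m)").tar.gz"
```

To use it, replace linux-amd64 in the wget and tar commands with `linux-$(jfs_arch "$(uname -m)")`.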

After downloading, extract the archive into the juice directory.

mkdir juice && tar -zxvf "juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz" -C juice

Install the JuiceFS client into a directory on the system $PATH, e.g., /usr/local/bin.

sudo install juice/juicefs /usr/local/bin

Then run juicefs. If the client was installed successfully, it prints its help message, as shown below.

$ juicefs
NAME:
   juicefs - A POSIX file system built on Redis and object storage.

USAGE:
   juicefs [global options] command [command options] [arguments...]

VERSION:
   0.17.0 (2021-09-24T04:17:26Z e115dc4)

COMMANDS:
   format   format a volume
   mount    mount a volume
   umount   unmount a volume
   gateway  S3-compatible gateway
   sync     sync between two storage
   rmr      remove directories recursively
   info     show internal information for paths or inodes
   bench    run benchmark to read/write/stat big/small files
   gc       collect any leaked objects
   fsck     Check consistency of file system
   profile  analyze access log
   stats    show runtime statistics
   status   show status of JuiceFS
   warmup   build cache for target directories/files
   dump     dump metadata into a JSON file
   load     load metadata from a previously dumped JSON file
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --verbose, --debug, -v  enable debug log (default: false)
   --quiet, -q             only warning and errors (default: false)
   --trace                 enable trace log (default: false)
   --no-agent              disable pprof (:6060) agent (default: false)
   --help, -h              show help (default: false)
   --version, -V           print only the version (default: false)

COPYRIGHT:
   Apache License 2.0

Tip: If running juicefs returns command not found, the /usr/local/bin directory may not be in your executable search path $PATH. Check it with echo $PATH, then either reinstall the client into a directory that is listed there or add /usr/local/bin to $PATH.
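The check described in the tip can be scripted. This sketch tests whether /usr/local/bin is already an entry of $PATH and, if not, appends it for the current shell session only; path_contains is a hypothetical helper name.

```shell
# Succeed if the given directory is one of the colon-separated entries in $PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Append /usr/local/bin for the current session only if it is missing.
if ! path_contains /usr/local/bin; then
  export PATH="$PATH:/usr/local/bin"
fi
```

To make the change permanent, add the same export line to your shell startup file, for example ~/.bashrc if you use bash.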

JuiceFS has good cross-platform compatibility and supports Linux, Windows, and macOS. For installation on other systems, see the official documentation.

2. Create File System

The format subcommand of the JuiceFS client creates (formats) a JuiceFS file system. Here we use S3 as the data store and ElastiCache as the metadata store. With the client installed on EC2, create a JuiceFS file system using a command of the following form:

$ juicefs format \
--storage s3 \
--bucket https://<bucket>.s3.<region>.amazonaws.com \
--access-key <access-key-id> \
--secret-key <access-key-secret> \
redis://[<redis-username>]:<redis-password>@<redis-url>:6379/1 \
mystor

Option descriptions:

  • --storage: The type of object storage; here, s3.
  • --bucket: The bucket endpoint URL of the object storage.
  • --access-key and --secret-key: The access key pair used to call the S3 API.

Redis 6.0 and above support ACL users, which authenticate with both a username and a password; the URL format is redis://username:password@redis-server-url:6379/1. Redis 4.0 and 5.0 authenticate with a password only, so leave the username empty in the Redis address, for example: redis://:password@redis-server-url:6379/1.
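The two URL forms above differ only in whether the username is empty. The hypothetical helper below assembles either form. It assumes the default port 6379 and a password that contains no URL-reserved characters such as @, :, or /, which would otherwise need to be percent-encoded.

```shell
# Build a JuiceFS Redis metadata URL.
#   $1 = username (empty for Redis 4.0/5.0 password-only auth)
#   $2 = password
#   $3 = host, e.g. an ElastiCache primary endpoint
#   $4 = Redis database number (defaults to 1)
redis_meta_url() {
  user="$1"; password="$2"; host="$3"; db="${4:-1}"
  printf 'redis://%s:%s@%s:6379/%s\n' "$user" "$password" "$host" "$db"
}
```

For example, `redis_meta_url '' secret r.example.com` produces redis://:secret@r.example.com:6379/1, and `redis_meta_url alice secret r.example.com 2` produces redis://alice:secret@r.example.com:6379/2.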

When an IAM role with S3 access is attached to the EC2 instance, you only need to specify the --storage and --bucket options; the API access keys are not needed. If, in addition, the ElastiCache cluster does not have Redis AUTH enabled and its security group allows access from the EC2 instance, you can omit the Redis credentials and provide just the Redis URL:

$ juicefs format \
--storage s3 \
--bucket https://herald-demo.s3.<region>.amazonaws.com \
redis://herald-demo.abcdefg.0001.apse1.cache.amazonaws.com:6379/1 \
mystor

Output like the following indicates that the file system was created successfully.

2021/10/14 08:38:32.211044 juicefs[10391] <INFO>: Meta address: redis://herald-demo.abcdefg.0001.apse1.cache.amazonaws.com:6379/1
2021/10/14 08:38:32.216566 juicefs[10391] <INFO>: Ping redis: 383.789µs
2021/10/14 08:38:32.216915 juicefs[10391] <INFO>: Data use s3://herald-demo/mystor/
2021/10/14 08:38:32.412112 juicefs[10391] <INFO>: Volume is formatted as {Name:mystor UUID:21a2cafd-f5d8-4a76-ae4d-482c8e2d408d Storage:s3 Bucket:https://herald-demo.s3.ap-southeast-1.amazonaws.com AccessKey: SecretKey: BlockSize:4096 Compression:none Shards:0 Partitions:0 Capacity:0 Inodes:0 EncryptKey:}

3. Mount the file system

When the file system is created, the object storage information, including the API keys, is saved in the metadata database, so you don't need to provide the bucket endpoint or the keys again when mounting.

Use the mount subcommand of the JuiceFS client to mount the file system to the /mnt/jfs directory.

sudo juicefs mount -d redis://[<redis-username>]:<redis-password>@<redis-url>:6379/1  /mnt/jfs

Note: Mounting a file system requires only the database address, not the file system name. The default cache path is /var/jfsCache; make sure the current user has sufficient read/write permissions on it.

You can tune JuiceFS through mount options. For example, --cache-size sets the cache size in MiB, so the following command sets the cache to 20 GiB (20480 MiB).

sudo juicefs mount --cache-size 20480 -d redis://herald-demo.abcdefg.0001.apse1.cache.amazonaws.com:6379/1  /mnt/jfs

Output like the following indicates that the file system was mounted successfully.

2021/10/14 08:47:49.623814 juicefs[10601] <INFO>: Meta address: redis://herald-demo.abcdefg.0001.apse1.cache.amazonaws.com:6379/1
2021/10/14 08:47:49.628157 juicefs[10601] <INFO>: Ping redis: 426.127µs
2021/10/14 08:47:49.628941 juicefs[10601] <INFO>: Data use s3://herald-demo/mystor/
2021/10/14 08:47:49.629198 juicefs[10601] <INFO>: Disk cache (/var/jfsCache/21a2cafd-f5d8-4a76-ae4d-482c8e2d408d/): capacity (20480 MB), free ratio (10%), max pending pages (15)
2021/10/14 08:47:50.132003 juicefs[10601] <INFO>: OK, mystor is ready at /mnt/jfs
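The capacity (20480 MB) reported in the log matches the --cache-size value, because that option is given in MiB. If you prefer to think in GiB, a small helper such as the hypothetical gib_to_mib below avoids conversion mistakes.

```shell
# --cache-size is specified in MiB; convert a size given in whole GiB.
gib_to_mib() {
  echo $(( $1 * 1024 ))
}

CACHE_MIB=$(gib_to_mib 20)   # 20 GiB -> 20480 MiB
echo "$CACHE_MIB"
```

The result can be passed straight to the mount command, e.g. `sudo juicefs mount --cache-size "$CACHE_MIB" -d redis://[<redis-username>]:<redis-password>@<redis-url>:6379/1 /mnt/jfs`.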

To check the mounted file system, use the df command:

$ df -Th
Filesystem     Type          Size  Used Avail Use% Mounted on
JuiceFS:mystor fuse.juicefs  1.0P   64K  1.0P   1% /mnt/jfs

Once mounted, the file system can be used like a local disk. Data written to the /mnt/jfs directory is handled by the JuiceFS client and ultimately stored in the S3 object storage.
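A quick way to confirm that the mount point really behaves like a local disk is to write a small file, read it back, and delete it. The sketch below does exactly that; the function name and the temporary file name are hypothetical.

```shell
# Write a small file into a directory, read it back, then delete it.
# Returns non-zero if the write fails or the content does not match.
juicefs_smoke_test() {
  dir="$1"
  f="$dir/.juicefs-smoke-$$"
  printf 'hello juicefs\n' > "$f" || return 1
  [ "$(cat "$f")" = "hello juicefs" ] || return 1
  rm -f "$f"
}
```

Run `juicefs_smoke_test /mnt/jfs` as a user with write permission on /mnt/jfs; a zero exit status means the round trip succeeded.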

Multi-host sharing: JuiceFS can be mounted by multiple hosts at the same time, and you can install the JuiceFS client on cloud servers on any platform. Once each host mounts the file system with the same database address, e.g. redis://[<redis-username>]:<redis-password>@<redis-url>:6379/1, files can be shared for reading and writing. Make sure that every host that mounts the file system can access both the metadata database and the S3 bucket.

4. Unmount JuiceFS

Use the umount subcommand of the JuiceFS client to unmount the file system, e.g.:

sudo juicefs umount /mnt/jfs

Note: Forcibly unmounting a file system that is in use may cause data corruption or loss, so proceed with caution.
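Before unmounting, for example in a maintenance script, you may want to confirm that the path is actually a JuiceFS mount. JuiceFS mounts appear in the mount table with the file system type fuse.juicefs (the same type that df -Th reports), so the hypothetical helper below looks for that entry in /proc/mounts on Linux.

```shell
# Succeed if $1 is mounted with type fuse.juicefs.
#   $1 = mount point, $2 = mount table file (defaults to /proc/mounts)
is_juicefs_mounted() {
  awk -v mp="$1" '$2 == mp && $3 == "fuse.juicefs" { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

For example: `if is_juicefs_mounted /mnt/jfs; then sudo juicefs umount /mnt/jfs; fi`. This only inspects the mount table; it does not tell you whether files on the mount are still in use.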

5. Auto-mount on boot

If you don't want to remount JuiceFS manually after every reboot, you can configure it to mount automatically.

First, copy the juicefs client to the /sbin/ directory under the name mount.juicefs:

sudo cp juice/juicefs /sbin/mount.juicefs

Edit the /etc/fstab configuration file and add a new record:

redis://[<redis-username>]:<redis-password>@<redis-url>:6379/1    /mnt/jfs       juicefs     _netdev,cache-size=20480     0  0

The mount option cache-size=20480 allocates 20 GiB (20480 MiB) of local disk space for the JuiceFS cache. Choose the cache size based on the actual capacity of your EBS disk.
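To base the cache size on the space actually available, the sketch below reads the free space of the file system holding a given directory with df and returns a percentage of it in MiB, the unit cache-size expects. The function name and the 50% default are assumptions for illustration. JuiceFS also keeps part of the disk free on its own (the free ratio (10%) shown in the mount log), so leave some headroom.

```shell
# Suggest a cache size in MiB as a percentage of the space currently
# available on the file system that holds directory $1.
#   $1 = directory, $2 = percentage (defaults to 50)
suggest_cache_mib() {
  dir="$1"; percent="${2:-50}"
  # POSIX df output in 1024-byte blocks: line 2, field 4 is "Available".
  avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  echo $(( avail_kib / 1024 * percent / 100 ))
}
```

Since the default cache path /var/jfsCache lives under /var, `suggest_cache_mib /var 50` gives a candidate value for cache-size=.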

You can adjust the FUSE mount options in the above configuration as needed. For more details please check the documentation.

Note: Please replace the Redis address, mount point, and mount options in the above configuration file with your actual information.
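If you provision instances with scripts, the fstab record can be appended idempotently so that re-running the script does not duplicate it. The helper below is a hypothetical sketch; it writes to whatever file you pass in, so you can try it on a temporary copy before pointing it at /etc/fstab.

```shell
# Append a JuiceFS fstab record unless an identical line is already present.
#   $1 = fstab file, $2 = metadata URL, $3 = mount point, $4 = mount options
add_fstab_entry() {
  fstab="$1"; meta="$2"; mp="$3"; opts="$4"
  line="$meta    $mp    juicefs    $opts    0  0"
  # -x: match the whole line; -F: treat it as a fixed string, not a regex.
  if ! grep -qxF "$line" "$fstab" 2>/dev/null; then
    printf '%s\n' "$line" >> "$fstab"
  fi
}
```

Because /etc/fstab is owned by root, call the helper from a root shell (for example after sudo -i) with /etc/fstab as the first argument. Afterwards, `sudo mount /mnt/jfs` mounts the file system using the new record, which is a quick check before rebooting.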