Empowering NAS for AI Training with JuiceFS Direct-Mode NFS

2024-07-25
Herald Yu

By offering multi-user network data access services, network-attached storage (NAS) greatly simplifies data sharing and management. While the Network File System (NFS) is a widely used protocol for achieving this kind of sharing, it often faces performance and consistency issues in complex AI training scenarios.

In its latest version 1.2, JuiceFS supports using NFS as the underlying storage in direct mode. This innovation allows JuiceFS to use NFS services on NAS without pre-mounting. With JuiceFS' direct-mode NFS feature, users can create JuiceFS file systems using existing NAS storage space without preparing additional object storage.

In this post, we’ll explore the benefits of direct-mode NFS storage, how JuiceFS uses NAS storage and caching to boost local AI model training, and the process of creating a JuiceFS file system using NFS storage.

Advantages of direct-mode NFS storage

Using NFS as the underlying storage for JuiceFS in direct mode has these advantages:

  • No pre-mounting required: You can directly use NFS as the underlying storage for JuiceFS, eliminating the need for pre-mounting and simplifying configuration and management.
  • High performance: JuiceFS enhances NFS storage performance through caching and pre-fetching, supporting highly concurrent read and write operations.
  • Cross-platform sharing: JuiceFS can transform NFS storage into a distributed file system, enabling cross-platform sharing. It can be used not only on Linux, macOS, and Windows operating systems but also in Hadoop, Kubernetes, and Docker environments.

How JuiceFS boosts local AI model training

With JuiceFS, users can store training data and model files on their existing NAS. Because JuiceFS presents that storage as a distributed, high-performance, and highly available file system, multiple compute nodes can access the same data simultaneously, which improves the efficiency of AI model training.

On the training servers, users can access NAS data through various methods such as JuiceFS mount points, S3 Gateway, WebDAV, CSI Driver, and Hadoop API. JuiceFS will automatically cache the data to improve training performance.

JuiceFS supports multiple caching strategies, allowing users to choose the appropriate one for each scenario. For example, users can set the cache size with the --cache-size parameter, specify the cache directory with the --cache-dir parameter, and use the juicefs warmup command to preload data into the cache. For more details on JuiceFS caching strategies, see JuiceFS Cache.
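
These cache options can be combined at mount time. The following is a minimal sketch rather than a tuned configuration: the mount point /mnt/jfs, the cache directory /mnt/nvme/jfscache, the 200 GiB cache size, and the dataset path are hypothetical values, and the metadata URL is the one used in the format example in this post. The run helper is not part of JuiceFS; it is a dry-run wrapper added for illustration that prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: mount JuiceFS with a dedicated SSD cache, then preload a dataset.
# All paths and sizes below are assumptions for illustration.

META_URL="redis://192.168.1.88/0"          # metadata engine from this post's format example
MOUNT_POINT="/mnt/jfs"                     # hypothetical mount point
CACHE_DIR="/mnt/nvme/jfscache"             # hypothetical directory on a local SSD
CACHE_SIZE_MIB=$((200 * 1024))             # 200 GiB expressed in MiB, the unit --cache-size takes
DATASET_DIR="$MOUNT_POINT/datasets/train"  # hypothetical dataset path

# Dry-run wrapper (not a JuiceFS feature): print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mount in the background with an explicit cache directory and size limit.
run juicefs mount --background \
    --cache-dir "$CACHE_DIR" \
    --cache-size "$CACHE_SIZE_MIB" \
    "$META_URL" "$MOUNT_POINT"

# Preload the training set into the local cache before the first epoch.
run juicefs warmup "$DATASET_DIR"
```

Review the printed commands first, then rerun with DRY_RUN=0 on a training node. Warming up before training starts moves the first read of each file from the NAS to the local cache ahead of time, so the first epoch is less likely to be limited by NFS throughput.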

How to create a JuiceFS file system using NFS

It’s easy to create a JuiceFS file system using NFS storage. You only need to configure the NFS service on the NAS or file server and then specify the address of the NFS storage when you create the JuiceFS file system.

For example, to use NFS storage exported over the NFSv3 protocol, run the following command on any computer on the same network that has the JuiceFS client installed:

sudo juicefs format --storage nfs \
    --bucket 192.168.1.88:/data/nfs \
    redis://192.168.1.88/0 \
    myjfs

In this code block:

  • --storage nfs sets the storage type to NFS.
  • --bucket specifies the address of the NFS storage, here the /data/nfs export on 192.168.1.88.
  • redis://192.168.1.88/0 specifies Redis as the metadata storage.
  • myjfs is the name of the file system.

For more information about using NFS storage in direct mode, see JuiceFS NFS.
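
After formatting, each training node mounts the file system using only the metadata URL, because the storage type and NFS address are saved in the metadata engine at format time. Each node still needs network access to both the Redis instance and the NFS server. The sketch below assumes a hypothetical mount point /mnt/jfs and skips the JuiceFS commands if the client is not installed.

```shell
#!/bin/sh
# Sketch: check the file system settings and mount it on a training node.

META_URL="redis://192.168.1.88/0"   # metadata engine used in the format command
MOUNT_POINT="/mnt/jfs"              # hypothetical mount point

if command -v juicefs >/dev/null 2>&1; then
    # Show the stored settings; the storage type and bucket should match
    # the values passed to juicefs format.
    juicefs status "$META_URL"

    # Mount in the background (prefix with sudo if the mount point requires root).
    juicefs mount --background "$META_URL" "$MOUNT_POINT"
else
    echo "juicefs client not found in PATH" >&2
fi
```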

Notes

When creating a JuiceFS file system using NFS as the storage layer, keep the following points in mind:

  • JuiceFS does not currently support the NFSv4 identity authentication mechanism, so configure the NFS storage to use the NFSv3 protocol. There is no need to specify --access-key and --secret-key when creating a file system.
  • To make full use of JuiceFS’ caching capabilities, prepare sufficient high-speed SSD space as a cache device on the server where the JuiceFS client runs.
  • NFS uses the root_squash mechanism by default, which maps operations performed by the root identity to nobody:nogroup. Therefore, you need to configure permissions on the NFS server to ensure that the JuiceFS client has permission to access NFS storage.
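
One way to address the root_squash point is to relax root squashing for the training subnet only. The sketch below assumes a Linux NFS server configured through /etc/exports and a hypothetical client subnet of 192.168.1.0/24; NAS appliances usually expose equivalent options in their own admin interface. Disabling root squashing has security implications, so an alternative is to leave it enabled and make the exported directory writable by the squashed user instead. The script only prints the entry; the commands that change the server are left as comments.

```shell
#!/bin/sh
# Sketch: build an /etc/exports entry for the export used in this post.

EXPORT_PATH="/data/nfs"              # export path from the format example
CLIENT_SUBNET="192.168.1.0/24"       # assumption: subnet of the training nodes
EXPORT_OPTS="rw,sync,no_subtree_check,no_root_squash"
EXPORT_LINE="$EXPORT_PATH $CLIENT_SUBNET($EXPORT_OPTS)"

# Print the entry for review rather than editing system files directly.
echo "$EXPORT_LINE"

# To apply on the NFS server, as root:
#   echo "$EXPORT_LINE" >> /etc/exports
#   exportfs -ra     # re-export everything listed in /etc/exports
#   exportfs -v      # list active exports with their options
```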

Summary

JuiceFS 1.2 and later versions support using NFS as the underlying storage in direct mode. This lets JuiceFS work better with NAS, improves JuiceFS' compatibility with NFS, and gives enterprises an easier-to-use storage solution. Users can build a high-performance, highly available distributed file system on their existing local storage resources to better support AI model training, data analysis, and other scenarios.

You’re welcome to try JuiceFS 1.2 and use NFS in direct mode to create a file system for local AI model training.

If you have any questions about this article, feel free to join JuiceFS discussions on GitHub and our community on Slack.
