
File Import and Conversion

By default, JuiceFS stores files in blocks and separates metadata from data. This storage format and separation architecture enable JuiceFS to be a high-performance and strongly consistent file system.

However, in some rare scenarios, users prefer to store original files directly in object storage, so that the files in object storage can be used independently of JuiceFS metadata. Alternatively, they may want to import a large number of existing files from object storage into JuiceFS, so that the files can be accessed via POSIX and benefit from JuiceFS' powerful caching capabilities. Storing complete files in object storage and using them in JuiceFS is referred to as the "compatible format," distinguishing it from the default "optimized format."

Starting from version 5.0, JuiceFS has significantly improved support for the compatible format, providing the following features to meet the aforementioned requirements:

  • The import feature for object storage, also known as the juicefs import command. This command has been available for some time, but starting from version 5.0, imported files support read caching as well.
  • The convert feature, which reassembles optimized-format blocks in JuiceFS back to original files and uploads them to object storage. This allows you to directly access the original files in object storage, with caching support.

Convert

When the convert feature is enabled, files are converted into complete files and stored in object storage after a specified period.

Typical use cases

  • Files are initially written through JuiceFS but later need to be accessed directly from object storage to integrate with cloud ecosystems. The emphasis here is on direct access: JuiceFS already provides an S3 API through S3 Gateway, so if you only need an S3 API for your file system, use S3 Gateway instead.

  • Utilize the archiving capabilities of object storage to archive cold data. The archived data can be taken out and accessed without JuiceFS metadata.

    Again, the emphasis is on use without JuiceFS metadata. JuiceFS natively supports separating hot and cold data: simply use --storage-class to specify a storage class, which is much simpler.

  • Compliance with data regulations that require files to be stored in their original, intact format.

  • Other scenarios that require data to be separated from JuiceFS metadata, so that it can be taken out and used without JuiceFS.

Forbidden use cases

Convert is an experimental feature designed for a few very specific scenarios. If your use case isn't listed above, do not use this feature, because it imposes important limitations on the file system (for example, converted files are read-only). See the Synopsis section below to learn more.

The following use cases are easily mistaken as a fit for the convert feature, but should not use it:

  • You need an S3 endpoint to access your JuiceFS file system. S3 Gateway is built specifically for this purpose and doesn't involve the convert feature at all.
  • You need to separate hot and cold data. The JuiceFS Client can specify a storage class (via --storage-class) during juicefs auth, so that different clients handle files destined for different storage classes.

Synopsis

The effect of conversion on the object storage file list is as follows:

# Before conversion
mybucket/
├── chunks
│ ├── 41
│ │ └── 1
│ │ ├── 1000001_0_4194304
│ │ └── 1000001_10_4194304
│ ├── 43
│ │ └── 1
│ │ ├── 1000003_0_4194304
...

# After conversion, files are written into object storage as is, preserving the directory structure. The original sharded-format data blocks are deleted.
mybucket/
├── bigfile1.tar.gz
├── chunks
├── dir/bigfile2.tar.gz

Because the purpose of conversion is to decouple from the JuiceFS sharded format and store data as is in object storage, files are stored according to the directory structure in the file system. Therefore, if the convert feature is enabled, the file system must exclusively occupy the object storage bucket. To avoid conflicts and potential data loss, the bucket should not be shared with other applications or other JuiceFS file systems.
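To illustrate why a shared bucket is risky, here is a minimal sketch. It assumes that a converted file's object key is simply its path relative to the file system root, which matches the listing above but is not confirmed as the exact rule. The helper names object_key and find_conflicts are hypothetical and are not part of any JuiceFS tool.

```python
def object_key(fs_path: str) -> str:
    """Map a path inside the file system to an assumed object key.

    Assumption: the key is the path relative to the file system root.
    """
    return fs_path.lstrip("/")


def find_conflicts(fs_paths, foreign_keys):
    """Return keys that conversion would write and that something else already owns."""
    converted = {object_key(p) for p in fs_paths}
    return sorted(converted & set(foreign_keys))


# Keys written into the same bucket by some other application (made-up data).
foreign = ["bigfile1.tar.gz", "logs/app.log"]
conflicts = find_conflicts(["/bigfile1.tar.gz", "/dir/bigfile2.tar.gz"], foreign)
print(conflicts)
```

Any key reported here would be overwritten or otherwise clash once the file is converted, which is why the bucket should hold nothing but this one file system.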

After conversion, files no longer support content modifications, and write operations result in permission errors. Although converted files cannot be edited, they can be moved using the mv command. In JuiceFS, this command is interpreted as a "cross-device copy + delete": it reads the file normally from the compatible format, writes it back to JuiceFS in sharded format as a new file, and deletes the original file. Because mv converts the file from the compatible format back to the sharded format, the file can be edited again until the specified time has passed, at which point it is eligible for conversion again.
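The "cross-device copy + delete" behavior can be sketched with ordinary files. This is only an analogy run on a local temporary directory, not JuiceFS code; move_by_copy_and_delete and demo_move are hypothetical names.

```python
import os
import shutil
import tempfile


def move_by_copy_and_delete(src: str, dst: str) -> None:
    """Emulate a move that cannot be done as a plain rename."""
    shutil.copyfile(src, dst)  # read the source and write a brand-new file
    os.remove(src)             # then remove the original


def demo_move():
    """Move a file and report (source still exists?, destination contents)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "converted.bin")
        dst = os.path.join(tmp, "editable.bin")
        with open(src, "wb") as f:
            f.write(b"payload")
        move_by_copy_and_delete(src, dst)
        with open(dst, "rb") as f:
            return os.path.exists(src), f.read()


print(demo_move())
```

The destination is a newly written file rather than a renamed one, which corresponds to the moved file being stored in sharded format again and therefore being editable.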

For file systems with the convert feature enabled, directories (including empty directories) can no longer be moved (mv) once some time has passed after their creation. They can only be deleted and then recreated.

danger

Converted files do not support the trash feature. Once deleted, they do not appear in the trash and cannot be recovered. The corresponding objects in object storage are also cleaned up asynchronously by client background tasks.

The convert feature is currently in the testing phase. If needed, contact Juicedata engineers for assistance.

Cache for conversion

Files in the compatible format support local cache and distributed cache. Even though converted files are no longer in sharded format, when cached to local storage, they are still split into data blocks (sized according to the file system's block size). Therefore, the usage and management of caching for converted files is no different from sharded-format files.
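As a rough illustration of how a complete file maps onto cache blocks, the sketch below splits a file of a given size into fixed-size ranges. The 4 MiB default is an assumption taken from the 4194304 suffix in the listing above; use your file system's actual block size. block_ranges is a hypothetical helper, not a JuiceFS API.

```python
def block_ranges(file_size: int, block_size: int = 4 * 1024 * 1024):
    """Return (offset, length) pairs covering a file split into fixed-size blocks."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    return [
        (offset, min(block_size, file_size - offset))
        for offset in range(0, file_size, block_size)
    ]


# A 10 MiB file becomes two full 4 MiB blocks plus one 2 MiB tail block.
for offset, length in block_ranges(10 * 1024 * 1024):
    print(offset, length)
```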

However, it is important to note that after conversion, existing cache is invalidated due to metadata changes, and the files need to be warmed up again to reestablish local cache.

Import existing object storage files

juicefs import scans the given object storage address and writes the metadata information of the target file into JuiceFS' metadata engine, allowing these files to be accessed in JuiceFS. This operation does not actually copy any files; the files remain as they are in object storage. Therefore, this storage format is called the compatible format, meaning it is compatible with object storage.

When you use imported files, please note:

  • You can modify file names and permissions, but you cannot modify the object storage data. In other words, no matter what operation you perform, the original objects in object storage will remain unchanged.
  • Deleting these files will only delete their metadata and will not actually delete the source files in object storage.
  • The imported files' metadata in JuiceFS does not support the trash feature. If you delete imported files in JuiceFS, you will not find them in the trash. If you need to recover them, you can only re-import them.
  • Files imported into JuiceFS cannot be easily distinguished from regular files. If you need to check, run the juicefs info command and look for an object field (rather than a chunks table), which indicates that the file is stored in compatible format.

danger

Imported files also occupy file system space, count in directory quotas, and are included in billing.

Cache for import

Starting from JuiceFS 5.0, imported files also support local cache and distributed cache. Although imported files are not actually written to the JuiceFS file system and do not go through JuiceFS' sharded formatting process, when cached to the local disk, they are still split into data blocks (the size is the file system's block size). Therefore, the usage and management of cache for imported files are no different from normal files written to the JuiceFS file system.

When you use JuiceFS' cache feature to speed up reads of imported files, be aware of consistency: the imported objects are not managed by JuiceFS, so if an object is modified in object storage without being re-imported, stale cached data may remain and reads are not guaranteed to return the latest data. Therefore, re-import objects whenever they change after the initial import. After a re-import, existing cache data is automatically invalidated based on the modification time of the imported objects, which ensures that you read the modified data.
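The consistency behavior described above can be modeled with a toy cache whose keys include the modification time recorded at import. This is a conceptual sketch only: BlockCache, the metadata dictionary, and the sample values are all hypothetical, and it does not claim to reflect how JuiceFS implements its cache.

```python
class BlockCache:
    """Toy block cache whose keys include the object's recorded modification time."""

    def __init__(self):
        self._blocks = {}

    def put(self, object_key, mtime, block_index, data):
        self._blocks[(object_key, mtime, block_index)] = data

    def get(self, object_key, mtime, block_index):
        # A lookup with a different mtime misses, so old blocks are never served.
        return self._blocks.get((object_key, mtime, block_index))


cache = BlockCache()
metadata = {"data/report.csv": 100}  # mtime recorded at import time (made-up value)
cache.put("data/report.csv", metadata["data/report.csv"], 0, b"old contents")

# The object changes in object storage but is not re-imported:
# the metadata still records mtime=100, so the stale block is still a hit.
stale = cache.get("data/report.csv", metadata["data/report.csv"], 0)

# Re-importing refreshes the recorded mtime; the old block no longer matches.
metadata["data/report.csv"] = 200
fresh = cache.get("data/report.csv", metadata["data/report.csv"], 0)

print(stale, fresh)
```

In this model, the stale read happens only while the recorded modification time is out of date, which is why re-importing after every change matters.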

For objects that need to be modified repeatedly, it is recommended to migrate the data to JuiceFS as a whole, using juicefs sync to write data to JuiceFS. Because of JuiceFS' POSIX compatibility, you can use any other tool as well.