In JuiceFS Enterprise Edition, the phrase "background job" (abbreviated as "bgjob") refers to a series of tasks that are dispatched by the Metadata Service and executed in clients. This includes compaction, trash cleaning, and data replication. Keep in mind that "background job" should be distinguished from "asynchronous task": the former has developed its own special meaning in JuiceFS, while "asynchronous task" generally refers to any process that executes asynchronously. For example, if client write cache is enabled, data is uploaded asynchronously by clients. This process is not a background job, but it is still an asynchronous task.
All types of background jobs introduced in this chapter deal with object storage. To allow finer control over this process (e.g. disabling bgjob for some clients, or limiting compaction speed), you can modify client token settings from the Web Console to dynamically adjust bgjob settings for clients.
If background job is not explicitly disabled, clients will run all types of background jobs for the mounted file system (with strict data isolation between different file systems). This means:
- When the --subdir option is used to mount a sub-directory, the scope of background jobs is not affected: jobs coming from the entire file system are still dispatched to these clients;
- If clients are mounted with read-only tokens, background jobs will still run on them, thus read-only clients will write to object storage as well;
- Similarly, other mount options do not affect the scope of background jobs. This doesn't necessarily mean that mount options have no effect on background jobs at all: clients with options tuned for better performance will run jobs faster, hence the Metadata Service will dispatch more tasks to them accordingly.
Compaction is the process of merging multiple slices into one, in order to avoid file fragmentation. Learn more in How JuiceFS Stores Files.
Every time a file is written, the Metadata Service checks its fragmentation status to see if compaction is needed, and dispatches compaction as background jobs according to pre-defined rules. After compaction is completed in clients, multiple slices are merged into one, which improves read efficiency. So if you notice unexplained network traffic when there is no explicit read or write in the JuiceFS Client, this is usually just compaction traffic, and nothing to worry about.
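To make the idea of "merging multiple slices into one" concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not the JuiceFS implementation: the `Slice` class and `compact` function are hypothetical names, slices are assumed to be applied in write order so that newer data overrides older data where they overlap, and unwritten gaps are assumed to read as zero bytes.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    """Hypothetical model of one write inside a chunk (not a JuiceFS type)."""
    offset: int   # start position inside the chunk
    data: bytes   # bytes written by this slice


def compact(slices):
    """Merge slices (given oldest first) into a single slice.

    Assumption: where slices overlap, the most recent write wins.
    """
    if not slices:
        return None
    start = min(s.offset for s in slices)
    end = max(s.offset + len(s.data) for s in slices)
    buf = bytearray(end - start)  # gaps between slices stay zero-filled
    for s in slices:  # apply in write order so later slices overwrite earlier ones
        rel = s.offset - start
        buf[rel:rel + len(s.data)] = s.data
    return Slice(start, bytes(buf))


# Three overlapping writes become one contiguous slice.
fragments = [Slice(0, b"aaaaaaaa"), Slice(2, b"bbb"), Slice(6, b"cccc")]
merged = compact(fragments)
print(merged.offset, merged.data)
```

Before compaction, a reader has to consult all three fragments to assemble the range; afterwards a single slice covers it, which is where the read efficiency gain comes from.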
Compaction comes with the following design:
- Job dispatch favors the client that initiated the write. This is also a form of data locality: that client likely already has all the slices needed for compaction in its local cache, so this strategy helps reduce read overhead;
- For every chunk, compaction is carried out by a single client at any given time; scheduling is controlled globally by the Metadata Service.
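The two design points above can be illustrated with a small sketch. This is a hypothetical model, not the Metadata Service's actual scheduler: the `CompactionScheduler` class, its method names, and the fallback rule (pick the first eligible client in sorted order when the writer is unavailable) are all assumptions made for illustration.

```python
class CompactionScheduler:
    """Toy model: prefer the writing client, one compaction per chunk at a time."""

    def __init__(self, clients):
        self.clients = set(clients)   # clients with bgjob enabled
        self.running = {}             # chunk_id -> client currently compacting it

    def dispatch(self, chunk_id, writer):
        """Return the client chosen to compact chunk_id, or None if not dispatched."""
        if chunk_id in self.running:
            return None               # another client is already compacting this chunk
        if not self.clients:
            return None               # nobody is allowed to run background jobs
        # Favor the writer (likely has the slices cached); otherwise fall back
        # to an arbitrary eligible client (assumed rule).
        client = writer if writer in self.clients else sorted(self.clients)[0]
        self.running[chunk_id] = client
        return client

    def finish(self, chunk_id):
        """Mark the compaction of chunk_id as done so it can be scheduled again."""
        self.running.pop(chunk_id, None)


scheduler = CompactionScheduler(["client-a", "client-b"])
print(scheduler.dispatch(1, writer="client-b"))  # the writer is preferred
print(scheduler.dispatch(1, writer="client-a"))  # chunk 1 is busy, nothing dispatched
```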
You can observe compaction in:
- The monitoring tab in the file system page, from the Web Console. Compaction traffic is indicated by the "compact" line, under the object storage panel;
- On-prem: the Meta info dashboard in Grafana. Read the slices:used line under the Memory distribution panel. This value should be close to 0 under ideal circumstances. However, a high slices:used value (from 100M to several Gs) does not necessarily indicate a disaster. If you do not notice apparent performance issues on the client side, no fix is required and you can continue to use the file system normally. In this type of situation the total number of slices may be high, but the set of files that is actually in use maintains a good fragmentation level, so overall performance is not affected.
Compaction may run into different problems in different scenarios. Continue reading for more.
Compaction and client write cache
Client write cache by itself is a feature that you should use with caution. If write cache is enabled, make sure the client maintains an acceptable level of performance, so that staging data can be uploaded to object storage in time. Otherwise, a slow client with write cache enabled can cause serious trouble like read errors when it runs compaction. This is because with write cache, metadata is committed before data finishes uploading: slices are merged into one and the file metadata state is changed, but the merged slice is uploaded too slowly, causing reads from other clients to hang or even time out.
Slow compaction speed
If slices grow uncontrolled, file system performance quickly deteriorates and clients can even hang. If your troubleshooting points to precisely this issue, use the methods below to fix it:
- Background jobs are executed by clients. If there are simply not enough clients, or bgjob has been disabled for their tokens by mistake, compaction will not run normally. Make sure there is a sufficient number of clients with bgjob turned on (any form of JuiceFS Client counts, including Hadoop SDK and S3 Gateway);
- Compaction is carried out by first downloading the relevant slices to local cache, consolidating them, and then uploading the result to object storage before the new state is committed to the Metadata Service. If your object storage service does not meet the required level of performance, or comes with bandwidth limitations, compaction may not run at proper speed, which leads to severe fragmentation;
- On-prem: the Metadata Service also controls compaction scheduling and client task queues. Consult our engineers if these need adjusting;
- Certain write patterns intrinsically cause more slices, like a continuous flow of small appends (each followed by a flush call). JuiceFS is always studying different application scenarios and improving its handling of special write patterns. If your application produces an abnormal level of fragmentation, contact our engineers to look into the problem.
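The small-append pattern from the last item is easy to produce by accident. The sketch below contrasts it with a batched variant. It is an illustration under assumptions, not official guidance: the function names and the `batch_bytes` threshold are hypothetical, and the flush in question is assumed to correspond to an `fsync` on the file. Whether batching is acceptable depends on how much unflushed data your application can afford to lose on a crash.

```python
import os
import tempfile


def append_small(path, records):
    """Persist after every record: many tiny writes, each one flushed."""
    with open(path, "ab") as f:
        for r in records:
            f.write(r)
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # ask the file system to persist the write


def append_batched(path, records, batch_bytes=4 << 20):
    """Accumulate records in memory and persist once per batch."""
    buf = bytearray()
    with open(path, "ab") as f:
        for r in records:
            buf += r
            if len(buf) >= batch_bytes:
                f.write(buf)
                f.flush()
                os.fsync(f.fileno())
                buf.clear()
        if buf:                    # write whatever is left in the last batch
            f.write(buf)
            f.flush()
            os.fsync(f.fileno())


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    logs = [b"log line\n"] * 1000
    append_small(os.path.join(demo_dir, "small.log"), logs)      # 1000 flushes
    append_batched(os.path.join(demo_dir, "batched.log"), logs)  # 1 flush
```

Both functions produce identical file contents; they differ only in how many separate flushed writes reach the file system.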
High compaction traffic
In certain large file scenarios, compaction produces a significant amount of upload traffic which, added to normal application uploads, can easily reach the object storage throttling limit. To mitigate this problem, go to the "Access Control" page and configure a compaction bandwidth limit, to free up more bandwidth for actual application use. However, this slows down compaction and could bring performance risks. We recommend these practices:
- When application containers are ephemeral, compaction runs with poor efficiency because pods are constantly being destroyed and re-created. Consider using a set of clients that persist, for example a dedicated cache cluster. Under this approach, you'll also need to assign different client tokens to the cache cluster and the application mount pods, so that compaction only runs in the stable cache cluster.
- Seek help from our engineers to analyze the cause of excessive fragmentation and resolve it on the producer side.
JuiceFS enables trash by default (you're asked to specify the trash retention days when creating a file system). When files are deleted, they first enter trash, wait for expiration, and are then scheduled for actual deletion. This deletion also happens in the form of background jobs, which is why your objects are not released immediately, even after trash is emptied. Even if trash isn't enabled, deleted files form bgjobs that are scheduled onto clients for execution, so there will always be a delay between file deletion and the release of object storage blocks.
Cleaning trash is the process of deleting the object storage data for a file, and then removing its corresponding metadata. This also deals with object storage and hence could run too fast or too slow. If this happens, refer to the solutions below.
- Deletion happens too fast: DELETE QPS can spike if cleaning runs too fast, which can cause object storage throttling or even consume too much CPU. Use --delete-limit to set a limit on client QPS.
- Deletion is too slow: Background jobs are dispatched to all clients (that have not disabled bgjob). If deletion is too slow and you urgently need to free up object storage capacity, simply add more clients. Any form of JuiceFS Client counts, including Hadoop SDK and S3 Gateway.
On-premises users can adjust configuration for the Metadata Service, which can directly control background job scheduling to manage deletion speed.
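To show what a per-client QPS limit on deletions does in principle, here is a minimal pacing limiter in Python. It is a sketch of the general technique only and makes no claim about how --delete-limit is implemented in the client: `RateLimiter`, `delete_objects`, and the `delete_fn` callback are hypothetical names, and the limiter simply spaces requests evenly without allowing bursts.

```python
import time


class RateLimiter:
    """Space calls at least 1/qps seconds apart (simple pacing, no bursts)."""

    def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / qps
        self.clock = clock            # injectable so the limiter can be tested
        self.sleep = sleep
        self.next_allowed = clock()   # earliest time the next call may proceed

    def acquire(self):
        """Block until the next request is allowed to go out."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


def delete_objects(keys, delete_fn, limiter):
    """Issue one delete per key, never faster than the limiter allows."""
    for key in keys:
        limiter.acquire()
        delete_fn(key)


if __name__ == "__main__":
    limiter = RateLimiter(qps=100)
    # print stands in for a real object storage DELETE request.
    delete_objects([f"chunks/{i}" for i in range(5)], print, limiter)
```

A lower limit protects the object storage service from DELETE spikes at the cost of slower space reclamation, which is the same trade-off described in the two items above.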
If data replication is enabled, data syncing is also carried out in background jobs. If syncing speed is not ideal, try to increase the number of clients.