Skip to main content

Fault diagnosis and analysis

Client log

JuiceFS client will output logs for troubleshooting during operation, the log levels are from low to high: DEBUG, INFO, WARNING, ERROR, FATAL, by default only logs above INFO level are output. If you need to output DEBUG level logs, you need to explicitly enable it when running the JuiceFS client, such as adding the --verbose or --debug option.

Different JuiceFS clients obtain logs in different ways, which are described below.

Mount point

When mounting, you can specify the log save path with the --log=<log_path> option. When this option is not given, the log is written to the /var/log/juicefs.log or ~/.juicefs/log/juicefs.log file in plain text by default.

You can use the grep command to filter different levels of logs for performance analysis or troubleshooting:

$ cat /var/log/juicefs.log | grep '<INFO>'
$ cat /var/log/juicefs.log | grep '<WARNING>'
$ cat /var/log/juicefs.log | grep '<ERROR>'
$ cat /var/log/juicefs.log | grep '<FATAL>'

Kubernetes CSI Driver

Depending on the version of the JuiceFS CSI Driver you are using, there will be different ways to obtain logs. For details, please refer to CSI Driver documentation.

S3 Gateway

The S3 gateway only supports running in the foreground, so client logs are output directly to the terminal. If you are deploying the S3 gateway in Kubernetes, you need to view the logs for the corresponding pod.

Hadoop Java SDK

The logs of the application process (such as Spark executor) using the JuiceFS Hadoop Java SDK will include the JuiceFS client logs, because they are mixed with the logs generated by the application itself, and need to be filtered by specific keywords (such as juicefs, pay attention here case is ignored).

Access log

Each JuiceFS client has an access log that details all operations on the file system, such as operation type, user ID, group ID, file inodes and how long this operation took. Access logs can be used for various purposes such as performance analysis, auditing, troubleshooting.

Access log format

An example format of an access log is as follows:

2021.01.15 08:26:11.003330 [uid:0,gid:0,pid:4403] write (17669,8666,4993160): OK <0.000010>

The meaning of each column is:

  • 2021.01.15 08:26:11.003330: The time of the current operation
  • [uid:0,gid:0,pid:4403]: User ID, group ID, process ID of the current operation
  • write: Operation type
  • (17669,8666,4993160): The input parameters of the current operation type. For example, the input parameters of the write operation in the example are the inode of the written file, the size of the written data, and the offset of the written file. Different operation types have different parameters.
  • OK: Whether the current operation is successful or not, if it is unsuccessful, specific failure information will be output.
  • <0.000010>: The time (in seconds) that the current operation took

You can debug and analyze performance issues with access log, or try juicefs profile to see real-time statistics. Run juicefs profile -h or refer to here to learn more about this subcommand.

Different JuiceFS clients obtain access log in different ways, which are described below.

Mount point

JuiceFS provides plain text file virtual files with .oplog and .ophistory files in the root directory of each mount point, outputing real-time file system operation logs. A virtual file named .accesslog is also available in client version 4.7.0 and above, and the log format is the same as .oplog.

  • .oplog and .accesslog: outputs real-time operation logs
  • .ophistory: also contains all the history logs since the log service started

Direct analysis of these log files can be difficult, but you can use juicefs profile tool (for more detailed usage, please refer to "Command Reference" document).

Kubernetes CSI Driver

Please refer to CSI Driver documentation and according to the version of JuiceFS CSI Driver you are using to find the mount pod or CSI driver pod, just view the .oplog or .accesslog file in the root directory of the JuiceFS file system mount point in the pod. The mount point path in the pod is /jfs/<pv_volumeHandle>, assuming the name of the mount pod is juicefs-1.2.3.4-pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373, <pv_volumeHandle> is pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373, you can use the following command to view:

kubectl -n kube-system exec juicefs-chaos-k8s-002-pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373 -- cat /jfs/pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373/.oplog

Hadoop Java SDK

You need to add the juicefs.access-log configuration item in the client configurations of the JuiceFS Hadoop Java SDK to specify the path of the access log output, and the access log is not output by default.

Real-time statistics

JuiceFS provides a plain-text virtual file with the file name .stats at the root of each mount point, reporting real-time statistics. It's helpful to look at the statistics in this file for troubleshooting and analysis. These statistics can also be collected through JSON API or Prometheus API. For details, please refer to the "Monitoring" document.

Different JuiceFS clients obtain real-time statistics in different ways, which are described below.

Mount point

In this case, the file system mount point for fault diagnosis and analysis is assumed to be /jfs. Here is some helpful example commands:

View CPU and memory usage:

cat /jfs/.stats | grep usage

View uptime:

cat /jfs/.stats | grep uptime

View the count of each object storage operation:

cat /jfs/.stats | grep object

View the amount of object storage read and write data:

cat /jfs/.stats | grep _bytes

Kubernetes CSI Driver

Please refer to CSI Driver documentation and according to the version of JuiceFS CSI Driver you are using to find the mount pod or CSI driver pod, just view the .stats file in the root directory of the JuiceFS file system mount point in the pod. The mount point path in the pod is /jfs/<pv_volumeHandle>, assuming the name of the mount pod is juicefs-1.2.3.4-pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373, <pv_volumeHandle> is pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373, you can use the following command to view:

kubectl -n kube-system exec juicefs-chaos-k8s-002-pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373 -- cat /jfs/pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373/.stats

Runtime information

By default, JuiceFS clients will listen to a TCP port locally via pprof to get runtime information such as Goroutine stack information, CPU performance statistics, memory allocation statistics. You can see the specific port number that the current JuiceFS client is listening on by using the system command (e.g. lsof):

note

If JuiceFS is mounted via the root user, then you need to add sudo before the lsof command.

$ lsof -i -nP | grep LISTEN | grep juicefs
juicefs 32666 user 8u IPv4 0x44992f0610d9870b 0t0 TCP 127.0.0.1:6061 (LISTEN)
juicefs 32666 user 9u IPv4 0x44992f0619bf91cb 0t0 TCP 127.0.0.1:6071 (LISTEN)
juicefs 32666 user 15u IPv4 0x44992f062886fc5b 0t0 TCP 127.0.0.1:9567 (LISTEN)

By default, pprof listens on port numbers starting at 6060 and ending at 6099, so the actual port number in the above example is 6061. Once you get the listening port number, you can view all the available runtime information at http://localhost:<port>/debug/pprof, and some important runtime information is as follows:

  • Goroutine stack information: http://localhost:<port>/debug/pprof/goroutine?debug=1
  • CPU performance statistics: http://localhost:<port>/debug/pprof/profile?seconds=30
  • Memory allocation statistics: http://localhost:<port>/debug/pprof/heap

To make it easier to analyze this runtime information, you can save it locally, e.g.:

$ curl 'http://localhost:<port>/debug/pprof/goroutine?debug=1' > juicefs.goroutine.txt
$ curl 'http://localhost:<port>/debug/pprof/profile?seconds=30' > juicefs.cpu.pb.gz
$ curl 'http://localhost:<port>/debug/pprof/heap' > juicefs.heap.pb.gz

If you have the go command installed, you can analyze it directly with the go tool pprof command, for example to analyze CPU performance statistics:

$ go tool pprof 'http://localhost:<port>/debug/pprof/profile?seconds=30'
Fetching profile over HTTP from http://localhost:<port>/debug/pprof/profile?seconds=30
Saved profile in /Users/xxx/pprof/pprof.samples.cpu.001.pb.gz
Type: cpu
Time: Dec 17, 2021 at 1:41pm (CST)
Duration: 30.12s, Total samples = 32.06s (106.42%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 30.57s, 95.35% of 32.06s total
Dropped 285 nodes (cum <= 0.16s)
Showing top 10 nodes out of 192
flat flat% sum% cum cum%
14.73s 45.95% 45.95% 14.74s 45.98% runtime.cgocall
7.39s 23.05% 69.00% 7.41s 23.11% syscall.syscall
2.92s 9.11% 78.10% 2.92s 9.11% runtime.pthread_cond_wait
2.35s 7.33% 85.43% 2.35s 7.33% runtime.pthread_cond_signal
1.13s 3.52% 88.96% 1.14s 3.56% runtime.nanotime1
0.77s 2.40% 91.36% 0.77s 2.40% syscall.Syscall
0.49s 1.53% 92.89% 0.49s 1.53% runtime.memmove
0.31s 0.97% 93.86% 0.31s 0.97% runtime.kevent
0.27s 0.84% 94.70% 0.27s 0.84% runtime.usleep
0.21s 0.66% 95.35% 0.21s 0.66% runtime.madvise

Runtime information can also be exported to visual charts for a more intuitive analysis. The visual charts support exporting to various formats such as HTML, PDF, SVG, PNG, etc. For example, the command to export memory allocation statistics as a PDF file is as follows:

note

The export to visual chart function relies on Graphviz, so please install it first.

$ go tool pprof -pdf 'http://localhost:<port>/debug/pprof/heap' > juicefs.heap.pdf

For more information about pprof, please see the official documentation.