Automatically Install JuiceFS Java Client

Mount JuiceFS Filesystem

See: Mount JuiceFS Filesystem

Python Script Installation

Install script: setup-hadoop.py. Run it as described below on the Cloudera Manager node, the Ambari node, or the EMR master node on a public cloud.

The script is written in Python and supports both Python 2 and Python 3.

When you run the script, the JuiceFS Java client (juicefs-hadoop.jar) is downloaded and installed across the entire cluster, and the related configuration is deployed to core-site.xml automatically (automatic configuration supports the HDP and CDH distributions only).

Usage

python setup-hadoop.py COMMAND

COMMAND can be one of:
-h              show help information
install         install the JuiceFS JAR file on the current node
install_all     install the JuiceFS JAR file on all nodes; requires SSH access as the root user
config          fill in the configuration automatically; supports Ambari and Cloudera Manager only
deploy_config   deploy the configuration to all nodes automatically; supports Ambari and Cloudera Manager only
test            check whether the installation is correct on the current node
test_all        check whether the installation is correct on all nodes

Installation Guide

  1. Install the JAR File
python setup-hadoop.py install

juicefs-hadoop.jar is downloaded to /usr/local/lib, and a symbolic link to it is created in the lib directory of each component of the Hadoop distribution.

The exact locations are printed in the log.
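The linking step can be pictured with a minimal sketch. This is not the code of setup-hadoop.py; the function name `link_jar` and the demo directories are hypothetical, and the real component lib paths depend on the distribution.

```python
import os
import tempfile


def link_jar(jar_path, lib_dirs):
    """Create (or refresh) a symlink to jar_path inside each directory in lib_dirs."""
    links = []
    for lib_dir in lib_dirs:
        link = os.path.join(lib_dir, os.path.basename(jar_path))
        # Remove a stale link or file first so that os.symlink does not fail.
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(jar_path, link)
        links.append(link)
    return links


if __name__ == "__main__":
    # Demo in a throwaway directory; these paths are placeholders only.
    root = tempfile.mkdtemp()
    jar = os.path.join(root, "juicefs-hadoop.jar")
    open(jar, "w").close()
    lib_dirs = [os.path.join(root, name, "lib") for name in ("hadoop", "spark")]
    for d in lib_dirs:
        os.makedirs(d)
    for link in link_jar(jar, lib_dirs):
        print(link, "->", os.readlink(link))
```

Keeping one copy of the JAR and linking it into each component means an upgrade only has to replace a single file.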

  2. Configuration
python setup-hadoop.py config

This operation writes the required configuration items to the core-site.xml file of HDFS and prints them in the log.

For CDH and HDP environments, run this command and follow the prompt to enter the administrator's password. If juicefs auth or juicefs mount has already been run successfully on this node, the authentication information in /root/.juicefs/ is read automatically, and the configuration is then written to core-site.xml via the RESTful API.

This operation also creates a cache directory on the node, following the configured juicefs.cache-dir.
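For readers who have not edited core-site.xml by hand, the sketch below shows the general shape of one Hadoop property entry. It is an illustration only: the helper names are hypothetical, the cache path is an example value, and the full set of items the script writes is not listed here. The sketch targets Python 3.

```python
import xml.etree.ElementTree as ET


def property_element(name, value):
    """Build one Hadoop-style <property> element."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


def render_properties(settings):
    """Render a dict of settings as <property> elements, sorted by name."""
    return "\n".join(
        ET.tostring(property_element(key, value), encoding="unicode")
        for key, value in sorted(settings.items())
    )


if __name__ == "__main__":
    # Example value only; pick the cache directory that fits your nodes.
    print(render_properties({"juicefs.cache-dir": "/data01/jfscache"}))
```

Comparing the printed fragment with what the config command reports in its log is a quick way to confirm that the cache directory was picked up as intended.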

  3. Distribute the JAR file to the cluster
  • For CDH or HDP distribution, run:

    python setup-hadoop.py install_all
  • For EMR of the public cloud, run:

    export NODE_LIST=node1,node2
    python setup-hadoop.py install_all
  • For Apache community edition, run:

    # the classpath of each component, separated by commas
    export EXTRA_PATH=$HADOOP_HOME/share/hadoop/common/lib,$SPARK_HOME/jars,$PRESTO_HOME/plugin/hive-hadoop2
    export NODE_LIST=node1,node2
    python setup-hadoop.py install_all

This operation copies juicefs-hadoop.jar and /etc/juicefs/.juicefs to the specified nodes via scp.
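As a rough picture of how a comma-separated NODE_LIST turns into per-node copies, consider the sketch below. It only builds the scp argument lists and runs nothing. The function names, the root user default, and the destination directory are assumptions made for illustration, not the behaviour of setup-hadoop.py.

```python
import os


def parse_node_list(value):
    """Split a comma-separated NODE_LIST value into host names, skipping blanks."""
    return [node.strip() for node in value.split(",") if node.strip()]


def scp_commands(jar_path, nodes, dest_dir, user="root"):
    """Return one scp argument list per node; nothing is executed here."""
    return [
        ["scp", jar_path, "{0}@{1}:{2}/".format(user, node, dest_dir)]
        for node in nodes
    ]


if __name__ == "__main__":
    nodes = parse_node_list(os.environ.get("NODE_LIST", "node1,node2"))
    # Paths are placeholders; the real script decides where the files go.
    for cmd in scp_commands("juicefs-hadoop.jar", nodes, "/tmp"):
        print(" ".join(cmd))
```

Printing the commands before running install_all is a cheap way to spot a misspelled host name in NODE_LIST.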

Installation by Ansible

Ansible configuration template: setup-hadoop.yml

Ansible hosts configuration

[master] # ssh connected between master and slave if the node has external network
master001

[slave] # all nodes in the Hadoop cluster, exclude master nodes
slave001
slave002

Usage

ansible-playbook setup-hadoop.yml \
--extra-vars '{"jfs_name":"your-jfs-name", "jfs_version":"0.5", "hadoop":"your-hadoop-dist", "cache_dir":["/data01/jfscache","/data02/jfscache"]}'

Parameters:
jfs_name      JuiceFS volume name
jfs_version   JuiceFS Java client version
hadoop        Hadoop distribution name; supported values: cdh5, cdh6, hdp, emr-ali, emr-tencent, kmr, uhadoop
cache_dir     Cache directory on the current node
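Hand-writing the JSON passed to --extra-vars is error-prone, so a small helper can build and check it. The sketch below is a hypothetical convenience, not part of the playbook. It uses only the four parameters and the distribution names listed above; the function name `build_extra_vars` is an assumption.

```python
import json

# Distribution names taken from the parameter list above.
SUPPORTED_HADOOP = ("cdh5", "cdh6", "hdp", "emr-ali", "emr-tencent", "kmr", "uhadoop")


def build_extra_vars(jfs_name, jfs_version, hadoop, cache_dirs):
    """Serialize the playbook parameters as a JSON object string."""
    if hadoop not in SUPPORTED_HADOOP:
        raise ValueError("unsupported hadoop distribution: {0}".format(hadoop))
    return json.dumps(
        {
            "jfs_name": jfs_name,
            "jfs_version": jfs_version,
            "hadoop": hadoop,
            "cache_dir": list(cache_dirs),
        },
        sort_keys=True,
    )


if __name__ == "__main__":
    print(build_extra_vars(
        "your-jfs-name", "0.5", "cdh6",
        ["/data01/jfscache", "/data02/jfscache"],
    ))
```

The printed string can be pasted between single quotes after --extra-vars; the version is kept as a string so that a value such as "0.5" is passed through unchanged.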