Using Kerberos

Kerberos is an authentication protocol widely supported in the Hadoop ecosystem.

Without Kerberos, the HDFS client cannot authenticate the current user. The HADOOP_USER_NAME environment variable can be used to set the user name (even to a superuser), but this brings security issues. By running Hadoop in Secure Mode with Kerberos enabled, users accessing HDFS are authenticated via Kerberos.
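For example, the insecure behavior looks like this (a minimal sketch; the user name hdfs is only a placeholder for a superuser):

  # without Kerberos, anyone can claim an arbitrary identity via an environment variable
  export HADOOP_USER_NAME=hdfs
  hadoop fs -ls jfs://{VOL_NAME}/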

Enable Kerberos for JuiceFS

The JuiceFS Hadoop Java SDK supports Kerberos since version 4.8.

  1. Preparation

    1. Install a KDC if you haven't already.

    2. Create a meta.keytab file for JuiceFS, replacing {VOL_NAME} with your volume name:

      kadmin.local -q "addprinc -randkey meta/{VOL_NAME}"
      kadmin.local -q "ktadd -norandkey -k meta.keytab meta/{VOL_NAME}"
  2. Enable Kerberos support in JuiceFS console

    1. Enable Kerberos support on the volume settings page, then upload the previously created meta.keytab file.

    2. Superuser and Supergroup

      With Kerberos enabled, you can configure the superuser and supergroup in the console. The configured values override juicefs.superuser and juicefs.supergroup.

    3. Optional: Proxy User

      JuiceFS also supports proxy users; see HDFS Proxy User and add the proxy user configuration when needed.
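      For example, assuming your Hadoop client supports the HADOOP_PROXY_USER environment variable and a proxy user named hive has been configured (the principal and user names below are placeholders):

      # authenticate as the proxy (service) principal
      kinit hive@EXAMPLE.COM
      # run a command on behalf of the end user alice
      HADOOP_PROXY_USER=alice hadoop fs -ls jfs://{VOL_NAME}/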

  3. SDK Configuration

    Add to core-site.xml:

    <property>
      <name>hadoop.security.authentication</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>juicefs.server-principal</name>
      <value>meta/_HOST</value>
      <description>
        The _HOST wildcard expands to the JuiceFS volume name at runtime.
        Change it to meta/{VOL_NAME} to target a specific file system.
      </description>
    </property>
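    To sanity-check that the client picks up these settings, you can print the effective values (assuming the hdfs command is available on the client):

    hdfs getconf -confKey hadoop.security.authentication
    hdfs getconf -confKey juicefs.server-principal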
  4. Verify

    • Hadoop shell

      # log in using kinit
      kinit {your-client-principal}
      # verify if JuiceFS works
      hadoop fs -ls jfs://{VOL_NAME}/
      # exit
      kdestroy
      # after logout, accessing files should fail with error: kerberos credential is needed
      hadoop fs -ls jfs://{VOL_NAME}/
    • Spark

      Spark needs the following additional configuration:

      --conf spark.yarn.access.hadoopFileSystems=jfs://{VOL_NAME}/
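      For example, a spark-submit invocation might look like the following sketch (the principal, keytab path, and application details are placeholders; a valid ticket obtained via kinit also works instead of --principal/--keytab):

      spark-submit \
        --conf spark.yarn.access.hadoopFileSystems=jfs://{VOL_NAME}/ \
        --principal {your-client-principal} \
        --keytab /path/to/client.keytab \
        --class com.example.MyApp my-app.jar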

By default, the Hadoop compute client uses /etc/krb5.conf to access the KDC. If the KDC config file is located elsewhere, add -Djava.security.krb5.conf=/path/to/conf to your Java arguments.
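
For example, for the Hadoop shell this can be passed through HADOOP_OPTS (the path is a placeholder):

  # point the JVM at a non-default krb5.conf
  export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.krb5.conf=/path/to/krb5.conf"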