
FUSE

Our Apache Hadoop images contain the binaries and libraries required to use the HDFS FUSE driver.

FUSE is short for Filesystem in Userspace. It allows a userspace program to export a filesystem to the Linux kernel, where it can then be mounted like any other filesystem. HDFS ships a native FUSE driver, which means an existing HDFS filesystem can be mounted into a Linux environment.

To use the FUSE driver you can either copy the required files out of the image and run the driver on a host outside of Kubernetes, or you can run it in a Pod. The first route can look like the sketch below; a Pod, however, needs some extra capabilities.
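A minimal sketch of the copy-out route using Docker (the container name hadoop-fuse and the path /stackable/hadoop inside the image are illustrative assumptions, inspect the image for the actual layout):

# Create a stopped container so files can be copied out of the image
docker create --name hadoop-fuse docker.stackable.tech/stackable/hadoop:<version>
# Assumed location of the Hadoop installation inside the image
docker cp hadoop-fuse:/stackable/hadoop ./hadoop
docker rm hadoop-fuse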

This is an example Pod that works as long as the host system running the kubelet supports FUSE:

apiVersion: v1
kind: Pod
metadata:
  name: hdfs-fuse
spec:
  containers:
    - name: hdfs-fuse
      env:
        - name: HADOOP_CONF_DIR
          value: /stackable/conf/hdfs
      image: docker.stackable.tech/stackable/hadoop:<version> (1)
      imagePullPolicy: Always
      securityContext:
        privileged: true
      command:
        - tail
        - -f
        - /dev/null
      volumeMounts:
        - mountPath: /stackable/conf/hdfs
          name: hdfs-config
  volumes:
    - name: hdfs-config
      configMap:
        name: <your hdfs here> (2)
  1. Ideally use the same version your HDFS is using. FUSE is baked into our images as of SDP 23.11.

  2. This needs to be a reference to a discovery ConfigMap as written by our HDFS operator.
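With the placeholders filled in, the Pod can be created like any other workload. A short sketch, assuming the manifest was saved as hdfs-fuse.yaml:

kubectl apply -f hdfs-fuse.yaml
# Wait until the container is running before opening a shell in it
kubectl wait --for=condition=Ready pod/hdfs-fuse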

Tip
Privileged Pods

Instead of running the Pod as privileged, it might suffice to only add the SYS_ADMIN capability; our tests showed that this sometimes works and sometimes doesn't, depending on the environment.

Example
securityContext:
  capabilities:
    add:
      - SYS_ADMIN

Unfortunately, there is no way around some extra privileges. In Kubernetes, Pods usually share the kernel with the host running the kubelet, which means a Pod that wants to use FUSE needs access to the underlying kernel modules.
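Whether the node actually provides FUSE can be verified on the host itself, for example:

# The fuse kernel module must be loaded (or built in) and expose /dev/fuse
lsmod | grep fuse
ls -l /dev/fuse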

Inside this Pod you can get a shell (e.g. using kubectl exec --stdin --tty hdfs-fuse -- /bin/bash) to get access to a script called fuse_dfs_wrapper (it is in the PATH of our Hadoop images).

To see the available options, call the script without any parameters.
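Put together, a first session could look like this, assuming the example Pod from above:

kubectl exec --stdin --tty hdfs-fuse -- /bin/bash
# Inside the Pod: calling the wrapper without parameters prints the available options
fuse_dfs_wrapper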

To mount HDFS call the script like this:

fuse_dfs_wrapper dfs://<your hdfs> <target> (1) (2)

# This will run in debug mode and stay in the foreground
fuse_dfs_wrapper -odebug dfs://<your hdfs> <target>

# Example:
mkdir simple-hdfs
fuse_dfs_wrapper dfs://simple-hdfs simple-hdfs
cd simple-hdfs
# Any operations in this directory will now happen in HDFS
  1. Again, use the name of the HDFS service as above.

  2. target is the directory in which HDFS will be mounted; it must exist, otherwise this command will fail.
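To unmount HDFS again, the standard FUSE tooling can be used, for example:

# Leave the mount point first, then release the FUSE mount
cd ..
fusermount -u simple-hdfs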