Pre-provisioned CSI Volume Support in Mesos Containerizer

Mesos 1.11.0 adds pre-provisioned CSI volume support to the MesosContainerizer (a.k.a., the universal containerizer) by introducing the new volume/csi isolator.

This document describes the motivation for the volume/csi isolator, the configuration steps for enabling it, and the framework changes required to use it.

Motivation

Container Storage Interface (CSI) is a specification that defines a common set of APIs for all interactions between the storage vendors and the container orchestration platforms. Building CSI support allows Mesos to make use of the quickly-growing CSI ecosystem.

Mesos has supported CSI since the 1.5.0 release, but that solution has a limitation: it requires CSI plugins to implement the ListVolumes and GetCapacity APIs so that the external storage can be modeled as Mesos raw disk resources and then offered to frameworks. However, many third-party CSI plugins do not implement those two APIs.

Mesos 1.11.0 provides a more generic way to support third-party CSI plugins, so that Mesos can work with a broader external storage ecosystem and benefit from the continued development of community CSI plugins.

How does it work?

The volume/csi isolator interacts with CSI plugins via the plugin’s gRPC endpoint.

When a new task with CSI volumes is launched, the volume/csi isolator will call the CSI plugin to publish the specified CSI volumes onto the agent host and then mount them onto the task container. When the task terminates, the volume/csi isolator will call the CSI plugin to unpublish the specified CSI volumes.
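The publish and unpublish ordering described above can be illustrated with a toy model. This is only a sketch of the sequence, not the isolator's implementation: `RecordingNodePlugin`, `task_with_csi_volumes`, and the target path layout are hypothetical names and assumptions made for the example, and no real gRPC calls are made.

```python
from contextlib import contextmanager


class RecordingNodePlugin:
    """Hypothetical stand-in for a CSI plugin's node service; it only records calls."""

    def __init__(self):
        self.calls = []

    def node_publish_volume(self, volume_id, target_path):
        self.calls.append(("NodePublishVolume", volume_id, target_path))

    def node_unpublish_volume(self, volume_id, target_path):
        self.calls.append(("NodeUnpublishVolume", volume_id, target_path))


@contextmanager
def task_with_csi_volumes(plugin, volume_ids, target_root):
    """Publish every volume before the task runs and unpublish them afterwards."""
    published = []
    try:
        for volume_id in volume_ids:
            # Assumption: one target directory per volume under target_root.
            target = f"{target_root}/{volume_id}"
            plugin.node_publish_volume(volume_id, target)
            published.append((volume_id, target))
        yield [target for _, target in published]
    finally:
        # Unpublish whatever was published, even if the task failed.
        for volume_id, target in reversed(published):
            plugin.node_unpublish_volume(volume_id, target)


plugin = RecordingNodePlugin()
with task_with_csi_volumes(plugin, ["foo"], "/mnt/unmanaged-plugin") as targets:
    plugin.calls.append(("TaskRuns", targets))

for call in plugin.calls:
    print(call)
```

The context manager mirrors the task lifetime: everything inside the `with` body stands for the running task, and the `finally` clause stands for the cleanup that happens when the task terminates.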

Currently the volume/csi isolator calls only the CSI plugin’s node service, not its controller service. This means:

  • We support only pre-provisioned CSI volumes, not dynamic provisioning of CSI volumes. Operators need to create the CSI volumes explicitly and provide the volume info (e.g. volume ID, context, etc.) to frameworks so that frameworks can use the volumes in their tasks.

  • We do not support the CSI volumes that require the controller service to publish to a node (ControllerPublishVolume) prior to the node service publishing on the node (NodePublishVolume).

Configuration

To use the volume/csi isolator, there are certain actions required by operators and framework developers. In this section we list the steps required by the operator to configure the volume/csi isolator and the steps required by framework developers to specify CSI volumes in their tasks.

Pre-conditions

  • Explicitly create the CSI volumes that are going to be accessed by Mesos tasks. Some CSI plugins (e.g. NFS) do not implement the CreateVolume API; in that case operators do not need to create the volume explicitly.

Configuring the CSI Volume Isolator

In order to configure the volume/csi isolator, the operator needs to configure the --isolation and --csi_plugin_config_dir flags at agent startup as follows:

  sudo mesos-agent \
    --master=<master-IP:master-port> \
    --work_dir=/var/lib/mesos \
    --isolation=filesystem/linux,volume/csi \
    --csi_plugin_config_dir=<directory that contains CSI plugin configuration files>

The volume/csi isolator must be specified in the --isolation flag at agent startup; the volume/csi isolator has a dependency on the filesystem/linux isolator.
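As a quick pre-flight check, the dependency stated above can be verified against the value passed to --isolation. This is a minimal sketch; `check_isolation` is a hypothetical helper written for this example, not part of Mesos.

```python
def check_isolation(isolation):
    """Return a list of problems with an --isolation flag value (empty if none)."""
    isolators = [name.strip() for name in isolation.split(",") if name.strip()]
    problems = []
    if "volume/csi" not in isolators:
        problems.append("volume/csi is not enabled")
    elif "filesystem/linux" not in isolators:
        # volume/csi depends on the filesystem/linux isolator.
        problems.append("volume/csi requires filesystem/linux")
    return problems


print(check_isolation("filesystem/linux,volume/csi"))
print(check_isolation("volume/csi"))
```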

The operator needs to put the CSI plugin configuration files under the directory specified via the agent flag --csi_plugin_config_dir. Each file in this directory should contain a JSON object representing a CSIPluginInfo object which can be either a managed CSI plugin (i.e. the plugin launched by Mesos as a standalone container) or an unmanaged CSI plugin (i.e. the plugin launched outside of Mesos).

message CSIPluginInfo {
  required string type = 1;
  optional string name = 2 [default = "default"];

  // A list of container configurations to run managed CSI plugin.
  repeated CSIPluginContainerInfo containers = 3;

  // The service endpoints of the unmanaged CSI plugin.
  repeated CSIPluginEndpoint endpoints = 4;

  optional string target_path_root = 5;
  optional bool target_path_exists = 6;
}

message CSIPluginContainerInfo {
  enum Service {
    UNKNOWN = 0;
    CONTROLLER_SERVICE = 1;
    NODE_SERVICE = 2;
  }

  repeated Service services = 1;
  optional CommandInfo command = 2;
  repeated Resource resources = 3;
  optional ContainerInfo container = 4;
}

message CSIPluginEndpoint {
  required CSIPluginContainerInfo.Service csi_service = 1;
  required string endpoint = 2;
}

Example of a managed CSI plugin:

  {
    "type": "org.apache.mesos.csi.managed-plugin",
    "containers": [
      {
        "services": [
          "NODE_SERVICE"
        ],
        "command": {
          "value": "<path-to-managed-plugin> --endpoint=$CSI_ENDPOINT"
        },
        "resources": [
          {"name": "cpus", "type": "SCALAR", "scalar": {"value": 0.1}},
          {"name": "mem", "type": "SCALAR", "scalar": {"value": 1024}}
        ]
      }
    ]
  }

Example of an unmanaged CSI plugin:

  {
    "type": "org.apache.mesos.csi.unmanaged-plugin",
    "endpoints": [
      {
        "csi_service": "NODE_SERVICE",
        "endpoint": "/var/lib/unmanaged-plugin/csi.sock"
      }
    ],
    "target_path_root": "/mnt/unmanaged-plugin"
  }
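Before starting the agent, it can help to sanity-check the files in the configuration directory. The sketch below applies a few heuristics derived from the CSIPluginInfo definition above (a `type` is required, a plugin is described by `containers` or `endpoints`, and a node service should be present because the isolator only calls the node service). It is not the validation the agent itself performs, and `check_plugin_config` and `check_config_dir` are hypothetical helper names.

```python
import json
from pathlib import Path

SERVICES = {"CONTROLLER_SERVICE", "NODE_SERVICE"}


def check_plugin_config(config):
    """Return a list of problems found in one parsed CSIPluginInfo JSON value."""
    if not isinstance(config, dict):
        return ["not a JSON object"]
    problems = []
    if not isinstance(config.get("type"), str) or not config["type"]:
        problems.append("missing required string field 'type'")
    containers = config.get("containers", [])
    endpoints = config.get("endpoints", [])
    if not containers and not endpoints:
        problems.append("neither 'containers' nor 'endpoints' is set")
    for endpoint in endpoints:
        if endpoint.get("csi_service") not in SERVICES:
            problems.append("endpoint has an unknown 'csi_service'")
        if not endpoint.get("endpoint"):
            problems.append("endpoint is missing 'endpoint'")
    # Heuristic: the volume/csi isolator only calls the node service.
    services = {s for c in containers for s in c.get("services", [])}
    services |= {e.get("csi_service") for e in endpoints}
    if (containers or endpoints) and "NODE_SERVICE" not in services:
        problems.append("no NODE_SERVICE configured")
    return problems


def check_config_dir(directory):
    """Check every regular file in the directory; map file name to problems."""
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            config = json.loads(path.read_text())
        except ValueError as error:
            results[path.name] = [f"not valid JSON: {error}"]
            continue
        results[path.name] = check_plugin_config(config)
    return results


unmanaged = {
    "type": "org.apache.mesos.csi.unmanaged-plugin",
    "endpoints": [
        {"csi_service": "NODE_SERVICE",
         "endpoint": "/var/lib/unmanaged-plugin/csi.sock"}
    ],
    "target_path_root": "/mnt/unmanaged-plugin",
}
print(check_plugin_config(unmanaged))
```

To check a whole directory, call `check_config_dir` with the same path that is passed to --csi_plugin_config_dir and inspect any non-empty problem lists.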

Enabling frameworks to use CSI volumes

Volume Protobuf

The Volume protobuf message has been updated to support CSI volumes.

message Volume {
  ...
  required Mode mode = 3;
  required string container_path = 1;

  message Source {
    enum Type {
      UNKNOWN = 0;
      ...
      CSI_VOLUME = 5;
    }

    message CSIVolume {
      required string plugin_name = 1;

      message VolumeCapability {
        message BlockVolume {
        }

        message MountVolume {
          optional string fs_type = 1;
          repeated string mount_flags = 2;
        }

        message AccessMode {
          enum Mode {
            UNKNOWN = 0;
            SINGLE_NODE_WRITER = 1;
            SINGLE_NODE_READER_ONLY = 2;
            MULTI_NODE_READER_ONLY = 3;
            MULTI_NODE_SINGLE_WRITER = 4;
            MULTI_NODE_MULTI_WRITER = 5;
          }

          required Mode mode = 1;
        }

        oneof access_type {
          BlockVolume block = 1;
          MountVolume mount = 2;
        }

        required AccessMode access_mode = 3;
      }

      // Specifies the parameters used to stage/publish a pre-provisioned volume
      // on an agent host.
      message StaticProvisioning {
        required string volume_id = 1;
        required VolumeCapability volume_capability = 2;
        optional bool readonly = 3;
        map<string, Secret> node_stage_secrets = 4;
        map<string, Secret> node_publish_secrets = 5;
        map<string, string> volume_context = 6;
      }

      optional StaticProvisioning static_provisioning = 2;
    }

    optional Type type = 1;
    ...
    optional CSIVolume csi_volume = 6;
  }

  optional Source source = 5;
}

When requesting a CSI volume for a container, the framework developer needs to set a Volume for the container, including its mode, container_path and source fields.

The source field specifies where the volume comes from. Framework developers need to set the type field to CSI_VOLUME and specify the csi_volume field.

The csi_volume field specifies the information of the CSI volume. Framework developers need to set the plugin_name field to the type field of one of the CSI plugin configuration files in the directory specified via the agent flag --csi_plugin_config_dir, and specify the static_provisioning field according to the information of the pre-provisioned volume. The fields in static_provisioning map directly onto the fields in the CSI calls NodeStageVolume and NodePublishVolume. See the CSI spec for more detailed descriptions of those fields.
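For frameworks that build task definitions as JSON, the fields described above can be assembled with a small helper. This is a minimal sketch that covers only mount volumes; `csi_volume` is a hypothetical function name, and the values passed in the usage lines are taken from the NFS example in this document.

```python
import json

# Access mode names from the Volume protobuf definition above.
ACCESS_MODES = {
    "SINGLE_NODE_WRITER",
    "SINGLE_NODE_READER_ONLY",
    "MULTI_NODE_READER_ONLY",
    "MULTI_NODE_SINGLE_WRITER",
    "MULTI_NODE_MULTI_WRITER",
}


def csi_volume(plugin_name, volume_id, container_path, access_mode,
               mode="RW", fs_type=None, volume_context=None, readonly=False):
    """Build the JSON form of a Volume that refers to a pre-provisioned CSI volume."""
    if access_mode not in ACCESS_MODES:
        raise ValueError(f"unknown access mode: {access_mode}")
    mount = {}
    if fs_type:
        mount["fs_type"] = fs_type
    static_provisioning = {
        "volume_id": volume_id,
        "volume_capability": {
            "mount": mount,
            "access_mode": {"mode": access_mode},
        },
    }
    if readonly:
        static_provisioning["readonly"] = True
    if volume_context:
        static_provisioning["volume_context"] = dict(volume_context)
    return {
        "container_path": container_path,
        "mode": mode,
        "source": {
            "type": "CSI_VOLUME",
            "csi_volume": {
                "plugin_name": plugin_name,
                "static_provisioning": static_provisioning,
            },
        },
    }


volume = csi_volume(
    plugin_name="nfs.csi.k8s.io",
    volume_id="foo",
    container_path="volume",
    access_mode="MULTI_NODE_MULTI_WRITER",
    volume_context={"server": "192.168.1.100", "share": "/mnt/data"},
)
print(json.dumps(volume, indent=2))
```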

How to specify container_path:

  1. If you are launching a task without a container image and container_path is an absolute path, you need to make sure the absolute path exists on your host root file system as the container shares the host root file system; otherwise, the task will fail.

  2. For other cases like launching a task without a container image and with a relative container_path, or launching a task with a container image and an absolute or relative container_path, the volume/csi isolator will help create the container_path as the mount point.

The following table summarizes the above rules for container_path:

                           Task with rootfs    Task without rootfs
  Absolute container_path  No need to exist    Must exist
  Relative container_path  No need to exist    No need to exist
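The same rule can be written as a one-line predicate, which a framework might use to decide whether it must ensure the path exists on the agent host before launching. This is a sketch of the rule as stated in this document; `host_path_must_exist` is a hypothetical helper name.

```python
import posixpath


def host_path_must_exist(container_path, has_rootfs):
    """Return True if container_path must already exist on the agent host."""
    # Only an absolute path in a task without its own rootfs must pre-exist;
    # in every other case the isolator creates the mount point.
    return posixpath.isabs(container_path) and not has_rootfs


for has_rootfs in (True, False):
    for path in ("/mnt/volume", "volume"):
        print(has_rootfs, path, host_path_must_exist(path, has_rootfs))
```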

Example

Launch a task with a CSI volume managed by the NFS CSI plugin:

   TaskInfo {
     ...
     "command" : {
       "value": "echo test > volume/file"
     },
     "container" : {
       "type": "MESOS",
       "volumes" : [
         {
           "container_path" : "volume",
           "mode" : "RW",
           "source": {
             "type": "CSI_VOLUME",
             "csi_volume": {
               "plugin_name": "nfs.csi.k8s.io",
               "static_provisioning": {
                 "volume_id": "foo",
                 "volume_capability": {
                   "mount": {},
                   "access_mode": {
                     "mode": "MULTI_NODE_MULTI_WRITER"
                   }
                 },
                 "volume_context": {
                   "server": "192.168.1.100",
                   "share": "/mnt/data"
                 }
               }
             }
           }
         }
       ]
     }
   }

NOTE: To make the above example work, an NFS server (192.168.1.100) needs to be set up to export the directory /mnt/data.