TSTN-019: Deployment of Containerized Control System Components

  • Michael Reuter

Latest Revision: 2020-05-12

Note

This document presents the infrastructure and operation of containerized component deployment for the Vera C. Rubin Observatory.

1   Introduction

Many of the control system components, though not all, have been developed for deployment via Docker containers. Initially we leveraged docker-compose as a mechanism for prototyping container deployment to host services. With the proliferation of Kubernetes (k8s) clusters around the Vera C. Rubin Observatory and its associated sites, a new style of deployment and configuration is needed. We will leverage the experience SQuaRE has gained deploying the EFD and Nublado services and use a similar system based on Helm charts and ArgoCD for configuration and deployment. The following sections discuss those two key components; the document will be expanded to include operational aspects as experience on that front is gained.

2   Helm Charts

The CSC deployment starts with a Helm chart. We are currently adopting version 1.4 of ArgoCD, which works with Helm version 2. The code for the charts is kept in the Helm chart Github repository. The next two sections will discuss each chart in detail. For a description of the APIs used, consult the Kubernetes documentation reference. The chart sections will not go into great detail on the content of each API delivered. Each chart section lists all of the possible configuration options that the chart delivers, but full use of that configuration is left to the ArgoCD Configuration section. For the CSC deployment, we will run a single container per pod on Kubernetes. The Kafka producers will follow the same pattern.

2.1   Kafka Producer Chart

While not a true control component, the Kafka producers are nevertheless an important part of the control system landscape. They convert SAL messages into Kafka messages that are then ingested into the Engineering Facilities Database (EFD). See [Fau20] for more details.

The chart consists of a single Kubernetes Workloads API: Deployment. The Deployment API allows for restarts if a particular pod dies, which helps keep the producers up and running at all times. For each producer specified in the configuration, a deployment will be created. We will now cover the configuration options for the chart.

Table 1  Kafka Producer Chart YAML Configuration

YAML Key                          Description
--------------------------------  ------------------------------------------------------------
image                             This section holds the configuration of the container image
image.repository                  The Docker registry name of the container image to use for the producers
image.tag                         The tag of the container image to use for the producers
image.pullPolicy                  The policy to apply when pulling an image for deployment
env                               This section holds the environment configuration for the producer containers
env.lsstDdsDomain                 The LSST_DDS_DOMAIN name applied to all producer containers
env.brokerIp                      The URI for the Kafka broker that receives the generated Kafka messages
env.brokerPort                    The port associated with the Kafka broker specified in brokerIp
env.registryAddr                  The URL for the schema registry associated with the Kafka broker
env.partitions                    The number of partitions that the producers support
env.replication                   The number of replications available to the producers
env.logLevel                      This value determines the logging level for the producers
producers                         This section holds the configuration of the individual producers [1]
producers.name                    This key gives a name to the producer deployment and can be repeated
producers.name.cscs [2]           The list of CSCs that the named producer will monitor
producers.name.image              This section provides an optional override of the default image section
producers.name.image.repository   The Docker registry name of the container image to use for the named producer
producers.name.image.tag          The container image tag to use for the named producer
producers.name.image.pullPolicy   The policy to apply when pulling an image for the named producer deployment
producers.name.env                This section provides an optional override of the default env section
producers.name.env.lsstDdsDomain  The LSST_DDS_DOMAIN name applied to the named producer container
producers.name.env.partitions     The number of partitions that the named producer supports
producers.name.env.replication    The number of replications available to the named producer
producers.name.env.logLevel       This value determines the logging level for the named producer

[1] A given producer is given a name key that is used to identify that producer (e.g. auxtel).
[2] The characters >- are used after the key so that the CSCs can be specified in a list.

Note

The brokerIp, brokerPort and registryAddr keys of the env section are not overridable in the producers.name.env section. Control of those items is handled on a per-site basis. All producers at a given site will always use the same information.
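
As an illustration of how these keys fit together, a minimal values file for the producer chart might look like the following sketch. The image repository, broker endpoints, and CSC names below are placeholders, not actual site values.

image:
  repository: example-registry.lsst.org/salkafka-producer  # placeholder image name
  tag: latest
  pullPolicy: Always
env:
  lsstDdsDomain: mytest
  brokerIp: kafka.example.org  # placeholder broker URI
  brokerPort: 9092
  registryAddr: https://schema-registry.example.org  # placeholder schema registry URL
  partitions: 1
  replication: 3
  logLevel: 20
producers:
  auxtel:
    cscs: >-
      ATDome
      ATDomeTrajectory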

2.2   CSC Chart

Instead of having a chart for every CSC, we employ a single chart that describes all of the different CSC variants. There are four main variants that the chart supports:

simple
A CSC that requires no special interventions and uses only environment variables for configuration.
entrypoint
A CSC that uses an override script for the container entrypoint.
imagePullSecrets
A CSC that requires the use of the Nexus3 repository and needs access credentials for pulling the associated image.
volumeMount
A CSC that requires access to a physical disk store in order to transfer information into the running container.

The chart consists of the Job Kubernetes Workloads API, the ConfigMap and PersistentVolumeClaim Kubernetes Config and Storage APIs, and the VaultSecret Vault API. The Job API is used to provide the correct behavior when a CSC is sent to the OFFLINE state: the pod should not restart. The drawback to this is that if a CSC dies for an unknown reason, one not caught by a FAULT state transition, the pod will not restart and requires startup intervention. The other three APIs are used to support the non-simple CSC variants. They will be mentioned in the configuration description, which we turn to next.

Warning

The volumeMount variant is still in the development phase, so it is currently not supported.

Table 2  CSC Chart YAML Configuration

YAML Key              Description
--------------------  ------------------------------------------------------------
image                 This section holds the configuration of the CSC container image
image.repository      The Docker registry name of the container image to use for the CSC
image.tag             The tag of the container image to use for the CSC
image.pullPolicy      The policy to apply when pulling an image for deployment
image.useNexus3 [3]   This key activates the VaultSecret API for secure image pulls
env [4]               This section holds a set of key, value pairs for environment variables
entrypoint [5]        This key allows specification of a script to override the entrypoint
deployEnv [6]         This key assists the VaultSecret in knowing where to look for credentials

[3] The value of the key is not used, but use true for the value.
[4] See the env example below.
[5] Format is important. See the entrypoint example below.
[6] The name is site specific and handled in the ArgoCD configuration.

Example env YAML section

env:
  LSST_DDS_DOMAIN: mytest
  CSC_INDEX: 1
  CSC_MODE: 1

The section can contain any number of environment variables that are necessary for CSC configuration.

Example entrypoint YAML section

entrypoint: |
  #!/usr/bin/env bash

  source ~/miniconda3/bin/activate

  source $OSPL_HOME/release.com

  source /home/saluser/.bashrc

  run_atdometrajectory.py

The script must be entered line by line with an empty line between each one in order for the script to be created with the correct execution formatting. The pipe (|) at the end of the entrypoint keyword is required to help obtain the proper formatting. Using the entrypoint key activates the use of the ConfigMap API.
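
Example image YAML section (imagePullSecrets variant)

The following is a minimal sketch; the repository name is hypothetical. Setting the useNexus3 key activates the VaultSecret API, and the site-specific deployEnv key is supplied through the ArgoCD configuration.

image:
  repository: nexus3.example.org/atdometrajectory  # hypothetical Nexus3-hosted image
  tag: latest
  pullPolicy: Always
  useNexus3: true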

Note

The configurations associated with each chart do not by themselves cover the full range of deployed components; that coverage is handled by the ArgoCD Configuration.

2.3   Packaging and Deploying Charts

The Github repository has a README that contains information on how to package up a new chart for deployment to the chart repository. First, ensure that the chart version has been updated in the Chart.yaml file. The step for creating/updating the index file needs one more flag for completeness:

helm repo index --url=https://lsst-ts.github.io/charts .

Once the version number has been updated, the chart packaged, and the index file updated, the changes can be collected into a single commit and pushed to master. That push to master will trigger the installation of the new chart into the chart repository.
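
For reference, the version bump mentioned at the start of this procedure is a small edit to the chart's Chart.yaml; the chart name, version, and description below are purely illustrative.

apiVersion: v1
name: csc
version: 0.4.1  # bump this for every packaged release
description: A Helm chart for deploying control system components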

3   ArgoCD Configuration

The configuration and subsequent deployment of the control components is handled by the ArgoCD system. The code for the ArgoCD configuration is kept in the ArgoCD Github repository. The deployment methodologies will be covered in forthcoming sections. ArgoCD uses the concept of an app of apps (or chart of charts). Each app requires a specific chart or charts to use in order to deploy.

Each component has its own directory within the top-level apps directory. This includes the Kafka producers. There are a few special apps (hereafter called collector apps) which collect the main CSC component apps into a group. Those will be discussed later.

The application directories all have roughly the same contents.

Chart.yaml
This file specifies the name of the application via its name key.
requirements.yaml
This file specifies the Helm chart to use, including the version.
values.yaml
This file contains base information that applies to all site-specific configurations. It may not be present in all applications.
values-<site tag>.yaml
This file contains site-specific information for the configuration. It may override configuration provided in the values.yaml file. The supported sites are listed in the following table; not all applications support all sites.
Table 3  Supported Sites

Site Tag          Site Location
----------------  ------------------------------------------------------------
base              La Serena Base Data Center
ncsa-teststand    NCSA Test Stand
sandbox           Currently a GKE instance, soon to be replaced by LSP-int at NCSA
summit            Cerro Pachon Control System Infrastructure
tucson-teststand  Tucson Test Stand. This is now largely defunct.
templates
This directory contains a <app-name>-ns.yaml file defining a Kubernetes Cluster API: Namespace. This defines a specific namespace for the application.

All values*.yaml files use the name of the chart that the application deploys as their top-level key. Further keys are specified in the same manner as the Helm chart configuration.
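
As a sketch, a site values file for a CSC application might look like the following; the chart name (csc), image repository, and environment values are illustrative assumptions, not taken from an actual application.

csc:
  image:
    repository: example-registry.lsst.org/atdometrajectory  # placeholder image name
    tag: latest
    pullPolicy: Always
  env:
    LSST_DDS_DOMAIN: ncsa-test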

3.1   Collector Apps

Within the ArgoCD Github repository, there are currently two collector applications: auxtel and maintel. The layout for these apps differs from that of the individual CSC applications and is explained here.

Chart.yaml
This file contains the specification of a new chart that will deploy a group of CSCs.
values.yaml
This file contains configuration parameters to fill out the application deployment. The keys will be discussed below.
templates/<collector app name>.yaml
This file contains the ArgoCD Application API used to deploy the associated CSCs specified by the collector app configuration. One application is generated for each CSC listed in the configuration.
Table 4  Collector Application YAML Configuration

YAML Key                    Description
--------------------------  ------------------------------------------------------------
spec                        This section defines elements for cluster setup and ArgoCD location
spec.destination            This section defines information for the deployment destination
spec.destination.server     The name of the Kubernetes resource to deploy on
spec.source                 This section defines the ArgoCD setup to use
spec.source.repoURL         The repository housing the ArgoCD configuration
spec.source.targetRevision  The branch on the ArgoCD repository to use for the configuration
env                         This key sets the site tag for the deployment. Must be in quotes.
cscs                        This key holds a list of CSCs that are associated with the app
noSim                       This key holds a list of CSCs that do not have a simulator capability
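
Putting these keys together, a values.yaml for a collector app might look like the following sketch; the server address, repository URL, revision, and CSC names are placeholders.

spec:
  destination:
    server: https://kubernetes.example.org  # placeholder cluster endpoint
  source:
    repoURL: https://github.com/example-org/argocd-configuration  # placeholder configuration repository
    targetRevision: master
env: "ncsa-teststand"
cscs:
  - atdome
  - atdometrajectory
noSim:
  - atspectrograph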

[Fau20] Angelo Fausti. EFD Operations. Technote, March 2020. URL: https://sqr-034.lsst.io.