GitLab CI + Kubernetes Executor: Setup and Typical Issues

Intro

GitLab Runner supports several executor types, and the most widely used are shell and docker. While everything is clear about those two, the kubernetes executor is not that popular. First, Kubernetes itself is specialized software and does not fit every project; second, the kubernetes executor is a good choice when your CI jobs need a lot of server resources, usually CPU and RAM, but you don't want to be charged for the time those resources sit unused. That said, if you have a Kubernetes cluster at a cloud provider and resource-consuming CI jobs, the kubernetes executor is your choice.

Use Case

In our case, the client uses Google Cloud Platform (GCP) with Google Kubernetes Engine (GKE). They have an Elixir application that consumes a lot of CPU and RAM during the ‘test’ stage, and since at least two developers work on that application at the same time, two pipelines sometimes run simultaneously. The kubernetes executor allowed us to create powerful instances on demand and terminate them automatically when they are no longer needed. By default, the node pool dedicated to GitLab executors has 0 nodes, and new nodes are created on demand during working hours, as sketched below.
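On GKE, such a pool is simply an autoscaling node pool with a minimum size of 0. A minimal sketch of creating one, assuming hypothetical pool, cluster, and zone names and an n1-highcpu-16 machine type:

    gcloud container node-pools create gitlab-runners \
        --cluster my-cluster --zone europe-west1-b \
        --machine-type n1-highcpu-16 --num-nodes 0 \
        --enable-autoscaling --min-nodes 0 --max-nodes 2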

Deploy

Kubernetes YAML configs to deploy a gitlab-runner with the kubernetes executor can be found on my GitLab:

cluster-admin.yml – you need to become a cluster admin first
config-map.yml – environment variables for the gitlab-runner container
gitlab-runner-secret.yml – the token obtained at https://${YOUR_GITLAB_DOMAIN}/admin/runners
kube-namespace.yml – a namespace for your executors
kube-runner-scripts.yml – a shell script that deletes the previous runner registration and creates a new one if the Pod is restarted
pod_cleaner.yml – a cron job that deletes executors that are stuck
registry-secret.yml – if you need to pull an image from your private registry
service-account.yml – a service account for the ‘gitlab-runners’ namespace
statefulset.yml – this StatefulSet deploys the gitlab-runner container

Pay attention to the following configuration options:

config-map.yml:
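The full file is in the repository; as a rough idea, here is a minimal sketch of such a ConfigMap, assuming hypothetical names, URL, and request values (the exact KUBERNETES_* variables depend on your gitlab-runner version):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: gitlab-runner-config
      namespace: gitlab-runners
    data:
      CI_SERVER_URL: "https://gitlab.example.com/"    # your GitLab instance
      RUNNER_EXECUTOR: "kubernetes"
      KUBERNETES_NAMESPACE: "gitlab-runners"
      KUBERNETES_IMAGE: "alpine:3.12"                 # default build image
      # Resource requests for the 'build' and 'helper' containers; see the
      # OOM Killer section below for why no *_LIMIT variables are set.
      KUBERNETES_CPU_REQUEST: "4"
      KUBERNETES_MEMORY_REQUEST: "8Gi"
      KUBERNETES_HELPER_CPU_REQUEST: "250m"
      KUBERNETES_HELPER_MEMORY_REQUEST: "256Mi"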

kube-runner-scripts.yml:
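A minimal sketch of the re-registration script, assuming it is shipped as a ConfigMap and that the registration parameters (URL, token, executor) come from the environment populated by config-map.yml and gitlab-runner-secret.yml:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: gitlab-runner-scripts
      namespace: gitlab-runners
    data:
      entrypoint.sh: |
        #!/bin/sh
        set -e
        # Drop the registration left behind by a previous incarnation of
        # this Pod, then register a fresh runner and run it in the foreground.
        gitlab-runner unregister --all-runners || true
        gitlab-runner register --non-interactive
        exec gitlab-runner run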

statefulset.yml:
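A minimal sketch of the StatefulSet that ties the pieces together (the ConfigMap, Secret, and ServiceAccount names match the sketches above and are assumptions):

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: gitlab-runner
      namespace: gitlab-runners
    spec:
      serviceName: gitlab-runner
      replicas: 1
      selector:
        matchLabels:
          app: gitlab-runner
      template:
        metadata:
          labels:
            app: gitlab-runner
        spec:
          serviceAccountName: gitlab-runner       # from service-account.yml
          containers:
            - name: gitlab-runner
              image: gitlab/gitlab-runner:alpine
              command: ["/scripts/entrypoint.sh"] # from kube-runner-scripts.yml
              envFrom:
                - configMapRef:
                    name: gitlab-runner-config    # config-map.yml
                - secretRef:
                    name: gitlab-runner-secret    # gitlab-runner-secret.yml
              volumeMounts:
                - name: scripts
                  mountPath: /scripts
          volumes:
            - name: scripts
              configMap:
                name: gitlab-runner-scripts
                defaultMode: 0755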

Typical Issues

  • Executors Are Stuck

    There is an issue with the kubernetes executor: a Pod will not be terminated while there is at least one process still running inside it. The client’s developers used the inotifywait tool, whose processes kept hanging after the CI job had completed. That is expected, since inotifywait is not aware of ‘job completion’; it just keeps monitoring the file. While the docker executor would kill the build container despite those hanging processes, with the kubernetes executor the gitlab-runner will never terminate the Pod. Even though the devs abandoned inotifywait in favor of an Elixir extension, the issue only partly disappeared: same CI jobs, same conditions, yet for some reason some Pods still got stuck from time to time. The solution is a Kubernetes cron job that checks every hour whether any Pod has been hanging for more than an hour (the CI job never takes more than 15 minutes) and kills it if so; the cron job also cleans up its own old job Pods, as Kubernetes does not do that automatically. See pod_cleaner.yml and the sketch below.

    pod_cleaner.yml
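    A minimal sketch of such a cleaner, assuming a kubectl image and that executor Pods are named runner-* (the service account needs permission to list and delete Pods in the namespace; use batch/v1beta1 on clusters older than 1.21):

        apiVersion: batch/v1
        kind: CronJob
        metadata:
          name: pod-cleaner
          namespace: gitlab-runners
        spec:
          schedule: "0 * * * *"            # run every hour
          successfulJobsHistoryLimit: 1    # prune our own finished Jobs
          failedJobsHistoryLimit: 1
          jobTemplate:
            spec:
              template:
                spec:
                  serviceAccountName: gitlab-runner
                  restartPolicy: Never
                  containers:
                    - name: pod-cleaner
                      image: bitnami/kubectl:latest
                      command:
                        - /bin/sh
                        - -c
                        # Executor Pods are named runner-*, so the grep filter
                        # spares the runner manager and this cleaner itself.
                        - |
                          now=$(date +%s)
                          kubectl get pods -n gitlab-runners \
                            -o jsonpath='{range .items[*]}{.metadata.name} {.metadata.creationTimestamp}{"\n"}{end}' \
                            | grep '^runner-' \
                            | while read name created; do
                                # Anything older than an hour is stuck: the CI
                                # job itself never takes more than 15 minutes.
                                age=$(( now - $(date -d "$created" +%s) ))
                                if [ "$age" -gt 3600 ]; then
                                  kubectl delete pod -n gitlab-runners "$name"
                                fi
                              done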

  • GitLab Executor Killed by OOM Killer

    The GitLab documentation suggests setting the request and limit options for the containers in the Kubernetes executor Pod. While the request option may be necessary, the limit option can cause a lot of pain. If the limit for the ‘helper’ container is too low, your executor simply won’t start: the Pod is terminated right after creation for no apparent reason, but if you SSH into the node and look at the last entries of $ dmesg -T, you will see that it was the OOM killer that killed the Pod. In my case, everything worked fine for about two weeks, and then executors suddenly began to fail. I do not set limit for the ‘build’ and ‘service’ containers either, because you never know how many resources a CI job is going to need; there can be load spikes, and you may end up with your CI jobs terminated by Kubernetes for exceeding their limits. That said, I recommend using request and recommend against using limit, as in the fragment below.
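    In practice that means keeping only the request-side variables in config-map.yml and deliberately setting no *_LIMIT counterparts; a fragment of the ConfigMap sketched above (values are assumptions):

        # Requests reserve capacity on the node. No KUBERNETES_CPU_LIMIT,
        # KUBERNETES_MEMORY_LIMIT, or helper limits are set, so a load spike
        # cannot get a job killed for exceeding its own limit.
        KUBERNETES_CPU_REQUEST: "4"
        KUBERNETES_MEMORY_REQUEST: "8Gi"
        KUBERNETES_HELPER_CPU_REQUEST: "250m"
        KUBERNETES_HELPER_MEMORY_REQUEST: "256Mi"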
