Dan Garfield

Chief Technology Evangelist, Codefresh | Google Developer Expert | Forbes Technology Council

The #1 Reason Jenkins Fails

Jenkins down? Pipelines broken? Hackers making off with your data? It often stems from one fatal flaw. While Jenkins has been both loved and hated for being DevOps duct tape, every user knows there are plenty of issues to deal with. Over the past three years, as part of my work at Codefresh, I’ve been helping teams migrate from Jenkins, and I’ve seen all of these issues over and over again. The biggest issue may surprise you because it creates a cascade effect.
The simple answer:
Shared libraries.

Shared Libraries Are Jenkins’ Biggest Flaw

In theory, the idea of operating libraries in a global space sounds great. You can install two plugins and one can rely on another. It’s just like a good ol’ fashioned package manager, right? Well, not really. Package managers handle dependencies, including versioning, relatively well, but Jenkins doesn’t, and that causes enormous problems.
Here are at least three big ones.

1) A Total Lack of Isolation

Because shared libraries operate in a global space, different tooling can easily conflict. Let’s say my unit test suite relies on Java 8.2, but my performance testing suite still needs Java 7. Now I’ve got a pipeline that won’t work because this software can’t coexist. Developers, or DevOps folks, are stuck with the unenviable task of trying to reconcile all these versions.
Add to that the fact that plugins can rely on each other, and you’ll understand why Jenkins admins never want to install or upgrade plugins. Upgrading one plugin can easily break another. So how do teams resolve this?
The answer: proliferation.

2) Jenkins Instance Proliferation is a Security Nightmare

When people run into the problem of isolation, they solve it in one of two ways: they reconcile all the differences and standardize their toolset (super hard), or they spin up a new Jenkins instance (super easy). I’ve worked with organizations that have thousands of Jenkins instances because of this very problem.
At KubeCon Seattle last fall, during the DevSecOps Day, one dev, who I’m sure would prefer I not mention their prominent company’s name, stood up and said, “The number 1 reason we get owned is because of rogue Jenkins instances.”
Shared libraries and their lack of isolation are at the root of this problem. CI systems have long been a favored attack vector because of lax security attitudes, and Jenkins wasn’t designed with a ‘security first’ mentality. Because Codefresh offers isolation at the step level, as well as role-based access controls and a number of other security features, teams are typically able to work off of shared instances.

3) Admins Are the Only Ones Who Can Really Change Things

This shared library system also means users rely on admins to make changes. It’s one of the reasons dev teams decide to go rogue and spin up their own instances. If your team can’t deploy because you have to wait 5 days for the operations team to upgrade the Kubernetes plugin (which, by the way, will break everyone else’s pipelines), then you’re going to get fed up and spin up your own instance. After all, your job is to ship code, not coddle the operations team, right?
Consider the alternative. One of the reasons we see so many engineering teams switching from Jenkins to Codefresh is that Codefresh uses container-based pipelines. Every step in a pipeline is literally its own container. If you want to use the official Terraform Docker image, the syntax is easy:
Using Terraform with Codefresh:
DeployWithTerraform: 
  image: 'hashicorp/terraform:0.12.0' 
  title: 'Deploying Terraform plan' 
  commands: 
    - 'terraform init'
    - 'terraform plan'
This means I can pull in the tools that I want without getting admins involved. If I want to upgrade the image, I just select a new tagged version. Plus, we have a whole library of these that have been checked for security, and you can easily create your own steps.
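To make that concrete, here’s a minimal sketch of how the version conflict from earlier simply disappears in this model: each suite pins its own image, so both Java versions coexist in a single pipeline. (The step names, image tags, and Gradle commands below are illustrative assumptions, not from a real project.)
Two Java versions in one pipeline:
RunUnitTests:
  image: 'openjdk:8'
  title: 'Unit tests on Java 8'
  commands:
    - './gradlew test'
RunPerfTests:
  image: 'openjdk:7'
  title: 'Performance tests on Java 7'
  commands:
    - './gradlew perfTest'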

Container-based Pipelines Are Quickly Becoming the Industry Standard

For all of these reasons, and many more, the move to container-based pipelines is becoming a must. Another added benefit of container-based pipelines is stability. When you have to maintain a shared library for 1,000 users, it’s difficult to upgrade and predict success, especially given the complex relationships between these shared libraries.
With container-based pipelines, the tasks a container has to do are isolated down to one thing: this container’s job is to do canary releases, and that container’s job is to make pull requests. Each image has only the tooling it needs to do its job and nothing more. The containers don’t rely on each other’s codebases, though they may consume each other’s outputs in some cases.
The great thing about the Codefresh model is that these containers operate on a shared volume, so if I gather test results in one container, I can upload those test results in my next step:
Copy to S3:
  image: 'codefresh/s3copy'
  environment:
    - SOURCE="path/to/somefile"
    - DESTINATION="s3://path-to-s3-bucket"
And that shared volume automatically provides caching with no overhead. 
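Putting those two ideas together, here’s a hedged sketch of the handoff (the test command, file names, and bucket path are illustrative assumptions): the first step writes its results into the working directory, which lives on the shared volume, and the next step picks them up from that same path.
Passing test results between steps:
RunTests:
  image: 'node:12'
  title: 'Running unit tests'
  commands:
    - 'npm test > test-results.txt'
UploadResults:
  image: 'codefresh/s3copy'
  environment:
    - SOURCE="test-results.txt"
    - DESTINATION="s3://my-bucket/test-results.txt"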
Jenkins shared libraries on the other hand? Those are probably best forgotten.

Comments

December 23rd, 2019

While I agree with the idea that led you to write this article, I don’t agree with the analysis or the final conclusion.

In my opinion, Jenkins Shared Libraries are being misused by many people, even by the Jenkins maintainers themselves. They recommend that Shared Libraries must not be complex. This is probably by design, due to the way Groovy is implemented, but it should be the opposite. Jenkins Shared Libraries are so powerful that you can do exactly what you suggest doing with another platform, but with Jenkins instead. I’ve done exactly that with Jenkins at more than one company: I created a pipeline configuration file, I parse it, and then the pipeline executes based on it.

Why do that instead of migrating to another platform? Because Jenkins provides a wide range of plugins for any use case (including test reporting, for instance), and because the company can easily define a standard way of writing pipelines. With most other solutions (like GitLab CI, for instance) you can’t. You just have a YAML file and that’s it. There’s no way to implement standard code and processes in the middle. Shared Libraries allow you to define a pipeline in a YAML file (or any other way; it’s Groovy code, so it’s up to you) and still have glue between all the stages, and have processes implemented.

I’m not saying it suits every use case, I just wanted to give you the heads up that this is also doable in Jenkins.

January 6th, 2020

Thanks for the thoughtful comment. I have definitely seen organizations do Jenkins very well, but it’s so rare. I think the attitude taken by the community of “well, they’re just doing it wrong” is missing the point. If only 5% of users “do it wrong,” then the software has great heuristics. But if 90%, or even, as you’ve said, the “Jenkins maintainers” themselves get it wrong, then the heuristics are way off.

I wrote this article from the perspective of someone who works with a lot of people migrating away from Jenkins and why they made that choice. The point you bring up about being able to define standard code and processes, while possible in Jenkins, isn’t easy, and it’s one of the things that drives people to solve that exact problem with Codefresh.
