[Disclaimer - the thoughts in this post are mine alone and the "we" in the title refers to an open-source project where I happen to be the lead developer.]
I wasn't really planning to write this blog post, but I'm quite concerned by this incident on Twitter - where a certain tool vendor was compelled to take down a blog post. The blog post in question was satire, and poked fun at Selenium - but it apparently hurt the feelings of some of the developers.
Now, I've been doing open-source for a while and I know how thankless a job it can be. But I've also drawn my fair share of attacks from the big guns of the "testing community" when I've published technical comparisons with other open-source projects. I've come to the inevitable conclusion that you will end up upsetting someone or other on the internet - no matter how data-backed and objective you try to be. But still - it really irks me that certain influential personalities in test-automation circles do their best to shut down any kind of criticism. And in this case they seem to have succeeded.
If you open-source something, you have to face the cold, hard reality that someone will criticize it at some point. One has to grow up and deal with it. Not shoot the messenger.
If things continue like this, we can't have progress. I'm going to throw caution to the winds, put myself squarely in the line of fire and get a few things off my chest.
I have been working on creating an open-source alternative to Selenium for a few months now. This did not happen overnight, and in the true spirit of open-source - I genuinely explored the option of contributing to the Selenium project. But I eventually chose the option of doing a complete re-write - and will explain my reasoning below. I will try to be respectful of the efforts that various developers over the years have put into the Selenium project. But I believe the project has some technical shortcomings, and it is important to discuss these - even though it may make some people feel uncomfortable. So here goes.
1) Contributing to Selenium is really hard - Selenium has been around since 2004 which is a terrific achievement, but it does mean that the code-base has a strong "legacy" feel to it - and has a lot of "surface area" and dead-code. I found it very hard to navigate the code base, looking for answers to what I thought would be simple questions. As a long term Java and open-source programmer, I am very particular that a project should be "build-able" locally after just doing a
and installing one or two development tools - but Selenium is very far from this ideal state. In fact, the Selenium committers regularly conduct a full-day workshop for those who aspire to contribute code to the Selenium project. What I especially don't like is that the project keeps changing the build tool. Originally I guess it was Maven, then it became "Buck" and now they moved to "Bazel" putting it even more out of the comfort zone of regular Java programmers like me. But there is a good reason why they need these "exotic" build tools - which is that Selenium needs to support multiple programming languages. Which brings me to my next point.
2) Selenium has to support multiple programming languages - now this is understandable, we programmers are a very fussy and opinionated lot, we pick a language of choice and then we tend to look down our noses at any other language. But I can't get the thought out of my head that when I look at the Selenium project, I see a massive duplication of effort and waste of energy. All we are trying to do is remote-control a web-browser, but the Selenium project ends up implementing the same thing 5 times over. Along with all the obvious challenges of having to co-ordinate releases across a relatively large development team - and having to document all of these things.
What is the alternative, you may ask. The way we have approached this in Karate is by creating a DSL (Domain Specific Language) which is programming-language "neutral". So although the engine and implementation behind the scenes is pure Java, users have to use this "one true" DSL for scripting tests. Since Karate can be used as a binary executable via the command-line, you don't need to be a hard-core Java programmer and this has been working well so far for API testing. And I believe browser-automation is yet another domain where teams should not insist on using their "favorite" programming language - if a simpler, cross-platform scripting option exists.
So I'm treating the fact that Selenium has to "spread itself thin" - as a weakness. Of course this is just my opinion, and time will tell.
3) Selenium is incomplete as a testing-framework - this seems to be by design, and I know the developers themselves will agree that this is true. To be really useful, a testing framework has to address the following concerns. There are more, but let me pick some obvious ones:
- CI integration
- Configuration and Environment Switching
- Grouping / Tagging
- HTML / reporting
Selenium does not solve for these; it is assumed that these needs will be filled either by separate unit-testing frameworks, home-grown frameworks or 3rd-party "wrapper" frameworks. Not surprisingly, there is a proliferation of frameworks, both open-source and not, that all layer themselves over Selenium to address these concerns. A pet peeve of mine is that many in the test-automation community have an unhealthy obsession with "creating frameworks" instead of focusing on testing - and I hold Selenium responsible for that.
Karate has everything built-in, and you truly need only one library that does the job.
4) The WebDriver W3C spec has limitations - Selenium depends on the W3C WebDriver specification - the design of which is greatly influenced by the internal wire-protocol that Selenium used to use in the old days. The concept has certainly stood the test of time, but has been criticized for a few reasons - well described here. To summarize, it is stateless, requires multiple network "hops" to achieve simple browser-automation primitives, and is very loosely-coupled with the browser, which may sound like good design - but ends up being a limitation when you want to closely track what's happening within the browser. Chrome-native automation has emerged nowadays as a popular option, and even the Selenium team is exploring how to bolt on some of the advantages - but all that is still work-in-progress at this point.
Karate implements the Chrome DevTools Protocol without depending on any 3rd-party library. What's unique about Karate is that it also implements the WebDriver spec (also from scratch) and layers both options over a unified interface. You can indeed write your test-suites for Chrome, do all your dev and testing with it, and then swap out the config for another browser at run-time.
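To make the "unified interface" idea concrete, here is a minimal Java sketch of two protocol backends behind one interface, selected by a config value. Every class name, method name and config value below is hypothetical and is not Karate's actual code. Only the protocol details are real: `Page.navigate` and `Input.dispatchMouseEvent` are Chrome DevTools Protocol methods, and the two `POST` paths are W3C WebDriver endpoints. The backends record what they would send instead of talking to a real browser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the test flow only ever talks to this.
interface BrowserDriver {
    void open(String url);
    void click(String locator);
    List<String> sentCommands(); // recorded instead of sent over the wire
}

// Backend that would speak the Chrome DevTools Protocol (heavily simplified).
class DevToolsDriver implements BrowserDriver {
    private final List<String> sent = new ArrayList<>();
    public void open(String url) { sent.add("CDP Page.navigate " + url); }
    // A real click first has to resolve the element to screen coordinates.
    public void click(String locator) { sent.add("CDP Input.dispatchMouseEvent " + locator); }
    public List<String> sentCommands() { return sent; }
}

// Backend that would speak the W3C WebDriver HTTP protocol (heavily simplified).
class WebDriverBackend implements BrowserDriver {
    private final List<String> sent = new ArrayList<>();
    public void open(String url) { sent.add("POST /session/{session id}/url " + url); }
    // A real click first has to find the element to obtain its element id.
    public void click(String locator) {
        sent.add("POST /session/{session id}/element/{element id}/click " + locator);
    }
    public List<String> sentCommands() { return sent; }
}

class UnifiedDriverDemo {
    // The config value ("devtools" / "webdriver") is an assumption for this sketch.
    static BrowserDriver forType(String type) {
        switch (type) {
            case "devtools": return new DevToolsDriver();
            case "webdriver": return new WebDriverBackend();
            default: throw new IllegalArgumentException("unknown driver type: " + type);
        }
    }

    // The same flow runs unchanged against either backend.
    static List<String> runFlow(String type) {
        BrowserDriver driver = forType(type);
        driver.open("https://example.com");
        driver.click("#submit");
        return driver.sentCommands();
    }

    public static void main(String[] args) {
        runFlow("devtools").forEach(System.out::println);
        runFlow("webdriver").forEach(System.out::println);
    }
}
```

The point of the sketch is that `runFlow` never mentions a protocol, so swapping the browser becomes a one-word config change.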
5) Limited Debug Support - some of the architecture decisions we made such as re-writing the parts of Cucumber we needed - have turned out to be extremely good ones. The fact that we now have an execution engine written from the ground up, and the fact that this happens to be an "interpreted" language that falls into the "keyword driven" category - means that we can do some pretty cool things, technically. I don't know of any other UI automation framework where you can step-backwards during a debug-session and hot-reload code.
6) Selenium "Waits" are hard to understand - one of the first things that hits you when you get into Selenium is the system of "Implicit Waits", "Explicit Waits" and "Fluent Waits" that everybody seems to be talking about and is the subject of many a blog post, tutorial and "interview-question" peddler. And I'm going to admit that I still don't fully understand what they do and how to get them to work. I was determined to design a better, less-confusing API when working on Karate's UI automation implementation - and here is the result.
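For readers who want the general idea in code: the Java sketch below is not Karate's API, and the names `SimpleWait` and `waitUntil` are made up for illustration. It shows the one primitive that I think most "wait" variants boil down to, which is to poll a condition a fixed number of times with a fixed pause in between.

```java
import java.util.function.BooleanSupplier;

class SimpleWait {
    // Checks `condition` up to `attempts` times, sleeping `intervalMillis`
    // between failed checks. Returns true as soon as the condition holds.
    static boolean waitUntil(BooleanSupplier condition, int attempts, long intervalMillis) {
        for (int i = 0; i < attempts; i++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (i < attempts - 1) { // no point sleeping after the last check
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // Stand-in for "is the element on the page yet?": true on the 3rd check.
        boolean found = waitUntil(() -> ++checks[0] >= 3, 5, 10);
        System.out.println(found + " after " + checks[0] + " checks"); // true after 3 checks
    }
}
```

In a real UI test the condition would be something like "does this locator match an element yet", and the attempt count and interval would come from configuration rather than being passed on every call.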
7) The Element locator API is clunky - Another pet-peeve that I have when I observe the Selenium ecosystem is the amount of time spent obsessing over patterns such as the so-called "Page Object Model". I have a point-of-view that needing such patterns is a sign of weakness in your API, and it is because your API is not concise enough that people have to layer things on top of it in the pursuit of "re-use". And most of the Selenium "wrappers" don't do a good enough job IMO.
Karate from day-one has had a reputation of making commonly needed operations into one-liners, which is what you would expect from a true DSL. This section of the documentation describes our approach to the need to achieve "re-use" while still ensuring that your main "flow" remains readable. There are some interesting (and possibly controversial) decisions made - such as that the locator string can have a "prefix" that encodes the type of locator. See if you can spot some "friendly locators" below.
And by the way, I still don't understand what on earth a "Screenplay Pattern" is. So sue me.
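To illustrate what I mean in point 7 by a locator string whose prefix encodes the locator type, here is a small Java sketch. The rules in it are assumptions chosen for the example and should not be read as Karate's exact grammar: a leading `/` means XPath, a leading `{tag}` means "match this tag by its visible text", and anything else is treated as a CSS selector.

```java
class LocatorParser {
    enum Kind { CSS, XPATH, TEXT }

    static final class Locator {
        final Kind kind;
        final String expression; // what would be handed to the browser
        Locator(Kind kind, String expression) {
            this.kind = kind;
            this.expression = expression;
        }
    }

    // Hypothetical prefix rules, for illustration only.
    static Locator parse(String raw) {
        if (raw.startsWith("/")) {
            return new Locator(Kind.XPATH, raw);
        }
        if (raw.startsWith("{")) {
            int close = raw.indexOf('}');
            if (close < 0) {
                throw new IllegalArgumentException("unclosed '{' in locator: " + raw);
            }
            String tag = raw.substring(1, close);
            String text = raw.substring(close + 1);
            String node = tag.isEmpty() ? "*" : tag; // "{}" means any tag
            // Note: text containing a single quote would need escaping.
            return new Locator(Kind.TEXT,
                    "//" + node + "[normalize-space(text())='" + text + "']");
        }
        return new Locator(Kind.CSS, raw);
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"#login", "//form/input", "{a}Click Me", "{}Submit"}) {
            Locator loc = parse(raw);
            System.out.println(raw + " -> " + loc.kind + " " + loc.expression);
        }
    }
}
```

Because the type lives inside the string, a test author passes one argument instead of picking between several `By`-style factory methods, which is the kind of conciseness I am arguing for.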
8) Parallel and Distributed Testing is hard - and can be made simpler. You can't have a discussion on Selenium without the word "grid" coming up at some point. Doing "grid stuff" was really hard in the past, but it has come a long way, and there are Zalenium and 3rd-party alternatives such as Selenoid / Aerokube that make things easier nowadays. Surprisingly, most testing frameworks fail at parallel testing, but Karate has the advantage that since it started out as a "headless" API testing framework, it had to get parallel testing right - from day one, and this is part of the core, not an add-on.
And we are experimenting with a way to do distributed parallel testing without needing to provision or install a "grid" equivalent, and which just works in your existing CI set-up - so please help us test this if you can!
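As a rough picture of what "parallel as part of the core" can mean, here is a minimal Java sketch that runs independent scenarios on a fixed thread pool using only the JDK's `ExecutorService`. The class name `ParallelRunner` and the scenario strings are made up for this example, and this is not how Karate's runner is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRunner {
    // Runs every scenario on a pool of `threads` workers and returns the
    // results in the same order as the input list.
    static List<String> runAll(List<Callable<String>> scenarios, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and keeps input order.
            for (Future<String> future : pool.invokeAll(scenarios)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while running scenarios", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("scenario failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> scenarios = new ArrayList<>();
        // Stand-ins for real test scenarios; each must not share mutable state.
        scenarios.add(() -> "login: passed");
        scenarios.add(() -> "search: passed");
        scenarios.add(() -> "checkout: passed");
        System.out.println(runAll(scenarios, 2));
    }
}
```

The hard part in practice is not the thread pool but making sure each scenario owns its own state (variables, HTTP client, browser session), which is far easier to get right when it is designed in from the start.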
9) Hybrid API and UI tests are hard - this is probably an unfair advantage that Karate has, because it started out as an API testing tool, which I strongly believe is the right architectural "sequence". I see many tool vendors and projects that started out as UI automation solutions - but then had to retroactively "bolt on" poor substitutes for API testing. More and more people are catching on to the fact that moving down the "testing pyramid" is a good thing.
So with Karate you can indeed achieve very effective approaches such as getting an Auth token via an API call and then dropping a cookie or two onto a blank page - which means that you can completely by-pass a time-consuming and potentially "flaky" sign-in "form" or UI. Karate has a serious advantage as a framework that can do both API and UI testing - within the same syntax, within the same test flow.
Karate happens to have API mocking (which can even serve HTML) and also performance-testing capabilities, and we are excited about how these can potentially complement - or be mixed into UI testing in the future.
By the way, we have contributors working on Appium support, which is a sign that Karate is flexible enough to handle mobile and even desktop application testing within the same core framework.
So I'll stop here with these top 10 !
Karate is certainly a new entrant into the UI automation "wild west", but the community reception and adoption of the API testing capabilities has been extraordinary, and we are confident that the community will appreciate the things that we have tried to improve, and the things that we have tried to do differently.
Please try it out, let us know what works and what doesn't - and if you found the arguments above compelling, please support this project by spreading the word - even if you can't contribute code ! Thanks :)