Towards a Better GUI Programming Benchmark

by Eugen Kiss, March 4th, 2018
GUI programming is challenging. We strive to find better ways to program GUIs and we do make progress. The fact that countless GUI toolkits with different ways to develop GUIs exist indicates that the road ahead is still long. If only there were a benchmark to compare GUI programs with one another. Then identifying good ideas would become much easier.

In this article I present 7GUIs: A GUI Programming Benchmark. I reflect on it, and in the second half I look to the future by pondering what a better GUI programming benchmark might look like.

Retrospective on 7GUIs

7GUIs is my attempt at creating a GUI programming benchmark. In a traditional benchmark competing implementations are compared in terms of their resource consumption. In 7GUIs implementations are compared in terms of their notation.

To that end, 7GUIs defines seven tasks that represent typical challenges in GUI programming (live version). Plenty of implementations already exist and yours could be one of them. In addition, 7GUIs provides a recommended set of evaluation dimensions.
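To give a flavor of the tasks: the simplest one, Counter, asks for a read-only text field displaying a number and a button that increments it. A minimal React sketch (my own illustration, not the reference implementation) could look like this:

```javascript
// Minimal illustrative sketch of the 7GUIs "Counter" task in React.
// This is not the reference implementation; the names are my own.
import React, { useState } from "react";

export function Counter() {
  const [count, setCount] = useState(0);
  return (
    <div>
      <input type="text" readOnly value={count} />
      <button onClick={() => setCount(count + 1)}>Count</button>
    </div>
  );
}
```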

The ‘Flight Booker’ task from 7GUIs

The project was conceived as a spin-off of my master’s thesis from 2014. Since then the GUI programming sphere has been anything but static. Four years later I take a look back.

Recent Updates

For a long time I was not happy with the GitHub project’s structure. I was not good at responding to pull requests either. Nonetheless, people continued to bookmark the repository and add implementations. Thanks a lot to all contributors!

Then my thesis advisor sent me an email asking if he could help with the pull requests. Altogether this led to my decision to improve the project. The goal: make its presentation and ownership clearer, and reduce the maintenance effort on my side.

Instead of using the GitHub Wiki for documentation, there is now a dedicated website. The repository (mostly) contains links to implementations but no longer the implementations themselves. Another addition is a live version of the 7GUIs tasks in React/MobX so that you can conveniently play with the resulting GUIs in your browser.

I am happy with how everything turned out but it is still too early to tell if the goals have been reached. In any case, I am open to feedback!

Call for Contributions

I would love to see more contributed implementations.

For example, my Elm implementation is dated and incomplete. Don’t you want to show how awesomely Elm handles the 7GUIs tasks? What about implementations in popular SPA frameworks such as React, Ember, Angular or Vue?

I think you will fail at creating a better implementation than my version in React/MobX. Prove me wrong!

Impact

As I was looking back on the project I wondered:

Did 7GUIs achieve its goal to improve comparability of GUI programming approaches for framework designers and GUI implementers?

To some extent, sure. It is hard to give a definite answer though. The number of contributed implementations, forum discussions and some emails I received suggest that 7GUIs is meeting a need.

However, to my knowledge no written-up comparisons have been published. From experience I know how hard it is to write a comparison. Alternatively, perhaps there is simply no demand for analyses. Maybe that is all there is to it. On the other hand, I believe that certain aspects of the benchmark could have been done differently to invite more comparisons (see the rest of this article).

All in all, 7GUIs appears to be useful despite the lack of published comparisons, but having them would be even better, and there are potential ways to make them more likely.

Shortcomings

In the four years since 7GUIs’ inception I have gained much more experience in building GUIs. I noticed that some important GUI programming challenges were not covered by 7GUIs.

7GUIs put its focus more on traditional desktop applications. Like it or not, now more than ever, the most important platform for delivering GUIs to users is the browser and the interconnected web.

In addition, I looked at similar projects such as TodoMVC and HNPWA and tried to identify which aspects make them successful.

Just to be clear: I strongly believe that 7GUIs remains useful.

Still, I also believe that there is potential for a better benchmark. The results of my reflection are presented in the next section.

Towards a Better Benchmark

If I created a GUI programming benchmark today, what would I do differently from 7GUIs?


Challenges

As hinted in the introduction, 7GUIs does not cover all the relevant challenges that I have observed in my professional experience as a software engineer. Below I simply list, without deeper explanations, the challenges that I believe are worth tackling in a more complete benchmark. Please note that this is neither a complete nor a required set but a good starting point for a hypothetical new benchmark:

  • Handling a flaky and slow remote API. Not just read but also mutation operations (e.g. POST requests). This implies handling loading and error scenarios, caching and handling race conditions.
  • Handling loading in a smart way. For example, only showing a loading indicator after a certain amount of time, showing it for a minimal amount of time to prevent flickering, and showing partial data during loading if it was previously fetched. (A sketch combining this and the previous item follows the list.)
  • Optimistic updates
  • Prefetching
  • Change propagation
  • Rendering a large filterable or searchable list of items where results are fetched remotely
  • Undo/redo. Potentially together with remote syncing.
  • Navigation/routing between screens. Preventing navigation (e.g. unsaved changes)
  • Tabbed content
  • Modals. Potentially over several steps like a wizard: Dialog control
  • Pagination or infinite loading
  • Adaptive/responsive layout
  • Constraints (forms)
  • Notifications/toasts. Potentially triggered by remote events or scheduled tasks.
  • State restoration
  • Offline support
  • Two alternative views for the same data
  • Potentially something that Rx is typically suited for: operations on past events, coordinating multiple interdependent requests.
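To make the first two list items more concrete, here is a rough, framework-agnostic sketch of fetching from a flaky endpoint with retries, combined with a loading indicator that only appears after a delay and, once shown, stays up for a minimal time. The function names, callbacks and timing constants are illustrative assumptions, not part of any existing benchmark:

```javascript
// Illustrative sketch only: fetchWithRetry, the callbacks and the timing
// constants are hypothetical choices, not part of 7GUIs or any other benchmark.

const SHOW_DELAY_MS = 250;  // show the spinner only if loading takes longer than this
const MIN_VISIBLE_MS = 500; // once shown, keep the spinner up at least this long

// Retry a flaky GET request with exponential backoff.
async function fetchWithRetry(url, retries = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      const res = await fetch(url);
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return await res.json();
    } catch (err) {
      if (attempt >= retries) throw err; // give up after the last attempt
      await new Promise(resolve => setTimeout(resolve, 100 * 2 ** attempt));
    }
  }
}

// Load data while showing a spinner that neither flickers on fast responses
// nor blinks away immediately on slow ones.
async function loadWithSmartSpinner(url, { showSpinner, hideSpinner, render, renderError }) {
  let shownAt = null;
  // Delay showing the spinner so fast responses never flicker it.
  const timer = setTimeout(() => { shownAt = Date.now(); showSpinner(); }, SHOW_DELAY_MS);
  try {
    render(await fetchWithRetry(url));
  } catch (err) {
    renderError(err);
  } finally {
    clearTimeout(timer);
    if (shownAt !== null) {
      // The spinner did appear: keep it visible for the minimal duration.
      const remaining = MIN_VISIBLE_MS - (Date.now() - shownAt);
      setTimeout(hideSpinner, Math.max(0, remaining));
    }
  }
}
```

An implementation in a real framework would of course integrate this with its own state management, but the timing logic stays the same.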


You might have noticed that many challenges involve communication with a remote API. The world is becoming increasingly connected and this is reflected in the applications that are built and used nowadays. A good benchmark should take this into account.

Artificiality

For 7GUIs I made the conscious decision to have a set of several isolated tasks. A side-effect of this approach is increased artificiality. In itself this is not a bad thing—but it’s a trade-off.

By focusing on a few challenges per task, it is easier to see how they are tackled. It should also be easier to compare solutions. As pondered above, the question is how much this matters. A downside of artificiality is that possible interactions between challenges, or between approaches to them, might be missed.

But the biggest downside is that writing a “real” application is simply more fun. If, alongside solving the rather abstract challenges, you create something that you can actually play with and use in practice, you are more invested. Ultimately, the more engaging the task, the more effort people will put into it, which leads to a greater exchange of ideas—at least that is my impression.

To be concrete, a movie browsing/reviewing application would fit the bill nicely. It is general enough to map all relevant challenges and real enough to be fun.

Required Effort

If a benchmark is too small it cannot cover enough interesting challenges. It may also encourage very specialized solutions. If a benchmark is too big the required effort for its completion may disincentivize someone from writing an implementation.

Having recently reimplemented 7GUIs in React/MobX, I noticed that it feels slightly too big. The tasks “Circle Drawer” and “Cells” in particular could have been benchmarks of their own.

TodoMVC in its original form feels too small. HNPWA, on the other hand, feels significantly too big. However, if the benchmark involves creating a non-artificial application then I believe the increased fun during development can outweigh the higher required effort.

For me the ideal benchmark would therefore lie somewhere between TodoMVC and HNPWA but closer to HNPWA.

Specificity of Criteria

How specific should the acceptance criteria be?

Be too specific and you might introduce bias towards certain solutions. It would also decrease the fun, as you want to be able to “put your mark” on your implementation. Be too lenient and you make it hard to compare implementations, missing the point of a benchmark.

In general, I feel that you should start by being quite specific and once people start complaining about certain criteria preventing good approaches, relax them. Over time, you should arrive at a good set of acceptance criteria.

Measuring Implementations

When it comes to having a set of criteria for comparing implementations against each other I am still a fan of the Cognitive Dimensions of Notations (CDs) framework which is “an approach to analysing the usability of information artefacts”.


However, I concede that its presentation is dry. I tried to make it more digestible for 7GUIs but I feel the dimensions are still far from being easy to apply in practice.

In my mind, having small, concrete JavaScript snippets for each dimension would help: a good and a not-so-good JavaScript version of a small problem, followed by a short explanation of why, according to the discussed dimension, the good version is better than the not-so-good one.
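To illustrate, here is what such a pair could look like for the “hidden dependencies” dimension. The shopping-cart scenario is made up purely for illustration:

```javascript
// Not so good: the dependency of `total` on `items` is hidden. Every place
// that mutates the items must remember to call recalcTotal(), or the UI
// silently shows a stale total.
const cart = { items: [], total: 0 };

function recalcTotal() {
  cart.total = cart.items.reduce((sum, item) => sum + item.price, 0);
}

function addItem(item) {
  cart.items.push(item);
  recalcTotal(); // easy to forget at other mutation sites
}

// Better: `total` is a derived value, so the dependency on `items` is
// explicit in the notation and can never get out of sync.
const betterCart = {
  items: [],
  get total() {
    return this.items.reduce((sum, item) => sum + item.price, 0);
  },
};

function addItemBetter(item) {
  betterCart.items.push(item); // no manual bookkeeping needed
}
```

According to the hidden dependencies dimension the second version is better because the relationship between `total` and `items` is visible in a single place instead of being scattered, invisibly, across every mutation site.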

On the whole, measuring the source code of implementations in a non-subjective, easy and useful manner is a hard problem, so I’d like to see further ideas here.

Inviting Contributions

I noticed that some contributions to 7GUIs did not put much value into their presentation. That is totally fine, as it was never a requirement. Still, placing higher demands on contributions could actually lead to more contributions.

The idea is to make it somewhat prestigious to be listed as a contribution. This in turn should make having your contribution listed more attractive. Plus, comparing implementations against each other should become easier.

Some concrete ideas for requirements:

  • Have clear instructions for how to run your project.
  • Have screenshots.
  • Have a complete implementation.
  • Have your source code be easily browsable (e.g. host it on GitHub).
  • Have an article/document describing what’s good/special about your implementation.
  • Have someone else vouch for your contribution.
  • Have a jury judge your contribution (“peer review”).

It’s hard to find a good balance between being lenient and strict, but being stricter might work out better in the end.

It should be clear that you must have a high-quality reference implementation: Lead by example!

Threadit.js

During my research I recently stumbled upon Threadit.js. I took a cursory glance at it, and this benchmark seems to go in the right direction.

It is a “real” application. It provides a remote API that is flaky thus forcing implementations to handle error and loading states. It provides utility code so that you can concentrate on the core challenges. It is web-based.

Threadit.js has many good ideas and could at the very least serve as a starting point. Unfortunately, the live website http://threaditjs.com/ is down.

Conclusion

Projects such as TodoMVC, HNPWA and 7GUIs show that there is demand for GUI programming benchmarks. 7GUIs is useful but has its shortcomings. Threadit.js is generally close to what I believe is the right direction for a better benchmark. I will not be the one to drive such a project though. In this article I merely envisioned a better benchmark. Nevertheless, I am willing to support an endeavour to create one, as I am convinced that the result will be impactful. What are your ideas?