One thing stands out most from what I have learned in the past 10 years.
Non-functional failure is the most dangerous technical risk in software.
Agile is designed to allow for change, encouraging experimentation. And if you are experimenting with your design, user experience or technology, you should expect to fail. By failing you learn more, allow innovation and end up with a better product. We expect this from controlled experimentation, but are caught out when a whole service fails. It is this macro-failure that we must beware of.
credit: Rory Hanratty
There are five macro reasons I can see why digital services fail.
My definition is: non-functionals are concerned with how your software product works, not what it does (the functional).
So for example, if your service allows users to obtain a fishing license but in doing this, your data is exposed, then it is insecure. This is a non-functional issue.
Some have abandoned the word “non-functional” altogether and have adopted words like “constraint” but I’m not sure this adequately covers what is needed.
It is common to see software products that have non-functional failures because these concerns are often badly understood for software. If you think about a new car, its non-functional concerns are well understood: it should be drivable by one adult (accessibility), do 100 mph without falling apart (performance), prevent others from stealing it (security) and be able to do 20,000 miles until its first service (reliability).
It is dangerous to think non-functional issues are more relevant to software architects than users. There is a close relationship between what your service does and how it does it. Issues with how your service works are often barriers to any use of your service. If your service is not accessible on mobiles or tablets users will avoid using your service. If your service has performance issues users will not be able to use your service. Look what happened to Pokémon Go in its launch week.
So we all need to take more care to ensure non-functionals are taken seriously.
I have written before about the NFR Trap in relation to system performance. The trap is to believe your team won’t have non-functional issues because you’ve got “The NFRs”.
There are some who get performance optimisation. They test, they analyse and they fix. But when real users start to use their service they have performance problems. Unfortunately these folks often get snared by the NFR Trap.
The NFR (Non-Functional Requirements) did not describe the level of real-world usage that real users impose on the service. Instead, very complicated, detailed and hard-to-understand requirements were constructed by a smart software architect who didn’t get the user need or user context.
With service design and agile development we now have a focus on users and their needs, represented as user stories instead of business requirements. Yet non-functional requirements are often represented as a list of abstract statements about things like system performance, security, usability, accessibility, availability, maintainability and business continuity. They are abstract because they don’t relate to users (bad), don’t mean much to most people (bad) and are difficult to test (very bad).
Often The NFR are kept separate and referenced, but are very difficult to corroborate or approve. The NFR are often derived from templates that carry an undue reverence. Take this NFR template for example. It is typical of what is perpetuated by many teams but unfortunately it’s inadequate.
Let’s look at two common examples to understand why traditional NFRs are inadequate. Ask yourself for each one what it means for the users.
Availability must be no less than 99.9%
What this means is that the system should be available (working) 99.9% of the time — except when it is down for scheduled maintenance.
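To make the number concrete, here is a minimal sketch of the arithmetic behind an availability target. The function name and the choice of a 365-day year and a 30-day month are my own assumptions for illustration, and scheduled maintenance is ignored.

```python
def allowed_downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Hours of downtime that an availability target permits over a period."""
    return (1 - availability_pct / 100) * period_hours


HOURS_PER_YEAR = 365 * 24          # assumption: a non-leap year
HOURS_PER_30_DAY_MONTH = 30 * 24   # assumption: a 30-day month

yearly = allowed_downtime_hours(99.9, HOURS_PER_YEAR)
monthly_minutes = allowed_downtime_hours(99.9, HOURS_PER_30_DAY_MONTH) * 60

print(f"{yearly:.2f} hours per year")                      # 8.76 hours per year
print(f"{monthly_minutes:.1f} minutes per 30-day month")   # 43.2 minutes per 30-day month
```

The figure says nothing about when those hours fall, which is the first problem with it.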
There are a few problems with this. It doesn’t relate to availability when users need it. If the service is used only in the daytime and peak usage is in the morning, then availability is business critical in the morning, important in the afternoon and not required in the evening.
How will scheduled downtime affect users? It is rare to see scheduled downtime targeted in NFRs, but users don’t care about the distinction: downtime of any kind means they can’t use the service. So for a 24x7 service, understanding what users can tolerate, and designing for minimal or zero downtime as a result, will be important.
How can you be sure 99.9% is even necessary? It is not unusual for these numbers to be guessed, written in The NFR by The Architect and never questioned again. A better question to ask is: what is the impact on users when the service is not available, and what alternatives will they have? Designing a contingency, or making the high-impact areas of your service less complex (to allow easier redundancy), may be time better spent for your users than a focus on an uptime threshold.
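One way to see the difference is to weight downtime by how many users were trying to use the service at the time. The sketch below is illustrative only: the hourly request counts are made up, the function name is hypothetical, and it assumes requests are spread evenly within each hour.

```python
def user_weighted_availability(requests_by_hour, downtime_minutes_by_hour):
    """Share of expected requests that arrive while the service is up.

    Assumes requests are spread evenly within each hour.
    """
    total = sum(requests_by_hour.values())
    lost = sum(
        requests_by_hour[hour] * downtime_minutes_by_hour.get(hour, 0) / 60
        for hour in requests_by_hour
    )
    return 1 - lost / total


# Hypothetical daytime-only service with a morning peak (hour -> requests).
requests_by_hour = {9: 500, 10: 300, 14: 150, 15: 50}

# The same 30-minute outage, at two different times of day.
morning_outage = user_weighted_availability(requests_by_hour, {9: 30})
afternoon_outage = user_weighted_availability(requests_by_hour, {15: 30})

print(f"Outage at 09:00: {morning_outage:.1%} of requests served")    # 75.0%
print(f"Outage at 15:00: {afternoon_outage:.1%} of requests served")  # 97.5%
```

Both outages cost the same against a clock-based uptime percentage, yet in this made-up example the morning one turns away ten times as many users.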
90% of all page requests must be completed within 1.5 secs
This is an attempt to describe how responsive your service should be, based on experiences with popular websites. It is written to provide confidence that your service will be “fast” for its users. Lots of time may have been spent in meetings discussing whether the target should be 1.5 seconds, 2 seconds, 3 seconds or something else. But isn’t this missing the point?
Surely the point is to understand what performance is expected by your users? What performance level will allow them to use your service without frustration? To understand this, it’s necessary to speak to users and do effective research with them and analyse performance data. Some parts of your service might be time-critical, others less so. Use this to prioritise critical features within your service avoiding generalised page response time targets like the one above.
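As a rough sketch of what per-feature targets might look like in practice, the snippet below checks measured response times against a separate budget for each feature. The feature names, budgets and sample timings are all hypothetical; real budgets would come from your own user research and performance data.

```python
from collections import defaultdict

# Hypothetical budgets agreed from user research: feature -> seconds.
BUDGETS_S = {"pay": 1.0, "search": 2.0, "download_receipt": 5.0}


def share_within_budget(samples, budgets):
    """Return, per feature, the share of requests that met that feature's budget."""
    counts = defaultdict(lambda: [0, 0])  # feature -> [within budget, total]
    for feature, seconds in samples:
        counts[feature][1] += 1
        if seconds <= budgets[feature]:
            counts[feature][0] += 1
    return {feature: within / total for feature, (within, total) in counts.items()}


# Made-up measurements: (feature, response time in seconds).
samples = [
    ("pay", 0.8), ("pay", 1.4), ("pay", 0.9), ("pay", 1.6),
    ("search", 1.2), ("search", 1.9),
    ("download_receipt", 3.0), ("download_receipt", 4.5),
]

results = share_within_budget(samples, BUDGETS_S)
for feature, share in sorted(results.items(), key=lambda item: item[1]):
    print(f"{feature}: {share:.0%} within budget")
```

With these made-up numbers the time-critical payment step meets its budget only half the time, while the slower but less critical features are fine. That is the kind of prioritisation a single page-wide target cannot show.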
While you’re doing this make sure you validate actual performance by testing it.
It’s vital to ensure features are measured, especially for performance. Performance testing is more important than performance requirements. This is because, once you are testing, you can easily iterate and optimise performance based on test results and user research. Active performance testing of features by teams should become the new normal. And the results can even be used to guide what counts as acceptable performance for users.
There is a risk though that your testing could be providing false confidence. To help avoid bogus performance testing, you need to ensure a number of realism-factors are present.
So we’ve seen that The NFR are often too abstract and are rarely considered in context of the users. Let’s rip up The NFR and start again with non-functional needs that are user-focused, testable and a regular aspect of team development.
Non-functionals can be normalised within agile development by considering them as features. Many non-functionals, as we’ve already seen, heavily impact user experience and so can be written as user stories. Alternatively, orthogonal features such as performance expectations can be integrated into your stories as acceptance criteria.
When is the right time to do this? Beta. The Beta phase is where you build out an end-to-end service and start using it with production data. Just make sure your non-functional features are in your backlog and being developed from the beginning of Beta. Waiting until near the end of the Beta phase is an invitation to fail.
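Here is a minimal sketch of a performance expectation written as an acceptance criterion that a team could run alongside its other tests. Everything in it is an assumption for illustration: `search_licences` is a hypothetical stand-in for a real feature, and the "9 in 10 searches within 1.5 seconds" criterion is invented, not a recommendation.

```python
import time


def search_licences(postcode: str) -> list[str]:
    """Hypothetical feature under test; stands in for a real service call."""
    return [f"licence-for-{postcode}"]


def measure_seconds(func, *args, runs: int = 20) -> list[float]:
    """Call func repeatedly and return how long each call took, in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args)
        durations.append(time.perf_counter() - start)
    return durations


def test_search_returns_results_within_budget():
    # Story (hypothetical): as an angler, I want to find my licence quickly.
    # Acceptance criterion (assumed): at least 9 in 10 searches finish within 1.5 s.
    durations = measure_seconds(search_licences, "BT1 1AA")
    within = sum(1 for d in durations if d <= 1.5)
    assert within / len(durations) >= 0.9


test_search_returns_results_within_budget()
```

A check like this only builds real confidence when it runs against a realistic environment with realistic data and load; against a stub it proves very little.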
At Kainos we have written some guidance for teams moving into Beta. These were written by a bunch of Kainos technical architects who have seen the lows of non-functional failure. These will help guide some of the more important non-functional features you should be thinking about.
Let’s not be complacent when building digital services for citizens and customers. Macro-failure is bad for everyone, so let’s work hard to avoid it.
Thanks to @johnstrudwick who has very kindly edited this post into something much more readable.
Credit also to the bunch of Kainos architects who are co-authors of the 20 points: Rory, davey.mcglade, Gareth Workman and Caoimhin Graham.
And if you’re interested in working with us at @KainosSoftware to build great software, we are hiring.