One thing stands out most from what I have learned in the past 10 years. Non-functional failure is the most dangerous technical risk in software.

Agile is designed to allow for change, encouraging experimentation. And if you are experimenting (with your design, user experience or technology) you should expect to fail. By failing you learn more, allow innovation and end up with a better product. We expect this from controlled experimentation, but are caught out when a whole service fails. It is this macro-failure that we must beware of.

Macro Failure (credit: Hanratty Rory)

There are five macro reasons I can see why digital services fail.

- Building a product that doesn’t work. This is the most obvious type of failure, when software is not functional. It is the right product but it’s defective.
- Starting but not finishing the product. Fundamentally the software was not delivered to its users.
- Spending more money than expected. Running over budget is not automatically failure, but a major budget overrun added to one of the others is.
- Building the wrong product. You might have the best developers with the best technologies but if the product you give to your users is the wrong one then you’ve failed.
- Building a product that may work but can’t be used. This is the silent but deadly reason for failure. It is often badly understood by product owners. This is the failure I want to explore.

What is Non-Functional?

My definition is: non-functionals are concerned with how your software product works, not what it does (functional).

So for example, if your service allows users to obtain a fishing license but in doing this your data is exposed, then it is insecure. This is a non-functional issue.

Some have abandoned the word “non-functional” altogether and have adopted words like “constraint”, but I’m not sure this adequately covers what is needed.

It is common to see software products that have non-functional failures because these concerns are often badly understood for software. If you think about a new car, its non-functional concerns are well understood: it should be drivable by one adult (accessibility), do 100 mph without falling apart (performance), prevent others from stealing it (security) and be able to do 20,000 miles until its first service (reliability).

It is dangerous to think non-functional issues are more relevant to software architects than users. There is a close relationship between what your service does and how it does it. Issues with how your service works are often barriers to any use of your service. If your service is not accessible on mobiles or tablets, users will avoid using your service. If your service has performance issues, users will not be able to use your service. Look what happened to Pokémon Go in its launch week.

So we all need to take more care to ensure non-functionals are taken seriously.

Silent But Deadly: The NFR Trap

I have written before about the NFR Trap in relation to system performance. The trap is to believe your team won’t have non-functional issues because you’ve got “The NFRs”.

There are some who get performance optimisation. They test, they analyse and they fix. But when real users start to use their service they have performance problems. Unfortunately these folks often get snared by the NFR Trap.

The NFR (Non-Functional Requirements) did not describe the level of real-world usage that real users impose on the service. Instead very complicated, hard to understand and detailed requirements were constructed by a smart software architect who didn’t get the user need or user context.

With service design and agile development we now have a focus on users and their needs represented as user stories instead of business requirements.
Yet non-functional requirements are often represented as a list of abstract statements about things like system performance, security, usability, accessibility, availability, maintainability and business continuity. They are abstract because they don’t relate to users (bad), don’t mean much to most people (bad) and are difficult to test (very bad).

Often The NFR are kept separate and referenced, but are very difficult to corroborate or approve. The NFR are often derived from templates that carry an undue reverence. Take this NFR template for example. It is typical of what is perpetuated by many teams but unfortunately it’s inadequate.

Two Examples

Let’s look at two common examples to understand why traditional NFRs are inadequate. Ask yourself for each one what it means for the users.

Availability must be no less than 99.9%

What this means is that the system should be available (working) 99.9% of the time, except when it is down for scheduled maintenance. There are a few problems with this.

- It doesn’t relate to availability when users need it. If peak usage is in the morning but the service is used in the daytime only, then availability is business critical in the morning, important in the afternoon and not required in the evening.
- How will scheduled downtime affect users? It is rare to see scheduled downtime targeted in NFRs, but users don’t care whether downtime was scheduled: downtime of any kind means they can’t use the service. So for a 24x7 service, understanding what users can tolerate and the resulting design for minimising or zero downtime will be important.
- How can you be sure 99.9% is even necessary? It isn’t untypical for these numbers to be guessed, written in The NFR by The Architect and never questioned ever again. A better question to ask is, what is the impact to users when the service is not available and what alternatives will they have?
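To make the first problem concrete, here is a rough sketch of the arithmetic behind a 99.9% target. The morning usage window and the 8 hours of outage are invented figures for illustration only, and the function names are hypothetical; none of them come from a real service.

```python
# Illustrative arithmetic only. The usage window and outage figures
# below are made-up assumptions, not measurements from any service.
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(target_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime an availability target permits over a period."""
    return period_hours * (1 - target_pct / 100)


def availability_pct(downtime_hours: float, period_hours: float) -> float:
    """Availability as a percentage of the period under consideration."""
    return 100 * (1 - downtime_hours / period_hours)


# A 99.9% target measured across the whole year allows roughly 8.76 hours down.
budget = downtime_budget_hours(99.9)

# Assumption: users mainly need the service 08:00-12:00 on about 250 working days.
peak_window_hours = 4 * 250

# Assumption: 8 hours of outage in the year, all of it during that morning window.
outage_hours = 8.0

overall = availability_pct(outage_hours, HOURS_PER_YEAR)
in_peak = availability_pct(outage_hours, peak_window_hours)

print(f"Yearly downtime budget at 99.9%: {budget:.2f} hours")
print(f"Availability over the whole year: {overall:.2f}%")
print(f"Availability when users needed it: {in_peak:.2f}%")
```

Under these assumptions the service meets its 99.9% target across the year, yet was only about 99.2% available in the hours when users actually needed it.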
Designing a contingency, or making high impact areas of your service less complex (to allow easier redundancy), may be time better spent for your users than a focus on an uptime threshold.

90% of all page requests must be completed within 1.5 secs

This is an attempt to describe how responsive your service should be based on experiences with popular websites. It is written to provide confidence that your service will be “fast” for its users. Lots of time may have been spent in meetings discussing whether the target should be 1.5 seconds, 2 seconds, 3 seconds or something else.

But isn’t this missing the point? Surely the point is to understand what performance is expected by your users? What performance level will allow them to use your service without frustration?

To understand this, it’s necessary to speak to users, do effective research with them and analyse performance data. Some parts of your service might be time-critical, others less so. Use this to prioritise critical features within your service, avoiding generalised page response time targets like the one above. While you’re doing this make sure you validate actual performance by testing it.

Test It, Don’t Just Require It

It’s vital to ensure features are measured, especially for performance. Performance testing is more important than performance requirements. This is because you can easily iterate and optimise performance based on test results and user research, if you are doing it. Active performance testing of features by teams should become the new normal. And the results can even be used to guide acceptable performance for users.

There is a risk though that your testing could be providing false confidence. To help avoid bogus performance testing, you need to ensure a number of realism-factors are present.

- Test on production-sized infrastructure. Even better, test on Production if it’s a new service.
- Test with production-sized datasets, ideally production data or anonymised production data.
- Test response time at peak load. You want your service to be as responsive on its worst day as its best day. This means testing at peak load.
- Ensure the testing is end-to-end based on user devices. Often response time testing is done from the edge of the data-centre only (largely because it’s easier to measure). However this doesn’t help your users.
- Ensure the testing throttles connectivity and client performance. If you have a significant group of users with older devices on sub-broadband speeds, their performance will be substantially slower.

So we’ve seen that The NFR are often too abstract and are rarely considered in context of the users. Let’s rip up The NFR and start again with non-functional needs that are user-focused, testable and a regular aspect of team development.

Non-Functional Features

Non-functionals can be normalised within agile development by considering them as features. Many non-functionals, as we’ve already seen, heavily impact on user experience and so can be written as user stories. Alternatively orthogonal features such as performance expectations can be integrated into your stories as acceptance criteria.

When is the right time to do this? Beta. The Beta phase is where you build out an end-to-end service and start using it with production data. Just make sure your non-functional features are developed in your backlog at the beginning of Beta. Waiting until near the end of the Beta phase is an invitation to fail.

At Kainos we have written some guidance for teams moving into Beta. These points were written by a bunch of Kainos technical architects who have seen the lows of non-functional failure. They will help guide some of the more important non-functional features you should be thinking about.

- You have a Backlog that contains non-functional features.
- You have a Backlog that ensures operational (live running) aspects of functional stories are accounted for.
- You have a production environment for the start of private Beta.
- You have a deployment pipeline to build, package and release features into production rapidly.
- You have a deployment pipeline to build, package and release patches into production rapidly.
- You have invested in automation for builds, tests and deployment (application and infrastructure).
- You have instrumentation to understand what your users are doing with the service.
- You have got aggressive scale and performance targets. Don’t be satisfied with historic peaks.
- You have load, stress, bandwidth and soak tested your service beyond these performance targets.
- You have tested all integration points (internal and external to your service) for performance, scale and stability.
- You have performance, application and infrastructure monitoring to understand what your service is doing.
- You have alerting to proactively identify performance and stability issues.
- You are aggregating log information to a central point and making it available for all developers.
- You have accessibility testing planned with real users.
- You understand the security classification of the service and its data, and what this means for development, testing and production.
- You have in-sprint security testing or regular checkpoints with the security specialists. Security controls for both application and infrastructure required for live operations are present across all environments in the pipeline.
- You are able to deliver the service to a wide range of browsers and devices.
- You know how you will open source your code repositories without including sensitive configuration.
- You are able to test failure of your service and know how it will respond if parts aren’t available.
- You have discussed and agreed the contingency options for the digital service with the customer and where appropriate prioritised work on building contingency up front (particularly important when working with hard deadlines).

The Finish

Let’s not be complacent when building digital services for citizens and customers.
Macro-failure is bad for everyone, so let’s work hard to avoid it.

Thanks to @johnstrudwick who has very kindly edited this post into something much more readable. Credit also to the bunch of Kainos architects who are co-authors of the 20 points: Rory, davey.mcglade, Gareth Workman and Caoimhin Graham.

And if you’re interested in working with us at @KainosSoftware to build great software, we are hiring.