Automation of routine tasks paves the way to creating truly self-managed environments, where the system itself handles the configuration. Today we discuss how to create self-healing IT infrastructure. There always is some transition from here to there, some evolutionary process. For example, the next big thing in the automotive industry is the worldwide acceptance of self-driving cars. Despite certain fatal failures from Elon Musk’s , and . These manufacturers act according to a . Tesla autopilot Ford plans to produce self-driving cars Daimler-Benz is already testing self-driving trucks 5-step plan to achieve driverless cars The hard part of the transition is the attitude shift — the drivers must accept the role of passengers, not the masters of the road. The benefits are supreme, though — fully-automated delivery system working 24/7 and ensuring fewer car crashes and human casualties on the roads. The road to this utopia might seem long, yet the automotive giants cover it with seven-league steps. 5 steps to a self-healing IT infrastructure What about the IT industry though? The obvious direction of evolvement there is automation. When more and more routine tasks is automated, work hours and resources can be allocated to improving the infrastructure and increasing its performance, not to manually solving numerous tedious tasks. Building the self-healing IT infrastructure capable of performing the routine tasks automatically would simplify the DevOps workflows greatly. The bad thing is, there is no industry-defined roadmap to achieve this state of software delivery. Today we explain our vision on how to create self-healing IT infrastructure. This is the short roadmap of a long process that is most likely going to take around 5–10 years: Using the immutable Infrastructure as Code Covering all the system states and software code with automated tests Deploying the holistic logging and monitoring systems Leveraging the latest smart alerts, triggers, and prescriptive analytics Overseeing the performance of self-learning and self-healing IT infrastructure Below we cover these steps in more details. Immutable IaC as the basis of a self-healing IT infrastructure One of the most laborious and routine tasks a modern IT engineer has to face is provisioning the servers. It is a time-consuming and highly error-prone process when done manually. Even if the modern-day don’t have to install the physical servers into the racks, they do have to configure them in multiple dashboards before providing the ready development, testing, staging or production environments. Treating the replaces this process with working according to simple, understandable and easily adjustable manifests. Using Kubernetes to manage the Docker containers with applications and Terraform to programmatically deploy and configure the needed servers helps turn a long and error-prone process into a streamlined software delivery pipeline. DevOps teams immutable Infrastructure as Code Automated testing as the key to keeping the codebase efficient The is one of the . Instead of leaving the testing and bug fixing close to the end of the software delivery lifecycle, the developers shift all the testing — integration, security, completeness, etc. — to the left across the software creation pipeline. Automated unit tests for the product are written before the development of the product itself begins, and are always updated in parallel with the main development process. Thus said, over time the team has all the product codebase covered by the tests that run automatically, instead of running them manually on the daily basis. Integration testing ensures the system components are stable at all times and new releases will not cause the system failure after being pushed to production. shift to the left hottest DevOps trends of 2018 Logging and monitoring are the keys to self-healing infrastructure Logging and monitoring tools should be picked on the stage of the system architecture design and integrated with the solution components in order to efficiently collect all the essential details of the . Detailed logs greatly simplify finding the roots of the issues and building the response manuals. Once a DevOps engineer produces a solution for a specific issue, any system administrator (or even the developer) is able to follow the checklist in the future, lowering the workload of a qualified DevOps specialist. In addition, once the sufficiently large log database is gathered, Machine Learning algorithms can be applied to it to train the system to deal with routine issues automatically. DevOps system performance Smart alerts, triggers, and prescriptive analytics As the logging and monitoring tools become more sophisticated, they are able to include much more information in error reports. Instead of simply showing if the system component is up or down, generate smart alerts with much more detailed information, allowing to . Once such errors are solved, appropriate triggers can be created along with the prescribed responses for each situation. Due to this approach, multiple repetitive problems can be described and efficiently solved without even requiring the attention of the DevOps team. Most importantly, such triggers can be set for the roots of the issues. This allows preventing the problem instead of dealing with the consequences. modern monitoring tools cut the problem-solving time by 90% Overseeing the performance of a self-healing IT-infrastructure The last step of the way to a self-healing IT infrastructure is the process of constantly training the deployed Machine Learning algorithms against the ever-growing base of logs. In a utopian world of 2025+, the DevOps engineers will receive the notifications of the potential issues and approve the solutions offered by the self-healing IT infrastructures. To say even more, this process will also be subject to the machine learning, so with time, there will be less and fewer errors requiring human attention, ensuring stable infrastructure performance and allowing the DevOps teams to concentrate their effort on improving the overall system architectures, not firefighting small issues daily. Final thoughts of the steps to create self-healing IT infrastructure Despite this utopian picture, it is least likely the layman DevOps teams will lay their hands on such systems any time soon. The human and financial resources needed to develop a system required to implement this approach is far beyond the reach of an average business. Thus said, as always we have to wait and hope that the industry giants like will create the platforms similar to AWS Lambda or Kubernetes and open-source them. Only when — and if — it is done, the DevOps talents worldwide will be able to benefit from using the self-healing IT infrastructure. AWS or GCP What do you think on the aforementioned evolutionary process? On what stage of this path does your company or organization currently seems to be? Do you plan to move to the next stage soon? Please share your thoughts and opinions in the comments section below! Originally published at itsvit.com on September 24, 2018.

Routine

Tesla

How to create a self-healing IT infrastructure

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

10 disruptive DevOps trends of 2018

101 Stories To Learn About Cloud Infrastructure

10 Things in Engineering We Don't Spend Enough Time On

10 Things I Did To Increase CloudTrail Logs Security

10 reasons to give cloud computing a go

10 Lessons from 10 Years of AWS (part 1)

10 disruptive DevOps trends of 2018

101 Stories To Learn About Cloud Infrastructure

10 Things in Engineering We Don't Spend Enough Time On

10 Things I Did To Increase CloudTrail Logs Security

10 reasons to give cloud computing a go

10 Lessons from 10 Years of AWS (part 1)

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps