A list of lessons learned from the thrills and spills of various cloud migration projects.
I’ve been fortunate enough to be involved in several cloud migration projects in my career. Here are a few insights I’ve collected, which I hope will help others starting out on their own cloud migration journey.
It’s hard work.
This work isn’t easy. There can be a surprising number of challenges to overcome, and certainly not just technical ones. For example:
- Organisational politics: e.g. stakeholders may lose confidence in the team if delivery drags on for a long time without any tangible results in the interim.
- Resourcing: e.g. some individuals may be better placed than others to work on the project, but they are too busy on other projects. A lot of testing resource is also needed, but it’s not always clear what to test.
- Financial: e.g. the cost of provisioning test servers and the cost of time spent reaching out to consultants.
Pure ‘Lift and shift’ never works.
Lifting and shifting in this context means taking a software platform currently running in a hosting/data centre and moving it, in its exact current state, into a cloud-based environment.
This is the antithesis of rearchitecting, employing a cloud native solution and taking advantage of everything a cloud provider’s platform has to offer.
It definitely makes sense to make as few changes as possible when you’re moving a big, hefty and terrifyingly complex platform into the cloud.
But some things you do in a hosting centre should stay in the hosting centre.
Not only that, some of the things you do, and the way you do them, will only ever work in a hosting centre.
At one point in a migration we made a fundamental assumption that multicast would be available in the cloud, which was ultimately proven incorrect. This led to significant rework close to the project deadline to re-implement session management in our Tomcat application server cluster.
We moved from session replication via multicast to a session persistence mechanism backed by a cloud-based Redis cache.
The end result was more resilient and meant that releases could be performed more quickly with no customer impact (there was no need to wait for a node to become usable while sessions were replicated to it via multicast, because the node could use the persisted sessions straight away).
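To make the difference concrete, here is a minimal Python sketch of the session persistence idea. It is not the Tomcat and Redis setup from the migration: `InMemoryStore` is a hypothetical stand-in for a shared cache such as Redis, and `AppNode`, the key format and the 30-minute expiry are all assumptions made for illustration. The point it shows is that a node holding no session state of its own can serve an existing session the moment it starts.

```python
import json
import time
import uuid


class InMemoryStore:
    """Hypothetical stand-in for a shared cache such as Redis."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds):
        # Store the value together with the time at which it expires.
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]
            return None
        return value


class AppNode:
    """An application node that keeps no session state of its own."""

    def __init__(self, name, store, ttl_seconds=1800):
        self.name = name
        self.store = store
        self.ttl_seconds = ttl_seconds

    def create_session(self, attributes):
        session_id = uuid.uuid4().hex
        payload = json.dumps(attributes)
        self.store.set(f"session:{session_id}", payload, self.ttl_seconds)
        return session_id

    def load_session(self, session_id):
        raw = self.store.get(f"session:{session_id}")
        return None if raw is None else json.loads(raw)


shared_store = InMemoryStore()
node_a = AppNode("node-a", shared_store)
session_id = node_a.create_session({"user": "alice", "basket": [101, 102]})

# A node started after the session was created can serve it straight away,
# because the session lives in the shared store rather than on node-a.
node_b = AppNode("node-b", shared_store)
restored = node_b.load_session(session_id)
```

With replication, `node_b` would have to wait for its peers to copy sessions across before it could take traffic. With a shared store there is nothing to copy, which is why releases get quicker.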
The trick, it seems, is to compromise: start with the principle of least change, but validate any assumptions early.
When a technical roadblock inevitably comes up, explore a cloud native solution first rather than fighting the cloud and shoe-horning in a glitter-covered version of your data centre solution.
It’s absolutely terrifying.
I recall a critical point in one of the migrations where we unavoidably had to instigate a full outage to cut over to the new cloud-based platform.
There was a moment that was literally the point of no return: failure wasn’t an option anymore.
…and I got to press the button that made it happen 😱.
At some point after all of the work, the platform you build will have to be used by real customers, potentially all of them, and it must function in exactly the same way as it did previously. That is a scary thought indeed…
…but then is it?
- You’ve performed work that could have gone wrong before.
- You’ve produced things that have been put in front of customers.
The only thing that is scaring you is perhaps the sense of scale.
Recognise what exactly it is that is worrying you about the situation and do something productive about it.
Not everyone will ‘get it’.
Despite the complexity of cloud migration projects, the number of people involved and the up-front expense, some people in your organisation just won’t get it.
“Right, so you’re just moving servers then?”
The benefits of cloud migrations are clear and numerous, but it can be really hard to articulate how the migration will ultimately add value for the end-user to those who are not directly involved in the project and don’t know the intricacies of the problems you’re solving.
The taboo here is that cloud migration projects can get cancelled simply because a leadership or executive team loses faith in the project, or fails to see and understand the benefits.
They would understandably prefer to devote their finite resources to driving revenue and building new features than to support a project which will cost a lot of money and time, yet deliver the same software to the same customers and potentially cause them issues along the way.
This is where I propose that we as supporters of cloud computing become true advocates of the cloud. Let’s get really clear about what the benefits are and explain them in simple and concise language.
There are all sorts of hidden benefits.
One of the great spin-offs of rebuilding a platform in the cloud is that you are almost forced to adopt infrastructure and automation technology if you weren’t using it already.
In a previous migration project I was part of, there was almost no scripting or automation of any of the data-centre based environments whatsoever to start with.
In addition to this, there was a single shared on-site testing server which everyone in the organisation used and knew the private IP of (we had a memorial service for it when it eventually kicked the bucket).
If you’re taking full advantage of cloud computing, your servers should become like cattle, not specially cared-for kittens; cattle that can be butchered mercilessly at any moment ….🔪🔪🔪
For this to be able to happen (as it does in the cloud) without customers becoming angry at the disruption, you need to ensure new instances of your services can spin up without any human intervention — which is why you automate the build of your infrastructure and provisioning of your servers.
The simple pattern my colleagues and I followed for using infrastructure and provisioning technologies was the foundation on which we built the new cloud-based platform. It ultimately meant we could spin up an entire functioning platform within 30 minutes.
This was an incredible power we never had before.
We ultimately used it to also solve the problem of test environments. As a result of the migration effort, we could spin up as many test environments, exactly mirroring production, for as many teams as we wanted and the only hard limit was the cost.
…and all sorts of hidden disadvantages.
You might find, not long after cut-over to the new and shiny platform, that you end up with an enormous headache.
You could suddenly find that a large number of business processes and shadow IT have grown up around the ability to connect to private IP addresses and specially cared-for, special-snowflake servers ❄️, none of which has been accounted for at all.
Now as a result of the migration, lots of people in the organisation are suddenly unable to do their jobs; your warm fuzzy feeling at a job well done quickly gets swallowed up by frustration and apathy as you have to accommodate lots of changes at a very late stage in the project 😩.
There’s no easy way to address this other than to do your own research and to shout loudly from the hill-tops that a migration is going to happen; that you and the team need to know about all of the weird and wonderful configurations in the system and processes in the organisation that could possibly be affected by the migration.
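Part of that research can be automated. As a rough starting point, the Python sketch below scans a piece of text (a config file, a script, an exported spreadsheet) for hard-coded private IPv4 addresses, which are a common sign of something that depends on a special-snowflake server. The function name `find_private_ips` and the sample text are made up for illustration, and the sketch only covers the RFC 1918 private ranges; it will not find hostnames, IPv6 addresses or processes that live only in people’s heads.

```python
import ipaddress
import re

# Matches anything shaped like a dotted-quad IPv4 address.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# The RFC 1918 private address ranges.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def find_private_ips(text):
    """Return the sorted, de-duplicated private IPv4 addresses found in text."""
    found = set()
    for candidate in IPV4_PATTERN.findall(text):
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            # Looked like an address but is not valid, e.g. 999.1.1.1.
            continue
        if any(address in network for network in PRIVATE_NETWORKS):
            found.add(str(address))
    return sorted(found)


# Hypothetical sample of the kind of text you might scan.
sample = """
db.host=10.20.30.40
report_share=//192.168.1.15/exports
status_page=https://203.0.113.7/status
broken_entry=999.1.1.1
backup_db=10.20.30.40
"""

hits = find_private_ips(sample)
```

Every hit is a question to ask somebody: what connects to this, who relies on it, and what happens to them when the address disappears?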
Remember the simple things and don’t burn out.
I recall the night before we were due to demo our progress at a project milestone to the CTO. We really wanted to impress him.
A colleague and I were still working on the platform at 3am (by our own choice). We were burning ourselves out, but we wanted to get everything done and get ahead. A cloud migration project is truly exciting and spans a whole lot of disciplines, so if you like solving problems, it’s incredibly easy to get sucked in.
We had encountered an issue where a certain proportion of the requests hitting the platform were failing and we didn’t know why.
Eventually my colleague tracked down the fault to something fairly simple around some configuration for one of the application load balancers. We added some manual changes and tested the result.
It worked and we could go to sleep happy (for 4 hours).
…however we made a fatal error 😔.
The next day we attempted to demo the platform but the problem had resurfaced. Overnight the platform had rebuilt itself using the scripts we had written, with the faulty configuration reapplied and our manual changes obliterated… 😩.
It was an embarrassing and stressful situation and could have been easily avoided.
If only we had rested, taken a step back, taken stock of the problem we faced and enacted a plan. Instead, the end was in sight, so we rushed and made simple mistakes.
Another lesson to point out, if it isn’t obvious: If you make manual changes, always automate them as soon as you understand them. Don’t wait.
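One way to catch this sort of thing is to compare what your scripts declare with what is actually running, and treat any difference as work that still needs automating. The Python sketch below is a minimal illustration of that idea, not the tooling from the project: `detect_drift` and the load balancer setting names are hypothetical, and in practice the live values would come from your cloud provider’s API rather than a hand-written dictionary.

```python
MISSING = "<missing>"


def detect_drift(declared, live):
    """Return {setting: (declared_value, live_value)} for every difference."""
    drift = {}
    for key in sorted(set(declared) | set(live)):
        declared_value = declared.get(key, MISSING)
        live_value = live.get(key, MISSING)
        if declared_value != live_value:
            drift[key] = (declared_value, live_value)
    return drift


# What the provisioning scripts say the load balancer should look like
# (hypothetical setting names, for illustration only).
declared_config = {
    "idle_timeout_seconds": 60,
    "health_check_path": "/health",
}

# What is actually running after a late-night manual fix.
live_config = {
    "idle_timeout_seconds": 120,
    "health_check_path": "/health",
    "stickiness_enabled": True,
}

drift = detect_drift(declared_config, live_config)

for setting, (declared_value, live_value) in drift.items():
    print(f"{setting}: scripts say {declared_value!r}, live value is {live_value!r}")
```

Anything this reports will be wiped out by the next automated rebuild unless it is folded back into the scripts, which is exactly what caught us out.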
Cloud migration projects should never be “done”.
The project should not be considered “done” the moment the platform is switched on and users are using it.
There are going to be problems, and there will probably be outages.
A project of this type has probably never been done before in your organisation, so there are inevitably circumstances and issues that simply couldn’t have been foreseen.
All parties involved in a project should embrace this inevitability and not be surprised when things go wrong.
A DevOps Mindset
Embrace the DevOps mindset and continue to iterate on your newly deployed platform.
While there are many advantages to adopting this mindset and making it intrinsic to what you do (e.g. reduced costs, reduced customer impact during issues with the service), the benefit I feel most strongly about is the well-being of the team and what a properly motivated and healthy team can bring.
When production issues arise — the team are ultimately going to have to solve the problem — so which way would you have it?
Would you rather the team solve this under pressure in an environment where there is a blame… *cough*… I mean “accountability” culture?…
…or in an environment where they feel psychologically safe, able to make mistakes (and learn from them) and they feel empowered to proactively solve future problems?
Embrace the DevOps mindset and reap the rewards.
The team is the most important thing.
In the migration projects I’ve been involved in, I was fortunate enough to be in teams made up of really talented and dedicated folk, without whom the projects would have been a complete and utter failure.
The team needs a good range of talent and skills because a cloud migration project is inherently so cross-cutting. I’ve seen cloud migration projects fail initially because it was assumed that one individual with a whole host of skills in one particular area could tackle everything else by themselves.
An overlooked but important skill in the migration team is leadership. A leader can be anyone; they don’t need to be a manager. The leader’s job is to move obstacles out of the team’s way, keep up morale and confidence, generally serve the team and fill the roles that the team are otherwise too preoccupied to fill.
Whatever the team needs, the leader should be the team’s servant and do it.
In previous migrations I’ve seen the leader take on the role of handling communications on the migration night, or even just make bacon sandwiches 🐷.
Cloud migration projects need a team that is diverse, highly skilled and dedicated — and most importantly has a good sense of humour to get through the tough times (I miss you Team Glitter ✨).
Thank you all for reading. I really hope it’s been useful. Please give me feedback about content you’d like me to write more (or less) about; I’d appreciate it 🙂.
P.S. Kudos to my friend Sam Tulip (Technical Architect), for the many conversations we’ve had around this and giving me a bit of inspiration for this article — also to Dom Gallagher — an amazing DevOps engineer who kept me sane during our migrations and taught me all about the finer points of network-y things like subnets.
One more thing.
If you’re new to this whole cloud malarkey and would like to know more — or you’re about to start out on your cloud migration journey in your organisation and would like to get some experienced help — you should reach out to my very wise colleagues in BlackCat Technology Solutions who have many decades of combined experience in cloud consulting.