Only 5–6 years ago, the idea that the cloud would be in widespread use was viewed by most IT businesses as dubious at best. Why give away your own (or, worse, your clients’) sensitive data to an independent company, which could prove careless or malicious? Why upload it somewhere on the Web, when you have your own private and reliable data center with carefully developed systems and databases and physically available servers, over which you have full control?

Today, we can safely say that a shift in mindset has happened: with the rise of massive data objects and distributed data management, the cloud has become a major trend and general interest in cloud technologies continues to grow. According to Forbes, spending on cloud computing has grown at 4.5 times the rate of IT spending since 2009 and is expected to grow at more than 6 times the rate of IT spending through 2020.

However, the issue of cloud migration and cloud operations is tricky. There are numerous questions that arise from the idea of moving to the cloud. We have addressed some of these questions here and hope that this paper will help you better understand the issues surrounding cloud computing.

Every business situation is unique, and we recommend working with an IT consultancy to identify and implement the cloud solutions that best fit your needs. Should you need deeper insights tailored specifically to your business needs, we are always here to help.

1. Introduction

1.1. What are the advantages of moving to the cloud?

- Operational costs. No need to pay for equipment, maintenance, and hardware upgrades. No need to pay electricity bills. No need to rent floor space for a data center. No need to hire people to run it. Access to a multitude of options and the opportunity to experiment at a low price, without any capital expenditures.
- Elasticity and pay-per-use. The elastic environment model of the cloud is usage-based, which makes it possible to easily scale and increase computing power on demand as well as reduce hardware resources when they are not needed.
- Risk mitigation and maximum availability. Quick disaster recovery and maximum resilience and uptime.
- Security. This is a point of doubt for many; however, thanks to economies of scale, large cloud providers can invest far more resources to secure their applications and client data than most corporations.
- Big Data and advanced analytics. With the cloud, it is possible to store, access and analyze almost unlimited volumes of data. For companies that can employ the power of big data to enhance their business performance, the cloud is the only possible data model. (However, such companies may prefer a private cloud model over a public/shared one.)

1.2. Is it for everyone, or are there any companies that are better with an in-house system?

We should distinguish between three options: the traditional in-house model, the public cloud and a private cloud. The traditional model is characterized by the manual or semi-automated provisioning of server resources, which means that the data is much less safe in such environments. Cloud computing means the fully automated provisioning of server resources (the pet vs. cattle metaphor).

Usually, when discussing the advantages of cloud migration, we are speaking of the public cloud, which is owned and maintained by third-party cloud service providers. Private clouds are those that are built exclusively for an individual organization. Some organizations that use huge data warehouses and BI workloads may prefer a private cloud solution due to the benefits in workload performance, data integration, privacy and data security.
Take, for example, the cases of Dropbox and Walmart, which recently migrated from AWS, built their own data centers and saved millions by doing so [see Bloomberg, 01.03.2018] [see Reuters, 14.02.2018]. But you need to be that big. For most small and mid-sized companies without a huge data warehouse like Walmart’s, a public cloud is probably the most efficient option.

There is also another option that could be beneficial for customers who want to use the cloud but are precluded from using the public model by regulations, data sensitivity, or location of data. Recently, Microsoft developed its own hybrid solution, Azure Stack, which gives customers a way to use a familiar cloud platform without placing their sensitive data into a multi-tenant environment. It should not be viewed as a standalone virtualization platform: it includes basic infrastructure-as-a-service (IaaS) functions that make up a cloud, such as virtual machines, storage and virtual networking, as well as some platform-as-a-service (PaaS) features including container service, serverless computing software, and MySQL and SQL Server support.

1.3. What should we do in advance? Are there any typical mistakes that we should watch for?

To name just a few:

- The worst possible mistake is not paying enough attention to the particular business situation. All technical decisions must be aligned with the business strategy and communicated as broadly as possible within the company. Otherwise, there is a risk that even a very thoroughly designed solution will turn out to be an ‘unwanted child’ and be discarded, together with all the effort and resources spent. Such matters as cost, governance, vendor lock-in and footprint should be carefully considered and discussed with the stakeholders in the company.
- Premature technology selection. Some technologies serve their purpose but may become too expensive in the future. That is why all technologies should be selected with the long-run perspective in mind. We recommend conducting short prototype projects (typically, 2–4 weeks). Prototypes should not be used to learn the basics of the relevant new technologies, but to put them under stress conditions and reveal the challenges that the team will face in the future, such as:
  - performance and stress load testing
  - scalability, configuration and deployment
  - customization and support
  - pricing
  For example, a good prototype for a database should include the deployment and benchmarking of queries with the volume of data that the database owners expect in a year.
- Poor data management. Lack of clear data governance can impact the quality of data loaded onto the cloud. Huge volumes of data may lead the company to overlook whether this data is still relevant. Cloud migration has a truly transformational potential, offering an opportunity to apply big data analytics solutions to gain more business insights. However, for this to be possible, it is necessary to ensure data quality.
- Lack of user involvement. It is important to involve real users as early as possible in order to discover any flaws in your planning quickly, help them adapt to the transition, dissipate uncertainty, and win more internal advocates.

2. Technical issues

2.1. How do we plan the migration? How should we organize the entire process to make it efficient and painless?

Here is our typical recommended migration framework:

- Align and manage areas of focus: gather business requirements, confirm assumptions, validate hypotheses, and determine stakeholders, governance, and technologies. At this stage, a consultant company acts like a doctor, asking questions and compiling a list of symptoms such as current problems with data storage, dependencies, plans (new functionality, exponential growth of the client base, etc.), legal and compliance issues and more.
- Planning: Project roadmap. We use a proprietary Solution Design methodology for designing and planning complex transformation programs. This iterative approach considers expectations, risks and constraint management and ensures successful completion of the transformation. Possible stages of this process include: inventory -> desired state shaping -> concerns and assumptions -> short PoCs -> minimal viable cloud -> pilot migration -> parallel tracks of migration & decommissioning. One important recommendation is to always keep in mind the wider development context within the organization. Just like with any other complex transformation program, the cloud migration roadmap should be aligned with the roadmaps of other company-wide projects, both to avoid conflicts and duplication of effort and to make sure that any critical milestones in those roadmaps are taken into account as we design and plan our own.
- Planning: Portfolio discovery. To streamline the planning efforts and reduce the risk of missing a dependency at a later stage, it is crucial to learn which assets the company owns and determine which will be included in each migration. There are dedicated discovery tools (such as Risc Networks’ CloudScape, ScienceLogic’s CloudMapper, or AWS Application Discovery Service) that help to update the inventory of applications.
- Foundation: Migration of first platform consumer. Design and set up the core infrastructure of the cloud platform. This is the stage when it is crucial to keep focus on the business strategy and determine which operations and processes are essential and how they are mapped to functionalities provided by the cloud. It is also important to outline a clear cloud governance model and define who will have access to which controls.
- PoC: Active implementation phase. It is important to start small and simple, migrate the system in parts and see how the process proceeds.
- Migration of data, apps, infrastructure evolution, and blueprint review. Multiple Design/Migration/Validation tracks.

2.2. How do we select a proper migration strategy? Can we just ‘lift and shift’ our apps, or should we transform the entire architecture?

There are several migration strategies:

Rehost. Lift-and-shift style migration by deploying the app in the cloud or migrating VMs.
- Pros: Easiest and quickest. Isolated scope of migration.
- Cons: Suboptimal cloud utilization. Does not work for systems with specific software and hardware dependencies.

Replatform. Change the platform to be able to run in the cloud.
- Pros: Might be the only way to migrate certain systems into the cloud. Opportunity to revisit the entire ecosystem.
- Cons: Time/budget constraints. Technology constraints.

Refactor. Change the implementation and architecture of your solution to remove dependencies, optimize performance and make it more robust, scalable and fault tolerant. This covers refactoring of applications and merging applications.
- Pros: Leverages cloud and SaaS capabilities. Performance optimization.
- Cons: Time/budget constraints.

At first, we can always choose the quickest and easiest option and then adapt the solution to the cloud as necessary. In our experience, approximately one third of clients are fine with just the rehosting option, and another third with the re-platforming option. However, bear in mind that the cloud has huge transformational potential and the migration is a great opportunity to optimize the old architecture, get rid of unnecessary dependencies and replace the obsolete parts of the system. There are symptoms suggesting that some re-architecture is required: “We are happy with how everything works, but we have this small problem…” (the cost of the cloud solution, MS SQL servers, etc.).

2.3. Our system is huge. How do we transfer our data to avoid high network costs, long transfer times and security concerns?

There are several solutions. One example is Amazon Snowball. It is a data storage and transfer appliance for AWS with a capacity of up to 50 TB that you can request from AWS (one or several in parallel).
It is almost indestructible, self-contained and tamper-resistant. To quote the Amazon website, “It is rugged enough to withstand a 6G jolt and light enough for one person to carry. It is weather-resistant and serves as its own shipping container.” Amazon Snowball was the first solution of its kind on the market, and since 2017 the Azure Data Box and Google Transfer Appliance have also been introduced.

2.4. Our service processes multiple transactions per second. How do we move it to the cloud and ensure we don’t lose any transactions?

With large, complex and specialized transactional systems (banking, for example), we never just ‘switch’ from the old system to the new one. We always strive to establish a double-write system, with one copy in the cloud and another backup copy remaining on premises, to allow for a rollback if something goes wrong. The migration is always phased to reduce the risk of malfunction in the new system. During the pilot migration we could transfer 5% of all transactions, or a particular type of transaction, or only transactions belonging to a particular client. Only after thorough testing, when correct processing of all types of transactions is confirmed, would we proceed with the following phases of migration.

2.5. Should we transfer the entire system or are there parts better left out of the cloud?

A common approach is to keep a hybrid infrastructure and use the cloud to enhance one’s technical capabilities, rather than to move to it entirely, accept it as the only new model and never look back. One type of data often left out of the cloud is third-party confidential data or software: many companies still prefer to store such data on premises for security and compliance reasons. Another type is specialized computing-intensive operations, e.g. scientific GPU computing, rendering and so on. There is usually a highly developed on-premises infrastructure in place for such operations, and the cloud may lack sufficient computing power or simply be too expensive.
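The phased, double-write approach described in 2.4 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Transaction` fields, the `in_pilot` and `DoubleWriter` names, and the in-memory lists that stand in for the on-premises and cloud stores. It only shows one possible way to pick a stable pilot cohort (a percentage of all transactions, a transaction type, or specific clients) while the on-premises copy is always written.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transaction:
    # Hypothetical fields, chosen only for this illustration.
    tx_id: str
    client_id: str
    kind: str


def in_pilot(tx, percent, pilot_clients=frozenset(), pilot_kinds=frozenset()):
    """Return True if the transaction should also be written to the cloud."""
    if tx.client_id in pilot_clients or tx.kind in pilot_kinds:
        return True
    # A stable hash keeps the same transaction in the same bucket on every run.
    digest = hashlib.sha256(tx.tx_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


@dataclass
class DoubleWriter:
    percent: int
    pilot_clients: frozenset = frozenset()
    pilot_kinds: frozenset = frozenset()
    on_prem: list = field(default_factory=list)  # stand-in for the legacy store
    cloud: list = field(default_factory=list)    # stand-in for the cloud store

    def process(self, tx):
        # The on-premises copy is always written, so a rollback stays possible.
        self.on_prem.append(tx)
        if in_pilot(tx, self.percent, self.pilot_clients, self.pilot_kinds):
            self.cloud.append(tx)


if __name__ == "__main__":
    writer = DoubleWriter(percent=5, pilot_clients=frozenset({"client-42"}))
    for i in range(10_000):
        writer.process(Transaction(f"tx-{i}", f"client-{i % 100}", "payment"))
    share = len(writer.cloud) / len(writer.on_prem)
    print(f"{share:.1%} of transactions were mirrored to the cloud")
```

Raising `percent` step by step (and comparing the two stores after each step) corresponds to moving through the migration phases; in a real system the two lists would be replaced by the actual on-premises and cloud write paths, with error handling and reconciliation that this sketch leaves out.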
2.6. Which particular technologies could you recommend?

Based on our experience, the suggested approaches are, first and foremost, continuous integration and delivery (CI/CD), and second, infrastructure as code (IaC). Those two practices involve massive automation of infrastructure, DevOps and release management. In particular, we recommend the following implementations:

- IaC: Terraform by HashiCorp, AWS CloudFormation, Ansible, Puppet, Chef
- CI/CD: Jenkins, Team Foundation Server, TeamCity, Harness
- Containerization: Docker, Kubernetes (available in all three main cloud services as Kubernetes as a service)

2.7. Should we use any cloud-specific SaaS- or PaaS-components to replace parts of the legacy systems?

It depends on the particular business case.

- Yes: because such components make it possible to cut numerous costs on licenses, operation, security updates and patches, etc.
- No: in case you use the system in a very specific way that requires either numerous customizations and enhancements, or some very specific performance / SLAs, so standard SaaS and PaaS offerings will not work for you.

One example of the latter could be the choice between hosting a relational database on an Amazon EC2 instance or migrating its contents to an Amazon RDS instance. Amazon RDS is easier to set up, manage, and maintain than a database in Amazon EC2, and lets you focus on other tasks rather than day-to-day database administration. It is a simple out-of-the-box solution if you just want to run a regular database. Alternatively, running a database in Amazon EC2 gives you more control, flexibility, and choice, and makes it possible to set up specific performance or a customized configuration.

2.8. Should we ‘buy’ instead of ‘build’ (i.e., replace legacy components or entire solutions with products on the market)?

The answer is the same as for the question about SaaS- or PaaS-components. Buying is reasonable and cost-effective if your needs are not too specific.
Building is recommended if you need a fully customized solution.

We did solution design for a client, a multinational travel corporation, that wanted to combine several proprietary apps into one and needed to select a platform for this new system. They had very specific requirements with regard to UX, and they also needed to integrate with their other systems. We had to decide between building it ourselves, which would require millions in investment and many months of development, or buying and adapting ready-made components from MS Dynamics 365. At first, the second option seemed cheaper. However, eventually we discovered that:

- MS Dynamics products lacked most of the required functionality, so we would have to customize them to make them work with our other systems, and this would still cost much more than building from scratch;
- any new features that we added would become the intellectual property of Microsoft.

These are the two main dealbreakers that lead some companies to develop their own solutions instead of extending third-party products.

3. Financials

3.1. How do we measure financial gains? (Also, ‘cheaper or not?’)

There are several factors you should consider in order to determine if the cloud is less expensive for you.

- Costs: a monthly (or yearly) invoice from a cloud provider (which could prove quite high depending on usage). The cloud is not cheap.
- Savings: our current IT expenses. Hardware cost and maintenance cost. Software licenses. Renting data center space. Labor costs (not just server admins, but also information security, audit, physical security, building maintenance, etc.). Most of it will no longer be necessary after permanently migrating systems to the cloud.
- Costs: our new IT expenses. Some expenses may be gone, but there will be new expenses. The cloud involves a new discipline and different DevOps practices. You will need to hire new people and train existing employees for a new required skill set. You will need to spend their time to optimize CI/CD in the cloud and to manage it.
- Savings: elasticity. If you need a thousand VMs for the Thanksgiving sale, you can easily get them with the cloud.
- Costs: if you migrate to the cloud lift-and-shift style, without any optimization, then the operational costs in the mid-term will be higher.
- Savings: however, in the long term the cloud will allow you to reduce costs even without any optimization, due to the lack of hardware replacement and maintenance. (Hardware breaks. If it is your server, you buy another one or deal with a vendor, you replace disks and clean the dust from server blades. In the case of the cloud, you receive an email 30 days in advance and in about 10 minutes you replace the VM in question with a newer version.) (There are also many options for optimization!)

3.2. What are possible cost optimization options?

To name just a few:

- Wholesale discounts: the more you consume, the less you pay, relatively. Each successive terabyte will cost you less.
- Each storage service has options that lower your costs.
- SaaS- and PaaS-solutions. You can use SaaS and pay less for the same service, and you do not need to support and manage server instances.
- Reserved instances provide a significant discount (up to 75%) compared to on-demand pricing if you commit to use them for 1–3 years ahead, especially if you pay upfront (all or part).
- Spot instances are a bid for a low-price version of on-demand instances. The catch is that they can be shut down by the provider when the spot instance price is higher than the bid price. The spot price fluctuates based on supply of and demand for capacity. However, using spot instances can save you up to 90% compared to on-demand, and it is also possible to invest in the development of a fault-tolerant system that is able to survive the failure of a few instances (e.g. batch jobs or large analytics jobs for big data processing). This will improve your architecture and it will be a great option for reducing operational costs.
- With the serverless approach, you pay only for the actual amount of computational services provided (say, $0.20 per million requests); you do not need to pay for virtual machines anymore. Serverless is the next generation of cloud infrastructure, and there are more and more serverless applications every day.

3.3. What are the hidden costs of the cloud and how can we avoid them?

One big source of unexpected costs is the lack of data governance with too much technical freedom. Sometimes people simply forget to turn off virtual machines that they do not really need. If nobody monitors the resource allocation, costs can start to grow uncontrollably. It is highly recommended to plan ahead for 12–18 months at least.

In one case, a client used cloud virtual machines to run builds on CI. They started with 5 servers and the costs were very moderate, but after several months this number grew to 80 and the cloud costs went up to 20,000 USD/month. The system was designed in such a way that it was not possible to simply downscale the number of machines. We solved this problem by redesigning the system and moving CI processing to spot instances.

Another example: Redshift. Redshift is an amazing database, very fast, very scalable, but also very expensive. You need to have a very good reason to use it. You can scale it up any time by adding more servers. Your cluster can grow to thousands of dollars per month, and if you are not careful, this cluster becomes an essential part of your system that your business will depend on. Without thoughtful long-term planning it is highly risky to invest in such technology, as it can make you bleed money.

Software licenses are another topic. Enterprise systems, such as database servers, are sometimes licensed per CPU core. The more CPUs your server has, the more your license costs. If your cloud solution is very scalable and you have a lot of CPU cores in the cloud, the license can become prohibitively expensive.

4. Security and Compliance

4.1. This old question again: why should we even trust a third party, a public cloud provider, with any sensitive data?

Mature cloud providers invest heavily in security and compliance. The effort required to match their level may be too expensive to implement within most companies’ on-premises systems. Check the main provider pages on security: Microsoft Azure, AWS, and Google.

4.2. How do we select a cloud provider?

A potential cloud service provider must be checked carefully to make sure it is qualified, reliable, financially stable enough to operate over the long term, and has a good reputation on the market. Among industry best practices and standards, there are certifications like ISO 27001 or the Cyber Essentials Scheme. Also check Microsoft’s short guide on provider selection.

4.3. How to ensure security of the cloud and DevOps?

From the security point of view, the cloud is not much different from any other service: you must implement best practices for the security management of your data, authentication and access controls. However, cloud solutions give you tools that better support a cloud environment.

- User permissions. You must manage permissions for your cloud’s users, operators and developers. This is called identity and access management (IAM). The most important best practice is that you should limit user access to resources and data on a need-to-have basis, striving to have a minimal set of permissions.
- Networking infrastructure security includes managing network topology, firewall rules, traffic security policies, and points of contact with the Internet, for security both ‘at rest’ and ‘in transit’.
- Data security includes access control to the data storage, including files and databases, and encryption of data to mitigate damage even in case of a data leak.
- Secure software development. Software developed for the cloud should use the usual set of industry-accepted security best practices. Practice security-first programming, audit code changes, and conduct code reviews, security audits, etc.
- Secret management. Do not leak your security credentials to the code repository, but instead keep credentials (such as the database server password) in secret management vaults integrated in your cloud solution. This allows you to easily monitor access to passwords, rotate credentials and revoke access that is no longer necessary.
- Human factor. At the end of the day, the biggest threat is neither security vulnerabilities nor breaches: it’s people. Your users may not be careful with their passwords or keys. Educate them, remind them regularly of the importance of data security, and do not forget to set a password rotation policy.

By Yuri Gubin, solutions consultant & cloud expert at DataArt
Originally published at blog.dataart.com on October 01, 2018.