Introduction To AWS Lake Formation

Written by canyurt | Published 2020/06/11

TLDR: Amazon Web Services (AWS) recently announced the release of the new service “AWS Lake Formation” at AWS re:Invent in Las Vegas. This article provides a brief explanation of what the service does and why it can be important for your organization. The introduction of this service is a valuable step from AWS to better set up and manage the vast amount of data you find in a big data environment. The data is stored in both its original form and in prepared forms and can be used in different types of analytics solutions.

What does it mean for your organization?

Amazon Web Services (AWS) recently announced, among many other important updates, the release of the new service “AWS Lake Formation” at AWS re:Invent in Las Vegas. This article briefly explains what the service does and why it can be important for your organization.

What is Amazon Web Services Lake Formation?

AWS Lake Formation is a service that lets you set up a secure data lake in a matter of days: a centralized, curated and secured repository that stores all your data. The data is stored both in its original form and in prepared forms. The prepared forms can be used in different types of analytics solutions to provide insights and to guide your organization towards better business decisions. Simply put, you set up your data lake in days and let AWS do the heavy lifting for you.

Why is it important?

The introduction of this service is a valuable step from AWS to better set up and manage the vast amount of data you find in a big data environment. Setting up a data lake is a challenging but necessary task. Each day we see more of our clients moving away from traditional analytics environments built on relatively costly data warehouses and towards data lakes that offer new capabilities and are more cost efficient.
The biggest differentiator of the new data lake approaches is that the architecture now treats the data warehouse as only one of several critical components. The others include big data storage and processing (HDFS, S3, Spark, NoSQL databases, Elasticsearch), metadata management and data lineage, machine learning and AI, ELT tooling, and API management.
One could argue that data warehouses are becoming data marts in these new data lake architectures, which shifts part of the focus to the other big data components in the architecture.
Setting up a data lake requires preparation. Your organization should decide on an efficient operating model for big data and on a “Future State Architecture”, so that the business is equipped with the right tools to make the biggest impact.
At Deloitte, we know this is not an easy task: there are many interconnected pillars to take into consideration. AWS Lake Formation provides a packaged data lake solution that covers many of these important pillars, if not all.
AWS Lake Formation is part of the main AWS platform, the AWS Cloud environment. This environment handles the heavy lifting of ingesting, cataloging, cleaning, transforming, securing and controlling access to your data. It even contains built-in machine learning capabilities for data quality, for example for filtering and managing duplicate records.
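To make the "control access" part more concrete, here is a minimal sketch of how a table-level permission might be granted with the AWS SDK for Python (boto3). It is an illustration only, not taken from the announcement: the role ARN, database name and table name are hypothetical, and the helper functions `build_grant_request` and `grant_table_access` are names introduced here. The request is built separately from the call, so it can be inspected without AWS credentials.

```python
from typing import Any, Dict, List


def build_grant_request(principal_arn: str, database: str, table: str,
                        permissions: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for the Lake Formation GrantPermissions call."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": permissions,
    }


def grant_table_access(request: Dict[str, Any]) -> None:
    """Send the request to AWS; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without the AWS SDK

    client = boto3.client("lakeformation")
    client.grant_permissions(**request)


# Hypothetical principal, database and table names, for illustration only.
request = build_grant_request(
    principal_arn="arn:aws:iam::123456789012:role/analyst",
    database="sales_lake",
    table="orders_curated",
    permissions=["SELECT"],
)
```

Calling `grant_table_access(request)` in an account where Lake Formation is set up would give the analyst role read access to that one table, which is the kind of centralized access control the service is meant to take off your hands.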

Any considerations?

The benefits of such a platform are now clear. However, as expected and as with other cloud service providers, AWS Lake Formation is AWS-focused and can be incompatible with other cloud or on-premises solutions. The new types of architecture we build frequently revolve around agility.
Most of our clients want to build data pipelines faster and faster in order to drastically reduce time-to-market. This vital requirement usually calls for a flexible mix of different technologies within your organization.
When planning your data pipeline, you will end up with a list of services that you require. Most of the time, these services will suit different technologies: one may fit an on-premises approach, whereas others fit particular cloud providers. As a result, data pipelines will usually need multiple platforms, including different cloud service providers and on-premises capabilities.
That is why we advise considering how to manage these different platforms alongside AWS Lake Formation. This does not change the fact that AWS, with AWS Lake Formation, is strongly positioned to help you set up complex big data structures with relative ease.

Verdict?

In conclusion, AWS Lake Formation supports the important goal of spending less time managing data, so that organizations can spend more time obtaining insights from data and using analytics to support business decisions.
Data management principles are vital and non-negotiable for a successful data-driven organization, but the real differentiation lies within the analytics scope, where the biggest impact is made. With AWS Lake Formation, AWS has taken an important step towards that goal.

Written by canyurt | data & analytics
Published by HackerNoon on 2020/06/11