Top 7 Announcements from Data and AI Summit 2021

Written by frankmunz | Published 2021/06/04
Tech Story Tags: ai | data | it | artificial-intelligence | development | announcements | summit-2021 | tech

TLDR: The Data and AI Summit (DAIS, formerly Apache Spark Summit) is underway, and these are the highlights for the busy IT professional. The Linux Foundation-owned, open-source project Delta.io, which brings reliability to your data lake, has matured to version 1.0. With Delta Sharing you can share massive, live data from your Lakehouse between clouds or on-premises. Databricks Machine Learning brings together managed MLflow and new components such as AutoML and the Feature Store, and it supports the full ML lifecycle.

Highlights from the Data and AI Summit (formerly Apache Spark Summit) for the Busy IT Professional
Right now, I am actually waiting for the keynote of day 2 to kick off. So let me summarize the announcements from day 1 of the Data and AI Summit (DAIS) for you.

Delta.io

Delta Lake has grown up :-). The Linux Foundation-owned, open-source project Delta.io, which brings reliability to your data lake, has now matured to version 1.0. Happy Birthday, Delta! Key features of Delta 1.0:
  • Delta Standalone Reader, a JVM implementation that understands the transaction protocol but does not need Spark.
  • Rust implementation for Delta 1.0, plus support for Ruby and Go.
  • Delta 1.0 supports Spark 3.1 and also improves predicate pushdown.
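
To give a feel for what "understands the transaction protocol" means, here is a minimal Python sketch that replays a table's JSON commit files and works out which data files are currently live. It is an illustration under simplifying assumptions, not the Delta Standalone Reader itself: it ignores checkpoints, metadata, and protocol actions, and the `write_commit` helper and the file names are made up for the demo.

```python
import json
import tempfile
from pathlib import Path


def active_files(table_path):
    """Replay the JSON commits in _delta_log and return the live data files."""
    log_dir = Path(table_path) / "_delta_log"
    live = set()
    # Commit file names are zero-padded, so lexical order is commit order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one JSON action per line
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


def write_commit(table_path, version, actions):
    """Hypothetical helper: write one simplified commit file for the demo."""
    log_dir = Path(table_path) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    lines = [json.dumps(a) for a in actions]
    (log_dir / f"{version:020d}.json").write_text("\n".join(lines))


with tempfile.TemporaryDirectory() as table:
    # Commit 0 adds two files; commit 1 replaces one of them.
    write_commit(table, 0, [{"add": {"path": "part-0.parquet"}},
                            {"add": {"path": "part-1.parquet"}}])
    write_commit(table, 1, [{"remove": {"path": "part-0.parquet"}},
                            {"add": {"path": "part-2.parquet"}}])
    demo_files = active_files(table)
    print(demo_files)  # ['part-1.parquet', 'part-2.parquet']
```

The point is that the log alone tells a reader which Parquet files make up the current table version, which is why a reader does not need a Spark cluster.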

Delta Sharing

A brand new open source project for data sharing was announced at DAIS. With Delta Sharing you can share massive, live data from your Lakehouse between clouds or on-premises. It is secure, fast, cheap, and reliable because it builds on underlying cloud storage systems such as S3, ADLS, and GCS. It is also open source and an open standard. Recipients accessing shared data can connect to it directly through pandas, Tableau, or dozens of other systems that implement the open protocol.
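
On the recipient side, access from pandas could look roughly like the sketch below. It assumes the open source `delta-sharing` Python connector is installed and that the data provider has sent you a profile file. The profile name `config.share` and the share, schema, and table names are placeholders, not real endpoints.

```python
def table_url(profile_path, share, schema, table):
    """Build the '<profile>#<share>.<schema>.<table>' address the connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path, share, schema, table):
    """Load a shared table into a pandas DataFrame (needs `pip install delta-sharing`)."""
    import delta_sharing  # imported lazily so the URL helper works without the package
    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))


# Placeholder names, for illustration only.
url = table_url("config.share", "sales_share", "default", "orders")
print(url)  # config.share#sales_share.default.orders
```

Calling `load_shared_table(...)` with a real profile file would fetch the data over the open protocol. The profile file holds the endpoint and the bearer token from the provider.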

Unity Catalog

Databricks rolled out a data catalog, Unity Catalog. Unity Catalog solves a major issue. Imagine you have a CSV file stored in your S3 data lake and want to grant access to certain rows only.
Unity Catalog enforces permissions at the row, column, or view level instead of the file level. It governs tables and ML models. Simply use ANSI SQL standard GRANT statements, or discover data assets from the UI. It works for the Lakehouse on all clouds.
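
As a rough idea of what such a GRANT could look like, here is a small Python helper that assembles the statement as a string. The table name `main.sales.orders` and the principal `analysts` are invented for the example, and the exact syntax Unity Catalog accepts should be checked against the Databricks documentation.

```python
def grant_select(table, principal):
    """Return a GRANT statement giving `principal` read access to `table`."""
    # Only accept plain dotted identifiers so nothing odd ends up in the SQL.
    for part in table.split("."):
        if not part.isidentifier():
            raise ValueError(f"unexpected table name part: {part!r}")
    # Backticks quote principals that contain characters such as '@' or spaces.
    return f"GRANT SELECT ON TABLE {table} TO `{principal}`"


# Hypothetical table and group names.
statement = grant_select("main.sales.orders", "analysts")
print(statement)  # GRANT SELECT ON TABLE main.sales.orders TO `analysts`
```

In a notebook, a string like this would typically be handed to `spark.sql(...)` or typed directly into a SQL cell.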

Delta Live Tables

My fav #DataAISummit announcement: Delta Live Tables. Data flow made simple: Specify the outcomes that a pipeline needs to achieve using SQL or Python. Treat your transformations and data quality expectations as code.
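
To make "data quality expectations as code" a bit more concrete, here is a plain-Python stand-in. It is not the Delta Live Tables API (which only runs inside Databricks). The `expect_or_drop` decorator and the sample rows are hypothetical and only mimic the idea of attaching a named rule to a transformation and dropping rows that violate it.

```python
def expect_or_drop(name, predicate):
    """Hypothetical decorator: drop rows that fail `predicate` and count the drops."""
    def wrap(step):
        def run(*args, **kwargs):
            rows = step(*args, **kwargs)
            kept = [r for r in rows if predicate(r)]
            run.dropped[name] = len(rows) - len(kept)  # simple quality metric
            return kept
        run.dropped = {}
        return run
    return wrap


# Made-up input data for the demo.
RAW_ORDERS = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": -5.0},
    {"id": 3, "amount": 12.5},
]


@expect_or_drop("positive_amount", lambda r: r["amount"] > 0)
def clean_orders():
    """Declares the outcome: orders with a valid amount."""
    return RAW_ORDERS


cleaned = clean_orders()
print([r["id"] for r in cleaned])  # [1, 3]
print(clean_orders.dropped)        # {'positive_amount': 1}
```

The rule lives next to the transformation, so it is versioned and reviewed like any other code.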
And now for day 2 of DAIS!

Databricks Machine Learning

Databricks Machine Learning brings together managed MLflow and new components such as AutoML and the Feature Store, and it supports the full ML lifecycle.

Feature Store

MLflow integration enables the Feature Store to package up feature lookup logic hermetically with the model artifact. When an MLflow model that was trained on data from the Feature Store is deployed, the model itself will look up features from the appropriate online store.
The Databricks Feature Store automatically tracks the data sources used for feature computation, as well as the exact version of the code that was used.
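
The idea of shipping the lookup logic together with the model can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the Databricks Feature Store client. `PackagedModel`, the dictionary standing in for an online store, and the feature names are all assumptions made up for the example.

```python
class PackagedModel:
    """Hypothetical wrapper: a model that carries its own feature lookup spec."""

    def __init__(self, predict_fn, feature_names, lookup_key):
        self.predict_fn = predict_fn
        self.feature_names = feature_names  # which features to fetch, in order
        self.lookup_key = lookup_key        # request field used as the primary key

    def predict(self, request, online_store):
        # The caller sends only the key; the model fetches its own features.
        key = request[self.lookup_key]
        features = online_store[key]
        vector = [features[name] for name in self.feature_names]
        return self.predict_fn(vector)


# A dict stands in for the online store; the values are invented.
ONLINE_STORE = {"u1": {"avg_basket": 42, "visits_30d": 3}}

model = PackagedModel(
    predict_fn=lambda v: v[0] + 2 * v[1],  # toy scoring function
    feature_names=["avg_basket", "visits_30d"],
    lookup_key="user_id",
)

score = model.predict({"user_id": "u1"}, ONLINE_STORE)
print(score)  # 48
```

The client calling the model never has to know which features exist or where they live, which is what removes the usual train/serve mismatch.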

AutoML

AutoML allows you to quickly build and deploy machine learning models by automating the heavy lifting of preprocessing, feature engineering, and model training/tuning. AutoML detects the best preprocessing, ML model, and hyperparameters for you and creates a notebook with all steps required. It automatically tracks trial run metrics and parameters with MLflow and easily enables teams to register and version control their models in the Databricks Model Registry for deployment. You define the max runtime that you want to spend to solve the task.
There are more video resources from DAIS 2021 that I recommend: the fireside chat with Bill Inmon (DWH inventor), Delta 1.0 announcement, Delta Sharing, SQL and Photon updates, Unity Catalog, and Delta Live Tables.
Slice & DAIS 2021 — EMEA Live Event
Join us for the first Slice & DAIS session that talks about all the new announcements in a beginner-friendly way.

Written by frankmunz | Databricks DevRel EMEA. AWS & GCP, Big Data & ML certified. My Uber rating was 4.9 before the world shut down.
Published by HackerNoon on 2021/06/04