Containerization of Spark Python Using Kubernetes

Written by Raghavendra_Singh | Published 2020/08/03
Tech Story Tags: pyspark | hadoop | kubernetes | apache-spark | big-data | yarn | containers | devops

TLDR Spark is a general-purpose distributed data processing engine designed for fast computation. Its main feature is in-memory cluster computing, which increases the processing speed of an application. Kubernetes is a container orchestration engine that ensures high availability of resources. When running on Kubernetes, Spark uses the kube-apiserver as its cluster manager and handles execution through it. Spark provides no default high-availability mode, so additional components such as ZooKeeper must be installed and configured. A Spark application can be submitted to the cluster directly with spark-submit.
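As a sketch of the last point, a PySpark job can be submitted to a Kubernetes cluster with spark-submit using the documented `--master k8s://` scheme and `spark.kubernetes.*` options. The API server address, image name, namespace, and application path below are placeholders, not values from this article:

```shell
# Submit a PySpark application to Kubernetes in cluster mode.
# Replace the API server URL, image, namespace, and app path with your own.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.namespace=default \
  --conf spark.kubernetes.container.image=<registry>/spark-py:latest \
  local:///opt/spark/examples/src/main/python/pi.py
```

The `local://` prefix tells Spark the application file is already baked into the container image rather than uploaded from the submitting machine.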


Raghavendra_Singh works for Sigmoid.