Containerization of Spark Python Using Kubernetes

Written by Raghavendra_Singh | Published 2020/08/03
Tech Story Tags: pyspark | hadoop | kubernetes | apache-spark | big-data | yarn | containers | devops

TLDR Spark is a general-purpose distributed data processing engine designed for fast computation. Its main feature is in-memory cluster computing, which increases the processing speed of an application. Kubernetes is a container orchestration engine that ensures high availability of resources. When running on Kubernetes, Spark uses the kube-apiserver as its cluster manager and handles execution through it. Spark provides no default high-availability mode, so additional components such as ZooKeeper must be installed and configured. A Spark application can be submitted to the cluster directly with spark-submit.
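As a sketch of the last point, a PySpark job can be submitted to a Kubernetes cluster with spark-submit using the documented `--master k8s://` scheme and `spark.kubernetes.*` options. The API server address, image name, namespace, and application path below are placeholders, not values from this article:

```shell
# Submit a PySpark application to Kubernetes in cluster mode.
# Replace the API server URL, image, namespace, and app path with your own.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.namespace=default \
  --conf spark.kubernetes.container.image=<registry>/spark-py:latest \
  local:///opt/spark/examples/src/main/python/pi.py
```

The `local://` prefix tells Spark the application file is already baked into the container image rather than uploaded from the submitting machine.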


Raghavendra_Singh works for Sigmoid.