We just released a new open source boilerplate template to help you (any Spark user) run spark-submit commands smoothly, taking care of things such as dependencies, your project source code and more.

TLDR: Here is an open source template to help you get started.

At Soluto, as part of a Data Scientist's day-to-day work, we create ETL (Extract, Transform, Load) jobs. Our main tool for this is Spark, specifically PySpark, with spark-submit.

Spark is used for distributed computing on large-scale datasets. spark-submit helps you launch your application code on your cluster.

Here are some examples of jobs we run daily at Soluto:

- Creating offline content recommendations for users
- Aggregating single events into more logical tables: as part of our service we offer tech support via chat messaging. Instead of having multiple message events for a single support session, we create a SessionsTable with one session entity that holds all the aggregated information of a single chat session.

Some of the basic needs when using Spark for ETL jobs (see the sketch after this post for an idea of what they look like in practice):

- Passing arguments
- Creating a Spark context and SQL context
- Loading your project source code (src directory)
- Loading pip modules (with a simple requirements file)

We created a simple template that can help you get started running ETL jobs using PySpark (both with spark-submit and the interactive shell), create a Spark context and SQL context, use simple command line arguments, and load all your dependencies (your project source code and third-party requirements).

So if you're starting a new Spark project, fork it on GitHub and enjoy Sparking it up!

Please feel free to share any thoughts, open issues and contribute code!
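To make the list of basic needs above more concrete, here is a minimal sketch of what a PySpark ETL entry point along those lines might look like. This is not the template's actual code: the job name, the --date argument, the paths and the column names are placeholders, and the spark-submit command in the comment assumes the project source and pip dependencies have been packaged into src.zip and deps.zip.

```python
# A minimal sketch (not the template's actual code) of a PySpark ETL entry
# point covering the basic needs listed above: command line arguments, a
# Spark context plus SQL context, and dependencies shipped alongside the job.
#
# It could be launched with something like (file names are hypothetical):
#   spark-submit --py-files src.zip,deps.zip main.py --date 2017-01-01
import argparse

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext


def parse_args():
    # Simple command line arguments for the job; --date is a made-up example.
    parser = argparse.ArgumentParser(description="Example PySpark ETL job")
    parser.add_argument("--date", required=True, help="Partition date to process")
    return parser.parse_args()


def main():
    args = parse_args()

    # Create the Spark context and SQL context used by the job.
    conf = SparkConf().setAppName("example-etl-job")
    sc = SparkContext(conf=conf)
    sql_context = SQLContext(sc)

    # Example transformation: read raw chat events and aggregate them per
    # session. The input/output paths and column names are placeholders.
    events = sql_context.read.json("s3://bucket/raw-events/{}".format(args.date))
    sessions = events.groupBy("session_id").count()
    sessions.write.parquet("s3://bucket/sessions/{}".format(args.date))

    sc.stop()


if __name__ == "__main__":
    main()
```

The idea behind the template is that the wiring shown here (contexts, arguments, packaging your src directory and requirements) is handled for you, so a new job mostly boils down to the transformation logic in the middle.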