PyCharm and Apache Spark on Mac OS X

Written by pradyumnadoddala | Published 2016/01/10
Tech Story Tags: spark | python | big-data | pycharm | apache-spark


If you do not know how to set up Spark on a Mac, please refer to the previous story.

Photo by Pradyumna Doddala using Photopea

Now that you have Spark installed and built on your Mac, let us make a few changes to get the IDE running.

Steps

Set the SPARK_HOME variable in ~/.bash_profile

sudo vim ~/.bash_profile

vim editor

export SPARK_HOME=/usr/local/spark

export PATH=$PATH:$SPARK_HOME/bin
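Before opening the IDE, it can be handy to confirm that the variable really points at a Spark installation. The snippet below is only a sketch: `missing_spark_parts` is a name I made up for illustration, and the subdirectories it looks for (`bin` and `python`) are an assumption based on the usual layout of a Spark distribution.

```python
import os


def missing_spark_parts(spark_home):
    """Return the expected subdirectories that are absent under spark_home.

    The names checked here ("bin" and "python") are an assumption based on
    the usual layout of a Spark distribution.
    """
    expected = ["bin", "python"]
    return [name for name in expected
            if not os.path.isdir(os.path.join(spark_home, name))]


if __name__ == "__main__":
    # Fall back to the path used in this post if the variable is not set.
    home = os.environ.get("SPARK_HOME", "/usr/local/spark")
    missing = missing_spark_parts(home)
    if missing:
        print("SPARK_HOME=%s is missing: %s" % (home, ", ".join(missing)))
    else:
        print("SPARK_HOME=%s looks fine" % home)
```

Run it from a new terminal window, so that the updated ~/.bash_profile has been loaded.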

Now open PyCharm.

Create a new project, and use the Pure Python template.

Now let's create a Python file; name it whatever you like.

Add the Spark python library to the interpreter.

Steps for adding the /usr/local/spark/python as the library for the Project Interpreter.
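In effect, this setting puts the directory on the interpreter's module search path. If you ever run the script outside PyCharm, roughly the same result can be had from the script itself. This is a sketch of the idea, not what PyCharm does internally; `add_to_sys_path` is a hypothetical helper, and `/usr/local/spark` is the install location assumed throughout this post.

```python
import os
import sys


def add_to_sys_path(path):
    """Put path at the front of sys.path unless it is already there."""
    if path not in sys.path:
        sys.path.insert(0, path)
    return sys.path


if __name__ == "__main__":
    # Fall back to the path used in this post if the variable is not set.
    spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")
    add_to_sys_path(os.path.join(spark_home, "python"))
    print(sys.path[0])
```

The call has to happen before `from pyspark import SparkContext`, otherwise the import is attempted before the path is known.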

The Word Count Program

For the word count program you need a text file.

First create a sample text file. I am going to use part of the text I already wrote in this post as the input.

Finally, the program:
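If you would rather generate the file than type it, something like the following should do. It is a minimal sketch: `write_sample` is a made-up helper, and the two lines of text are just borrowed from this post, so any text works equally well.

```python
def write_sample(path, lines):
    """Write each entry of lines to path, one per line."""
    with open(path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")


if __name__ == "__main__":
    # Text borrowed from this post; replace it with anything you like.
    write_sample("sample.txt", [
        "Now that you have Spark installed and built on your Mac",
        "let us make a few changes to get the IDE running",
    ])
```

The file lands in the current working directory, which is also where the word count program looks for `sample.txt`.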

import os

# Tell PySpark where Spark is installed.
os.environ["SPARK_HOME"] = "/usr/local/spark"

from operator import add

from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="PythonWordCount")
    # Read the file into an RDD of lines, using at least one partition.
    lines = sc.textFile("sample.txt", 1)
    # Split lines into words, pair each word with 1, then sum the 1s per word.
    counts = lines.flatMap(lambda x: x.split(' ')) \
                  .map(lambda x: (x, 1)) \
                  .reduceByKey(add)
    output = counts.collect()
    for (word, count) in output:
        print("%s: %i" % (word, count))

    sc.stop()
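To get a feel for what the Spark job computes, here is the same logic in plain Python, with no Spark involved. It is only an illustration for checking results on small inputs; `word_count` is a name I picked, and the function is not part of PySpark.

```python
def word_count(lines):
    """Count words by splitting every line on single spaces."""
    counts = {}
    for line in lines:
        # Same split as the flatMap step in the Spark program.
        for word in line.split(" "):
            # Stands in for map to (word, 1) followed by reduceByKey(add).
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    for word, count in sorted(word_count(["to be or not to be"]).items()):
        print("%s: %i" % (word, count))
```

Because the split is on a single space, two spaces in a row produce an empty string that gets counted as a word too, so do not be surprised by a blank entry in the output.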

Run

The first run is usually a disaster, because we tend to miss many little things.

A quick glance at the error shows that a module named py4j.java_gateway is missing.

So we have to add it to the interpreter.

Open Preferences again, go to the current interpreter settings, and add the library named py4j-0.9-src.zip (it ships with Spark, under /usr/local/spark/python/lib).

Adding the missing lib.
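The version number in the file name changes between Spark releases, so it may be easier to look the archive up than to hard-code it. A rough sketch follows; `find_py4j_zips` is a hypothetical helper, and the `python/lib` location and the `py4j-*-src.zip` naming pattern are assumptions based on the file named above.

```python
import glob
import os
import sys


def find_py4j_zips(spark_home):
    """Return the py4j source archives found under <spark_home>/python/lib."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Fall back to the path used in this post if the variable is not set.
    spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")
    for archive in find_py4j_zips(spark_home):
        # Python can import directly from a zip archive on sys.path.
        if archive not in sys.path:
            sys.path.insert(0, archive)
        print("added %s" % archive)
```

If nothing is printed, no matching archive was found, and it is worth checking where your Spark build keeps its Python libraries.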

Now let's rerun the code.

In the screenshot below, the words and their respective counts are visible.

Final Run.

I hope this keeps you busy for the next few days trying out the amazing Apache Spark.

If you’ve reached this, you’ve made it!! Have a great day!

Clap away if this helped you out. It encourages me to write more posts. And thanks for the support.

Prady | @pradyumna_d


Published by HackerNoon on 2016/01/10