PyCharm and Apache Spark on Mac OS X
by @pradyumnadoddala

If you do not know how to set up Spark on a Mac, please refer to the previous story.
Photo by Pradyumna Doddala using Photopea

Now that you have Spark installed and built on your Mac, let us make a few changes to get the IDE running.

Steps

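Set the SPARK_HOME variable in ~/.bash_profile

Open the file in an editor such as vim. The file lives in your own home directory, so sudo is not needed:

vim ~/.bash_profile

Add these two lines. Putting $SPARK_HOME/bin on the PATH makes the Spark launch scripts, such as pyspark and spark-submit, available in the shell:

export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin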
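
Before moving on to the IDE, it can help to confirm that the variable is really visible to Python. The following is only a rough sketch: the function name spark_home_problems is made up for this post, and the check for a python/ directory assumes the standard Spark layout. Run it from a new terminal so the updated ~/.bash_profile is loaded.

```python
import os


def spark_home_problems(environ=os.environ):
    """Return a list of human-readable problems with the SPARK_HOME setting.

    Hypothetical helper for illustration; it is not part of Spark or PyCharm.
    """
    problems = []
    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
        return problems
    if not os.path.isdir(spark_home):
        problems.append("SPARK_HOME does not point to a directory: " + spark_home)
        return problems
    # Assumption: a Spark installation keeps its Python sources under python/.
    if not os.path.isdir(os.path.join(spark_home, "python")):
        problems.append("no python/ directory under " + spark_home)
    return problems


if __name__ == "__main__":
    # Prints nothing when the setting looks fine.
    for problem in spark_home_problems():
        print(problem)
```

If the script prints nothing, the setting looks fine from Python's point of view.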

Now open PyCharm.

Create a new project using the Pure Python template.

Now let's create a Python file and name it whatever you like.

Add the Spark Python library to the interpreter.

In the Project Interpreter settings, add /usr/local/spark/python as a library for the project.

The Word Count Program

The word count program needs a text file as input.

First create a sample text file named sample.txt. I am going to use part of the text that I already wrote in this post as the input.
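
If you would rather create the file from Python than from an editor, something like the sketch below should do. The helper name write_sample and the exact sample lines are just my choices for illustration. Since the program refers to the file by the relative name sample.txt, I am assuming the file sits in the working directory of your run configuration.

```python
# Two lines borrowed from this post; any text will do.
SAMPLE_TEXT = (
    "Now that you have Spark installed and built on your Mac\n"
    "Let us make a few changes to get the IDE running\n"
)


def write_sample(path="sample.txt", text=SAMPLE_TEXT):
    """Write the sample text to path and return the number of lines written.

    Hypothetical helper for illustration only.
    """
    with open(path, "w") as handle:
        handle.write(text)
    return len(text.splitlines())


if __name__ == "__main__":
    write_sample()
```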

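Finally, the program:

import os
from operator import add

# Point PySpark at the local Spark installation before importing it.
os.environ["SPARK_HOME"] = "/usr/local/spark"

from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="PythonWordCount")
    # Read the input file as an RDD of lines, asking for at least one partition.
    lines = sc.textFile("sample.txt", 1)
    # Split each line into words, pair each word with 1, then sum the 1s per word.
    counts = lines.flatMap(lambda x: x.split(' ')) \
                  .map(lambda x: (x, 1)) \
                  .reduceByKey(add)
    # Bring the (word, count) pairs back to the driver and print them.
    output = counts.collect()
    for (word, count) in output:
        print("%s: %i" % (word, count))

    sc.stop()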
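
If the flatMap, map and reduceByKey chain looks opaque, here is a plain Python sketch of what those three steps compute, with no Spark involved. It is only an illustration of the logic (the function name word_count is mine), not how Spark executes it, since Spark distributes the work across partitions.

```python
from collections import Counter


def word_count(lines):
    """Count words like the Spark pipeline does, but on a plain list of lines."""
    # flatMap step: split every line on single spaces and flatten the result.
    words = [word for line in lines for word in line.split(' ')]
    # map + reduceByKey steps: treat each word as (word, 1) and sum the 1s per word.
    counts = Counter()
    for word in words:
        counts[word] += 1
    return dict(counts)


if __name__ == "__main__":
    for word, count in word_count(["a b a", "b c"]).items():
        print("%s: %i" % (word, count))
```

One thing to keep in mind: splitting on a single space means two spaces in a row produce an empty string, which then gets counted as a word too.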

Run

The first run usually fails, because there are many little things that are easy to miss.

A quick glance at the error shows that a module named py4j.java_gateway is missing.

So we have to make it available to the interpreter.

Open Preferences again, go to the current Interpreter settings, and add the library named py4j-0.9-src.zip, which is located under /usr/local/spark/python/lib. The version number in the file name depends on your Spark release.

Adding the missing lib.
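
As an alternative to clicking through the settings, the same two paths can be added to sys.path at the top of the script, before pyspark is imported. This is not what I did above, just a sketch of another route. The helper names are made up, and the glob pattern assumes the py4j archive follows the py4j-<version>-src.zip naming under python/lib.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the directory and py4j archive(s) PySpark needs on the import path."""
    python_dir = os.path.join(spark_home, "python")
    # The archive name carries a version number, so match it with a glob
    # instead of hard-coding py4j-0.9-src.zip.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home):
    """Append any missing Spark paths to sys.path and return the ones added."""
    added = [p for p in spark_python_paths(spark_home) if p not in sys.path]
    sys.path.extend(added)
    return added


if __name__ == "__main__":
    # Falls back to the install location used in this post.
    print(add_spark_to_sys_path(os.environ.get("SPARK_HOME", "/usr/local/spark")))
```

The upside is that the script no longer depends on the version number in the archive name. The downside is that PyCharm's code completion will not know about pyspark unless the interpreter paths are set as well.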

Now let's rerun the code.

As the screenshot below shows, each word is printed along with its count.

Final Run.

I hope this keeps you busy for the next few days trying out the amazing Apache Spark.

If you’ve reached this, you’ve made it!! Have a great day!

Clap away if this helped you out. It encourages me to write more posts. And thanks for the support.

Prady | @pradyumna_d | “File Your Cryptocurrency Taxes Using BearTax!”
