PySpark Over Pandas: The Obsession of Every Data Scientist

by Tushar Goel, 3m read, November 20th, 2022

Too Long; Didn't Read

Pandas runs operations on a single machine, whereas PySpark distributes them across multiple machines. On large datasets this can make PySpark up to 100x faster than Pandas. Pandas DataFrames are not suited to building scalable applications, while PySpark DataFrames are designed for them. As data volumes grow, so does the need for frameworks like PySpark. For small datasets of around 10–12 GB, you may still prefer Pandas, which offers comparable runtime with less complexity.
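
To make the single-machine versus multi-machine contrast concrete, here is a minimal sketch of the split, process, and combine pattern that distributed engines build on. It uses only the Python standard library and is not PySpark or Pandas code: the helper names (`partition`, `partial_sum`, `distributed_sum`, `single_machine_sum`) and the sample rows are hypothetical, and threads stand in for separate machines purely for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts chunks (round-robin), one per pretend worker."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_sum(chunk):
    """Work done on one partition: total the amounts per key."""
    totals = Counter()
    for key, amount in chunk:
        totals[key] += amount
    return totals


def single_machine_sum(rows):
    """Single-machine style: one pass over all rows in one place."""
    return dict(partial_sum(rows))


def distributed_sum(rows, n_parts=4):
    """Distributed style: aggregate each partition, then merge the partials."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_sum, partition(rows, n_parts)))
    combined = Counter()
    for part in partials:
        combined.update(part)  # Counter.update adds counts per key
    return dict(combined)


# Hypothetical sample data: (key, amount) pairs.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(single_machine_sum(rows))
print(distributed_sum(rows))
```

Both functions return the same totals. The difference is that the partitioned version never needs all rows in one place at once, which is the property that lets a real cluster scale past the memory of a single machine.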

Tushar Goel

@tusharml

Senior Machine Learning Engineer. Love to talk about ML, Astro and Quantum Physics

STORY’S CREDIBILITY

Original Reporting

This story contains new, firsthand information uncovered by the writer.
