Spark Data Source API. Extending Our Spark SQL Query Engine

by Nicolas A Perez (@anicolaspp) | 7 min read | January 9th, 2016

Too Long; Didn't Read

In the last post, <a href="https://medium.com/@anicolaspp/apache-spark-as-a-distributed-sql-engine-4373e254e0f9#.x4kyh8jqr" target="_blank">Apache Spark as a Distributed SQL Engine</a>, we explained how to use SQL to query data stored in Hadoop. Our engine can read <strong>CSV</strong> files from a distributed file system, automatically discover the schema from the files, and expose them as tables through the <em>Hive</em> metastore. The goal was to connect standard SQL clients to our engine and explore our data set without manually defining the schema of our files, avoiding ETL work.
