Announcing Camelot, a Python Library to Extract Tabular Data from PDFs

by Vinayak Mehta, October 3rd, 2018 (5 min read)

Too Long; Didn't Read

The PDF (<a href="https://en.wikipedia.org/wiki/PDF" target="_blank">Portable Document Format</a>) was born out of <a href="http://www.planetpdf.com/planetpdf/pdfs/warnock_camelot.pdf" target="_blank">The Camelot Project</a> to create “a universal way to communicate documents across a wide variety of machine configurations, operating systems and communication networks”. Basically, the goal was to make documents viewable on any display and printable on any modern printer. PDF was built on top of <a href="https://en.wikipedia.org/wiki/PostScript" target="_blank">PostScript</a> (a page description language), which had already solved this “view and print anywhere” problem. PDF encapsulates the components required to create a “view and print anywhere” document. These include characters, fonts, graphics and images.

Vinayak Mehta

@vinayak

https://vinayak.io
