The Wikipedia of Vector Space

Written by edwardstanfield | Published 2017/06/08
Tech Story Tags: search


My friend and I were laughing the other day about a startup claiming a spot in the marketplace as "the Apple of augmented reality." What do they even mean? Maybe they’re delusional.

It made me think. I also have a delusional idea. I want to index the whole web and then create a GUI, a coding tool, that lets anyone pose complex queries over that index.

Creating an index of a large amount of data, say Wikipedia, is not hard. I’ve done it a gazillion times on my laptop. The compressed JSON file is 13 GB. Wikipedia has been my test data, so to speak. That and Project Gutenberg.

However, creating an index of the web is almost impossible. Only two players have done it: Google and Microsoft. But they’re corporations. I want to be the first person to index the whole web, to sail, so to speak, from Earth to the Moon on a raft I built myself and then return safely. What I’m saying is, I want to become the Wikipedia of vector space. I would also be comfortable being just the SpaceX of search.

My raft: Resin

Demo of a search engine: searchpanels.com

It’s not a hard problem, just one that needs careful planning. I want to fit the index on a single machine. The rest (postings, pointers to addresses in files, documents) can be distributed over many machines/disks. I think the cloud has already solved this.
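To make that split concrete, here is a rough sketch in Python. It is not how Resin works; it is only a toy under my own assumptions. The class name `ShardedIndex`, the hash-based shard choice, and the plain dicts standing in for remote machines/disks are all made up for illustration. The term dictionary (the "index") lives in one place, and the postings are spread over several shards.

```python
import zlib


class ShardedIndex:
    """Toy layout: term dictionary on one machine, postings on many shards."""

    def __init__(self, shard_count):
        # Hypothetical stand-ins for remote machines/disks: one dict per shard.
        self.shards = [dict() for _ in range(shard_count)]
        # The single-machine "index": term -> number of the shard with its postings.
        self.dictionary = {}

    def _shard_for(self, term):
        # Stable hash, so a given term always lands on the same shard.
        return zlib.crc32(term.encode("utf-8")) % len(self.shards)

    def add(self, doc_id, text):
        # Naive whitespace tokenizer; each distinct term is recorded once per document.
        for term in set(text.lower().split()):
            shard = self.dictionary.setdefault(term, self._shard_for(term))
            self.shards[shard].setdefault(term, []).append(doc_id)

    def search(self, query):
        """Return sorted ids of documents that contain every query term."""
        result = None
        for term in query.lower().split():
            shard = self.dictionary.get(term)
            postings = set(self.shards[shard][term]) if shard is not None else set()
            result = postings if result is None else result & postings
        return sorted(result or [])


index = ShardedIndex(shard_count=3)
index.add(1, "the moon is far")
index.add(2, "a raft to the moon")
print(index.search("raft moon"))
```

Only the dictionary has to fit in memory on one box. Each lookup touches just the shards that hold the postings for the query terms, which is the part the cloud is good at.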

I just need some machines. And some disk space.

Sponsor me: Contact marcuslager at gmail dot com.


Published by HackerNoon on 2017/06/08