In response to Ayende’s code review of Resin, part III.
You may ask how seriously one should take Resin. I’m curious, how seriously are you willing to take me?
I’m going to try to create an OSS web-scale search engine. I find relevance over a web-scale corpus to be disgustingly intriguing. I’m so fascinated by it I’ve made it into my life’s quest. It’s what I shall be doing right up until the day I die. It’s what I want to be good at.
As of now, what’s the scale at which Resin can perform? My test data has been the English version of Wikimedia and ~20K novels from the Gutenberg project. Since Resin is a trie, it can take a lot of data. Lots and lots and lots. And then all of a sudden, once your data reaches a certain scale, if it’s represented as a Unicode trie, it ceases to expand, because the terms it meets are already in there.
The point where all terms known to man or woman are contained within one data structure is where I’m trying to steer this newly crafted ship.
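To make the "ceases to expand" idea concrete, here is a minimal sketch of a character trie that counts how many nodes each insert creates. It is not Resin's implementation; the `Trie` and `TrieNode` classes and the sample words are made up for illustration. The point is only that terms already in the trie, or sharing a prefix with something in it, cost few or no new nodes.

```python
class TrieNode:
    """One character position in the trie (hypothetical, not Resin's node type)."""
    __slots__ = ("children", "end_of_word")

    def __init__(self):
        self.children = {}        # maps a character to the next TrieNode
        self.end_of_word = False  # True if a term ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()
        self.node_count = 0  # nodes created so far, root excluded

    def add(self, word):
        """Insert a word and return how many new nodes it required."""
        node = self.root
        created = 0
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = TrieNode()
                node.children[ch] = nxt
                created += 1
            node = nxt
        node.end_of_word = True
        self.node_count += created
        return created


trie = Trie()
# First pass: new terms create nodes, shared prefixes are reused.
first_pass = sum(trie.add(w) for w in ["resin", "rest", "trie", "tree"])
# Second pass: the same terms show up again and the trie does not grow.
second_pass = sum(trie.add(w) for w in ["resin", "tree", "rest"])
print(first_pass, second_pass, trie.node_count)
```

Feed it a corpus where the vocabulary has stopped surprising you and the per-document node growth heads toward zero, which is the plateau I'm talking about.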
Microsoft and Google say to use this:
Microsoft DocumentDB. It’s a little boring. Can I get a drink to cabin 237 please? http://i2.cdn.cnn.com/cnnnext/dam/assets/160108142736-regents-seven-seas-explorer-super-169.jpg
I’m having a lovely time with this at the moment:
Cheap and with no distractions. Just me, my plastic boat and the sea. Wait, someone’s already here. http://www.jeffkellerphotography.com/wordpress/wp-content/uploads/2012/08/Im-on-a-boat-Sea-kayaking-is-hard-work.jpg
There is a field name restriction mentioned in the review that seems off. I do believe the name pattern for a trie file is “{indexVersion}-{fieldNameHash}.tri”. For a glimpse into the commit Oren is reviewing, click here.
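Here is a rough sketch of why a pattern like “{indexVersion}-{fieldNameHash}.tri” should not restrict field names: the name itself never reaches the file system, only its hash does. The function `trie_file_name` is hypothetical, and SHA-256 truncated to 16 hex characters is an assumption standing in for whatever hash Resin actually uses.

```python
import hashlib


def trie_file_name(index_version: str, field_name: str) -> str:
    """Build a file name following the {indexVersion}-{fieldNameHash}.tri pattern.

    Assumption: SHA-256 (truncated) is used here only for illustration;
    it is not claimed to be the hash function Resin uses.
    """
    digest = hashlib.sha256(field_name.encode("utf-8")).hexdigest()
    field_hash = digest[:16]  # hex characters only, safe in any file name
    return f"{index_version}-{field_hash}.tri"


# A field name full of characters that would be illegal in a file name.
name = trie_file_name("1234", "title/with:odd*chars?")
print(name)
```

Whatever the field is called, the resulting file name contains only the version, a dash, hex digits and the extension.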
See you in the comments section.