In my last article, I wrote about the need for Social Search — using human intelligence and experience to answer questions that Google and AI cannot answer.
For a social search platform like Proffer (or any system that relies on crowd intelligence) to work and to offer high-quality responses, there needs to be a reliable system for evaluating crowd-provided answers, categorizing them as better or worse, right or wrong. Without a mechanism to measure the ‘correctness’ of an answer, there would be no way to rank or filter the answer space. Furthermore, in a crowdsourcing platform that offers rewards or penalties to encourage responders to participate, there’d be no way to determine which responders to reward, which to penalize, and by how much.
Peer review (subjecting an answer to the scrutiny of other experts in the same field) is the traditional approach for assessing crowd inputs. It’s a powerful idea and widely practiced in academia and online communities, but it suffers from political manipulation, a lack of granularity/specificity in how expertise is measured, the ability of reviewers to falsely accumulate expertise, and an unnatural constraint whereby votes from skilled and unskilled reviewers carry equal weight in the outcome.
In building Proffer, we sought to create a protocol for decentralized peer review that avoids the pitfalls of traditional peer review and runs on the blockchain, adapting to cover future crowdsourcing and social search use cases on a global scale.
Don’t want to read? Go through the slides for a protocol overview and example.
The rest of this article walks through the protocol step by step.
Seeker asks a question that he/she wants crowd input on, e.g. “Why is the sky blue?” Seeker optionally backs the question with money (SeekerStake) that will be distributed among correct responders. This SeekerStake seeds an incentive pool called the Token Backing Pool. A new Token Backing Pool is created for each question asked on the platform.
Responders (a.k.a. peer reviewers) see the question and can either respond with a new answer, or upvote/downvote answers previously submitted by other responders. Counterintuitively, responders must also contribute a stake (ResponderStake) to the Token Backing Pool as an expression of their confidence in their response; this stake is returned to them unless their answer is deemed incorrect (explained below) by the peer review protocol. The requirement of a ResponderStake disincentivizes spam and puts responders’ skin in the game.
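To make the staking flow concrete, here is a minimal sketch in Python. The class and field names are my own illustration, not Proffer’s actual spec: a question carries an optional SeekerStake that seeds its Token Backing Pool, and every answer or vote adds the responder’s stake to the same pool.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Answer:
    author: str
    text: str
    upvoters: List[str] = field(default_factory=list)
    downvoters: List[str] = field(default_factory=list)

@dataclass
class Question:
    seeker: str
    text: str
    seeker_stake: float = 0.0          # optional SeekerStake
    pool: float = 0.0                  # Token Backing Pool for this question
    answers: List[Answer] = field(default_factory=list)

    def __post_init__(self):
        # The SeekerStake seeds the question's Token Backing Pool.
        self.pool += self.seeker_stake

    def add_answer(self, responder: str, text: str, answer_stake: float) -> Answer:
        # Responders stake tokens alongside their answer (skin in the game).
        self.pool += answer_stake
        answer = Answer(author=responder, text=text)
        self.answers.append(answer)
        return answer

    def vote(self, responder: str, answer: Answer, up: bool, vote_stake: float) -> None:
        # Votes also carry a stake into the same pool.
        self.pool += vote_stake
        (answer.upvoters if up else answer.downvoters).append(responder)

# Example: a seeker backs a question, one responder answers, another votes.
q = Question(seeker="alice", text="Why is the sky blue?", seeker_stake=10.0)
a1 = q.add_answer("joe", "Rayleigh scattering.", answer_stake=2.0)
q.vote("sam", a1, up=True, vote_stake=1.0)
print(q.pool)  # 13.0
```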
The protocol keeps track of the “net skill” backing each answer that is added to the answer space. The SkillBacking for an answer equals the skill of the responder who first proposed it, plus the skill of all responders who upvoted it, minus the skill of all responders who downvoted it. SkillBacking is thus a skill-weighted measure of the correctness of an answer at any given point in time.
“Skill” here refers to the very specific / granular skill of the user in the topic being addressed in the question, which in this case can be ‘Physics’ or ‘Optical Phenomenon’. The skill is read from the Global Expertise Bank and written back to it each time it’s updated.
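As a rough illustration (the topic slice, names, and skill numbers below are hypothetical), SkillBacking can be computed directly from the Global Expertise Bank entry for the question’s topic:

```python
# Hypothetical slice of the Global Expertise Bank for one topic ("Optical Phenomenon"):
# responder -> skill in that topic.
expertise_bank = {"joe": 40.0, "sam": 25.0, "p1": 5.0, "p2": 15.0}

def skill_backing(proposer, upvoters, downvoters, bank):
    """Skill-weighted correctness of an answer at a point in time:
    proposer's skill + sum of upvoters' skills - sum of downvoters' skills."""
    backing = bank.get(proposer, 0.0)
    backing += sum(bank.get(u, 0.0) for u in upvoters)
    backing -= sum(bank.get(d, 0.0) for d in downvoters)
    return backing

# Joe proposed the answer; sam and p2 upvoted it; p1 downvoted it.
print(skill_backing("joe", ["sam", "p2"], ["p1"], expertise_bank))  # 40 + 25 + 15 - 5 = 75.0
```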
Shown above, five different responders (Joe / Sam / Jenny / Andrea / Brett) contribute five new answers to the question “Why is the sky blue?”, paying an AnswerStake into the Token Backing Pool for each answer.
Shown below, 15 different responders, named P1 through P15, choose to upvote or downvote the five answers already in the Answer Space, paying a VoteStake into the Token Backing Pool for each vote. Note that each vote updates the net skill backing of the answer it targets: an upvote adds the upvoter’s skill, and a downvote subtracts the downvoter’s skill.
After P1 through P15 have submitted their votes, the answer space has five answers, each with a net skill backing. At this point, we can either use a heuristic such as “net skill backing > zero” to determine which answers are correct, or we can present all options to the Seeker who initially asked the question and allow him/her to choose the correct answer.
The best way to determine ‘correctness’ will depend on the use case, so rather than prescribing who judges answers correct or incorrect, our protocol exposes two configurable parameters: the Judge of Correctness and the Judge of Incorrectness.
Let’s assume for this example that Judge of Correctness = ‘seeker’ and Judge of Incorrectness = ‘peers’. This means that we can determine incorrect answers based on net skill backing: Answer 3, which received a net skill backing of -70, and Answer 5, with a net skill backing of -15, are therefore Incorrect.
Answer 2 is deemed Correct because we make the assumption that the Seeker will select Answer 2 as the best answer, since it has the highest net skill backing. Answers that are neither Correct nor Incorrect are labeled Undecided.
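Continuing the example, here is a minimal classification sketch, assuming Judge of Correctness = ‘seeker’ and Judge of Incorrectness = ‘peers’, with “negative net skill backing” as the peer heuristic for incorrectness. The thresholds, and the backings for Answers 1, 2, and 4, are my own placeholders rather than values from the article:

```python
# Net skill backing per answer after all votes. Answers 3 and 5 use the values
# from the worked example; the other backings are hypothetical placeholders.
net_skill_backing = {
    "answer_1": 20,    # hypothetical
    "answer_2": 55,    # hypothetical, highest backing
    "answer_3": -70,
    "answer_4": 5,     # hypothetical
    "answer_5": -15,
}

def classify(backings, seeker_choice=None, judge_of_correctness="seeker",
             judge_of_incorrectness="peers"):
    labels = {}
    for answer, backing in backings.items():
        if judge_of_incorrectness == "peers" and backing < 0:
            labels[answer] = "Incorrect"   # peers vetoed it with skill-weighted downvotes
        elif judge_of_correctness == "seeker" and answer == seeker_choice:
            labels[answer] = "Correct"     # the seeker picked this answer
        elif judge_of_correctness == "peers" and backing > 0:
            labels[answer] = "Correct"     # alternative: pure peer heuristic
        else:
            labels[answer] = "Undecided"
    return labels

# We assume the seeker picks the answer with the highest net skill backing.
seeker_pick = max(net_skill_backing, key=net_skill_backing.get)   # "answer_2"
print(classify(net_skill_backing, seeker_choice=seeker_pick))
# answer_2 -> Correct, answer_3 and answer_5 -> Incorrect, the rest -> Undecided
```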
We now use the following guidelines to compute two types of payouts — financial and skill — for each responder:
Guidelines for financial and skill payouts for each responder
Skill is an open/indefinite quantity: it can be given freely and taken freely. Performing skill payouts is therefore as easy as increasing or decreasing the responder’s skill in the Global Expertise Bank for the topic at hand.
Skill updates after Answers 3 and 5 determined to be Incorrect and Answer 2 to be Correct.
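Since skill is open-ended, a skill payout is just a read-modify-write against the Global Expertise Bank. A minimal sketch, where the size of the skill adjustment is an assumption (the actual guidelines come from the payout table above):

```python
# Hypothetical Global Expertise Bank entries for the topic at hand.
expertise_bank = {"joe": 40.0, "andrea": 12.0, "brett": 8.0}

def pay_skill(bank, responder, delta):
    """Skill payouts simply increase (correct) or decrease (incorrect)
    a responder's skill for this topic; skill is not zero-sum."""
    bank[responder] = bank.get(responder, 0.0) + delta

pay_skill(expertise_bank, "joe", +10.0)     # e.g. Joe backed the Correct answer
pay_skill(expertise_bank, "andrea", -10.0)  # e.g. Andrea backed an Incorrect answer
print(expertise_bank)  # {'joe': 50.0, 'andrea': 2.0, 'brett': 8.0}
```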
Money/tokens, on the other hand, are a closed, zero-sum quantity: they are redistributed from incorrect responders to correct responders, not created.
The protocol first returns the money that responders with Correct or Undecided responses put into the Token Backing Pool at the start of the process.
The remainder in the pool after these stakes have been returned consists of the original SeekerStake, and the AnswerStake(s) and VoteStake(s) of incorrect responders. Finally, this amount can be distributed across correct responders as their reward for answering or voting correctly.
Possible outcomes in crowdsourced peer review. Incorrect responders lose what they had put in. Correct responders recover what they had put in AND get an additional reward.
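Token payouts, by contrast, only move existing tokens around. Here is a sketch of the redistribution, assuming (as one possible rule, not something the protocol prescribes) that the remainder is split among correct responders in proportion to the stake each of them risked:

```python
def settle_pool(seeker_stake, stakes, labels):
    """stakes: responder -> stake paid into the Token Backing Pool.
    labels: responder -> 'Correct' | 'Incorrect' | 'Undecided'.
    Returns responder -> total payout from the pool."""
    payouts = {}

    # 1. Correct and Undecided responders get their own stakes back.
    for responder, stake in stakes.items():
        if labels[responder] in ("Correct", "Undecided"):
            payouts[responder] = stake

    # 2. The remainder (SeekerStake + stakes of Incorrect responders)
    #    is distributed across Correct responders as their reward.
    remainder = seeker_stake + sum(
        stake for r, stake in stakes.items() if labels[r] == "Incorrect")
    correct = [r for r in stakes if labels[r] == "Correct"]
    total_correct_stake = sum(stakes[r] for r in correct)
    for r in correct:
        share = stakes[r] / total_correct_stake if total_correct_stake else 1 / len(correct)
        payouts[r] += remainder * share
    return payouts

stakes = {"sam": 2.0, "joe": 2.0, "andrea": 2.0, "p1": 1.0}
labels = {"sam": "Correct", "joe": "Undecided", "andrea": "Incorrect", "p1": "Correct"}
print(settle_pool(seeker_stake=10.0, stakes=stakes, labels=labels))
# sam recovers 2.0 and earns 8.0, joe recovers 2.0, p1 recovers 1.0 and earns 4.0;
# andrea loses her 2.0 stake.
```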
In steady state, the self-optimizing expert network described above updates the expertise of each responder over time based on his/her current expertise, his/her response, the crowd’s current expertise, and the crowd’s responses.
Expertise of person ‘p’ at time ‘t+1’ is a function of that same person’s expertise and response at time ‘t’, and the crowd’s expertise and response at time ‘t’
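In notation, the caption above corresponds to a recurrence of the following form (the symbols E for expertise, R for response, and f for the unspecified update function are my own shorthand):

```latex
E_p(t+1) = f\big(E_p(t),\; R_p(t),\; E_{\text{crowd}}(t),\; R_{\text{crowd}}(t)\big)
```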
As long as expertise in the system is a reliable metric at time t, we can rest assured that it will be reliable at time t+1, getting closer to the ground-truth / ‘actual’ expertise of all users with time.
Our peer review protocol and the global expertise bank are capable of working out of the box, with all responders starting off with Skill = zero across all topics. The law of large numbers will ensure that with enough iterations, responders who started off with no skill will have won or lost skill based on the crowd’s voting for or against them, equilibrating somewhere close to their actual skill relative to their peers.
However, to improve the rate at which this skill equilibrium is reached, and to provide the highest quality answers right from t=0, particularly for industry topics that require specialized knowledge or certification, we propose bootstrapping the expert network by manually selecting and on-boarding experts from pre-existing networks, both online and offline, formal and informal.
For example, in the field of healthcare, one could onboard physician networks and initialize skill for each physician in their respective practice (‘Cardiology’, ‘General Medicine’, etc.) based on a combination of past experience and in-person interviews.
The same can be done with practicing lawyers for topics concerning the Law, with school teachers for topics in k-12 education, with land developers and construction teams for topics concerning real estate, etc.
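In practice, seeding could be as simple as writing vetted initial skill values into the Global Expertise Bank before the first question is asked. The names, topics, and numbers below are purely illustrative:

```python
# Purely illustrative bootstrap of the Global Expertise Bank:
# topic -> (responder -> initial skill), assigned after vetting/interviews.
global_expertise_bank = {
    "Cardiology":       {"dr_rao": 80.0, "dr_kim": 65.0},
    "General Medicine": {"dr_kim": 70.0, "dr_osei": 55.0},
    "Real Estate":      {"builder_ann": 40.0},
}

def onboard_expert(bank, topic, expert, initial_skill):
    """Manually seed an expert's skill for one topic; everyone else still starts at zero."""
    bank.setdefault(topic, {})[expert] = initial_skill

onboard_expert(global_expertise_bank, "K-12 Education", "teacher_lee", 30.0)
```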
Seeding an expert network through a manual process would be neither trivial nor quick, but in a world where the nature of work is already tending towards ad-hoc gigs that allow workers to monetize their free time, an expert network on the blockchain with frictionless reward payments for experts who answer correctly could be an exciting platform to be a part of. This would be especially true for Proffer’s global expertise bank, as experts would be able to leverage reputation accumulated answering questions on Proffer while using other dApps built on the same global expertise bank.
If you’re curious to learn more about Proffer, the social search protocol we’re building on top of crowdsourced peer review and self-optimizing expert networks, check out our tech spec here, and the five apps we’ve published to showcase different use cases for Proffer on our website here.