How to Hack Content Moderation in the Metaverse: Lessons from Web 2.0

Written by futuristiclawyer | Published 2022/11/14
Tech Story Tags: enterthemetaverse | content-moderation | metaverse | web3 | psychology | social-media | blockchain-technology | hackernoon-top-story | hackernoon-es | hackernoon-hi | hackernoon-zh | hackernoon-vi | hackernoon-fr | hackernoon-pt | hackernoon-ja

TL;DR: Content moderation in Web 2 is an unsolvable problem, says the author. A clear line between users’ free speech and gatekeepers’ censorship of harmful content is impossible to draw, yet without a clear code of conduct and intervention from the platform owner, any social network would deteriorate over time. The solution is a simple one: the bulk of moderation needs to be carried out by the users. The first social platform to design a functioning, “self-moderating” system where users are incentivized to make valuable contributions and deterred from doing the opposite will take a leading position in “the metaverse” - the deepening merger of virtual and physical realities.

Content moderation in Web 2 is an unsolvable problem.

A gold standard separating users’ free speech from gatekeepers’ censorship of harmful content is impossible to define. Even if the world’s longest and most comprehensive guidelines were drafted by 10,000 large language models over the span of 10,000 hours, they would still not encapsulate every possible scenario in which a post or comment falls below or above “the bar of acceptability”. On the other hand, without a clear code of conduct and intervention from the platform owner, any social network would deteriorate over time.

Toxic behavior among users is magnified even further in the metaverse. Bullying, harassment, hate speech, bizarre groping incidents, and other types of misbehavior are far too common in virtual worlds. A US study from 2021 indicates that five out of six adults (83%) have experienced some form of harassment in online multiplayer games. Looking ahead, Meta’s current CTO, Andrew Bosworth, wrote in an employee memo that moderating what people say or do in the metaverse “at any meaningful scale is practically impossible.”

How can it be ensured that our common digital space does not become a lawless jungle littered with trash talk, spam, and harassment?

In my view, the solution is a simple one: the bulk of moderation needs to be carried out by the users.

Pretty much like Reddit’s community point system, with an emphasis on two points:

  • The community points need to be perceived as valuable to users.

  • Each user has a pseudonymous online identity with a track record and a reputation that can be audited by others. The reputation needs to be perceived as valuable to users.

I imagine that the first social platform to design a functioning, “self-moderating” system where users are incentivized to make valuable contributions and deterred from doing the opposite will take a leading position in “the metaverse” - the deepening merger of virtual and physical realities.

Let’s take a deeper look at the moderation issue.

Ye’s Twitter controversy

One of the richest African-Americans and one of the richest musicians in recorded history, Ye, formerly known as Kanye West, made a surprising announcement a few weeks ago to his 31+ million Twitter followers:

“I’m a bit sleepy tonight but when I wake up I’m going death con 3 On JEWISH PEOPLE. The funny thing is I actually can’t be Anti Semitic because black people are actually Jew also. You guys have toyed with me and tried to black ball anyone whoever opposes your agenda.”

Besides a temporary ban from Twitter and a strong backlash from the press and social media, several brands cut ties with Ye after his controversial statement. Adidas terminated its sponsorship of Ye’s shoe and clothing brand, Yeezy, which accounted for an estimated $1.5 billion of his net worth.

If Ye were a European citizen, he could likely be issued a fine or even face jail time for making antisemitic remarks. In the EU, hate speech is not protected under the right to freedom of expression[1], whereas the constitutional right to freedom of speech is considered to weigh heavier in the US. On Twitter, the only common ruleset that binds users from all over the world together is Twitter’s terms and conditions: a private contract that can only be enforced by the company and warrants no harsher punishment for users than a permanent ban.

Ye’s antisemitic remarks were red meat for gossip magazines and mainstream media because of his huge following and celebrity status. But imagine that discriminatory remarks, ranging from “innocent trolling” to Jeffrey Dahmer-style credible death threats, are posted by anonymous users on social networks every split-second of the day. Obviously, none of these “anons” have a $1.5 billion sponsorship from Adidas, let alone any public image to lose. They literally have nothing to lose by posting stuff on social media - which turns out to be a problem.

According to simple game theory and human psychology, if one user acts badly and is rewarded with more exposure, more users are likely to follow his example as long as they can do so without consequences. As a result, the bar for social conduct sinks lower and lower - an experience many people can relate to on Twitter.
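The dynamic is easy to make concrete. Below is a toy Python simulation - my illustration, not the author’s model - in which the most toxic visible behavior gets the most exposure each round, and a small share of users imitate whatever is rewarded. All parameters are arbitrary assumptions.

```python
import random

def simulate(users: int = 1000, rounds: int = 50, consequences: bool = False) -> float:
    """Return average 'civility' (0 = toxic, 1 = civil) after the simulation."""
    civility = [random.random() for _ in range(users)]
    for _ in range(rounds):
        worst = min(civility)  # the rewarded, most-exposed bad actor
        for i in range(users):
            if consequences:
                # Misbehavior costs reputation, so users drift upward instead.
                civility[i] = min(1.0, civility[i] + 0.01)
            elif random.random() < 0.05:  # 5% imitate what gets rewarded
                civility[i] = (civility[i] + worst) / 2
    return sum(civility) / len(civility)

random.seed(42)
print(f"no consequences:   average civility {simulate():.2f}")
print(f"with consequences: average civility {simulate(consequences=True):.2f}")
```

Without consequences, the average sinks round after round; with even a mild cost attached to misbehavior, it recovers.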

Musk’s “Collective Super Mind”

The new “Twitter Complaint Hotline Operator”, Elon Musk, made the following note to advertisers after his acquisition of Twitter:

https://twitter.com/elonmusk/status/1585619322239561728

Inside Twitter, some regular users refer to the platform as a “hellsite”. Michelle Goldberg does not hold back on her criticism in a recent opinion piece for the NY Times:

I’ve sometimes described being on Twitter as like staying too late at a bad party full of people who hate you. I now think this was too generous to Twitter. I mean, even the worst parties end. Twitter is more like an existentialist parable of a party, with disembodied souls trying and failing to be properly seen, forever. It’s not surprising that the platform’s most prolific users often refer to it as “this hellsite.”

In truth, most of Twitter’s users are far from prolific and rarely tweet at all.

Despite the great dangers of polarization and echo chambers, Elon Musk sees Twitter’s potential as a “collective, cybernetic super-intelligence (with a lot of room for improvement)” - like a collective super mind that connects everyone, but without the need for brain-implanted neural laces.

https://twitter.com/elonmusk/status/1588081971221053440

Joe Bak-Coleman, a researcher in journalism ethics and security at Columbia University, disagrees with the notion:

“(…) neurons are ultimately a poor analogy for individual human behavior. As a collection of cells with identical DNA bound to live or die together, neurons share a common goal and have no reason to compete, cheat, or steal. Even if we had a full model of how the brain works, we couldn’t directly apply it to human behavior because it wouldn’t account for competitive dynamics between humans.”

Later in his essay, Joe Bak-Coleman concludes:

“Elon’s premise that Twitter can behave like a collective intelligence only holds if the structure of the network and nature of interactions is tuned to promote collective outcomes.”

It’s kind of simple, really: if all users shared Elon’s vision of Twitter as a “common digital town square, where a wide range of beliefs can be debated in a healthy manner”, there wouldn’t be any issues with content moderation. However, to many, perhaps to most users, Twitter is either a “hellsite”, a joke, or a place where words and actions have no real consequences.

This is why users need incentives to contribute valuable, honest, and accurate information. And they need to fear the consequences of doing the opposite. We need a non-authoritarian social credit system moderated by the collective super mind.

Reddit’s Community Point System as a Moderation Tool

I think Reddit’s community point system could be the future of content moderation.

The idea is to build a kind of “self-moderating” system where users can actually earn something from making valuable contributions and, on the other hand, be punished for trolling, spamming, threatening, bullying, harassing, spewing hate speech, or otherwise acting against community interests.
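As a rough illustration of that earn-and-punish logic, here is a minimal Python sketch. The event names and point values are my own hypothetical assumptions, not any platform’s actual rules:

```python
# Hypothetical point values for a "self-moderating" ledger; illustrative only.
REWARDS = {"upvoted_post": 5, "helpful_answer": 10}
PENALTIES = {"spam": -20, "harassment": -100, "hate_speech": -250}

def apply_event(balances: dict[str, int], user: str, event: str) -> int:
    """Credit or debit a user's community points for a moderated event."""
    delta = {**REWARDS, **PENALTIES}[event]
    balances[user] = balances.get(user, 0) + delta
    return balances[user]

balances: dict[str, int] = {}
apply_event(balances, "alice", "helpful_answer")  # alice earns points
apply_event(balances, "bob", "harassment")        # bob pays for misbehaving
print(balances)  # {'alice': 10, 'bob': -100}
```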

Reddit rolled out its community point system in mid-2020 and describes it like this:

Community Points are a way for Redditors to own a piece of their favorite communities. Moderators and content creators earn Points by contributing to the community, and they can spend their Points on special features, display their Points as reputation in the community, and vote with their Points to weigh in on community decisions.

Community points are ERC-20 tokens owned by the users in a very real sense, since they are stored on a blockchain independently of Reddit. They are currently featured at an experimental level in subreddits such as r/CryptoCurrency (Moons), r/FortniteBR (Bricks), and r/ethtrader (Donuts). The tokens are distributed to community members and moderators according to monthly accumulated karma points (upvotes and downvotes on posts and comments).
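Mechanically, that distribution amounts to a pro-rata split of a monthly token pool by karma. A minimal Python sketch, assuming a made-up pool size and made-up karma figures (Reddit’s actual parameters differ):

```python
MONTHLY_POOL = 50_000_000  # assumed number of tokens distributed per cycle

def distribute(karma_by_user: dict[str, int], pool: int = MONTHLY_POOL) -> dict[str, float]:
    """Split the monthly token pool pro rata by positive accumulated karma."""
    total = sum(k for k in karma_by_user.values() if k > 0)
    if total == 0:
        return {user: 0.0 for user in karma_by_user}
    # Negative karma earns nothing: misbehavior forfeits the monthly payout.
    return {user: pool * max(k, 0) / total for user, k in karma_by_user.items()}

print(distribute({"alice": 1200, "bob": 300, "troll": -40}))
# {'alice': 40000000.0, 'bob': 10000000.0, 'troll': 0.0}
```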

I see Moons, Bricks, and Donuts as community-exclusive advancements of Reddit’s karma point system. For non-Reddit connoisseurs: karma points represent upvotes and downvotes on a user’s contributions to Reddit and are displayed publicly as a type of user-reputation score.

Community points can be thought of as spendable and tradeable Reddit karma.[2] They can currently be used as weighted votes in community polls, to tip other users, to post GIFs in comments, and to purchase digital items for user profiles on Reddit - and they can even be traded on certain DEXes.

So far, the utility and value of Reddit community points are fairly limited. However, I believe that a credit-scoring system in some form or another could be an excellent built-in tool for content moderation as “spendable social credit points” find more real-world applications and increase in value.

If there is any truth to Elon Musk's vision of Twitter as a “collective, cybernetic super-intelligence”, why should content be moderated by admins instead of the super-intelligence itself?

There is no doubt that moderators will still be needed in the short term to remove hateful and illegal contributions as quickly as possible. But especially in virtual worlds, where communication between users happens in real-time, the damage is already done before moderators have a chance to step in. This is why a credit-scoring system should be implemented on social sites, 3D or not, as a preventive measure. The promise of earning community tokens that can be traded, converted, and used as real money to gain access to services and experiences would incentivize users to behave properly, as well as help them determine whom they would like to do business with.
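To make the preventive idea concrete, here is a small sketch in which privileges unlock with reputation and lapse automatically when a sanction is upheld. The thresholds, actions, and API are my own assumptions, not an existing platform’s design:

```python
from dataclasses import dataclass

@dataclass
class Account:
    handle: str
    reputation: int  # earned through upvoted contributions, burned by sanctions

# Hypothetical thresholds: high-impact actions require reputation at stake.
THRESHOLDS = {"comment": 0, "post": 50, "tip": 100, "create_poll": 1000}

def may(account: Account, action: str) -> bool:
    """Check whether an account's reputation unlocks a given action."""
    return account.reputation >= THRESHOLDS[action]

def sanction(account: Account, severity: int) -> None:
    """Burn reputation on an upheld violation; privileges lapse automatically."""
    account.reputation -= severity

user = Account("anon123", reputation=60)
print(may(user, "post"))     # True
sanction(user, severity=20)  # e.g. a harassment report upheld by peers
print(may(user, "post"))     # False: posting blocked before further damage
```

Because the gate is checked before an action is taken, the system prevents damage instead of cleaning it up afterwards - exactly what real-time virtual worlds need.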

As we have learned from Bitcoin, when a majority (just 51%) of the participants in a system are fair and honest, and have a rational self-interest in staying fair and honest, the system as a whole works.

Thanks for reading! Sign up for my free Substack for monthly posts and writing updates.

[1] See Article 17 of the European Convention on Human Rights

[2] See https://www.reddit.com/r/ethtrader/comments/kl6ldu/donuts_faq/.


Also published here.


Written by futuristiclawyer | Legal background, interested in business and tech. www.futuristiclawyer.com
Published by HackerNoon on 2022/11/14