DAO leadership and decentralized governance expert.
Not knowing what to believe makes decision-making incredibly difficult. That is to the advantage of the Powers that Be, which may be good in this case. Who knows? Nobody knows. The Powers That Be don’t even know because internally one agency is lying to the other.
If only we had a blockchain to tell us the one true truth! Then we would know how to respond. We would know who is lying and who is telling the truth. We would know if this truly is a conspiracy or an accident, or if our government policy is making sense or not.
Yes, that’s what we need. The truth! And Blockchain can solve it.
Consensus is the way that blockchains reach truth. They have nodes that validate claims and those claims become the immutable truth. If 51% of the nodes agree that’s the truth, then it’s the truth.
But truth in news doesn’t work that way. It’s totally feasible for 80% of the authorities to believe something that’s not true. For example, that you need to eat meat to be strong, that the world is flat or that time is sequential. Then some guy comes along with the theory of relativity… but still most people think time is sequential (and so do most computer systems).
Anyway, you can see the point. Consensus does not equal truth. Consensus equals a perspective. Maybe it is a valuable perspective and maybe not, but it’s just a perspective.
Similarly, data isn’t truth. It’s data. Warm data, cool data, different types of data. The data might say this many hospital beds or that many infections, but it’s data. It’s not truth.
Maybe one of the beds is broken. Maybe one of the tests is broken. Data isn’t truth either, even if it is immutable data verified by 51% or 99% of the nodes.
That’s where we need data integrity. We all know that there is no One Big True Truth.
Lately we’ve been seeing concepts like “warm data” and “agent-centric” systems. The basis of these types of data integrity is that data can’t exist in a vacuum.
In other words, there is no meaning to the number 37. You can’t have the number 37 floating around in a table where it needs to search for its context. In current data systems, we can store the number 37 in a table along with a bunch of other numbers and then process that table.
In a warm-data system, the number could not be stored alone. At a minimum, it would need a source. Let’s give an example.
An agent-centric system can’t store the claim “Jo’s temperature is 37°.” It can store a claim that “Jo’s temperature, as taken at this moment in time, by this thermometer and registered by this doctor is 37°.”
The nature of this claim is such that you could make assessments based on whether the thermometer is faulty or that the doctor is under-reporting compared to other doctors or that all the doctors in a particular facility are under-reporting.
If every single piece of data were validated by the Agents involved, the data would be reliable. Reliable is distinct from true. Reliable means that the recipient of the data can use their sense-making skills to understand the data.
For example, if you creating one of those online dashboards of the spread of the virus, you can say “this data came from the official Quaziland government and we do not have any more granular details from individuals.”
Or the opposite “In drilling down through the Quaziland data, we can see that every single test is accounted for and every validation has the signature of a doctor and a patient on the results.” The names of the doctors and patients could be hidden, but the validation signature could show the results.
For the sake of this article, we aren’t going into how to set up these validation rules—but a well-architected system can create immutable records that are extremely difficult to counterfeit.
In fact, as humans, this is how we function. We can make a judgement pretty well when we know where the information comes from. We know this person has given us good advice in the past about business but bad advice about family.
We know the automated sign at the bus stop is more reliable than Google’s estimation of when the bus will arrive because we have experience with those things. In the current crises, we have no capacity to reliably know where the information came from and compare it to the history of that source, so we are having a very difficult time making sense of anything.
If we had agent-centric data, that is, data that always held the source of the claim, we would be able to make better assessments. Our assessments would also include our bias.
If one journalist trusts individual doctors more than the hospital as a whole, that journalist might report a very different story than someone reporting based on official government statistics. The journalist would also be making a claim and perhaps could link back to the data (we are seeing a lot of this in today’s online tools).
If you took this one step further, you could have a variety of tools that fed into this kind of agent-centric architecture. We could have ways of reporting our own stories that could be validated. (I went to the hospital and there was/wasn’t a line.
I had symptoms and was denied a test.) Right now everything is hearsay and we don’t have appropriate tools for collecting that data in a meaningful way.
Also, in different jurisdictions, people may feel more or less free to attach their names to such data—but in a properly-designed system we could allow anonymous reporting with some kind of hash rather than a name, but that hash would be able to validate that someone wasn’t double-reporting.
Perspectives and assessments could be attached to the sources, too. We do this naturally. Some journalists, authorities, universities, governments, or friends are more trusted than others. Having verified claims and assessments of these entities would allow people to chose sources of data more wisely, and it would incentivize towards truth instead of towards sensationalism.
Unfortunately, we are paying the price for the subversion of truth to market dynamics and to flawed data integrity structures.
It’s time for those of us who are working on data integrity, Web3 and other future technologies to re-define data integrity and provide the infrastructure that will enable this kind of intelligence.
Much of my thinking on this topic is influenced by the Holochain project, where I have been working part-time for the last few months. I don’t think this is the only solution to the problem and encourage others to come up with different ways to think about data integrity, which is the point of this article.