Data Analytics and Blockchain

Data analytics and machine learning are hugely valuable, providing insights and spurring advancements in many industries including IoT, healthcare, and financial services. Unfortunately, the data that powers these advancements is often highly sensitive. For example, medical research requires access to sensitive patient data. In many cases this data cannot be accessed or shared due to privacy concerns. This results in data silos in which data is not used for its full potential value.

Blockchain can help solve this problem, though several challenges remain. For example, one would like to use smart contracts to allow researchers to run machine learning over sensitive data without revealing the data to the researchers. This is the premise behind many exciting new blockchain applications including data markets and decentralized hedge funds. As we discussed in a previous blog post, existing blockchains such as Ethereum store all data and state publicly, which could allow any user of the network to steal the data. Unfortunately, today’s blockchain platforms cannot directly support applications that compute over sensitive data.

The Oasis platform provides confidentiality for smart contract execution through the use of secure enclaves and cryptographic techniques. In short, confidentiality ensures that sensitive data cannot be viewed or stolen when a smart contract runs on the data.

Confidentiality is an essential requirement for protecting privacy that is missing in today’s smart contract platforms. However, protecting the computation process alone is not enough: additional care must be taken to ensure the outputs of computations don’t leak sensitive information.

Privacy Risks of Data Analytics

The results of data analytics and machine learning often reveal more than intended, which can lead to privacy violations.
For example, consider a company which releases the average salary of its employees each month.

Month      Average Salary
January    $73,568
February   $74,872

Our intuition might tell us that statistical results such as averages do not reveal information about individuals. This is incorrect. Imagine you know that the company had 58 employees in January, and that the only change in staffing at the company between January and February was the hiring of your friend Bob. Based on the combined information, we can determine Bob’s exact salary: the total payroll rose from 58 × $73,568 = $4,266,944 to 59 × $74,872 = $4,417,448, and the difference, $150,504, is what Bob earns.

This simple example demonstrates a problem inherent to any statistical query on sensitive data: the results of such queries can often reveal sensitive information, even if that information (such as Bob’s salary) is not included directly in the output. This can happen accidentally, often in non-intuitive ways, and doesn’t require that the researcher is intentionally trying to learn private information.

Recent work has shown that machine learning models can also leak information. For example, a recent paper co-authored by Oasis team members demonstrates how private information such as credit card numbers can be extracted from a deep learning model trained on user data. More concerning, this leakage happens across many different types of models, parameters, and training strategies.

Anonymization is Not a Solution

The most common approach for protecting the privacy of individuals is to anonymize data before releasing it. This approach is based on the assumption that if the data contains no identifying information about individuals (names, addresses, etc.) it should be safe for release.

Unfortunately, individuals can often be identified in anonymized datasets using so-called re-identification attacks. For example, in 2006 Netflix released a dataset of anonymized customer movie reviews for a competition to train better recommendation algorithms.
Researchers demonstrated how to link the anonymized reviews with data from the Internet Movie Database, and were able to re-identify a large number of Netflix customers. Based on this result, Netflix was prevented by the FTC from launching a second round of the competition.

There are many other examples of anonymized data being used to identify specific individuals, including search logs and taxi trips. In fact, a recent study found that 87 percent of the population in the United States can be uniquely identified by just their ZIP code, gender and date of birth.

These results suggest that traditional approaches for protecting privacy are insufficient. We need a fundamentally new approach that is robust against these and other attacks.

Differential Privacy: A Formal Privacy Guarantee

Differential privacy is a formal definition of privacy. Informally, it states that the result of a computation must be similar whether or not any individual is included in the analysis. In other words, differential privacy guarantees that looking at the output there is no way to tell with certainty whether any individual appears in the data, much less to learn their actual information.

Differential privacy has several desirable properties. First, it doesn’t make any assumptions about what auxiliary information is available, so it is immune to the attacks mentioned above. Additionally, while the definition prevents information from being learned about individuals, it still allows much to be learned about populations in the data, which is the very goal of most data analytics and machine learning problems.

Unfortunately, differential privacy is merely a definition of what privacy means; it does not tell us how to achieve this property.
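To make the definition a little more concrete, here is a minimal sketch of one textbook way to achieve it, the Laplace mechanism, applied to the salary example from earlier. This is an illustration only, not the algorithms used in Chorus or in the Oasis libraries. The names `SALARY_CAP`, `differencing_attack`, `laplace_noise`, and `private_total` are hypothetical, as are the assumptions that every salary lies between 0 and a fixed cap and that the headcount is public. The parameter epsilon is the usual privacy budget: roughly, adding or removing one person may change the probability of any output by at most a factor of e^epsilon, so smaller values mean more noise and stronger privacy.

```python
import random

# Hypothetical cap on any single salary. A real system would have to choose
# such a bound and clip salaries to it before adding noise.
SALARY_CAP = 200_000


def differencing_attack(avg_before, n_before, avg_after, n_after):
    """Recover the one new salary from two exact averages and headcounts."""
    return avg_after * n_after - avg_before * n_before


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_total(total, epsilon, rng, cap=SALARY_CAP):
    """Laplace mechanism for a payroll total.

    Assumes every salary lies in [0, cap], so adding or removing one employee
    changes the total by at most `cap` (the sensitivity of the query).
    """
    return total + laplace_noise(cap / epsilon, rng)


if __name__ == "__main__":
    # Exact releases: Bob's salary falls straight out of the arithmetic.
    print(differencing_attack(73_568, 58, 74_872, 59))  # 150504

    # Noisy releases: the same subtraction now mostly measures noise.
    rng = random.Random()  # illustrative only, not a cryptographic source
    jan = private_total(73_568 * 58, epsilon=1.0, rng=rng)
    feb = private_total(74_872 * 59, epsilon=1.0, rng=rng)
    print(f"attacker's estimate of Bob's salary: {feb - jan:,.0f}")
    print(f"released February average: {feb / 59:,.0f}")
```

With these assumed settings the noise on each total has scale 200,000, which swamps Bob's individual salary, while the released average is typically off by only a few thousand dollars because the noise is spread across 59 employees. Note that publishing two noisy totals spends the privacy budget twice, and that production implementations need more careful random number generation and floating-point handling than this sketch.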
A major goal of our research over the past several years has been to develop algorithms and tools to enforce differential privacy (and other privacy-preserving techniques) for real-world problems such as data analytics and machine learning.

In previous work we developed Chorus, a modular framework for privacy-preserving data analytics. Chorus automatically enforces differential privacy for general-purpose data analytics via several state-of-the-art algorithms. Chorus has been released open-source and is currently deployed at Uber to provide privacy-preserving analytics for its analysts. In addition, we are conducting a pilot with data analysts at the Winton Group using Chorus to predict market trends from location and shopping data while protecting individual privacy.

Our research has also focused on privacy-preserving machine learning, where our work has produced practical new solutions to enforce differential privacy for machine learning tasks. We will present one of these techniques, Approximate Minima Perturbation, at IEEE Security & Privacy 2019, a top security conference. We will share more technical details of this work in a future blog post.

Privacy Primitives for Smart Contracts in Oasis

At Oasis Labs, we’re building a new platform for privacy-first cloud computing on blockchain. Our mission is to charter the next era for secure computing and enable a new wave of privacy-first applications. This requires a top-to-bottom solution: in addition to providing data confidentiality at the platform level, Oasis will provide built-in privacy primitives at the application level.

We are currently developing a set of libraries to enable developers to build privacy-first smart contracts including data analytics and machine learning. These libraries build on our extensive experience in this space, and will include the privacy-preserving techniques described above, as well as many new techniques.
The libraries will provide developers with a range of privacy-preserving building blocks such as differential privacy. Developers can use these building blocks to develop smart contracts that are privacy-preserving by design, without requiring the developer to be a security expert. This makes it easy to write applications that access sensitive data in a secure way, while providing guarantees to users that their data won’t be misused.

We look forward to sharing more announcements about this work very soon. If you are interested in building an application using these libraries, we encourage you to check out the Oasis Devnet.

Resources:
Oasis Devnet
Non-technical primer
Video tutorials
Oasis documentation on Rust smart contracts

For further information about Oasis Labs, please contact us at info@oasislabs.com.