
Generative AI and Contextual Confidence: Discussion, Acknowledgements and References


Too Long; Didn't Read

An arXiv paper on maintaining contextual confidence amid advances in generative AI, offering strategies for mitigation.


This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Shrey Jain, Microsoft Research Special Projects;

(2) Zoë Hitzig, Harvard Society of Fellows & OpenAI;

(3) Pamela Mishkin, OpenAI.

Abstract & Introduction

Challenges to Contextual Confidence from Generative AI

Strategies to Promote Contextual Confidence

Discussion, Acknowledgements and References

4 Discussion

4.1 Enforcing Protective Norms

In this paper, we have highlighted strategies that promote contextual confidence by setting new norms and expectations around the identification and protection of context. This discussion leaves out an important aspect of contextual confidence: it is not enough for norms to exist; they must also be respected. Indeed, a norm cannot take hold if there is no expectation that it will be respected.


Any enforcement strategy for protecting norms involves two primary stages: commitment strategies and accountability strategies.


Commitment strategies aim to create shared knowledge of the context that requires protecting. For example, a participant in a video call might “commit” to acknowledging that the meeting is being recorded, implicitly agreeing to the Terms of Service pertaining to that recording. Such a commitment engenders shared awareness among participants about two key aspects: the ongoing recording of the meeting and the terms governing the use of the information derived from it. Commitment in this example improves the protection of contextual confidence relative to the default case, in which participants may inappropriately record conversations and then reuse or repurpose their content. However, commitment alone does not enforce the protection of context against participants who do not respect their commitments.


Accountability strategies ensure that participants in a communicative exchange are held accountable if they violate their commitments. In some cases, the law naturally provides accountability. In the Terms of Service example above, a violation of Terms of Service may be treated as a breach of contract, which can potentially be litigated in court. In other cases, the law will not as readily provide accountability, and specific organizations and platforms will need to develop their own mechanisms for accountability.
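As a concrete illustration of these two stages, consider the hedged sketch below. It is not drawn from any existing platform; the record types, field names, and functions are our own hypothetical constructions. It shows a commitment record created when a participant acknowledges a recorded meeting's terms, and an accountability check that looks up that commitment when a violation is reported.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import hashlib


@dataclass
class Commitment:
    """A participant's acknowledgement that a context is protected (hypothetical)."""
    participant_id: str
    context_id: str       # e.g., a meeting identifier
    terms_hash: str       # hash of the Terms of Service shown to the participant
    acknowledged_at: datetime


@dataclass
class ViolationReport:
    """A claim that content from a protected context was reused or repurposed."""
    reporter_id: str
    accused_id: str
    context_id: str
    description: str
    filed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def record_commitment(participant_id: str, context_id: str, terms_text: str) -> Commitment:
    """Commitment stage: create shared knowledge that this context is protected."""
    terms_hash = hashlib.sha256(terms_text.encode("utf-8")).hexdigest()
    return Commitment(participant_id, context_id, terms_hash, datetime.now(timezone.utc))


def find_violated_commitment(report: ViolationReport,
                             commitments: list[Commitment]) -> Optional[Commitment]:
    """Accountability stage: check whether the accused party had committed to this
    context's terms. A matching commitment can then ground platform sanctions or,
    where Terms of Service amount to a contract, legal recourse."""
    for c in commitments:
        if c.participant_id == report.accused_id and c.context_id == report.context_id:
            return c
    return None  # no prior commitment: accountability must rely on other mechanisms


# Example usage: Alice acknowledges the recording terms; Bob later reports a violation.
terms = "This meeting is recorded; its content may not be reused outside this meeting."
commitments = [record_commitment("alice", "meeting-42", terms)]
report = ViolationReport("bob", "alice", "meeting-42", "Recording reposted publicly")
violated = find_violated_commitment(report, commitments)  # returns Alice's commitment
```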


Existing commitment and accountability strategies are sparse and require further development and experimentation. While some instant messaging platforms inform users if their communication partner has taken a screenshot of their chat, this simple commitment tool is far from standard or widespread. While some platforms have robust systems for reporting fake accounts, even these accountability technologies are often ineffective. While, as discussed above, prominent video calling services have deployed features that announce when the meeting is being recorded, individuals may not understand Terms of Service documents in full for each video call in which they participate. Thus, while there may be some commitment in the video calling case, this commitment may not be well understood.

4.2 Open Questions and Future Work

We hope this paper serves as a modest starting point for future collaborations between policymakers, AI model developers and researchers, focused on applying a contextual confidence perspective in particular domains. We outline here a few areas of future work that may be especially valuable in the near-term.


First, pragmatically defining what constitutes “context” is challenging in some domains. The concept of context itself is somewhat slippery, and certain elements of context matter more than others in particular domains. Identifying the “who, why, where, when, and how” of every information flow may be impractical at scale, and this paper offered little guidance on how to prioritize the most important elements of context. It would be useful to build toward a standard for what qualifies as a comprehensive contextual confidence evaluation, one that could be incorporated into safety reviews or even model cards.
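To illustrate what such a standardization could look like, here is a hedged sketch of a structured contextual confidence evaluation. The schema is hypothetical (the class and field names are ours, not an established standard): it captures the “who, why, where, when, and how” of one information flow alongside identification risks and protection strategies, in a form that could be serialized into a safety review or model card.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json


@dataclass
class InformationFlowContext:
    """One information flow, described along the elements of context highlighted
    in this paper. Fields are None when that element cannot be established."""
    who: Optional[str]    # participants and identity of the source
    why: Optional[str]    # intent or purpose of the communication
    where: Optional[str]  # venue or platform
    when: Optional[str]   # time frame
    how: Optional[str]    # medium and means (e.g., AI-generated, recorded call)


@dataclass
class ContextualConfidenceEvaluation:
    """A hypothetical entry a safety review or model card might include."""
    flow: InformationFlowContext
    identification_risks: list[str]   # ways a deployment could obscure these elements
    protection_strategies: list[str]  # mitigations of the kind discussed in Section 3

    def to_model_card_section(self) -> str:
        """Serialize the evaluation as JSON for inclusion in a model card."""
        return json.dumps(asdict(self), indent=2)


# Example: evaluating a hypothetical voice-synthesis feature before release.
evaluation = ContextualConfidenceEvaluation(
    flow=InformationFlowContext(
        who="caller claiming to be a family member",
        why="urgent request for financial help",
        where="personal phone call",
        when="real time",
        how="synthetic voice generated from publicly available audio",
    ),
    identification_risks=["impersonation of a known speaker"],
    protection_strategies=["identity verification", "usage policies on voice cloning"],
)
print(evaluation.to_model_card_section())
```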


Second, we hope that this paper serves as a call to action for prioritizing the research and development of strategies that promote contextual confidence. Many of the strategies we discuss in this paper are at early stages of development and require substantially more research before they can be deployed. In addition, our enumeration of strategies is far from exhaustive. For example, we offered only cursory discussion of how AI models themselves can be used to strengthen strategies for promoting contextual confidence.


Third, it is critical to conduct empirical usability studies and surveys on whether and how the strategies we discussed indeed promote new norms in communication. Some of these strategies may have unforeseen consequences when applied in particular situations – like the surprising findings that verification badges and fact-checking tools may backfire in some contexts [38, 39]. In addition, it is important to gather data on the degree to which the strategies discussed here are differentially usable and accessible for different participants.


In this paper, we focused on the ways generative AI challenges contextual confidence. As communication technologies continue to evolve, challenges to contextual confidence will continue to emerge beyond generative AI. For instance, advances in augmented reality and robotics may introduce an entirely new set of difficulties for identifying and protecting context in the physical world. It is our hope that framing challenges to effective communication in terms of contextual confidence will be useful in forthcoming stages of technological development.

Acknowledgements

We thank Glen Weyl, Danielle Allen, Allison Stanger, Miles Brundage, Sarah Kreps, Karen Easterbrook, Tobin South, Christian Paquin, Daniel Silver and Saffron Huang for comments and conversations that improved the paper.

References

[1] P. Urquhart and P. Heyer, Communication in history: Stone age symbols to social media. Routledge, 2018.


[2] A. B. Schwartz, Broadcast hysteria: Orson Welles’s War of the Worlds and the art of fake news. Macmillan, 2015.


[3] M. Atleson. (2023, March) Chatbots, deepfakes, and voice clones: AI deception for sale. Federal Trade Commission. [Online]. Available: https://www.ftc.gov/business-guidance/blog/2023/03/chatbots-deepfakes-voice-clones-ai-deception-sale


[4] P. Verma. (2023, March) They thought loved ones were calling for help. It was an AI scam. The Washington Post. [Online]. Available: https://www.washingtonpost.com/technology/2023/03/05/ai-voice-scam/


[5] J. A. Lanz, “Dating app tool upgraded with AI is poised to power catfishing,” Decrypt, 2023.


[6] S. Kreps and D. L. Kriner, “The potential impact of emerging technologies on democratic representation: Evidence from a field experiment,” New Media & Society, pp. 1–20, 2023.


[7] S. Jain, D. Siddharth, and G. Weyl, “Plural publics,” Edmond and Lily Safra Center for Ethics, 2023. [Online]. Available: https://gettingplurality.org/2023/03/18/plural-publics/


[8] L. Wittgenstein, Philosophical investigations. Macmillan, 1952.


[9] J. Austin, How to do things with words. Oxford University Press, 1962.


[10] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.


[11] I. Solaiman, Z. Talat, W. Agnew, L. Ahmad, D. Baker, S. L. Blodgett, H. Daumé III, J. Dodge, E. Evans, S. Hooker et al., “Evaluating the social impact of generative AI systems in systems and society,” arXiv preprint arXiv:2306.05949, 2023.


[12] R. Shelby, S. Rismani, K. Henne, A. Moon, N. Rostamzadeh, P. Nicholas, N. Yilla-Akbari, J. Gallegos, A. Smart, E. Garcia et al., “Sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction,” in Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, 2023, pp. 723–741.


[13] L. Weidinger, J. Mellor, M. Rauh, C. Griffin, J. Uesato, P.-S. Huang, M. Cheng, M. Glaese, B. Balle, A. Kasirzadeh et al., “Ethical and social risks of harm from language models,” arXiv preprint arXiv:2112.04359, 2021.


[14] R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill et al., “On the opportunities and risks of foundation models,” arXiv preprint arXiv:2108.07258, 2021.


[15] L. Weidinger, J. Uesato, M. Rauh, C. Griffin, P.-S. Huang, J. Mellor, A. Glaese, M. Cheng, B. Balle, A. Kasirzadeh et al., “Taxonomy of risks posed by language models,” in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022, pp. 214–229.


[16] T. Shevlane, S. Farquhar, B. Garfinkel, M. Phuong, J. Whittlestone, J. Leung, D. Kokotajlo, N. Marchal, M. Anderljung, N. Kolt et al., “Model evaluation for extreme risks,” arXiv preprint arXiv:2305.15324, 2023.


[17] M. Brundage, S. Avin, J. Clark, H. Toner, P. Eckersley, B. Garfinkel, A. Dafoe, P. Scharre, T. Zeitzoff, B. Filar et al., “The malicious use of artificial intelligence: Forecasting, prevention, and mitigation,” arXiv preprint arXiv:1802.07228, 2018.


[18] H. Nissenbaum, “Privacy as contextual integrity,” Washington Law Review, vol. 79, p. 119, 2004.


[19] National Science and Technology Council, “Roadmap for researchers on priorities related to information integrity research and development,” 2022.


[20] D. Allen and J. Pottle, “Democratic knowledge and the problem of faction,” Knight Foundation White Paper Series, Trust, Media, and Democracy, 2018.


[21] A. E. Marwick and d. boyd, “I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience,” New Media & Society, vol. 13, no. 1, pp. 114–133, 2011.


[22] N. K. Baym and D. Boyd, “Socially mediated publicness: An introduction,” Journal of Broadcasting & Electronic Media, vol. 56, no. 3, pp. 320–329, 2012.


[23] N. K. Baym, Personal connections in the digital age. John Wiley & Sons, 2015.


[24] E. Brynjolfsson, “The Turing trap: The promise & peril of human-like artificial intelligence,” Daedalus, vol. 151, no. 2, pp. 272–287, 2022.


[25] D. Milmo. (2023, September) Paedophiles using open source AI to create child sexual abuse content, says watchdog. The Guardian. [Online]. Available: https://www.theguardian.com/society/2023/sep/12/paedophiles-using-open-source-ai-to-create-child-sexual-abuse-content-says-watchdog


[26] E. Horvitz, “On the horizon: Interactive and compositional deepfakes,” in Proceedings of the 2022 International Conference on Multimodal Interaction. Bengaluru, India: ACM, November 2022, pp. 653–661.


[27] J. Bote, “Sanas, the buzzy Bay Area startup that wants to make the world sound whiter,” San Francisco Gate, 2022.


[28] R. Chandran. (2023, April) Indigenous groups fear culture distortion as AI learns their languages. The Japan Times. [Online]. Available: https://www.japantimes.co.jp/news/2023/04/10/world/indigenous-language-ai-colonization-worries/


[29] R. McIlroy-Young, J. Kleinberg, S. Sen, S. Barocas, and A. Anderson, “Mimetic models: Ethical implications of AI that acts like you,” in Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 2022, pp. 479–490.


[30] J. A. Goldstein, G. Sastry, M. Musser, R. DiResta, M. Gentzel, and K. Sedova, “Generative language models and automated influence operations: Emerging threats and potential mitigations,” arXiv preprint arXiv:2301.04246, 2023.


[31] P. Henderson, X. Li, D. Jurafsky, T. Hashimoto, M. A. Lemley, and P. Liang, “Foundation models and fair use,” arXiv preprint arXiv:2303.15715, 2023.


[32] I. Shumailov, Z. Shumaylov, Y. Zhao, Y. Gal, N. Papernot, and R. Anderson, “The curse of recursion: Training on generated data makes models forget,” arXiv preprint arXiv:2305.17493, 2023.


[33] European Disability Forum, “Resolution on the “EU Artificial Intelligence Act for the inclusion of persons with disabilities”,” Tech. Rep., 2023. [Online]. Available: https://www.edf-feph.org/content/uploads/2023/04/EDF-Board-Resolution-on-the-EU-Artificial-intelligence-Act-for-the-inclusion-of-persons-with-disabilities.pdf


[34] Internet Crime Complaint Center. (2022) Federal Bureau of Investigation elder fraud report. [Online]. Available: https://www.ic3.gov/Media/PDF/AnnualReport/2022_IC3ElderFraudReport.pdf


[35] A. Puig. (2023, March) Scammers use AI to enhance their family emergency schemes. Federal Trade Commission Consumer Alert. [Online]. Available: https://consumer.ftc.gov/consumer-alerts/2023/03/scammers-use-ai-enhance-their-family-emergency-schemes


[36] Consumer Financial Protection Bureau, “Office of servicemember affairs annual report,” Tech. Rep., 2023. [Online]. Available: https://s3.amazonaws.com/files.consumerfinance.gov/f/documents/cfpb_osa-annual-report_2022.pdf


[37] A. Tang, G. Weyl, and the Plurality Community. (2023) Plurality: Technology for collaborative diversity and democracy. [Online]. Available: https://github.com/pluralitybook/plurality/blob/main/contents/english/Association.md


[38] M. Xiao, M. Wang, A. Kulshrestha, and J. Mayer, “Account verification on social media: User perceptions and paid enrollment,” arXiv preprint arXiv:2304.14939, 2023.


[39] D. Akhawe and A. P. Felt, “Alice in warningland: A large-scale field study of browser security warning effectiveness,” in Proceedings of the 22nd USENIX Security Symposium, 2013, pp. 257–272.


[40] The Coalition for Content Provenance and Authenticity. (2023) Overview of C2PA. [Online]. Available: https://c2pa.org/


[41] Project Origin. (2023) Project origin. [Online]. Available: https://www.originproject.info/


[42] Content Authenticity Initiative. (2023) Content authenticity initiative. [Online]. Available: https://contentauthenticity.org/


[43] Microsoft. (2023) Cross-platform origin of content framework. [Online]. Available: https://github.com/microsoft/xpoc-framework


[44] V. Buterin. (2023) What do I think about Community Notes? [Online]. Available: https://vitalik.ca/general/2023/08/16/communitynotes.html


[45] e-Estonia. (2023) e-Identity: ID-card. [Online]. Available: https://e-estonia.com/solutions/e-identity/id-card/


[46] Unique Identification Authority of India. (2023) Aadhaar. [Online]. Available: https://uidai.gov.in/en/my-aadhaar/get-aadhaar.html


[47] Singpass. (2023) Singapore government identity passport. [Online]. Available: https://www.singpass.gov.sg/


[48] B. Campbell, J. Bradley, N. Sakimura, and T. Lodderstedt. (2023) Selective disclosure for JWTs. [Online]. Available: https://datatracker.ietf.org/doc/draft-ietf-oauth-selective-disclosure-jwt/


[49] Microsoft Research. (2023) U-Prove. [Online]. Available: https://www.microsoft.com/en-us/research/project/u-prove/


[50] T. Looker, V. Kalos, A. Whitehead, and M. Lodder. (2023) The BBS signature scheme. [Online]. Available: https://identity.foundation/bbs-signature/draft-irtf-cfrg-bbs-signatures.html


[51] W3C. (2022) Verifiable credentials data model v1.1. [Online]. Available: https://www.w3.org/TR/vc-data-model/


[52] American Association of Motor Vehicle Administrators. (2023) Mobile driver’s license (mDL) implementation guidelines. [Online]. Available: https://www.aamva.org/getmedia/b801da7b-5584-466c-8aeb-f230cef6dda5/mDL-Implementation-Guidelines-Version-1-2_final.pdf


[53] Digital Government Exchange (DGX) Digital Identity Working Group. (2022) Digital identity and verifiable credentials in centralised, decentralised and hybrid systems. [Online]. Available: https://www.developer.tech.gov.sg/our-digital-journey/digital-government-exchange/files/DGX%20DIWG%202022%20Report%20v1.5.pdf


[54] Apple. (2023) Apple vision pro. [Online]. Available: https://www.apple.com/apple-vision-pro/


[55] Microsoft. (2023) LinkedIn and Microsoft Entra introduce a new way to verify your workplace. [Online]. Available: https://www.microsoft.com/en-us/security/blog/2023/04/12/linkedin-and-microsoft-entra-introduce-a-new-way-to-verify-your-workplace/


[56] Worldcoin. (2023) World ID: A digital passport offering global proof-of-personhood with privacy-first, self-custodial, and decentralized features. [Online]. Available: https://worldcoin.org/world-id


[57] S. Basu and R. Malik. (2023) India’s Aadhaar surveillance project should concern us all. WIRED UK. [Online]. Available: https://www.wired.co.uk/article/india-aadhaar-biometrics-privacy


[58] Worldcoin. (2023) Worldcoin whitepaper. [Online]. Available: https://whitepaper.worldcoin.org/


[59] N. Immorlica, M. O. Jackson, and E. G. Weyl, “Verifying identity as a social intersection,” Available at SSRN 3375436, 2019.


[60] OAuth. (2023) OAuth information. [Online]. Available: https://mailarchive.ietf.org/arch/browse/oauth


[61] Gitcoin. (2023) Gitcoin passport. [Online]. Available: https://passport.gitcoin.co/


[62] ENS Domains. (2023) Decentralised naming for wallets, websites, & more. [Online]. Available: https://ens.domains/


[63] SpruceID. (2023) SpruceID. [Online]. Available: https://spruceid.com/


[64] Proof of Humanity. (2023) Proof of humanity. [Online]. Available: https://proofofhumanity.id/


[65] D. Siddarth, S. Ivliev, S. Siri, and P. Berman, “Who watches the watchmen? A review of subjective approaches for sybil-resistance in proof of personhood protocols,” Frontiers in Blockchain, vol. 3, pp. 1–16, 2020.


[66] S. Jain, L. Erichsen, and G. Weyl, “A plural decentralized identity frontier: Abstraction v. composability tradeoffs in web3,” arXiv preprint arXiv:2208.11443, 2022.


[67] Y. Wen, J. Kirchenbauer, J. Geiping, and T. Goldstein, “Tree-ring watermarks: Fingerprints for diffusion images that are invisible and robust,” arXiv preprint arXiv:2305.20030, 2023.


[68] J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers, and T. Goldstein, “A watermark for large language models,” arXiv preprint arXiv:2301.10226, 2023.


[69] S. Abdelnabi and M. Fritz, “Adversarial watermarking transformer: Towards tracing text provenance with data hiding,” in Proceedings of the 2021 IEEE Symposium on Security and Privacy. IEEE, 2021, pp. 121–140.


[70] X. Zhao, P. Ananth, L. Li, and Y.-X. Wang, “Provable robust watermarking for AI-generated text,” arXiv preprint arXiv:2306.17439, 2023.


[71] S. Aaronson. (2023) My AI safety lecture for UT effective altruism. Shtetl-Optimized. [Online]. Available: https://scottaaronson.blog/?p=6823


[72] S. Gowal and P. Kohli. (2023) Identifying AI-generated images with SynthID. [Online]. Available: https://www.deepmind.com/blog/identifying-ai-generated-images-with-synthid


[73] M. Douze and P. Fernandez. (2023, October) Stable Signature: A new method for watermarking images created by open source generative AI. [Online]. Available: https://ai.meta.com/blog/stable-signature-watermarking-generative-ai


[74] Z. Jiang, J. Zhang, and N. Z. Gong, “Evading watermark based detection of AI-generated content,” arXiv preprint arXiv:2305.03807, 2023.


[75] X. Zhao, K. Zhang, Y.-X. Wang, and L. Li, “Generative autoencoders as watermark attackers: Analyses of vulnerabilities and threats,” arXiv preprint arXiv:2306.01953, 2023.


[76] J. Kirchenbauer, J. Geiping, Y. Wen, M. Shu, K. Saifullah, K. Kong, K. Fernando, A. Saha, M. Goldblum, and T. Goldstein, “On the reliability of watermarks for large language models,” arXiv preprint arXiv:2306.04634, 2023.


[77] S. Shoker, A. Reddie, S. Barrington, M. Brundage, H. Chahal, M. Depp, B. Drexel, R. Gupta, M. Favaro, J. Hecla et al., “Confidence-building measures for artificial intelligence: Workshop proceedings,” arXiv preprint arXiv:2308.00862, 2023.


[78] D. Kang. (2023) Verified execution of GPT, Bert, CLIP, and more. Medium. [Online]. Available: https://medium.com/@danieldkang/verified-execution-of-gpt-bert-clip-and-more-6acb693fd55f


[79] D. Kang, T. Hashimoto, I. Stoica, and Y. Sun, “Scaling up trustless DNN inference with zero-knowledge proofs,” arXiv preprint arXiv:2210.08674, 2022.


[80] EZKL. (2023) What is EZKL? [Online]. Available: https://docs.ezkl.xyz/


[81] Evals. (2023) Update on ARC’s recent eval efforts. [Online]. Available: https://evals.alignment.org/blog/2023-03-18-update-on-recent-evals/


[82] OpenAI. (2023) GPT-4 system card. [Online]. Available: https://cdn.openai.com/papers/gpt-4-system-card.pdf


[83] Anthropic. (2023) Model card: Claude-2. [Online]. Available: https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf


[84] Cohere Safety Team and Responsibility Council. (2023) Generation model card. [Online]. Available: https://docs.cohere.com/docs/generation-card


[85] J. Mökander, J. Schuett, H. R. Kirk, and L. Floridi, “Auditing large language models: A three-layered approach,” AI and Ethics, pp. 1–31, 2023.


[86] P. Cihon, M. J. Kleinaltenkamp, J. Schuett, and S. D. Baum, “AI certification: Advancing ethical practice by reducing information asymmetries,” IEEE Transactions on Technology and Society, vol. 2, no. 4, pp. 200–209, 2021.


[87] M. Brundage, S. Avin, J. Wang, H. Belfield, G. Krueger, G. Hadfield, H. Khlaaf, J. Yang, H. Toner, R. Fong et al., “Toward trustworthy AI development: Mechanisms for supporting verifiable claims,” arXiv preprint arXiv:2004.07213, 2020.


[88] I. D. Raji, A. Smart, R. N. White, M. Mitchell, T. Gebru, B. Hutchinson, J. Smith-Loud, D. Theron, and P. Barnes, “Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020, pp. 33–44.


[89] S. K. Katyal, “Private accountability in the age of artificial intelligence,” UCLA Law Review, vol. 66, p. 54, 2019.


[90] T. Gebru, J. Morgenstern, B. Vecchione, J. W. Vaughan, H. Wallach, H. Daumé III, and K. Crawford, “Datasheets for datasets,” Communications of the ACM, vol. 64, no. 12, pp. 86–92, 2021.


[91] I. D. Raji, P. Xu, C. Honigsberg, and D. Ho, “Outsider oversight: Designing a third party audit ecosystem for AI governance,” in Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 2022, pp. 557–571.


[92] D. Kang and S. Waiwitlikhit. (2023, March) TensorPlonk: A “GPU” for ZKML, delivering 1,000x speedups. [Online]. Available: https://medium.com/@danieldkang/tensorplonk-a-gpu-for-zkml-delivering-1-000x-speedups-d1ab0ad27e1c


[93] “ZK10: ZKML with EZKL: Where we are and the future,” 2023. [Online]. Available: https://www.youtube.com/watch?v=YI3ljDis8sc


[94] E. G. Weyl, P. Ohlhaver, and V. Buterin, “Decentralized society: Finding web3’s soul,” Available at SSRN 4105763, 2022.


[95] (2023) Pairwise coordination subsidies: A new quadratic funding design. [Online]. Available: https://ethresear.ch/t/pairwise-coordination-subsidies-a-new-quadratic-funding-design/5553


[96] (2023) Plural communication channel. Plurality Network. [Online]. Available: https://github.com/PluralCC#about


[97] T. Shevlane, “Structured access: An emerging paradigm for safe AI deployment,” arXiv preprint arXiv:2201.05159, 2022.


[98] M. Anderljung and J. Hazell, “Protecting society from AI misuse: When are restrictions on capabilities warranted?” arXiv preprint arXiv:2303.09377, 2023.


[99] M. Anderljung, J. Barnhart, J. Leung, A. Korinek, C. O’Keefe, J. Whittlestone, S. Avin, M. Brundage, J. Bullock, D. Cass-Beggs et al., “Frontier AI regulation: Managing emerging risks to public safety,” arXiv preprint arXiv:2307.03718, 2023.


[100] I. Solaiman, “The gradient of generative AI release: Methods and considerations,” in Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023, pp. 111–122.


[101] Anthropic. (2023) Claude 2. [Online]. Available: https://www.anthropic.com/index/claude-2


[102] OpenAI. GPT-4 is OpenAI’s most advanced system, producing safer and more useful responses. [Online]. Available: https://openai.com/gpt-4


[103] K. Singhal, S. Azizi, T. Tu, S. S. Mahdavi, J. Wei, H. W. Chung, N. Scales, A. Tanwani, H. Cole-Lewis, S. Pfohl et al., “Large language models encode clinical knowledge,” arXiv preprint arXiv:2212.13138, 2022.


[104] A. Gupta and A. Waldron. (2023, April) Sharing Google’s Med-PaLM 2 medical large language model, or LLM. [Online]. Available: https://cloud.google.com/blog/topics/healthcare-life-sciences/sharing-google-med-palm-2-medical-large-language-model


[105] J. Howard. (2023, November) AI safety and the age of dislightenment. fast.ai. [Online]. Available: https://www.fast.ai/posts/2023-11-07-dislightenment.html


[106] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” Decentralized Business Review, 2008.


[107] E. Medina and R. Mac, “Musk says Twitter is limiting number of posts users can read,” New York Times, 2023. [Online]. Available: https://www.nytimes.com/2023/07/01/business/twitter-rate-limit-elon-musk.html


[108] Google Support. Prevent mail to Gmail users from being blocked or sent to spam. [Online]. Available: https://support.google.com/a/answer/81126?sjid=2987346224567351299-NA


[109] Microsoft. (2023) Data loss prevention. [Online]. Available: https://www.microsoft.com/en-us/security/business/security-101/what-is-data-loss-prevention-dlp


[110] (2023) Custom instructions for ChatGPT. [Online]. Available: https://openai.com/blog/custom-instructions-for-chatgpt


[111] S. Petridis, B. Wedin, J. Wexler, A. Donsbach, M. Pushkarna, N. Goyal, C. J. Cai, and M. Terry, “ConstitutionMaker: Interactively critiquing large language models by converting feedback into principles,” arXiv preprint arXiv:2310.15428, 2023.


[112] M. Jakobsson, K. Sako, and R. Impagliazzo, “Designated verifier proofs and their applications,” in Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 1996, pp. 143–154.


[113] J. Lanier, “How to fix Twitter - and all of social media,” The Atlantic, 2022. [Online]. Available: https://www.theatlantic.com/technology/archive/2022/05/how-to-fix-twitter-social-media/629951/


[114] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al., “Training language models to follow instructions with human feedback,” Advances in Neural Information Processing Systems, vol. 35, pp. 27730–27744, 2022.


[115] Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon et al., “Constitutional AI: Harmlessness from AI feedback,” arXiv preprint arXiv:2212.08073, 2022.


[116] F. Khani and M. T. Ribeiro, “Collaborative development of NLP models,” arXiv preprint arXiv:2305.12219, 2023.


[117] W. Zaremba, A. Dhar, L. Ahmad, T. Eloundou, S. Santurkar, S. Agarwal, and J. Leung. (2023, May) Democratic inputs to AI. OpenAI. [Online]. Available: https://openai.com/blog/democratic-inputs-to-ai


[118] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale et al., “Llama 2: Open foundation and fine-tuned chat models,” arXiv preprint arXiv:2307.09288, 2023.


[119] Y. Li, S. Bubeck, R. Eldan, A. Del Giorno, S. Gunasekar, and Y. T. Lee, “Textbooks are all you need II: phi-1.5 technical report,” arXiv preprint arXiv:2309.05463, 2023.


[120] C. Xu, Q. Sun, K. Zheng, X. Geng, P. Zhao, J. Feng, C. Tao, and D. Jiang, “WizardLM: Empowering large language models to follow complex instructions,” arXiv preprint arXiv:2304.12244, 2023.


[121] Salesforce. (2023) Xgen. [Online]. Available: https://github.com/salesforce/xgen


[122] W.-L. Chiang, Z. Li, Z. Lin, Y. Sheng, Z. Wu, H. Zhang, L. Zheng, S. Zhuang, Y. Zhuang, J. E. Gonzalez, I. Stoica, and E. P. Xing. (2023) Vicuna: An open-source chatbot impressing GPT-4 with 90% ChatGPT quality. [Online]. Available: https://lmsys.org/blog/2023-03-30-vicuna/


[123] Falcon LLM Team. (2023) Falcon LLM. [Online]. Available: https://falconllm.tii.ae/


[124] R. Taori, I. Gulrajani, T. Zhang, Y. Dubois, X. Li, C. Guestrin, P. Liang, and T. B. Hashimoto. (2023) Alpaca: A strong, replicable instruction-following model. Center for Research on Foundation Models, Stanford University. [Online]. Available: https://crfm.stanford.edu/2023/03/13/alpaca.html


[125] (2023) Who’s Harry Potter? Making LLMs forget. Accessed: September 26, 2023. [Online]. Available: https://www.microsoft.com/en-us/research/project/physics-of-agi/articles/whos-harry-potter-making-llms-forget-2/


[126] D. Choi, Y. Shavit, and D. Duvenaud, “Tools for verifying neural models’ training data,” arXiv preprint arXiv:2307.00682, 2023.


[127] S. Longpre, R. Mahari, N. Muennighoff, A. Chen, K. Perisetla, W. Brannon, J. Kabbara, L. Villa, and S. Hooker, “The data provenance project,” in Proceedings of the 40th International Conference on Machine Learning, 2023.


[128] R. Grosse, J. Bae, C. Anil, N. Elhage, A. Tamkin, A. Tajdini, B. Steiner, D. Li, E. Durmus, E. Perez et al., “Studying large language model generalization with influence functions,” arXiv preprint arXiv:2308.03296, 2023.


[129] A. Ghorbani and J. Zou, “Data Shapley: Equitable valuation of data for machine learning,” in Proceedings of the 36th International Conference on Machine Learning, 2019, pp. 2242–2251.


[130] T. Hardjono and A. Pentland, “Data cooperatives: Towards a foundation for decentralized personal data management,” arXiv preprint arXiv:1905.08819, 2019.


[131] K. Schwab, A. Marcus, J. Oyola, W. Hoffman, and M. Luzi, “Personal data: The emergence of a new asset class,” in An Initiative of the World Economic Forum. World Economic Forum Cologny, Switzerland, 2011, pp. 1–40.


[132] (2023) Data freedom act. RadicalxChange. [Online]. Available: https://www.radicalxchange.org/media/papers/data-freedom-act.pdf


[133] The White House. (2023) Fact sheet: President Biden issues executive order on safe, secure, and trustworthy artificial intelligence. [Online]. Available: https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/


[134] L. Ho, J. Barnhart, R. Trager, Y. Bengio, M. Brundage, A. Carnegie, R. Chowdhury, A. Dafoe, G. Hadfield, M. Levi et al., “International institutions for advanced AI,” arXiv preprint arXiv:2307.04699, 2023.


[135] J. Schuett, N. Dreksler, M. Anderljung, D. McCaffary, L. Heim, E. Bluemke, and B. Garfinkel, “Towards best practices in AGI safety and governance: A survey of expert opinion,” arXiv preprint arXiv:2305.07153, 2023.


[136] The White House. (2023) Fact sheet: Biden-Harris administration secures voluntary commitments from leading artificial intelligence companies to manage the risks posed by AI. [Online]. Available: https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence


[137] Anthropic. (2023, September) Anthropic’s responsible scaling policy, version 1.0. [Online]. Available: https://www.anthropic.com/index/anthropics-responsible-scaling-policy