TLDR
The ultimate goal is to align AI with the goals, values, and preferences of its users, which will likely include all of humanity. We assume that the value extraction problem will be solved and propose a possible way to implement an AI solution that optimally aligns with the individual preferences of each user. We will not directly address how, once learned, such values can be represented/encoded in computer systems for storage and processing. These assumptions free us from having to worry about safety problems with misaligned AIs, such as perverse instantiation. We conclude by analyzing the benefits and limitations of the proposed approach.
Written by roman.yampolskiy | Professor of Computer Science; AI Safety and Cybersecurity Researcher.