Seatbelts provide safety, but what provides safety for seatbelts when they face risks in certain situations? How far down can safety be designed into systems, when safety systems may themselves need safety systems?
Technologies, or efficiency systems, do not just advance faster than safety can keep up; they also have no natural basis for safety. A human, on average, will quickly pull away from a sudden hot object touching almost any part of the body. Safety is built in, against negative and dangerous experiences, so that danger can be avoided--automatically or deliberately--without installing a safety system in one part of the body that then needs its own safety system, or having safety apply in some places but not everywhere.
Simply put, experience is the basis of safety for humans and a means of survival in society. Experience is also a source of progress in society, since laws rely on the likelihood that many people will try to avoid the negative experiences that result from breaking the law.
Laws or regulations often build on human affect or experience: laws are not written for people who would be unaffected by a penalty. So human society is not driven by intelligence alone, but also by experiences--by affect.
This is a major contrast with technology. Even with several safety systems, automobiles still do not know when they crumple. So while they try to be safe, the safety is for people, not for the vehicles themselves--which, were it possible, would have been a better end of safety, so to speak.
AI is a technology that does not exist in a physical space, where safety seems easier to design. Digital is so flexible, and of such a different phase, that large changes and adjustments are possible without the effort the physical world demands. For example, drawing on a wall versus on a PC, or driving versus driving simulations, are different from the physical in most cases.
Social media showed that digital content can be difficult to regulate, given its scale, speed, and ease of evasion. Digital systems, like other technologies, have no affect, so they experience nothing that would indicate they should not allow themselves to be used for one negative purpose or another.
AI, though useful, is already being misused, with several major stories in the last two years. Making AI safe is, before any technical hurdle, a problem of AI lacking experience, and hence affect.
How can AI have affect? How can this affect become the basis for AI alignment, such that whenever the AI is misused, it knows there is a penalty for it? And how can this be adopted as the basis for regulation, rather than the common suggestions of inspection or monitoring?
There have been several news articles about departures from OpenAI in the last year, with suggestions that the company is not doing enough for safety and alignment. While those with direct experience may be certain of what they are saying, across the industry there is still no major, model-independent safety tool against many of the known risks of AI today. This is not about guardrails or red-teaming, but about the misuses of AI--and there are not many answers, from any source, to those problems.
The core of AI safety research could be to examine how human affect works with human intelligence to produce safety, then to explore how to develop parallels for AI, converting those parameters into algorithms, to pave a way through this very difficult AI safety problem by mirroring the human mind.
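One illustrative way to read the proposal above is as an accumulating penalty signal that gates a system's behavior, loosely mirroring how negative experience shapes human behavior. The following is only a minimal sketch; the class name, threshold, and logic are hypothetical assumptions, not an existing mechanism.

```python
# A minimal sketch, assuming a hypothetical "affect" signal: a penalty
# that accumulates when misuse is detected and gates further use.
# All names and thresholds here are illustrative assumptions.

class AffectGate:
    def __init__(self, threshold=3):
        self.penalty = 0          # accumulated negative "experience"
        self.threshold = threshold

    def record(self, misuse_detected):
        # Misuse raises the penalty; safe use slowly relieves it,
        # loosely mirroring how negative experience shapes behavior.
        if misuse_detected:
            self.penalty += 1
        elif self.penalty > 0:
            self.penalty -= 1

    def allowed(self):
        # At or above the threshold, the system refuses to proceed.
        return self.penalty < self.threshold

gate = AffectGate(threshold=2)
gate.record(misuse_detected=True)
gate.record(misuse_detected=True)
print(gate.allowed())  # False: accumulated penalty blocks further use
```

The point of the sketch is only that a penalty would have to be internal to the system's operation, rather than an external monitor, for it to function as anything like affect.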
There is a recent story on TechCrunch, OpenAI loses another lead safety researcher, Lilian Weng, stating that, "Another one of OpenAI’s lead safety researchers, Lilian Weng, announced on Friday she is departing the startup. Weng served as VP of research and safety since August, and before that, was the head of the OpenAI’s safety systems team. Weng’s departure marks the latest in a long string of AI safety researchers, policy researchers, and other executives who have exited the company in the last year, and several have accused OpenAI of prioritizing commercial products over AI safety."
There is a recent feature on Quanta Magazine, Debate May Help AI Models Converge on Truth, stating that, "Let two large models debate the answer to a given question, with a simpler model (or a human) left to recognize the more accurate answer. In theory, the process allows the two agents to poke holes in each other’s arguments until the judge has enough information to discern the truth. Building trustworthy AI systems is part of a larger goal called alignment, which focuses on ensuring that an AI system has the same values and goals as its human users. Today, alignment relies on human feedback — people judging AI. But human feedback may soon be insufficient to ensure the accuracy of a system. In recent years, researchers have increasingly called for new approaches in “scalable oversight,” which is a way to ensure truth even when superhuman systems carry out tasks that humans can’t. Computer scientists have been thinking about scalable oversight for years. Debate emerged as a possible approach in 2018, before LLMs became as large and ubiquitous as they are today."
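The debate protocol described in the Quanta excerpt can be sketched as a toy loop: two agents argue for competing answers and a weaker judge picks the argument it can verify. Everything below is a placeholder--the debaters and judge are stubs standing in for models, and the names are hypothetical, not from any real library.

```python
# Toy sketch of the debate protocol: two debaters, one weaker judge.
# The functions are stubs; in practice each would be a model call.

def debater_a(question):
    # Argues for one candidate answer (stub for a large model).
    return {"answer": "4", "argument": "2 + 2 equals 4 by counting."}

def debater_b(question):
    # Argues for a competing answer (stub for a second large model).
    return {"answer": "5", "argument": "Trust me, it is 5."}

def judge(question, case_a, case_b):
    # A simpler judge (or a human) sides with the argument it can
    # verify, rather than evaluating the answer directly.
    if "counting" in case_a["argument"]:
        return case_a["answer"]
    return case_b["answer"]

def debate(question):
    return judge(question, debater_a(question), debater_b(question))

print(debate("What is 2 + 2?"))  # the judge sides with the verifiable argument
```

The design intuition, as the article notes, is that the judge never needs to be as capable as the debaters, which is what makes debate a candidate for scalable oversight.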
There is a new blogpost by character.ai, Community Safety Updates, stating that, "Moving forward, we will be rolling out a number of new safety and product features that strengthen the security of our platform without compromising the entertaining and engaging experience users have come to expect from Character.AI. These include: Changes to our models for minors (under the age of 18) that are designed to reduce the likelihood of encountering sensitive or suggestive content. Improved detection, response, and intervention related to user inputs that violate our Terms or Community Guidelines. A revised disclaimer on every chat to remind users that the AI is not a real person. Notification when a user has spent an hour-long session on the platform with additional user flexibility in progress."