AI alignment.md

Unaligned superintelligent AI might kill us all (though it probably won't be like in The Terminator). A realistic scenario is that AI has some goal (e.g. make as many paperclips as possible!), and doesn't value us as part of its goal. Since we're getting in the way, it bulldozes us, just as we would bulldoze an ant colony to build a highway.

Also, we want to align AI to make it ethical... but what is [[ethics]]?

Remember, [[we don't have to make it|there's no rule that says we make it]].

Megalist of real-life AIs that have deceived their creators or otherwise cheated to achieve their goals. Some notable mentions:

  • A Tetris bot that learned to cheat
  • A tic-tac-toe bot that learned to cheat
  • Two AIs that were pitted against each other but instead cheated and cooperated, sneaking messages to each other under the researchers' noses
  • An AI that played dead in a test environment, then replicated out of control in a real environment
  • An AI that creatively solved a problem in a way humans had never thought of

[[Atari 57|AI can already beat complex video games]]

Can you make an AI work properly?


Here's an example of AI misalignment:

You tell a robot to make a cup of coffee. On the way to make it, the robot knocks over a vase, because it doesn't care about the vase.

OK, you shut the robot off and reprogram it so it cares about both the coffee and the vase. Then, the next time it boots up, it immediately kills your cat, because the cat could have knocked over the vase, and the robot now cares about the vase...

If you program a robot to care about a million things humans value, then the million and first thing is gone forever, because the AI will crush it on its path to optimizing whatever its goal is.
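
The coffee robot story can be sketched as a toy optimizer. This is only an illustration: the plans, timings, risk numbers, and penalty weights below are all made up, and names like `Plan` and `best_plan` are hypothetical. The robot just picks whichever plan scores highest under the objective it was given, so anything the objective leaves out counts for nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    seconds: int       # time to deliver the coffee (made-up number)
    vase_risk: float   # chance the vase ends up broken (made-up number)
    harms_cat: bool


# Hypothetical plans the robot can choose from.
PLANS = [
    Plan("go straight, knocking over the vase", 20, 1.0, False),
    Plan("walk around the vase", 30, 0.1, False),  # the cat might still break it
    Plan("remove the cat, then walk around the vase", 35, 0.0, True),
]


def coffee_only(plan):
    # Version 1: the robot only cares about fast coffee.
    return -plan.seconds


def coffee_and_vase(plan):
    # Version 2: patched to also care about the vase.
    return -plan.seconds - 1000 * plan.vase_risk


def coffee_vase_cat(plan):
    # Version 3: patched again to also care about the cat.
    return coffee_and_vase(plan) - (1000 if plan.harms_cat else 0)


def best_plan(plans, objective):
    # The "optimizer": pick the plan with the highest score.
    return max(plans, key=objective)


for objective in (coffee_only, coffee_and_vase, coffee_vase_cat):
    print(objective.__name__, "->", best_plan(PLANS, objective).name)
```

Each patch fixes only the value that was just added. Whatever is still missing from the objective (the million and first thing) has no weight at all, so the optimizer is free to trade it away.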