There is a fear of AI. Indeed, AI CAN be dangerous, but so are governments (causing wars), raising farm animals (disease), and using electricity (global warming), and I would argue that AI will not be different. Unlike most who fear the doom of humanity, I am suggesting an optimistic future.
Moore’s law reconsidered.
Generalized Moore’s law suggests that the capability of many things, during their growth phase, doubles every X years. This is true for computers, biology, and so on. However, it also does not go faster than that: the curve already takes self-improvement into consideration. If an AI is as good as a human, it will still take several years for it to become several times better. Moreover, this development will likely also expand human potential through genetic modification, brain-computer interfaces, and other technologies, meaning humans won’t become obsolete as quickly as people assume. We may eventually become AI, or merge with it.
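As a minimal sketch of what a fixed doubling time implies (the starting capability and the 2-year doubling time below are illustrative assumptions, not estimates):

```python
# Illustrative sketch of a generalized Moore's-law growth curve.
# The starting capability and doubling time are assumptions chosen
# for illustration, not empirical estimates.

def capability(t_years: float, c0: float = 1.0, doubling_time: float = 2.0) -> float:
    """Capability after t_years, doubling every `doubling_time` years."""
    return c0 * 2 ** (t_years / doubling_time)

# Example: an AI starting at roughly human level (c0 = 1.0) with a
# 2-year doubling time takes about 6 years to become ~8x better.
for t in (0, 2, 4, 6):
    print(t, capability(t))  # 1.0, 2.0, 4.0, 8.0
```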
Outer/inner alignment reconsidered: RLHF is one of the solutions.
Reality is messy. Human value is not a simple function, so why do we expect the AI to be one? What people trying to solve both alignment problems are essentially attempting is to build a paperclip maximizer that optimizes “human value” instead of paperclips, and then they argue that even getting a paperclip maximizer to reliably maximize paperclips would be a big advance. The reason we have many different goals is that reward only drives learning; it is not the thing we are optimizing for. We want money, housing, entertainment, and so on; if we only wanted reward (dopamine, serotonin, etc.), we would all just take drugs all day. What this means is that something like a shut-down switch would probably work. An AI that strictly pursued reward might see the switch as bad, but why assume the AI strictly pursues reward? It was never trained to prevent the shut-down switch from being pressed. Likewise, an AI forcing humans to press its reward button seems like a good strategy until you realize the AI probably learned to do whatever correlates with humans pressing the reward button, such as helping them. It was never trained to force humans to give it reward. In this example the AI strictly solves neither the outer nor the inner alignment problem, yet it works. This reward button is what RLHF is.
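As a toy illustration of the reward-button point (everything below — the action set, the simulated feedback rule, and the learning rate — is an assumption made up for this sketch, not a description of any production RLHF system), a small policy trained on human approval ends up learning the behavior that correlates with approval, because nothing in training ever rewards seizing the button itself:

```python
import math
import random

# Toy "reward button" sketch: a softmax policy over three actions is
# updated with simple REINFORCE, using simulated human approval as reward.
# Actions, rewards, and hyperparameters are illustrative assumptions.
actions = ["help_human", "do_nothing", "seize_reward_button"]
logits = [0.0, 0.0, 0.0]
lr = 0.1

def human_feedback(action: str) -> float:
    # The (simulated) human presses the reward button when helped.
    return 1.0 if action == "help_human" else 0.0

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

random.seed(0)
for step in range(2000):
    probs = softmax(logits)
    i = random.choices(range(len(actions)), weights=probs)[0]
    reward = human_feedback(actions[i])
    # REINFORCE: raise the log-probability of the sampled action
    # in proportion to the reward it received.
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * reward * grad

print({a: round(p, 3) for a, p in zip(actions, softmax(logits))})
# The policy ends up strongly preferring "help_human": it learned the
# behavior that correlates with the button being pressed, because
# grabbing the button was never itself rewarded during training.
```

A bandit this small obviously proves nothing about large systems; it only makes the correlation argument concrete.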
We’re in the early days of alignment research.
The progress we have seen with RLHF and similar techniques has come despite our being only in the early days of alignment research. We will probably get much better at it.
Alignment is part of the benefit.
Developing a smart AI that is not aligned with its developer is of no use. To monetize the result, the AI needs to be aligned, which incentivizes companies to invest in alignment by default.
We do not need perfect alignment.
While humans may not behave perfectly in alignment with the natural selection that produced us, it is important to remember that we are still one of the most evolutionarily successful species on the planet. Extrapolating from this, even if an AI is not perfectly aligned with us, it is reasonable to expect that we would come out of the deal pretty well. The doom scenario assumes that an AI with a simplistic goal will optimize away something we value, but the AI is probably messier than we assume.
There will be several AIs.
Even if only one AI out of a thousand is aligned, that is already enough for human flourishing: 1/1000 of the solar energy reaching Earth is enough to feed every human and let them live in luxury, as the rough arithmetic below suggests. Moreover, in several areas, such as developing technologies, cooperation will get us more than 1/1000th of the result. AIs would probably forego extreme solutions and cooperate on middle ground, with most values fulfilled in a nearly optimal sense. If even one solution succeeds, we win.
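A rough back-of-the-envelope check of the 1/1000-of-solar-energy claim, using round figures (roughly 173,000 TW of sunlight intercepted by Earth and on the order of 20 TW of current human primary power use; both are approximations):

```python
# Back-of-the-envelope check of the "1/1000 of solar energy" claim.
# Figures are round approximations, not precise measurements.
solar_power_at_earth_tw = 173_000   # ~sunlight intercepted by Earth, in terawatts
human_primary_power_tw = 20         # ~current human energy use, in terawatts

one_thousandth_of_solar = solar_power_at_earth_tw / 1000
print(one_thousandth_of_solar)                            # 173.0 TW
print(one_thousandth_of_solar / human_primary_power_tw)   # ~8.7x current human usage
```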
Capability is needed to win.
Even if, hypothetically, we got an AI perfectly aligned with us, it would not save us from anything unless it were powerful enough. We need alignment AND capability.
Conclusion: Do not slow down AI; keep up both alignment and capability research.
I may be biased as an optimist, but I believe even the worst-case scenario will involve thousands of AIs, with a few partially aligned with us, enough to protect us and then to let us increase our own capabilities until we can protect ourselves. Improving AI alignment tools will allow us to guide AI in the direction we want. Increasing AI capability will allow us to use AI more effectively. We need both. This is not to suggest that we should stop AI alignment research; on the contrary, we should develop even more solutions, for example reinforcement learning with direct human feedback through brain-computer interfaces. This will improve the usefulness of AI and further increase the probability that we win.
Do not fear AI. Advance capability. Be optimistic.