I am commenting as someone who has spent a lot of time thinking about AI alignment and who is fairly convinced there is a medium probability (~65%) of doom. I hope this comment is not intrusive on this forum!
I hadn’t considered the crux to be epistemic, which is an interesting and important point.
I would be interested in an attempt to quantify how slowly humanity should be moving here: Is the appropriate level of caution comparable to that for genetic engineering, or for nuclear weapon proliferation? Should we pause until our interpretability techniques are good enough to extract the learned algorithms from AlphaFold2?
I am also interested in what evidence would convince you of the orthodox (“Bostrom-Yudkowsky”) view: what proofs or experiments would one need to observe to become convinced of that model (or similar ones)? I have found the POWER-seeking theorems, and the experiments that followed from them, especially enlightening.
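(For anyone curious, here is a minimal toy sketch of the POWER notion as I understand it. The four-state MDP, the uniform reward distribution, and all the names below are my own illustrative assumptions, not from the paper, and I am writing the definition from memory, so treat this as a sketch rather than a faithful reproduction:)

```python
import numpy as np

# Toy estimate of POWER from Turner et al., "Optimal Policies Tend to
# Seek Power". The definition, as I remember it, is roughly:
#   POWER_D(s, gamma) = (1 - gamma) / gamma * E_{R ~ D}[ V*_R(s) - R(s) ]
# i.e. the (normalized) expected optimal future value at s, averaged
# over a distribution D of reward functions.

rng = np.random.default_rng(0)
gamma = 0.9

# Deterministic MDP, state -> list of reachable next states.
# State 0 is a hub with three options; state 3 is a dead-end self-loop.
transitions = {0: [1, 2, 3], 1: [0, 1], 2: [0, 2], 3: [3]}
n_states = len(transitions)

def optimal_values(reward):
    """Value iteration for V*_R under the deterministic transitions."""
    v = np.zeros(n_states)
    for _ in range(300):  # gamma^300 is negligible, so this has converged
        v = np.array([reward[s] + gamma * max(v[t] for t in transitions[s])
                      for s in range(n_states)])
    return v

n_samples = 1000
power = np.zeros(n_states)
for _ in range(n_samples):
    reward = rng.uniform(0.0, 1.0, size=n_states)  # R ~ D, i.i.d. uniform
    power += (1 - gamma) / gamma * (optimal_values(reward) - reward)
power /= n_samples

for s in range(n_states):
    print(f"state {s}: estimated POWER = {power[s]:.3f}")
# The hub (state 0) comes out highest and the dead end (state 3) lowest:
# states that keep more options open have higher value on average.
```

The point this illustrates is the intuition behind instrumental convergence: under a broad distribution of goals, states that preserve more options are more valuable in expectation.)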
Again, thank you for writing the article.
Thanks.
Rather than asking how fast or slow we should move, I think it’s more useful to ask what preventive measures we can take, and then estimate which ones are worth their cost or delay. Merely pausing doesn’t help if we aren’t doing anything with that time. On the other hand, a long pause and/or a high cost could be worth it if there is some preventive measure we can take that would add significant safety.
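To make “worth the cost” slightly more concrete (a back-of-the-envelope framing of my own, with made-up symbols, not anything from the article): a measure looks worth taking roughly when

$$\Delta p_{\text{doom}} \cdot V \;>\; C_{\text{measure}} + C_{\text{delay}},$$

where $\Delta p_{\text{doom}}$ is the reduction in doom probability the measure buys, $V$ is the value at stake, and the right-hand side bundles the measure’s direct cost with the cost of the delay it imposes. All the hard work is in estimating those quantities, of course.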
I don’t know offhand what would raise my p(doom), except for obvious things like smaller-scale AI misbehavior (financial fraud, a cyberattack) or dramatic technological acceleration from AI (genetic engineering, nanotech).
True, I was insufficiently careful with my phrasing.