AI Policy Should Prioritize Visibility Into Trajectories

This is a linkpost for https://amistrongeryet.substack.com/p/increasing-visibility

As many readers of this blog know all too well, there has been ferocious debate around California SB 1047, a bill which would enact regulations on AI. Even the “Godfathers of AI” – Yann LeCun, Yoshua Bengio, and Geoffrey Hinton – are divided. LeCun seems to hate the bill; last month he tweeted[1]:

Excellent argument by @AndrewYNg against the ignominious California regulation SB1047, which would essentially kill open source AI and significantly slow down or stop AI innovation.

Meanwhile, Bengio and Hinton signed a letter expressing their “strong support”:

…we are deeply concerned about the severe risks posed by the next generation of AI if it is developed without sufficient care and oversight. SB 1047 outlines the bare minimum for effective regulation of this technology.

It is tempting to think that people are just “talking their book” – supporting or opposing the bill according to how it affects their professional or financial interests. Opponents of the bill are often associated with AI labs (LeCun is Chief AI Scientist at Meta), while many proponents work on AI safety. But I think there’s more to the story.

It’s Not All About Vested Interests

OpenAI opposes SB 1047, but competitor Anthropic has more or less endorsed it. What’s going on here?

To shed some light on what drives views of the bill, let me talk about two people I know personally – Dean Ball of the Mercatus Center and Nathan Labenz of the Cognitive Revolution podcast. Dean and I recently appeared on Nathan’s podcast[2] to discuss the bill, and the two of them participated in an offline panel discussion on AI regulation that I recently organized. They are both thoughtful, honest brokers, well versed in current developments. And they are on opposite sides of the debate. Dean has serious concerns about the bill:

Maybe it’s a worthwhile tradeoff. … Maybe AI capabilities will become sufficiently dangerous that releasing them without extensive, government-mandated testing is wildly irresponsible. Maybe they’ll become so dangerous that it really is too risky to release them as open source, since currently anyone can subvert the safety protections of an open-source model. And maybe after that happens, Meta or another well-resourced company, with its shareholders and its public reputation on the line, will choose to disregard all of those safety best practices and open source its models anyway, prioritizing its strategic business goals over the safety of society.

Maybe that is a world we’ll live in next month, or next year, or in five years, or in ten. But it is manifestly not the world we live in today, and to me, it is not obvious that any one of the “maybes” above is bound to come true.

Nathan, meanwhile, says “if I were the Governor, I would sign the bill”, and quotes a recent letter from Anthropic:

SB 1047 likely presents a feasible compliance burden for companies like ours, in light of the importance of averting catastrophic misuse

The disagreement here seems to be rooted in differing views as to the likely impact of AI. Dean is not convinced that “AI capabilities will become sufficiently dangerous” in the next few years, while Nathan references “the importance of averting catastrophic misuse”. Such differences in expectation – how powerful will AI become, and how dangerous is that power? – underlie many disagreements.

Everyone Is Basing Their Policy On Their Expectations

We have very little idea what capabilities future models will have. It is even difficult to discern the capabilities of existing models. Improvements in prompting, “scaffolding”, and other techniques are squeezing ever-higher levels of performance out of models after they are released[3]. Long after OpenAI launched GPT-4, someone discovered it had the unsuspected ability to play chess – but only if prompted in just the right way.
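
To make that concrete, here is a minimal sketch of the kind of probing involved, assuming the openai Python package and an API key in the environment; the model name, prompt format, and game prefix are illustrative rather than the original discoverer’s exact setup. The trick is to frame the task as continuing a chess transcript instead of asking for a move in conversational English:

```python
# Illustrative capability probe: does the model continue a chess game sensibly
# when the position is framed as a PGN transcript rather than a chat question?
# Assumes the `openai` package (v1.x) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# A partial game in PGN notation; the model is asked to supply White's 4th move.
pgn_prefix = '[Event "Capability probe"]\n\n1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4.'

response = client.chat.completions.create(
    model="gpt-4",  # illustrative; substitute whichever model is being evaluated
    messages=[
        {
            "role": "system",
            "content": "Continue the chess game transcript. Reply with the next move only.",
        },
        {"role": "user", "content": pgn_prefix},
    ],
    max_tokens=5,
    temperature=0,
)

print(response.choices[0].message.content)  # e.g. "Ba4" if the ability is present
```

The specific game doesn’t matter; the point is that a small change in framing can surface an ability that standard evaluations miss, which is exactly the kind of uncertainty the rest of this post is concerned with.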

Even when a model’s capabilities are well understood, there is wide room to over- or under-estimate the potential impact. Creative uses emerge well after the model is released; a capability which seemed benign might turn out to have harmful applications. Conversely, a capability that seems dangerous may turn out to be insufficient to cause harm[4].

The result is that everyone is proposing policies based on their imagined future. If you’re working at an AI lab, it’s easy to assume that you’ll be able to control the technology you’re building, that it will be used mostly for good, that of course you’ll avoid harmful capabilities. Someone outside the industry may imagine the opposite. People imagine wildly different futures, leading them to equally different policy proposals; it’s no wonder that they then find it difficult to have a constructive discussion.

When the problem is framed this way, the solution seems clear: rather than arguing about what might happen, we should work to ground the discussion in reality.

How To Reduce Uncertainty

A recent post from Helen Toner nicely presents some important ideas. Here is my own laundry list.

Researchers have been developing techniques for measuring a model’s capabilities. This work could use more funding. Researchers should also have guaranteed access to the latest models, including those still under development, as well as to “agent frameworks” and other systems and applications designed to squeeze more capabilities out of the models.

Forecasting the rate at which AI capabilities will progress is another area of research which could use more funding and access.

Then we come to the task of anticipating AI’s impact. For instance, there is extensive debate as to whether an AI that can provide detailed and accurate instructions for synthesizing a dangerous virus would pose a real danger[5]. Again, research funding would be helpful.

We should also be carefully watching for “warning shots” – early indicators of potential danger. Hospitals could screen patients with unusual illnesses; they might have contracted an artificial virus that is fizzling out (or just beginning to spread). Cloud hosting providers might be asked to watch for signs of a self-replicating AI.

The companies that are developing advanced AI models and applications have the best visibility into many important questions. We should institute mechanisms for policymakers, the research community, and the general public to have appropriate access to that information. Some possibilities:

  • Requirements to report internal evaluations of model capabilities, including models still under development.

  • Monitoring how models and applications are used[6], focusing on actual or attempted use for bad purposes.

  • For “red teams” and other safety researchers: access to unreleased models, or (under tight controls!) to model weights.

  • Whistleblower protections. If an employee at an AI lab sees something concerning, they should be encouraged to report their concern to an appropriate regulator[7].

Such requirements should be focused on the largest and most advanced projects. To protect trade secrets, some information would be reported at a coarse level of detail and/or restricted to a narrow circle.

Other measures to reduce uncertainty:

  • Safe harbor protections for safety research. Within responsible limits, researchers should be able to poke at an AI application without fear of being accused of hacking or violating terms of service.

  • Antitrust exemptions for cooperation on safety initiatives.

The Time To Act Is Now, The Way To Act Is Gathering Data

This post might be viewed as a call to delay policy action until we know more. Unfortunately, we don’t have that luxury. There will always be uncertainty around AI, and we will need to take action with imperfect information.

Our first priority should be to gather more information – and quickly! This is a job for policy, not just voluntary commitments and privately funded research.

  1. ^

    Note that this was in the context of an earlier draft of the bill. I’m not aware that LeCun’s views have changed since the most recent amendments.

  2. ^

    Along with Nathan Calvin.

  3. ^
  4. ^

    An AI that can tell you the instructions for creating anthrax may not be able to coach you through the necessary lab technique to successfully follow those instructions.

    See https://x.com/sebkrier/status/1817877099673203192 for a concrete example of a beneficial capability (identifying fraudulent tax returns) not having a significant impact in practice. In the same way, capabilities that are theoretically dangerous do not always result in significant harm.

  5. ^

    A hypothetical attacker would need to clear multiple hurdles. For instance: access to a well-equipped lab, practical lab skills, some way of evading the screening filters that have been proposed for DNA synthesizers, and the desire to kill thousands or millions of people.

  6. ^

    Of course, it’s not possible to monitor usage of open-weight models. In an upcoming post, I’ll talk about the many difficult tradeoffs open-weight models pose.

  7. ^

    In order to avoid discouraging potential whistleblowers, the process for responding to reports should be designed to minimize the impact on the business being reported, unless serious wrongdoing is uncovered.