Nature of progress in Deep Learning

I recently wrote a blog post, “Deep Neural Nets: 33 years ago and 33 years from now”, that was partly a case study on the nature of progress in deep learning, which people here may find interesting. (I would encourage people to read it briefly and return for a few more progress-studies-specific comments below.)

What strikes me the most is that progress in deep learning has for decades been upper bounded by the computing infrastructure and the software ecosystem. For example, our ability to automate sight famously made a leap in 2012 with AlexNet. The neural network architecture and the training algorithm would have been extremely recognizable to LeCun in 1989. So who deserves the credit for this leap? I’m inclined to say that it is the thousands of engineers who made computer chips faster and cheaper. Who made the hard drives bigger. Who developed Google Image Search, which was used to seed the labels. Who built Amazon Mechanical Turk, which allowed the ImageNet project to clean the labels. Who developed GPUs and wrote CUDA. And of course Alex, Ilya and Geoff for the “final assembly”. In particular, there is no single big eureka moment in this story, only the tens of thousands of smaller eureka moments hidden out of sight.
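To make that comparison concrete, here is a minimal sketch (illustrative only, not code from the post) of the basic recipe in modern PyTorch: a small convnet trained with backpropagation and plain SGD, which is essentially the LeCun 1989 setup. The layer sizes and the dummy data are placeholders, not anything specific to AlexNet or the 1989 paper.

```python
# A minimal, illustrative sketch: a LeNet-style convnet trained with plain SGD.
# Assumes PyTorch; layer sizes and data are arbitrary placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyConvNet(nn.Module):
    """Conv -> pool -> conv -> pool -> fully connected, much like the 1989 recipe."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, kernel_size=5)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=5)
        self.fc = nn.Linear(16 * 4 * 4, 10)

    def forward(self, x):
        x = F.max_pool2d(torch.tanh(self.conv1(x)), 2)  # 28x28 -> 24x24 -> 12x12
        x = F.max_pool2d(torch.tanh(self.conv2(x)), 2)  # 12x12 -> 8x8 -> 4x4
        return self.fc(x.flatten(1))

model = TinyConvNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Dummy batch standing in for digit images; a real DataLoader would go here.
x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))

for step in range(10):
    loss = F.cross_entropy(model(x), y)  # forward pass
    opt.zero_grad()
    loss.backward()                      # backprop, the same training algorithm as 1989
    opt.step()                           # plain gradient descent update
```

Nearly everything that separates 1989 from 2012 is hidden below code like this: the GPU kernels behind those calls, the data pipeline, and the sheer scale of compute and labeled data.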

As a result, I am fascinated by the “dark matter” of progress in progress studies: the little incremental improvements (usually by large organizations) that improve the collective infrastructure and unlock a final assembly of exceptional results. I am also looking for equivalents of this in other fields. For example, I recall reading an article a while ago suggesting that one of the reasons the Romans would have found it hard to industrialize is that materials science, precision manufacturing and the associated industries had to advance much further before all the necessary machines and experimental tools could be built. I am not qualified to judge, but perhaps something along those lines is also “dark matter”, hidden behind the more standard narratives of the invention of the steam engine and the like.

Finally, if it is the case that progress often takes this form (does it?), what could be done to best accelerate it? For example, what could have been done to make AlexNet happen 10 years earlier? It feels hard to come up with any single thing, except for actions that support and encourage an ecosystem of large organizations incrementally improving the collective software/hardware infrastructure and offering it up as building blocks.

Looking forward to others’ thoughts, especially with respect to how unique (or not) the nature of progress in deep learning is relative to other areas. It might also be fun to consider other prominent examples of the “dark matter” of progress (or, conversely, what progress required the least of it). Cheers, -Andrej