OpenAI's Chief Scientist on Continual Learning Hype, RL Beyond Code, & Future Alignment Directions
A professionally copyedited transcript of Jakub Pachocki's conversation with Jacob on Unsupervised Learning.
This is a professionally copyedited transcript of Jakub Pachocki's conversation with Jacob on Unsupervised Learning. It has been edited for readability and lightly formatted while preserving the entire substance of the discussion.
Made with: The Transcript Desk Chrome Extension
Full video: https://www.youtube.com/watch?v=vK1qEF3a3WM
Jakub Pachocki, OpenAI's Chief Scientist, sits down with Jacob to cover the full arc of where AI research stands today and where it's headed. The conversation spans the explosive growth of coding agents and what it signals about near-term AI capability, the use of math and physics benchmarks as proxies for general intelligence, how reinforcement learning is being extended beyond easily verified domains toward longer-horizon tasks, and what it means to run a research organization at the precise moment the models themselves are starting to accelerate the research. Jakub shares a candid take on the competitive landscape, why chain-of-thought monitoring is one of the most promising tools in the alignment toolkit, and why the concentration of power enabled by highly automated AI organizations is a societal problem that does not yet have an obvious solution.
Episode Guide
0:00 Intro
1:53 Research Intern Capability Timelines
4:59 Math Breakthroughs
7:59 RL Beyond Verifiable Tasks
12:32 RL vs In-Context
19:01 Allocating Compute Internally
28:18 AI for Science
31:40 Pattern Matching
33:23 Solving the Hardest Math Problems
37:40 Chain of Thought Monitoring
44:33 Generalization and Value Alignment in Models
47:57 Inside OpenAI
51:55 Quickfire
Transcript
00:00-00:13
Jakub Pachocki: I definitely agree that continual learning is the key. It's really the thing we're building. But I don't think this is a problem that's being ignored or is off the path of what we're currently doing. I think it is exactly what we're working toward.
Jacob: What are some other research areas within alignment that you're paying attention to or that you think are promising?
00:13-00:29
Jakub Pachocki: A lot of the longer-term challenges with alignment are about generalization. What are the values the model falls back on? What do you need to figure out to make models work well in some of these other spaces?
Jacob: I keep coming back to this. Jakub Pachocki is the Chief Scientist of OpenAI. I think he is literally one of the most important people on the planet.
00:29-00:51
Jacob: Today on Unsupervised Learning, I got to ask him everything I've been thinking about--and things I know many others in the ecosystem have been wondering too. We talked a lot about model progress, what's required to make long-running agents work, and the really interesting work OpenAI has done in the "AI for Science" world, as well as the progress he expects to see there over the coming years.
00:51-01:15
Jacob: We discussed how companies should think about model building right now, when they should utilize reinforcement learning, and how to think about the evolution of harnesses and their impact. We touched on much of his fascinating research, including his work on alignment and OpenAI's broader work on math competitions. We also talked about this "focusing moment" at OpenAI--what it means for the research organization and how he runs his team. This was such an awesome opportunity to talk to someone driving the change that has revolutionized this space and the world. I hope you enjoy this wide-ranging conversation as much as I did.
01:15-01:53
Jacob: Jakub, I feel like you're the perfect person to talk to about the questions everyone in the ecosystem has. People want to know what's happening with model progress. Companies are thinking about how to build products based on these models, and at a societal level, people are considering the impact AI will have on science and society at large. You've been at the forefront of this space through every generation of improvement over the past few years, so I'm really excited to have you on the podcast.
Jakub Pachocki: Happy to be here.
01:53-02:16
Jacob: I'll start with one of the most "juicy" things you've said. About four months ago, you and the OpenAI team mentioned aiming for a system with research-level intern capabilities by September of this year--which is only about six months away--and a fully automated AI researcher by March 2028. Checking in four months later, how are you feeling about those timelines?
02:16-02:42
Jakub Pachocki: Over the last few months, the biggest change has been the explosive growth of coding tools. Calling it "explosive" is an understatement. At OpenAI, we've reached a point where we use Codex for the majority of our actual coding. For most people, the act of programming has changed quite a bit.
02:42-03:05
Jakub Pachocki: I see this as a clear signal that we are on track. Another very interesting update for me has been the progress in math research capabilities, as well as results we've seen in physics and other fields.
03:05-03:29
Jakub Pachocki: This level of capability--the ability to provide insight when combined with infrastructure access and the ability to use more compute at test time (which is something we're currently using)--points to a strong improvement in general intelligence. I expect this to continue over the next couple of months. It's something we are still very much planning for and focused on.
03:29-03:45
Jacob: How do you know when you've reached that milestone? Is there a specific workflow you look at to say, "Okay, we've achieved research-intern-level capabilities"?
Jakub Pachocki: The way I distinguish a research intern from a fully automated researcher is the span of time the system can work autonomously and the specificity of the task required.
03:45-04:16
Jakub Pachocki: I don't expect we'll have systems this year where you can just say, "Go improve model capability" or "Go solve alignment," and they'll just do it. We might get there eventually, but for now, it's about more specific technical ideas. For example, "I have this specific idea on how to improve the models" or "Run this evaluation differently." I think we have the pieces; we mostly just need to put them together.
04:16-04:37
Jacob: Andrej Karpathy released a viral video showing how he uses these models to improve some of his own models--which are obviously less complex than what you're building here. Did that feel like it was in the spirit of what these tools might eventually look like?
Jakub Pachocki: Yes, I think it's in that spirit. I expect to see a steady evolution from where Codex is now toward more autonomy and longer runtimes.
04:37-05:03
Jakub Pachocki: We'll see a lot of those types of applications. In general, we'll see more autonomous and compute-intensive uses of these models for various tasks.
Jacob: You mentioned the math and physics side. You've had some impressive breakthroughs in math competitions. For our listeners, it's intuitive how coding progress translates to AI research, but how does progress in math and physics tie into this?
05:03-05:34
Jakub Pachocki: The biggest role that focusing on math benchmarks has played for us is serving as a general North Star for improving the technology. Math is highly measurable. It's much easier to determine if you've actually solved a math problem than it is to judge if you've produced a "good" piece of software. Furthermore, math problems can be arbitrarily difficult while still having a definitive "correct" answer.
05:34-06:07
Jakub Pachocki: Until recently, my perspective was that our models couldn't even solve simple math problems. Then they could solve simple ones, but couldn't handle IMO-level (International Mathematical Olympiad) problems. There was a clear, measurable gap in the intelligence of these models. It was very obvious what we needed to do, so math became our North Star for reasoning models.
06:07-06:41
Jakub Pachocki: Now, that is changing. We've reached milestones we've been working toward, like solving IMO Problem 6 and moving into research-level mathematics. At this point, there is still utility in measuring progress this way because there is a definite transfer from mathematical reasoning to AI research. Many of our best researchers are actually trained mathematicians or come from other theoretical fields.
06:41-07:10
Jakub Pachocki: However, we are definitely changing how we think about these North Stars. We are now very focused on how the next models we produce will be useful in the real world.
07:19-07:43
Jakub Pachocki: This shift is happening because we believe models are now capable enough--not as smart as people in every way, but capable enough to materially change the economy and how things are done, especially in research, other economically valuable activities, and applied sciences. Because of that, we feel a lot of urgency.
07:43-08:26
Jacob: In the early days, picking a domain like math was the perfect place to start because it's so hard to solve but easy to verify. Code obviously shares those attributes; it's easy to check and verify, which makes it great for reinforcement learning (RL). A question many people are asking now is: we've seen RL work incredibly well in domains where verification is easy, but what about medicine, law, or finance? There is some ability to verify there, but not to the same extent as math and code. Are we going to see similar improvements in those fields, given that the rates of improvement in code and math have been so astronomical?
08:26-09:10
Jakub Pachocki: I definitely expect so. An interesting duality we think about is that these more general tasks, which are harder to evaluate, share a lot of commonalities with long-horizon tasks. Even with a well-specified math or coding problem, if it's something you need to work on for a year, the long-term success criteria are clear, but what you should do on the first day is a very open-ended problem. I believe these difficulties coincide, and they represent the next frontier for how these systems develop.
09:10-09:24
Jakub Pachocki: We've seen very encouraging signs regarding our ability to scale RL in these more general domains. I think we can scale those efforts, and that holds a lot of promise.
09:24-10:02
Jacob: In these other domains, it feels like one of the hardest things is just knowing what "success" looks like for a task. The problems facing code and math--short-term versus long-term tasks--feel amplified in other spaces. A short-term legal or medical task might be harder to run thousands of iterations on to determine if it was done correctly, and long-term tasks are even harder. How do you conceptualize that research challenge? What needs to be figured out to make models work well in those spaces?
10:02-10:52
Jakub Pachocki: It comes back to the reality of how we make models work over a very long duration and how we teach them to evaluate partial progress. Even outside of RL, as models become more consistent through pure supervision in pre-training, they gain an idea of what a "good" partial artifact looks like. Even if we weren't scaling RL meaningfully, we would see these horizons elongate over time. It is a research challenge to figure out how to leverage new ideas from RL and apply them to general domains, but I'm quite optimistic.
10:52-11:41
Jacob: It sounds like part of your mental model is the models themselves being able to check their own progress at a reliable cadence. It's not totally clear if we've seen true generalization in RL yet. It feels like we have techniques to optimize models around whatever we choose to focus on, but it almost feels like an "old school" version of machine learning--tackling one thing at a time. Would you agree with that characterization? How do you see the current climate, especially since you are buying so much compute?
11:41-12:32
Jakub Pachocki: There is a certain amount of complexity that we--and everyone--need to grapple with. We are no longer just building a "brain in the sky" isolated from the real world. If you want a model to do medical research or cure cancer, it needs to learn about the real world in a meaningful way--perhaps by conducting experiments and learning from the results. You have to figure out how to connect it to the world. That involves the direction you described, but I don't think it runs counter to finding and scaling the simple algorithms we've been developing.
12:32-13:02
Jacob: I talk to a lot of companies, and the main question everyone is asking is: "Should we be doing our own reinforcement learning?" They might take an open-source model, use their own task data and evaluations, and try to optimize it. Does that make sense, or should they just wait for the base models to get better? What advice would you give to builders thinking through how much to invest in the RL side?
13:02-13:46
Jakub Pachocki: Reinforcement learning can be a very data-efficient way to improve a model on a specific task. However, there is an even more data-efficient way of learning: in-context learning. This is the most fundamental way people teach these models--prompting them with examples and instructions. I expect in-context learning to get much better over time. It really matters that models can adapt to your specific context and tasks. I'm not sure if replicating a full RL pipeline is the right way for most people to go, but it's definitely a problem we're thinking about.
13:46-14:32
Jacob: So, you still have to do the work--defining evaluations and gathering data--but in the future, you might be better off feeding that into the context rather than trying to train your own model. People have seen the success of tools like Codex, which you played a key role in, and they wonder if they should build their own harnesses for domains like law or finance, or just use the harnesses provided by large models within their own context. Any thoughts on that?
14:32-14:46
Jakub Pachocki: The implementation of the harness shouldn't be a limitation for very long. I think we'll be able to create much more general harnesses that people can use for all sorts of domains. Actually, Codex is quite good even if you try using it for things beyond coding.
14:46-15:00
Jacob: That's interesting--the idea of a general harness that is adaptive and works across whatever specific tools or data you want to expose to the model.
14:59-16:01
Jakub Pachocki: I think it's worth considering what the ultimate interface for interacting with these models should be. Currently, models provide some UI flexibility--they can build their own interfaces or perform tasks that humans find very time-consuming. However, I believe there is significant room to enable models to access the same interfaces we use. We want AIs on Slack, for example, that are plugged into our specific context, able to learn from it and act on existing information. There is a "meet in the middle" happening here, but long-term, the AI should meet you where you are by default. If it doesn't, it should be because it has new, superior abilities, not because of technical limitations.
16:01-16:38
Jacob: That's an interesting point. Today, these harnesses feel very bespoke to certain environments, but as you add more skills and tools, models will eventually navigate across them effectively, much like humans do. That makes a lot of sense. I'm curious--since you see incredible things on the research side every day--what milestones are still meaningful to you? What would make you think, "Wow, that's crazy," if you saw it during a run? What are you paying the most attention to?
16:38-17:01
Jakub Pachocki: At this point, it's really about the research itself. Can the model discover new things? Can it execute on a long-horizon research problem? I'm looking for the kind of insight where I'd think, "If someone on my team had come up with that, I'd be very intrigued."
17:01-17:16
Jacob: So you're looking for genuine novelty.
17:01-17:16
Jakub Pachocki: Exactly. We've actually seen some minor but impactful ideas come from the internal models we're using, like GPT-4.5 Pro. However, I think that's still very small compared to where I expect things to go.
17:16-17:40
Jacob: It seems inevitable that these models will improve and be used more broadly in research and science. You're one of the first people interacting with them almost as research partners. Have you learned anything about the right way to do that, or how a research organization might look as these models continue to improve?
17:40-18:26
Jakub Pachocki: We are definitely at a transition point where the immediate quality of the model is becoming a determining factor in the pace of our research progress, simply because the models are driving so much of it. This requires rewiring our intuitions about how to run a research organization. Normally, you try not to focus too much on immediate quality and instead look toward the long term. We have a lot of exciting long-term projects queued up, but I feel a great sense of urgency to execute on them now and use these advances in model intelligence to accelerate our research--especially in AI alignment.
18:26-19:01
Jacob: That's a fascinating point. In the past, running a research organization meant giving people space to pursue ideas that might not show progress for months but would eventually drive things forward. Now, it sounds like you're realizing that everything will be better if you focus on improving the model in the short term. Navigating those immediate needs alongside far-off research ideas must be a challenge.
18:56-19:01
Jakub Pachocki: It is. That's something Mark and I spend a lot of time on nowadays.
19:02-19:20
Jacob: OpenAI has a massive amount of compute. You have scaling laws for pre-training and RL, plus various experiments that don't fit into either category. How do you think about allocating compute across all these different vectors?
19:20-20:24
Jakub Pachocki: It gets complicated because there are so many things to do. One discipline we've adopted is explicitly budgeting a large chunk of compute to the most scalable methods--the ones we believe are most responsible for driving general model intelligence. Even if it's not the most efficient allocation at all times--because you could always use that compute to slightly accelerate many smaller things--it's easy to spread yourself too thin and fail to do the things that matter most. You have to understand the empirical evidence, ensure your evaluations and experimental rigor are solid, and then apply regularization. We ask: "Do we understand this method? Will it scale? Can we build on this in the future, or is it a one-off?" We determine priority based on those answers.
20:24-21:05
Jacob: It's interesting that you might leave some "low-hanging fruit" behind because the most important thing is finding the future direction and scaling within it. We've talked a lot about the success of coding models. Last year felt like a period of incredible "hill climbing" in that area. Codex has been successful, but Anthropic was also early to this market with Claude, which has been a dominant product there. Reflecting on that, what do you make of their success in that space?
21:05-22:38
Jakub Pachocki: I think it's a matter of focusing your product direction on where you believe the next application of the technology lies. If you look at our priorization, we have been working on coding products, but they were secondary compared to our main priorities. Interestingly, that wasn't reflective of the priorities within the research organization itself. ChatGPT was an explosive success in 2023, but that specific product doesn't represent everything the technology enables. The majority of our research has been focused on the "future thing," and that has increasingly decoupled from our short-term product strategies. However, I'm very confident in what we've been building on the model intelligence side. Our recent refocusing on the product side is about finally deploying those capabilities, based on the belief that they are what really matters now.
22:38-22:45
Jacob: And now it feels like the entire company's priority is locked in and focused on this, and we've seen incredible progress.
22:45-23:08
Jacob: The improvement in coding models in recent months has been incredible. For the developers listening to this podcast, it's almost hard to comprehend what the world will look like as these models continue to climb toward longer and longer tasks. What do you think will change in their lives, or how will they be using these tools in, say, three to six months? I realize three and six months are very different timelines in this world, but feel free to pick any point in between.
23:08-23:32
Jakub Pachocki: I would expect a gradual increase in the level of autonomy you feel comfortable giving the model--specifically regarding the vagueness of the descriptions it can work with and the level of supervision it requires. I don't think we are very far from models that can work autonomously for a couple of days. They might use significantly more compute than they do now, but they will produce much higher-quality artifacts on their own.
23:32-23:49
Jacob: Do you have a gut instinct on the skill set required? There's always been this question of whether you need a software engineering background to supervise these models over a multi-day run, or if it turns out that once they can run for a while, anyone can use and supervise coding agents to reach a successful output.
23:49-24:23
Jakub Pachocki: I think for many types of output, you already don't need much experience. However, the distinction I would draw between an intern and a truly autonomous researcher or software engineer is that if you want to build something larger, you still need to apply supervision. You need an overarching vision to recognize which building blocks fit together and which don't. I definitely expect the desired skill set to shift quite a bit toward that kind of general vision-setting.
24:23-25:02
Jacob: On the research side, it feels like a month ago all anyone could talk about was "continual learning." It was in the zeitgeist. All these new labs were starting up to focus on it, and some people even left OpenAI to pursue it. I think part of the belief behind that movement is that reinforcement learning (RL) alone won't get us there, or that it will lead to very inefficient scaling that differs from how humans learn. I've even heard you say that RL today is very different from human learning. What's your take on that whole movement?
25:02-26:02
Jakub Pachocki: I'm a little confused by it because, in my mind, the whole excitement surrounding this class of models--even going back to the GPT-3 paper--is that they are capable of continual learning. They are capable of learning to learn in context. That has been the driving force behind the push to scale GPT models further. It's also the premise for why we need to teach them with RL: to help them learn in context more efficiently. So, I agree that continual learning is the goal, but I don't think it's a problem that is being ignored or is off the path of what we're currently doing. It is exactly what we are working toward.
26:02-26:11
Jacob: So, in your mind, the single best path to get there is to continue scaling pre-training and RL?
26:11-26:58
Jakub Pachocki: I think that is how we've made the most progress on this problem so far. There are certainly more ideas and steps to take, and many improvements will simply come from scale.
26:11-26:58
Jacob: A lot of people listening might have used these models for simple things, but when they try a complex, 100-step, long-term task, they find the models don't work yet. On the inside, you see constant improvement, but for them, it feels like we're a lifetime away from solving those longer tasks. How do you articulate the things that need to happen for those longer steps to become possible? Is it about the "checking in" process you mentioned earlier? There seems to be a belief in the research community that these tasks will be solved in the next year or two, but the public might not be fully grasping that trajectory.
26:58-28:02
Jakub Pachocki: A lot of that prediction comes from looking at historical improvement lines, and I think we can increasingly see the shape of things to come. Much of this is about the models becoming intelligent enough to recognize whether they are making progress. Some of it is very pragmatic work: can the models actually access the context, files, and infrastructure they need to do the job? When we were discussing the RL roadmap in the past, I viewed teaching the model to reason with its own tokens as the priority. Eventually, it needs to use tools and the environment. At some point, we need to teach it to see and, eventually, to use a physical body. We are now well into the stage where the model needs to interact with the environment and see. Someday soon, we'll really care about robots.
28:02-28:19
Jacob: It often feels like when people complain that a model can't do X or Y, it's simply because it hasn't been connected to the right systems or given enough context. I wonder if many of these problems would be solved with today's models if context could flow into them universally.
28:19-28:58
Jacob: I want to talk about the "AI for Science" work you've been doing. Everyone feels the impact of coding tools viscerally--companies are seeing huge productivity gains. On the math side, however, not all of us competed in IMO competitions, so we don't necessarily have an intuitive feel for those breakthroughs. One interesting project you worked on involved the "1st Proof" challenge. These seem like very different problems compared to traditional competition math. Could you speak to that? It's a space our listeners might be less familiar with, and I'd love to understand the implications of models doing high-level work there.
28:58-29:54
Jakub Pachocki: I was very excited about the 1st Proof challenge. I view that particular one as a benchmark where respected mathematicians and theoretical computer scientists release problems they believe represent their day-to-day work but haven't been published anywhere, so we can see how the models perform. We were excited about it, but the challenge was dropped without warning with only a week-long deadline. We had a very exciting model in training at the time, and James Lee, who was in charge of training, started prompting that model by hand. Seeing the model actually solve those problems was fascinating. One of the problems was actually from the domain where I did my PhD, and seeing the model tackle it was incredible.
30:02-30:32
Jakub Pachocki: I've seen the model come up with ideas in an hour or so that I would have been quite proud to come up with myself in a week or two. It's a very weird feeling. In the past, the only time I felt like that was watching our Dota bot play interesting games infinitely. It felt like magic because, usually, interesting things don't just happen indefinitely.
30:32-30:45
Jacob: Yeah.
30:32-30:45
Jakub Pachocki: Seeing that happen for math--something I believe is quite representative of, or a precursor to, the work that really matters in the world--definitely increased my sense of urgency.
30:45-31:17
Jacob: One fascinating thing is that you throw these problems at the models and nobody knows how good they'll be at solving them. It must be incredible to see a space you know so well and realize that the previous generation of models wouldn't have been able to do it. You might not have even thought it was the right benchmark to use, but it demonstrates general-purpose capabilities and improvements.
31:17-31:41
Jakub Pachocki: We're at a stage where we need to seek out experts in specific domains to tell us whether these particular proofs are correct. However, it's still much easier to tell if you've made progress in math than in something like coding. With competitive programming, you can evaluate it, but most programming isn't like that. It's about whether the abstractions are right and if you're handling all the edge cases.
31:41-32:04
Jacob: A year ago, there was a common criticism--though it's less strident now--that these models are just pattern matchers. People argued that for "AI for Science," we wouldn't get entirely novel ideas out of pattern matching. It feels like we continue to chip away at that narrative. Are we getting closer to fundamentally disproving it?
32:04-32:30
Jakub Pachocki: I believe so. Right on schedule, we're starting to see minor advancements--a small idea here or there, and perhaps some larger papers in collaboration with scientists. But was AlphaZero just a pattern matcher? Was AlphaGo? Our Dota bots came up with entirely new strategies for their respective games.
32:30-32:33
Jacob: It's funny that there are counter-examples to that critique going all the way back to 2016 or 2017.
32:33-33:23
Jakub Pachocki: Right. You can always find flaws in them--AlphaGo can be beaten with a specific strategy, and our Dota bots could have been beaten too. There will be debates about the definitions of these models for a while. But they are able to discover new things because they have these capabilities. It has taken a couple of years to go from tiny game environments to general scientific research. That required processing a decent approximation of all human knowledge and learning all human languages in the meantime, but the basic principle remains very similar.
33:23-33:47
Jacob: When you had those first proof results, I remember the organizers commented that the AI solutions felt like 19th-century mathematics--brute-force, computation-heavy approaches rather than elegant modern techniques. I'm not sure if that's a feature or a bug of how these models work, but does hearing that concern you or excite you?
33:47-34:13
Jakub Pachocki: It doesn't concern me; it's expected. For at least one of the problems, our model actually produced a very nice proof that was quite a bit shorter than the intended one. In general, you'd expect these models to produce much more reasoning in a short time than a person can, simply in terms of the raw number of tokens or thoughts. I don't expect that "brute force" feel to be a long-term feature.
34:13-34:44
Jacob: There's so much momentum behind AI for science right now. You mentioned that at some point you have to connect these models to the physical world, and OpenAI has released things like GKO and other experiments. As you've dug into this, have you developed an intuition for which areas of science will see crazy progress in the next three years versus those that might be more resistant to change?
34:44-35:01
Jakub Pachocki: A tempting answer would be that it depends on how much manual work is required or where the models aren't yet "plugged in" to the ecosystem. However, I think different laboratories will evolve very quickly to adopt these new technologies within STEM fields.
35:01-35:30
Jacob: There's a question of whether we'll use a general LLM with access to the physical world, or if specialized companies will lead the way--like Isomorphic in biology, Periodic in material sciences, or Physical Intelligence in robotics. What's your gut instinct? Does it make sense to pursue these with different model architectures or all within one context?
35:30-36:16
Jakub Pachocki: It's similar to my view on the UI for Codex: I would build around the capabilities of the technology, not its limitations. If you have something that can suddenly design a huge number of interesting chemical or biological experiments, it makes sense to build labs that enable that. If a model becomes very capable of designing high-quality experiments, it also makes sense to have it work with humans in the loop. We shouldn't think of it as either full automation or just a side tool. We will reach a world where it's very natural to collaborate with AI scientists who are working hard on a problem.
36:16-36:32
Jacob: That's an interesting vision. One world is where you train a model to be an automated, end-to-end biologist or chemist. The other is building tools that propose and run experiments in tandem with human researchers.
36:32-37:25
Jakub Pachocki: I wouldn't necessarily categorize them just as "tools." We will get to a point where they are driving much of the design and ideation for the whole process. Using an LLM architecture, they'll be able to figure out the right experiments to run and then actually design them. Regarding different architectures, prioritizing natural language reasoning gives you a lot of generality. There are certainly things you might want to train a specific model for--for example, if you want to create a very good weather model, LLMs might not be the most efficient way to go about it, even if they could eventually result in the best model.
37:23-37:39
Jakub Pachocki: Eventually, I think it will be similar for protein folding or other tasks of that kind.
Jacob: Yeah. So you think it makes sense to have some independent efforts around that, but obviously, those will end up being paired with a core, high-quality researcher large language model that helps drive a lot of this work.
37:40-38:00
Jacob: Yeah. I also want to make sure we talk about AI safety because that's an area where you've done a lot of pioneering work. I'm not sure all our listeners will be familiar with it, but you actually did some really interesting work across different labs focused on chain-of-thought monitoring. To start, could you tell us a little bit about that work and what you found?
38:00-38:44
Jakub Pachocki: Yeah, so this came from a realization we had around the time we saw the first reasoning models of the current crop. We realized, "Okay, this works," and we thought a lot about what that meant. We figured the world would probably change significantly over the next two or three years. We were thinking about what this meant for safety and our ability to understand what these models are doing. We realized that because of how we train these models, we don't supervise the reasoning process directly. For example, ChatGPT is trained to be polite and nice--
Jacob: And it always tells me I have great ideas.
38:45-39:35
Jakub Pachocki: Yeah. Well, that's a separate issue. But even assuming it's aligned exactly the way we want--which it definitely isn't, it's still a bit sycophantic--there are still things it won't reveal about its motivations in real-time. Maybe it thinks revealing them would be unsafe or unkind, or maybe it isn't actually aligned the way we think and it wants to hide that.
Jakub Pachocki: The way we train reasoning models, the chain of thought doesn't have any of that. It isn't optimized to be any particular way because it isn't directly supervised. It's only "great" in how it relates to producing a high-quality final output. We realized this is actually a very powerful paradigm for interpreting what the model is doing.
39:36-40:18
Jakub Pachocki: It's not actually that different from the idea of mechanistic interpretability. In mechanistic interpretability, you have these model activations that aren't directly supervised to predict a specific label. They are indirectly supervised, but the model has never been trained to expect an inspection of those activations. Therefore, those activations might reveal something about its inner workings.
Jakub Pachocki: The big advantage of chains of thought is that, by default, they are in English. This makes it much easier to understand what is going on, especially as the concepts become more advanced.
40:19-41:05
Jakub Pachocki: The other interesting thing is what we were just discussing: we believe in a future where these models work autonomously for long periods. There will be much more of this reasoning happening. If this is a major axis for how model capability increases, then our ability to supervise them will scale commensurately.
Jakub Pachocki: This really comes down to the principle that you aren't supposed to supervise the chain of thought. When we were originally releasing the preview model, we made the decision to hide the chains of thought.
Jacob: Yeah, I remember that.
41:06-41:30
Jakub Pachocki: For me, that was the primary motivation. I didn't even want to consider releasing it any other way. There was definitely some internal discussion about it, but I felt very strongly that we should hide it for this reason.
Jakub Pachocki: There was another concern that I didn't initially think about, but which I think is also very valid: the idea that the model would be distilled to some extent if the chain of thought were public. That has definitely been a big factor as well.
41:31-42:02
Jakub Pachocki: But I actually think allowing the models some sort of private space is important. Why do I think it's important not to show the chain of thought in the product? Well, if we established a paradigm where you just show the chain of thought to the user, eventually you would have to train it. You'd have to train it for the same reasons you train any model you ship--to be helpful and harmless.
42:03-42:33
Jakub Pachocki: I don't think we actually want to know every step of the chain of thought a model uses to get to a response. It will be useful to some extent, and we are trying to capture most of that value through things like chain-of-thought summaries, which I think are a bit of a stopgap.
Jakub Pachocki: The longer-term solution is having the model actually talk to you in real-time, which the latest versions of the reasoning models are starting to do. I think that will get much better. But there is something very exciting here about not having the training signal fight against us.
42:34-43:10
Jakub Pachocki: If you want to understand what a model does in the long term, but you are scaling a method that directly works against that transparency, you're probably not going to have a good time. That's the other side of the "bitter lesson."
Jakub Pachocki: This decoupling gives me a lot of hope for our ability to at least understand how these models' motivations and generalizations evolve as they get better and work for longer durations.
43:11-43:26
Jakub Pachocki: I don't think it's a complete solution to AI alignment by a long shot. It's just another tool in our toolbox. But I am hopeful that by building our toolbox with technical tools like this, we can continue chipping away at the fundamental problems.
43:27-43:34
Jacob: Yeah, it seems like something that will be incredibly helpful over the medium term, even if it isn't a catch-all solution for long-term alignment.
43:35-44:13
Jakub Pachocki: Exactly. It's a tool that helps us build an understanding of long-term alignment. For example, there has been some very exciting work from a collaboration with other labs on "model scheming." They investigate whether a model is prone to pursuing hidden objectives depending on the environment and how it's trained. That entire line of work is enabled by chain-of-thought monitoring--the ability to actually inspect what the model's motivations are.
44:14-44:32
Jakub Pachocki: That might lead us toward completely different mitigations. Maybe the right way is changing the pre-training data, or perhaps something like "inoculation prompting" from Anthropic. Those are very interesting ideas, but having the ability to understand the model first is foundational to evaluating them.
44:33-44:40
Jacob: It's almost foundational for any further area of research. What are some of the other research areas within alignment that you're paying attention to or that you think are promising?
44:38-45:49
Jakub Pachocki: I think a lot of the longer-term challenges with alignment center on generalization. We can train our models to perform well and mostly control their behavior within the distribution we train for. However, the worrisome part is what happens when a model is asked to do something entirely different, finds itself in a novel situation, or becomes much smarter than it was before.
Jakub Pachocki: When a model develops capabilities we haven't yet figured out how to train for, we have to ask: what values does it fall back on? In that sense, the study of long-term value alignment is really a study of generalization. One line of research I'm particularly excited about--and something we are investing in heavily--is understanding how that generalization falls back onto the pre-training data. There is quite a lot to explore there.
45:50-46:03
Jacob: Over the last six months, have your concerns around alignment increased or decreased? Where do you see the overall trend of this work heading?
46:04-47:37
Jakub Pachocki: Regarding the long-term challenges of alignment--specifically what happens when you have very intelligent models--my thinking has evolved over the past few years. It went from seeing it as a nebulous, grapple-heavy problem to realizing we can make progress through concrete technical solutions and insights.
Jakub Pachocki: This is why we view alignment as a core part of our research. We are designing our reasoning models with this in mind and conducting alignment research specifically tailored to these models. My belief that there is a research path leading to an extremely positive outcome has increased significantly.
Jakub Pachocki: At the same time, my timelines for highly capable models have shortened. We aren't that far away. While I don't think these models will be smarter than humans in every single way immediately, they will be incredibly transformative. I'm optimistic that we can maintain a grip on the alignment problem and evaluate the risks, but as an industry, we must be prepared to make trade-offs--and possibly slow down development--depending on what we observe.
47:38-47:56
Jacob: It's interesting to see this work happening across major labs. You've collaborated with Anthropic and DeepMind on some of this. Has that collaboration emerged organically? Is there a lot of "alignment talk" between the major players, given that the three of you are at the forefront?
47:57-47:57
Jakub Pachocki: There is definitely some. There is a shared interest in these topics.
47:58-48:42
Jacob: I want to shift gears to the inner workings of OpenAI. No company has drawn more global interest over the last few years. We touched on this earlier, but you've mentioned that an important part of your job is giving researchers the space to be "cave dwellers"--to think about what models will look like years from now.
Jacob: However, we are also in a period of massive competitive pressure, with everyone going full-throttle on coding models. How do you actually operationalize that balance today? Has your thinking changed on the right way to oversee an organization in this environment?
48:43-49:20
Jakub Pachocki: I focus on high-quality experiments and recognizing whether we are actually making progress. It's about being honest with ourselves and promoting honesty regarding our results. That hasn't changed. Even though our work evolves, we still have a lot of fundamental work left to do. It's not a matter of wrapping up all projects as quickly as possible. The fundamentals remain the same; what changes is the level of urgency to bring the most promising ideas to fruition.
49:21-49:42
Jacob: OpenAI has had some very public internal moments over the years. As you reflect on the last seven or eight years of your life here, what were some of the difficult, "51/49" decisions that defined the company? What key moments stick out to you?
49:43-51:55
Jakub Pachocki: There have certainly been a number of dramatic moments. However, I think the company underwent the most significant changes through gradual shifts in how it operates rather than snap decisions.
Jakub Pachocki: OpenAI has gone through several phases. When I joined in early 2017, it felt like a very academic lab pursuing many different ideas without much focus on practical scaling. The first big change came with GPT and our first data products. We realized we had to buy massive computers, scale things up, and develop both the science of scaling and the necessary infrastructure. That started our second phase: scaling. We still pursued basic research, but we began evaluating ideas based on whether they were scalable.
Jakub Pachocki: Then came the period leading to ChatGPT. I actually expected things to look a bit different. I was pleasantly surprised that text models were the first to really take off. I originally thought video-style generative AI would be the first big breakthrough and that we would have to make trade-offs to continue pursuing long-term text-based research.
Jakub Pachocki: We anticipated the tension of having a product that is popular now but knowing it needs to evolve significantly to reach the end goal. We've been in that phase for a while. Now, we believe we are entering the phase of deploying models that are truly economically transformative--essentially starting to deploy AGI.
51:53-52:03
Jacob: No, it certainly seems that way. Well, I guess we always like to end interviews with a standard set of quickfire questions, which are basically me just stuffing in all the overly broad questions I couldn't fit anywhere else.
52:03-52:10
Jacob: So, if you'll shamelessly indulge me, to kick it off: what is one thing you've changed your mind on in the AI world in the last year?
52:10-52:48
Jakub Pachocki: I think it's really about starting to reconcile the tension between the AI you build ultimately affecting the world and the fact that, until you get pretty close, it feels like a theoretical thing where you're just training and developing algorithms. Recognizing that we now really need to make a lot of progress and focus on how we are actually deploying this technology has definitely been something I've been thinking about a lot lately.
52:48-53:06
Jacob: Yeah, it's so interesting. Outside of ChatGPT, it was almost more of an abstract research hill-climbing exercise with some real-world usage, and then in this last year, we've obviously seen it--primarily via coding agents--trickle in in a pretty massive way.
53:06-53:14
Jakub Pachocki: I believe AI is going in the same direction as the coding models, where it's actually going to be something very useful and a meaningful part of people's lives.
53:14-53:23
Jacob: When you say "going in the same way," do you mean executing longer-term tasks, or something else?
53:23-53:31
Jakub Pachocki: That's part of it, but also just becoming a dependable, trustworthy assistant or companion.
53:31-54:05
Jacob: It's amazing to watch the way younger people use ChatGPT. I'd argue it's already pretty much there for a lot of folks in high school and college who seem increasingly comfortable using it. Now, I wouldn't be a shameless podcaster if I didn't ask a top researcher for timelines on a few things. I think the stuff outside of the core LLM world is particularly interesting, and there's a lot of buzz around robotics these days. Do you have any insight--obviously, it's hard to pinpoint a moment when robotics "works"--on finding scaling laws or a "ChatGPT moment" for robotics?
54:05-54:23
Jakub Pachocki: I definitely think there are very promising algorithmic ideas there that I believe are going to work, which are not too dissimilar from the space of ideas we've discussed. So, I'm quite optimistic about the timelines there, although I do think they are longer than those for virtual AI.
54:23-54:38
Jacob: Obviously, you're always thinking about the next frontier for what these models can do and the impact on society as a whole. Given the pace of continued model improvement, what is one thing you think we are "under-thinking" right now as a society in terms of the impact of these models?
54:38-55:25
Jakub Pachocki: I think getting to a point where so much intellectual work can be automated comes with significant problems that don't have obvious solutions. One natural issue is the question of jobs and the concentration of wealth, which I suspect will require real involvement from policymakers. I've heard some optimistic takes on how this gets resolved, but at a fundamental level, things that used to be very valuable and costly can now be done quite cheaply. In the long term, that should be a good thing, but I think the transition can happen quite quickly.
55:25-56:03
Jakub Pachocki: There is a related question: if you actually have an automated research laboratory or an automated company that can do so many things, it could be controlled by a very small number of people. It can achieve a lot, and this gets even crazier when you add robots, but you don't even need robots for that to be true. Figuring out what the governance of such things looks like--organizations that are incredibly powerful yet perhaps made up of only a couple of people--is a new question we have to grapple with as a society.
56:03-56:21
Jacob: Speaking of new questions, one thing that's very top of mind for me is that I recently had a kid, and I've been thinking a lot about what his life will look like in ten years. Since you're so close to this work, how has your work on AI changed the way you think about how the next generation should be raised?
56:21-57:15
Jakub Pachocki: The task for all of us is to build AI and build a world in a way where, at the end of the day, humans retain agency and set the direction. Perhaps many of the technical challenges we cherish right now will become more of a pastime than something we strictly need to do to make progress. The challenges will shift more toward figuring out what is important and what we should go do. In that world, people can end up with more things to do--and definitely more exciting things to do. I think you still want to have an understanding of technology and a basic education, however you choose to acquire it, for the sake of being able to think through these problems.
57:15-57:35
Jacob: Well, this has been fascinating. I really appreciate you sitting down and talking about so many different things. I want to make sure to leave the last word to you. Is there anything you want to point our listeners to--whether it's research you're doing, products you're excited about, or anything else you'd like to plug? The floor is yours. I'm sure there are tons of threads people want to pull from this conversation.
57:35-58:05
Jakub Pachocki: I think the set of problems we just discussed, along with questions around alignment and monitorability, are becoming very urgent challenges. I don't think these are challenges only for AI researchers; they are challenges for policymakers and things we have to think through as a society. I'm happy to see some discourse starting to arise, and I think we need more of it.
58:05-58:14
Jacob: I could talk to you for hours more, but I'd be doing the world a great disservice by keeping you from your actual work of continuing to improve these models. Thank you so much for doing this. This was a ton of fun.
58:14-58:14
Jakub Pachocki: Thank you.
58:14-58:40
Jacob: I'm Jacob Ephron, and this has been Unsupervised Learning, a podcast where I talk to the smartest people in AI and ask them questions about what's happening with models and what it means for businesses and the world. As I hope is clear, I have a ton of fun doing this. It's a nights-and-weekends project in addition to my day job as an investor at Redpoint. Our ability to get these incredible guests really comes from folks like you subscribing to the podcast and sharing it with friends. That is what ultimately makes this whole thing work. So, please consider doing that. Thank you so much for your support and for listening. We'll see you next episode.
Made with: The Transcript Desk Chrome Extension
