Gemini co-leads on project origins and what's next | Edited Transcript
A professionally copyedited transcript of Logan Kilpatrick’s conversation with Jeff Dean, Koray Kavukcuoglu, Noam Shazeer, and Oriol Vinyals on Gemini’s origins, Gemini 3.5 Flash, Omni, world models, distillation, and self-learning agents.
Chapter Timestamps
00:01 Jeff Dean on Gemini as a unified model and the “twins” origin of the name
00:46 Logan Kilpatrick introduces Jeff Dean, Koray Kavukcuoglu, Noam Shazeer, Oriol Vinyals, and the Gemini 3.5 Flash launch
01:15 Gemini 3.5 Flash: multimodality, tool use, coding, and agentic capabilities
02:42 Product feedback as part of model improvement, not just launch packaging
04:41 Why one unified model could become Google’s intelligence engine
06:19 Why Google Brain and DeepMind needed one focused Gemini operation
08:52 Scale, pooled resources, Pathways, sparse models, and multimodal ambition
10:04 Omni and world models: video, physics, scientific data, and richer modalities
13:05 The people history: Jeff Dean recruiting Noam Shazeer and early Google AI relationships
15:17 DeepMind’s origin story and the early Jeff Dean–Demis Hassabis connection
16:43 Koray’s route through DeepMind, AlphaGo, AlphaFold, AlphaGeometry, and Gemini
18:56 The positive surprise: each Flash generation catching or beating the previous Pro
20:29 Distillation, Search spelling correction, and old AI assumptions becoming real
23:05 What still disappoints: models need far better data-efficient learning
24:29 Model capacity and extracting more capability from every token and example
26:13 Why humans learn efficiently: interaction with the physical world and richer feedback
27:14 Disagreement, experimentation, small-scale tests, and data-driven research culture
30:02 Why Gemini needs hardware, model design, product, and organization-level coordination
31:52 What may come next: self-learning, agents improving Gemini, and model-generated model components
33:22 Beyond chat: scientific breakthroughs, agentic experiences, and daily product use
34:26 One universal AI product versus separate user experiences and product surfaces
37:08 Product boundaries: glasses, Search, human focus, and separation of concerns
39:13 Future interfaces, brain-computer jokes, and moving from bits to atoms
39:54 Personal projects, consumer AI use, home automation, and knowledge-base building
41:19 Closing reflections from Logan Kilpatrick
A professionally copyedited transcript of Logan Kilpatrick’s Google for Developers conversation with Jeff Dean, Koray Kavukcuoglu, Noam Shazeer, and Oriol Vinyals on the origins of Gemini, the Google Brain and DeepMind merger, Gemini 3.5 Flash, Omni, world models, distillation, data-efficient learning, self-learning agents, and what may come next.
Transcript
00:01-02:18
Jeff Dean: Even before we started the Gemini effort, many people were thinking about building incredibly general-purpose models. Oriol was leading some efforts at DeepMind, and I was helping steer projects around Pathways, PaLM, and PaLM 2. I eventually realized it was silly to fragment our efforts and our compute. If we were going to build an incredibly powerful model, we needed to come together and work on a single, unified model. That’s actually where the name Gemini comes from—the twins.
Koray Kavukcuoglu: We mapped and then we reduced.
Jeff Dean: Exactly, something like that.
Oriol Vinyals: I thought it was because I had twins!
Jeff Dean: That too, that too.
Logan Kilpatrick: Hey everyone, how’s it going? My name is Logan Kilpatrick, and I’m on the Google DeepMind team. Today, we’re talking with Jeff, Koray, Noam, and Oriol about all things Gemini—the origin of the project and so much more. Thank you all for sitting down to chat. We’re here at Gradient Canopy, and we’ve just launched the Gemini 3.5 era of models, starting with Flash. To contextualize the moment, this is roughly the third generation of Gemini models, and we’ve shipped a lot of versions in between. Oriol, maybe you want to fill us in on where we are with Gemini 3.5?
Oriol Vinyals: Yeah, we could each take a turn. To provide some context, I believe we started in early 2023. Since then, we’ve had several releases, including some mid-generation updates.
Logan Kilpatrick: Yeah.
Oriol Vinyals: We’ve been building on a foundation of multimodality, tool use, and agentic capabilities from the very beginning, and we’ve just kept scaling those up. It’s exciting to be releasing the Flash version of 3.5 today. It’s a very powerful series, and the focus of this release is primarily on coding, while of course preserving and enhancing our other capabilities. I think everyone feels that coding capabilities and agentic experiences are currently defining what it means to interact with AI, and 3.5 represents a huge step forward.
Koray Kavukcuoglu: Experiencing AI through a model like 3.5 Flash is a massive step forward. Everyone is starting to recognize it as a very strong model. In some ways, these big release moments have actually become less exciting because the public launch isn’t the only thing on everyone’s mind.
02:18-04:41
Koray Kavukcuoglu: What I’m really focused on is what I’ll be using tomorrow for my own engineering and research. I think about what my colleagues in the office will be using for their work—will they be happy with it, or will they be complaining to me? That day-to-day progress is what’s truly fun and exciting.
Logan Kilpatrick: Reflecting on the initial moment when the Gemini project came together and we shipped those first models, was it obvious to you all that the product story would be so vital? Obviously, at Google, we bring AI to customers through many products, but was the intention always to use those products to actually improve the model itself? Was that a deliberate goal from the start, or did it become obvious over time as use cases became much more complex than the initial version of Gemini?
Koray Kavukcuoglu: I’m actually curious to hear what you all think.
Noam Shazeer: Well, for me, that’s my job.
Jeff Dean: I can weigh in on that. I think it was actually quite obvious. If you have a model used by a large number of people, you’re going to gain invaluable lessons about what’s working and what isn’t. We’ve seen this in Search for many years; user behavior informs us where we’re falling short and where we need to improve. Aggregating those usage statistics to understand the model more deeply and then working to improve those specific areas is essential. AI models should be no different.
Koray Kavukcuoglu: It seemed pretty obvious from the beginning, but we had to have something out there that people were actually using.
Logan Kilpatrick: Yeah, that’s the true test—people using it and finding it useful. If you just stay in a bubble and try to hill-climb benchmarks, you end up just optimizing for those benchmarks. You might even leak the benchmark data into the training, and it just doesn’t end well.
Koray Kavukcuoglu: You don’t want to build intelligence in a black box; you want it to be useful. You want people to use it so you can understand what’s actually required. Pushing the frontier means both advancing technical research capabilities and uncovering the next set of features that will empower users. You can’t achieve that without integrating the research into actual products. Those two things go hand in hand to define what “the frontier” really means.
04:41-06:52
Oriol Vinyals: When the Gemini project started, there were already many machine learning models making their way into products. What seemed obvious to us was that if we created a single model that was more powerful than the average of all those individual models, it would represent a massive leap forward. Whether a single product could be built around one model might not have been as clear at the time, but it was very clear that consolidating all our compute and intelligence into one powerful model would leapfrog many of the things Google was already using machine learning for. It was exciting to be given that amount of compute and responsibility, and I think it has proven to be the core engine of Google’s intelligence.
Jeff Dean: Even before we officially started the Gemini effort, many people were thinking about building incredibly general-purpose models. Oriol was leading some of those efforts at DeepMind, and I was helping steer projects around Pathways, PaLM, and PaLM 2. Eventually, I just said, “This is silly. We are fragmenting our efforts and fragmenting our compute.” If we were going to build an incredibly powerful model, we needed to come together and work on a single project. That’s actually where the name Gemini comes from: the twins.
Oriol Vinyals: We mapped and then we reduced.
Jeff Dean: Exactly, something like that.
Noam Shazeer: I thought it was because I had twins.
Logan Kilpatrick: That’s true, you do! Jeff, that’s a great segue back to the formation of the Gemini project. I’m curious: how controversial was that decision? Obviously, looking back now after three and a half iterations, the organizational complexity of bringing those teams together is behind us. But at the time, was it blatantly obvious that we wouldn’t win or deliver the right products for our customers if we didn’t do this? Or did it originate as more of a “pie in the sky” idea? What was your level of confidence?
06:52-09:17
Jeff Dean: I was certain that coming together was the right thing to do. I actually articulated it in a half-page memo, explaining that it was silly to fragment our efforts. We should probably release that memo somewhere. It felt like we were splitting our best ideas across different research teams that weren’t really collaborating, while also fragmenting our compute. Both of those issues needed to be fixed. It was organizationally complicated, especially with the time zones—having large teams in London and here, eight hours apart, is never a recipe for easy collaboration. However, I think we’ve done a great job navigating that. We now have an amazing unified team all over the world.
Oriol Vinyals: And we’re cranking out good models.
Logan Kilpatrick: There were a bunch of teams building LLMs at the time that you basically just needed to mash together. At some point, AI research became less like separate academic exploration and more like building a focused operation.
Koray Kavukcuoglu: AI research used to be much more academic. If you go back ten years, the way you organized a team wasn’t the key element; it was more about the exploration. While the speed of exploration is always important, as things become more focused, you really need what Jeff mentioned: a focused operation. Instead of trying to build things in parallel, we realized these projects require immense focus and effort. Each one is a major operation involving many researchers coming together to solve complex problems.
At that point, it became clear that it was time to change. Both organizations acted with great urgency to enable that transition. Bringing two organizations together is never easy, but everyone realized it was the right moment and that there was huge value to be gained. I think the entire organization is very proud of what we’ve built together. Gemini is truly the fruit of that collaboration.
Noam Shazeer: It’s about the scale. When you build one massive, beautiful LLM, it can do so many things. To achieve that, you really do need to pool your resources—bringing together that many people and that much compute.
Oriol Vinyals: Exactly, you need the infrastructure teams, the data teams, and so on.
Noam Shazeer: Yeah, it’s much better to have one of those unified teams than five small, fragmented ones.
09:17-11:42
Jeff Dean: I agree. It’s better to have one definitive model than five separate ones. From the start, we wanted Gemini to be—well, even before Gemini, one of the origins of the Pathways project was to explore a single model that could do many things. We wanted a multimodal model that could handle all different modalities and a very large, sparse model where you could activate different components for different tasks. All three of those concepts are represented in the Gemini models we have today. Now, with our latest developments, we’ve really advanced the multimodal aspects to the point where we can even generate and edit video.
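The sparse-activation idea Jeff describes can be illustrated with a toy mixture-of-experts layer. This is a minimal sketch and not Gemini's or Pathways' actual design: the class name `SparseMoELayer`, the single-matrix experts, the top-2 routing, and all sizes are assumptions chosen for illustration. The point it shows is that a router scores every expert, but only the few highest-scoring experts run for a given input.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rng, rows, cols, scale=0.1):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class SparseMoELayer:
    """Hypothetical toy layer: each token uses only its top_k experts."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.router = random_matrix(rng, num_experts, dim)  # one score row per expert
        # Each expert is a single linear map here; real experts are larger networks.
        self.experts = [random_matrix(rng, dim, dim) for _ in range(num_experts)]

    def __call__(self, token):
        scores = matvec(self.router, token)
        # Keep the indices of the top_k highest-scoring experts.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        chosen = ranked[: self.top_k]
        weights = softmax([scores[i] for i in chosen])  # renormalize over chosen experts
        output = [0.0] * len(token)
        for weight, index in zip(weights, chosen):
            expert_out = matvec(self.experts[index], token)  # only chosen experts run
            output = [o + weight * y for o, y in zip(output, expert_out)]
        return output, chosen


layer = SparseMoELayer(dim=8, num_experts=4, top_k=2)
rng = random.Random(1)
token = [rng.gauss(0.0, 1.0) for _ in range(8)]
output, chosen = layer(token)
print("experts used for this token:", chosen)
```

With four experts and `top_k=2`, half of the expert parameters are skipped for each input, which is the sense in which such a model can be very large while keeping per-input compute modest.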
Oriol Vinyals: We can now even generate video. We used to be able to just generate images and audio, which was already pretty awesome, but now you have the full capability of this amazing reasoning model that can handle multiple input modalities and even edit the video it just produced.
Koray Kavukcuoglu: I think Omni represents a whole new capability. We had Veo and Imagen, of course, where you could do text-to-video and text-to-image. But what you really want is a model that understands all the modalities of the physical world—one that understands physics and everything else—combined with text. There is a vast amount of high-level information about the world contained in text as well.
Logan Kilpatrick: Koray, I have a quick question about that. During the I/O keynote, we framed Omni within the “world model” section. I’m curious: does it actually incorporate a lot of the Genie world model research, or is that more of a positioning for the next stage, where the model takes in anything and outputs anything, and that becomes our representation of a world model? That wasn’t abundantly clear to me, and I hadn’t thought about it that way before.
Koray Kavukcuoglu: I’ll give you my opinion, and I know Oriol has worked on these things extensively as well. World modeling means you truly understand dynamics, physics, and visuals. You also have to be able to simulate them. That simulation aspect is critical, both for us to verify if the model understands things correctly and for when you want to rely on the model to “roll forward” a simulation. The decisions coming out of the model should be based on those future simulations.
That’s why I think Gemini Omni is in a different category. It transforms what we had with the original Gemini—which was mostly focused on understanding and text output—and Veo—which was text-to-video modeling—into a true world model by training them jointly. The hope, of course, is that everything transfers; making a better text-understanding model should inherently help the world-modeling aspect. I think we’re seeing this every time we iterate.
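Koray's point about rolling a simulation forward and basing decisions on it can be sketched in a few lines. This is a toy under stated assumptions, not how Omni or Genie work: `toy_world_model` is a hand-written stand-in for a learned model, the state is a one-dimensional position and velocity, and the candidate actions and target are made up for illustration.

```python
def toy_world_model(state, action, dt=0.1):
    """Hypothetical stand-in for a learned world model: predicts the next state.

    state is (position, velocity); action is an acceleration.
    """
    position, velocity = state
    velocity = velocity + action * dt
    position = position + velocity * dt
    return (position, velocity)


def roll_forward(model, state, action, steps):
    """Simulate holding one action for a number of steps; return the final state."""
    for _ in range(steps):
        state = model(state, action)
    return state


def choose_action(model, state, candidate_actions, target, steps=10):
    """Pick the action whose simulated future ends closest to the target position."""

    def predicted_error(action):
        final_position, _ = roll_forward(model, state, action, steps)
        return abs(final_position - target)

    return min(candidate_actions, key=predicted_error)


start = (0.0, 0.0)
best = choose_action(toy_world_model, start, [-1.0, 0.0, 1.0], target=0.5)
print("chosen action:", best)
```

The decision is only as good as the model's predictions, which is why the ability to simulate doubles as a check on whether the model has understood the dynamics.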
11:44-14:14
Oriol Vinyals: I think we’re seeing this every time we try. It isn’t easy, but as we get the recipes right, we see the results. Back in the day, if you wanted to roll out a complex video scene with forward consistency, you had to manually think about those things and almost pre-specify how to get the visuals right over time. Often, when an object turned, it would just disappear.
Now, simply by training at scale and mixing more data, we’re seeing these capabilities emerge naturally. That was the main premise we were putting forward. Finally, we’re going to be outputting amazing, consistent 3D worlds and sounds—all of it. If you had asked me a few years ago, it would have felt almost impossible that this approach would work. If we had known, we probably would have done it ten years ago, but it is finally happening.
Jeff Dean: Yes, and it likely comes down to having more data. When you say “multimodal,” you’re instinctively drawn to human modalities like text, images, audio, and video. But I think we really want the model to understand a much richer set of modalities. It should understand interesting scientific data, such as genomic sequences, chemical structures, robotic grasping data, or LiDAR. Exposing the model to even a little bit of this kind of data makes it much better at understanding it when it encounters more of it later.
Logan Kilpatrick: I feel like part of the story of Google DeepMind being able to pull off this model—and the story of its formation—is really about the people. The fact is, you all actually know each other. We were talking off-camera earlier about when you first met and started working together. I’m curious to hear all of your versions of that story.
Jeff Dean: Maybe I can go first, since I think I’ve known everyone the longest. In the very early days of Google, I handled a lot of the engineering hiring and recruiting. For about three years, I screened every single engineering resume that came into the company.
Noam Shazeer: It was amazing. They would just bring Jeff a giant stack of resumes, and he’d be like, “No, yes, yes, no, no, no, yes.” He was extremely fast.
Jeff Dean: I don’t think I actually interviewed Noam, but he had already interviewed and received an offer. I think you were debating whether or not to take it, so I called you up in 2000. I said, “Hey, let’s just chat. I want to introduce myself. I really like the kinds of things you’re excited about and working on, and I think you’d really enjoy it here.” And then I finished the phone call.
14:14-16:20
Logan Kilpatrick: Honest question: Were you just “selling” at that point, or was there something specific? I mean, he already had an offer.
Jeff Dean: I was selling. I wanted him to accept the offer.
Noam Shazeer: Yeah, and it worked.
Jeff Dean: He did accept. And then Noam became my office mate for about three and a half years.
Noam Shazeer: Oh yeah. I remember joining and everyone was assigned a mentor to ask questions as a new hire, because there are a million things you don’t know. I would ask my mentor questions, and every single time, he knew the answer. I thought, “Wow, everyone here knows everything.” It turns out Jeff was my mentor, and it was just that Jeff knows everything because he had written half the codebase.
Jeff Dean: Yeah. So, fast forward to 2012, I think. Oriol had interviewed with us. I don’t think I interviewed you personally, but you had an offer and I was trying to convince you because you were considering us and another company. So I called you up and said, “Hey, you should really come here. We’re doing some truly interesting work with deep learning models.”
Oriol Vinyals: We were having an awesome time. We were all on the Google Brain team, wedged into a 30-person office just outside on the main patio of the four main buildings at the Googleplex. Somehow, you managed to convince me to join.
Jeff Dean: Somehow I managed to convince you to come, which was awesome.
Oriol Vinyals: Yeah, I remember there was a lot of back and forth. I was in the final year of my PhD, just writing my thesis. There were no LLMs at the time, so you actually had to write every single word yourself. After a lot of pondering, I joined. I wouldn’t say Jeff was exactly a mentor in the way Noam mentioned, but we started two projects together. One of them was on distillation.
I remember the codebase was quite complex—it was all in C++. Coming from academia, you don’t always know the best way to implement things, but the idea was clear. I literally remember sitting by Jeff’s desk while he coded the classes for distillation, KL divergence, and so on. We didn’t have coding agents back then, but I’d say for a little while, Jeff was essentially acting as the coding agent for the project.
Logan Kilpatrick: That is still a very hard benchmark to beat today.
16:20-18:48
Oriol Vinyals: That project was great because Geoffrey Hinton had done some very early exploration on MNIST, which is a tiny dataset he could run on his laptop. He had some solid ideas about how to transfer knowledge from a larger model into a smaller one. I felt we had to show this working at scale.
So, we trained an ensemble of 50 models on 300 million images, which was a massive amount at the time. We had 50 distinct models; we grouped the categories so one would be an expert on cars, another on wild animals, and so on. Then we distilled that knowledge into a single model. It turned out to be much more accurate than any single model you could have trained on the raw data alone.
By the way, I remember compute was already a constraint back then. But all you had to do was tell Jeff, “Hey, we ran out of CPU,” and he would just go to some internal site, change a number, and suddenly our capacity doubled. We did that a few times.
Jeff Dean: Yeah, I had those superuser powers back then.
Oriol Vinyals: I definitely miss that!
Jeff Dean: Yeah, unfortunately, exponential growth eventually catches up with you.
Koray Kavukcuoglu: Yeah, that’s when it stopped happening. I remember the first time we really sat down and talked was actually during the acquisition discussions for DeepMind.
You flew to London, and there was this moment where all sorts of discussions were going on. There were a bunch of people in the room, but then Jeff comes up to me and says, “Let’s look at the code.” I was like, “Okay.”
Jeff Dean: So I sat down at the keyboard and said, “Okay, don’t show me anything too sensitive, but I want to see that directory.”
Koray Kavukcuoglu: He starts poking around that directory, and we go inside. He says, “Okay, let’s see this file,” and I’m like, “Okay.” Then I started explaining, “Here’s what we’re doing here, here’s what we’re doing there. This is this idea, and that is that idea.” At the time, it was a big deal for me, right? I’m sitting there with Jeff, explaining the ideas and the code as we walk through it.
Jeff Dean: Our first code review together. Looks good to me!
Logan Kilpatrick: Was it actually like that, Jeff? Were you just pointing at random directories and Koray just happened to know exactly what was happening?
Jeff Dean: Well, we had already seen 15 talks, which was great.
Koray Kavukcuoglu: At the time, I remember I reviewed pretty much all the code at DeepMind. So, I knew pretty much everything that was going on.
Jeff Dean: Yeah. I think the company was only about 55 or 60 people back then. We all flew over to London, hadn’t slept very well that night, and then we went in and sat through 13 consecutive 30-minute talks.
Geoffrey Hinton had a bad back at the time, so he was actually lying on the floor in the back of the conference room. We just flattened him out back there.
Logan Kilpatrick: I’ve heard that story!
18:48-21:17
Jeff Dean: Yeah. Towards the end of the day, I thought, “Okay, this seems pretty promising. But let’s actually see the code,” because we had seen some very nice slide decks, but I wanted to see the substance.
Logan Kilpatrick: That’s crazy. We need a movie about this. I feel like this would make a great movie. Actually, there’s another thread I want to pull on.
Reflecting back over the last three and a half years, maybe even longer, is there anything that stands out to you as a positive or negative surprise? For example, is there something you wish we had made more progress on, and it’s surprising that we haven’t? Conversely, is there something where we’ve made far more progress than you could have imagined five years ago?
Oriol Vinyals: Maybe I’ll start with a positive one that is very timely for today. I really didn’t expect that we could keep doing what we’ve been doing generation after generation: packing the intelligence of the Pro model back into Flash. It happened with 1.0, and you could argue that since it was the first run, it was fairly suboptimal and we just improved the recipe. But that trend seems to be continuing and even accelerating. Depending on which version you look at, the next generation of Flash outperforms the previous generation of Pro.
Logan Kilpatrick: Even understanding how distillation works, I’m still mesmerized by how we can pack so much intelligence per byte or per parameter. Has distillation fundamentally changed? Are there architectural improvements to the way we do distillation that allow us to keep packing more in, or is the technique relatively the same as what you all originally came up with?
Oriol Vinyals: I would say it’s actually even simpler now. We used to have tricks with softmax temperatures and we had to use an ensemble of models.
Jeff Dean: Don’t tell!
Oriol Vinyals: No, no, I won’t tell.
Jeff Dean: I’m just making sure.
Logan Kilpatrick: He’s going to spill the recipe!
Oriol Vinyals: Basically, you have a really, really good teacher and you have a student. You don’t need an ensemble of 50 teachers anymore; you just have one excellent teacher and one student. You pretty much use the recipe described in the original paper, with just some modest tweaks.
Noam Shazeer: There are some modest tweaks to the original paper, but the basic spirit of the idea is pretty much the same.
Oriol Vinyals: Wow. Let me give you the most technical explanation: it’s like squeezing a lemon. You squeeze the lemon, the juice comes out—those are the good bits—and you put it in a glass, which is your small model.
Logan Kilpatrick: I like that. Let’s go with it.
21:17-23:43
Oriol Vinyals: You should read the intro of the paper. It has a poetic introduction about larvae and insects.
Logan Kilpatrick: In the original paper, was it just soft labels and—
Noam Shazeer: Yeah, pretty much.
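For readers who want the soft-label idea in concrete form, here is a minimal sketch in the spirit of the original distillation paper. It is not the recipe used for Gemini: the logits, the temperature of 2.0, and the function names are illustrative assumptions. The teacher's logits are softened with a temperature to give soft labels, and the student is scored by the KL divergence between the teacher's and the student's softened distributions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of a temperature-scaled softmax, computed stably."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_total = math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - peak - log_total for z in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    teacher_log = log_softmax(teacher_logits, temperature)
    student_log = log_softmax(student_logits, temperature)
    return sum(math.exp(t) * (t - s) for t, s in zip(teacher_log, student_log))


# Made-up logits for a three-class problem.
teacher_logits = [4.0, 1.0, 0.5]
good_student_logits = [3.5, 1.2, 0.4]
poor_student_logits = [0.5, 1.0, 4.0]

# Soft labels: the teacher's full distribution, not just its top class.
soft_labels = [math.exp(v) for v in log_softmax(teacher_logits, temperature=2.0)]
print("soft labels:", [round(p, 3) for p in soft_labels])
print("loss, good student:", distillation_loss(teacher_logits, good_student_logits))
print("loss, poor student:", distillation_loss(teacher_logits, poor_student_logits))
```

A higher temperature flattens the teacher's distribution, so the student also learns how the teacher ranks the wrong answers. In training, this loss would be minimized with respect to the student's parameters.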
Logan Kilpatrick: Is there anything you’re surprised we haven’t been able to pull off yet, given how much progress Gemini has made across the board over the last three and a half generations?
Koray Kavukcuoglu: I mean, on the positive side, thinking back to—
Oriol Vinyals: It’s also about the beginning of Google, right? We had this “one box” philosophy. Jeff, you must remember the idea of one box for everything. The search box could be used for anything.
Jeff Dean: You’d type in something and it would show you sports scores; type in something else and it would show you stock quotes.
Oriol Vinyals: Right. And on the back end, these were all very separate systems. They were custom-built; some were AI-ish and some weren’t.
Jeff Dean: The “Did you mean?” spelling correction was largely Noam’s starter project, I think.
Noam Shazeer: Oh yeah. Back then, the user would assume there must be some brilliant, general-purpose AI behind the whole thing that knew how to do all these different things. And now, we’ve actually built it. We built the general-purpose AI that—
Logan Kilpatrick: It’s one box.
Noam Shazeer: It is one box.
Oriol Vinyals: It is one box, and it’s one back end. We finally have the back end that matches the front end. We have the right interface because we built the one box.
Logan Kilpatrick: I’m looking for something a bit more critical. People always want more, you know? Is there something you wish we had achieved by now?
Koray Kavukcuoglu: I think that’s hard for us to answer. As researchers, we don’t really operate with a negative mindset. If something doesn’t work, it’s just a learning opportunity that we build upon. From your perspective, what were you expecting to see that hasn’t happened yet? What is your disappointment?
Logan Kilpatrick: That’s a good question. I wouldn’t necessarily call it a disappointment, but—
Noam Shazeer: I’m part engineer and part researcher, and engineers can be more negative.
Koray Kavukcuoglu: Fair enough. I personally felt like we would have made more progress on continual learning and less structured model architectures. Right now, we have models like Mixture-of-Experts, which are all very similar in structure. I always imagined we’d have something with a much more organic, larger-scale architecture by now.
Jeff Dean: I still think that could be interesting, but we aren’t doing that just yet. What we are doing seems to be working quite well.
23:44-26:13
Noam Shazeer: I’m a little disappointed that we haven’t cured every disease yet. You can’t just type in “invent a cure for cancer” and have it happen instantly. But, you know, we’re moving in that direction.
Logan Kilpatrick: I’m curious to get your reaction to this. It isn’t necessarily a negative, but it is surprising to me just how much energy and effort it takes to merge capabilities into a single model. It seems like a very difficult juggling act. When you merge in a new capability, it doesn’t just work perfectly out of the box; you often have to manage trade-offs and make adjustments to compensate for losses elsewhere.
We are seeing changes to try and bridge those gaps, but it isn’t entirely intuitive to me yet.
Koray Kavukcuoglu: From my point of view, one thing that amazes me about these models is that they still have an insane amount of capacity. We keep packing more and more into them. If you think about it, current models aren’t actually that much bigger than the ones we had three or four years ago, yet we continue to pack in significantly more capability and information.
The fact that we can do that suggests there is still so much room left in these models. To me, that is the most exciting part. In terms of algorithmic AI development, I truly believe these models have much more capacity than what we are currently getting out of them. There are going to be major innovations that will enable us to do a lot more with the existing architecture.
Jeff Dean: Yeah, and part of that involves coming up with algorithmic improvements that simply extract more value from the system.
Noam Shazeer: Exactly, getting more out of every piece of data, every example, and every token the model sees. If you look at the efficiency of human learning, it’s roughly a thousand times better than current LLM learning. An LLM has to see about a thousand times as much data as a highly capable human just to reach roughly similar capabilities, perhaps slightly better in some areas and worse in others. If we could make it so the model extracts a thousand times more information out of every example, that would be amazing. A human might hear a billion words in a lifetime, whereas a model is trained on trillions.
Oriol Vinyals: Trillions.
Noam Shazeer: Trillions and trillions of words, and it has to remember them all.
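The "thousand times" figure follows directly from the two numbers Noam gives. The arithmetic below is a rough sketch: it assumes exactly one trillion for the model, which is the low end of "trillions", and it treats a word and a training token as equivalent, which they are not precisely.

```python
human_words = 1_000_000_000        # "a billion words in a lifetime"
model_tokens = 1_000_000_000_000   # assumed: one trillion, the low end of "trillions"

ratio = model_tokens // human_words
print(f"the model sees about {ratio}x as much text as the human")
```

With several trillion tokens instead of one, the ratio grows in proportion, so a thousand is best read as an order-of-magnitude estimate.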
26:13-28:26
Koray Kavukcuoglu: I’d disagree a bit, though. We are “pre-trained” through evolution; it’s not like you’re the first human to ever exist. There are certainly arguments to be made on that side of things as well.
Noam Shazeer: There are some arguments about that, but the source code is actually quite small.
Jeff Dean: We have gigabytes of source code!
Logan Kilpatrick: That leads into one of my questions. This is exactly why you don’t want this conversation happening!
Oriol Vinyals: I have a tough one regarding what has been difficult. I think evaluation is incredibly hard. It’s been somewhat underappreciated in the community, even going back to the academic era Koray was mentioning.
Evaluating capabilities in isolation, predicting the next big breakthroughs, and figuring out how to evaluate in a way that avoids data leakage is a massive challenge. We need metrics that users actually agree with. There’s been a lot of work and progress, but it’s been surprisingly difficult. Perhaps that’s because we transitioned from looking at tables of numbers in research papers to dealing with real users and their feedback. It’s been both surprising and exciting because every time you encounter a difficult problem, it motivates you to fix it. But evaluation is definitely an area that needs to keep improving.
Jeff Dean: That’s a great point. The dream of every AI researcher has always been to build systems that can generalize to things they’ve never encountered before. Even when you’re training specific models for particular tasks, you want them to generalize to new examples. What we’re trying to do now is generalize to literally anything anyone might ask, which is a very hard problem. By having a large user base, you get a lot of feedback that tells you, “Okay, we’re generalizing well on these types of problems, but we’re falling short on these others.”
Logan Kilpatrick: One of the more controversial questions I have for you all is this: you’ve obviously worked together for a long time in various capacities, but what are some research topics that you still don’t all agree on? I want to preface this by saying I think disagreement is a positive thing. When people bring different perspectives, there is bound to be disagreement, and that leads us to try different things. I’m curious if any specific examples of that come to mind.
Koray Kavukcuoglu: I’m trying to think.
Logan Kilpatrick: Or perhaps you all just agree?
28:26-30:50
Koray Kavukcuoglu: I don’t think we agree on everything, but I don’t think there have been any major disagreements. In the grand scheme of Gemini’s design, this group has experimented with all sorts of things. We’ve built a lot of our ideas through that experimentation. I know Jeff has always had this vision of building something a bit more flexible, with more plasticity and fluidity. We haven’t quite reached that point yet, but it’s not that we disagreed on the goal. It’s just that our current systems have shown us empirically that the model we are building is the right path forward. Otherwise, I don’t recall any significant conflicts.
Jeff Dean: At any given time, each of us is focusing our efforts on one or two particular areas that the others might not be spending as much time on. For example, I’m spending a lot of time thinking about what future inference hardware should look like, because I believe that’s a critical capability for us to have.
Jeff Dean: You might not be spending as much time on that, but when I describe it to you in the kitchen, you’re like, “Oh yeah, that sounds great. When can we have it?”
Noam Shazeer: Reality is a great way to get people to agree. You look at the experimental results and see what works and what doesn’t.
Jeff Dean: In general, Gemini is very data-driven. Many people run small-scale experiments and present the results. If they look promising, we might suggest combining them with something else. You have to use your pool of research compute as effectively as possible, and being data-driven is the best way to do that.
Oriol Vinyals: I think with something like Gemini, or AI in general, there are many pieces that have to line up.
Koray Kavukcuoglu: When you think about Gemini, or AI in general, it pulls in so many different elements—from hardware and model design to product development and beyond. I believe having this specific group working together is one of the most important factors in our success.
As Jeff mentioned, he focuses on hardware, while Noam focuses on models. Oriol has also been focusing on models but is now going very deep into agents and doing some truly profound work there. For my part, I try to focus on our overall direction with Gemini—ensuring we are integrating well with our products, delivering a great user experience, and operating efficiently.
We all work together to manage these different, critical areas because we are in the midst of a massive technological transformation. Having people who are deeply specialized yet collaborative is essential.
30:53-33:22
Jeff Dean: I think having people who are deeply considering different aspects of this technological transformation is what makes it work.
Logan Kilpatrick: I love it. We should make some predictions just so we have something to be wrong about when we reflect on this conversation a year from now. Obviously, there has been a huge amount of progress and many exciting things coming out of this year’s I/O. If we were sitting here in 2027—which sounds like a made-up year, but it’s just around the corner—what would we be seeing?
Noam Shazeer: The “made-up” year.
Logan Kilpatrick: I mean, 2027 just doesn’t seem real. It feels so far in the future, yet it’s practically six months away.
Koray Kavukcuoglu: I’m going to be 50.
Logan Kilpatrick: Wow. Well, happy early 50th birthday! We’ll definitely be celebrating that, and Google I/O 2027 too. Do you have any predictions or things you’re hopeful will land by that time, perhaps from a model capability perspective?
Oriol Vinyals: Let’s try to predict I/O 2027. What will we be announcing?
Noam Shazeer: No, let’s not do that.
Jeff Dean: Let’s do it!
Logan Kilpatrick: Even just directionally—given where we are now, we’ve made a huge amount of progress in coding. Will we be at a saturation point by then, or will we still be spending as much time focused on it? The same goes for agents. Given the exponential growth we’re seeing across these different capabilities, where do you see us heading?
Koray Kavukcuoglu: Something that might be happening in a year’s time is self-learning.
Logan Kilpatrick: Is self-learning the same as continual learning, or is it something different?
Koray Kavukcuoglu: I think they’re related. For some, they might be the same, but we are now in an era where models are becoming much more agentic. They are already very good at writing code, and we use them in our own research. I believe we will gradually rely on them more and more. Eventually, at least at the experimental level, we’ll start relying on these models to improve various components of Gemini itself. My prediction is that by next year, we will definitely be on that path and likely talking about the results. Let’s see.
Jeff Dean: We’ll probably be able to point to some very significant breakthroughs in our models.
Jeff Dean: I think we’ll see significant components of our models being generated by the models themselves and by agents working on self-improvement, all under the guidance of our researchers.
Logan Kilpatrick: Right. Instead of suggesting to one of your team members, “Hey, why don’t you experiment with this and let me know how it’s going next week?” we’ll be telling the model to do that instead.
33:22-35:46
Koray Kavukcuoglu: It’s hard to disagree with that. To build on the idea of continual learning as a capability, I’m looking forward to a model being able to improve through its experiences and interactions without necessarily needing to update its weights—some sort of knowledge base update that works seamlessly. We have examples of this working now, but I don’t think the capability has hit that steep part of the curve yet where it’s so good that it becomes the obvious choice for everyone to use and turn on.
Jeff Dean: That is a feature I’m hopeful we’ll see implemented in the model. A one-year timeline seems possible.
Logan Kilpatrick: There are definitely a lot of interesting, quirky problems to solve there. I see examples all the time in the current era where you ask a model a question, and it pulls in some random personal context—like a friend’s birthday party—that is completely unrelated to the topic at hand. It feels like we need another year to iron that out.
Oriol Vinyals: We’re in a bit of a tech bubble since we’re so deep into the research side of this. From your perspective, since you’re much more plugged into the real world than we are, what do you want to see? What are your expectations?
Jeff Dean: What do you expect?
Logan Kilpatrick: That’s a good question. This isn’t supposed to be an “Interview Logan” episode, but—
Jeff Dean: Maybe we should have one of those.
Logan Kilpatrick: No, you don’t want to hear what I have to say. To me, the model is the product—that’s all I have to say. I just want the models to keep getting better.
I think the long-running autonomous tasks will be really interesting to watch because that feels like a frontier we can easily track. Even if coding models get 20% better tomorrow and become truly excellent, I think you’ll still run into limitations regarding how long you want a model to run autonomously. It feels like by Google I/O 2027, if we’re able to say, “This model has been running for 30 days straight leading up to the event,” that would be truly surprising and impressive.
Noam Shazeer: I think it would be really surprising to a lot of people—and maybe we shouldn’t say this yet—but it’s something to shoot for. The sheer quantity of work being done independently by the model would be the real breakthrough.
Oriol Vinyals: It would be surprising. I think it actually takes the full stack to pull that off. You’re going to need memory systems, continual learning, and better hardware because letting something run for 30 days is going to cost a zillion tokens.
35:46-38:16
Jeff Dean: Well, you also want that better hardware to have low latency. If the model could finish the task in one day instead of 30, you’d be much happier. “30 days” makes for a good marketing line, but personally, I’d prefer it to be faster.
Noam Shazeer: I’d be happy to share another prediction. It’s not necessarily a prediction for upcoming announcements, but I think these agents are going to highlight the fact that all of our current tools are far too slow.
Jeff Dean: Yeah, exactly. A lot of the tools these agents rely on will become the bottleneck. Even if you make the model infinitely fast, you’ll be limited in how much you can actually speed up real-world work because these tools were designed for human-level latency.
Oriol Vinyals: Or designed for a human frequency of work, right?
Noam Shazeer: Exactly. If a task takes 30 days, 29 and a half of those days are currently spent just waiting.
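Editor's note: the point about tool latency follows the same reasoning as Amdahl's law. As a minimal sketch, assuming the 30-day task splits into 0.5 days of model time and 29.5 days of waiting on tools (a reading of the figure quoted above, not a measured number), the Python below computes the overall speedup when only the model gets faster. The function name `overall_speedup` and its parameters are hypothetical, introduced purely for illustration.

```python
def overall_speedup(model_days: float, tool_wait_days: float, model_factor: float) -> float:
    """Amdahl's-law-style speedup when only the model's share of the work gets faster.

    model_days:      time spent in the model itself (assumed figure)
    tool_wait_days:  time spent waiting on tools, which is left unchanged
    model_factor:    how many times faster the model becomes
    """
    before = model_days + tool_wait_days
    after = model_days / model_factor + tool_wait_days
    return before / after


# Illustrative split only: 0.5 days of model time, 29.5 days of tool waiting.
MODEL_DAYS = 0.5
TOOL_WAIT_DAYS = 29.5

for factor in (2, 10, 1000):
    speedup = overall_speedup(MODEL_DAYS, TOOL_WAIT_DAYS, factor)
    print(f"model {factor}x faster -> whole task {speedup:.4f}x faster")

# Upper bound as the model becomes arbitrarily fast: total time / tool wait time.
ceiling = (MODEL_DAYS + TOOL_WAIT_DAYS) / TOOL_WAIT_DAYS
print(f"ceiling with an infinitely fast model: {ceiling:.4f}x")
```

Under these assumptions, even an arbitrarily fast model cannot shorten the task below 29.5 days, so any large gain would have to come from making the tools themselves faster.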
Logan Kilpatrick: That applies to everything. I have another somewhat meta, perhaps controversial question. Koray, I’m particularly interested in your perspective from a research standpoint. I asked Josh the other day: five years from now, will Google have three products or 10,000 products? Which seems more plausible to you?
Noam Shazeer: We’ll have one.
Logan Kilpatrick: Only one product?
Noam Shazeer: Yeah, the model.
Logan Kilpatrick: Okay, I like that answer. What do the rest of you think?
Jeff Dean: I think if you have an incredibly capable model, it can do many, many things. You saw in the Search demos today at I/O that it can even create little apps inside Search that are customized for you, handle visualizations, and write code. In a sense, I don’t know if that counts as one product, 10,000 products, or 10 million products if you consider the number of users.
Koray Kavukcuoglu: On a serious note, I believe people want to consume information in different ways. Something like Search is fundamental. Five years from now, we will definitely still have Search, perhaps with a much more “magical” interface, but the core idea remains: people want to access and consume information for themselves. That learning activity is fundamental. I think it’s going to persist, and we’ll likely see many more products emerge because the underlying intelligence makes them so much easier to build.
Jeff Dean: Yeah, I agree. There are many different product outlets, but only a small number of core technologies that actually make those products amazing.
38:16-40:32
Logan Kilpatrick: So, if you think about the—
Jeff Dean: If you look at the glasses demonstrated at I/O, that’s a specific product. It’s going to continue to improve because the models are getting better; they understand audio more clearly and can speak to you more naturally. However, that remains a distinct product from something like Search.
Logan Kilpatrick: Exactly. Right.
Koray Kavukcuoglu: It’s clear to us that there is one underlying model powering these experiences. I’m not an expert on the product side, but as a user, I often feel like I’m making an active choice about what I want to do with a digital device—whether that’s checking my calendar, sending an email, or buying something. Maintaining those divisions might be more about human factors and how we interact with tools, rather than the technology being incapable of presenting everything within a single product.
Oriol Vinyals: We could present all of these capabilities in a single product, but I feel like the choice of what I want to focus on is important. Whether that need for focus eventually goes away or we just evolve out of it, I’m not sure. For now, I find myself liking the separation of concerns. I wouldn’t bet on a single, unified product at this time—at least not for my own use.
Noam Shazeer: I suppose we’ve been discussing informational products—tools that deliver information. In that context, you have to consider how humans actually want to consume that data. Is it visual? Is it text? Is it through smart glasses? Or is it some kind of brain-computer interface where you receive the model’s internal embeddings directly into your neurons? It sounds wild, but the interface could change a lot.
Jeff Dean: Vector processing.
Noam Shazeer: Exactly, but powered by the kinds of things we are building.
Oriol Vinyals: We’ll be powered by things like Omni. Perhaps in the future, we’ll even get into physical products and start moving atoms instead of just bits. But that is a prediction for the far future.
Logan Kilpatrick: I love it. Moving atoms, not just bits, is the future. Let’s do one more quick round on what you all are building or doing. I’m curious—aside from all the AI coding work—if there is anything interesting you’re doing personally. It doesn’t have to be Gemini-specific or even involve code. Are you working with physical atoms in the real world? Painting, carpentry, woodworking, or anything like that? Jeff, you go first.
Jeff Dean: I’ve really been enjoying some of the consumer-facing products we’re putting out now that they’ve become so much more capable. For example, I made a cute little Mother’s Day card for my daughter, who just had her first baby. That was a lot of fun.
Logan Kilpatrick: I love building Mother’s Day cards.
40:32-41:51
Koray Kavukcuoglu: As you all know, we recently made the decision to move, which means a new house. A new house comes with all sorts of things you need to fix, learn, and adapt to. These days, my DIY projects range from home automation to fixing things with a hammer and nails. I really enjoy that spectrum; I like being able to do hands-on things.
Noam Shazeer: I love that, but personally, I’m just trying to make the model smarter.
Oriol Vinyals: And building some new model architectures.
Jeff Dean: Yeah, I’ve been trying to build a knowledge base out of a ton of research that I couldn’t possibly process before because we were too busy building. Now I’m creating a brainstorming partner to help think through the next big ideas.
Logan Kilpatrick: I love it. That’s awesome. Well, thank you to all four of you for taking the time to sit down. We had plenty of controversial answers, but it was wonderful and a lot of fun.
I made a comment last year at I/O during a conversation—I think I said this to you, Koray—that when we bring people together and launch this technology, you can really feel the warmth of humanity in what we’re building together. This conversation made me feel that same way, so I truly appreciate it. It was wonderful to sit down and talk.
Thank you all for listening and for watching this episode of Release Notes. We’ll see you in the next one.

