51. AGI Safety and Alignment with Robert Miles

This episode we're chatting with Robert Miles about why we even want artificial general intelligence; general AI as narrow AI whose input is the world; when predictions about AI sound like science fiction; terms like AI safety, the control problem, AI alignment and the specification problem; the lack of people working in AI alignment; why AGI doesn't need to be conscious; and more
Date: 13th of January 2021
Podcast authors: Ben Byford with Robert Miles
Audio duration: 56:04 | Website plays & downloads: 340
Tags: youtube, Communicator, AI safety, Alignment, Consciousness, AGI | Playlists: Consciousness, Intelligence, AGI

Rob Miles is a science communicator focused on AI Safety and Alignment. He has a YouTube channel called Rob Miles AI, and runs The Alignment Newsletter Podcast, which presents summaries of the week's research. He also collaborates with research organisations like the Machine Intelligence Research Institute, the Future of Humanity Institute, and the Centre for the Study of Existential Risk, to help them communicate their work.


Transcription:

Ben Byford[00:00:08] Hi and welcome to the Machine Ethics podcast. This month, episode 51, recorded in early January, we’re talking to Rob Miles, communicator of science, machine learning, computing and AI alignment. We chat about why we would even want general artificial intelligence; general AI as narrow AI where its input is the world; making predictions that sound like science fiction; we elucidate terms like “AI safety”, “the control problem”, “AI alignment” and “specification problem”; the lack of people working in AI alignment; the fact that AGI doesn’t need to be conscious; and how not to make an AGI.

Ben Byford[00:00:47] If you’d like to find more episodes like this, go to Machine-Ethics.net, or you can contact us at hello@Machine-ethics.net. You can follow us on Twitter at Machine_Ethics, or on Instagram at Machine Ethics Podcast. And you can support the podcast on patreon.com/machineethics. Thanks again for listening and hope you enjoy!

Ben Byford[00:01:11] Hi Rob, thanks for joining me on the podcast. It’s really great to have you. If you could just give yourself a quick introduction: who are you and what do you do?

Robert Miles[00:01:21] Oh, yeah, okay. My name is Rob Miles, I am, I guess, a science communicator, populariser, I don’t know. I spend my time explaining AI safety and AI alignment on the internet. Mostly on YouTube. I also now run a little podcast, which is just an audio version of a newsletter run by Rohin Shah and me, the Alignment Newsletter, which covers the week’s research in AI safety. But most of my time is spent on the YouTube channel, which is aimed a bit more at the general public.

Ben Byford[00:01:59] That’s awesome. Thanks very much. So, the first question we have on the podcast generally is a quite open-ended one. What is AI?

Robert Miles[00:02:10] What is AI? So I would say AI is a moving target. There’s a video where I talk about this question, actually. I think it’s like “technology”. If somebody says, “I’m not good with technology,” they’re probably not talking about a bicycle, or a pair of scissors, or something, right. These are all technology, but once something really works reliably, and becomes a deeply ingrained part of our life that we don’t think about, we stop calling it technology. We reserve “technology” as a word for stuff that’s on the edge – stuff that’s still not actually very good, stuff that’s not reliable, stuff that often breaks, and we have to think about the fact that it’s technology. Or stuff that we don’t really understand very well yet. These are the things that get labelled technology, generally speaking. So it’s mostly electronics at this point.

I think AI is a bit like that as well. AI is things that computers can do that they didn’t used to be able to do. There was a time when figuring out a good way to schedule the flights for your airline – making sure that all the pilots and planes got to where they needed to be logistically – was “Artificial Intelligence”. And we wouldn’t call it that these days, I don’t think, because the technology is many decades old now, and it works well. But if we were to try and do that with a neural network then we’d call it AI again, because it’s buggy and unreliable, and new. So yeah, I think the origin of the term “Artificial Intelligence” is a bit like the difference between a “robot” and a “machine”. A robot is a machine that’s designed to do something that a person does, and once we stop thinking that this is a task for a person to do we tend to stop calling things robots in the same way. It’s not “robot crop harvesting”, it’s a combine harvester, it’s just a machine at that point. So I think AI is about getting computers to do things that we previously thought were the domain of human minds.

Ben Byford[00:04:41] Yep. Like science education.

Robert Miles[00:04:42] Right. Yeah, sure. If you could get a computer to do that, it’s for sure AI.

Ben Byford[00:04:48] Yeah, exactly. Cool, so with that foundation, somewhat, we have these terms that we throw away. I’m trying to get us to the point where we can talk about AI safety and stuff. So, we have this idea that we have machine learning techniques, and kind of old school AI – like you were saying – different techniques that have just become part of our world now, essentially. Some of those things we categorise as “simple AI” or “constrained AI”, or AI that is just good at one thing. Then we have this more broad idea about general AI, or superintelligence, or artificial intelligences that are maybe programmes that can do more than one thing, can learn a broad range of stuff. So I guess the question is, why would we want that? It’s an interesting question to answer before we then dive into what could go wrong. So, we’re talking about superintelligence, general AI, why would we want to do this?

Robert Miles[00:05:58] Yeah, there is this very important distinction between what I would call “narrow AI” and what I would call “general AI”. Although it’s not really a distinction, so much as a spectrum. There’s a question of how wide a range of domains something can act in. So you take something like Stockfish: it’s a narrow AI, its only domain is chess. All it can do is play chess and it plays it very well. But then you could take something like AlphaZero, which is one algorithm that can be trained just as well to play chess or Go or shogi, but that’s more general. And now there’s MuZero, which is more general than that, because it plays chess, Go, shogi and also Atari games to a superhuman standard.

So, why do we want generality? Because there are a lot of tasks that we want done in the world, which require generality. You could also think of a general AI system as being a narrow AI system whose domain of mastery is physical reality. Is the world. That’s just a very, very complex domain, and the broader the domain the more complex the system is. So I sometimes talk to people who say that AGI is impossible, and there’s a sense in which that is actually true, in that perfect generality is not possible, because that would have to work in every possible environment, and there’s infinite possible environments. So, for example, one environment is a maximum entropy environment, where it’s complete chaos, there’s no structure to the universe, and therefore you can’t achieve anything because there’s no correlation between the things you do and the outcomes that happen. But if you consider only our universe, it is very optimisable, it is very amenable to agency. It actually, as far as we can tell, it probably runs on fairly straightforward mathematics. It’s probably fairly low complexity. In the sense of [08:33 …] complexity or something like that. It’s quite structured, it’s quite regular, it has causality. Induction works quite well, so we don’t actually have to create a fully general agent, we just need to create an agent which is able to act well in our universe. And that is general enough for what we need, which is an agent that’s going to do things in the real world.

The reason that we want that is that there are a lot of tasks that need that level of generality. If you want an agent to run your company, that is fundamentally a very broad thing to do. You need to be able to read and understand all kinds of different information, you need to be able to build detailed models of the world, you need to be able to think about human behaviour and make predictions. It would be pretty difficult to train something that was a really effective CEO, perhaps a superhuman CEO, without it having all of the other capabilities that humans have. You would expect – and there’s a chance that there’s a threshold that it gets past – I don’t know if this is true, this is speculative. But for example, if you have a very simple programming language, and you start adding features to it, then past a certain point you hit a point where your programming language is able to express any program that can be expressed in any programming language. You hit Turing completeness. Once you have a certain level of capability, in principle, you have everything. Maybe not quickly. Maybe not effectively. But you’ve created a general purpose programming language, and it’s possible that you get a similar effect. I don’t know if this is true, but the easiest way to make something that can do almost anything is just to make something that can do everything.

Ben Byford[00:10:35] Right. I guess the practical use of “everything” in that sense, is that we have something that we can use – in a DeepMind sort of way – to solve big problems. So we can get rid of diseases, we can cure ageing, we can avert disasters, all that kind of thing.

Robert Miles[00:11:01] Right. If you have a very powerful system that is able to – if you have a general intelligence that is able to act in the real world, then in principle you can set whatever types of goals you want, and expect to get solutions to them if solutions exist. And that’s kind of the end game.

Ben Byford[00:10:26] That’s like – I mean I want to say “science fiction”, but that’s what we’re moving towards. That’s what people are trying to research and create: this more generalised AI, hopefully with this really quite powerful idea behind it that we can solve or get answers for some of these problems.

Robert Miles[00:11:46] Yeah. It’s a pretty utopian ideal. The thing is, yeah – I want to address the science fiction thing, because it’s something that I often come up against. Which is, I think – I’m trying to formally express what the mistake is, but I’m not going to. I think it’s related to confusing A implies B with B implies A. Which is to say most science fiction predictions about the future don’t come true, but nonetheless, every significant prediction about the future that has come true has sounded like science fiction, because you’re talking about technology that doesn’t exist yet. You’re speculating about how things might be drastically different because of the effect of technology. That’s going to sound like science fiction, and so the fact that it sounds like science fiction doesn’t make it less likely to be true, unfortunately. It would be convenient if you could evaluate the truth of a complex claim by categorising it into a literary genre, but it’s not that easy. You have to actually think about the claim itself and the world and the technology, and run it through, and think about what’s actually likely to happen. Because whatever actually does happen, we can be confident would seem like science fiction from our perspective.

Ben Byford[00:13:20] Mm hmm, and definitely as it’s happening, as well. In hindsight, it probably feels less like science fiction, because it’s normalised.

Robert Miles[00:13:28] Yeah. Can you imagine going to someone 50 or 100 years ago and saying, “Hey, so these things that are just adding machines, these computers, they’re going to get way, way better. Everyone is going to have one a million times more powerful than all of the computers on earth right now in their pocket, and they’re all going to be able to talk to each other at insane speeds all the time.” Or you know what, maybe don’t give the context. Maybe just say, “Hey, in the future basically everyone on earth is going to have access to infinite, free pornography.” Then just see, how likely does that sound? Does it sound likely? It turns out it’s one of the things that happened. The future is definitely going to be weird. So there’s no way to get around just really trying to figure out what will happen, and if your best guess of what would happen seems weird then that’s not a reason to reject it, because the future always seems weird.

Ben Byford[00:14:34] Cool. So, given that we’ve painted so far – other than the weirdness of the situation – we’ve painted quite a nice view on what could happen, or what is positive within this area. You spend a lot of time thinking about these terms which I’m going to throw out, because I’d like to get consolidation of things like “AI safety”, “AI alignment”, “specification problem”, “control problem”, and it would be really nice if you could give me an overview of this area, and in what way these things are similar or equivalent, or not at all.

Robert Miles[00:15:11] Right. Okay, that’s actually a really, really hard problem, because there is not really widespread agreement on a lot of the terms. Various people are using the same terms in different ways. So broadly speaking, AI safety is – I consider AI safety to be – a very broad category. That’s just about the ways in which AI systems might cause problems or be dangerous, and what we can do to prevent or mitigate that. By analogy, imagine there was something called “nuclear safety”, and that runs the gamut. So if you work in a lab with nuclear material, how are you going to be safe, and avoid getting long term radiation poisoning, like Marie Curie? Then you have things like accidents that can happen – things like the demon core during the Manhattan Project, this terrible accident where an extremely unsafe experiment dumped a huge amount of radiation out very quickly and killed researchers – which is a different class of thing to the long term exposure risk. Then you also have things like, in some of the applications, if you have a power plant then there’s a risk that it could melt down, and that’s like one type of risk, but there’s also the problems associated with disposing of nuclear waste, and like, how does that work? After that you have all of the questions of nuclear weapons, and how do you defend against them, how do you avoid proliferation? And these types of broader questions.

AI safety is kind of like that, I think, in that it covers this whole range of things that includes your everyday things, like are we going to have invasions of privacy? Are these things going to be fair from a race, gender and so on perspective? And then things like, is my self-driving car going to drive safely? How is it going to make its decisions? Who is legally responsible? All of those kinds of questions are all kind of still under the umbrella of AI safety. But the stuff that I’m most interested in – well actually, let me divide up safety into four quadrants along two axes. You have near and long term, and you have accident and misuse. So your near term accident quadrant is going to be things like, are self-driving cars safe? Your near term misuse is, how are corporations using our data and that kind of thing? Long term misuse I actually think right now is not really an issue. So when you say short term and long term, you can also think of that as narrow and general. I’m most interested in the long term accident risk, because I think that our current understanding is such that it almost doesn’t matter who gets to use AGI, or what they try to get the AGI to do. I think that currently our understanding of AGI is such that we couldn’t control it anyway, and so it sort of doesn’t matter: just getting powerful AGI systems to do what anyone wants them to do is the main thing that I’m interested in. So that’s the sub-part.

Let’s do some other terms. The control problem is a slightly older term, I think, that’s about if you have an AGI, how do you control it? How do you keep it under control? I don’t really like that framing, because it suggests that that’s possible. It suggests that if you have a superintelligence which doesn’t fundamentally want what humans want, that there might be some way to control it. And that feels like a losing strategy to me. So, I prefer to think of it as the alignment problem, which is, how do you ensure that any system you build, its goals align with your goals? With the goals of humanity. So that then it doesn’t need to be controlled, because it actually wants to help. So you don’t control it, you just express your preferences to it. It’s only a very slight shift in framing, but I think it changes the way that you think about the problem.

Ben Byford[00:20:21] Yup. Is that because – I’ve done a bit of reading here – it seems intractable that one can control a system which is, we say superintelligent, but is vastly more intelligent than we are, given that we are the baseline for this framing of intelligence. So on a general person’s intelligence, it’s going to be much, much more intelligent than that. It can do things that imply that we won’t actually be able to control it. Like you’re saying, it doesn’t really matter who creates such a thing, because they themselves won’t be intelligent enough – or don’t have the practical tools – to contain such a thing.

Robert Miles[00:21:03] Right, and this doesn’t need to be a really strong claim, actually. You could try and make the claim that if the thing is drastically superintelligent then it’s impossible to control it. But even without needing that strong a claim, you could just say, “This seems really hard, and it would be nice not to have to try.” It’s not so much that we’re certain that we won’t be able to control it, but we really can’t be certain that we would be able to control it, and we do want a high level of confidence for this sort of thing, because the stakes are very high. Any approach that relies on us containing or outwitting an AGI, it’s not guaranteed to fail, but it’s so far from guaranteed to succeed that I’m not interested in that approach.

Ben Byford[00:22:00] So, I find this quite interesting, because there’s an implicit thing going on here in all of this stuff, that there is a – this AGI system has something that it wants to optimise and it’s going to do it in a runaway sort of way. Or it has some sort of survival thing built into it. And whether that’s to do with some concept of consciousness or not, that doesn’t really matter, but it has this drive all of its own, because otherwise it would be just idle. You know? We’re conflating intelligence with something like survival or some kind of optimisation problem that we’ve started out on. Is there something coalescing these sorts of ideas?

Robert Miles[00:23:46] Yeah. So the concept is agency. The thing is if we build an agent, this is a common type of AI system that we build right now. Usually we build them for playing games, but you can have agents in all kinds of contexts. And an agent is just a thing that has a goal, and it chooses its actions to further that goal.

Simplistic example of an agent: something like a thermostat. Modelling it as an agent is kind of pointless, because you can just model it as a physical or electronic system and get the same understanding, but it’s like the simplest thing where the idea applies. You can think of a thermostat as having a goal of having the room at a particular temperature, and it has actions like turning on the heating, turning on the air conditioning, whatever. It’s trying to achieve that goal and if you perturb this system, well, it will fight you, in a sense. If you try to make the room warmer than it should be, it will turn on the air conditioning. Or something that makes more sense to talk about is something like a chess AI. It has a goal. If it’s playing white then its goal is for the black king to be in check – in checkmate, rather – and it chooses its actions in the form of moves on the board to achieve that goal.
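
A minimal sketch, in Python, of the agent framing Rob describes here – a goal, a set of available actions, and a rule that picks whichever action is predicted to move the world towards the goal. All names and numbers below are invented purely for illustration:

```python
# A toy illustration of "an agent is a thing that has a goal, and it chooses
# its actions to further that goal". Names and numbers are illustrative only.

def thermostat_agent(current_temp: float, target_temp: float) -> str:
    """Pick the action predicted to move the room towards the goal temperature."""
    predicted_temp = {
        "heat_on": current_temp + 1.0,   # crude prediction of each action's effect
        "cool_on": current_temp - 1.0,
        "do_nothing": current_temp,
    }
    # Choose the action whose predicted outcome is closest to the goal.
    return min(predicted_temp, key=lambda a: abs(predicted_temp[a] - target_temp))

# Perturb the system and it "fights back": warm the room past the target and
# the agent switches the cooling on; cool it down and the heating comes on.
print(thermostat_agent(current_temp=25.0, target_temp=20.0))  # -> "cool_on"
print(thermostat_agent(current_temp=15.0, target_temp=20.0))  # -> "heat_on"
```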

Ben Byford[00:24:04] Yup.

Robert Miles[00:24:04] So, a lot of problems in the real world are best modelled as this type of problem. You have an agent, you have a goal in the world – some utility function or something like that – you’re choosing your actions, which is maybe sending packets across a network, or sending motor controls to some form of actuator, whatever. And you’re choosing which actions to send in order to achieve that particular goal. Once you have that framework in place, which is the dominant framework for thinking about this kind of thing, you then have a lot of problems. Mostly being that you have to get the right goal. This is the alignment thing. You have to make sure that the thing it’s trying to achieve is the thing that you want it to achieve, because if it’s smarter than you it’s probably going to achieve it.

Ben Byford[00:25:03] Yeah, so you have to be very sure that that goal is well specified, and that’s part of this specification thing. Or whether that is even possible. Is it even possible to set a well-formed goal that doesn’t have the potential to be manipulated or interpreted in different ways?

Robert Miles[00:25:26] Yeah, yeah. The thing is, this is the specification problem. It’s not so much that the goal is going to be manipulated, as that the thing you said is not the thing you meant. Anything that we know how to specify really well is something which, if actually optimised for, would not be what we want. You can take your really obvious things like human happiness. Maybe we could specify human happiness, but the world in which humans are the most happy is probably not actually the world that we want. Plausibly that looks like us all hooked up to a heroin drip or some kind of experience machines hooking directly into our brains, giving us the maximally happy experience, or something like that, right? You take something that is locally a good thing to optimise for, but once you optimise hard for that, you end up somewhere you don’t want to be. This is a variation on Goodhart’s Law, that when a measure becomes a target, it stops being a good measure. It’s like that taken to the extreme.
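
A toy, purely illustrative way to see the Goodhart effect Rob describes: a proxy measure that tracks what we actually care about at moderate levels, but comes apart from it once an optimiser pushes hard. The functions and numbers below are made up:

```python
# Illustrative only: a proxy that correlates with the true objective at
# moderate levels but diverges under heavy optimisation (Goodhart's Law).

def true_value(stimulation: float) -> float:
    """What we actually care about: peaks at a moderate level, then falls off."""
    return stimulation * (2.0 - stimulation)   # maximised at stimulation = 1.0

def proxy_measure(stimulation: float) -> float:
    """The thing we know how to specify: always goes up with more stimulation."""
    return stimulation

candidates = [x / 10 for x in range(0, 31)]          # settings from 0.0 to 3.0
best_for_proxy = max(candidates, key=proxy_measure)  # a hard optimiser picks 3.0
best_for_truth = max(candidates, key=true_value)     # what we actually wanted: 1.0

print(best_for_proxy, true_value(best_for_proxy))    # 3.0 -3.0  (terrible outcome)
print(best_for_truth, true_value(best_for_truth))    # 1.0  1.0  (the outcome we meant)
```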

Ben Byford[00:26:40] So, given all this stuff, is there a sense that there is a winning direction, or is there something that’s like, this is the best option for alignment at the moment? The way that we can, if we had an AGI here today, that we would probably try first.

Robert Miles[00:27:06] Yeah, there are a few different approaches. Nothing right now is ready, I wouldn’t say, but there are some very promising ideas. So first off, almost everyone is agreed that putting the goal in and then hitting go is not a winning strategy. Firstly, because human values are very complicated, anything that you can simply specify is probably not going to capture the complexity, the variety, of what humans care about. And usually the way that we do that in machine learning, when we have a complex fuzzy thing that we don’t know how to specify, is that we learn that thing from data. So that’s one thing, value learning, effectively. How do you get an AI system that learns what humans care about?

This is hard, because we have these various approaches for learning what an agent cares about, and they tend to make fairly strong assumptions about the agent. You observe what the agent does, and then you say, “Well, what does it look like they were trying to do? What were they trying to achieve when they were doing all this?” This works best when the agent is rational, because then you can just say, “Well, suppose a person was trying to achieve this, what would they do?” and then, “Well, did they do that?” Whereas humans have this problem where sometimes they make mistakes. We sometimes choose actions that aren’t the best actions for our goals, and so then you have this problem of separating out, “Does this person really value doing this weird trick where they fly off the bike and land on their face, or were they not trying to do that?”
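
A small sketch of the inference problem Rob is describing, assuming a “Boltzmann-rational” model of the human: actions that are better for a candidate goal are more probable, but mistakes still happen, so one odd action doesn’t get read as a deeply held value. The goals, utilities and rationality parameter here are toy assumptions, not any particular paper’s method:

```python
import math

# Toy value learning: infer which goal a (possibly noisy) human is pursuing
# from the actions we observe, assuming a "Boltzmann-rational" human model:
# better actions for their goal are more probable, but mistakes happen.

goals = ["reach_A", "reach_B"]
actions = ["left", "right"]

# Made-up utilities: how good each action is for each goal.
utility = {
    ("reach_A", "left"): 1.0, ("reach_A", "right"): 0.0,
    ("reach_B", "left"): 0.0, ("reach_B", "right"): 1.0,
}

def action_prob(goal: str, action: str, rationality: float = 2.0) -> float:
    """P(action | goal) under a softmax over action utilities."""
    weights = {a: math.exp(rationality * utility[(goal, a)]) for a in actions}
    return weights[action] / sum(weights.values())

def infer_goal(observed_actions: list[str]) -> dict[str, float]:
    """Bayesian update over candidate goals, starting from a uniform prior."""
    posterior = {g: 1.0 / len(goals) for g in goals}
    for a in observed_actions:
        posterior = {g: posterior[g] * action_prob(g, a) for g in goals}
        total = sum(posterior.values())
        posterior = {g: p / total for g, p in posterior.items()}
    return posterior

# Mostly "left" with one slip: the noise model stops us concluding the human
# *wanted* the mistake, so the posterior still favours reach_A.
print(infer_goal(["left", "left", "right", "left"]))
```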

Ben Byford[00:29:01] Mm hmm, yup. Is there – sorry to interject – is there a category of things where this isn’t the case? So if we worked on problems that weren’t innately human, so the goal was set to understand weather patterns enough – this is already getting badly described. Your goal, okay, Rob-3000 is to predict the weather accurately for tomorrow. Go. That would be the thing to optimise for. That seems to me like something that doesn’t have so much human involvement in there, or is it going to trickle in somewhere anyway?

Robert Miles[00:29:49] But also that feels like it doesn’t – you can do that very well with a narrow system. You don’t really need AGI for that task. And if you set AGI that task then, well that’s apocalyptic, probably. In a few different ways. Firstly because you can do that job better if you have more sensors, so any square inch of the planet that isn’t sensors is a waste from that agent’s perspective, if that’s its only goal. Secondly, humans are very unpredictable, so if it’s optimising in a long term way – if it’s myopic and it’s only trying to do tomorrow at any given time, we might be okay. But if it cares about its long term aggregated rewards, then making the weather more predictable becomes something that it wants to do, and that’s not good for us either.

Ben Byford[00:30:42] So I feel like that leads into these other ideas. I was going to ask if you’d seen Human Compatible by the eminent Dr Stuart Russell, and he has this idea about ambiguity. So you don’t necessarily have “optimise the weather” – no, not optimise the weather, but “tell me what the weather’s going to be like tomorrow, but don’t harm humans”. And also don’t – you know, he doesn’t have this hierarchy of things to fulfil, it has this ambiguity towards a given goal, which it’s constantly checking for, so the feedback is never going to get 100% accurate, and it’s always going to need to ask questions, and like you were saying it’s going to constantly need to reaffirm its model of the world, and with humans in it I suppose, what are the things that humans are going to want to do?

Robert Miles[00:31:42] Yeah. This is a much more promising family of approaches, where you don’t try and just learn – you don’t just learn up front. The first thing that doesn’t work is specifying the goal up front. Then okay, maybe we can learn. Maybe we can look at a bunch of humans later and then learn what the goal should be and then go. That also has problems, because it’s kind of brittle. If you get it slightly wrong you can have giant problems. Also, you have a huge distribution shift problem, where if you train the system on everything you know about humans right now, and then you have a world with an AGI in it, that’s quite a different world. The world starts changing, so then you have this classic problem that you always have with machine learning, where the deployment distribution is different from the training distribution.

So, some kind of online thing seems necessary, where the system is continually learning from humans what it should be doing. There’s a whole bunch of approaches in this category, and having the system start off completely uncertain about what its goal should be, knowing that its goal is in the minds of the humans, and that the actions of the humans are information, are data that gives it information about its actual goal, seems fairly promising.
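
To sketch that “stay uncertain about the goal” idea in code – a toy version, not Stuart Russell’s actual formulation – the agent keeps a belief over what the human wants, acts to maximise expected utility under that belief, and prefers asking the human when its uncertainty makes the direct actions look risky. All utilities and probabilities below are invented:

```python
# Toy sketch of acting under goal uncertainty: the agent holds a belief over
# what the human wants and asks rather than acts when that belief is spread
# out. All numbers are invented for illustration.

belief = {"human_wants_A": 0.55, "human_wants_B": 0.45}

# Utility of each action under each hypothesis about the human's goal.
utilities = {
    "do_A":      {"human_wants_A": 1.0,  "human_wants_B": -1.0},
    "do_B":      {"human_wants_A": -1.0, "human_wants_B": 1.0},
    "ask_human": {"human_wants_A": 0.3,  "human_wants_B": 0.3},  # small cost, no risk
}

def expected_utility(action: str) -> float:
    return sum(p * utilities[action][goal] for goal, p in belief.items())

best = max(utilities, key=expected_utility)
print(best, expected_utility(best))
# With a 55/45 belief the direct actions score about +0.1 and -0.1, so
# "ask_human" (0.3) wins; with a 95/5 belief, "do_A" would win instead.
```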

Ben Byford[00:33:26] Good. I like it. I felt like when I was reading that it had this really great bit, which was, “Oh, and I’m sure it will be all kind of ethical. We’ll just work that bit out.” Because obviously this is the stuff that I care about. The fact that these things are uncertain doesn’t imply that it will be an ethical AI or AGI. Because obviously you’re learning from people and people can have a spectrum of values, and they can do stupid things that hurt people, or put people at a disadvantage. So, I think when we’re looking at that specific example, it’s interesting that it solves this runaway optimisation issue, but it doesn’t necessarily ensure that the agent will actually do stuff that is to people’s general benefit; it might do something that is to one person’s benefit, possibly. There are other issues that come up.

Robert Miles[00:34:35] Exactly. So this is another way of splitting the question. Which is, are you – for the alignment problem, the general form of the alignment problem is you have a single artificial agent and a single human agent and it’s just like a straightforward principal-agent problem. You’re trying to get the artificial agent to go along with the human. This is hard. In reality, what we might have is a single artificial agent trying to be aligned with a multitude of human agents, and that’s much harder. You can model humanity as a whole, as a single agent, if you want, but that introduces some problems, obviously.

You might also have multiple artificial agents, and that depends on what you think about take-off speeds and things like that – how you model how the future goes. I assign a fairly high probability to there being one artificial agent which becomes so far ahead of everything else that it has a decisive strategic advantage over the other ones, so whatever else there is doesn’t matter so much. But that’s by no means certain. We could definitely have a situation where there are a bunch of artificial agents that are all interacting with each other as well as with humanity, and it gets much more complicated.

The reason that I am most interested in focusing on the one-to-one case, is because I consider it not solved, and I think it might be strictly easier. So I find it hard to imagine – and I don’t know, let’s not be limited by my imagination – I find it hard to imagine a situation where we can solve one-to-many without first having solved one-to-one. If you can’t figure out the values of a human in the room with you, then you probably can’t figure out the values of humanity as a whole. Probably. That just feels like a harder problem. I’m aware that solving this problem doesn’t solve the whole thing, but I think it’s like a necessary first step.

Ben Byford[00:37:03] You’re not skulking away from the bigger issues here, come on Rob. You need to sort it out.

Robert Miles[00:37:08] Yeah, I mean there’s no reason to expect it to be easy.

Ben Byford[00:37:14] No, definitely not. I don’t think – I mean, there’s a lot of stuff in this area where there isn’t necessarily consensus, and it’s still very much a burgeoning area, where they’re – I think someone said there’s simply not enough people working in this area at the moment.

Robert Miles[00:37:33] Yes. It’s about the density of people over the area of the space. If you look at something like computer vision, you look at one researcher in computer vision and what they’re working on, and it’s a tiny area of this space. They are the world expert on this type of algorithm applied to this type of problem, or this type of tweak or optimisation to this type of algorithm on this type of problem.

Whereas what we have in AI safety is researchers who have these giant swathes of area to think about, because yeah, there are not enough people and there aren’t even enough – AI safety as a research field, or AI alignment, is divided up into these various camps and approaches, and a lot of these approaches are entire fields that have like two people in them. You know, because it’s just like the person who first thought up this idea, and then somebody else who has an interesting criticism of it, and they’re debating with each other, something like that. In five or ten years, that’s going to be a field full of people, because there’s easily enough work to do there. It really is like a wide opening frontier of research.

Ben Byford[00:38:58] Awesome.

Robert Miles[00:38:59] So if you’re interested in doing –

Ben Byford[00:39:01] Exactly. The plug.

Robert Miles[00:39:04] No, genuinely. If you want to make a difference in AI research, if you want to have a big impact, both academically and on the world, this is probably the – I don’t even know what’s in second place. Anti-ageing research maybe? But it’s definitely up there as the best thing to be working on.

Ben Byford[00:39:31] So, does consciousness and this idea of the superintelligence maybe being intelligent in a way that we could ascribe consciousness to it? Or that the ideas we have around consciousness may apply? Does that come into this equation of alignment, or the idea of the superintelligence?

Robert Miles[00:39:52] Yeah, for me it doesn’t, because I feel I have no need of that hypothesis. At least in the abstract, you can model something as a utility maximiser and then it’s just like a straightforward machine. It’s building this model of the world. It can use the model to make predictions. It has some evaluation, a utility function, with which it can look at the possible world states, and decide how much it wants those. Then it can look at its actions and make predictions about what would happen in the case of each action, and it chooses the action that it thinks will lead it towards the world that scores highly according to its utility function. At no point in this process is there any need for internality or consciousness, or anything of that grander scale. It’s possible that when you, in practice, when you build something that does this, it ends up with that, somehow, maybe. I don’t know. But it doesn’t seem necessary. It doesn’t seem like a critical consideration.
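
A toy version of the utility-maximiser loop Rob outlines – a world model for prediction, a utility function for evaluating predicted world states, and a search over action sequences – with no notion of consciousness anywhere in it. Every detail below is invented for illustration:

```python
from itertools import product

# Toy utility maximiser: predict with a world model, score predicted states
# with a utility function, pick the action whose plan scores best.

ACTIONS = [-1, 0, 1]   # e.g. decrease, hold, or increase some quantity
TARGET = 5             # the world state this agent's utility function likes

def world_model(state: int, action: int) -> int:
    """Predict the next state given an action (stand-in for a learned model)."""
    return state + action

def utility(state: int) -> float:
    """Score a predicted world state: closer to the target is better."""
    return -abs(state - TARGET)

def choose_action(state: int, horizon: int = 3) -> int:
    """Return the first action of the plan whose predicted end state scores best."""
    def end_state_utility(plan):
        s = state
        for a in plan:
            s = world_model(s, a)
        return utility(s)
    best_plan = max(product(ACTIONS, repeat=horizon), key=end_state_utility)
    return best_plan[0]

print(choose_action(state=2))   # -> 1 (steps towards the target state)
```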

Ben Byford[00:41:03] It’s not a component that is implied by whatever the outcome is of the system.

Robert Miles[00:41:09] Yeah. And the other thing is that if we have a choice, I think we should try to make systems that aren’t conscious. If that’s an option. Because I would rather not have to worry about – there’s all kinds of problems you have when – like you turn on the system, and you realise that it’s not quite doing what it should and now is it ethical to turn it off? And all of that kind of thing. Considering that consciousness doesn’t seem to be necessary for capability, which is what we care about, if we can avoid it then I think we actually should.

Ben Byford[00:41:51] Yeah. That’s really interesting. I’m just trying to think of ways that it would be advantageous to choose consciousness, but I guess then we’re getting into the power of Dr Frankenstein situations, where you are making the decision over and above the actual reality of the situation as a need, as a requirement. There’s a certain amount of hubris there, Rob. That’s the word I was looking for. Hubris.

Robert Miles[00:42:27] Yeah. The hubris is implicit in the overall project. You’re trying to create a thing that can do what human minds can do. That’s inherently hubristic, and again I’m kind of okay with that. Seems unavoidable.

Ben Byford[00:42:49] So, given the long term scope of this, is there some really interesting stuff coming through right now? I’ve literally just read a paper yesterday about the halting problem. That’s because I was trying to prepare for this and dive into something somewhere other than your brilliant videos. So is there anything else that you really want to bring to the fore about what is really exciting you in this area at the moment?

Robert Miles[00:43:30] There’s all of these approaches that we didn’t talk about. I feel that you had a question that was about this that we didn’t actually get into. Where we could talk about – there are too many of them and I don’t want to get into detail about them because, well, I would probably get it wrong. Because I haven’t actually read the papers about this recently. You talked about Stuart Russell’s stuff. The stuff that’s happening at OpenAI is interesting as well. Things about AI safety via debate. Things about iterated amplification are really interesting. And the stuff at DeepMind, like recursive reward modelling and that kind of thing, which I’m going to have a video about, hopefully. Some time soon. But these are people thinking about how we can take the types of machine learning stuff we have now and build AGI from it in a way that is safe. Because people are thinking quite strategically about this.

The thing is, it’s no good coming up with something that’s safe if it’s not competitive. If you come up with some system that’s like, “Oh yes, we can build an AGI this way, and it’s almost certainly going to be safe, but it’s going to require twice as much computer power as doing approximately the same thing but without the safety component.” It’s very difficult to be confident that someone else isn’t going to do the unsafe thing first. So fundamentally, as a field, it’s difficult. We have to tackle this – what am I saying? As a field it’s pretty difficult. We have to solve this on hard mode before anybody else solves it on easy mode. So people are looking at trying to be the first people to create AGI, and having that be safe, as a joint problem. And those types of things seem pretty promising to me.

Ben Byford[00:45:50] And it’s solving the problem of, like you were saying, someone else creating it that isn’t safe. So you’re trying to get there before they get there with the more correct option. The better-aligned option.

Robert Miles[00:46:06] Yeah, so that’s another thing that’s like if you’re not technically minded. If you’re not well-placed to do technical research, there’s a lot of interesting work to be done in governance and policy as well. Like AI governance and AI policy are also really interesting areas of research, which is like how do you – practically speaking – how do you steer the world into a place where it’s actually possible for these technical solutions to be found, and to be the thing that ends up being actually implemented in practice? How do you shape the incentives, the regulations, the agreements between companies and between countries? How do we avoid this situation where, we all know whoever makes AGI first controls the world? We think we know that, and so everyone’s just going as fast as they possibly can, like an arms race, like the space race type situation, in which people are obviously going to be neglecting safety, because safety slows you down, then you end up with nobody winning.

How do you get people to understand – this is like a much broader thing – how do you get people to understand that there are positive sum games, that zero sum games are actually pretty rare. That it’s so, so much better to get 1% of a post-singularity utopia perfect world, than 100% of an apocalypse. We have so, so much to gain through cooperation, unimaginably vast amounts of value to be gained through cooperation, and a really good chance of losing everything through not cooperating. Or a bunch of outcomes that are dramatically worse than losing everything are actually in play if we screw this up. Just getting people to be like, “Can we just slow down and be careful, and do this right?” because we’re really at the hinge of history. This is the point where – this next century – is the point where we win, or we blow it completely. I don’t see an end to this century that looks approximately like what we have now. This is for all the marbles, and can we pay attention please.

Ben Byford[00:48:59] Yeah, and I think that argument can almost be applied to lots of different areas. Like the environment, biodiversity, maybe ideas around poverty and equality and things like that.

Robert Miles[00:49:13] Yeah. The thing is, and this is why I want to talk about hubris. I would be a lot more concerned about things like climate change if I didn’t know the things that I know about AI. I probably would be focusing on climate change, but climate change is fundamentally not that hard a problem if you have a superintelligence. If you have a system that’s able to figure out everything that – you know, if you have a sci-fi situation, then just being like, “Oh the balance of gases in the atmosphere is not how we want it, you know” – and the thing’s figured out molecular nanotechnology or something, then potentially that problem is just one that we can straightforwardly solve. You just need something for pulling out a bunch of CO2 from the atmosphere and whatever else you need. I don’t know, I’m not a climate scientist.

Likewise poverty. If you get something that is aligned – and aligned with humanity, not just whoever happens to be running it – then I don’t anticipate poverty. Possibly there would be inequality in that some people would be drastically, drastically richer than the richest people today, and some other people would be like five drasticallies richer than the richest people today, but I’m not as concerned with inequality as I’m concerned with poverty. I think it’s more important that everyone has what they need than that everybody’s the same – that’s my personal political position. But again, it’s that kind of thing. If the problem is resources, if the problem is wealth.

Solving intelligence – it’s not like an open and shut thing, but you’re in such a better position for solving these really hard problems if you have AGI on your side. So that’s why I – with my choices in my career, at least – have my eggs in that basket. I don’t think that everyone should do that – I’m glad there are people working on the things that we’re more confident will actually definitely pay off – but I do see AGI as a sort of a Hail Mary that we could potentially pull off, and it’s totally worth pushing for.

Ben Byford[00:51:36] I think it’s one of those things where we’re confident now that things are going badly, so we’ll sort that out. But with the AGI stuff, it could go really well, but we shouldn’t die before we get there, right. We should probably sustain and be good to the world before we fuck it all up, and then we haven’t got this opportunity to go on to do these other solutions.

Robert Miles[00:51:59] I don’t advocate neglecting these problems because AGI is just going to fix it at some point in the future. All or nothing. But there is an argument that concentrating on this stuff is – there’s a line through which solving this solves those problems as well, and that increases the value of this area.

Ben Byford[00:52:24] So the last question we ask on the podcast is to do with what really excites you and what scares you about this technologically advanced, autonomous future. We kind of spoke about this apocalypse and possible, not necessarily utopia, but being able to leverage –

Robert Miles[00:52:44] Relative utopia.

Ben Byford[00:52:44] Yeah, relative utopia. Does that cover it, or are there other things that strike you?

Robert Miles[00:52:54] Yeah, it’s a funny question, isn’t it. It’s like, apart from the biggest possible negative thing and the biggest possible positive thing, how do you feel about this? I think that covers it.

Ben Byford[00:53:09] Yeah. I thought that might be the case. I just thought I’d give you the opportunity anyway.

Robert Miles[00:53:11] I know, totally, totally.

Ben Byford[00:53:14] So, thank you so much for your time, Rob. I feel like it’s one of those things where we could definitely mull this over for the rest of the day. I’m going to let you go now. Could you let people know how they can follow you, find you and stuff, get hold of you and that sort of thing?

Robert Miles[00:53:36] Yeah. So, I am Robert SK Miles on most things. You know, Twitter, Reddit, GMail, whatever, that’s “SK”. And the main thing is the YouTube channel, I guess, Rob Miles AI. I also make videos for the Computerphile channel – that’s phile with a “ph”, someone who loves computers. And if you’re interested in the technical side, like if you’re a researcher – not necessarily a safety researcher, but if you’re interested in machine learning, or have any kind of computer science background, really, I would really recommend the alignment newsletter podcast, or just get the Alignment Newsletter in your email inbox. If you prefer listening to audio, which you might given that you currently are, Alignment Newsletter podcast, it’s a weekly podcast about the latest research in AI alignment. I think that’s it from me.

Ben Byford[00:54:41] Well, thank you again, so much. And I’m definitely going to – that’s someone else I’m going to – have to have back, so that we can mull over some of this again. Really, really interesting and exciting. Thank you.

Robert Miles[00:54:55] Nice, thank you.

Ben Byford[00:54:58] Hi, and welcome to the end of the podcast. Thanks again to Rob, who’s someone I’ve been following for a couple of years now – his stuff on Computerphile and also his own output. So it’s really amazing that I get to talk to people like Rob, and people I’ve talked to in the past on the Machine Ethics podcast, about some of these questions that have just been itching inside of me to ask about as I’ve been watching their work, or reading their work. So it’s really fantastic I was able to get hold of Rob. One of the things from our conversation is that I wasn’t quite as certain or determined as Rob was about us achieving AGI, but I admire his devotion to the fact that if we do, then we should probably do it with good outcomes. This podcast was kind of a continuation from our interview with Rohin Shah, so if you want to listen to more in that vein, then go and check out that episode, and find more episodes on the podcast. Thanks again, and I’ll see you next time.


Episode host: Ben Byford

Ben Byford is an AI ethics consultant; a coding, design and data science teacher; and a freelance games designer with years of design and coding experience building websites, apps and games.

In 2015 he began speaking on AI ethics and started the Machine Ethics podcast. Since then, Ben has talked with academics, developers, doctors, novelists and designers about AI, automation and society.

Through Ethical by Design Ben and the team help organisations make better AI decisions leveraging their experience in design, technology, business, data, sociology and philosophy.

@BenByford