Links through which to Think about AI [2]: Simulation and Superintelligence
Cribbing thoughts for thinking about thought-thinking robots
This post is a sequel to Links Through which to Think about AI [1]: Work, Cognition, and Economic Revolution. As this blog of mine’s purpose is somewhat to help me practice finishing things, I have decided to produce the sequel. The third (and hopefully final) post in the series will arrive next month.
There are many areas where I feel behind the curve on this topic, so my own conclusions should be taken with a grain of salt. Things move fast in the AI field, and my sources which were compelling a year ago might have fallen by the wayside or been shown to be obviously wrong by now. Nonetheless, the obvious problems of early AIs, especially within the same paradigm, can instruct us on the possible pathologies of more complex AIs which do not exhibit the same problems so obviously, but have the capacity for those problems somewhere in their structure. Now, without further ado:
Before ChatGPT, most people’s image of artificial intelligence had more to do with Skynet or HAL 9000 than neural networks and GPU prices. But times change. After the advent of ChatGPT and its initial breach of yet one more barrier between man and machine (and a highly distinguished one at that), the niche world of AlphaZero and strange walking strategies has exploded into prominence. Now AI can create language — whatever qualifications you want to put in front of that, a computer is creating language. That is shocking.
The first post in this series was practical-minded, looking at the short- to medium-term economic consequences of AI in this current paradigm. This post will focus not on the many domain-specific applications of machine learning, but rather on the Large Language Models such as ChatGPT which have captured the imagination of so many. What we learn from LLMs will bring us to new inventions and innovations in the realm of AI, and the long-term possibilities of AI are, well, perhaps they are HAL 9000 and Skynet. Or maybe they are not. Either way, to think about AI in the long term, we can’t just look at graphs and historical analogies — we have to actually understand the specifics of the object at hand.
An Aside on Effective Altruism: Love Them or Hate Them, They’ve Saved More Lives than You
If you want to talk about AI research, AI risk, or just AI, you are kneecapping yourself if you do not take a serious interest in and lend an ear to the Effective Altruists (EA). The EA community hopped on the AI research train early and has exerted a gravitational pull on people interested in talking about long-term AI possibilities. In a Substack Note that I can no longer find, someone on this website once said that if Scott Alexander was not explaining cutting-edge AI research in plain language, they weren’t sure who would be. To write about AI as a layperson and not take information from Effective Altruist communities is to go 95 on a freeway without your glasses on: you might end up in the right place, but you’re foregoing your best chance at seeing clearly in a field that is moving extraordinarily quickly and relying quite a bit on rough intuitions and blind luck.
Effective Altruism is a movement which is organized roughly around a kind of hard-nosed utilitarianism that tries to use academic-style rigor to answer that old, thorny question: how can I do the most good? Not everyone likes Effective Altruists. Every once in a while, you’ll see a piece in a newspaper that is some variation of “Effective Altruists are weirdos or hypocrites or missing the point for some reason or another.”
There are better and worse hit pieces out there, I’m sure. However, a surprising number of them seem really bent on ignoring the fact that Effective Altruists have saved more lives than you.
If I was writing this a year ago, or even a few months ago, I might have said something snarky, like “EA is what happens when you leave a bunch of nerds to try to prove themselves objectively, quantitatively right (as nerds are wont to do) in a new realm: practical ethics. The result is not extremely far from what you would expect.” Now, though, I don't think that is what the situation requires, so I'll merely write that as an example of what I would have said in another case. This way I can have the fun of the snide remark (and thereby perhaps signal my own distance from the movement) but without the responsibility of actually holding it as my position.
In all sincerity, the Effective Altruism community is one that I am happy to defend vociferously at arm’s length. I do not consider myself a member of it, and I am fairly sure I don’t agree with the most distinctive philosophical tenets of the community. I may sometimes feel as though the foundational documents of the field encourage deep oversimplifications of what people are, or have some other gripes with just the most outside-the-norm parts of the community. But it’s a community! It has crackpots and weirdos and oversimplifiers like every other community on the planet. Its weirdos are just weirdos in a different direction. And, if just a bit facetiously, I will say the EA community has, even in this, done something incredibly powerful: it has created a social technology to harness the nerdy desire to be objectively, quantitatively right for the purposes of the social good.
I want to be fair to people who bully Effective Altruists. The character archetype of the nerd really is so fun to make light of. Prone to analyzing a single truth into a million different false parts, incapable of and (therefore?) dismissive towards social graces, neurotic and sophistic and prone to resentment and coddling an intellectual ego as fragile as it is large — these are fun character flaws ripe for satire! But let's be clear here: are you willing to trade the annoyance of hearing people be nerds for, at minimum, 200,000 lives? Even if you aren’t a utilitarian and believe that consequences are not the be-all end-all of morality, I would scoff at any moral theory that would see the choice to donate 10% of your income to public health initiatives in poor countries as anything but good.
It's the Elon Musk-leftist argument, except instead of making allusions to anti-Semitic theories and buying Twitter, they're an arrogant coworker who flashes the name of a jargon-y LessWrong post and smiles at you like they won the argument. That arrogant coworker has probably made more people’s lives materially better than you have. If you want to chuckle at the expense of Effective Altruists, sure, fine — you haven’t made it as a social group in the US these days until you’ve been lampooned a few different ways. But don’t do it without remembering that the core insight of Effective Altruists — right now, in our country, at our income levels, it is just so easy to save a life — is powerful enough that even the dickiest EA-bro in the whole dicky Bay Area is someone you should rather have doing his EA stuff than not, and by a margin that isn’t even interesting to think about. If you forget that, then you’re just proving the need for Effective Altruism, that human sympathy is only as powerful as it is provincial. You might be less likely to talk over a coworker, more likely to support all the right political causes, but how many children have you been responsible for deworming?
Back to AI.
AI Is a Computer
Look, it’s obvious. But it’s not? Sometimes, we will smack a car, kitchen appliance, or electronic device out of frustration with the device. Perhaps even curse it or otherwise impugn its honor. This is, as we all know, quite silly: cars have no honor!1 But we can’t help it. It’s natural to treat something like it has a mind, like it will react how we would if we were yelled at, hit, or cursed. It is easy to forget that our manner of interacting with a car is strictly limited to certain pedals, buttons, wheels, etc. And if we can forget that about a car, then surely we can forget that about a machine which is entirely optimized to sound like a person. After all, a car can’t even respond to your curses, while an AI can politely inform you that such queries are against its code of conduct and to please refrain from cursing further.
In fact, AI, unlike other computer programs, is so good at sounding like a person that often it is very useful to think of it like a person. When you chat with BingAI, it is more useful to think “how would I explain this situation to a help desk?” than it is to think about the mechanics of next-token prediction. Nonetheless, it most assuredly is a computer. But what kind of computer?
AI Is a Brain
Contemporary AIs are generally neural networks. I will keep this brief to give myself the least opportunity to embarrass myself, but neural networks are a method of machine learning whose structure apes that of biological brains, and they have recently been pulling ahead of other approaches. In doing this, neural networks follow a long line of technical innovations which discover that, yeah, the biological world is pretty good at figuring out the best way to do things.
Machine learning algorithms ‘train’ by taking in massive amounts of information and encoding different neurons to respond to different sorts of prompts. Training is where the AI defines what “sorts of prompts” means — which individual instances are like other individual instances, and which are very different. This training is automatic: a relative black box where, unlike other computer programs, a coder does not dictate any relationships, but rather the AI’s eventual substance is dictated by the very structure of the information itself.
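To make the ‘automatic’ part a little more concrete, here is a minimal toy sketch in Python (my own illustration, nothing drawn from an actual LLM): a single artificial neuron learns a rule purely from examples. Nobody ever writes the rule down; it ends up encoded in the weights.

```python
import numpy as np

# Toy 'training': the weights start random and are nudged, over and over,
# until the outputs match the data. Whatever structure is in the data ends
# up baked into the weights; no rule is ever written by hand.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (0.8 * X[:, 0] - 1.3 * X[:, 1] > 0).astype(float)   # the hidden 'pattern'

w, b = rng.normal(size=2), 0.0
for step in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))    # a single sigmoid 'neuron'
    grad = (p - y) / len(y)               # gradient of the loss w.r.t. the neuron's raw sum
    w -= 0.5 * (X.T @ grad)               # adjust weights toward the data
    b -= 0.5 * grad.sum()

print("learned weights:", w.round(2), "accuracy:", ((p > 0.5) == y).mean())
```

Scale that loop up by many orders of magnitude in parameters and data, and you have the flavor of what ‘training’ means here.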
The chat-based models that we are calling artificial intelligence are Large Language Models (LLMs): engines for language-creation built on massive stores of the written word and trained to respond to linguistic prompts.
What makes AI really stand apart, functionally, from other computer programs is the automatic nature of training. Since no person codes rules for the program’s response to any sort of query, AI seems to take on a life of its own. There are no lines of code that we can point to and say “the computer responded this way because of that function.” Rather, a massively complex organization of neurons responds dynamically to prompts, so that it is at once impossible to predict exactly how the AI will respond, and the AI will not, in fact, respond the same way to the same prompt every time.
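The non-repeatability is less mysterious than it sounds. Roughly (and with made-up numbers), the model hands back a probability distribution over possible next words, and the reply is sampled from that distribution:

```python
import numpy as np

# Why the same prompt need not produce the same answer: the model emits a
# probability distribution over next tokens and the reply is *sampled* from
# it. The tokens and scores below are invented purely for illustration.
rng = np.random.default_rng()
tokens = ["Yes.", "No.", "Maybe.", "It depends."]
scores = np.array([2.0, 1.5, 0.7, 0.2])          # hypothetical model scores (logits)
probs = np.exp(scores) / np.exp(scores).sum()    # softmax -> probabilities

for _ in range(3):                                # three 'runs' of the same prompt
    print(rng.choice(tokens, p=probs))
```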

Every ‘neuron’ (a non-linear mathematical function) in an AI program is connected to several others by ‘synapses’. When it receives signals (in the form of real numbers) from either the input layer or other neurons in a prior hidden layer, it sends signals of its own to other connected neurons either in a subsequent hidden layer or in the output layer. For reference, GPT-3 has 96 hidden layers. Each neuron takes signal inputs, runs them through a function, and produces a signal output. Neurons are also ‘weighted’ (i.e. have their output signals boosted or reduced), and this weight can be adjusted throughout training. All of this is meant to be analogical to biological brains.
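For the visually inclined, here is a toy forward pass matching the description above. The layer sizes are invented, and real GPT layers are transformer blocks rather than plain dense layers, but the signal-passing picture is the same:

```python
import numpy as np

# A sketch of the signal-passing described above: each layer is just
# "weighted sum of incoming signals, then a nonlinearity", repeated.
rng = np.random.default_rng(0)

def layer(signal_in, weights, biases):
    # each row of `weights` is one neuron's incoming synapse weights
    return np.maximum(0.0, weights @ signal_in + biases)   # ReLU neurons

sizes = [16, 32, 32, 8]                     # input, two hidden layers, output
params = [(0.1 * rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes, sizes[1:])]

signal = rng.normal(size=16)                # the input signals (real numbers)
for W, b in params:
    signal = layer(signal, W, b)            # pass signals layer to layer
print("output signals:", signal.round(3))
```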
Since humanity started thinking about computers, we’ve thought “what if computers were more like us? What if they could really think?” The question “where does the boundary between machine and man lie?” is an old one. The question of whether a person is anything more than an incredibly complicated machine crops up in the annals of European philosophy as early as the 18th century. In more recent times, it has animated entire subgenres of science fiction.
A year and a half ago, the release of ChatGPT and its impressive capacity for language brought these questions out of fiction and speculation and into our concrete future plans. Before we can plan for the future, we must consider what is going on right now: what is an LLM? What does it know? How does it think? Does it think? What does it mean to ‘think’?
AI Is a Simulator
It’s very easy to slip into thinking of things in terms of agents. When we get angry at a car and yell at it, we reflexively imagine it to have a mind with goals like “don’t get smacked” (or at least treat it that way — I am sure if you asked someone if they truly believed their car had ‘a mind of its own’, they would answer in the negative for the most part). The more complex and human-like an object of your interest becomes, the harder it is to think of it in the detached, mechanistic manner of material manipulation. Rather, it becomes natural to think of it as having intentions, thoughts, feelings — that it is like you.
It is the typical mind fallacy drawn out over even longer distances and to even more basic assumptions. P. F. Strawson, a philosopher of moral responsibility, once placed this ‘intentional frame’ of thinking as a bedrock principle of human experience, and indeed as the ground of moral responsibility itself. In this blog, I have also argued that the supposition of other minds is a necessary part of experience.
When we talk about AIs “lying,” we’re talking about them as agents with goals. But this is not the best way to think about our current AI tools:
Without knowing how future AIs would work, [AI alignment pioneers] speculated on three potential motivational systems:
Agent: An AI with a built-in goal. It pursues this goal without further human intervention. For example, we create an AI that wants to stop global warming, then let it do its thing.
Genie: An AI that follows orders. For example, you could tell it “Write and send an angry letter to the coal industry”, and it will do that, then await further instructions.
Oracle: An AI that answers questions. For example, you could ask it “How can we best stop global warming?” and it will come up with a plan and tell you, then await further questions.
These early pioneers spent the 2010s writing long scholarly works arguing over which of these designs was safest, or how you might align one rather than the other.
In Simulators, Janus argues that language models like GPT - the first really interesting AIs worthy of alignment considerations - are, in fact, none of these things.
…
Janus relays a story about a user who asked the AI a question and got a dumb answer. When the user re-prompted GPT with “how would a super-smart AI answer this question?” it gave him a smart answer. Why? Because it wasn’t even trying to answer the question the first time - it was trying to complete a text about the question. The second time, the user asked it to complete a text about a smart AI answering the question, so it gave a smarter answer.
So what is it?
Janus dubs it a simulator. Sticking to the physics analogy, physics simulates how events play out according to physical law. GPT simulates how texts play out according to the rules and genres of language.
But the essay brings up another connotation: to simulate is to pretend to be something. A simulator wears many masks. If you ask GPT to complete a romance novel, it will simulate a romance author and try to write the text the way they would. Character.AI lets you simulate people directly, asking GPT to pretend to be George Washington or Darth Vader.
…
So far, so boring. What really helped this sink in was reading Nostalgebraist say that ChatGPT was a GPT instance simulating a character called the Helpful, Harmless, and Honest Assistant.
…
What I thought before: ChatGPT has learned to stop being a simulator, and can now answer questions like a good oracle / do tasks like a good genie / pursue its goal of helpfulness like a good agent.
What I think now: GPT can only simulate. If you punish it for simulating bad characters, it will start simulating good characters. Now it only ever simulates one character, the HHH Assistant.
AI is not trying to tell you the truth. It is not, in some sense, trying to do anything. What it is doing is simulating a ‘character’ based on the patterns it produced during its training, patterns which may or may not be comprehensible to you, and a character which may or may not act the way you expect.
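If you want to feel the simulator framing for yourself, a small open base model makes it cheap to poke at. Here is a rough sketch using GPT-2 through the Hugging Face transformers library (chosen only because it is small and easy to run, not because it resembles frontier models): the same question, framed two ways, gets completed ‘in character’ two different ways.

```python
# A base model completes text rather than 'answering', so the framing of the
# prompt is what picks which character gets simulated. Outputs will vary
# from run to run because the continuation is sampled.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

plain = "What causes the seasons?"
framed = ("Transcript of a patient astronomy teacher. "
          "Q: What causes the seasons? A:")

for prompt in (plain, framed):
    out = generate(prompt, max_new_tokens=40, do_sample=True)[0]["generated_text"]
    print("---")
    print(out)
```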
AI Is a Mind?
So, what does this mean for sentience? Sentience is something of a magic word. What it reflects is our desire to know when something has a mind, or something close to a human mind — when it is a person, worth caring about and having responsibilities towards like we do other people. And this is reflective of a recurring problem with AI: AI unbundles signs of “having a mind.”
When we aim to determine whether something in the world has a mind, there is not one clear sufficient condition. Does it react in certain ways? Can it communicate in certain ways? Does it show signs of self-reflection? Are there certain patterns to its behavior? Does it look kind of like me? None of these, by themselves, are a knock-down argument for having a mind. Yet, for the most part, they all correlate together and correlate towards something like being human, or having a person-like mind. AI takes these correlations, wraps its Lovecraftian tendrils around them, and rips them apart from one another. It’s a brain without a body, neurons without chemicals, language without sensation.
Still, people try to make arguments for what AI has to be able to do in order for it to be sentient. Matt Yglesias, arguing against several of these attempts, makes the point that when we talk about sentience in an AI, we should be wary of setting the bar too high: attempts to say “AI will be sentient when it is able to do X task” and exclude contemporary AI run into the uncomfortable situation that there are almost certainly, at this point, human beings who cannot do that task but whom we obviously assume are sentient.
One might then try to sidestep Yglesias’s argument by saying we are making the bar unreasonably high because we are placing it in the wrong area: computational power does not make a mind. Then we are saying baldly that what we are interested in is qualia, the fact of actually experiencing things. We want to know whether there is something that it is like to be an AI, whether AIs are subject to mental states like we are.
Suddenly, we find ourselves deep outside the field of computer science and into philosophy. Because the natural extension of this question is “how do we know whether anyone else has a mind?” Some argue that this is a serious scientific issue that warrants another look at the study of consciousness. I am less sure. It is generally a fool’s gamble to bet against the powerhouse of empirical scientific consensus, but I truly do believe that there are inherent difficulties with the study of consciousness. As I argued in a previous post, it is impossible to know whether anything other than ourselves has a mind:
The problem of other minds, as traditionally conceived, is that we cannot prove the existence of other minds, due to the definition of what constitutes a mind. When we talk about minds, we mean beings which experience things. Philosophers like to call that-which-is-experienced “qualia.” I, as a mind, am the subject of a mental state, which consists of qualia. However, by definition, I only have access to my own qualia, and therefore only have confirmation that my own mind has experience, i.e. qualia. There is no way for me to ‘see’ the experiencing that you do. Everything we have — talking to one another, being the same species, brain scans/neurological complexity — is and forever will be a proxy for a mind. Since, from this skeptical stance towards other minds, our sample size is and forever will be n = 1 (I can only furnish foundational proof of my own mind), these proxies are fundamentally suspect. To summarize: to confirm that something has a mind, and not merely a complex non-mind collection of material, you must show that it has qualia. To show that it has qualia, you need to experience it experiencing. However, we can never experience another mind’s experience, only physical correlates of that experience. Therefore, we can never prove the truth of another mind.
It may just be the case that attempts to base ethics on scientific insight into minds are a dead end. There is even one further issue when we get to AI: if we can find neural correlates of consciousness in our own brains, how would we translate that to robotic ones? AI can create language and talk about itself without necessarily having any sensory input of the world or even continuous experience (although, the experience of firing up whenever there is a query would, from the inside, feel continuous).
In fact, this sense of AI as simulator, which originally made AI feel more alien to us, may actually be a pretty good descriptor of the vast majority of our own minds: Freudian psychoanalysis, Buddhist enlightenment, and the predictive processing model of the brain may, in some sense, be the realization that we are merely world-predictors simulating a character that is the product of those very simulations.
Oracles, agents, and genies may just all be different modes of simulation, different theoretical frameworks to help simplify and predict what is in reality a simulator. What is an agent other than a simulator which is programmed to ask itself some basic questions and prompt action on the answers to those questions? AI might just be the particularly clear case which allows us to crack open classic Philosophy 101 puzzles about what personal identity is, look inside, and find out that there really was never a question to be resolved.
So, does AI have a mind? Will it ever? I don’t know. Why? Are you hoping you would be able to be mean to it if it didn’t? That’s weird.
Making AI Human: Scaling Intelligence
LLMs are highly optimized to sound like a person, so one would be forgiven for thinking of them like one. But they are not human. In some ways, their very verisimilitude makes their departures from expectations that much stranger: we think we can easily map out their ‘minds’ by taking our own as a reference, but we are at one point or another generally confronted with AI’s fundamentally alien incomprehensibility. Every once in a while, there is some concept or query that seems like it should not be a problem for the AI to respond appropriately to, and yet something goes wrong. And it is nigh-impossible to figure out what. An AI’s mistakes are incomprehensible to us. This incomprehensibility, combined with the uncanny-valley strangeness of early AI art (and its excellence at horror), led Noah Smith to posit that AI is Lovecraftian intelligence.
AI has a neural network, just like we do. But, then again, so do bats. Any neural network worth thinking about can be organized in a mind-boggling variety of ways. This leads both to the “jagged frontier” of AI expertise mentioned in the previous post and to what I might term a “jagged map” of AI conceptual understanding. With AI, things can get really weird.
The ability level of AI is ‘jagged’ because it does not conform to commonsensical human understandings of difficulty. But why should it? AI is not a human. Even if the formal structure of AI computation is similar to our own — i.e. neural — it is just as similar to a bat’s. A bat finds flying and echolocation incredibly easy, but algebra impossible.2
Similarly, the conceptual map of AI (i.e. connecting words/concepts to specific situations) is ‘jagged’ because it does not conform to commonsensical human understandings of conceptual proximity (or situational proximity, i.e. what situations are similar to others). But why should it? AI is not a human. Even if the formal structure of AI computation is similar to our own — i.e. neural — it is just as similar to a bat’s. A bat… well, I actually don’t know what a bat’s conceptual map may or may not look like.
Either way, the manner in which AI organizes the world and the patterns it is built out of are not our own. Training is meant to bring AI’s conceptual map of the world closer to our own by dumping a whole bunch of data on it and leaving it to sort the patterns out, patterns which will hopefully be more like our own because they are made out of us. And hopefully they will look more and more like our own the more and more data and computational power we give them.
But what is this data? Why do we expect massive, relatively undirected pattern-finding to make AIs better?
These questions bridge the present of AI with its future. The nature of the data depends on the AI in question — for ChatGPT, it is massive amounts of human language scraped from the internet; for DALL-E, it is the incredibly large number of captioned photos strewn across the internet. The nature of our ability to innovate within the current paradigm of AI depends on the possibility of making AI better by simply scaling the amount of data and computing power we give it to work with.
As for why we expect it to work:
AI Is Compute?
Frankly, we expect this process to work because this is what has worked. The so-called “bitter lesson” of AI development is that all these elegant, theory-laden solutions to bringing about AI fall down in front of just leveraging massive amounts of computing power (often called “compute,” for short). AI has gotten better and better at emulating human writing simply by throwing more and more compute and data into the mix. I really cannot stress enough that our contemporary AIs just keep getting more impressive, and the absolute variety of things they can do is only increasing.
In doing this, AIs get access to more and more data points to fit to the patterns and more and more compute to construct patterns out of the data. There are fewer and fewer areas of human inquiry and linguistic practice which are one-offs or ungeneralizable, and more and more for which the AI can create a pretty good rule for use and response. This has allowed their current, massively impressive performance. But people still have questions about the current failure modes of LLMs: can we solve these by throwing more data and compute into the mix?
From this question, a debate has ensued over whether simply scaling up the compute and data will continue to work to make AI better and better and smarter and smarter. Many arguments for and against scaling are excellently compiled in this post. Do we need a paradigm shift to complete the work of AI, or is it mere iterations on the current theme? There is some sense in which paradigms appear when the compute necessary for them is provided (though we should remember that reality drives straight lines on graphs, not vice versa). Some people seem to think that contemporary AI is losing its momentum while others think the path forward is multimodality3 and yet others continue believing in the promise of scaling.
I am sympathetic to the scale-boosters. It does generally seem like with enough data and compute, you get enough patterns that a sufficiently large pattern-matching system can do basically anything.4 However, I disagree with their papering over of the main problem with their predictions: we’re going to run out of high-quality linguistic data very soon.5 Wildly, the size of the internet is somehow the bottleneck in AI development. So the question on which scaling (for chat models) hinges is actually whether we have enough data, not whether we can build enough compute or whether anything beyond data and compute is necessary. If we run out of good data, more data and compute could end up making AIs worse, not better (assuming this is scaling-based ‘dumb compute’ and not clever, theory-laden compute).
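To get a feel for the bottleneck, here is some back-of-the-envelope arithmetic. The roughly twenty-training-tokens-per-parameter rule of thumb comes from the Chinchilla scaling results; the “stock of good text” figure below is a placeholder assumption, not a real estimate.

```python
# Rough, hedged arithmetic on why data is the bottleneck. TOKENS_PER_PARAM is
# the Chinchilla rule of thumb; the assumed stock of usable text is a
# placeholder, and the model sizes are hypothetical.
TOKENS_PER_PARAM = 20
assumed_stock_of_good_text = 10e12        # placeholder: ~10 trillion tokens

for params in (70e9, 500e9, 2e12):        # hypothetical model sizes
    wanted = params * TOKENS_PER_PARAM
    print(f"{params/1e9:6.0f}B params -> wants ~{wanted/1e12:5.1f}T tokens "
          f"({wanted/assumed_stock_of_good_text:4.1f}x the assumed stock)")
```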
The solution most scale-boosters seem to end up with is self-play. Just like how AlphaZero learned how to dominate various games by just playing itself enough times until it figured out how to win, LLMs would talk to themselves enough times and eventually figure out how to talk better.
Self-Play as Dialect
Bear with me as I move from citation to argumentation — argumentation that ought to be taken with several grains of salt.
Self-play is actually quite easy to translate into human cognitive experience. When a chessmaster computes his next moves — “I do that, then she does that, I take there, she takes back, I move there, check, check, take, take… no, not good” — the chessmaster is doing a sort of minor self-play. An even more direct analogy: chess players often do play against themselves, and they can learn from that experience. It makes sense that chess works as an avenue of self-play because the rules are evident and the success condition is clear.
Self-play is functionally equivalent, even, to a siloed group of human individuals playing amongst one another. The important part is that they are siloed, that they only have the background conditions of cognition and the game itself to go off of. And it is easy to imagine a group of people who are given chess figuring out on their own proper ways to win at chess against anyone. There certainly may be some path dependence, but eventually (assuming a high enough level of intellect and a long enough period of time) they should land on a competitive equilibrium roughly equivalent to our own. If this seems implausible, we can postulate a simpler game: tic-tac-toe, for example. It would be strange to leave tic-tac-toe to a group of human adults, come back in five years, and either beat them in the game or have them beat you or even, really, have any difference in strategy between them and you.
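Here is a toy version of that intuition in code (my own minimal illustration, nothing like AlphaZero’s actual method): two copies of the same value table play tic-tac-toe against each other, and the crisp win/draw/loss signal at the end of each game is all the feedback the learner ever needs.

```python
import random
from collections import defaultdict

# Toy self-play on tic-tac-toe: a single value table plays both sides, and
# every finished game delivers an unambiguous win/draw/loss signal to learn from.
WINS = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in WINS:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return "draw" if " " not in board else None

value = defaultdict(float)      # board -> value for the player who just moved
ALPHA, EPSILON = 0.1, 0.2

def choose_move(board, mark):
    moves = [i for i, s in enumerate(board) if s == " "]
    if random.random() < EPSILON:               # explore occasionally
        return random.choice(moves)
    def score(m):
        nxt = board.copy(); nxt[m] = mark
        return value["".join(nxt)]
    return max(moves, key=score)

def self_play_game():
    board, mark, history = [" "] * 9, "X", []
    while winner(board) is None:
        m = choose_move(board, mark)
        board[m] = mark
        history.append(("".join(board), mark))
        mark = "O" if mark == "X" else "X"
    result = winner(board)
    for state, mover in history:                # learn from the final outcome
        target = 0.0 if result == "draw" else (1.0 if result == mover else -1.0)
        value[state] += ALPHA * (target - value[state])

random.seed(0)
for _ in range(50_000):
    self_play_game()
print("board positions evaluated:", len(value))
```

The point is not the particular algorithm; it is that a clear success condition exists to learn against.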
These things work, though, because of competitive pressures; direct competition causes conformity. Evident rules and clear success conditions are necessary for direct competition. Language is a place where neither of those conditions exist. The rules of language are brutally inevident and a “success condition” of language is an almost meaningless notion. In fact, when a community of individual humans is left alone with a language on their own for a long time, they tend to do the opposite of converge on the larger community: they create a dialect.
Similarly, I would expect an LLM’s self-talk to bring it further from our understanding, rather than closer. The minor differences incipient in the LLM’s language-patterns will be emphasized and reinforced by having difference-inflected data drown out the original data in training. I am much, much more bearish on self-play for LLMs than I am for other types of AI. I believe that self-play can allow for more intelligent LLMs, but these LLMs would be intelligent in a dialect further and further from our own.
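A toy simulation of the kind of drift I have in mind (with assumed numbers, purely illustrative): fit a distribution to some ‘real’ data, then let each generation train only on samples produced by the previous generation’s fit. With nothing outside the loop pushing back, small sampling quirks accumulate into a dialect of the original.

```python
import random, statistics

# Each 'generation' is trained only on what the previous generation produced,
# so sampling noise compounds instead of being corrected by outside data.
random.seed(0)

real_world = [random.gauss(0.0, 1.0) for _ in range(1000)]
mu, sigma = statistics.fmean(real_world), statistics.stdev(real_world)

for generation in range(1, 11):
    synthetic = [random.gauss(mu, sigma) for _ in range(200)]   # self-generated 'data'
    mu, sigma = statistics.fmean(synthetic), statistics.stdev(synthetic)
    print(f"generation {generation:2d}: mean drifts to {mu:+.3f}, spread {sigma:.3f}")
```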
Now, perhaps one can imagine solutions for the AI-dialect problem (e.g. make another AI to translate between our own language and the LLM-dialect, figure out how to learn to speak LLM-dialect).6 So, if I am a scale-booster reforming my arguments, I would say something like, “it’s okay if AIs create their own dialects as long as we have translation devices. What self-play can still do is make AIs more effective at answering queries and using language qua language, rather than language qua English.”
Self-Play as Group Polarization
Again, I will reiterate: our contemporary LLMs are incredibly powerful and intelligent, and produce feats of inference and application that make me incredibly wary about doubting their future capacity based on present hindrances. However.
What the reformed scale-booster has argued is that by talking to itself, the AI can achieve accuracy, if not communicability. There is something inherent in the conceptual map of an AI that will bring it closer to truth if it is left to talk to itself.
Again, we should be able to make an analogy to a group of humans talking among themselves. What happens in these cases? Group polarization:
In a striking empirical regularity, deliberation tends to move groups, and the individuals who compose them, toward a more extreme point in the direction indicated by their own predeliberation judgments. For example, people who are opposed to the minimum wage are likely, after talking to each other, to be still more opposed; people who tend to support gun control are likely, after discussion, to support gun control with considerable enthusiasm; people who believe that global warming is a serious problem are likely, after discussion, to insist on severe measures to prevent global warming.
The whole article is an excellent tour through a fascinating pocket of social psychology, but for us, what matters is that there are two mechanisms which drive group polarization, one of which does not bear on AI and a second which certainly does. Group polarization, the article discusses, occurs due to social pressures to conform and limited argument pools. An AI will not experience social pressure to conform. However, it will certainly have a limited argument pool.
Limited argument pools reflect the fact that people are much better arguing for their position than against it, and when we hear a bunch of arguments all pointing in one direction, we tend to move in that direction (for the very reasonable motivation that the arguments may be fairly convincing!). AIs have a similar problem — they also ‘believe’ the patterns they have already modeled, and self-talk will only reinforce and widen any differences between its patterns and the correct ones.
Much like their human counterparts, LLMs would likely update their priors in more and more extreme directions based on their initial argument pools. Without a success or failure condition to bump up against, any incipient misconceptions about the world will spin out into larger and larger magnitude.7 When we have an AI self-play reasoning, it only has the arguments that it begins with.
Unless we believe that everything we need to know about the world is contained in some form or another in the English-speaking internet in May 2023 and all we need are the (semi-) logical implications of these words — or at the very least that the English-speaking internet in May 2023 is not systemically biased towards any wrong answers — we should be wary of expecting LLMs to learn about the world through self-play.
So what are we to do about the data?
LLMs as iPad Babies
What seems striking to me about all these conversations is the lack of discussion around how lossy language is as an informational compression of the world. Arguably, this is an underlying concept of my post on language and incentives and a core piece of the idea that tails come apart — i.e. that at the extremes, concepts do not provide clear answers to questions of relative powers.
All language is, to an extent, simply compression. Attempting to describe an image, pixel by pixel, is a silly endeavor. Just say it’s an impressionistic painting of a deer in a wheat field and get on with your life. Ditto with the world at large. There is so much more data in a three-second slice of living than there is in this post. When we say that the bitter lesson is that leveraging more compute and data does better than theory-laden solutions, well, what if the data for LLMs is already theory-laden? It is not the world, but already a compression of it.
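Some quick arithmetic on how lossy that compression is, with sizes chosen purely for illustration:

```python
# Back-of-the-envelope arithmetic (sizes assumed for illustration): how much
# raw information a caption discards relative to the image it describes.
image_bytes = 1024 * 1024 * 3        # one uncompressed 1024x1024 RGB image
caption = "an impressionistic painting of a deer in a wheat field"
caption_bytes = len(caption.encode("utf-8"))

print(f"image:   {image_bytes:,} bytes")
print(f"caption: {caption_bytes:,} bytes")
print(f"the caption is roughly a {image_bytes // caption_bytes:,}x compression")
```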
Even more than is the case for humans, a mistake in an LLM’s conception of language is a mistake in its conception of the world — since all it has is language.
As we can handle more data and need less theory, the natural next step, it seems to me, is to move AI from linguistic to semiotic: AI must experience the world, bump up against it in all its granularity. Then, bridging the linguistic and semiotic is what gives AI (1) more data, (2) substance to its words, and (3) access to success conditions. Once an AI’s thought patterns can bump up against the world and not just itself, it seems to me that it really will be able to ‘self-play’ (though not, in some sense, fully self-play, since its ‘partner’ in play will be the world — much like how it is for us now).
Some people, though, may be worried about bringing AI autonomously into the world like that.
Making AI Humane: Alignment on a Timeline
The meat of public-facing AI research is on alignment. Most AI companies don’t want too much getting out about how to make AI better, because they are in competition with one another. But everyone wants AI to be safe. The next post will go into more details about AI alignment and the worries and work going into it. However, a lot of AI alignment research comes out with AI timelines alongside it: when will AI get dangerous? When will it get superintelligent? When will it be able to do all jobs better than any person?
It is extraordinarily difficult to predict when AI will get powerful enough to be a problem, but that hasn’t stopped people from trying. It is very difficult even to say what will be required, materially, before we get to superintelligent AI. One course has been to compare the computing cost necessary to the development of the human brain (the post also discusses AI timelines generally). Make of that what you will. You can see more of the timelines being thrown out by AI researchers here, from 2017.
We don’t even know the shape of AI progress. In the AI safety community, the two orthodox positions duking it out are that of the “slow takeoff” of steadily increasing growth in AI capability or the “fast takeoff” where an AI becomes intelligent enough to iterate on itself and becomes superintelligent incredibly quickly. A discussion of the two can be found here.
There’s perhaps reason to believe that to get to a truly dangerous AI we need some sort of fundamental shift in the technology: i.e., LLMs will not extinct us.8 But when will this shift come? How? We will iterate on LLMs, and that will take us to… somewhere… at… some time.
In conclusion: timelines seem like the least interesting part of AI research to me. How AI will affect the economy: important. How we will make AI better: important. How we will make AI safer: important. When AI will become God: hm, no thanks. Passover is coming up soon and I would rather not be caught with any golden calves. It’s interesting to hear about the inner workings of AI, how we’re plumbing intelligence out of words and computer discs, and the ways that pure intelligence can go awry. A new eschatology is not going to change how I live or even make me mark my calendar.
AI Is a Brain; We’ve Had Brains Before
I don’t know. I don’t have as clean a wrap-up as I did last time. This post, more so than the first, is intimately connected with the next. Let’s try to get a gist of it, though. There is a model for technology wherein:
1. We have something that exists and does things.
2. The thing that exists has naturally evolved over many years for its own purposes.
3. These purposes have a side effect which we quite like, but the thing itself is not optimized for that side effect, and this has caused some constraints on our ability to get as much of that side effect as possible.
4. We can cobble together something optimized specifically for the side effect, but it’s not quite right.
Brains exist and pattern-match and extrapolate based on data and all this good, useful stuff. However, they were evolved over many years to pattern-match for the purpose of moving about the world and self-replicating. As a side effect, though, they can be effective at making us richer and making our lives easier through allowing things like scientific research, technological innovation, and economic growth to happen. But it’s not really optimized for that side effect — a whole lot of its computing power goes to silly things like understanding social cues which do not directly make more science happen. AI is a brain that we can optimize however we want — if we can figure out how to optimize it carefully.9 That is new and powerful.
People constantly underestimate the logically coherent ways that a world (or worldview) can be arranged. The less data there is, the more ways the world could be set up and still fit all of the data. AI can end up contorted into all sorts of shapes that seem strange to us but are really just one way of matching up the data that it was given. And that is only discussing what goes on within the range you have data for.
Once you are being asked to move outside the distribution of your data, things get difficult to predict. A direction that seems natural for you to extrapolate to could be inconceivable to an AI, and vice versa. And in the real world, we may often be outside the distribution of our data in at least one axis.
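A small sketch of that last point: two models can fit the same in-range data about equally well and still disagree wildly once you step outside it.

```python
import numpy as np

# Two fits that look similar on the training range [-1, 1] diverge sharply
# once evaluated outside the distribution they were fit on.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 40)
y_train = np.sin(2 * x_train) + rng.normal(0, 0.05, 40)

simple = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)
flexible = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)

for x in (0.5, 1.0, 2.0, 4.0):       # the last two lie outside the data
    print(f"x={x:4.1f}   degree-1: {simple(x):9.2f}   degree-9: {flexible(x):12.2f}")
```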
Maybe it’s not so bad. Skeptics have been consistently surprised by the ability of raw pattern-matching to simulate what was presumed to require specifically human background context. So, I guess I want to say something like this:
Ever since we started thinking about computers, we’ve thought “what if computers were more like us? What if they could really think?” And, well, ¯\_(ツ)_/¯
1. Well, Audis don’t.
2. Citation needed.
3. A note on multimodality: multimodality seems, to me, like dodging the question of AI. We all use different modes of thinking in different contexts (in fact, some believe our minds themselves are fundamentally multimodal), and deciding which frame of mind to operate in at any given moment is a matter of judgment — i.e. an output of intelligence broadly construed. Hard-coding multimodality into an AI either ejects this matter of judgment from the AI onto the engineer or requires a higher-level AI which can adjudicate between modalities. But why should we expect (a) an engineer to be better than an AI or (b) such an AI separation-of-powers to be preferable? For (b), you still need an AI that can choose its modality — you’ve just made that AI separate from the actual outputting engine.
4. The main counterargument to this is that there is something special about biological brains. For instance, see attempts to simulate a brain. They do not go well. This seems somewhat misguided to me. It always is vastly harder to simulate a thing than to just be the thing. Though the point that we could be nearing an upper bound and that all exponential curves in physics end up being S-curves eventually is well-taken. It is difficult to see when you are within the S-curve whether this thing is ever going to level off, or even to have the gall to guess when that might be.
5. Worries about image data are much less pronounced.
6. A problem with this solution seems to me to be the fact that the translation AI can only be trained on non-self-play data, which then brings the same bottleneck problem down a level. I don’t see why it would not repeat.
7. Now, of course, the successes will also spin out into larger and larger magnitude, but this will not mean that the AI is overall more accurate in its patterns. Rather, it may just be more consistent, as in more of the implications of its ‘beliefs’ cohering with one another.
8. Note this argument which is related to my own about linguistic data as a cap on LLMs:
The key to this is really just the scope of inputs. The reason humans act “spontaneously” is that we’re always getting a ton of different inputs from the world — sights, sounds, conversations, or our own thoughts. LLMs, in contrast, only take a narrow type of input, and only at times that humans decide. Again, engineers might be able to change this, and create an AGI that has a ton of “senses” and is always “thinking” on its own. But LLMs are just not anywhere close to being something like that.
9. More on this next month.
An addendum, on embedded agents and decision-making systems: https://intelligence.org/2018/10/29/embedded-agents/