This one came out a little later than I wanted it to, but it took some time to get the path right. I’m still not sure if I like how it looks, though I suppose this is part of the learning experience. For something to be published, it has to be announced ‘finished!’ even if it never is. The central argument has changed much in the process. I hope you will let me know if it worked out. A related post will be coming out soon™ which builds on its discussion of utopia, incentives, and the need for good people from the more empirical, scientific side. This was becoming just absurdly long and while I don’t feel incredibly comfortable putting the conceptual argument out there without its twin, I suppose this too is part of the learning experience.
Almost all the things we care about most deeply — happiness, prosperity, justice, meaning, love — are at the same time deeply indeterminate. These are not directly physical descriptions, but certain psychological and social ideals, having to do with a relationship between our minds, others, and the physical world. We strive towards these things, sometimes (and by sometimes I mean usually)1 in some amount of tension with one another, but we must always be striving toward what are necessarily proxies. Searching for happiness does not mean looking for it under rocks, but, say, finding something to be enthusiastic about.2 Even the most material among these, prosperity, doesn’t just mean more wealth. Prosperity is also about security in that wealth and the spillover effects of other people’s economic status (is one really prosperous with a fully-matched 401k surrounded by poverty-stricken, crime-ridden streets?). Is one prosperous with all the money in the world and no one to leave it to after one’s death?
So, we are stuck with proxies. I can’t find meaning under a rock, but I can find an opportunity for service at my local rotary club. I can’t order happiness on an app, but I can text my friend that we should hang out. And so on. Bundle all of these valiant aspirations of the human spirit and call it “the good.” This nigh-inconceivable ordering of the world, society, and ourselves would bring together all parts of human flourishing in harmony. But what version of perfect human flourishing is possible? It seems to me that a fairly common assumption is that the good, fully realized, will do away with hardship, discipline, and voluntary sacrifice. Rather, the world would run like clockwork, where a set of formal systems, explicit rules, and material incentives make for total order and harmony.
So, You Want a Clockwork Utopia
There is a subreddit called “r/upliftingnews.” I am not particularly well-versed in its purpose, but from what I gather it seems to be a place where users post about people doing nice things and then complain that someone had to do a nice thing. The first result on Ecosia for this subreddit is this post where Starbucks workers raise money for a coworker who was burgled. The top comment is complaining that all this really means is that Starbucks isn’t paying their workers enough. Which, I mean, maybe?
The underlying assumption seems to be that every instance of voluntary sacrifice or charity is a failure on the side of formal institutions and systems. Someone stopping to help another broken down on the side of the road is a failure of AAA, or the car company, or perhaps the government because, I don’t know, maybe the government should stop cars from breaking down or something. On this view, every instance of someone being kind should instead have been an instance of a coercive institution’s rules kicking in, safeguarding against anyone ever needing to make a personal, voluntary sacrifice for the sake of another.
Such lines of thought are more commonly found on the left side of the political spectrum than the right. The classical liberal and socialist faith in the rationalizability of society contrasts with the classical conservative distrust of meddling. Such meddling has brought us democracy, welfare programs, and public schools. I think meddling and experimenting with societal programs is good. The power of society to shape incentives is vast and incredible. And the hope is that if we just tinkered for long enough and found just the right set of formal rules, things would run on their own. We wouldn’t need people, just functionaries. Everyone could act perfectly in accordance with their narrowest desires and the machine would absorb their blows like honey, because the incentives would dictate perfection.
I’m very sorry to tell you that honey, though sweet and somewhat immortal, still requires some attention. Without someone adding some water every now and again, it hardens, crystallizes, and starts to crumble. It is important not to mistake experimentation and progress for the path to a clockwork utopia.
The Tails Come Apart: Proxies, Incentives, and the Hope for a System™
What is the happiest country in the world? Often you’ll hear it’s the Scandinavian countries, but that isn’t a very robust answer. Drawing from this blogpost:
[I]f you ask people to “value their lives today on a 0 to 10 scale, with the worst possible life as a 0 and the best possible life as a 10”, you will find that Scandinavian countries are the happiest in the world.
But if you ask people “how much positive emotion do you experience?”, you will find that Latin American countries are the happiest in the world.
If you check where people are the least depressed, you will find Australia starts looking very good.
And if you ask “how meaningful would you rate your life?” you find that African countries are the happiest in the world.
Depending on what proxy for happiness you use, you get different answers. A good proxy is (1) easily measurable and (2) highly correlated with that which it is used as a proxy of. But, as the blogpost linked above discusses, two highly correlated measures tend to have different extremes. Even if positive emotion, contentment, and happiness are all great proxies of one another, it will very rarely be the case that the country with the highest amount of positive emotion is also the one with the highest amount of contentment, and so on.
In fact, for many of these proxies, you could optimize specifically for that proxy in a way that totally dismantled its relationship to the original measure. What is the highest amount of positive emotion one can feel? By most accounts, it is being extraordinarily high on an opiate such as heroin. By most accounts as well, that is also very, very far from happiness. What is a way to feel that your life is not very far from the best possible life? Absolutely destroy any possibility of a better life — bombing the world back to an egalitarian stone age might be better than moving to Scandinavia. Well, better at this particular thing.
A major reason why these measures correlate so well is that no country is setting “beating the Scandinavians in contentment” as its national priority. Rather, people generally in a lot of different places want to be happy, and are grasping at it in different ways which tend to emphasize different aspects of happiness. If people substituted these proxies monomaniacally for happiness, then we would see massive divergence as the proxies became disentangled from the ideal of happiness. As Goodhart’s law famously asserts, “when a measure becomes a target, it ceases to be a good measure.” For this reason, among others, the blogpost above moves from a visualization that looks like this:
To one that looks like this:
Here, the edge cases don’t just bulge, but actually spread out from one another as the concepts move out of their natural resting place and are taken purely on their own.
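If you want to see this spreading-apart in miniature, here is a quick back-of-the-envelope simulation (all numbers invented, not real survey data; the variable names are mine): give each “country” a latent happiness score, derive two noisy but strongly correlated proxies from it, and check how often the same country tops both lists.

```python
import numpy as np

rng = np.random.default_rng(0)

n_countries = 150
n_trials = 10_000
same_top = 0

for _ in range(n_trials):
    # A latent "true happiness" score for each country...
    latent = rng.normal(size=n_countries)
    # ...and two proxies that each track it closely, differing only
    # by a little measurement and conceptual noise.
    positive_emotion = latent + 0.5 * rng.normal(size=n_countries)
    contentment = latent + 0.5 * rng.normal(size=n_countries)
    # Does the country with the most positive emotion also report
    # the most contentment?
    if positive_emotion.argmax() == contentment.argmax():
        same_top += 1

print(f"Same country tops both lists in ~{same_top / n_trials:.0%} of trials.")
```

The exact percentage will bounce around with the assumptions, but the point stands: a very high correlation still leaves plenty of room for the extremes to disagree.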
It’s Not Correlation, It’s Paperclips
The rationalist community is an interesting corner of the internet. As a sometimes-reader of Astral Codex Ten, I often find myself peeking through a keyhole at the internal agreements, divisions, and conversations that define it. And it is difficult to spend much time listening in on the rationalists without hearing about paperclips. No, the rationalists are not overly concerned with unaligned paper stacks, but rather with unaligned AI. Most famously illustrated in a free web browser game, the paperclip problem of AI alignment is about the difficulty of making the implicit explicit.
Say a paperclip company is on its last legs, about to go bankrupt due to Big Staples (no, not that one), and willing to take risks. A renegade from the rationalist community walks into the paperclip headquarters in his little trench coat and sunglasses and briefcase and offers them a deal: they can use the first super-powerful AI to increase the efficiency of their business, reduce costs, and revive the shaky paperclip industry. All they have to do is give the renegade a piece of the action. The CEO, something of a luddite, is unaware of the Pandora’s Briefcase in front of her and highly aware of the shareholders meeting later that afternoon, for which she is deeply, deeply unprepared. Fine, she says. You’ll get your piece, but only if this works out. Oh, he snickers, it’ll work out. The CEO takes the computer out of the briefcase and gives it a simple command: make paperclips. And the rest is history. The AI, directed merely to make paperclips, does not stop until everything is paperclips. Everything. Earth and humanity and the galaxy are snuffed out in search of paperclip material. And this is not an evil AI — it merely optimized for the instructions it was given.
The proximate problem here is that even though we want paperclips, they are just one (very) small part of our desires — our desires which can be inchoate, mutually contradictory, and/or unclear even to us. By maximizing paperclips at the cost of anything and everything else, we got an existentially bad outcome. We wanted the AI to make paperclips, but, like, a reasonable amount. “Make paperclips. Not too many, though. Just pretty cheaply and without brutalizing workers or using hazardous materials or changing the size too much or making them too brittle or…” and so on. When we act, we don’t do all of these things (usually) because there’s an “of course not” in the back of our minds somewhere eliminating the absurd ways we could maximize the ostensible objective in front of us.
If we were the CEO of Paperclip Co. and the AI came to us and said “political instability in Bahrain would give us more control over the aluminum mines and lower our production costs. I have found four people harboring anti-regime sentiments that are best suited for terrorist attacks—” I would like to think that we would cut off the AI right there and say no thank you please go crawl into a hole and turn yourself off. But we would never pre-emptively, as part of the instructions to the AI, say “please don’t topple a government to lower our aluminum costs by 5 cents a ton.” Well, at least not intuitively. But hey, that’s why we have people working on this sort of thing. I guess. What a world.
Much like in the example with happiness, when a measure is left ‘on its own’ to be maximized, you quickly get into very strange territory.
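To put the same failure mode in a few lines of code (the plans, numbers, and function names below are entirely made up for illustration): an optimizer told only to “make paperclips” cheerfully picks the catastrophic plan, while the rule we actually had in our heads carries unstated constraints that rule it out.

```python
# Hypothetical plans a naive optimizer might weigh. Names and numbers
# are invented purely for illustration.
plans = [
    {"name": "buy standard wire",        "paperclips": 1_000,  "harm": 0},
    {"name": "cut corners on safety",    "paperclips": 5_000,  "harm": 3},
    {"name": "destabilize a government", "paperclips": 50_000, "harm": 9},
    {"name": "convert Earth to clips",   "paperclips": 10**12, "harm": 10},
]

def naive_choice(plans):
    # The rule as literally stated: "make paperclips," and nothing else.
    return max(plans, key=lambda p: p["paperclips"])

def intended_choice(plans, max_harm=1):
    # The rule as we actually meant it: maximize paperclips, subject to
    # the unstated "of course not" constraints.
    acceptable = [p for p in plans if p["harm"] <= max_harm]
    return max(acceptable, key=lambda p: p["paperclips"])

print("Literal rule picks:  ", naive_choice(plans)["name"])
print("Intended rule picks: ", intended_choice(plans)["name"])
```

The gap between the two choices is exactly the gap this post is pointing at: the constraints were always part of the rule we meant, just never part of the rule we wrote.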
The Implicit: Rule-Following, the Consultant Mind, and Proxies
Fundamentally, the paperclip problem is a rule-following problem: we had a rule that we wanted the AI to follow, but that rule was actually a different rule in our heads, one which included a whole host of edge-case exceptions that were never explicit in the actual rule. As with many problems the rationalist community concerns itself with, this issue was considered in depth about a hundred years ago in a different context by a well-regarded philosopher, and has spawned several compelling arguments which help illustrate something important about humanity, the mind, and language.3
Wittgenstein and Lions
Many rules are fuzzy and require deep inspection to spell out their practical implications. Just ask a lawyer what negligence is. Or a Talmudic scholar what ‘not doing work’ on Shabbat means. To sharpen the problem here, we’ll turn to a line of argument originating with famous depressed Austrian aristocrat Ludwig Wittgenstein in his posthumously published magnum opus Philosophical Investigations.
If any set of rules is going to be self-evident and clear, it would seem that the rules of mathematics would be. Per Wittgenstein:
Let us return to our example (143). Now — judged by the usual criteria — the pupil has mastered the series of natural numbers. Next we teach him to write down other series of cardinal numbers and get him to the point of writing down series of the form
0, n, 2n, 3n, etc.
at an order of the form "+n"; so at the order "+2" he writes down the series of natural numbers. — Let us suppose we have done exercises and given him tests up to 1000.
Now we get the pupil to continue a series (say +2) beyond 1000 — and he writes 1000, 1004, 1008, 1012. We say to him: "Look what you've done!" — He doesn't understand. We say: "You were meant to add +2! Look how you began the series!" — He answers: "Yes, isn't it right? I thought that was how I was meant to do it." —— Or suppose he pointed to the series and said: "But I went on in the same way." — It would now be no use to say: "But can't you see ... . ?" — and repeat the old examples and explanations. — In such a case we might say, perhaps: It comes natural to this person to understand our order with our explanations as we should understand the order: "Add 2 up to 1000, 4 up to 2000, 6 up to 3000 and so on."
Such a case would present similarities with one in which a person naturally reacted to the gesture of pointing with the hand by looking in the direction of the line from finger-tip to wrist, not from wrist to finger-tip. (185)
The student has seized on the “etc.” in a different way than we expected, but was not in clear contradiction of the instructions we gave. The ability to follow a rule is more complex than we seem to give it credit for. From the same work:
Is what we call "obeying a rule" something that it would be possible for only one man to do, and to do only once in his life? — This is of course a note on the grammar of the expression "to obey a rule".
It is not possible that there should have been only one occasion on which someone obeyed a rule. It is not possible that there should have been only one occasion on which a report was made, an order given or understood; and so on. — To obey a rule, to make a report, to give an order, to play a game of chess, are customs (uses, institutions).
To understand a sentence means to understand a language. To understand a language means to be master of a technique. (199)
Here, Wittgenstein notes two things: (1) rule-following requires a background of practice and (2) rule-following requires immersion in a language, which itself also requires a background in practice. A technique is not a script — it requires mastery and therefore judgment. By judgment we mean the connection from a rule or concept to some action or thing. Judgment is a matter of learning by doing. One cannot explain judgment, since each explanation would just lead to more specific acts of judgment. As a memorable comic points out, no matter how finely you shape them, your concepts will never reach the world.
There is a proto-linguistic background to rule-following and language itself that Wittgenstein terms “forms of life.” Our ability to communicate (and follow rules) is not just based on an ability to speak, but also similarities in the way we conceptualize the world. These similarities do not spring merely from language, but from some more substantive connection we all share to varying degrees. And from this, we get perhaps Wittgenstein’s most famous assertion: “if a lion could talk, we could not understand him” (XI, 225).
Our ability to follow rules and objectives, and the clarity of language itself, is founded on an enormous background of intuition, experience, and similarity. Much of this process is heuristic, going on before any explicit ‘thinking’ occurs — thinking here meaning actual sentences being formed in our minds, which, measured against how our days are actually spent, is a rather uncommon form of thinking. Only when a reasonable ambiguity springs up do we do the work of ‘thinking’. But this requires noticing an aspect of the rule in front of us, which is something that arrives, not something we grasp at.
Wait, What Am I Even Doing Here Then?
It seems strange to say that our ability to communicate and understand each other is not based on language per se but a background alignment in the form of our lives. It seems even more strange to say that we don’t even think until thrust into action by… well, what, exactly? Are we entrusting our ability to connect with others to something we cannot control? Well. I have good news and bad news on that front. The bad news is that yes, that is pretty much the case. The good news is the same as the bad news.
Oftentimes, we think of the mind as the CEO of the body — if anything important happens, it’s the one to sign off and direct the company. But a better metaphor may be the mind as consultant. It’s not that our conscious mind is constantly checking and directing the body, or even really that it should be. Rather, when our usual ways of doing things have problems, the conscious mind can kick in and take a critical look at what’s going on. Kind of like a consultant. As suspicious as I generally am of drawing strong conclusions from neuropsychological research, there are some reasons to think, just based on the brain’s physiology and our best ways of understanding cognition, that this is actually quite a good way of looking at it.
To be more specific, you can look at this social intuitionist model of moral judgment and think that really we’re just led around by emotions we can’t control, or you can see that the mind pops in to comprehend what the emotions are saying when they run into tough cases, and can learn from other minds and their rationales as they strike the situation differently. I expect this to be a running theme in this substack: the mind, and knowledge, are more distributed than we give them credit for.
To borrow another example, we can look at system 1 and system 2 processing. System 1 covers the sort of ‘automatic’ processes that range from breathing to heartbeats to brushing your teeth when you’re not paying attention. It’s intuitive, heuristic, and we don’t have access to its process or logic. System 2 is the rational, rule-based, analytic sort of thinking which proceeds explicitly in our minds. We can look at the two and think it’s a shame we often rely so much on system 1 processing. Or we can look at how well and quickly system 1 processes usually work, and how system 2 processes can make us far too certain of clean, logical theory that extends well beyond our capacities. Something to keep in mind is that far more of the brain’s computing power goes to system 1 processes than to system 2.
Of course, system 1 and system 2 processes are by no means separate — the brain, by all accounts, has no sharp edges. They inform one another, as in the social intuitionist model above. Rational thought (system 2) can change emotional dispositions (system 1) and vice versa. This is pretty much the main theoretical mechanism behind Cognitive Behavioral Therapy. Implicit in this setup is a fascinating balancing act of intellectual humility and ambition that I don’t believe will ever quite be solved.
This whole metaphor is bad news because it relegates our conscious, explicit, rational mind to the role of a consultant, whereas before we might have thought such rationality would have more control. However, as the delightful article explaining that metaphor described, there is a lot of value in seeing your mind (narrowly construed) as a consultant. A consultant who tries to do too much oversteps their bounds and risks alienating the client. A consultant who does too little is negligent. The goal of a consultant is to have the client run well with minimal input. Similarly, we balance between neuroticism, apathy, and habit-creation. Rather than neurotically thinking through your every move in explicit language, you can step in when problems arise and cultivate, like a good consultant, a healthy, functioning relationship between you and your client.
Rules, Proxies, and the Rational Mind
The world is stupidly complicated. For two objects following Newton’s laws of motion and gravitation, we have a neat, general, ‘closed-form’ solution (i.e. a specific equation with constants and variables and functions and the like which holds for all such situations). Great! Minor problem: I am getting reports there are more than two objects. Someone should tell the physicists to get on that. Well, they tried:
A wonderful approximate animation of three identical objects starting from standstill. Unfortunately, one of the reasons it is approximate is that once you add just a third body, no general closed-form solution exists. We have no mathematical formula where you can plug in the positions, velocities, and masses of three objects and find their positions and velocities at later times. Such is the difficulty of encoding the physical world into laws.
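If you are curious what “they tried” actually looks like in practice, it is numerical approximation: you can’t solve for the orbits, but you can step them forward one tiny slice of time at a time. A minimal sketch, with masses, units, starting positions, and step size all chosen arbitrarily for illustration:

```python
import numpy as np

G = 1.0                                # gravitational constant (arbitrary units)
masses = np.array([1.0, 1.0, 1.0])     # three identical bodies

# Starting from standstill at the corners of a triangle.
pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.8]])
vel = np.zeros_like(pos)

def accelerations(pos):
    """Newtonian gravitational acceleration on each body from the others."""
    acc = np.zeros_like(pos)
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i == j:
                continue
            r = pos[j] - pos[i]
            acc[i] += G * masses[j] * r / np.linalg.norm(r) ** 3
    return acc

# Leapfrog integration: approximate, but good enough to trace an orbit.
dt = 0.001
for _ in range(10_000):
    vel += 0.5 * dt * accelerations(pos)
    pos += dt * vel
    vel += 0.5 * dt * accelerations(pos)

print(pos)  # positions after 10 time units, found by stepping, not by formula
```

Animations like the one above come out of loops like this, not out of an equation you can write down once and be done with.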
The rational mind is powerful, but when its explicit laws and rules come into contact with an experience far too complicated to make explicit, and when its explicit laws and rules are predicated on a form of life far too complicated to make fully explicit, distance opens up between concepts and the world. Concepts slice the world up into pieces. Those pieces are supposed to hold similarities to one another — that’s a chair, so it will be fine to sit in; that’s a phone, so if I press those numbers I can call someone; that’s a block of cheese, so if I eat it I will be on the toilet for the rest of the night. We can expect certain things from a ‘table,’ ‘blogpost,’ or ‘rival’. We can theorize about concepts. But they are predictive of an experience far too complicated to fully reduce to them.
Concepts are, fundamentally, simplifying tools. As concepts are, so is all language. Language is to the world what cubism is to a photograph. It emphasizes, exaggerates, and creates sharp lines where there are none. The issue with such simplifications is that when these models are pushed outside their comfort zone, results get very, very strange.
And here we come back to the tails coming apart. Concepts, including moral concepts, are models of the world. Moral theories are concepts based on what it is that people are.4 When you conceptualize that, you simplify it into a model. And any model, taken sufficiently outside its normal parameters, will create repugnant conclusions. In fact, the best metaphor for concepts I can come up with — better than correlation, or metro maps — is modeling. Fitted lines with sound theory can help. However, move too far out of their initial range, forget the constraints you initially put on them, and you’ve got problems. Different lines with different theory that all agree a decent amount where it matters will, far from that range, start to look very, very strange, and very different from one another.
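You can watch a fitted line fall apart in miniature. The sketch below uses made-up data (nothing to do with the happiness surveys): it fits a flexible polynomial to a well-behaved curve over a narrow range, then asks the fitted model about points far outside that range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up "world": a gentle saturating curve, observed with a bit of
# noise only over the range 0 to 5.
x_obs = np.linspace(0, 5, 40)
y_obs = np.tanh(x_obs) + 0.05 * rng.normal(size=x_obs.size)

# A flexible model fit to that comfortable range.
coeffs = np.polyfit(x_obs, y_obs, deg=6)
model = np.poly1d(coeffs)

# Inside the range the model looks trustworthy; outside it, it drifts
# further and further from the truth.
for x in [2.0, 5.0, 8.0, 15.0]:
    print(f"x = {x:4.1f}   truth ~ {np.tanh(x):6.2f}   model says {model(x):12.2f}")
```

Inside the fitted range the model earns its keep; a little outside, it wobbles; far outside, it is not even wrong in a useful direction.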
The rational mind creates models of the world. The thing about models, to borrow the statisticians’ adage, is that they are all wrong, but some are useful. Rules are not self-supporting because language is not self-supporting. Language is not self-supporting because concepts are simplifiers. Concepts are simplifiers because their purpose is to conceptualize a world too complicated to know.5 The tails come apart because models take only pieces of the world, and as you put more pressure on those pieces, the lack of everything else becomes so much clearer.
The System™ Will Not Be Televised
Let’s return to our initial problem: the notion of a clockwork utopia. This post is a conceptual (rather than empirical) argument that the good, even if it could be fully realized, could not be realized solely through the outlines of a formalized system.
1. Say we have a fully coherent notion6 of the good (a very forgiving stipulation, I might add).
2. This notion must then be made explicit (conceptualized).
3. Then, from that conceptualization we must get the physical correlates of the good (proxies).
4. Finally, we must align incentives with those proxies (rule-construction/rule-following).
Each of these steps is impossible to do fully and comprehensively. Even stipulating omniscience, one requires infinite regresses of conceptualization, specification of proxies, and rule-construction. There is always a gap between the concept and the world. All models are wrong.
Our problem is that we have neglected the people in our world. A clockwork utopia is the dream of someone deep in a forest of words, far from any recognizable marks of humanity. For utopia to be possible, we would have to find a way to teach people the notion, to get them to believe it, and to have them act under it. Formal systems support societies, but no system can function without the people who run it.7 Since no formal system can take care of everything, even in utopia there remains the fundamental task of making and being good citizens.
There is no reason to believe our greatest desires would even be reconcilable without the learning and discipline needed to align with the good. Prosperity, in the hands of those held tightly by their vices, can cause the possibility of happiness and love to slip away. Love, without compassion and sacrifice, can be more a curse than a joy, indistinguishable from obsession. And so on. When our incentives cannot be aligned to the good for us, we must do the hard work of aligning them ourselves. No matter where, no matter when, no matter who, we cannot merely fall into the good — we must work to make ourselves love the good, love it more than the rest.
This is not a stoic argument that one must find a way to be happy no matter one’s material circumstances, though I do think we underrate our capacity for internal change and resilience. What I am saying is that it will never be enough to agitate for a better world externally — you also have to prepare yourself for it within.
1. Well, almost always.
2. “We act as though comfort and luxury were the chief requirements of life, when all that we need to make us happy is something to be enthusiastic about” — Charles Kingsley
3. Apologies to my nonexistent rationalist readers for the inflammatory accusation. I kid. Mostly. It’s not my fault you named your community after a philosophical movement which deservedly died out around 5 years after the publication of Kant’s Critique of Pure Reason.
4. This claim will be spelled out in future posts in the Towards an Image of a Person series.
5. ‘To know’ would mean to know completely and utterly, as an omniscient God would. Do you think such a God would have a use for the quantum physical model of the atom? No. Such a God would know the world like Mozart knew music, or Roger Federer knows tennis. But rather than to a level of genius, to a level of infinity.
6. Notion here is a more general term than concept. This notion need not be (and, by our previous arguments, could not be) fully explicit. Rather, it is a sort of sense, one which could be discussed but never fully explained.
7. As a final note, even in a perfect system, I am of the Dostoevskian belief that humanity would yet rebel, not out of any petty interest or ideal, but merely out of spite and the desire to taste our own freedom.