A General Theory of Bigness and Badness

So, a glib take you’ve probably heard is that the problem with Big Government, Big Business, Big Etc. is not the government or the business or the etc. but the “Big”. This is extremely superficial and is essentially elevating a trivial idiosyncrasy of the English language to an important structural principle of the universe, which makes about as much sense as nominative determinism. I think it’s true anyway. Here is my theory of why:

Large groups of people are increasingly hard to coordinate. Getting a group of one person to be value-aligned with itself is literally trivial, 4 people is easy, 12 people is doable for fairly complex values, 50 gets difficult, etc. For a very large organization, getting the whole org focused on a complex, nuanced goal is basically impossible.

So the larger an organization gets, the more its de facto goals become simplified, even if it keeps paying lip service to nuanced goals. Theoretically it should be possible to keep nuanced goals at a large scale, but it would take more and more effort per person as you get bigger, and I suspect it would reach “you must spend 110% of your time working on tasks to keep yourself value-aligned”, i.e. impossible-in-practice, somewhere in the 150-1000 employees range.

So that’s part one of the theory: goals get simplified and nuance disappears as the organizations get bigger. By itself this is sufficient to strongly suggest that big organizations are bad in and of themselves. But there are some corollaries which make the case stronger.

Corollary the first: Flatness of hierarchy does matter. The deeper the hierarchy, the more large sub-organizations exist within the large parent organization. The same forces that push the parent org to have a simple goal push each sub-org to have a simple goal. This can explain the rampant bureaucratic infighting in large hierarchical organizations; each sub-org is following its default goal and those come into conflict. This is approximately the Hanlon’s Razor (“Never ascribe to malice that which is adequately explained by incompetence”) analog of The Gervais Principle.

Corollary the second: “Big” may not imply “evil” but it does forbid “good”. Only simple goals are sustainable for large orgs. But not all simple goals are equally “reproductively fit”. For for-profit companies the most reproductively fit goal is “make a profit”. For political parties it’s “get (re)elected”. For bureaucracies it’s “maintain/expand our budget”. For charities…probably it’s “keep our incoming donations high”, but I’m not confident in that. The bigger the organization, the harder it is for well-intentioned members, even well-intentioned leaders (CEO Larry Page and President Sergey Brin, President Barack Obama, Chairman of the Joint Chiefs of Staff Colin Powell, …) to keep the organization out of the “low-energy well” that is the default self-perpetuation goal. Organizations which are kept on task for an unselfish core goal will do poorly relative to their peers and tend to die out.

In summary: No manager could possibly keep a large organization on target for a complex goal, and attempting to keep a large org on target for a simple but unselfish goal will rapidly kill the organization. This applies fractally to sub-organizations and super-organizations.

Does this teach us lessons about what to do? Well, it cautions us against trusting that large organizations consisting of benevolent people will act benevolently. It makes me somewhat more skeptical of OpenAI. But nothing specific, no.

No Separation from Hyperexistential Risk

From Arbital:

A principle of AI alignment that does not seem reducible to other principles is “The AGI design should be widely separated in the design space from any design that would constitute a hyperexistential risk”. A hyperexistential risk is a “fate worse than death”, that is, any AGI whose outcome is worse than quickly killing everyone and filling the universe with paperclips.

I agree that this is a desirable quality for any design, or any approach to creating a design, to have. However, I think it’s impossible to achieve while also creating the possibility of an ‘existential win’, i.e. a good event roughly as good as a hyperexistential risk is bad. In order to create the possibility of a Very Good Outcome, your AGI must understand what humans value in some detail. The author of this page* provides specifics, which they think will move us further away from Very Bad Outcomes, but I don’t agree.

This consideration weighing against general value learning of true human values might not apply to e.g. a Task AGI that was learning inductively from human-labeled examples, if the labeling humans were not trying to identify or distinguish within “dead or worse” and just assigned all such cases the same “bad” label. There are still subtleties to worry about in a case like that[…] But even on the first step of “use the same label for death and worse-than-death as events to be avoided, likewise all varieties of bad fates better than death as a type of consequence to notice and describe to human operators”, it seems like we would have moved substantially further away in the design space from hyperexistential catastrophe.

I find it hard to picture a method of learning what humans value that does not produce information about what they disvalue in equal supply, and this is no exception. Value is for the most part a relative measure rather than an absolute; to determine whether I value eating a cheeseburger it is necessary to compare the state of eating-a-cheeseburger to the state of not-eating-a-cheeseburger, to assess whether I value not-being-in-pain you must compare it to being-in-pain, to determine whether I value existence you must compare it to nonexistence. To the extent we are not labeling the distinction between fates worse than death and death, the learner is failing to understand what we value. And an intelligent sign-flipped learner, if we gave it many fine-grained labels for “things we prefer to death by X much”, would at minimum have the data needed to cause a (weakly-hyper)-existential catastrophe; a world in which we did not die but did not ever have any of the things we rated as better than death. Unless we have some means of preventing the learner from making such inferences or storing the information (so, call the SCP Foundation Antimemetics Division?), this suggestion would not help except against a very stupid agent.

Of course, maybe that’s the point. It seems obvious to me that a very stupid agent does not pose a hyperexistential risk because it can’t build up a model detailed enough to do more than existential harm, but “obvious” is a word to mistrust. Could I make the leap and infer the reversal property? I believe I could. Could one of the senders of That Alien Message, who are unusually stupid for humans but have all the knowledge of their ancestors from birth? I’m fairly confident they could, but not certain. Could one of them cause us hyperexistential harm? Yes, on that I am certain. That adds up to a fairly small, but nonempty, segment of probability space where this would be useful.

But does that add up to the approach being worthwhile?

* Presumably this is Eliezer Yudkowsky, since I don’t believe anyone else wrote anything on Arbital after its “official shutdown”, which was well before this page was created. But I’m not certain.

@docstrings: You have no class.

If you have written any Python code in a shared project recently, you have probably seen a documentation convention like this:

def complex(real=0.0, imag=0.0):
  """Form a complex number.

  @param real: The real part (default 0.0)
  @param imag: The imaginary part (default 0.0)

  @returns: ComplexNumber object.
  """
  if imag == 0.0 and real == 0.0: return complex_zero
  ...

This is a good and useful convention for explaining things to future users of the code, if a little verbose. However, you are more likely to have seen class-based code, and there it is not used very well at all. For example:

class CompetitionBasket(FruitBasket):
  """Fruit basket that is entered into a scored competition.

  @param fruits: A dict of fruit names and quantities
  @param scores: A dict of fruit names and scores-per-fruit
  """

  def __init__(self, fruits, scores):
    self.scores = scores
    super(CompetitionBasket, self).__init__(fruits)

    ...

  def score(self, relevant_fruits=[]):
    """Return the score of the basket according to the current rules.

    @param relevant_fruits: An array of fruit names corresponding to
    the fruits which are currently under consideration. Defaults to an
    empty list and scores all fruit.

    @returns: Integer score of the basket.
    """
    ...

At first glance, the docstring for score looks like it follows the same principles. In fact it is missing important information, which in a larger class in a complex system would be critical. Both self.fruits and self.scores are necessary to the functioning of this method, but neither is mentioned. There are advantages to this approach: it is fairly easy to programmatically verify the presence of non-empty docstrings for all params and return values a function possesses, and significantly harder to verify the presence of docstrings for all non-trivial instance attributes used in a method or all values mutated by side effects. There are significantly more judgement calls involved in assessing which values need a docstring and which don’t, and it’s plausible that setting the bar for “docstring required” to include these would result in that requirement being more commonly flouted for other methods.
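To make the "fairly easy to verify" half of that claim concrete, here is a minimal sketch of such a check. The helper name missing_param_docs and the two sample functions are hypothetical, invented for illustration; the only library calls are inspect.getdoc, inspect.signature, and re.findall from the standard library. It assumes the "@param name: text" tag style shown above.

```python
import inspect
import re


def missing_param_docs(func):
    """Return the parameters of func that have no non-empty @param entry."""
    doc = inspect.getdoc(func) or ""
    # Names documented as "@param name: some text" on a single line.
    documented = set(re.findall(r"@param[ \t]+(\w+)[ \t]*:[ \t]*\S", doc))
    # self/cls are implicit and conventionally left undocumented.
    params = [
        name
        for name in inspect.signature(func).parameters
        if name not in ("self", "cls")
    ]
    return [name for name in params if name not in documented]


# Hypothetical sample functions, used only to exercise the check.
def score(self, relevant_fruits=None):
    """Return the score of the basket.

    @param relevant_fruits: Fruit names under consideration.
    @returns: Integer score of the basket.
    """


def add(a, b):
    """Add two things.

    @param a: The first thing.
    """


print(missing_param_docs(score))  # nothing missing
print(missing_param_docs(add))    # b is undocumented
```

Note what this check cannot see: score passes cleanly even though the real method depends on self.fruits and self.scores, because neither appears in the signature.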

But to consider this and stop is an instance of Goodhart’s Law. It is an argument against mandating documentation of these values, not an argument against including it wherever possible. For all the reasons we want docstrings (clarity of purpose, maintainability, etc.) we should, wherever possible, include these values in the docstring. In some cases, this could result in a docstring 20 lines long, which is clearly a problem. However, in those cases I propose that the main problem is that one method implicitly takes more than a dozen arguments; the object-oriented design has concealed the fact that it is an unwieldy, unmaintainable method, and forcing this docstring convention on it brings that fact back into the open.

I would suggest this naming convention:

class Fnord(object):
    ...
    def methodName(self, foos):
        """Frobozz the foos according to the Fnord's bazzes.
        @param foos: a list containing Foo instances to frobozz
        @instance_param bazzes: Baz instances containing rules for frobozzing
            for this Fnord
        @class_param quux: Number of times Fnords frobozz each foo
        """
        ...
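For anyone who wants a rough lint for this convention, here is a sketch under stated assumptions. It takes the source text of a single method, collects every direct self.<name> access with the standard ast module, and reports the names with no @instance_param or @class_param tag. The function name undocumented_self_attributes and the EXAMPLE method are hypothetical. The check is deliberately crude: it treats method calls such as self.helper() as attributes, does not distinguish instance attributes from class attributes, and cannot see values reached indirectly, which is exactly the judgement-call territory described above.

```python
import ast
import re
import textwrap

# Matches the proposed tags, e.g. "@instance_param bazzes: ...".
TAG = re.compile(r"@(?:instance|class)_param[ \t]+(\w+)")


def undocumented_self_attributes(method_source):
    """Return sorted self.<attr> names used in a method but not tagged."""
    tree = ast.parse(textwrap.dedent(method_source))
    func = tree.body[0]  # assumes the source holds exactly one def
    used = {
        node.attr
        for node in ast.walk(func)
        if isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id == "self"
    }
    documented = set(TAG.findall(ast.get_docstring(func) or ""))
    return sorted(used - documented)


# Hypothetical method: documents bazzes but forgets quux.
EXAMPLE = '''
    def frobozz(self, foos):
        """Frobozz the foos according to the Fnord's bazzes.

        @param foos: a list containing Foo instances to frobozz
        @instance_param bazzes: Baz instances containing rules for frobozzing
        """
        return [(foo, self.bazzes) for foo in foos] * self.quux
'''

print(undocumented_self_attributes(EXAMPLE))  # reports quux
```

A long list from a check like this is less a documentation failure than a sign that the method implicitly takes too many arguments.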

 

Against the Virtue of Ash

At the Bay Area Winter Solstice this year, one of the major themes was something Cody Wild called “The Virtue of Ash”. (Text of Solstice 2018 can be found here.) The virtue of ash is, assuming I understand it right, the quality of enduring catastrophe and seeing your life in ruins around you and rebuilding anyway.

A friend of mine, who only recently moved to the area, was in attendance. When I asked him what he thought of Solstice, one of the first things he shared was “Holy scrupulosity triggers, Batman!”

I think these two facts are related.

On discussing it with some other friends, scrupulosity is not quite the right word. But I do believe the virtue of ash is harmful to promote, for the scrupulosity-like reasons that inspired my friend’s impression. And I feel fairly confident it is useless to cultivate.

First off, let’s set aside whether it’s useful to cultivate for the moment and consider whether it’s good to promote. Promoting the cultivation of the virtue of ash is exhorting people to consider what they’d do on the worst day of their lives, imagining the worst that could happen and trying to bend their mind to be someone who could handle that and keep going. It conveys a message that doing the best you can to make the future be bright is not enough; you should also be preparing for much darker futures where all your current plans lie in ruins, and working to make those brighter. This hits at the anxious by raising the implicit standard for “doing enough” even higher. It also hits at the depressed by explicitly encouraging making highly depressing, dark outcomes mentally available. Since the community being given this advice is already prone to anxiety, depression, and scrupulosity, this is a Bad Thing and should not be done without a clear reason to think the benefits are large. This year’s organizers may think those benefits are large, but if so I disagree.


I Have Seen the Tops of Clouds [Adapted]

I gave this speech in 2016 at the Bay Winter Solstice, as one of the speeches of darkness. The original is by Quinn Norton; this was revised to be shorter, to sound less like I had personally delivered all the revelations and experienced the anecdotes, and to focus less on concerns Ms. Norton has which I do not share. I was and remain proud of this edited version, and decided to make it publicly readable. If you intend to read it as a speech to an audience, I suggest using my adaptation. For any other purpose, use hers. I believe only Ms. Norton herself has the right to use the work or any derivative for any commercial purpose, so do not do so or ask me for permission to do so.

I wake up in the middle of the night sometimes. I peer around my room scared and stressed, like all the things I can contain during the day break loose in dreams I can’t remember, the echoes of all these forgotten nightmares roaming around my body. Sometimes I want to cry, or curl up, or scream. I stare into the corners of my room. I try to fall back to sleep, even though I don’t want to go back to whatever sent me here.

It’s not a coincidence that I’m told I’m depressing. I think about depressing things.
I try to face the worst things about humanity and our situation. It started with how the oceans are dying, but since then moved on to genocide, imprisonment, the history of labor exploitation, computer security and mass surveillance, racism, and technological apocalypse.
I’m fun at parties.

It may be that our ticket was punched before we ever got started. While we’re cutting our time on earth shorter, it might be that our species was never going to make it past the end of the womb of our ice-age birth.
I explained this to a friend, about how fragile an organism we are, and how the ice ages cycle. She laughed. She was used to this strange form of hope.

“You have to choose hope, or just jump out a window,” someone said, a person who’d been accused of techno-utopianism. They were walking along the California coast at sunset, talking about all the ways our technological lives could go wrong, and the many ways they are going wrong.
They weren’t utopian, it turned out. They’d thought of the worst long before their detractors had. They’d decided to try to head it off, instead of jumping out a window.

We are diseased and angry and we kill each other and ourselves and all the world. I try to look at this, and my own part in it. Sometimes it’s overwhelming. I feel so powerless trying to comprehend all the terrible things we face, much less get past them into the future with our humanity and our inconceivably beautiful little blue-green planet preserved.

Looking at the ways we break the world, think of Tolstoy’s admonition that if we cannot give up the ills of our lives, then we should declare them, face them, put them on our flags. We can tell the world about the edge of our strength, ability, and virtue. We can share the failure honestly. This is good, and this helps, but it doesn’t bring back the vanished creatures and dying earth, and it doesn’t stop the relentless human cruelty.

There are nights full of invective and hate and days I can only see the flaws in our world, and feel my own flaws and my own fear from within.
And there is so much fear.
The land will drown. The seas could turn acid and burn us from above while starving us from within. At any moment we could still be consumed by nuclear fire, an accidental holdover from the Cold War we’ve failed to wrap up, like a binge drinker or a gambling addict who gets sober, but can’t face the past, and lets it fester.

All these grown-up monsters for my grown-up mind, they are there in the nights I wake up terrified and taunted by death. When I feel so small and broken, when despair and terror take me, I have a secret tool, a talisman against the night. I don’t use it too often so that it doesn’t lose its power. I learned it on airplanes, which are strange and thrilling and full of fear and boredom and discomfort. When I am very frightened, I look out the window and say, very quietly:

I have seen the tops of clouds

And I have.

In all the history of humanity, I am one of the few that has seen the tops of clouds. Many would have died to do so, and some did.
I have seen them many times.
I have seen the Earth from space, and spun it around like a god to see what’s on the other side. We are the only consciousness we’ve ever found that has looked deep into the infinite dark, and instead of dark, we saw galaxies. Suns and worlds without number. We have looked into our world and found atoms, atomic forces, systems that dance to the glorious music of the universe.
We have seen actual wonders that verge on the ineffable. We have coined a word for the ineffable. We have coined thousands of words for the ineffable. In our pain we find a kind of magic, in our worst and meanest specimens we find the flesh of a common human story. We are red with it.

I know mysteries that great philosophers would have died for, just to have them whispered in their dying ears. I can look them up on my smartphone.
I live in the middle of miracles, conceptions and magics easily worth many lifetimes to learn, from which I can pick and choose. I have wisdom and knowledge poured around me like a river, more than I could learn in a thousand lifetimes, and I am still alive.

It is good that I am alive, it is good that we are alive. Even if we kill ourselves off with nuclear fire, or gray goo, or drown ourselves in stinking acid oceans, it is good that we have lived, that we did all of this, and that we grew into what we are, and learned to dream of what we could be.
Perhaps we will soon die, but we will die having gone so far above our primordial ponds and primate forests that we saw the tops of clouds.

It is good that in the body of this weak and tender African animal a piece of the universe has gazed upon itself, that this tiny appendage of existence looked on everything its eyes and tools could drink in and experienced the most pure of wonder, the most terrible of awe. It is worth it, all of it, to even for a moment be the universe gazing upon itself. We reached so far above our biological fate that we spoke love to life, all life, and to its dark universal womb.

That takes away the fear for me. Not all of it, but enough so that I can hug my partner and fall asleep, to dream dreams of what we’ll do next, of how we’ll live this hope.

I can get past the horrible things we face. I can acknowledge the boring and unpleasant truths along the way. I can take up Tolstoy’s charge, and dream of a healing world where my descendants and their descendants will see wonders that I cannot now conceive.

We have seen the tops of clouds.

Social Modeling Recursion (Excerpt)

This is quoted from an explanation by Lahwran (blog), part of a larger post on LessWrong, and sourced from an original claim and example by Andrew Critch. To my knowledge, Critch has never posted it online. I found myself wanting to reference this divorced from the remainder of the post, so I have reproduced it here. None of these words are my own. (If in the distant future this is preserved while the originals are not, then my apologies, I feel the same way about several ancient Greek philosophers, but at least I’ve cleared up that I haven’t edited it.)

A RECURSION EXAMPLE

I found the claim that humans regularly social-model 5+ levels deep hard to believe at first, but Critch had an example to back it up, which I attempt to recreate here.

Fair warning, it’s a somewhat complicated example to follow, unless you imagine yourself actually there. I only share it for the purpose of arguing that this sort of thing actually can happen; if you can’t follow it, then it’s possible the point stands without it. I had to invent notation in order to make sure I got the example right, and I’m still not sure I did.

(I’m sorry this is sort of contrived. Making these examples fully natural is really really hard.)

  • You’re back in your teens, and friends with Kris and Gary. You hang out frequently and have a lot of goofy inside jokes and banter.
  • Tonight, Gary’s mom has invited you and Kris over for dinner.
  • You get to Gary’s house several hours early, but he’s still working on homework. You go upstairs and borrow his bed for a nap.
  • Later, you’re awoken by the activity as Kris arrives, and Gary’s mom shouts a greeting from the other room: “Hey, Kris! Your hair smells bad.”. Kris responds with “Yours as well.” This goes back and forth, with Gary, Kris, and Gary’s mom fluidly exchanging insults as they chat. You’re surprised – you didn’t know Kris knew Gary’s mom.
  • Later, you go downstairs to say hi. Gary’s mom says “welcome to the land of the living!” and invites you all to sit and eat.
  • Partway through eating, Kris says “Gary, you look like a slob.”
  • You feel embarrassed in front of Gary’s mom, and say “Kris, don’t be an ass.”
  • You knew they had been bantering happily earlier. If you hadn’t had an audience, you’d have just chuckled and joined in. What happened here?

If you’d like, pause for a moment and see if you can figure it out.


You, Gary, and Kris all feel comfortable bantering around each other. Clearly, Gary and Kris feel comfortable around Gary’s mom, as well. But the reason you were uncomfortable is that you know Gary’s mom thought you were asleep when Kris got there, and you hadn’t known they were cool before, so as far as Gary’s mom knows, you think she thinks kris is just being an ass. So you respond to that.

Let me try saying that again. Here’s some notation for describing it:

  • X => Y: X correctly believes Y
  • X ~> Y: X incorrectly believes Y
  • X ?? Y: X does not know Y
  • X=Y=Z=...: X and Y and Z and … are comfortable bantering

And here’s an explanation in that notation:

  • Kris=You=Gary: Kris, You, and Gary are comfortable bantering.
  • Gary=Kris=Gary's mom: Gary, Kris, and Gary’s mom are comfortable bantering.
  • You => [gary=Gary's mom=kris]: You know they’re comfortable bantering.
  • Gary's mom ~> [You ?? [gary=Gary's mom=kris]]: Gary’s mom doesn’t know you know.
  • You => [Gary's mom ~> [You ?? [gary=Gary's mom=kris]]]: You know Gary’s mom doesn’t know you know they’re comfortable bantering.

And to you in the moment, this crazy recursion just feels like a bit of anxiety, fuzzyness, and an urge to call Kris out so Gary’s mom doesn’t think you’re ok with Kris being rude.

Now, this is a somewhat unusual example. It has to be set up just right in order to get such a deep recursion. The main character’s reaction is sort of unhealthy/fake – better would have been to clarify that you overheard them bantering earlier. As far as I can tell, the primary case where things get this hairy is when there’s uncertainty. But it does actually get this deep – this is a situation pretty similar to ones I’ve found myself in before.

There’s a key thing here: when things like this happen, you react nearly immediately. You don’t need to sit and ponder, you just immediately feel embarrassed for Kris, and react right away. Even though in order to figure out explicitly what you were worried about, you would have had to think about it four levels deep.

If you ask people about this, and it takes deep recursion to figure out what’s going on, I expect you will generally get confused non-answers, such as “I just had a feeling”. I also expect that when people give confused non-answers, it is almost always because of weird recursion things happening.

In Critch’s original lightning talk, he gave this as an argument that the human social skills module is the one that just automatically gets this right. I agree with that, but I want to add: I think that that module is the same one that evaluates people for trust and tracks their needs and generally deals with imagining other people.