Moral Precepts and Suicide Pacts

You may recently have seen this blog post floating around, brought to prominence by some neo-Nazis becoming more open than has recently been normal: Tolerance is not a Moral Precept, which says:

Tolerance is not a moral absolute; it is a peace treaty. Tolerance is a social norm because it allows different people to live side-by-side without being at each other’s throats. It means that we accept that people may be different from us, in their customs, in their behavior, in their dress, in their sex lives, and that if this doesn’t directly affect our lives, it is none of our business.

And this far, I basically agree. This is why tolerance is important, and why it is necessary. But then the author goes on to say: “It is an agreement to live in peace, not an agreement to be peaceful no matter the conduct of others. A peace treaty is not a suicide pact.” And here I strongly disagree. For example, we have no peace treaty with the state of North Korea. This does not mean that we are legally or morally licensed to break the decades-old ceasefire and resume hostilities whenever we like. We could have destroyed the country many times over; it would have been advantageous and simple in the past, though since they’ve recently acquired a credible nuclear threat to parts of the mainland US, it might not be now.

If we consider this in the frame given here, that a peace treaty is only a promise to be peaceful to those who are peaceful unto you, then we have committed a wrong in not having long since crippled the Kim dynasty and its military. It would have been better for us, for our allies, for world stability, and for everyone except the then-living citizens of North Korea. But we did not, and that was correct.

Peace treaties are good, but they are not enough. They are one means of credibly committing to live together in peace, but, like Bismarck’s Prussia as it grew into Germany, the mere existence of a peace treaty is not a great reassurance if you cannot otherwise trust the other signatories. A person or country who is champing at the bit for war, but under a treaty to stay at peace, is less trusted to remain peaceful than a country with no treaties but also no prominent militarist factions. To be trusted to remain peaceful, you must be the kind of person who remains peaceful.

And to be a peaceful person and earn the trust placed in you, you must be peaceful even when you have every right to fight. If you aren’t, and take every opportunity to pick a fight when a half-decent excuse is available, your peace treaty is almost worthless, since anyone can see that you will break the spirit as soon as you can find a loophole in the letter.

It’s the same with tolerance. If you shout down your argumentative opponents and vigorously retaliate whenever they show signs of intolerance, you will not be trusted to be tolerant of others who are tolerant, even those who basically agree with you. If you instead treat intolerance tolerantly, other tolerant people will know you are a trustworthy ally, one who will work to keep your community tolerant and retaliate only when collective action and collective decisions deem it appropriate.

It is what you do in the hardest, most tempting cases – dealing with explicit Nazi rhetoric, eyeing the underdefended soft underbelly of France with your newly-unifying Germany – that draws the contours of your policy of trustworthy peace. Had newly-formed Germany not been so rapacious when it had the opportunity, England might not have felt the need to protect Belgium against the advance of the German army through it toward France, and the First World War could have gone much better for the Central Powers. Similarly, how tolerant you are of hateful fringe religious groups – e.g. the Westboro Baptist Church – determines how well you can be trusted to be tolerant of minority religions which aren’t outright intolerant but have strongly-held convictions you find enraging. That’s critical if you want religious help with a cause you share an interest in.

Another, similar concept is “The Constitution is not a suicide pact.” Starting from Thomas Jefferson and passing through Lincoln and a number of Supreme Court justices, this is the idea that the protections of the Constitution, while strong, can be suspended in extremis, replaced with undeclared martial law and states of emergency when a threat is great enough. As you might guess, I don’t like this idea. I will give its advocates this: the law is not a suicide pact.
Many laws on the books are important to stick to normally, but if following the letter would open you up to attack, then yes, it is legitimate to suspend the law until the danger has passed or the law has been amended to be safer. To give an example: many states have an “Open Meeting Law”, which says that any group of three or more elected officials must announce the time and place of their meeting in advance, and publish all proceedings afterward. If the Weather Underground has been attacking government officials in your area, it would be dangerous to comply with the law for meetings where you discuss how to deal with the problem and formulate a plan to catch them. Since the alternative is for the government to grind to a halt, unable to do business due to deadly threats, suspension is clearly the best thing to do. Circumstances have demanded that the law bend, and it is correct for us to let it.

But that doesn’t apply to the Constitution. Unlike the informal British constitution, ours is very explicit. It sets the “guardrails of democracy”: it marks, explicitly, the places where the law cannot and must not bend to fit circumstance. Someone may be publicly advocating sabotage-by-inaction to make it more difficult to prosecute an active war, which does pose a danger, but their right to free speech protects them anyway. People may meet to discuss whether the police are their enemies (perhaps for racial reasons) and what to do to resist law enforcement if so: this damages the rule of law and is obviously a conspiracy to make illegality easier, but the right to free assembly protects them anyway.

This is important, because being able to rely on those protections even if you are doing something blatantly hostile to the government’s interests is a strong demonstration that the protections are not fickle. Make rare exceptions, and what speakers and groups won’t wonder whether they’re the next exception? Maybe you trust the government of today, and think “This is a clear case; Al Qaeda are violent enemies of the state and its people, and advocating for them should not be protected.” But having made an exception once, and established a procedure for doing so, what else might that procedure be used for? Will Black Lives Matter be listed as a terrorist group? Environmentalist protestors trying to disrupt the power plants that, as the Saudi Arabia of coal, are the USA’s most secure power supply? In the past, this would all be an academic thought experiment, but today we live in the America At the End of All Hypotheticals. Give away the power to cross the guardrails once, in one specific spot, and someone you trust far less may get that power and use it to attack someone you like and agree with, who can’t fight back effectively.

The unifying principle behind these is not easy to articulate, but pieces of it are: Ben Franklin’s self-paraphrase, “They who can give up essential Liberty to obtain a little temporary Safety, deserve neither Liberty nor Safety”, is one important piece. The Kantian notion of universalizable rules is another. (If that connection seems a little tenuous, I suggest reading Ozymandias on The Enemy Control Ray.) But the unifying principle, to me, is Thomas Schelling’s thoughts on bargaining, applied to political and societal norms.

Summarized, it is this: a rule never to do X, with no exceptions, is more stable and more credible than “never do X unless it’s also Y, because Y is beyond the moral pale”. It’s easy to say “never restrict speech”, and it’s easy to check whether someone else is holding to that rule. If your rule is “never restrict speech unless it’s hateful”, there will always be room for argument about what counts as hateful. If 100% of people agreed about hatefulness 100% of the time, it would still be stable. But that’s never true; almost no one is the villain of their own story, and almost no one ever feels that their beliefs and opinions are disproportionate. They may admit that they dislike their enemy a little more than is justified, but they will insist that they didn’t start it, that they are only reacting. So if you say that their speech is hate speech, they will disagree, as will many others like them. From their perspective, you are banning perfectly reasonable, justified speech that disagrees with you politically, so even if they respect the non-escalation principle, the rule they see you following is “restrict speech if it’s loud and expresses inconvenient truths or disagrees with me politically”, and that’s the rule they’ll follow if they get power.

So: tolerate everyone, even the intolerant. “An it harm none, do what ye will”, and only when someone has done harm, in a way everyone can recognize, is it permissible to punish them for deviation. Live and let live, even those you hate and who hate you. Their right to swing their fist ends where your nose begins, and so does their right to free speech, free assembly, free worship. Up until you’re defending your elegant schnozz from blunt force trauma, you are bound, by a suicide pact and a moral precept, not to retaliate.

Individualism and Holidays

It’s pretty unlikely that anyone’s following my blog and not the much superior blog of Sarah Constantin. But if you are, take a read of her most recent post, In Defense of Individualist Culture.

I care very much about protecting individualism, and I think this clarified a couple of things for me. One, this is at the root of much of my objection to ritual and other holidays; they are not religious, but they are still usually anti-individualist in form if not in content, and I think that’s where most (70%?) of the danger of religion actually lies. Two, why I have a strong, visceral “Enemy!” response to David Chapman’s Meaningness and other pieces of postrationalist thought that descend from it.

While I’m on the holiday topic: I have let this project lapse quite a lot, and have moved to the South Bay where it’s harder to work on. I do have ideas for an August holiday and some next steps (look for tall buildings with rentable open-air access high up), though, so I’m going to give it a shot. (I’m also on the list of potential Winter Solstice organizers, but that’s not my call.)

How I Use Beeminder

I am bad at using productivity systems. I know this because I’ve tried a bunch of them, and they almost all last somewhere between a week and four months before I drop them entirely. I’ve tried Habitica, Complice, a simple daily “what can I do tomorrow” in Google Keep, a written journal… All of them work for a little while, but only that.

Beeminder has stuck. I now have several intermittent goals set up in it that I’m regularly accomplishing. This is how I use it.

Beeminder is a goal-tracking app. You set a target (“at least X <entries> per week”, or special settings for weight loss/gain; it’s more flexible if you pay for a subscription) and a starting fine (by default $5). Then you enter data; it tracks your overall progress, and if you slip below the rate you set, it bills you the fine and then raises it.
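The mechanics can be sketched as a toy model (my own illustration; the class, names, and the tripling escalation are assumptions, not Beeminder’s actual internals):

```python
# Toy sketch of Beeminder-style goal tracking. Not Beeminder's real logic;
# the escalation schedule here is a made-up stand-in.

class Goal:
    def __init__(self, rate_per_week, pledge=5):
        self.rate_per_week = rate_per_week  # minimum datapoints per week
        self.pledge = pledge                # fine charged when you derail
        self.entries = 0
        self.weeks = 0

    def log(self, count=1):
        """Enter data toward the goal."""
        self.entries += count

    def end_week(self):
        """Check progress against the committed rate; bill and escalate on a miss."""
        self.weeks += 1
        if self.entries < self.rate_per_week * self.weeks:
            charged = self.pledge
            self.pledge *= 3   # raise the stakes after each derailment
            return charged     # amount billed this week
        return 0               # on track: nothing charged

goal = Goal(rate_per_week=3)   # e.g. shower at least 3x/week
goal.log(2)
print(goal.end_week())  # 5: fell one short, billed the $5 fine
print(goal.pledge)      # 15: the next derailment costs more
```

The escalating fine is the interesting design choice: the cost of slipping grows until it reliably beats your akrasia.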

When I started it, I was in a slump, and used it for two things: getting out job applications and remembering to take care of basic hygiene. It sends reminders at an increasing rate if you forget, so it helped a lot with remembering to take showers before it got late enough that I’d wake up my house to do it, and to brush my teeth regularly. And since I was contractually obligated to try hard to find a job, having finished App Academy not long before, a regular reminder that also helped me track when and where I’d applied was very useful. These were all very frequent goals; my minimums were two applications per day, brushing my teeth twice a day, and showering at least 3x/week. This was pretty good at keeping me on track, but never used much less willpower than it did at first.

Currently, I use it somewhat differently. I still have the brushing-my-teeth goal, but the only time it’s been at risk was a period where I broke my brush and didn’t get a new one for several days. It’s now the only daily or near-daily goal I have; its function is mainly to keep me looking at Beeminder regularly. As I vaguely remember from a certain game designer repeating it many, many times, structured daily activities are key to building a routine. I seem to be less susceptible to routine than most people, but it still helps.

With the regular check-in goal in place, I can hang longer-term goals on it. Right now, that’s getting back into playing board games regularly and continuing my quest to learn more recipes. Both of these are things that I am happier and better-motivated when I do, but forget to do from day to day. In writing this post, I also decided to add a habit of clearing out my Anki decks more regularly, since I’ve gotten out of the habit of using those.

This way isn’t the only way, but it’s an effective one, and distinctly different from how the Beehivers themselves do it. So if their ways sound alien but this seems appealing, consider giving it a shot.

Short Thought: Testable Predictions are Useful Noise

A housemate of mine thinks that whether a theory makes testable predictions is unimportant, relative to how simple it is and what not-currently-testable predictions it makes.

There’s some merit to this. There are testable theories that are bad/useless (luminiferous aether), and good/useful theories that aren’t really testable (the many-worlds interpretation of quantum physics). Goodness and testability aren’t uncorrelated, but by rejecting untestable theories out of hand you are going to exclude some useful and possibly even correct theories. If you have a compelling reason to use a theory and it matches well with past observations, your understanding may be better if you adopt it rather than set it aside to look for a testable one.

But there is a reason to keep the testable-prediction criterion anyway: it keeps you out of local optima. By the nature of untestability, a theory that does not make testable predictions, no matter how good, will never naturally improve. You may switch, if another theory looks even more compelling, but you will get no signal telling you that your current theory is not good enough.

By contrast, even a weak theory with testable predictions is unstable. It provides means by which it can be shown wrong, with your search pushed out of the stable divot of “this theory works well” and back into searching. If your tests are useful, they will push you along a gradient toward a better area of theory-space to look in, but at the least you will know you need to be looking.

The upshot is this: even if you have a theory that looks very good, in the long run it is probably better to operate with a theory that looks less good but makes testable predictions. The good but stable theory will probably outstay its welcome, while the testable but weak theory will tell you to move on when your data and new experiences pass it. Like a machine learner adding random noise to avoid getting stuck, testable predictions are signals that ensure you will keep exploring the possibilities.
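The analogy to noisy search can be made concrete with a toy sketch (entirely my own; the landscape and the 20% kick rate are arbitrary). A pure hill-climber freezes at the first local peak, while one that gets random kicks, playing the role of failed predictions, finds the higher one:

```python
import random

# A 1-D "theory landscape": higher value = better theory. Arbitrary numbers.
landscape = [0, 1, 2, 1, 0, 1, 2, 3, 4, 3]

def greedy(i):
    """Pure hill-climbing: move only to a strictly better neighbor."""
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
        best = max(neighbors, key=lambda j: landscape[j])
        if landscape[best] <= landscape[i]:
            return i  # local optimum: no signal telling us to keep looking
        i = best

def noisy(start, steps=200, seed=0):
    """Hill-climbing plus random kicks: each kick is a failed testable prediction."""
    rng = random.Random(seed)
    i = best = start
    for _ in range(steps):
        if rng.random() < 0.2:
            i = rng.randrange(len(landscape))  # a test fails: get pushed elsewhere
        else:
            i = greedy(i)
        if landscape[i] > landscape[best]:
            best = i
    return best

print(landscape[greedy(0)])  # 2: stuck in the first divot
print(landscape[noisy(0)])   # 4: the kicks eventually found the better peak
```

The untestable theory behaves like `greedy`: once it fits, nothing ever tells it to move.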

Benignness is Bottomless

If you are not interested in AI Safety, this may bore you. If you consider your sense of mental self fragile, this may damage it. This is basically a callout post of Paul Christiano for being ‘not paranoid enough’. Warnings end.

I find ALBA and Benign Model-Free AI hopelessly optimistic. My objection has several parts, but the crux starts very early in the description:

Given a benign agent H, reward learning allows us to construct a reward function r that can be used to train a weaker benign agent A. If our training process is robust, the resulting agent A will remain benign off of the training distribution (though it may be incompetent off of the training distribution).

Specifically, I claim that no agent H yet exists, and furthermore that if you had an agent H you would already have solved most of value alignment. This is fairly bold, but at least the first clause I am quite confident in.

Obviously the H is intended to stand for Human, and smuggles in the assumption that an (educated, intelligent, careful) human is benign. I can demonstrate this to be false via thought experiment.

Experiment 1: Take a human (Sam). Make a perfect uploaded copy (Sim). Run Sim very fast for a very long time in isolation, working on some problem.

Sim will undergo value drift. Some kinds of value drift are self-reinforcing, so Sim could drift arbitrarily far within the bounds of what a human mind could in theory value. Given that Sim is run long enough, pseudorandom value drift will eventually hit one of these self-reinforcing patches and drift an arbitrarily large distance in an arbitrary direction.
It seems obvious from this example that Sim is eventually malign.
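A crude toy model of why “eventually” carries so much weight (my own; the trap locations, drift size, and bounds are made up): treat Sim’s values as a point drifting randomly, with a few self-reinforcing “trap” regions. However small the traps, capture becomes near-certain as runtime grows:

```python
import random

def run_sim(steps, rng):
    """Random value drift on [0, 100] with self-reinforcing 'trap' regions."""
    traps = (10, 42, 77)           # arbitrary self-reinforcing attractors
    v = 50.0                       # Sim starts with Sam's values
    for t in range(steps):
        v = min(100.0, max(0.0, v + rng.gauss(0, 1)))  # small pseudorandom drift
        if any(abs(v - trap) < 1 for trap in traps):
            return t               # captured: from here, drift self-reinforces
    return None                    # still recognizably Sam-like

rng = random.Random(1)
captures = [run_sim(100_000, rng) for _ in range(20)]
print(all(c is not None for c in captures))  # True: every long run gets captured
```

Capture times vary run to run; the argument only needs them to be finite with probability approaching 1.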

Experiment 2: Make another perfect copy of Sam (Som), and hold it “asleep”, unchanging and ready to be copied further without changes. Then repeat this process indefinitely: Make a copy of Som (Sem) and give him short written instructions, written by Sam or anyone else, and run Sem for one hour. By the end of the hour, have some set of instructions and state written in the same format. Shut off Sem at the end of the hour and take the written instructions to pass to the next instance, which will be copied off the original Som. (If there is a problem and a Sem does not create an instruction set, start from the beginning with the original instructions; deterministic loops are a potential problem but unimportant for purposes of this argument.)

Again, this can result in significant drift. Assume for a moment that this process could produce arbitrary plain text input to be read by a new Sem. Among the space of plain text inputs could exist a tailored, utterly convincing argument why the one true good in the universe is the construction of paperclips; one which exploits human fallibility, the fallibilities of Sam in particular, biases likely to be present in Som because he is a stored copy, and biases likely to be peculiar to a short-lived Sem that knows it will be shut down within one hour subjective. This could cause significant value drift even in short timeboxes, and once it began could be self-reinforcing just as easily as the problems with Sim.
Getting to the “golden master key” argument for any position, starting from a sane and normal starting point, is obviously quite hard. Not impossible, though, and while the difficulty of hitting any one master-key argument is high, there is a very large set of potential “locks”, any of which poses the same problem. If we ran Sem loops for an arbitrary amount of time, Sem would eventually fall into a lock and become malign.

Experiment 3: Instead of just Sam, use a number of people, put in groups and recombining regularly from different parts of a massively parallel system of simulations. Like Sem, it is using entirely plain-text I/O and is timeboxed to one hour per session. Call the Som-instance in one of these groups Sum, who works with Diffy, Prada, Facton, and so on.

Now rather than drifting to a lock which is a value-distorting plain text input for a Sem, we need one for the entire group, which must be able to propagate to one member via reading and to enough of the rest via persuasion. This is clearly a harder problem, but there is also more attack surface; only one of the participants in the group, perhaps the most charismatic, needs to propagate the self-reinforcing state. It can also drift faster, once motivated, with more brainpower that can be directed toward it. On balance, it seems likely to be safer for much longer, but how much longer? Exponentially? Quadratically?

What I am conveying here is that we are patching holes in the basic framework, and the downside risks are playing the game of Nearest Unblocked Strategy. Relying on a human is not benign; humans seem to be benign only because they are, in the environment we intuitively evaluate them in, confined to a very normal set of possible input states and stimuli. An agent which is benign only as long as it is never exposed to an edge case is malign, and examples like these convince me thoroughly that a human subjected to extreme circumstances is malign in the same sense that the universal prior is malign.

This, then, is my point: we have no examples of benign agents, we do not have enough diversity of environments to observe agents in to realistically conclude that an agent is benign, and so we have nowhere a hierarchy of benign-ness can bottom out. The first benign agent will be a Friendly AI – not necessarily particularly capable – and any approach predicated on enhancing a benign agent to higher capability to generate an FAI is in some sense affirming the consequent.

Holidaying: An Update

As described in Points Deepen Valence, I’ve been contemplating and experimenting with holiday design. Here’s how it’s going:

I ran a Day of Warmth at a friend’s apartment (on the weekend after Valentine’s Day), and it went fairly well.
Good points: a ritualistic quasi-silence was very powerful, and could probably go longer. The simple notion of it being a holiday, rather than a party, does something to intensify the experience. Physical closeness and sharing the taste and smell of food were, as hoped, good emotional anchors. Instinctual reactions about what will be well-received, based on initial gut impression, seem to be pretty accurate.
Bad points: a loosely planned event is not immune, or even resistant, to the old adage that no plan survives contact with the enemy (or in this case audience and participants). I tried to have a small handful of anchors and improvise within them, since the event was small, but without planning problems came up faster and more wide-ranging than I expected. The anchors went off alright, but not as planned; everything between them required more constant thought than desired. Breaking bread, without clear parameters on the bread, did not work well physically. And the close-knit atmosphere of comfort desired was not actually compatible with the intended purpose of deepening shallow friendships.
(A longer-form postmortem is here.)

My initial idea for the Vernal Equinox was a mental spring cleaning, Tarski Day. I haven’t been able to find buy-in to help me get it together, and this month’s weekends are actually very crowded already, so I won’t be doing that. Instead, I’ve been researching other ritual and holiday designs to crib off, and looking for events to observe. One group I’ve been looking at is the Atheopagans, who use the “traditional” pagan framework of the wheel of the year without any spiritual beliefs underlying it. I don’t empathize much with the ‘respect for the earth’ thing, personally, but cribbing off their notes (and how that blogger, specifically, modified holidays for the California climate) is valuable data. He also wrote this document on designing rituals, including some points I agree with and will take as advice, and some I dislike, consider to carry the downsides of religious practice, and will avoid.

There are also the connected “Humanistic Pagans”, and a description of the physical significance of the eight point year (Solstices, Equinoxes, Thermstices and Equitherms) here. It also includes some consequences of the interlocking light/dark and hot/cold cycles for what activities and celebrations are seasonally appropriate, which is food for thought.

I’m not sure where I’m going from here. After the Spring Equinox comes the Spring Equitherm, aka Beltane, which in many traditions, and by the plenty/optimism vs. scarcity/pessimism axis, seems naturally to be a hedonistic holiday. I am not a hedonist by nature, so while I’m sure I could find friends who would be happy to have a ritualistic orgy and/or general bacchanalia, I’m not sure I’d want to attend, which somewhat defeats the personal purpose of learning holiday design. But I don’t want to leave a four-month gap in my feedback loop between now and the Summer Solstice. I suppose I’ll keep you posted.

Daemon Speedup

A short thought about the applicability of Jessica Taylor’s reasoning in Are daemons a problem for ideal agents?, peering at the differences between the realistic reasoning for why it seems intuitive that this should be a problem, and the formalization where it isn’t.

Consider the following hypothetical:

Agent A wants to design a rocket to go to Neptune. A can either think about rockets at the object level, or simulate some alien civilization (which may be treated as an agent B) and then ask B how to design a rocket. Under some circumstances (e.g. designing a successful rocket is a convergent instrumental goal for someone in A’s position), B will be incentivized to give A the design of a rocket that actually goes to Neptune. Of course, the rocket design might be a “treacherous” one that subtly pursues B’s values more than A’s original values (e.g. because the design of the rocket includes robots in the rocket programmed to pursue B’s values).

It’s plausible that A could think that B is better at designing rockets than A is, such that asking B to design a rocket is more likely to yield a successful rocket than A just thinking about rockets at the object level. (Something similar to this seems to be going on with humans in the place of A: if humans had access to sufficiently powerful computers, then the easiest way to pursue convergent instrumental goals such as solving math problems might be through unaligned AGI.) But there’s something weird about the epistemic state A is in: why is A able to design more-successful rockets by thinking about B thinking about rockets, rather than just by thinking about rockets at the object level?

In the realistic situation where we might put ourselves in A’s position, we expect that B has access to much more computing resources per unit time than we do. For an ideal but bounded agent who has access to an ideal-reasoning but unaligned subagent oracle, this could still be a plausible situation, unless the questions being asked have a solution-verification algorithm that’s much faster than the solution-finding algorithm.
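That loophole is the familiar verify-vs-search asymmetry. As a toy illustration (my own, not from the post), subset-sum answers can be checked cheaply even though finding them by brute force takes exponential time, so A can consume an untrusted B’s answer safely, for this class of question only:

```python
from itertools import combinations

def find_subset(nums, target):
    """Brute-force search over all subsets: exponential in len(nums)."""
    for k in range(len(nums) + 1):
        for combo in combinations(nums, k):
            if sum(combo) == target:
                return list(combo)
    return None

def verify_subset(nums, target, proposed):
    """Cheap check of a claimed answer: no search required."""
    pool = list(nums)
    for x in proposed:
        if x not in pool:
            return False       # claimed element isn't actually available
        pool.remove(x)         # each element of nums may be used at most once
    return sum(proposed) == target

nums, target = [3, 34, 4, 12, 5, 2], 9
answer = find_subset(nums, target)          # the expensive way: solve it yourself
print(verify_subset(nums, target, answer))  # True: the claimed answer checks out
print(verify_subset(nums, target, [3, 5]))  # False: 3 + 5 != 9
```

For questions without such a cheap verifier, A has no analogous way to launder an untrusted oracle’s output.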

For the simulation idea, though, it’s harder to see how this might work. An ideal reasoner has self-trust, and would trust themselves to come back from a short-term 50x speedup with better knowledge and the same utility function. But say we know of an algorithm which uses the same resources as our own mind and operates significantly faster, but in an opaque way. (This is a problem we currently have with machine learning, and is on CHAI’s priorities list.) This does not appear to have an explicit term for treachery in its utility function, but because it is opaque we cannot rule out it placing an implicit or emergent value on treachery. Thus, a conservative agent A may model B as explicitly hostile.

I’d like to put that in terms of the formalism introduced in the linked post, which I’ll reproduce here:

If r is a rocket design (as a string), let s(r) ∈ [0,1] indicate how successful the rocket r is at getting to Neptune. Additionally, let t(r) ∈ [0,1] indicate how treacherous the rocket r is. A’s utility will be s(r) − 0.1·t(r) (i.e. A cares more about getting a successful rocket at all than getting a non-treacherous rocket). B’s utility will be s(r) + 0.1·t(r).

Let Y and Z be different strings specified as logical expressions. A will observe Y, while B will observe Z.

B will compute some string M intended as a message to A. M is a logical fact that A observes by simulating B. After receiving this message, A will choose some rocket design r.

In my elaboration, A knows of two algorithms: a(n), which finds an optimal solution for s(r) − 0.1·t(r) in O(2^n) time (using a subalgorithm y(n) to access logical facts Y = y(n)), and b(n), which is known to approximate an optimal solution for s(r) in O(n^2) time. The approximation is not well understood and its internals are opaque, so its behavior on other factors such as t(r) is unknown, except insofar as they would limit its good performance on s(r).

The algorithm b(n) can be treated as having two parts: z(n) accesses relevant logical facts from its inputs and surroundings, and m(z) approximates a solution for s(r) in O(n^2) time. In this case, A has access to M = m(z) and Z = z(n), but cannot use Z to solve for its true utility s(r) − 0.1·t(r) while retaining the speedup of b(n). This would be a case where a conservative agent would want to model the process b(n) as the agent B, in a way that does not violate the orthogonality thesis.
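To see why A might consult B at all despite modeling it as hostile, here is a worked example with made-up numbers: suppose a(n), within A’s time budget, only reaches s = 0.6, while b(n) reaches s = 0.95, and a conservative A assumes the worst case t = 1 for b(n)’s design:

```python
# Illustrative numbers only; A's utility is s(r) - 0.1 * t(r) as in the formalism.
def utility_A(s, t):
    return s - 0.1 * t

# a(n): slow but trusted, so lower s within the time budget, and t is known to be 0.
trusted = utility_A(0.6, 0.0)

# b(n): better s, but opaque, so a conservative A assumes maximal treachery t = 1.
worst_case_b = utility_A(0.95, 1.0)

print(trusted)                    # 0.6
print(round(worst_case_b, 2))     # 0.85: the distrusted oracle still wins
```

Even under maximal distrust, B’s output scores higher on A’s own utility, which is exactly the uncomfortable epistemic position the quoted passage describes.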

This is largely a restatement of the prior points which Jessica referenced in her post (from Paul Christiano), but it does suggest that this may be an inevitable problem for any agent with any significant slowdown in its reasoning to ensure value alignment. In the case of Garrabrant logical inductors, the inductor is much slower than any of the individual agents B, and so this extension does not provide any additional reason to think that it should be impossible to create a variant inductor that accounts for the internals, though doing so might increase the difficulty of creating an efficient inductor with the same principles.