# How I Use Beeminder

I am bad at using productivity systems. I know this because I’ve tried a bunch of them, and they almost all last somewhere between a week and four months before I drop them entirely. I’ve tried Habitica, Complice, a simple daily “what can I do tomorrow” in Google Keep, a written journal… All of them work for a little while, but only that.

Beeminder has stuck. I now have several intermittent goals set up in it that I’ve been regularly accomplishing. This is how I use it.

Beeminder is a goal-tracking app. You set a target (“at least X entries per week”, or special settings for weight loss or gain; it’s more flexible if you pay for a subscription) and a starting fine (by default $5). Then you enter data; it tracks your overall progress, and if you slip below the rate you set, it bills you for the fine and then raises it.

When I started, I was in a slump, and used it for two things: getting out job applications and remembering to take care of basic hygiene. It sends reminders at an increasing rate if you forget, so it helped a lot with remembering to take showers before it got late enough that I’d wake up my house to do it, and to brush my teeth regularly. And since I was contractually obligated to try hard to find a job, having finished App Academy not long before, a regular reminder that also helped me track when and where I’d applied was very useful. These were all very frequent goals: my minimum was two applications per day, brushing my teeth twice a day, and showering at least 3x/week. This was pretty good at keeping me on track, but never used much less willpower than it did at first.

Currently, I use it somewhat differently. I still have the brushing-my-teeth goal, but the only time it’s been at risk was a period where I broke my brush and didn’t get a new one for several days. It’s now the only daily or near-daily goal I have; its function is mainly to keep me looking at Beeminder regularly. As I vaguely remember a certain game designer repeating many, many times, structured daily activities are key to building a routine. I seem to be less susceptible to routine than most people, but it still helps. With the regular check-in goal in place, I can hang longer-term goals on it. Right now, that’s getting back into playing board games regularly and continuing my quest to learn more recipes. Both of these are things that make me happier and better-motivated when I do them, but that I forget about from day to day.
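The rate-and-fine mechanic can be sketched as a toy model (my own sketch; the doubling escalation rule here is an assumption for illustration, not Beeminder’s actual pledge schedule):

```python
# Toy model of a Beeminder-style goal: commit to a weekly rate,
# pay the current pledge when you miss it, and watch the stakes rise.
# (Illustrative only; doubling is an assumed escalation rule.)
def week(entries, rate, pledge, escalate=lambda p: p * 2):
    """Return (amount_charged, next_pledge) for one week of data."""
    if entries < rate:
        # Slipped below the committed rate: pay the fine, stakes rise.
        return pledge, escalate(pledge)
    return 0, pledge

# One bad week at the default $5 pledge:
print(week(entries=1, rate=3, pledge=5))   # (5, 10)
# One good week leaves the pledge alone:
print(week(entries=4, rate=3, pledge=5))   # (0, 5)
```

The point of the escalation is that each derailment makes the next one more expensive, so the incentive sharpens exactly when willpower has already failed once.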
In writing this post, I also decided to add a habit of clearing out my Anki decks more regularly, since I’ve gotten out of the habit of using those. This way isn’t the only way, but it’s an effective one, and distinctly different from how the Beehivers themselves do it. So if their ways sound alien but this seems appealing, consider giving it a shot.

# Short Thought: Testable Predictions are Useful Noise

A housemate of mine thinks that whether a theory makes testable predictions is unimportant, relative to how simple it is and what not-currently-testable predictions it makes. There’s some merit to this. There are testable theories that are bad or useless (luminiferous aether), and good, useful theories that aren’t really testable (the many-worlds interpretation of quantum physics). Goodness and testability aren’t uncorrelated, but by rejecting untestable theories out of hand you are going to exclude some useful and possibly even correct theories. If you have a compelling reason to use a theory and it matches well with past observations, your understanding may be better if you adopt it rather than set it aside to look for a testable one.

But there is a reason to keep the testable-prediction criterion anyway: it keeps you out of local optima. By the nature of untestability, a theory that does not make testable predictions, no matter how good, will never naturally improve. You may switch, if another theory looks even more compelling, but you will get no signal telling you that your current theory is not good enough. By contrast, even a weak theory with testable predictions is unstable. It provides means by which it can be shown wrong, with your search pushed out of the stable divot of “this theory works well” and back to searching. If your tests are useful, they will push you along a gradient toward a better area of theory-space to look in, but at the least you will know you need to be looking.
The upshot is this: even if you have a theory that looks very good, in the long run it is probably better to operate with a theory that looks less good but makes testable predictions. The good but stable theory will probably outlive its welcome, while the testable but weak theory will tell you to move on when your data and new experiences pass it. Like a machine learner adding random noise to avoid getting stuck, testable predictions are signals that ensure you keep exploring the possibilities.

# Benignness is Bottomless

If you are not interested in AI Safety, this may bore you. If you consider your sense of mental self fragile, this may damage it. This is basically a callout post of Paul Christiano for being ‘not paranoid enough’. Warnings end.

I find ALBA and Benign Model-Free AI hopelessly optimistic. My objection has several parts, but the crux starts very early in the description:

> Given a benign agent H, reward learning allows us to construct a reward function r that can be used to train a weaker benign agent A. If our training process is robust, the resulting agent A will remain benign off of the training distribution (though it may be incompetent off of the training distribution).

Specifically, I claim that no agent H yet exists, and furthermore that if you had an agent H you would already have solved most of value alignment. This is fairly bold, but I am quite confident in at least the first clause. Obviously the H is intended to stand for Human, and smuggles in the assumption that an (educated, intelligent, careful) human is benign. I can demonstrate this to be false via thought experiment.

Experiment 1: Take a human (Sam). Make a perfect uploaded copy (Sim). Run Sim very fast for a very long time in isolation, working on some problem. Sim will undergo value drift. Some kinds of value drift are self-reinforcing, so Sim could drift arbitrarily far within the bounds of what a human mind could in theory value.
Given that Sim is run long enough, pseudorandom value drift will eventually hit one of these patches and drift an arbitrarily large distance in an arbitrary direction. It seems obvious from this example that Sim is eventually malign.

Experiment 2: Make another perfect copy of Sam (Som), and hold it “asleep”, unchanging and ready to be copied further without changes. Then repeat this process indefinitely: make a copy of Som (Sem), give him short written instructions (written by Sam or anyone else), and run Sem for one hour. By the end of the hour, have some set of instructions and state written in the same format. Shut off Sem at the end of the hour and take the written instructions to pass to the next instance, which will be copied off the original Som. (If there is a problem and a Sem does not create an instruction set, start from the beginning with the original instructions; deterministic loops are a potential problem, but unimportant for purposes of this argument.)

Again, this can result in significant drift. Assume for a moment that this process could produce arbitrary plain-text input to be read by a new Sem. The space of plain-text inputs could contain a tailored, utterly convincing argument that the one true good in the universe is the construction of paperclips; one which exploits human fallibility, the fallibilities of Sam in particular, biases likely to be present in Som because he is a stored copy, and biases likely to be peculiar to a short-lived Sem that knows it will be shut down within one subjective hour. This could cause significant value drift even in short timeboxes, and once it began it could be self-reinforcing just as easily as the problems with Sim. Getting to the “golden master key” argument for any position, starting from a sane and normal starting point, is obviously quite hard.
Not impossible, though, and while the difficulty of hitting any one master-key argument is high, there is a very large set of potential “locks”, any of which has the same problem. If we ran Sem loops for an arbitrary amount of time, Sem would eventually fall into a lock and become malign.

Experiment 3: Instead of just Sam, use a number of people, put in groups and recombined regularly from different parts of a massively parallel system of simulations. Like Sem, each group uses entirely plain-text I/O and is timeboxed to one hour per session. Call the Som-instance in one of these groups Sum, who works with Diffy, Prada, Facton, and so on. Now rather than drifting to a lock which is a value-distorting plain-text input for a single Sem, we need one for the entire group, which must be able to propagate to one member via reading and to enough of the rest via persuasion. This is clearly a harder problem, but there is also more attack surface; only one of the participants in the group, perhaps the most charismatic, needs to propagate the self-reinforcing state. The group can also drift faster, once motivated, with more brainpower that can be directed toward it. On balance, it seems likely to be safer for much longer, but how much longer? Exponentially? Quadratically?

What I am conveying here is that we are patching holes in the basic framework, and the downside risks are playing the game of Nearest Unblocked Strategy. Relying on a human is not benign; humans seem to be benign only because they are, in the environment we intuitively evaluate them in, confined to a very normal set of possible input states and stimuli. An agent which is benign only as long as it is never exposed to an edge case is malign, and examples like these convince me thoroughly that a human subjected to extreme circumstances is malign in the same sense that the universal prior is malign.
This, then, is my point: we have no examples of benign agents, and we do not have enough diversity of environments in which to observe agents to realistically conclude that an agent is benign, so there is nowhere for a hierarchy of benignness to bottom out. The first benign agent will be a Friendly AI – not necessarily a particularly capable one – and any approach predicated on enhancing a benign agent to higher capability to generate an FAI is in some sense affirming the consequent.

# Holidaying: An Update

As described in Points Deepen Valence, I’ve been contemplating and experimenting with holiday design. Here’s how it’s going:

I ran a Day of Warmth at a friend’s apartment (on the weekend after Valentine’s Day), and it went fairly well.

Good points: a ritualistic quasi-silence was very powerful, and could probably go longer. The simple notion of it being a holiday, rather than a party, does something to intensify the experience. Physical closeness and sharing the taste and smell of food were, as hoped, good emotional anchors. Instinctual reactions about what will be well-received, based on initial gut impression, seem to be pretty accurate.

Bad points: a loosely planned event is not immune, or even resistant, to the old adage that no plan survives contact with the enemy (or in this case, the audience and participants). I tried to have a small handful of anchors and improvise within them, since the event was small, but without planning, problems came up faster and more wide-ranging than I expected. The anchors went off all right, but not as planned; everything between them required more constant thought than desired. Breaking bread, without clear parameters on the bread, did not work well physically. And the close-knit atmosphere of comfort desired was not actually compatible with the intended purpose of deepening shallow friendships. (A longer-form postmortem is here.)

My initial idea for the Vernal Equinox was a mental spring cleaning, Tarski Day.
I haven’t been able to find buy-in to help me get it together, and this month’s weekends are already very crowded, so I won’t be doing that. Instead, I’ve been researching other ritual and holiday designs to crib from, and looking for events to observe.

One group I’ve been looking at is the Atheopagans, who use the “traditional” pagan framework of the wheel of the year without any spiritual beliefs underlying it. I don’t empathize much with the ‘respect for the earth’ thing, personally, but cribbing from their notes (and from how that blogger, specifically, modified holidays for the California climate) is valuable data. He also wrote this document on designing rituals, including some points I agree with and can take as advice to include, and some I dislike and consider to carry the downsides of religious practice, to avoid. There are also the connected “Humanistic Pagans”, and a description of the physical significance of the eight-point year (solstices, equinoxes, thermstices, and equitherms) here. It also covers some consequences of the interlocking light/dark and hot/cold cycles for what activities and celebrations are seasonally appropriate, which is food for thought.

I’m not sure where I’m going from here. After the Spring Equinox comes the Spring Equitherm, aka Beltane, which in many traditions, and by the plenty/optimism vs. scarcity/pessimism axis, is naturally a hedonistic holiday. I am not a hedonist by nature, so while I’m sure I could find friends who would be happy to have a ritualistic orgy and/or general bacchanalia, I’m not sure I’d want to attend, which somewhat defeats the personal purpose of learning holiday design. But I don’t want to leave a four-month gap in my feedback loop between now and the Summer Solstice. I suppose I’ll keep you posted.
# Daemon Speedup

A short thought about the applicability of Jessica Taylor’s reasoning in Are daemons a problem for ideal agents?, peering at the differences between the realistic reasoning for why it seems intuitive that this should be a problem, and the formalization where it isn’t. From her post:

> Consider the following hypothetical: Agent A wants to design a rocket to go to Neptune. A can either think about rockets at the object level, or simulate some alien civilization (which may be treated as an agent B) and then ask B how to design a rocket. Under some circumstances (e.g. designing a successful rocket is a convergent instrumental goal for someone in A’s position), B will be incentivized to give A the design of a rocket that actually goes to Neptune. Of course, the rocket design might be a “treacherous” one that subtly pursues B’s values more than A’s original values (e.g. because the design of the rocket includes robots in the rocket programmed to pursue B’s values). It’s plausible that A could think that B is better at designing rockets than A is, such that asking B to design a rocket is more likely to yield a successful rocket than A just thinking about rockets at the object level. (Something similar to this seems to be going on with humans in the place of A: if humans had access to sufficiently powerful computers, then the easiest way to pursue convergent instrumental goals such as solving math problems might be through unaligned AGI.) But there’s something weird about the epistemic state A is in: why is A able to design more-successful rockets by thinking about B thinking about rockets, rather than just by thinking about rockets at the object level?

In the realistic situation where we might put ourselves in A’s position, we expect that B has access to much more computing resources per unit time than we do.
For an ideal but bounded agent who has access to an ideal-reasoning but unaligned subagent oracle, this could still be a plausible situation, unless the questions being asked have a solution-verification algorithm that’s much faster than the solution-finding algorithm. For the simulation idea, though, it’s harder to see how this might work. An ideal reasoner has self-trust, and would trust themselves to come back from a short-term 50x speedup with better knowledge and the same utility function. But say we know of an algorithm which uses the same resources as our own mind and operates significantly faster, but in an opaque way. (This is a problem we currently have with machine learning, and is on CHAI’s priorities list.) This does not appear to have an explicit term for treachery in its utility function, but because it is opaque we cannot rule out it placing an implicit or emergent value on treachery. Thus, a conservative agent A may model B as explicitly hostile.

I’d like to put that in terms of the formalism introduced in the linked post, which I’ll reproduce here:

> If r is a rocket design (as a string), let s(r) ∈ [0,1] indicate how successful the rocket r is at getting to Neptune. Additionally, let t(r) ∈ [0,1] indicate how treacherous the rocket r is. A’s utility will be s(r) − 0.1·t(r) (i.e. A cares more about getting a successful rocket at all than getting a non-treacherous rocket). B’s utility will be s(r) + 0.1·t(r). Let Y and Z be different strings specified as logical expressions. A will observe Y, while B will observe Z. B will compute some string M intended as a message to A. M is a logical fact that A observes by simulating B. After receiving this message, A will choose some rocket design r.
In my elaboration, A knows of two algorithms: a(n), which finds an optimal solution for s(r) − 0.1·t(r) in O(2^n) time (using a subalgorithm y(n) to access logical facts Y = y(n)), and b(n), which is known to approximate an optimal solution for s(r) in O(n^2) time. The approximation is not well understood and its internals are opaque, so its behavior on other factors such as t(r) is unknown, except insofar as they would limit its good performance on s(r). The algorithm b(n) can be treated as having two parts: z(n) accesses relevant logical facts from its inputs and surroundings, and m(z) approximates a solution for s(r) in O(n^2) time. In this case, A has access to M = m(z) and Z = z(n), but cannot use Z to solve for its true utility s(r) − 0.1·t(r) while retaining the speedup of b(n). This would be a case where a conservative agent would want to model the process b(n) as the agent B, in a way that did not violate the orthogonality thesis.

This is largely a restatement of the prior points which Jessica referenced in her post (from Paul Christiano), but it does suggest that this may be an inevitable problem for any agent with any significant slowdown in its reasoning to ensure value alignment. In the case of Garrabrant logical inductors, the inductor is much slower than any of the individual agents B, so this extension does not provide any additional reason to think it should be impossible to create a variant inductor that accounts for the internals, though doing so might increase the difficulty of creating an efficient inductor with the same principles.

# Minimum Viable Concept

I got into an argument, and while I don’t think anyone changed their mind, I think I realized something about why our argumentative norms are so incompatible. The people I was arguing with are academic philosophers. They like extensive, detailed exploration of a concept, tend to be very wordy, and cite heavily.
I am a rationalist, part of a community justly accused of being a new school of philosophy that includes as one of its tenets “philosophy is dumb”, and we do not have the same norms. Here’s an example: (EDIT: After feedback that the quoted person did not agree with their paraphrased statement, I have replaced it with direct quotes.)

Me: I’d be interested in the one-minute version of how you think the Sequences’ criticism of philosophy is wrong.

My interlocutor: There are several criticisms; if you link me to the one you want, I’ll write a thing up for you.

Me: “Point me to a paper” is one of the frustrating things about trying to argue with [philosophers]. Particularly after [I asked] for the short version. If you don’t have a response to the aggregate that’s concise, just say so; the response you gave instead comes off as a mix of sophistication signal and credentialist status grab, with a minor side of “this feels like dodging the question.”

Philosophers, on the other hand, seem to have a reaction to rationalist argument styles of “Go read these three books and then you’ll be entitled to an opinion.” More charitably, they don’t think someone is taking discussion of a topic seriously unless they have spent significant effort engaging with primary sources that are discussed frequently in the literature on that topic. Which, by and large, rationalists are loath to do.

The academic mindset, I think, grows out of how they learned the subject. They read a lot of prior work, and their own ideas evolved along with the things they’d discussed and written papers about. A lot of work is put into learning to model the thought processes of previous writers, rather than just to learn their ideas. Textbooks are rare, primary sources common. Working in an atmosphere of people who all learned this way would tend to give a baseline assumption that this is how one becomes capable of serious thought on the subject.
(Added note: it seems that modern analytic philosophy has moved away from that style of learning at most schools. All the effects of this style still seem to predict the observed data, though.)

The rationalist mindset grows out of the Silicon Valley mindset. They have the “minimum viable product”; we have the “minimum viable concept”. Move fast and break assumptions. Test your ideas against other people early and often, go into detail only where it’s called for by the response to your idea, break things up into many small pieces that build on each other. If you want a library of common ideas for a subject, read a textbook and go from there. With this mindset, it’s a waste of time to read a long book just to get a few ideas and maybe a notion of how the author generated them; you could instead take half an idea, smash it against some adversarial thinking, and repeat that three or four times, getting several whole ideas, pushing them into their better forms, and discarding the three or four that didn’t hold up when you tested them. Find techniques that work and, if you can put them into words, give them to someone else and see if they work for them as they did for you.

So academics see us as dilettantes who don’t engage with prior art, are ignorant, and make old mistakes; and we see them as stick-in-the-muds who aren’t iterating, wasting motion on dead ends without anyone to tell them they’re lost, and slowing down any attempt at collaboration. (I don’t think I’ve changed my mind about what I prefer, but I hope I’ve passed an ideological/epistemological Turing test that lets people make up their own minds about which is better.)

# Self-Reifying Boundaries

In the words of Scott Alexander:

> Chronology is a harsh master. You read three totally unrelated things at the same time and they start seeming like obviously connected blind-man-and-elephant style groping at different aspects of the same fiendishly-hard-to-express point.
In my case this was less “read three totally unrelated things” and more “read one thing, then have current events look suspiciously related”. I have been working my way through Thomas Schelling’s “Strategy of Conflict”, which made precise the concepts we now call “Schelling points” and “Schelling fences”, among others. He was focused on the psychological game theory of positive-sum bargaining, particularly in the context of nuclear war. Which I’m against.

Do not punch Nazis. No, not even if they’re wearing spider armbands and shouting Heil Hitler. Imminent self-defense only. “Bad argument gets counterargument. Does not get bullet. Never. Never ever never for ever.”

But why? Why is free speech protected? Other good tools, it can be pointed out, are also usable for bad purposes. Abusers “set boundaries” to maintain their control, but boundary-setting is healthy in other contexts. We do not have a “right to set boundaries” that protects its misuse by abusers.

The first reason is the marketplace of ideas, which Scott defended more eloquently than I’m likely to manage. A good reply to a bad argument, or a morally terrible ideology, is one that addresses the substance, not one that silences it. Say there are only clueless idiots being wrong and enlightened philosophers being right (or at least less wrong). 1000 clueless idiots can silence 10 enlightened philosophers just as well as 1000 enlightened philosophers could silence 10 clueless idiots. Or you could argue the substance; even if there are 1000 idiots arguing, the philosophers are probably going to win that one. And because that’s true, we should be very skeptical of attempts to shut down speech. If you need to silence an idea, that suggests you don’t think you can beat it on the merits, while every day it sits out in the marketplace of ideas and doesn’t catch on is another snub, showing that it is not worthwhile.

The second reason is where we get back to Schelling.
He spends a couple of chapters, and spills a bunch of ink, on points of implicit cooperation in cooperative games with no communication. The classic example is meeting someone in New York City, but the purest one is this: pick a positive number; if you pick the same one as your partner, you both win. The correct answer is 1. Not because of anything inherent, but because human minds tend to settle on it; if you line up all the integers, it comes first.

Similarly, if two parachuters land on a map and don’t know each other’s locations, they should meet at whichever feature is most unique. On a map with one bridge, meet at the bridge; if there is only one building, and two bridges, meet at the building. And if right before you jumped, one of you said “if I got lost, I’d climb the highest hill around and look for my buddy”, then you go to the highest hill around. Critically – and this is where Schelling gets to his real subject – you should climb that hill even if it’s grotesquely unpleasant for you. It wasn’t the obvious place to meet, but by the act of mentioning it your buddy has made it so; now it is. The act of mentioning that something might be the obvious place to coordinate, if communication stops there, makes it the obvious place to coordinate. Make a stupid assumption out loud at a time when shared context is scarce and no one can contradict you, and you reify your stupid assumption into consensus quasitruth, because everyone knows that everyone knows about it, and now you have a shared premise to reason from about where you go from there.

This is culturally and contextually determined. If you have to coordinate on a number from the list “three, eight, ten, ninety-seven, seventy-three”, you’ll probably pick ten; but if you counted in base 8, you’d probably pick eight instead. And these natural coordination points determine points of reasonable compromise. A car salesman haggling doesn’t say “I will accept no less than \$5173.92 for this one”, because no one would believe it.
“I will accept no less than \$5200”, though, we will believe (as much as we’ll ever believe a car salesman).
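The base-dependence of that coordination point can be checked mechanically (a toy sketch of my own, not from Schelling):

```python
# Which of the listed numbers reads as "round" depends on the base
# it is written in (toy illustration).
def to_base(n, b):
    """Write positive integer n as a digit string in base b."""
    digits = []
    while n:
        digits.append(str(n % b))
        n //= b
    return "".join(reversed(digits)) or "0"

candidates = [3, 8, 10, 97, 73]
for base in (10, 8):
    # Call a number "round" if its representation ends in 0.
    round_ones = [n for n in candidates if to_base(n, base).endswith("0")]
    print(base, round_ones)
# base 10 -> [10]; base 8 -> [8] (eight is written "10" in octal)
```

Same list, different notation, different unique focal point; the salience lives in the representation, not the numbers themselves.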

At the time he was writing, we had conventional explosives more powerful than any nukes that were public knowledge. We used them. Nukes stayed off the table anyway, not because they were different but because they felt different. It was an obvious line, and obvious to everyone that it was obvious to everyone. And so “no nukes” became one of the rules of limited war in a way that “no nukes more destructive than our best conventional bombs” couldn’t have. The perception of them as a difference in kind reified itself, creating a distinct legal status purely because of their distinct subjective perception.

The same is true of free speech. There are reasons to think that free speech is especially important. (See reason one.) But even if those reasons don’t cut it, everyone knows about them, and since the Enlightenment free speech has been treated as especially important. It’s more vivid in the USA, where we elevated it to the second right specifically protected in the Bill of Rights, but even in Europe, where its status is lower, everyone understands that protecting freedom of speech is special, even where they allow exceptions. Even if it isn’t, in an objective ethical calculus, actually worth special protection, we treat it as a bright line which only tyrants cross, and bending that bright line makes you appear legitimately tyrannical, whether you do it with the law, with violence, or with social warfare and campaigns of ostracism.

So.

Don’t.