Investigating the BAAN Scenario
Thomas Metzinger has recently been pushing the BAAN-scenario to help guide the debate about AI. This podcast is great on a variety of topics, but of particular interest is the second half, on consciousness and BAAN. This article goes into BAAN in a bit more depth.
What is BAAN?
BAAN stands for Benevolent Artificial Anti-Natalism. It explores the logical possibility that a superintelligence prevents further human lives from coming into existence (anti-natalism), and does so compassionately, acting in our own best interest (the benevolence part). It goes something like this (quotations from the Metzinger article):
[1] A superintelligence has come into existence. It has superseded mankind in all aspects of cognitive performance and is therefore the de facto authority about all things, including ethical and moral reasoning.
[2] The superintelligence is benevolent and compassionate. It "fully respects our interests and the axiology we originally gave to it." "There is no value alignment problem" as its values are aligned with ours.
[3a] The superintelligence sees that we suffer from existence bias, which deceives us into valuing our own existence despite the suffering this entails. It sees that states of experience "are much more frequently characterized by subjective qualities of suffering and frustrated preferences than these beings would ever be able to discover themselves." It sees that a positive life balance is impossible.
[3b] Like us, it weighs preventing suffering as significantly more important than aiding happiness.
[4] Being compassionate, it decides to act to minimise "consciously experienced preference frustration, of pain, unpleasant feelings and suffering." It concludes that non-existence (and therefore no suffering) is the best case scenario for all future biological self-conscious beings. It would go directly against our values to end our lives, so it does not do that. However, it does prevent the birth of any more conscious biological creatures.
The Value Alignment Problem
The value alignment problem is:
to build intelligence that is provably aligned with human values. (see Stuart Russell's comment here)
The assertion in [2] that there is no value alignment problem seems to be directly contradicted by the conclusion [4]. At least, my personal values seem to be incompatible with a peaceful extinction event of conscious beings.
Presumably the argument is that our fundamental values do not explicitly include the continued existence of humans. Instead, they are based on preference-satisfaction and avoiding frustrated preferences. We mistakenly assume this implies continued existence of future humans, but that's only because our existence bias blinds us.
This is reminiscent of Je Souhaite, an X-Files episode in which (spoilers) Mulder wishes for peace on Earth and wipes out the entire human population. Here there is a clear "That's not what I meant!" moment, and Mulder accuses the genie of purposefully misinterpreting the intent of his wish. To claim there is no value alignment problem in BAAN, we must assume our values are correctly specified but, like Mulder, we're wrong about their consequences.
Can I be wrong in my judgement that my life is worth living?
Implanted through millennia of natural selection, the drive to continue existing has become part of our fundamental nature. It is a dogma of our biological existence. This is our existence bias. But is it so blinding as to make us perceive well-being when there is actually suffering? Does this possibility even make sense?
Metzinger claims that existence bias blinds us to the (possible) fact that "the phenomenal states of all sentient beings which emerged on this planet—if viewed from an objective, impartial perspective—are much more frequently characterized by subjective qualities of suffering and frustrated preferences than these beings would ever be able to discover themselves." But what is a phenomenal state viewed from an objective, impartial perspective? Phenomenal states are, by definition, subjective. We can be (and often are) wrong about how they correspond to objective reality, but can they really be wrong when they are only making subjective claims?
The argument seems to assume that it is possible that I believe that I am enjoying my existence, but in actual fact I am suffering. Earlier, as I sat in the grass, I felt like I was enjoying simply noticing the ebb and flow of experience. When you actively notice your experience, there is an implicit comparison being made with the case where there is no noticing, where there is nothing at all. To me, the basis for well-being seems to be grounded in the capacity to notice experience, with preference satisfaction being the icing on the cake. Of course, with bad enough icing even the best cake becomes inedible. But there are also cases where the icing is fine, or even missing, and these are positive.
My life could certainly become net-negative due to an unfortunate turn of events. But, should I painlessly die this very instant, I would subjectively characterise it as having had a positive balance. Of having been worthwhile. The full anti-natalist conclusion requires that all life is net-negative. Can it really turn out that I am wrong, and that I am just too blinded by my existence bias to notice all the suffering?
Is Life Suffering?
Motivation for 'life is suffering' comes from the Buddhist literature, where suffering is a translation of the term Dukkha. Dukkha arises from the fact that all preferences are ultimately unsatisfied, as all states of being are impermanent. If the worth of a life is measured in terms of positive preference-satisfaction then, I agree, we get to 'life is suffering'.
I prefer the less emotive translation of Dukkha as 'unsatisfying'. For me, the root of well-being is not preference satisfaction, but noticing a perceived flow of consciousness and comparing this to its absence. The act of noticing experience is also impermanent, but there is, by definition, never a conscious entity around to witness the cases where it is unsatisfied. As soon as the noticing criterion becomes unsatisfied, there is no perceiving entity left to suffer.
To be clear, I'm not arguing that suffering is not the most important variable in the equation. [3b] states that preventing suffering is more important than fostering the opposite. I agree. However, it is possible for [3b] to be correct, but there to still exist future humans that have a positive life balance. It is only when combined with the idea that all life is best characterised by suffering (which I have argued against) that this leads to the anti-natalist conclusion.
I am making the following claims:
- My incredibly lucky and privileged life up to this point has been net-positive.
- The worth of a life is best measured subjectively by the person living it. It does not make sense to override this by an objective evaluation.
- It is possible to cultivate similar life in the future.
This would replace the conclusion [4] with, at most, a partial anti-natalism. Under this version of the argument perhaps some lives will be prevented, but others will be encouraged, and humanity taught ways to increase its well-being. Metzinger also considers this scenario, arguing that maybe in the future we could make [3a] false. I'm arguing it already is.
Whether the BAAN-scenario avoids the value alignment problem comes down to whether or not I can be objectively wrong when I subjectively value my life as positive, and whether it is possible to conclude that "life is suffering" for all biological creatures. I say no: not if value is ultimately derived from simply noticing experience itself. However, regardless of its relationship to value alignment, the BAAN-scenario provides for interesting discussion. I'm going to go back to my existence bias-fuelled life now and irrationally enjoy the deep satisfaction that comes from simply being aware of experience.