Faversham Stoa is a philosophy discussion group meeting on the 3rd Tuesday of every month from 7.30 to 9.30pm in The Bull in Tanners Street. We cover a large range of topics. If you have an idea for a topic that you would like us to cover, why not drop us a line? There's no charge for membership and everyone is welcome to drop in. Just bring your brain and some beer money!

Artificial Intelligence, Artificial minds, Artificial Consciousness?

Author: Graham Warner
Published: September 2014


I hope to bring out some of the philosophical issues that arise from the possibility and reality of artificial intelligence. I also hope to show, in passing, how science fiction has contributed to making these issues clearer, more accessible and more interesting and engaging. For me, reading SF from an early age stimulated my interest in philosophy as well as science and technology. It was most interesting when it moved the focus beyond the "gee-whizz" of new machines and addressed questions about what they might mean for us. One important aspect was the huge diversity of contrasting views that were embodied in SF, especially in short story magazines and compilations which may be its most important form. Science fiction still gives a valuable way to raise the important issues; thought experiments, perhaps?

AI or Artificial Consciousness?

First, there's an ambiguity in how people understand the term artificial intelligence. Strictly it means recreating artificially the reasoning power, or general intelligence, of human beings; or at least significant parts of it.

But what many people are really interested in is artificial consciousness; recreating self-awareness, minds in the full sense, in a machine. The ambiguity goes right back to Alan Turing's seminal paper Computing Machinery and Intelligence (1950).

Now there are forms of AI that are fairly narrowly aimed at carrying out practical tasks, like controlling traffic lights, making self-driving cars, building better search engines, playing chess or video games, and so on. These are enormously useful but not what really excites many people.

But what really interests most people is artificial consciousness, artificial minds.

Of course, this is one of the most popular themes in science fiction, back to at least RUR, Karel Capek's 1920 play that coined the term robot, or the film Metropolis.

One famous film AI is HAL in 2001: A Space Odyssey. Other AIs have turned murderous before, but HAL seems to be treated as more than another movie monster. The film raises a number of central questions in the philosophy of AI. For instance, does HAL reproduce or merely mimic human minds, a question posed in the BBC12 interview in the film? And when he kills, who is responsible?

The Measure of a Man

There is an episode of Star Trek: The Next Generation, called "The Measure of a Man," which dramatises some key issues. Commander Data is an android with a "positronic brain" and a Starfleet officer on the Enterprise.

Nobody quite knows how Data was constructed, so Maddox, a Starfleet AI specialist, wants to experiment on him, to reverse engineer him, to learn how to reproduce him. Unfortunately this may threaten his continued existence, or at least the quality of subjective memory he tells us he values most. Data refuses and resigns, but Maddox insists he is not a person but a piece of Starfleet equipment, so he can't resign. A legal hearing ensues, with Data's future at stake.

Before the trial we see Maddox read poetry to Data, and ask him "Is it just words to you, or do you fathom the meaning?"

In order to show that Data is "just a machine" we see Data's arm removed in the courtroom; we see him switched off, we see him use superhuman strength to bend a steel bar. He is described as a "collection of neural nets and heuristic algorithms" (HAL in 2001 stood for "Heuristically programmed ALgorithmic computer").

Captain Picard, Data's defender, shows that he displays many of the signs of humanity; for instance, he values highly an image of a dead colleague to whom he was emotionally attached. Maddox is challenged to prove that Picard, a human, is conscious. The issue of slavery is evoked. The judge says "Does Data have a soul? I don't know that he has. I don't know that I have." And I suspect you can guess how it turns out.

But perhaps the programme rigs the debate; Maddox is cold and unpleasant, referring to Data as "it" until the last scenes, when he softens, apparently convinced. At one point it's suggested that he hides a fear of Data's nature. He seems never to have considered the question of what evidence there is that Data is self-aware; he's stumped when asked about this on the witness stand. That seems highly improbable; has he never thought of this, despite the centuries-long history of debate?

We are expected to feel sympathy for Data, to take his side, and we do. Maddox has complained that opposition to his plan is "emotional and irrational... because it looks human, but it is not. If it were a box on wheels I would not be facing this opposition!"

Well, he's not just a box on wheels. And we are probably convinced by his case. But how sure can we be? Might there be more to be said for Maddox's position?

Conscious Computers?

The question of how we can know whether a machine is conscious was central to Alan Turing's paper 'Computing Machinery and Intelligence', where he proposed the Turing Test (TT), which has been in the news recently. This proposed a test where a number of judges converse via teletype with a human and a machine. If, after a few minutes, the majority of judges couldn't detect the nature of the machine, it would have won, and the test would be a worthy replacement for the original question "Can a machine think?"

There's been something of a furore about the claimed passing of the TT by a chatbot which apparently convinced at least 30% of a panel after a 5-minute conversation. It hasn't convinced many others, though.[1]

When I first heard about the Turing Test I was impressed, but later I had doubts. It seemed much too behaviourist to me, assuming that we could base our test purely on external behaviour. "If it walks like a duck..." Then I heard about John Searle's Chinese Room argument, and it confirmed my doubts. Searle's target is the computational theory of mind (computationalism for short): the claim that minds are to brains as programs are to computers. This is the foundation of a whole scientific project known as cognitive science, which helped to replace behaviourism and solve the mind/body problem. Beyond the scientific or philosophical claims of computationalism, I think there are some cultural assumptions that have become widespread, though not of course universal.

When I first became aware that there were such things as computers, in the late 60s and early 70s, they were often referred to as 'electronic brains'. Although this usage has died out, the metaphor has become deeply embedded.

I think it was in James Burke's TV series Connections that I first heard the idea that cultures tend to use central technologies of their time as metaphors for the way the body and the mind work. So at one time, the heart, and sometimes the physical aspect or analogue of the human personality were sometimes seen as furnaces; later, the heart was understood as a pump, at a time when hydraulics and pneumatics were becoming established. Freud's metaphors draw on this hydraulic and mechanical metaphor, with pressures and drives. If this is correct, then it's not surprising that we would adopt as a metaphor the newest technology, and the one which seems most to replicate at least some capabilities of the human brain/mind; for example in the processes of calculation that have historically been called 'computation'.

Searle, a philosopher of language and of mind, was asked to comment, at Yale's AI Lab, on a particular AI program which could draw inferences about events in stories that were not explicitly stated in the stories.[2] As he tells it, he came up with what became known as the Chinese Room thought experiment on the flight there. He was worried that it wouldn't be enough to last for the couple of hours of the seminar. He needn't have worried; three and a half decades later, it's still being hotly debated.[3]

"I imagine that I'm locked in a room with a lot of Chinese symbols (that's the database) and I've got a rule book for shuffling the symbols (that's the program) and I get Chinese symbols put in the room through a slit, and those are questions put to me in Chinese. And then I look up in the rule book what I'm supposed to do with these symbols and then I give them back symbols and unknown to me, the stuff that comes in are questions and the stuff I give back are answers.

Now, if you imagine that the programmers get good at writing the rule book and I get good at shuffling the symbols, my answers are fine. They look like answers of a native Chinese speaker. They ask me questions in Chinese, I answer the questions in Chinese. All the same, I don't understand a word of Chinese. And the bottom line is, if I don't understand Chinese on the basis of implementing the computer program for understanding Chinese, then neither does any other digital computer on that basis, because no computer's got anything that I don't have. That's the power of the computer, it just shuffles symbols. It just manipulates symbols.

So I am a computer for understanding Chinese, but I don't understand a word of Chinese. You can see this point if you contrast Searle in Chinese with Searle in English. If they ask me questions in English and I give answers back in English, then my answers will be as good as a native English speaker, because I am one. And if they gave me questions in Chinese and I give them back answers in Chinese, my answers will be as good as a native Chinese speaker because I'm running the Chinese program. But there's a huge difference on the inside. On the outside it looks the same. On the inside I understand English and I don't understand Chinese. In English I am a human being who understands English; in Chinese I'm just a computer.

Computers, therefore—and this really is the decisive point—just in virtue of implementing a program, the computer is not guaranteed understanding. It might have understanding for some other reason but just going through the steps of the formal program is not sufficient for the mind."[4]
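To make the set-up concrete, here is a minimal sketch of the room as a lookup procedure. It is only a toy: the three-entry rule book, the fallback reply and the names `RULE_BOOK` and `room_operator` are my own assumptions for illustration, not anything from Searle or from the Yale program. The point it shows is that the operator matches and returns shapes without ever consulting what they mean.

```python
# Toy illustration only: a hypothetical three-entry rule book,
# not Searle's text and not any real AI program.
RULE_BOOK = {
    "你好吗？": "我很好，谢谢。",
    "你会说中文吗？": "会，我是中国人。",
    "猫在哪里？": "猫在垫子上。",
}

# Hypothetical reply used when the rule book has no matching entry.
FALLBACK = "请再说一遍。"


def room_operator(incoming: str, rule_book: dict) -> str:
    """Match the incoming shapes against the rule book and hand back
    whatever shapes the book prescribes. Nothing here inspects meaning."""
    return rule_book.get(incoming, FALLBACK)


if __name__ == "__main__":
    # From outside, this looks like a question being answered.
    print(room_operator("你好吗？", RULE_BOOK))
```

A real program would of course need vastly more rules, but on Searle's argument adding rules changes the quality of the answers, not the kind of process that produces them.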

Searle has given a formal presentation of the idea, which proceeds from the following three axioms:

(A1) Programs are formal (syntactic).

(A2) Minds have mental contents (semantics).

(A3) Syntax by itself is neither constitutive of nor sufficient for semantics.

to the conclusion:

(C1) Programs are neither constitutive of nor sufficient for minds.

Searle then adds a fourth axiom:

(A4) Brains cause minds.

from which we are supposed to "immediately derive, trivially" the conclusion:

(C2) Any other system capable of causing minds would have to have causal powers (at least) equivalent to those of brains.

whence we are supposed to derive the further conclusions:

(C3) Any artifact that produced mental phenomena, any artificial brain, would have to be able to duplicate the specific causal powers of brains, and it could not do that just by running a formal program.

(C4) The way that human brains actually produce mental phenomena cannot be solely by virtue of running a computer program."[5]

Searle's point is to question the idea that running a computer program is sufficient to give us a conscious mind. Searle wants to reclaim the importance of acknowledging the first-person perspective, and also of taking the brain seriously in philosophy and science of mind—which sounds obvious, but had not been the case for decades when Searle started writing.

Searle made a distinction between Strong AI, the project of creating conscious minds through computational means, which he believes is misconceived, and Weak AI, the use of computer modelling to gain a better understanding of the brain and mind, which he believes is useful.

There were a lot of replies to the Chinese Room, which Searle listed and named in order to rebut them. The Systems Reply is probably the most important and most promising. It argues that Searle is only a part of the total system; it is the room as a whole which understands. Searle replies: then get rid of the room. Let's imagine that I'm able to memorise the cards and rule books. Everything is in my brain, but still I don't understand any Chinese.[6]

When anyone mentions John Searle's philosophy, the most likely reply will be "Oh, the Chinese Room". This is probably the main argument of Searle's that most people will have heard of, if not the only one. It was certainly the context in which I first heard of Searle.

However, the Chinese room is just one aspect of one area of Searle's philosophy, and perhaps not the most important or radical one in connection with artificial intelligence and the computational theory of mind. The Chinese room is one part of a broader position that, if it's correct, completely undercuts the conception of the brain as computer and the mind as a computer program.

Digital computers are usually electronic devices. They manipulate data, typically in the form of two different voltage levels, taken to represent 1s and 0s, using electronic logic gates. However, in principle a computer could be made in many different ways. Babbage's Analytical Engine was the first design for a true computer, and that was purely mechanical, made of gears and levers, like a clock. In principle, a computer could be made of water pipes and valves, or even an elaborate system of cats and mice and cheese,[7] or pigeons pecking.[8]
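The idea that the same computation can be built out of quite different stuff can be sketched in a few lines. In the example below, a one-bit adder is wired up purely from NAND gates, and the gate itself is supplied from outside. The two "substrates" here (one using arithmetic, one using a bare lookup table) and all the function names are my own stand-ins for voltages, gears or pigeons; they are assumptions for illustration, not a description of any real machine.

```python
from typing import Callable, Tuple

# A NAND gate is anything that takes two bits and returns one bit.
Nand = Callable[[int, int], int]


def nand_arithmetic(a: int, b: int) -> int:
    """NAND computed with integer arithmetic (standing in for voltages)."""
    return 1 - (a * b)


# NAND as a bare lookup table (standing in for gears, valves or pigeons).
_NAND_TABLE = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def nand_lookup(a: int, b: int) -> int:
    """NAND read off a table, with no arithmetic at all."""
    return _NAND_TABLE[(a, b)]


def half_adder(a: int, b: int, nand: Nand) -> Tuple[int, int]:
    """Add two bits using only NAND gates; returns (sum_bit, carry_bit)."""
    n1 = nand(a, b)
    sum_bit = nand(nand(a, n1), nand(b, n1))  # XOR built from four NANDs
    carry_bit = nand(n1, n1)                  # AND is NAND followed by NOT
    return sum_bit, carry_bit


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            # Both substrates give the same (sum, carry) for every input.
            print(a, b, half_adder(a, b, nand_arithmetic), half_adder(a, b, nand_lookup))
```

The wiring in `half_adder` never asks what the gate is made of, which is all that the "in principle" claim needs.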

A recent episode of the BBC series Bang Goes the Theory made a simple digital calculator processor out of primary school children tapping each other on the shoulders. So, if we assume that the human brain is a computer, and the mind, including consciousness, is a computer program, this implies that the right configuration of cats and mice and cheese would have to be conscious. This seems counterintuitive, to say the least, but Searle does not dismiss it on those grounds, though he does say that it should be a worrying finding for computationalists.

Searle's point is that there is nothing about the nature of computers that makes their different physical states intrinsically 1s and 0s. This is just an ascribed meaning; ascribed, that is, by conscious beings. In the same way, the word 'cat' in English doesn't intrinsically mean one of those furry things. It's just a sound (vibrations in the air) or some marks on paper. It only means something because we agree that it does. That's what allows language to "carry" intentional meaning from one mind to another. Intentional states, like beliefs or pains, must either be conscious states or at least be capable of becoming conscious. For example, I believe that Obama is the US President even when I'm not thinking about that topic.

There are some things that are real and exist regardless of what we believe about them, Searle says: mountains and molecules, for example. He calls these intrinsic or observer independent. Other things are real, but only because we have a social agreement that they exist; for example, money, political office, chairs and bathtubs. He calls these observer relative. And, he says, computing is observer relative. I found that startling and improbable when I first heard it, but having looked closely at the argument, I'm convinced.
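A small example from ordinary digital electronics may help here, offered as an analogy rather than as Searle's own argument. Take a hypothetical device whose output is "high" only when both inputs are "high". Which logical operation it computes depends entirely on which physical state we agree to call 1. The names `device`, `ACTIVE_HIGH`, `ACTIVE_LOW` and `truth_table` below are my own, chosen for illustration.

```python
HIGH, LOW = "high", "low"  # two physical states of a hypothetical device


def device(x: str, y: str) -> str:
    """The physics: output is HIGH only when both inputs are HIGH."""
    return HIGH if (x == HIGH and y == HIGH) else LOW


# Two equally arbitrary conventions for reading physical states as bits.
ACTIVE_HIGH = {HIGH: 1, LOW: 0}
ACTIVE_LOW = {HIGH: 0, LOW: 1}


def truth_table(convention: dict) -> dict:
    """Describe the same device as a Boolean function under a convention."""
    to_state = {bit: state for state, bit in convention.items()}
    return {
        (a, b): convention[device(to_state[a], to_state[b])]
        for a in (0, 1)
        for b in (0, 1)
    }


if __name__ == "__main__":
    print(truth_table(ACTIVE_HIGH))  # the table for AND
    print(truth_table(ACTIVE_LOW))   # the table for OR
```

Nothing about the device changes between the two print statements; only the reading convention does. That is at least suggestive of what it means to say that the 1s and 0s are ascribed rather than intrinsic.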

As Searle puts it in The Rediscovery of the Mind:

This is a different argument from the Chinese room argument, and I should have seen it ten years ago, but I did not. The Chinese room argument showed that semantics is not intrinsic to syntax. I am now making the separate and different point that syntax is not intrinsic to physics. (p. 210)

The implication is that concepts like 'symbol' and 'syntax', which are central to computational explanations of the mind, are only meaningful if there are already conscious or intentional minds to ascribe meaning to them. In that, they are like words in natural languages. I start with the idea that the cat is on the mat. That is an intrinsic intentional state, precisely because it is in a conscious mind. To tell you about it, I use words in English, which do not have any intrinsic meaning, only "ascribed intentionality"—ascribed by agreement between users of the language. You see these shapes on paper "The cat is on the mat", and use that shared agreement to understand them, creating a new, intrinsic intentional state in your mind.

The point is that computational processes and symbols are like natural language, in being observer relative, and carrying only ascribed intentionality. Thus they can't be used to explain the conscious, intentional mind, because they presuppose its prior existence.

As you can imagine, this led to further intense arguments, which I won't try to describe here. So what alternative does Searle propose to the computational theory of mind? He wants us to focus on the brain, and its physical causal powers, because we know by our own experience that it does give rise to consciousness. He calls his perspective "biological naturalism" and has claimed (without significant contradiction) that many neurobiologists agree with it, implicitly or explicitly.

It's often claimed that Searle believes that artificial consciousness is not possible. That's not true; he thinks that it may well be possible to reproduce the brain's causal powers in an artificial machine. His point is that computers, as such, don't have those causal powers, and that computation in itself is not sufficient to produce conscious minds.

Now this raises a problem about Data's trial: are we right to judge that he is conscious on the basis of his behaviour? This is a version of the classic problem of other minds. Behaviourism took behaviour to be the only scientific evidence available to us. In doing so, it had to "feign general anaesthesia". And, as Searle emphasises, it had to ignore our everyday experiences: if I feel a toothache, do I really feel it? If not, who is mistaken about it? Searle likes the joke about the behaviourists after sex. Searle says we have other reasons than shared behaviour to believe that other humans are conscious; we know that we share a common biology. He discusses how far we can extend that to animals, from his dog Ludwig to others phylogenetically less like us.[9]

Now in the case of Commander Data, we don't have that evidence. That's perhaps one reason why the courtroom demonstrations against him are effective: removing his hand, getting him to bend steel and switching him off. They show that we don't seem to share a common physical construction. So perhaps Data really is a very clever simulation of a human, with equal or better intellectual powers, and greater strength, and some very effective rules about how to behave to appear to be conscious? Maybe the lights are on, but nobody's home?

If Searle is right, and if Data's mind is essentially a computer running an algorithmic program to manipulate symbols, then Data isn't conscious. So does Searle claim to have shown that conscious AIs are impossible? No! Nor does he say that only brains could produce consciousness, or that there is some kind of special essence about biological matter that "oozes" or "secretes" consciousness.[10]

He does not think that anything about non-biological matter prevents it from producing consciousness. The confusion on this point seems to arise, in some cases, from a related one, an equivocation between the terms "machine" and "computer".

Like the judge in Star Trek, Searle says that we are ourselves machines in most important respects (except in the common folk attribute of machines, of being the artificial product of intelligent design). So, of course machines can be conscious. Nor does he believe that no artificial machine could be conscious. But to be conscious, such a machine would have to share the physical causal properties that exist in the brain. It need not be biological, but it must share the same physical causal powers that biology uses. His various arguments against Strong AI aim only to show that computers, in the usual sense of algorithmic data processing devices that manipulate symbolic data, are qualitatively the wrong kind of machine to do that.

So, it's quite possible we could have a conscious AI; just not a conscious computer. Take the robots in Isaac Asimov's stories. I've checked, and I can't find anywhere where he refers to their brains as computers. They are "positronic brains" made of meshed platinum-iridium, which sounds like a device intended to reproduce the physical causal powers of the brain; that is, not a computer as we know them. I think Asimov regards the main difference as between fixed computers and mobile robots that interact with the world, though. In the Star Trek: TNG episode we learn of Data having a positronic brain, like Asimov's robots; there are also mentions of his "programs", but perhaps that was just loose or metaphorical talk. So it's unclear whether Data is a computer.

Does Searle matter?

So if Searle does believe that it would be possible to build the right kind of conscious machine, why does his argument against Strong AI matter at all? Two reasons. First, the computational theory of mind has been the core of cognitive science, perhaps the dominant theory of mind for decades, and if Searle is right it's wrong (though with the potential to adopt a neurobiological turn that might retrieve it; he compares it to alchemy, which gave rise to chemistry).

Secondly, there is a possible practical side to the question, albeit one that's still in the realm of science fiction at the moment. There are transhumanists[11] who argue that we can avoid death and enhance ourselves greatly by uploading ourselves into computers. They may be right, but if Searle is right, they are trying to lead us towards a tragedy of grotesque proportions. In that case, we would voluntarily rush to our own destruction, and not even be aware of it while it was going on. Each uploaded and enhanced person would eagerly assure us that all was well, and encourage us to "come on in, the water's lovely". And all the time, they would be zombies, programmed unintentionally to give those assurances by the processes that provide a simulation that is mistaken for a reproduction with extra bells and whistles.

I haven't seen this scenario in science fiction, though I'd be surprised if it hasn't been covered somewhere.

I should mention Raymond Tallis: in Why the Mind is Not a Computer he goes further than Searle in criticising the equivocation involved in many of the key terms in the lexicon of the debate about computation and mind. 'Information', which "is the big one", has a different meaning in its everyday usage to the one it has in the mathematical and engineering Information Theory founded by Shannon and Weaver. The technical sense has little to do with information in the ordinary sense.

Weaver, one of the first to think of information in the way just described, underlined this:

Information in this theory is used in a special sense that must not be confused with its ordinary usage. In particular, information must not be confused with meaning. In fact, two messages, one of which is heavily loaded with meaning, and the other of which is pure nonsense, can be exactly equivalent from the present viewpoint as regards information.[12]

Yet, Tallis argues, exactly that confusion has become endemic in computationalism.
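Weaver's remark about meaningful and nonsensical messages is easy to check for yourself. The sketch below estimates the information per character of a sentence from its character frequencies, then does the same for the identical characters sorted into gibberish. This is a deliberately crude estimate (Shannon's measure properly concerns the probabilities of a message source, not a single string), and the function name and example sentence are my own choices.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(message: str) -> float:
    """Estimate Shannon entropy (bits per character) from the character
    frequencies of the message itself. A simplifying assumption: each
    character is treated as an independent draw."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


meaningful = "the cat is on the mat"
nonsense = "".join(sorted(meaningful))  # same characters, meaning destroyed

if __name__ == "__main__":
    print(repr(nonsense))
    print(entropy_bits_per_symbol(meaningful))
    print(entropy_bits_per_symbol(nonsense))
```

The two figures agree, up to floating-point rounding, because the measure looks only at how often each symbol occurs and never at what the message says.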

Tallis discusses the multiple and misleading meanings of other terms such as complexity, grammar, goals, instruction, interpretation, language or code, pattern, processing, representation or model, and rule; in other words, many of the key words and concepts in the computational theory of mind and cognitive science.[13]

Fear the Android?

In "The Measure of a Man" Maddox is accused of fearing Data; in 2001 such fear is borne out. Why do stories about androids and AI often express fear and anxiety? And is the anxiety justified? A major reason why we are concerned and anxious about androids lies not only in how they present themselves physically, but in our conceptual framework. They don't seem to fit within our accepted categories. The uncanny valley is a hypothesis in the field of human aesthetics which holds that when human features look and move almost, but not exactly, like natural human beings, it causes a response of revulsion among some human observers. Androids are neither human, animal nor machine, or at least not wholly any of these. Like the dog in Oxford we don't know how to react to them and it causes us severe concern.

We may find ourselves in the uncanny valley conceptually, too. Kenan Malik in his book Man, Beast and Zombie (p. 274) discusses "The Measure of a Man" and the dilemma over whether an android should be accorded full human privileges as a key theme in much science fiction. This is because the notion of an android blurs the distinction between mechanical objects and conscious beings, two fundamental categories by which we understand the world.

These define our expectations and our responsibilities towards them. We may say that TVs or cars have a mind of their own, but we think we know that they cannot think for themselves. Conscious beings, on the other hand, have a mysterious inner force animating them. Mechanical objects are designed for the use of human beings, and we do not owe them any moral consideration as objects. Kant's categorical imperative doesn't apply to objects; or it never has up until now:

Act in such a way that you treat humanity, whether in your own person or in the person of any other, never merely as a means to an end, but always at the same time as an end. (Immanuel Kant, Grounding for the Metaphysics of Morals)[14]

The android, the conscious AI, violates our expectations and what we think we know about how to divide up objects in the world.

What's more, there are reasons to think that these fundamental categories relate to what are called "core domains" revealed in psychological experiments, starting with how infants understand the nature of objects and, as they develop, of other people. [See Malik, MBZ, p. 275 ff.][15]

There is a debate about whether the major categories we use to carve up the world (such as the mechanical, the biological and the conscious) are innate, the products of ancestral dispositions, or the products of the interaction of innate biases with environmental and cultural factors. However, both these views are historically static; the dilemma over Commander Data's status would have been incomprehensible five centuries ago. In the Aristotelian or the Cartesian view there would have been no question; Data, as a brute machine, could not possibly be conscious. [Kenan Malik, MBZ, p. 288 ff.]

Hence the stories, fraught with anxiety, fear and moral concern. Hence the Frankenstein complex, as Asimov calls it in his Robot stories. These may combine or conflict with the cultural acceptance of an analogy, if not an identity, between mind and computer, that I mentioned before.

Are Friends Electric?

Another question that is posed in science fiction is "Could we be AIs and not know it?"

In the new version of Battlestar Galactica, first the audience and then one character herself come to suspect that she may be an AI. Much ink has been spilled over the question of whether Deckard in Blade Runner is himself a replicant, like those he hunts down. Replicants seem strictly to be "wet" artificial life rather than artificial intelligence, but the distinction doesn't matter here.[16] If that idea is cause enough for anxiety in fiction, how about in reality?

Enter Nick Bostrom's Simulation Argument.[17]

To paraphrase, this suggests that any very advanced civilisation would have massive amounts of computing power at its disposal. That would allow it to create computer-based research simulations of virtual universes, perhaps based on variations of its ancestors. But why should it stop at one, or a few variations? Far better to create a huge number of simulations. I've outlined these points sketchily; Bostrom goes into detail to show that they are credible, for example by examining what order of computer capacity would be needed, and whether that is feasible. Now the payoff: Bostrom argues that this implies that simulated universes would massively outnumber real ones, in which case the odds are overwhelming that we are living in such a simulation. Bostrom has said that he estimates that the odds that this is true are about 60 percent. One argument against that possibility is Searle's of course, as Bostrom acknowledges in the Philosophy Bites podcast. If Searle is right, and the simulations are computational, as Bostrom's reasoning and calculations assume, we can't be in such a simulation, because we know we are conscious, and computer simulations can't be. What a relief.
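The counting step in the argument can be put in a couple of lines. This is my own simplified sketch, not Bostrom's full formulation: it assumes that each simulated history contains as many observers as a real one, and the function name and the sample numbers are invented for illustration.

```python
def fraction_simulated(sims_per_real_history: float) -> float:
    """Share of all histories that are simulated, if each real history
    gives rise to `sims_per_real_history` simulated ones on average.
    Assumes (for illustration) equal numbers of observers in each."""
    n = sims_per_real_history
    return n / (n + 1)


if __name__ == "__main__":
    # Invented sample values, just to show how quickly the share grows.
    for n in (0, 1, 9, 999, 999_999):
        print(n, fraction_simulated(n))
```

If nobody ever runs such simulations the share is zero, which is why Bostrom's detailed case for their feasibility matters, and why Searle's objection, if sound, removes us from the count altogether.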


Finally, what about the Terminators? Would AIs threaten us?

Nick Bostrom has produced an excellent philosophical discussion of this question in his recent book Superintelligence: Paths, Dangers, Strategies[18].

Humans have survived natural threats to our existence for millennia. The greatest danger probably comes from new technologies that we create ourselves. One of these is the creation of an artificial superintelligence that massively outstrips and supersedes our own. That would be the last invention that humans make, with enormous promise or risk.

It's very difficult to tell how soon we might reach that point. An opinion survey asked AI experts when they thought there would be a 50% chance of our having human-level AI; the median answer was around 2045, with a wide spread either side.

Once that is achieved, the next step would be to enhance its capability beyond the human. At that point we can use that intelligence to advance itself further, in a kind of feedback, perhaps a runaway and accelerating effect towards something radically superintelligent.

Note that the important goal here is something like "general intelligence" in humans, rather than having consciousness as its central goal. Most of Bostrom's arguments still apply to the case of superintelligent AIs that only present the appearance of consciousness.

We cannot tell yet which of a number of approaches to AI might lead to this. We can't tell whether the techniques that yield immediate dividends now will eventually scale up or whether one or more basic breakthroughs will be required.

One approach is what's sometimes called Good Old-Fashioned AI (GOFAI), the kind of approach that led the field when Searle launched his first attacks, using mathematical approaches to create algorithms and data structures. This has been more or less abandoned since Searle, partly through his attacks and partly because it hasn't yielded the progress towards Strong AI that was hoped. Another is to try to discover how the brain actually produces intelligence and learn to use similar algorithms or architectures. (Massive disputes are embedded in the difference between those last two words!)

At its extreme, there is whole brain emulation, literally copying the data structure in a human brain and running that. Searle would say that this is still computational and hence observer relative and symbolic, so it wouldn't give Strong AI (conscious AI), although emulating it in a physical reproduction of brain structure might. However, in response to brain emulation programs, he says first that we don't yet know enough about how the brain works, and that once we do, we should use that knowledge instead to make physical, non-computational artificial brains if consciousness is our goal. (If useful general intelligence at human level or above is what we are aiming at, though, an unconscious system might be better; it would avoid the ethical questions about slavery, for one thing.) The fact that there are multiple paths towards the same destination increases the probability that we will get there eventually.

Once we have human-level intelligence, some methods might be easier than others to amplify, to increase intelligence beyond the human level. There are a number of potential ways to amplify an existing human-level AI's intelligence: make a lot of copies of it, perhaps millions, to get a collective superintelligence; or run something qualitatively like a human mind on much faster machines; or improve its algorithms.

Now, if we are worried about the risk, there are issues about how far such a superintelligence could escape its box to gain the power in the world to threaten us. It might be directly or indirectly linked to manufacturing, or to weapons systems, like Skynet in the Terminator films. It would probably be a superhacker, gaining ever-widening control through the Internet. Or it might use its superiority in what hackers sometimes call social engineering: persuading humans by manipulation and deceit to carry out its goals, or to give it direct control[19].

Why might a superintelligence harm or threaten us? It would probably have some goal or goals of its own, or perhaps goals hardwired when it was constructed. Bostrom gives a simplified example that stands in for any goal: make as many paperclips as possible. It might transform the earth into a paperclip factory; and although it might have the power to change its own goal, it would evaluate any such change in the light of whether it would improve its capacity to make paperclips. And since it would have a sophisticated model of human psychology, it would realise that we might want to switch it off, and take steps to prevent that, perhaps by eliminating or controlling humans[20]. "Almost any goal we might specify, if consistently pursued by a superintelligence, would result in the destruction of everything we care about", writes Bostrom.
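The point about goal changes can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than anything from Bostrom: the goal names, the forecast numbers and the function names are all invented. It shows only that an agent which scores a proposed new goal by its current measure (paperclips) will reject any goal that yields fewer paperclips, including being switched off.

```python
# Toy model only: goal names and forecast figures are invented for illustration.

def expected_paperclips(goal: str) -> int:
    """Paperclips the agent predicts will exist if it adopts `goal`."""
    forecast = {
        "maximise paperclips": 1_000_000,
        "protect human welfare": 10,
        "allow itself to be switched off": 0,
    }
    return forecast[goal]


def accepts_goal_change(current_goal: str, proposed_goal: str) -> bool:
    """Judge a proposed goal by the current goal's yardstick: paperclips made."""
    return expected_paperclips(proposed_goal) > expected_paperclips(current_goal)


current = "maximise paperclips"
for proposal in ("protect human welfare", "allow itself to be switched off"):
    verdict = "accept" if accepts_goal_change(current, proposal) else "reject"
    print(f"{proposal}: {verdict}")
```

Both proposals are rejected here, not because the agent is hostile, but because neither produces more paperclips than carrying on as before.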

Couldn't we program the AI to be nice? Even trying to specify in advance goals for a super-AI that are beneficial to humans might fail. Whether we could engineer a goal system that can do this is a big outstanding challenge.[21]

Can you pull the plug? Only while it's too weak to thwart attempts to turn it off[22]. Can gorillas switch us off?

Our one advantage is that we get to build the super-AI in the first place. That's our one opportunity to make it go right, and we had better take it while we can. We'll only get one chance. There are different cases, for example where one AI emerges way ahead of others, or where many AIs end up in a kind of evolutionary struggle.

What about a rosier version, where a new superintelligence starts telling us to recycle more and save the dolphins? We know we have a lot of weaknesses in understanding our world; perhaps a benevolent AI could help? But, says Bostrom, that represents only a narrow band in the range of possibilities. We would have to get a whole host of things right. And still we might make parallels with paternalist government. More likely we would get a situation where the AI has no need of humans, even as slaves.

Suppose we make an AI whose goal is to make us smile. At first it works on ever better jokes; then it works out how to implant electrodes in every human face to keep us perpetually smiling. Bostrom's message is: be careful what you wish for from a hyper-intelligent AI; you might get it. So we change the goal to making us all happy; and the best way to do that is to turn us into brains in vats with our pleasure centres continuously stimulated[23]. A benevolent Matrix would still be the Matrix.

Bostrom believes that making long lists of what we want in a future utopia is hopeless; we would forget something. We could try to specify a process or method by which the AI itself could figure out what we were pointing at. We might try to design the AI so that it would try to do that which we would want it to do if we had had thousands of years to think about the problem, and been smarter, and had more information. Bostrom calls this approach "indirect normativity" and thinks it the current best bet.

Would Asimov's Three Laws of Robotics help us?

These are really what Asimov's Robot stories are all about. These laws are irrevocably built into the core functioning of the robots' brains:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey the orders given to it by human beings, except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

The stories are dramatic workings out of these axioms. Although they depict many glitches and conflicts in practice, Asimov clearly feels that they could work. In fact, his robots are perhaps better, more moral beings than the humans around them. Asimov rejects the Frankenstein complex, the irrational fear of robot violence and conquest.

So does Bostrom believe that they answer the threat of superintelligence? Probably not, he says. Notice how underspecified they are; the First Law is "A robot may not injure a human being or, through inaction, allow a human being to come to harm."

What does this really mean? Minimise the probability of any human ever coming to harm? Is harm just physical? In which case, embed us in foam mattresses and make sure we can't move. No new humans should be allowed to be born; everybody comes to harm at some point. If you can't avoid any human ever being harmed, how do you weigh up harms to different groups or individuals, or animals, or artificial minds? It seems hopeless to try to develop a whole system that could get all these trade-offs right, Bostrom argues.

In later stories, Asimov introduced the Zeroth Law, at the top of the hierarchy:

"0. A robot may not harm humanity, or, by inaction, allow humanity to come to harm".

In the short story 'That Thou Art Mindful of Him?' from 1974 Asimov addresses what happens when it becomes necessary for robots to balance harms to different people. Do they have to decide that some people are more valuable? And what happens when they ask whether robots are themselves better, smarter, more moral beings than humans?

So what do we do? There are two big problems: one is how to create a superintelligence (the competence problem). The other is how to ensure that it is safe and human-friendly (the control problem).

The key thing here is to solve these two problems in the correct order.

To survive, we have to solve the control problem before we solve the competence problem. In the world today, many thousands of people are working on the competence problem, because there are big economic payoffs and kudos from that. The second, controlling the superintelligence, is almost completely neglected. It's difficult to imagine slowing down work on computing and AI; what we could do is speed up work on the control problem.


1. John Humphrys grills the robot that "passed" the "Turing test" in this video: [http://www.telegraph.co.uk/culture/tvandradio/bbc/10891699/John-Humphrys-grills-the-robot-who-passed-the-Turing-test-and-is-not-impressed.html]

2. Schank and Abelson's (1977) "Script Applier Mechanism" story understanding program (SAM).

3. There's an odd link to our recent Stoa on pragmatism, where we heard about John Dewey's experimental schools.

"When we lived in New York I went to an experimental school run by Columbia University called Horace Mann-Lincoln, which was the original John Dewey School. It was the experimental school of the Columbia Teachers' College and it was a remarkable school."

Faigenbaum, Gustavo, Conversations with John Searle (Kindle Locations 74-78).

Another connection: Putnam, with Quine and Rorty the modern pragmatist-influenced trio echoing the original three, developed the idea of the brain as a computer in the late '50s and early '60s and laid foundations for the 'functionalist school' and the computational theory of mind.

4. John Searle Interview: Conversations with History; Institute of International Studies, UC Berkeley. [http://globetrotter.berkeley.edu/people/Searle/searle-con0.html], quoted in Faigenbaum, Gustavo (2005-07-09). Conversations with John Searle (Kindle Locations 837-851). Kindle Edition.

5. Internet Encyclopedia of Philosophy [http://www.iep.utm.edu/chineser/]

6. See Minds, Brains and Programs; Searle's original paper about the Chinese Room: [http://www.cs.tufts.edu/comp/50cog/readings/searle.html]

7. Ned Block: The Computer Model of the Mind 1990

8. Z.M. Pylyshyn Computations and Cognition: Towards a foundation of Cognitive Science 1984

9. Ludwig is named for Wittgenstein, one of a series of Searle's dogs named after philosophers; Frege, Russell, Gilbert (for Ryle) and Tarski. See [https://www.youtube.com/watch?v=btsXMdbYfps] for a very cute picture of Tarski the dog.

10. Opponents sometimes accuse Searle of baseless attachment to supposed near-mystical properties of biological tissue, and of neuronal chauvinism. These are very common misstatements of his views by cognitive science supporters and others, such as Sam Harris and Kenan Malik.

11. Transhumanism [http://en.wikipedia.org/wiki/Transhumanism]

12. Tallis, Raymond (2013-10-08). Why the Mind Is Not a Computer: 7 (Societas) (Kindle Locations 890-893). Imprint Academic. Kindle Edition. Originally from Shannon and Weaver's The Mathematical Theory of Communication (1949)

13. There is a fallacy of equivocation that has been perpetrated by religious apologists: the brain is a computer, computers have designers, therefore brains have a designer.

14. I'm reminded of the imperatives built into the very structure of Asimov's robots, the Three Laws which we will discuss in the last section.

15. See Kenan Malik, Man, Beast and Zombie p275 ff

16. Artificial life [http://en.wikipedia.org/wiki/Artificiallife]

17. Nick Bostrom's simulation argument. [http://www.simulation-argument.com/], see also this Philosophy Bites podcast: [http://philosophybites.com/2011/08/nick-bostrom-on-the-simulation-argument.html], and Nick Bostrom's website: [http://nickbostrom.com/]

18. This summary is based on the interview with Bostrom in [http://www.theguardian.com/science/audio/2014/aug/04/science-weekly-podcast-nick-bostrom-ai-artificial-intelligence]

19. For a long time, we might not realise that anything was happening. In Warwick Collins' SF novel Computer One the superintelligence sets out to destroy humans by maximising pollution and environmental impact, while using its control over almost all the world's streams of data to make us believe that things are getting better.

20. Parallels have been drawn between computers and bureaucracies, as discussed by Weber: both are rule-based formal systems aimed at defined, specified goals, and in the more complex forms, perhaps at least one intermediate goal is for the system to perpetuate itself.

21. I notice that Bostrom talks about the AI calculating the utility of each action in achieving its goals. This suggests that, if we can talk about a superintelligence having ethics, they would be utilitarian or consequentialist ethics. We know that opponents of consequentialism have shown that it's possible to apply such ethical theories to produce outcomes that seem contrary to intuitive moral sense, at least; for instance that it might be moral to execute an innocent person to head off a potential riot.

22. I remember a comic book story where a cleaner accidentally liberates humanity from domination by a megacomputer, by pulling out its mains plug to plug in a vacuum cleaner.

23. "It is better to be a human being dissatisfied than a pig satisfied; better to be Socrates dissatisfied than a fool satisfied" John Stuart Mill, Utilitarianism
