Francisco Arcediano. 21st Century Psychology: A Reference Handbook. Editor: Stephen F Davis & William Buskist. Volume 1. Thousand Oaks, CA: Sage Publications, 2008.
Artificial intelligence (AI) comprises a vast interdisciplinary field, which has benefited since its beginning from disciplines such as computer science, psychology, philosophy, neuroscience, mathematics, engineering, linguistics, economics, education, biology, control theory, and cybernetics. Although the goals of AI are as wide as the field is interdisciplinary, AI’s main goal is the design and construction of automated systems (computer programs and machines) that perform tasks considered to require intelligent behavior (i.e., tasks that require adaptation to complex and changing situations).
The role of psychology in AI is twofold. On the one hand, psychologists can help in the development and construction of AI systems because knowledge of cognitive and reasoning processes such as perception, language acquisition, and social interaction is crucial to AI. AI has much to learn from humans because we are the best model of intelligent behavior we know, and because many AI machines will have to interact with us. On the other hand, psychologists could benefit from AI techniques to develop their own discipline further, using tools such as modeling and simulation of theories, expert systems in diagnosis and organization, and interactive techniques in education, to mention just a few.
History of AI
It seems that the desire to build machines that behave intelligently has always been a part of human history. For example, around 2500 BCE in Egypt, citizens and pilgrims turned to oracles (statues with priests hidden inside) for advice. Greek mythology tells how the god Hephaestus created Talos, a man of bronze whose duty was to patrol and protect the beaches of Crete. The idea of building humans and machines with intelligence transferred from mythology into modern literature. For example, Karel Čapek’s play R.U.R. (Rossum’s Universal Robots), first performed in 1921, coined the word “robot.” Shortly after, Fritz Lang’s very popular science fiction film Metropolis (1927) featured a robot character (Maria) that played a decisive role in the plot. And, in the 1940s, Isaac Asimov started publishing his famous collection of stories about robotics.
However, people not only wrote about the possibility of creating intelligent machines; they actually built them. For example, the ancient Greeks were fascinated with automata of all kinds, which they used mostly in theater productions and religious ceremonies for amusement. In the 4th century BCE, the Greek mathematician Archytas of Tarentum built a mechanical bird (a wooden pigeon) that, when propelled by a jet of steam or compressed air, could flap its wings. Supposedly, in one test, it flew a distance of 200 meters (however, once it fell to the ground, it could not take off again). Toward the end of the Middle Ages, clockmakers helped build devices that tried to mimic human and animal behavior. For example, Leonardo da Vinci built a humanoid automaton (an armored knight) around the end of the 15th century for the amusement of royalty. This armored knight was apparently able to make several humanlike motions, such as sitting up and moving its arms, legs, and neck. Reportedly, da Vinci also built a mechanical lion that could walk a programmable distance. In the early 16th century, Hans Bullmann created androids that could play musical instruments for the delight of paying customers. In the 18th century, Jacques de Vaucanson created a mechanical life-size figure (The Flute Player) capable of playing a flute with a repertoire of 12 different tunes. He also created an automatic duck (The Digesting Duck) that could drink, eat, paddle in water, and digest and excrete eaten grain.
In modern scientific AI, the first recognized work was Warren McCulloch and Walter Pitts’s 1943 article “A Logical Calculus of the Ideas Immanent in Nervous Activity,” which laid the foundations for the development of neural networks. McCulloch and Pitts proposed a model of artificial neurons, suggesting that any computable function could be achieved by a network of connected neurons and that all logical connectives (and, or, not, etc.) could be implemented by simple network structures. In 1948, Norbert Wiener’s popular book Cybernetics popularized the term cybernetics and described the principle of feedback. Wiener suggested that all intelligent behavior was the result of feedback mechanisms, or conditioned responses, and that it was possible to simulate these responses using a computer. One year later, Donald Hebb (1949) proposed a simple rule for modifying and updating the strength of the connections between neurons, now known as Hebbian learning. In 1950, Alan M. Turing published “Computing Machinery and Intelligence,” which was based on the idea that both machines and humans compute symbols and that this commonality should be the basis for building intelligent machines. Turing also introduced an operational strategy to test for intelligent behavior in machines, based upon an imitation game, known as the Turing test. Because of the impact of his ideas on the field of AI, Turing is considered by many to be the father of AI.
The term AI was coined at the Dartmouth Summer Research Project on Artificial Intelligence in 1956 at Dartmouth College. This two-month workshop was organized by John McCarthy, Marvin Minsky, Claude Shannon, and Nathaniel Rochester and included as participants Trenchard More from Princeton, Arthur Samuel from IBM, Ray Solomonoff and Oliver Selfridge from MIT, and Allen Newell and Herbert Simon from Carnegie Tech, all of whom played fundamental roles in the development of AI. The Dartmouth workshop is considered the official birthplace of AI as a field, and it provided significant advances from previous work. For example, Allen Newell and Herbert Simon demonstrated a reasoning program, the Logic Theorist, which was capable of working with symbols and not just numbers.
The early years of AI were promising and full of successes. Both a symbolic approach (i.e., an approach that uses symbols and rules) and a subsymbolic approach (i.e., an approach that does not use rules but learns by itself) to AI coexisted with many successes. In the symbolic approach, some of the early successes include the presentation of the General Problem Solver by Newell, Shaw, and Simon (1963), a program designed to imitate human problem-solving protocols, and John McCarthy’s LISP (1958), which became one of the predominant languages in AI. Some of the early successes in subsymbolic AI include the development of the Adalines by Widrow and Hoff (1960), which enhanced Hebb’s learning methods, and the perceptron, by Frank Rosenblatt (1962), which was the precursor of the artificial neural networks we know today.
However, by the end of the 1960s, difficulties arose as the AI promises from the decade before fell short and started to be considered “hype.” Research in subsymbolic AI was largely abandoned after Minsky and Papert formally proved in 1969 that perceptrons (i.e., simple neural networks) were limited in their representation mechanism because they could not represent the XOR (exclusive-OR) logical problem: a perceptron could not be trained to recognize situations in which either one or another set of inputs had to be present, but not both at the same time. The discovery that AI systems were not capable of solving simple logical problems that humans can easily solve resulted in significant reductions in research funding for artificial neural networks, and most researchers of this era decided to abandon the field.
The 1970s focused almost exclusively on different techniques in symbolic AI (such as production systems, semantic networks, and frames) and the application of these techniques to the development of expert systems (also known as knowledge-based systems), problem solving, and the understanding of natural language. By the mid-1980s, interest in symbolic AI began to decline because, once again, many promises remained unfulfilled. However, artificial neural networks attracted renewed interest through what became known as the connectionism movement, largely due to two books discussing parallel distributed processing published by Rumelhart and McClelland (1986). These books demonstrated that complex networks could resolve the logical problems (e.g., XOR) that early perceptrons could not resolve, and allowed networks to resolve many new problems. This new impulse of AI research resulted in the development of new approaches to AI during the late 1980s and 1990s, such as the subsymbolic approaches of evolutionary computing with evolutionary programming and genetic algorithms, behavior-based robotics, artificial life, and the development of the symbolic Bayesian networks. Today, AI is becoming successful in many different areas, especially in the areas of game playing, diagnosis, logistics planning, robotics, language understanding, problem solving, autonomous planning, scheduling, and control.
Knowledge Representation
Knowledge representation addresses the problem of how knowledge about the world can be represented and what kinds of reasoning can be done with that knowledge. Knowledge representation is arguably the most relevant topic in AI because what artificial systems can do depends on their ability to represent and manipulate knowledge. Traditionally, the study of knowledge representation has had two different approaches: symbolic AI (also known as the top-down approach) and subsymbolic AI (also known as the bottom-up approach).
Symbolic AI (Top-Down Approach)
Symbolic AI has been referred to also as conventional AI, classical AI, logical AI, neat AI, and Good Old Fashioned AI (GOFAI). The basic assumption behind symbolic AI is that (human) knowledge can be represented explicitly in a declarative form by using facts and rules: Knowledge, either declarative (or explicit) or procedural (or implicit), can be described by using symbols and rules for their manipulation.
Symbolic AI is traditionally associated with a top-down approach because it starts with all the relevant knowledge already present for the program to use, together with a set of rules that decompose the problem through some inferential mechanism until the goal is reached. A top-down approach can be used only when we know how to formalize and operationalize the knowledge we need to solve the problem. Because of its higher level of representation, it is well suited to perform relatively high-level tasks such as problem solving and language processing. However, this approach is inherently poor at solving problems that involve ill-defined knowledge or highly complex, weakly interrelated interactions (such as commonsense knowledge), situations in which we do not know how to represent the knowledge hierarchy, or situations in which we do not know how to represent the mechanism needed to reach a solution. Many different methods have been used in symbolic AI.
Predicate Logic or First-Order Logic
Logic is used to describe representations of our knowledge of the world. It is a well-understood formal language, with a well-defined, precise, mathematical syntax, semantics, and rules of inference. Predicate logic allows us to represent fairly complex facts about the world, and to derive new facts in a way that guarantees that, if the initial facts are true, so then are the conclusions. There are well-defined procedures to prove the truth of the relationships and to make inferences (substitution, modus ponens, modus tollens, unification, among others).
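The inference procedures just mentioned can be made concrete with a small sketch. The following Python fragment (an illustration written for this chapter, not part of any standard logic library) applies modus ponens repeatedly over facts and rules represented as plain strings:

```python
# Modus ponens as a tiny forward-chaining loop: from a fact P and a rule
# "P implies Q," conclude Q. Facts and rules are plain strings here; this
# is an illustration, not a full theorem prover.

def forward_chain(facts, rules):
    """Apply modus ponens until no new facts can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# "If Socrates is human, Socrates is mortal" as a rule, plus one fact:
rules = [("socrates_is_human", "socrates_is_mortal")]
print(sorted(forward_chain({"socrates_is_human"}, rules)))
# ['socrates_is_human', 'socrates_is_mortal']
```

A guarantee of predicate logic is visible even in this toy: if the initial fact is true and the rule is sound, the derived conclusion is true as well.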
Rule-Based Systems (Production Systems)
A rule-based system consists of a set of IF-THEN rules, a set of facts, and some interpreter controlling the application of the rules, given the facts. A rule-based system represents knowledge in terms of a set of rules that guides the system inferences given certain facts (e.g., IF the temperature is below 65 F degrees AND time of day is between 5:00 p.m. and 11:00 p.m. THEN turn on the heater). Rule-based systems are often used to develop expert systems. An expert system contains knowledge derived from an expert or experts in some domain, and it exhibits, within that specific domain, a degree of expertise in problem solving that is comparable to that of the human experts. Simply put, an expert system contains a set of IF-THEN rules derived from the knowledge of human experts. Expert systems are supposed to support inspection of their reasoning processes, both in presenting intermediate steps and in answering questions about the solution processes: At any time we can inquire why expert systems are asking certain questions, and these systems can explain their reasoning or suggested decisions.
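The heater rule above can be sketched as a toy production system. In this Python illustration, a rule is a (condition, action) pair and the interpreter simply fires every rule whose condition holds for the current facts; the function and fact names are invented for the example:

```python
# A toy rule-based (production) system: rules are (condition, action)
# pairs applied to a dictionary of facts.

def heater_rule(facts):
    """IF temperature < 65 F AND 17:00 <= hour < 23:00 THEN turn on heater."""
    return facts["temperature_f"] < 65 and 17 <= facts["hour"] < 23

def run_rules(rules, facts):
    """Fire every rule whose condition holds; collect the resulting actions."""
    return [action for condition, action in rules if condition(facts)]

rules = [(heater_rule, "turn on heater")]

print(run_rules(rules, {"temperature_f": 60, "hour": 18}))  # ['turn on heater']
print(run_rules(rules, {"temperature_f": 70, "hour": 18}))  # []
```

A real expert system adds many such rules plus an interpreter that chains them, records which rules fired, and can replay that record when asked to explain its reasoning.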
Fuzzy Logic
Fuzzy logic is a superset of classical Boolean logic that has been extended to handle the concept of partial truth. One of the limitations of predicate (first-order) logic is that it relies on Boolean logic, in which statements are entirely true or false. However, in the real world, there are many situations in which events are not clearly stated and the truth of a statement is a matter of degree (e.g., if someone states a person is tall, the person can be taller than some people but shorter than other people; thus, the statement is true only sometimes). Fuzzy logic is a continuous form of logic that uses modifiers to describe different levels of truth. It was specifically designed to represent uncertainty and vagueness mathematically and provide formalized tools for dealing with the imprecision intrinsic to many problems. Because fuzzy logic can handle approximate information systematically, it is ideal for controlling and modeling complex systems in which an inexact model exists or systems where ambiguity or vagueness is common. Today, fuzzy logic is found in a variety of control applications such as expert systems, washing machines, video cameras (e.g., focus aperture), automobiles (e.g., operation of the antilock braking systems), refrigerators, robots, failure diagnosis, pattern classifying, traffic lights, smart weapons, and trains.
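The notion of partial truth can be illustrated with a single membership function. In this Python sketch, the breakpoints chosen for “tall” are invented for illustration; a real fuzzy system defines such curves for each application:

```python
# A minimal fuzzy membership function for "tall": instead of true/false,
# a height maps to a degree of truth between 0.0 and 1.0. The breakpoints
# (66 and 76 inches) are illustrative assumptions.

def tall(height_inches):
    """Degree to which a person of this height is 'tall' (0.0 to 1.0)."""
    if height_inches <= 66:            # 5'6" or below: not tall at all
        return 0.0
    if height_inches >= 76:            # 6'4" or above: fully tall
        return 1.0
    return (height_inches - 66) / 10   # linear ramp in between

print(tall(64))  # 0.0 -- classical logic would simply say "false"
print(tall(71))  # 0.5 -- partially true
print(tall(78))  # 1.0 -- classical logic would simply say "true"
```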
Semantic Networks, Frames, and Scripts
Semantic networks are graphical representations of information consisting of nodes, which represent an object or a class, and links connecting those nodes, representing the attributes and relations between the nodes. Semantic networks are often called conceptual graphs. Researchers originally used them to represent the meaning of words in programs that dealt with natural language processing (e.g., understanding news), but they have also been applied to other areas, such as modeling memory.
One interesting feature of semantic networks is how convenient they are to establish relations between different areas of knowledge and to perform inheritance reasoning. For example, if the system knows that entity A is human, then it knows that all human attributes can be part of the description of A (inheritance reasoning).
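Inheritance reasoning of this kind can be sketched in a few lines. In the following Python illustration (class, node, and attribute names are all invented for the example), attribute lookup walks up the is-a links of a small semantic network:

```python
# A tiny semantic network: nodes connected by "is-a" links; attribute
# lookup searches the node itself, then its ancestors, so instances
# inherit attributes from their classes.

class Node:
    def __init__(self, name, is_a=None, **attributes):
        self.name, self.is_a, self.attributes = name, is_a, attributes

    def get(self, attribute):
        """Inheritance reasoning: check this node, then its ancestors."""
        if attribute in self.attributes:
            return self.attributes[attribute]
        if self.is_a is not None:
            return self.is_a.get(attribute)
        return None

animal = Node("animal", breathes=True)
human = Node("human", is_a=animal, legs=2)
a = Node("A", is_a=human)            # entity A is human

print(a.get("legs"))      # 2    (inherited from human)
print(a.get("breathes"))  # True (inherited from animal)
```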
A problem with semantic networks is that as the knowledge to be represented becomes more complex, the representation grows in size, it needs to be more structured, and it becomes hard to define by graphical representations. To allow more complex and structured knowledge representation, frames were developed. A frame is a collection of attributes or slots with associated values that describe some real-world entity. Frame systems are a powerful way of encoding information to support reasoning. Each frame represents a class or an instance (an element of a class), and its slots represent attributes (e.g., height) with associated values (e.g., seven feet).
Scripts are used to develop ideas or processes that represent recurring actions and events. They are often built on semantic networks or frames, although production systems are also common. Scripts are used to make inferences on a whole set of actions that fall into a stereotypical pattern. A script is essentially a prepackaged inference chain relating to a specific routine situation. They capture knowledge about a sequence of events, and this knowledge has been used as a way of analyzing and describing stories. Typical examples of scripts are the sequence of actions and the knowledge needed to, for example, take a flight or buy a train ticket.
Bayesian Networks
Bayesian networks are also known as belief networks, probabilistic networks, causal networks, knowledge maps, or graphical probability models. They are a probabilistic graphical model with nodes, which represent discrete or continuous variables, and links between those nodes, which represent the conditional dependencies between variables. This graphical representation with nodes and links connecting the nodes provides an intuitive graphical visualization of the knowledge, including the interactions among the various sources of uncertainty. Because a Bayesian network is a complete model for the variables and their relationships, it can be used to answer probabilistic queries about them, and it can allow us to model and reason about uncertainty in complex situations. For example, a Bayesian network can be used to calculate the probability of a patient having a specific disease, given the absence or presence of certain symptoms, if the probabilistic dependencies between symptoms and disease are assumed to be known.
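The disease example can be worked out with Bayes’ rule for the simplest possible network, a single disease node with a single symptom node. The probabilities below are invented for illustration:

```python
# The disease example as a two-node network (disease -> symptom),
# computed with Bayes' rule. All probabilities are invented numbers.

def posterior(prior, p_sym_given_d, p_sym_given_not_d):
    """P(disease | symptom present) via Bayes' rule."""
    p_symptom = p_sym_given_d * prior + p_sym_given_not_d * (1 - prior)
    return p_sym_given_d * prior / p_symptom

# The disease is rare (1%), the symptom is common with it (90%) and
# rare without it (5%):
p = posterior(prior=0.01, p_sym_given_d=0.9, p_sym_given_not_d=0.05)
print(round(p, 3))  # 0.154
```

Even with a strongly diagnostic symptom, the low prior keeps the posterior modest, the kind of inference a full Bayesian network performs automatically over many interacting variables.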
Bayesian networks have been used for diverse applications, such as diagnosis, expert systems, planning, learning, decision making, modeling knowledge in bioinformatics (gene regulatory networks, protein structure), medicine, engineering, document classification, image processing, data fusion, decision support systems, and e-mail spam filtering.
Subsymbolic AI (Bottom-Up Approach)
After researchers became disillusioned in the mid-1980s with the symbolic attempts at modeling intelligence, they looked into other possibilities. Some prominent techniques arose as alternatives to symbolic AI, such as connectionism (neural networking and parallel distributed processing), evolutionary computing, particle swarm optimization (PSO), and behavior-based AI.
In contrast with symbolic AI, subsymbolic AI is characterized by a bottom-up approach to AI. In this approach, the problem is addressed by starting with a relatively simple abstract program that is built to learn by itself, and it builds knowledge until reaching an optimal solution. Thus, it starts with simpler elements and then, by interacting with the problem, moves upwards in complexity by finding ways to interconnect and organize the information to produce a more organized and meaningful representation of the problem.
The bottom-up approach has the advantage of being able to model lower-level human and animal functions, such as vision, motor control, and learning. It is more useful when we do not know how to formalize knowledge and we do not know how to reach the answer beforehand.
Artificial Neural Networks
Neural networks, or more correctly, artificial neural networks, to differentiate them from biological neural networks, are computing paradigms loosely modeled after the neurons in the brain and designed to model or mimic some properties of biological neural networks. They consist of interconnected processing elements called nodes or neurons that work together to produce an output function. The output of a neural network depends on the cooperation of the individual neurons within the network. Because a network relies on its collection of neurons to reach a solution, it can still perform its overall function even if some of the neurons are not functioning, which makes neural networks tolerant of error and failure. They learn by adapting their structure based on external or internal information that flows through the network. Thus, they are mostly used to model complex relationships between inputs and outputs or to find patterns in data.
As you may recall, one of the simplest instantiations of a neural network, the perceptron, was very popular in the early 1960s (Rosenblatt, 1962), but interest in it dwindled at the end of the 1960s because it was not able to represent some simple logical problems (Minsky & Papert, 1969). Neural networks became hugely popular again in the mid-1980s (McClelland et al., 1986; Rumelhart et al., 1986) because the problem associated with the perceptron was addressed and because of the decreased interest in symbolic approaches to AI. The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations. Such inferences are particularly useful in applications in which the complexity of the data or task makes the design of such a function by hand impractical.
There are three major learning paradigms in the neural network field, each corresponding to a particular abstract learning task: supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the network is given a set of examples (inputs) with the correct responses (outputs), and it finds a function that matches these examples with the responses. The network infers the mapping implied by the data by using the mismatch between the mapping and the data to correct the weights of the connections between the nodes, until the network is able to match the inputs with the outputs. With this function, new sets of data or stimuli, previously unknown to the system, can be correctly classified. In unsupervised learning, the network has to learn patterns or regularities in the inputs when no specific output values are supplied or taught to the network. Finally, in reinforcement learning, the network is not given the set of examples with their answers; instead, it finds them by interacting with the environment and integrating the instances that lead to reinforcement. The system performs an action and, based on the consequences of that action (some cost according to some dynamics), relates the stimuli (inputs) and the responses (outputs). Artificial neural networks have been applied successfully to speech recognition, image analysis, adaptive control and navigation, video games, autonomous robots, decision making, detection of credit card fraud, data mining, and pattern recognition, among other areas.
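Supervised learning in the simplest possible network, a single perceptron, can be sketched as follows. This Python illustration trains on the logical AND function (which, unlike XOR, a perceptron can learn); the learning rate and epoch count are illustrative choices:

```python
# Supervised learning in a single perceptron: the weights are corrected
# by the mismatch (error) between the target output and the network's
# output. This learns AND; the same loop would never converge on XOR.

def train_perceptron(examples, epochs=20, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in examples:
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - output          # the mismatch drives learning
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
print([predict(x1, x2) for (x1, x2), _ in AND])  # [0, 0, 0, 1]
```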
Evolutionary Computation
Evolutionary computation applies biologically inspired concepts from natural selection (such as populations, mutation, reproduction, and survival of the fittest) to generate increasingly better solutions to the problem. Some of the most popular methods are evolutionary programming, evolution strategies, and genetic algorithms. Evolutionary computation has been successfully applied to a wide range of problems including aircraft design, routing in communications networks, game playing, robotics, air traffic control, machine learning, pattern recognition, market forecasting, and data mining. Although evolutionary programming, evolution strategies, and genetic algorithms are similar at the highest level, each of these varieties implements evolutionary algorithms in a different manner.
Evolutionary programming, originally conceived by Lawrence J. Fogel in 1960, emphasizes the relationship between parent solutions (the solutions being analyzed) and their offspring (new solutions resulting from some modification of the parent solutions). Fogel, Owens, and Walsh’s 1966 book Artificial Intelligence Through Simulated Evolution is the landmark publication in this area of AI. In general, in evolutionary programming, the problem to be solved is represented or encoded in a string of variables that defines all the potential solutions to the problem. Each full set of variables with its specific values is known as an individual or candidate solution. To solve the problem, a population of “individuals” is created, with each individual representing a random possible solution to the problem. Each of the individuals (i.e., each candidate solution) is evaluated and assigned a fitness value based on how effective the candidate solution is at solving the problem. Based on this fitness value, some individuals (usually the most successful) are selected to be parents, and offspring are generated from these parents.
In the generation process, a mutation operator selects elements of the parents’ representation of the solution and manipulates these elements when they are transferred to the offspring. A mutation operator is a rule that selects random variables and randomly alters the value of these variables in some degree, generating new individuals or candidate solutions from the selected parents. Thus, some characteristics of the parent solutions are changed slightly and then transferred to the offspring solution. In general, the degree of mutation is greater in the first generations, and it is gradually decreased as generations evolve and get closer to an optimal solution. The offspring candidate solutions are then evaluated based on their fitness, just like their parents were, and the process of generating offspring from the parents is repeated until an individual with sufficient quality (an optimal solution to the problem) is found or a previously determined computational limit is reached (e.g., after evolving for a given number of generations).
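The mutation-driven cycle described above can be sketched for a toy problem, minimizing f(x) = (x - 3)^2. Population size, the shrinking mutation schedule, and the generation count are all illustrative choices:

```python
# Evolutionary programming in miniature: mutation-only evolution of
# real-valued candidate solutions, minimizing f(x) = (x - 3)^2.
import random

def evolve(generations=60, pop_size=20):
    random.seed(0)                              # reproducible illustration
    fitness = lambda x: (x - 3) ** 2            # lower is better
    population = [random.uniform(-10, 10) for _ in range(pop_size)]
    for g in range(generations):
        sigma = 1.0 / (1 + 0.2 * g)             # mutation shrinks over generations
        parents = sorted(population, key=fitness)[:pop_size // 2]
        offspring = [p + random.gauss(0, sigma) for p in parents]
        population = parents + offspring        # parents compete with offspring
    return min(population, key=fitness)

best = evolve()
print(f"best candidate solution ~ {best:.3f}")  # close to the optimum x = 3
```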
Evolution strategies (Bäck, Hoffmeister, & Schwefel, 1991; Rechenberg, 1973) and evolutionary programming share many similarities. The main difference is that, in evolution strategies, offspring are generated from the selected parents not only by using a mutation operator but also by recombination of the code from selected parents through a crossover operator. A crossover operator applies some rule to recombine the elements of the selected parents to generate offspring. The recombination operation simulates some reproduction mechanism to transfer elements from the parents to their offspring.
The crossover operator can take many variants (e.g., interchange the first half of the elements from one of the parents and the second half from the other one for one offspring; the reverse for another offspring). The crossover operator is inspired by the role of sexual reproduction in the evolution of living things. The mutation operator is inspired by the role of mutation in natural evolution. Generally, both mutation and reproduction are used simultaneously. Recombination and mutation create the necessary diversity and thereby facilitate novelty, while selection acts as a force increasing quality.
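A one-point crossover operator, the first of the variants just mentioned, can be illustrated in a few lines; the cut point and parent encodings are invented for the example:

```python
# One-point crossover: the first part of one parent is joined with the
# rest of the other, producing two offspring.

def crossover(parent_a, parent_b, point):
    child1 = parent_a[:point] + parent_b[point:]
    child2 = parent_b[:point] + parent_a[point:]
    return child1, child2

a = [1, 1, 1, 1, 1, 1]
b = [0, 0, 0, 0, 0, 0]
print(crossover(a, b, 3))  # ([1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1])
```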
Genetic algorithms, popularized by John Holland (1975), are similar to evolution strategies in the general steps that the algorithm follows. However, there are substantial differences in how the problem is represented. One of the main differences is that the problem to be solved is encoded in each individual of the population as an array of bits (a bit-string), which represents a chromosome. Each bit in the bit-string is analogous to a gene (i.e., an element that represents a variable or part of a variable of the problem). In a genetic algorithm, each individual or candidate solution is encoded at a genotype level, whereas in evolutionary programming and evolution strategies, the problem is encoded at a phenotype level, in which there is a one-to-one relationship between each value encoded in the phenotype and the real value that it represents in the problem. A genetic algorithm can thus differentiate between genotype (the genes) and phenotype (the expression of a collection of genes). Manipulation at the level of the genotype allows for more elaborate implementations of the crossover and mutation operators. Additionally, the focus of genetic algorithms when creating offspring in successive generations is on reproduction (crossover) rather than mutation, which is often considered a background operator or secondary process.
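A minimal genetic algorithm can be sketched on the classic “all ones” bit-string toy problem (evolve a chromosome of all 1s). Population size, mutation rate, and generation count are illustrative choices:

```python
# A minimal genetic algorithm on bit-strings: crossover is the main
# operator; mutation is applied at a low background rate.
import random

def genetic_algorithm(length=16, pop_size=30, generations=40):
    random.seed(1)                                    # reproducible illustration
    fitness = lambda bits: sum(bits)                  # count the 1 genes
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]                 # keep the fittest half
        offspring = []
        while len(offspring) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            point = random.randrange(1, length)       # one-point crossover
            child = a[:point] + b[point:]
            if random.random() < 0.05:                # background mutation
                i = random.randrange(length)
                child[i] = 1 - child[i]
            offspring.append(child)
        pop = parents + offspring
    return max(pop, key=fitness)

best = genetic_algorithm()
print(sum(best))  # fitness of the best evolved bit-string
```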
Particle Swarm Optimization
PSO applies the concept of social interaction to problem solving. It was developed in 1995 by James Kennedy and Russ Eberhart, and it is inspired by the social behavior of bird flocking and fish schooling.
PSO shares many similarities with evolutionary computation techniques. The system is initialized with a population of random potential solutions (known in this framework as particles), which searches for an optimal solution to the problem by updating generations. However, unlike evolutionary computing, PSO has no evolution operators such as crossover and mutation. In PSO, the particles fly through the problem space by following the current best solution (the particle with the best fitness). Each particle (individual) records its current position (location) in the search space, the location of the best solution it has found so far, and a velocity (direction) in which it will travel if undisturbed. To decide whether to change direction and in which direction to travel (searching for the optimal solution), each particle works with two fitness values: one for its own best solution, and one for the best candidate solution found by the swarm. Thus, particles can be seen as simple agents that fly through the search space and record (and possibly communicate) the best solution that they have discovered so far. Particles travel in the search space by combining their own vector (the direction in which they were traveling) with vectors pointing toward the best candidate solutions. Then, each particle computes its new fitness, and the process continues until the particles converge on an optimal solution.
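The particle update just described can be sketched as follows. This minimal Python illustration minimizes f(x) = x^2 in one dimension; the inertia and acceleration weights are conventional illustrative values, not taken from Kennedy and Eberhart’s original formulation:

```python
# A bare-bones particle swarm: each particle keeps its position, its
# personal best, and a velocity nudged toward both its own best and the
# swarm's global best. Minimizes f(x) = x^2.
import random

def pso(n_particles=15, iterations=50):
    random.seed(2)                               # reproducible illustration
    f = lambda x: x * x                          # fitness: lower is better
    pos = [random.uniform(-10, 10) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    personal_best = pos[:]                       # best position each particle found
    global_best = min(pos, key=f)                # best position in the swarm
    for _ in range(iterations):
        for i in range(n_particles):
            # Inertia plus random pulls toward personal and global bests:
            vel[i] = (0.7 * vel[i]
                      + 1.5 * random.random() * (personal_best[i] - pos[i])
                      + 1.5 * random.random() * (global_best - pos[i]))
            pos[i] += vel[i]
            if f(pos[i]) < f(personal_best[i]):
                personal_best[i] = pos[i]
            if f(pos[i]) < f(global_best):
                global_best = pos[i]
    return global_best

best = pso()
print(f"global best ~ {best:.4f}")  # near the optimum x = 0
```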
Behavior-Based AI
Behavior-based AI is a methodology for developing AI based on a modular decomposition of intelligence. It was made famous by Rodney Brooks at MIT (see Brooks, 1999, for a compendium of his most relevant papers on the topic), and it is a popular approach to building simple robots that, surprisingly, appear to exhibit complex behavior. The complexity of their behavior lies in the perception of the observer, not in the processing mechanism of the system. This approach questions the need for modeling intelligence using complex levels of knowledge representation and the need for higher cognitive control. Brooks presents a series of simple robots that mimic intelligent behavior by using a set of independent semi-autonomous modules, which interact independently with the environment and do not communicate information at any higher level. For example, a spiderlike robot will navigate through a path with obstacles just by each of its legs addressing its own situation, without any mechanism relating what each other leg knows about the environment. This approach has been successful when dealing with dynamic, unpredictable environments. Although behavior-based AI has been popular in robotics, it also can be applied to more traditional AI areas.
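The modular idea can be sketched with two independent behavior modules and a fixed priority between them. This is a strong simplification of Brooks-style architectures, and all names are invented for the example:

```python
# A toy behavior-based controller: independent modules each map their
# own sensor reading to an action, with no shared world model; a fixed
# priority order resolves conflicts between modules.

def avoid(sensors):
    """React only to nearby obstacles; otherwise stay silent."""
    if sensors["obstacle_distance"] < 0.5:
        return "turn"
    return None

def wander(sensors):
    """Default behavior: keep moving."""
    return "forward"

def control(sensors, modules=(avoid, wander)):
    """Higher-priority modules subsume lower ones when they produce an action."""
    for module in modules:
        action = module(sensors)
        if action is not None:
            return action

print(control({"obstacle_distance": 0.2}))  # turn
print(control({"obstacle_distance": 2.0}))  # forward
```

No module knows why the robot as a whole seems to navigate purposefully; the apparent intelligence emerges from the interaction of simple reactive rules with the environment.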
AI in Psychology
The greatest impact of AI in psychology has been through the development of what has come to be known as the information processing paradigm or the computer analogy. Once computers started to be perceived as information-processing systems able to process symbols and not just numbers, an analogy between computers and the human mind was established. Both systems receive inputs (either through sensors or as the output from other processes or devices), process those inputs through a central processing unit, and generate an output (through motor responses or as the input for other processes or devices). The idea that the mind runs on the brain just as a program runs on a computer is the focus of cognitive psychology, which is concerned with information-processing mechanisms focusing especially on processes such as attention, perception, learning, and memory. It is also concerned with the structures and representations involved in cognition in general.
One of the dominant paradigms in psychology, before the surge of AI at the end of the 1950s, was behaviorism, in which the focus was on the study of the responses of the organism (behavior) given particular inputs (stimuli). Its main assumption was that, because researchers can only scientifically study what they can observe and measure, behavior should be the only subject matter of study of scientific psychology. With cognitive psychology and the computer analogy, the focus started to shift toward the study of mental processes, or cognition. Cognitive psychology is interested in identifying in detail what happens between stimuli and responses. To achieve this goal, psychological experiments need to be interpretable within a theoretical framework that describes and explains mental representations and procedures. One of the best ways of developing these theoretical frameworks is by forming and testing computational models intended to be analogous to mental operations. Thus, cognitive psychology views the brain as an information-processing device that can be studied through experimentation and whose theories can be rigorously tested and discussed as computer programs.
A stronger focus on computer modeling and simulation, and on the study of cognition as a system, resulted in the development of cognitive science. Cognitive science is an interdisciplinary field concerned with how humans, animals, and machines acquire knowledge, how they represent that knowledge, and how those representations are manipulated. It embraces psychology, artificial intelligence, neuroscience, philosophy, linguistics, anthropology, biology, evolution, and education, among other disciplines.
More recently, AI philosophy and techniques have impacted a new discipline, cognitive neuroscience, which attempts to develop mathematical and computational theories and models of the structures and processes of the brain in humans and other animals. This discipline is concerned directly with the nature of the brain and tries to be more biologically accurate by modeling the behavior of neurons, simulating, among other things, the interactions among different areas of the brain and the functioning of chemical pathways. Cognitive neuroscience attempts to derive cognitive-level theories from different types of information, such as computational properties of neural circuits, patterns of behavioral damage as a result of brain injury, and measures of brain activity during the performance of cognitive tasks.
A new and interesting approach resulting from developments in computer modeling is the attempt to search for and test a so-called unified architecture (a unified theory of cognition). The three most dominant unified theories of cognition in psychology are SOAR (based on symbolic AI), PDP (based on subsymbolic AI), and ACT-R (originally based on symbolic AI, currently a hybrid using both symbolic and subsymbolic approaches).
SOAR (State Operator and Result)
SOAR (Laird, Newell, & Rosenbloom, 1987; Newell, 1990) describes a general cognitive architecture for developing systems that exhibit intelligent behavior. It represents and uses appropriate forms of knowledge such as procedural, declarative, and episodic knowledge. It employs a full range of problem-solving methods, and it is able to interact with the outside world.
PDP (Parallel Distributed Processing)
In the 1980s and 1990s, James L. McClelland, David E. Rumelhart, and the PDP Research Group (McClelland et al., 1986; Rumelhart et al., 1986) popularized artificial neural networks and the connectionist movement, which had lain dormant since the late 1960s. In the connectionist approach, cognitive functions and behavior are perceived as emergent processes from parallel, distributed processing activity of interconnected neural populations, with learning occurring through the adaptation of connections among the participating neurons. PDP attempts to be a general architecture and explain the mechanisms of perception, memory, language, and thought.
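As a concrete illustration of this idea, the following is a minimal sketch (not drawn from the PDP volumes themselves) of a single-layer network that learns by adapting its connection weights, here using the classic perceptron rule on a toy logical-AND task; the function names, learning rate, and task are illustrative assumptions.

```python
# Minimal sketch of connectionist learning: a single unit adapts its
# connection weights so that input patterns come to produce desired
# outputs. Learning is driven by the discrepancy between the unit's
# response and the target (the perceptron rule), not by explicit rules.

def train(patterns, targets, lr=0.5, epochs=30):
    """Adapt weights so each input pattern yields its target activation."""
    n = len(patterns[0])
    w = [0.0] * n          # connection weights, initially zero
    b = 0.0                # bias (threshold) term
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            y = 1.0 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0.0
            err = t - y    # discrepancy drives weight adaptation
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def respond(w, b, x):
    """The trained unit's response to an input pattern."""
    return 1.0 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0.0

# Toy task: the network learns logical AND from examples alone.
w, b = train([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1])
```

The point of the sketch is the PDP theme itself: the behavior (responding correctly to AND) is nowhere stated as a rule; it emerges from the adapted pattern of connection strengths.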
ACT-R (Adaptive Control of Thought-Rational)
In its latest instantiation, ACT-R (Anderson et al., 2004; Anderson & Lebiere, 1998) is presented as a hybrid cognitive architecture. Its symbolic structure is a production system. Its subsymbolic structure consists of massively parallel processes that can be summarized by a set of mathematical equations. The two levels work together to explain how people organize knowledge and produce intelligent behavior. ACT-R theory aims to evolve toward a system that can perform the full range of human cognitive tasks, capturing in great detail how we perceive, think about, and act on the world. Because of its general architecture, the theory is applicable to a wide variety of research areas, including perception and attention, learning and memory, problem solving and decision making, and language processing.
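To make the symbolic/subsymbolic distinction concrete, here is a minimal sketch of a production system in which symbolic if-then rules compete and a subsymbolic utility value resolves the conflict. The rules, utility numbers, and toy addition task are illustrative assumptions, not ACT-R's actual mechanisms or equations.

```python
# Minimal sketch of a hybrid production system: symbolic condition-action
# rules (productions) match against working memory, and a subsymbolic
# utility value decides which matching rule fires on each cycle.

# Working memory: the system's current goal and known facts.
memory = {"goal": "add", "a": 2, "b": 3, "answer": None}

# Productions: (name, condition, action, utility). Both rules match the
# same situation; the utility (subsymbolic level) settles the conflict.
productions = [
    ("retrieve-sum",
     lambda m: m["goal"] == "add" and m["answer"] is None,
     lambda m: m.update(answer=m["a"] + m["b"]),
     0.9),
    ("give-up",
     lambda m: m["goal"] == "add" and m["answer"] is None,
     lambda m: m.update(answer="unknown"),
     0.1),
]

def step(memory):
    """Fire the highest-utility matching production (conflict resolution)."""
    matching = [p for p in productions if p[1](memory)]
    if not matching:
        return False                      # no rule applies: stop
    name, cond, action, utility = max(matching, key=lambda p: p[3])
    action(memory)                        # symbolic rule changes memory
    return True

while step(memory):                       # run cognition cycle by cycle
    pass
```

After the run, `memory["answer"]` holds 5: the symbolic rule that fired was selected by the subsymbolic utility, which is the division of labor the hybrid architecture describes.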
The benefits of applying computer modeling to psychological hypotheses and theories are multiple. For example, computer programs provide unambiguous formulations of a theory as well as means for testing the sufficiency and consistency of its interconnected elements. Because computer modeling requires a well-formulated language, it eliminates vagueness and exposes hidden or ambiguous intermediate processes that were not made explicit in the verbal description of the theory. Explicit formulations also allow researchers to falsify (i.e., test) the theory's assumptions and conclusions. Additionally, given the same data set, alternative programs/theories can be run against the data to determine which hypotheses are more consistent with the data and why.
Computer modeling has focused mostly on the areas of perception, learning, memory, and decision making, but it is also being applied to the modeling of mental disorders (e.g., neurosis, eating disorders, and autistic behavior), cognitive and social neuroscience, scientific discovery, creativity, linguistic processes (e.g., dyslexia, speech movements), attention, and risk assessment, among other fields. Importantly, the impact of AI on psychology has not been limited to theoretical analyses. AI systems can be found in psychological applications such as expert systems for clinical diagnosis and education, human-computer interaction, and user interfaces, although the impact in applied areas is not as extensive as might be expected; much work remains to be done here.
Criticisms of AI
Is true AI possible? Can an AI system display true intelligence and consciousness? Are these functions reserved only to living organisms? Can we truly model knowledge?
Weak AI versus Strong AI
The weak AI view (also known as soft or cautious AI) holds that machines can be programmed to act as if they were intelligent: machines are capable only of simulating intelligent behavior and consciousness (or understanding), not of true understanding. In this view, the traditional focus has been on developing machines able to perform a specific task, with no intention of building a complete system able to perform intelligently in all or most situations.
Strong AI (also known as hard AI) supports the view that machines can be really intelligent and that, someday, they could have understanding and conscious minds. This view assumes that all human mental activity can eventually be reduced to algorithms and processes that can be implemented in a machine. Thus, for example, there should be no fundamental difference between a machine that emulates all the processes in the brain and the actions of a human being, including understanding and consciousness.
One of the problems with the strong AI view centers on the following two questions: (a) How do we know that an artificial system is truly intelligent? (b) What makes a system (natural or not) intelligent? Even today, there is no clear consensus on what intelligence really is. Turing (1950) was aware of this problem and, recognizing the difficulty of agreeing on a common definition of intelligence, proposed an operational test to circumvent the question. He named this test the imitation game; it later became known as the Turing test.
The Turing test is conducted with two people and a machine. One person, who sits in a separate room from the machine and the other person, plays the role of interrogator or judge. The interrogator knows the other two participants only as A and B and has no way of knowing beforehand which is the person and which is the machine. The interrogator can ask A and B any question she wishes, and her aim is to determine which is the person and which is the machine. If the machine fools the interrogator into thinking that it is a person, then we can conclude that the machine can think and is truly intelligent, or at least as intelligent as its human counterpart. The Turing test has become relevant in the history of AI because some of the criticisms and philosophical debates about the possibilities of AI have focused on this test.
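The protocol just described can be sketched schematically. The canned responders, the single-question transcripts, and the random label assignment below are illustrative assumptions, not part of Turing's own formulation.

```python
# Schematic sketch of the imitation game: the interrogator sees only
# anonymized transcripts labeled A and B and must name the machine.
import random

def imitation_game(interrogator, person, machine, questions):
    """Return True if the machine fools the interrogator."""
    # Assign anonymous labels so the interrogator cannot know which is which.
    labels = {"A": person, "B": machine}
    if random.random() < 0.5:
        labels = {"A": machine, "B": person}
    transcripts = {name: [respond(q) for q in questions]
                   for name, respond in labels.items()}
    guess = interrogator(transcripts)       # interrogator names the machine
    truly_machine = "A" if labels["A"] is machine else "B"
    return guess != truly_machine           # wrong guess: machine passes

# Toy responders: the machine simply mimics the person's style.
person = lambda q: "Hard to say; ask me after coffee."
machine = lambda q: "Hard to say; ask me after coffee."
# An interrogator facing identical transcripts can do no better than chance.
always_A = lambda transcripts: "A"
```

The sketch makes the test's operational character visible: nothing about the machine's inner workings is inspected, only whether its transcript is distinguishable from the person's.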
Criticisms of the Strong AI View
For many decades it has been claimed that Gödel's (1931) incompleteness theorem precludes the development of true artificial intelligence. The idea behind the incompleteness theorem is that within any given branch of mathematics, there will always be some propositions that cannot be proved or disproved using the rules and axioms of that branch itself. Thus, there will be principles that cannot be proved in a system and that the system cannot deduce. The argument concludes that because machines do little more than follow a set of rules, they cannot truly emulate human behavior, which is too complex to be captured by any fixed set of rules.
Machines Cannot Have Understanding—The Chinese Room Argument
John Searle stated in his Chinese room argument that machines work with encoded data that describe other things, and that those data are meaningless without a cross-reference to the things they describe. This point led Searle to assert that there is no meaning or understanding in the computational machine itself. As a result, Searle claimed that even a machine that passes the Turing test would not necessarily have understanding or be conscious; on this view, consciousness seems necessary for genuine understanding.
In his thought experiment, Searle asks us to imagine a scenario in which he is locked in a room and receives some Chinese writing (e.g., a question). He does not know Chinese, but he is given a rule book with English instructions that allows him to correlate the set of Chinese symbols he receives with another set of Chinese symbols (e.g., by their shape), which he gives back as an answer. Imagine that this set of correlation rules is so sophisticated that it allows him to give a meaningful answer to the question in Chinese. Imagine the same thing happening with an English question and its answer. For a Chinese observer and an English observer, respectively, Searle's answers are equally satisfying and meaningful (i.e., intelligent), and in both situations Searle would pass the Turing test. However, can we say that Searle understands Chinese in the same way that he understands English? In fact, he does not know any Chinese. A computer program behaves similarly: it takes a set of formal symbols (Chinese characters) as input and, following the rules in its programming, correlates them with another set of formal symbols (Chinese characters), which it presents as output. In this thought experiment, the computer would pass the Turing test: it converses with Chinese speakers, and they do not realize they are talking with a machine. From a human observer's view, it seems that the computer truly understands Chinese. However, just as Searle does not understand Chinese, neither does the machine. What the computer does is mindless manipulation of symbols, just as Searle was doing. Although it would pass the Turing test, there is no genuine understanding involved.
Several other authors have argued against the possibility of a strong AI (see, e.g., Penrose, 1989). The arguments denying the possibility of a strong AI and their counterarguments have populated the AI literature, especially the philosophical discussion of AI.
Machines able to sense and acquire experience by interacting with the world should be able to address most of these criticisms. In fact, most AI researchers are concerned with specific issues and goals, such as improving existing systems or creating new approaches to the problem of AI, rather than with passing the Turing test and proving that AI systems can be truly intelligent. Additionally, the Turing test has been seriously criticized as a valid test of machine intelligence. For example, Ford and Hayes (1998) highlighted some of its problems. The central defect of the Turing test is its species-centeredness: it assumes that human thought is the highest achievement of thinking, against which all other forms of thinking must be judged. The Turing test depends on the subjectivity of the judge, and it is culture-bound (a conversation that passes the test in the eyes of a British judge might fail it according to a Japanese or Mexican judge). More important, it does not admit as measures of intelligence forms of intelligence weaker, different, or even stronger than those deemed human. Ford and Hayes made this point clearly by comparing AI with artificial flight. In early thinking about flight, success was defined as imitation of a natural model: for flight, a bird; for intelligence, a human. Only when pioneers got some distance from the model of a bird did flying become successful. By the same analogy, we do not deny that an airplane can fly just because it does not fly the way a bird does; yet true artificial intelligence is denied if it does not parallel that of a human being.
Many supporters of strong AI consider that it is not necessary to try to isolate and recreate consciousness and understanding specifically. They accept that consciousness could be a by-product of any sufficiently complex intelligent system, and it will emerge automatically from complexity. They focus on the analogy of the human brain. In humans, a single neuron has nothing resembling intelligence. Only when billions of neurons combine to form a mind does intelligence emerge. It would appear, then, that the brain is more than the sum of its parts. Intelligence emerges with sufficient joint complexity of neurons.
In the final analysis, the debate over the possibility of strong AI seems to rest on the classical mind/body problem. On one side is physicalism (the belief that nothing exists but the physical world, which can be studied scientifically; mental states are assumed to mirror brain states). In this view, strong AI could eventually be possible. On the other side is dualism (mind and matter are not the same thing). In this view, strong AI is not possible.
The most important influence of AI on psychology has been the computer metaphor, in which both living organisms and computers are understood as information processors. The information-processing approach has brought psychology new paradigms such as cognitive psychology and cognitive science. The goal of these disciplines is to learn what happens within the organism, at the level of the brain. Brain or mental processes cannot be observed directly, but computer formalizations of theories of how mental processes work can be tested scientifically.
Although AI and cognitive science share techniques and goals, there is at least one fundamental difference between them. Psychology has the restriction that its computer programs and simulations must achieve the same results as the simulated system (human and animal cognition) and must do so through the same processes; a psychological theory must even predict empirically the errors organisms make. Only when there is a match with the real processes can psychologists assume that the relationships proposed in the program are correct. AI has no such restriction of process similarity. AI focuses on efficiency; psychology focuses on plausibility.
Additionally, computer modeling and the focus on brain structures and their functions have fueled the rise of neuroscience, one of the most rapidly emerging disciplines in psychology. Computer modeling allows for the formalization of models and the testing of hypotheses about how neurons work together. New computational techniques applied to measures of brain activity (e.g., fMRI) allow researchers to observe how brain structures actually work. Thus, formal models that represent theories and hypotheses can be tested against real data.
The impact of cognitive modeling on psychology should increase as psychological models move toward a unified theory of cognition. Only through computer formalizations and implementations does it seem plausible to work toward a unified view of psychology in which all theories might be integrated into a single framework. One underlying assumption of computer modeling is that formal modeling of theories and hypotheses plays a role in psychology similar to that of mathematics in the physical sciences.
Although computer modeling has been the strongest influence of AI on psychology, AI's impact goes far beyond it. AI techniques can be applied to many areas of psychology, especially those that focus on diagnosis, classification, and decision making.