How to Out-Smart a Smart Animal

Testing the intelligence of intelligent animals, by intelligent animals, is a conundrum…

How smart is your neighbor, coworker, boss, dog, hamster, goldfish…

One of the basic questions we ask of each other, competitively or derisively or respectfully, and of almost any living thing is, how smart are you? You would think that a bunch of really smart scientists would have figured out after a couple hundred years how to test the intelligence of what we consider less intelligent animals.

But it seems perhaps the smartest of us have been continually out-smarted by our test subjects.

Or, perhaps, this is a very hard question.

1. Human Intelligence Tests

Measuring human intelligence is embroiled in one of those seemingly intractable political, social, and ethical fracases. Standardized intelligence tests were first developed in the late 1800s by the apparently ubiquitous Francis Galton, who also invented eugenics, which we rode like a greased handrail right into the Holocaust and similar human rights violations in the US. Galton was a cousin of Charles Darwin and a brilliant polymath who appeared to lack an ethical compass for his brilliance. Intelligence tests made their first foray onto the broader world stage during World War I, when they were used to match recruits to assignments more reliably, especially in selecting candidates for officer training.

Galton was the first to establish a psychometric center as a means of testing human intelligence. This of course led directly to the controversy over the racial bias of intelligence testing and the 1994 book The Bell Curve which falsely claimed a link between race and intelligence.

Please note — the fact that IQ tests broadly do not test intelligence per se but are powerfully affected by income, zip code, health, and a whole host of other factors, has a direct analogy to our use of tests to understand animal intelligence.

One of the outcomes of psychometrics was the idea that a general intelligence factor, called g, has a large (40–50%) influence on an individual’s success in a wide range of tasks. This is controversial today, but it has triggered a search for a similar g factor in animals.

In this article we will politely sidestep the landmine of human intelligence testing and focus exclusively on animal intelligence, as in non-human animals.

2. Animal Intelligence Tests

I just wrote a review, “Bird Brains May Be Smarter Than We Thought” (medium.com), of a comprehensive intelligence test of one of the smartest of all animals, the raven, which the authors compared to the current gold standard of non-human brainiacs, the chimpanzee. In short: four-month-old ravens may be as smart as adult chimps.

I summarized some of the key weaknesses of these intelligence tests which the authors themselves highlighted, and wanted to dive into them a bit more here.

First, let’s examine how one of the most widely used and advanced animal intelligence tests works. The most comprehensive and current test for animal intelligence is called the Primate Cognition Test Battery (PCTB), developed by Esther Herrmann at the Max Planck Institute for Evolutionary Anthropology. Let’s look at how a couple examples of the two dozen tests in this battery are conducted, here on lemurs:

The first test in that video assessed spatial memory, described as follows in the journal PLOS:

“…Three cups were placed in a row on the platform in front of the testing cage. The experimenter then showed the subject two rewards and placed them under two of the three cups in full view of the subject. Then the platform was pushed towards the subject and it was allowed to make up to two choices in succession. If, however, the subject chose the empty cup first, it was not allowed to make further choices. The response was counted as correct when the subject had chosen both baited cups in succession…”
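
As a rough illustration of the scoring rule quoted above (not code from the study), here is a minimal Python sketch. The names score_trial and chance_of_success are hypothetical, and the chance calculation assumes a subject that guesses uniformly at random and never picks the same cup twice, which is an assumption rather than something the paper states.

```python
from fractions import Fraction
from itertools import permutations


def score_trial(baited, choices):
    """Return True if a trial counts as correct under the quoted rule.

    baited  -- the two cup positions that hold a reward
    choices -- cups in the order the subject picked them
    """
    picked = []
    for cup in choices[:2]:      # at most two choices are allowed
        picked.append(cup)
        if cup not in baited:    # an empty cup ends the trial
            break
    # Correct only if both baited cups were chosen in succession.
    return set(picked) == set(baited)


def chance_of_success(cups=(0, 1, 2), baited=frozenset({0, 1})):
    """Success rate of a pure guesser (assumed: uniform, no repeated cup)."""
    orders = list(permutations(cups, 2))   # every ordered pair of distinct cups
    hits = sum(score_trial(baited, order) for order in orders)
    return Fraction(hits, len(orders))


print(chance_of_success())  # 1/3
```

Under those assumptions a guesser succeeds on one trial in three, so that is the baseline a subject has to beat before the score says anything about memory.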

Using these and similar tests, Herrmann showed that a two-and-a-half-year-old human child has cognitive abilities about the physical world similar to those of a ten-year-old chimpanzee. However, when it came to the social world, human children far outperformed both chimpanzees and orangutans.

Performance of human child compared to adult apes in physical and social intelligence (image from Herrmann et al, 2007)

The tasks within the PCTB for testing the physical domain included the following (for details of these tasks read here):

Spatial memory
Object permanence
Rotation
Transposition
Relative numbers
Addition numbers
Noise
Shape
Tool use
Tool properties

The tasks in the PCTB to test the social domain included the following:

Social learning
Comprehension
Pointing cups
Attentional state
Gaze following
Intentions

These primate tests are so far the most comprehensive, and have failed to detect a g factor, the general intelligence factor attributed to humans. The PCTB instead found distinct domains, or modules, perhaps we can call them building-blocks, of intelligence. The PCTB assumed a social and physical domain as separate modules in primate intelligence.

3. Important Limitations…

The authors of the corvid intelligence test (Pika et al.), which was based on the PCTB, highlighted the limitations of their study and of animal intelligence testing in general:

The PCTB tests assume a natural division into social and physical intelligence, which may not be true.
Interference between subject and tester may overshadow test validity.
Behavior can be driven by completely different wiring and mechanisms of intelligence which current testing is blind to.

The limits and constraints of a test sound like a poor subject to spend much time on. Why not spend more time on the test itself?

Think of an intelligence test as a piece of technology. The purpose of this technology is to characterize animal intelligence. Only by critically examining its limits and constraints can we improve it and make significant progress past those very limits.

4. Social and Physical Intelligence…

The PCTB tests assumed a natural division into social and physical intelligence, which is a logically satisfactory assumption. However, reanalysis of the chimpanzee and human child data debunked that hypothesis. This is also a fantastic illustration of how to thoroughly and dispassionately challenge your own data and conclusions, and honestly publicize the results even if they disprove your own previous conclusions.

Herrmann et al. in 2009 published a re-analysis of their 2007 paper, showing that their original assumptions were incorrect. Herrmann’s key hypothesis when she designed the PCTB was that primate cognitive skills could be broadly divided into two independent groups or domains: social and physical. The re-analysis used a statistical method called confirmatory factor analysis (CFA), which does what its name suggests: it checks whether the factors assumed at the start of an analysis actually fit the data. Herrmann showed with CFA that there was actually another factor, spatial cognition, and that social cognition was not separate from physical cognition. This raised the possibility that our intelligence about the physical world evolved in lockstep with our social intelligence.

This revelation triggered a host of discussions on how to improve animal intelligence tests. A 2016 paper by Jelbert et al., published in Biology Letters, pointed out that despite revolutionary advances in our understanding of the evolution of intelligence, the cognitive tests themselves are the key limitation to our progress. They stated:

“…If factors other than cognition can systematically affect the performance of a subset of animals on these tests, we risk drawing the wrong conclusions about how intelligence evolves…”

Jelbert showed with New Caledonian crows that the type of training a bird received influenced its ability to pass a basic self-control test called an A-not-B task. In this test, the animal is repeatedly trained to find a treat placed under cup A, and then on the test run, the treat is placed under cup B. Does the animal make the error of choosing cup A, or correctly choose cup B? Crows trained to track hands passed the A-not-B task at a higher rate than crows trained in an unrelated skill.

The importance of Jelbert’s study was to show that pre-existing but overlooked skills in an animal will determine its performance in a given test, not the intelligence of the animal.

A 2017 paper by Rachael Shaw and Martin Schmelz, in the journal Animal Cognition, also argues for the need to improve animal cognitive tests on multiple fronts, including the design of the tasks, the cognitive domain targeted by the test, and even the species tested.

Shaw and Schmelz point out that, unlike the language-based psychometric tests used for humans, tests of animal intelligence have to be based on behavior, and that inferring intelligence from behavior poses tremendous difficulties.

5. Tester and Tested…

Pika et al. also point to the raven’s naturally competitive social life, in which each raven views the others as both social partners and competitors for resources. The hand-raised ravens may thus have seen the experimenter as a competitor for the food reward, and this factor may have overshadowed the effects of intelligence alone on test performance.

Others have pointed out that primates also have a highly competitive social environment, and that a competitive component should be included in experimental designs. Brian Hare published a 2001 paper in the journal Animal Cognition suggesting and evaluating the inclusion of competition in primate studies. Hare states:

“…Primate social life is highly competitive. This means that all aspects of primates themselves, including their cognitive abilities, have likely been shaped by the need to out-compete conspecifics [other members of the species]…”

Practically, this means that the laboratory staff who raise primates may be seen as members of the animals’ social network, and thus as conspecifics and competitors, so the behavior of the human tester during the test will strongly influence the behavior of the animal. The test administered by the human is therefore not really testing intelligence, but a social interaction.

Another insight into the interference between tester and tested is that the ravens’ performance on their CCTB test differed tremendously from that of parrots, which are also generally known for their outstanding intelligence.

A 2019 paper by Krasheninnikova et al., published in Behaviour, applied the PCTB to four different parrot species and found that all failed to perform better than random chance. This is in contrast to the ravens, whose test performance was clearly better than random chance. The raven and parrot studies varied considerably in the details of their execution. The raven study was performed by the same people who hand-raised the birds. The parrot study was done by people both familiar and unfamiliar to the birds, none of whom raised them. These details of human interactions with test subjects likely overwhelmed the validity of the parrot tests.
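
What “better than random chance” means can be sketched with an exact binomial calculation. This is an illustrative sketch only: the function name p_at_least is hypothetical, and the trial counts in the usage line are made-up numbers, not data from the raven or parrot studies.

```python
from math import comb


def p_at_least(successes, trials, chance):
    """Probability of `successes` or more correct answers in `trials`
    independent trials when every answer is a pure guess with success
    probability `chance` (one-sided exact binomial tail)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical example: 12 correct out of 18 trials on a task where
# guessing succeeds one time in three.
print(f"{p_at_least(12, 18, 1 / 3):.4f}")
```

A small value means the score would be hard to reach by guessing; a large value means the animal’s performance cannot be told apart from chance. Note that the calculation says nothing about why an animal beat chance, which is exactly the problem with tester effects.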

The raven’s ability to track the gaze of another animal is another critical and confounding skill. Since ravens are exceptionally competitive with each other, it makes sense that they developed an ability to follow the gaze of other ravens. Where a raven looks is probably where the food is, so ravens learned to pay attention to where other ravens look. A study by Christian Schloegl et al. found that ravens developed an ability to follow another raven’s gaze by eight weeks of age, but took another seven weeks before they could also follow a human gaze.

Therefore, it is likely that these intelligence tests in many social animals with gaze-following abilities are not testing intelligence — but gaze-following.

6. Wiring of Intelligence…

Although the ravens’ experimental scores were similar to those of great apes, Pika et al. did not claim that the intelligence of these different species is generally similar. The same behavior can be driven by completely different wiring and mechanisms of intelligence.

A 2009 paper by Amanda Seed et al., published in Ethology, acknowledged that corvids and primates both “solve social and physical problems with similar speed and flexibility,” but also noted that we don’t currently understand the “representations and algorithms underpinning these computations in both groups”. Seed et al. use winged flight as an analogy: bats and birds are an example of convergent evolution, the repeated invention of similar structures to solve the same problem. Similarly, they say, intelligence likely evolved multiple times, independently, in species as different as primates and corvids, functioning similarly despite the very different biological structures, the wiring, of each.

A 2016 paper by Onur Güntürkün and Thomas Bugnyar in Trends in Cognitive Sciences noted that birds lack the neocortex found in primates, but still perform similarly in measures such as “delay of gratification, mental time travel, reasoning, metacognition, mirror self-recognition, theory of mind, and third-party intervention”. Birds and mammals have evolved separately for over 300 million years, and both have become highly successful, often top predators in their respective niches, due in part to cognitive abilities that let them quickly adapt new methods to excel in changing environments. The vast differences in brain architecture reflect this long separation in time, and the question is how intelligence evolved separately in each class of animal.

A 2012 paper by Vanessa Schmitt et al., in PLOS One, reported the surprising result that old-world monkeys (long-tailed macaques and olive baboons) performed similarly to chimpanzees in the PCTB, and that the chimpanzees exceeded the monkeys only in spatial cognition and tool use. This contradicted the prediction that cognitive performance is related to brain size, which would have put the chimpanzee far ahead of the monkeys. Schmitt noted, however, that the PCTB tests were designed with human-specific skills, thus perhaps “underestimating both true nonhuman primate competencies as well as species differences”.

What is clear is that the behavioral cognitive tests are unable to tease apart the differences in the underlying biology which directs the cognitive effort. Two cars can pass minimum-threshold acceleration, braking, and cornering tests with similar pass/fail scores, yet one is driven by a gasoline engine and the other by an electric motor. Those three automotive tests do not shed any light on the unique performance characteristics of the electric vehicle or the limitations of the conventional car, and can’t reveal what is under the hood.

7. What next…

For now, our smartest cognitive scientists are outwitted by the intelligent animals they study. We know the light is on, but we don’t know how that light is wired, and how that cognitive light affects the behavior of the animal.

How do we study intelligence when we don’t have a precise definition of intelligence? We don’t know how intelligence is built and how it operates, and therefore we can’t design, a priori, a precise test to gauge its extent, limits, and effects.

We design intelligence tests based on our best guesses about the components of intelligence (for example, Herrmann dividing it into social and physical domains), only to discover afterwards that there is a completely different factor (spatial), and that the two domains of social and physical are not really separate but tightly linked.

We are progressing by braille, feeling our way in the dark. We lack intelligence about intelligence, we lack light on the light behind our eyes. But by fits and starts, we are making progress. True, we may end up in a dead end, a wrong turn that takes us back to the beginning where we must reformulate all our assumptions.

But so it is with science in general. We all started in the dark on any subject, and made our first steps into the unknown without knowing what lay ahead. That is both the mystery and romance of science and of the world we live in.