Followup to What Intelligence Tests Miss.


Tripartite model seems largely the same as previous book.

Innate responses tend to be efficient and normative in environments that aren’t actively hostile. Arguable that modern environments are often actively hostile and deliberately try to exploit edge cases in heuristics eg marketing, politics.

Normative response in hostile environment requires that the correct mindware is available, either innate or installed. Zones of mindware instantiation:

In middle zone, normative response is gated on sustaining override, which we already believe to be strongly correlated with IQ. Also suspect that mindware acquisition is related to IQ. So any rationality test is going to strongly correlate with IQ.

Taxonomy of thinking errors:

Should failure to correctly execute mindware come under it’s own heading? Eg if I know about Bayes rule but the calculations are too complicated to carry out in my head.

Can arrange CART subtests on two axes: knowledge dependence (mindware) and processing requirements (detect + initiate + sustain).

There are a bunch more charts and taxonomies and flowcharts and they don’t all seem to agree. Feels a little muddy.

Emphasizes that all of this so far is conjecture from limited evidence. Have to start somewhere.


Probabilistic and statistical reasoning:

(Very generous range of accepted answers - focus on getting the right magnitude or direction)

Scientific reasoning:

Avoidance of miserly information procession (direct):

Avoidance of miserly information procession (indirect):

Knowledge tests:

Contaminated mindware:

Dispositions and attitudes:

(All self-reported. Differs from other tests in that it’s not a ‘more is better’ test, so not folded into final score. BUT most people are probably short of optimum point in modern environment and there are interesting correlations with other tests.)

Skipping much of the detail on scoring mechanisms - not very compressible and hard to assess.


CART takes up to 3 hours. Tested on 350 paid volunteers in lab, and 397 mechanical turkers. Includes self-reported SAT score, missing for half of the turkers. Turkers also took a short cognitive ability test, focused mostly on verbal.

Short-form CART up to 2 hours. Keeps core reasoning, contaminated mindware, numeracy and AOM thinking tests. Skips knowledge-heavy, non-AOM disposition and difficult-to-score tests. Includes self-reported SAT score. Tested on 372 uni students.

Skipping much of the detail on results - hard to assess without a particular question in mind.

Used multiple regression to compare SAT score and AOT score as predictors of other tests. Surprisingly, on many of the tests they get similar beta weights. AOT is a better predictor of full CART score. AOT didn’t correlate strongly with impression management test (but impression management test is also self-reported?).

Principle component analysis is not super revealing. Groups into, roughly: most of the actual tests, the self-reported mindware contamination tests, other stuff.

Students in later years score significantly higher. Makes sense - much of that mindware is supposed to be taught in uni.

Gender difference in scores exist after controlling for cognitive ability and years of education. Female scores significantly higher on temporal discounting. Male scores significantly higher on prob/stat reasoning, reflection vs intuition, practical numeracy, financial literacy / economic knowledge. Cognitive ability scores were higher in general for males, so not sure how much to trust the controlling. Sounds similar to the gender gap across different STEM subjects.

Correlation between full and short CART was 0.97. Severely shortened 38-point test made of prob/stat reasoning and scientific reasoning correlates 0.79 with remaining tests and 0.88 with full CART, and has significant betas when regressed alongside cognitive ability and AOT.


CART aims to be broad rather than deep, so chooses cheap subtests rather than the best known test of each aspect.

Compares CART to Decision-Making Competence Scale for Adults and Halpern Critical Thinking Assessment. Argues A-DMC is heavily process loaded. Most A-DMC tests are also tested in CART. Social norm test is not in CART because it doesn’t fit in their tripartite model of rationality. CART mostly dominates HCTA in coverage, excluding the real-world problem solving section which is arguably hard to score objectively.

Notes that their samples are pretty non-representative of the general population. But most potential uses of the CART would at least be on university-educated subjects.

Notes that the correlations with cognitive ability might be increased by adapting existing tests to be within-subject rather than between-subject - higher ability subjects are more likely to notice the structure of the test.

CART vs Great Rationality Debate. Argues that System 2 is more aligned to the persons overall goals, so override should generally be preferred. My System 2 is maybe more aligned to what I think my goals are - that doesn’t mean that it’s more likely to maximize my happiness or satisfaction. Sometimes my conscious goals are dumb. Also, subjects often retrospectively endorse the normative response and the axiom behind it.

Thin rationality - make choices that maximize your values/goals/desires. Broad rationality - are your values/goals/desires dumb? CART focuses on thin notion of rationality to avoid grappling with unsolved philosophical questions.

Justifies relatively low (0.75 and 0.67) reliability of prob/stat and scientific reasoning tests - they are an amalgam of different skills and mindwares, rather than a single skill/mindware that is tested in multiple ways. No reason to expect something like a g factor for rationality - too many independent supporting skills and mindwares. Eg teaching someone about Bayes rule should not be expected to improve their temporal discounting score. Might still be high correlations between skills, which is useful for quickly measuring but still need to test all skills if you care about improving rationality.

Many tests they would like to include were just too unwieldy to administer (eg requiring multiple testing sessions) or had normative responses that were still under dispute (eg unrealistic optimism).

Correlation with IQ is 0.50 for short-form and 0.69 for full-form. Would be happy to find that the CART only measure intelligence, because it would mean that intelligence could be measured with a test that has more obvious real-world applicability than eg obscure vocab questions and so would be more widely accepted. Also plenty of room for dissociation in there eg reading comprehension correlates with IQ at 0.6-0.7 but we treat dyslexia (high IQ, low reading comprehension) as a useful concept.

Possibly susceptible to coaching effects but argues that coaching for CART (at least the non-self-reported parts) should also increase rationality.


This predictive power of the AOT test is really weird! Self-reported open-mindedness is a good predictor on all the hard tests, including numerical reasoning, even when controlling for SAT score. Is using multiple regression like this legit? Are we just measuring a correlation with IQ or education or studiousness? Or is an open-minded disposition a result of being better able to examine and suppress gut reactions to new ideas? Also, AOT is not supposed to be a more-is-better test but it sure looks like one. Is the average student just very close-minded?

The previous book hammered pretty strongly on the idea that various rationality subtests had low correlation with IQ. This book seems to be walking that claim back a little. The CART has been criticized for it’s strong correlation with IQ compared to the original framing of rationality being largely orthogonal to IQ ie ‘smart people acting dumb’. I think Stanovich’s explanation is reasonable and it seems valuable to attempt to expand IQ tests to include hostile environments.

A more compelling criticism for me is that they intend their tests to be typical conditions rather than maximal conditions, but to anyone with any relevant education most of the questions clearly signal what is being tested. The fact that I can apply Bayes law in a clearly coded test question does not mean that I will habitually detect real-life situations that require it. The CART is very much a maximal condition test for me.

Still, if I was hiring for a position where rationality was important I would probably be tempted to administer at least the short-form CART, for exactly the reasons that Stanovich discusses at the end - even if it’s basically an IQ test, it’s an IQ test in a form that is more clearly relevant to on-the-job performance.