http://www.amazon.com/dp/0374112673/

Scientific and philosophical concepts can change the way we solve problems by helping us to think more effectively about our behavior and our world. Surprisingly, despite their utility, many of these tools remain unknown to most of us.

Chapters

Introduction

Can rationality be taught?

Previously believed that any mental exercise increased intelligence. Common justification for teaching Latin. No evidence whatsoever.

Thorndike demonstrated that training in one cognitive task does not transfer to another, but was focused on fairly mechanical tasks. Herbert Simon and Allen Newell demonstrated the same for logical tasks like Towers of Hanoi.

Piaget believed that people built schemas for understanding and that those schemas could be applied to many tasks, but also believed that schemas could only be learned, not taught.

Piaget was correct in that people have different sets of mental tools, but wrong in believing that they cannot be taught. Also, while mental exercise in general does not transfer, there are tools that can be learned that increase ability over a wide range of problems.

Teases research showing that the courses studied at university drastically affect thinking years later, and that even brief lessons in statistical concepts lead to better thinking in different contexts weeks later.

Goal of this book is to collect tools that are:

Thinking about thought

Three major insights into how the world works.

Everything’s an inference

Everyone is familiar with visual illusions. Note that knowing that it is an illusion does not allow you to stop seeing it.

Also subject to mental illusions. Similarly to visual illusions, knowing that the illusion is present does not allow you to stop experiencing it. It still feels like direct perception of reality. Requires conscious effort to work around.

First example is schemas/stereotypes. These are frameworks/templates/rule-systems that we use to structure our mental model of the world. They apply to virtually everything we encounter and inform us about likely properties and behavior, eg if you see a restaurant with bright primary colors and plastic seating you would reasonably assume that it is a fast-food restaurant and that the food will be cheap and low quality.

Stereotype is generally a negative word, but we couldn’t function without them. They only cause problems when they are poorly constructed / inaccurate, or when we mistake them for reality instead of heuristics. Can think of schemas as providing base rates. Compare to reference-class forecasting, which is kind of a way of finding an external schema.

Schemas can cause subconscious influences via priming - an effect where activating one part of a schema can spread activation to the rest without the subject’s awareness, eg students given word problems that include words associated with old people walked slower after the test. Similarly, using words associated with hospitals makes people faster when later answering yes/no questions about hospitals. Followed by a huge list of examples of priming effects causing poor decisions, eg hurricanes with female names cause more deaths, apparently because they sound less dangerous. Priming effects seemed to suffer somewhat in the reproducibility crisis. TODO find out which priming experiments have survived.

Framing - presentation of information matters. Order of information, eg is it ok to smoke while praying vs is it ok to pray while smoking. Labels, eg undocumented worker vs illegal alien. Focus, eg 75% lean vs 25% fat. Simple framing can dramatically affect important decisions like whether to go for surgery vs radiation. Useful habit is to deliberately consider different framings of the same problem to see if the intuitive answer changes.

Representativeness heuristic - events are judged as more likely the closer they are to the prototypical example of the reference class eg homicide is a more representative cause of death than suicide is, so homicide is incorrectly judged to be more likely than suicide. Problematic because it ignores basic rules of probability eg Linda. One special case of this is that our prototype for randomness is inaccurate, so we often assign meaning to things because they don’t seem random. Another example is conspiracy theories - it doesn’t seem like huge events should be caused by single, unremarkable individuals so we look for big causes to match big effects. Some of these seem to be a bit of a stretch to fit the representativeness heuristic.

Big lesson - the best predictor of future behavior is past behavior. Schemas are a fallback for when you don’t know the past behavior.

Availability heuristic - judge the frequency or probability of an event by how easy it is to recall or imagine examples. Distorted by how memorable events are - this is why parents worry about child abduction but not child obesity.

Summary:

The power of the situation

Fundamental attribution error / context blindness - we tend to dramatically overestimate the influence of personal attributes and underestimate the influence of context/environment.

Favorite example - on the way to an experiment students pass a collaborator who appears to be ill and in need of help. 2/3 of students who were told they were on time stopped to help, but only 1/10 of students who were told they were running late. But an observer would not know this, and would see one group as charitable and the other group as heartless.

Correction to the big lesson before. The best predictor of future behavior is past behavior across a wide range of situations. Otherwise you are seeing mostly the effect of the situation, not of the person. Seems to imply an interesting corollary - that the best predictor of behavior in a situation is the past behavior of others in the same situation. This squares with results in Stumbling on Happiness - that the best predictor of future happiness is not your own internal simulation but the outside view of others in similar situations.

Special case of environmental influence is peer pressure. Again, effects are heavily underestimated eg a significant influence on the eventual grade of male students was whether or not their (randomly assigned) roommate drank heavily in high school. Not likely to be the explanation given by the students in question.

Can use environmental effects deliberately eg giving students accurate statistics on drinking rates at their school reduces drinking overall.

Fundamental attribution error is less strong for ourselves because we experience our own situation somewhat directly. Result is that people are more likely to attribute their own actions to environment than they are for others. I’m having a bad day, they’re an angry person.

In a number of experiments, American students are more susceptible to fundamental attribution error and less likely to focus on context than Japanese students.

Summary:

The single most useful thing I’ve gotten from this book is the habit of watching out for fundamental attribution error, especially when I am meeting someone for the first time or have only met them in one environment.

Looking at close friends, I can remember my initial impressions of them being completely different. Many of them only became friends because chance led to repeated meetings and corrected the first impression. Suggests I may have missed out on many other potential friendships where first impressions were not corrected.

Similarly, I’ve learned to weight the judgment of people who have known the subject for a long time much more than my own judgment from a short period.

The rational unconscious

Rationalization. Run priming experiments and ask participants about reasons for their actions. Typically find strong denial that the priming stimulus had any influence, despite the results. In some, the participants claim the opposite causality. So basically no awareness of priming effects.

Familiarity effect - with repeated exposure a neutral stimulus can become a positive stimulus. Doesn’t work for negative stimuli, naturally.

Subliminal perception - can cause priming effects with perceptions that are not consciously experienced eg 0.1s flash of a word.

Verbalizing reasons for preferences can lead to less satisfaction later. Interpretation is that System 1 is much better at weighing choices with many facets, and System 2 just gets in the way. Also limits consideration to facets that can be verbally described.

This is all similar to the ideas in Blink. TODO does there exist a good breakdown of which decisions are better made this way? Does the research on expertise in eg Superforecasting bear on this?

Experiments showing that people can learn complex patterns (learning determined by better performance over time, until the pattern is changed) without awareness. Again, participants rationalized their suddenly worse performance.

Common phenomenon where solution to a problem comes out of the blue when thinking about something else. Experiment where subconscious hints to participants led to sudden revelations. Again, participants could not accurately identify the stimulus that inspired them.

But ‘sleeping on the problem’ won’t help you factorize a large number. Don’t have any good characterization of which problems the subconscious is able to solve. Humbug.

Uses chess as an example where experts are unable to access the processes involved. Reading accounts by chess grandmasters contradicts this somewhat. Certainly they describe some conscious operations. Could be double-checking the unconscious processes?

Interesting point - we believe we have access to processes governing judgment and behavior but not perception or recall. Why is that? Time for an evo-psych just-so story! We need to be able to generate reasons for actions to convince other people. We don’t ever need to convince other people of perception.

Main point - can’t access System 1 directly and so we can’t know the reasons for our actions/beliefs/perceptions. Given reasonably accurate models can we consciously recreate the processes? In most cases probably don’t know enough of the inputs.

Summary:

Would be interested to apply these ideas to programming. How much of people’s work processes are they actually aware of? Some easily observable examples have been studied before, eg people claim they don’t use the mouse at all but recording shows they use it 50% of the time.

The formerly dismal science

Behavioral economics is a thing.

Should you think like an economist?

Cognitive dissonance theory - if our beliefs and our behavior do not agree, one must change. We can’t easily control our beliefs but we can control our behavior. Change behavior and beliefs will follow, because dissonance is unpleasant. Really? Many people seem to manage dissonance just fine eg I believe that pigs are sufficiently intelligent that eating them is immoral, but…uh… I still do it. Also complicated by eg aliefs.

Frameworks for making decisions.

Cost-benefit theory. Maximize expected value. Requires finite number of options and perfect information. Get stuck forever trying to acquire more information before choosing.

Bounded rationality. Satisficing. Move up a level - optimize the expected value of the decision process. Handy rule of thumb for financial decisions is to work out your earning rate per hour. Don’t spend longer on a decision than it could possibly save you.
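
A minimal sketch of that rule of thumb, with invented numbers:

```python
# Rule of thumb: don't spend longer deliberating than the decision could
# possibly save you. All numbers are invented for illustration.

hourly_rate = 50.0   # your earning rate, $/hour
max_saving = 20.0    # largest possible difference between the options, $

max_deliberation_hours = max_saving / hourly_rate
print(f"Spend at most {max_deliberation_hours * 60:.0f} minutes deciding")
# -> Spend at most 24 minutes deciding
```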

Good advice, but does not describe how people actually behave.

Advises doing cost-benefit analysis even when many of the variables are unknown, because laying out the structure gives your subconscious something to work with. Using your conscious mind to find and feed info to the subconscious. Very similar to Tetlock’s advice to use Fermi equations. Seems like a broad lesson.

Hard to use this for decisions where costs and benefits are not in the same units eg how much money is a human life worth? Many institutions actually generate numbers for this eg here. See also QALY.

Worth doing these analyses even if they are very rough/uncertain, because when you get to sensitivity analysis you now find out how large/small certain variables would have to be to change the decision. Might find out that some unknown variable would have to be unreasonably extreme to change the result. Again, seems like similar reasoning to using Fermi equations. The process is valuable even if the exact results are not.
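
A crude sketch of that kind of sensitivity analysis, with an invented decision and invented numbers:

```python
# Sensitivity analysis on a rough cost-benefit model: how extreme would the
# uncertain variable have to be to flip the decision? Numbers are invented.

def net_benefit(hours_saved_per_week, value_per_hour=30.0, cost_per_week=100.0):
    """Net weekly benefit of some option, eg a faster but pricier commute."""
    return hours_saved_per_week * value_per_hour - cost_per_week

for hours in [1, 2, 3, 4, 5]:
    print(f"{hours} h/week saved -> net benefit {net_benefit(hours):+.0f} $/week")
# The decision flips between 3 and 4 hours/week. If you are confident the true
# value is well clear of that threshold, the remaining uncertainty doesn't matter.
```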

Revealed preferences. Can eg place a value on life by looking at how much people are willing to pay to avert various risks to life.

Summary:

Spilt milk and free lunch

Sunk costs - only future costs/benefits should affect your decisions. Beware of exceptions (eg relationships) where costs are not sunk but invested into future returns.

Opportunity cost - cost/benefit analysis also has to include the value of the time used - could you do something more valuable with that time?

Should we use cost/benefit analysis in general?

Find that people self-report using cost/benefit analysis in proportion to how much time they have spent studying it, but they may have chosen to study it because it appeals to them. Find that teaching random people also increases their likelihood of agreeing with cost/benefit analysis.

Mild correlation between SAT scores and self-reported use of cost/benefit analysis. Self-reported use of cost/benefit analysis correlates with making more money among university staff. Self-reported use correlates with better grades in students, even when accounting for SAT scores.

Overall, weak evidence, mostly self-reported. Do people who self-report use do better on decision problems?

Summary:

Foiling foibles

Humans fail to follow cost/benefit analysis in systematic ways.

Loss aversion - lack of symmetry - attach more weight to losses than to equal gains. Can use this to influence behavior by reframing gains as losses. Effectively, can alter people’s value functions if you can change where they place their reference point.
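
For concreteness, a sketch of a prospect-theory-style value function. The formula and parameters are Kahneman and Tversky’s published estimates, not something given in this book:

```python
# Prospect-theory value function: concave for gains, convex and steeper for
# losses, defined relative to a movable reference point.

def value(outcome, reference=0.0, alpha=0.88, loss_aversion=2.25):
    x = outcome - reference  # outcomes are coded relative to the reference point
    if x >= 0:
        return x ** alpha
    return -loss_aversion * (-x) ** alpha

print(value(100), value(-100))  # a $100 loss hurts ~2.25x as much as a $100 gain feels good
# Shifting the reference point reframes the same outcome as a gain or a loss:
print(value(100, reference=0), value(100, reference=200))
```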

Status quo bias - hard to switch away from the default. Can alter behavior by changing the perceived default. Does status quo bias make sense as opportunity cost of flip-flopping? Need to stabilize some systems.

Choice architecture - the way decisions are framed/structured, especially choice of default and ease of making each choice (eg which choice corresponds to no-action). Libertarian paternalism - setting defaults to benefit the common good. See Nudge.

Choice overload - more choices can lead to taking no choice at all, or at least not properly evaluating the decision. Example of people recognizing opportunity cost?

Social priming again - can alter behavior by setting reference point for ‘normal’ behavior.

Summary:

Counting, coding, correlation and causality

Soft-area (social psych, dev psych) students are more able to apply stats principles to their own lives than hard-area (bio psych, cogsci, neurosci) students. Both take the same stats courses, but soft-area students get practice in applying stats to more relatable areas.

Key skills

Theorems alone are not sufficient - for daily application need heuristics that can be trained into System 1.

Odds and Ns

Sampling.

Law of large numbers - sample stats converge to population stats as sample size grows. Independent of population size.
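
A quick simulation, purely illustrative:

```python
import random

random.seed(0)

# Sample statistics converge to population statistics as the sample grows,
# regardless of how large the population is.
population = [random.gauss(100, 15) for _ in range(1_000_000)]
true_mean = sum(population) / len(population)

for n in [10, 100, 1_000, 10_000]:
    sample_mean = sum(random.sample(population, n)) / n
    print(f"n={n:>6}: sample mean {sample_mean:6.1f} (population mean {true_mean:.1f})")
```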

Sample bias - when the chance of being sampled is not independent of the measured variables.

People are better at intuitively understanding sampling in domains they are familiar with, weighting individual samples less. Is this an example of experience trumping the fundamental attribution error?

Predictions based on interviews are barely correlated (~0.1) with reality, across many different domains and experiments. An interview is a single sample! Given that it’s hard to avoid weighting interviews too strongly, better not to interview at all! This is an interesting point. Kahneman contends that we can’t train away bias. That doesn’t mean we can’t predict and avoid situations that cause bias.

Intuitive grasp of LLN can help counter FAE.

Regression to mean - extreme values are likely to be followed by less extreme values, solely because extreme values are by definition less likely in the first place. Most patients get better, so most treatments appear to work. Also related to problems around significance eg Jellybeans cause cancer. Has anyone tried applying bandit algorithms to this problem? Also vaguely recall some recent argument that regression to mean explains away much of the placebo effect.
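
A small simulation of regression to the mean; ‘patients’ selected for extreme scores improve on remeasurement with no treatment at all:

```python
import random

random.seed(1)

# Each 'patient' has a stable true level plus day-to-day noise.
true_levels = [random.gauss(0, 1) for _ in range(10_000)]
day1 = [t + random.gauss(0, 1) for t in true_levels]
day2 = [t + random.gauss(0, 1) for t in true_levels]

# Select the 100 most extreme scores on day 1 and remeasure them on day 2.
extreme = sorted(range(len(day1)), key=lambda i: day1[i])[-100:]
print("day-1 mean of extreme group:", round(sum(day1[i] for i in extreme) / 100, 2))
print("day-2 mean of same group:   ", round(sum(day2[i] for i in extreme) / 100, 2))
# The day-2 mean is much less extreme, purely from selection on noise.
```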

Rough band masses for the normal distribution (one side): 0-1 SD 34%, 1-2 SD 14%, 2-3 SD 2%, beyond 3 SD 0.1%.
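
These band masses are easy to verify from the standard normal CDF:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

for lo, hi in [(0, 1), (1, 2), (2, 3)]:
    print(f"{lo}-{hi} SD: {100 * (phi(hi) - phi(lo)):.1f}%")
print(f" >3 SD: {100 * (1 - phi(3)):.1f}%")
# 0-1 SD: 34.1%, 1-2 SD: 13.6%, 2-3 SD: 2.1%, >3 SD: 0.1%
```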

Need some sort of intuitive Bayesian update. If a meal from a restaurant is heavenly, your posterior estimate of the mean meal from that restaurant should go up, but not all the way up to the meal itself, and the difference between meal value and size of update is larger as the meal gets more extreme. Declining returns from deliciousness on beliefs :)
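
One way to make that intuition concrete is the standard normal-normal conjugate update (the book doesn’t give this; the numbers are invented):

```python
# After one meal, the posterior estimate of the restaurant's mean quality
# moves toward the meal, but not all the way.

prior_mean, prior_var = 5.0, 1.0  # belief about the restaurant's average meal
meal_var = 4.0                    # meal-to-meal variability around that average

def posterior_mean(meal_score):
    weight = prior_var / (prior_var + meal_var)  # how far one meal moves you
    return prior_mean + weight * (meal_score - prior_mean)

for meal in [6, 8, 10]:
    post = posterior_mean(meal)
    print(f"meal {meal}: posterior mean {post:.1f}, gap {meal - post:.1f}")
# The more extreme the meal, the bigger the gap between the meal itself and
# the updated estimate of the restaurant's mean.
```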

Surprising stats. Odds of an American aged 25-60 being in top 1% annual income at least once in their life > 0.110. Odds of same for 6 consecutive years are 0.006. Income is much more variable than we think, especially at the high end. Similarly, while many people are on welfare at any one time, few people are on welfare for many consecutive years.

Summary:

Linked up

Given a table with symptom yes/no and disease yes/no and asked whether the symptom can be used to diagnose the disease. People fall prey to confirmation bias and only look at the symptom-present row. You always need all four cells to answer the question. Even most doctors and nurses fail to get this right.
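
A toy version of the four-cell point, with invented counts:

```python
# Compare P(disease | symptom) with P(disease | no symptom); the
# symptom-present row alone can't answer the diagnostic question.

symptom_yes = {"disease": 20, "healthy": 80}
symptom_no  = {"disease": 20, "healthy": 80}

p_sym = symptom_yes["disease"] / (symptom_yes["disease"] + symptom_yes["healthy"])
p_no  = symptom_no["disease"] / (symptom_no["disease"] + symptom_no["healthy"])

print(f"P(disease | symptom)    = {p_sym:.2f}")
print(f"P(disease | no symptom) = {p_no:.2f}")
# Equal rates: the symptom carries no diagnostic information, even though the
# 20 'symptom and disease' cases look like confirmation if viewed alone.
```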

Looking at NAAL, half of US adults fail problems like ‘Determine whether a car has enough gasoline to get to the next gas station, based on a graphic of the car’s fuel gauge, a sign stating the miles to the next gas station, and information given in the question about the car’s fuel use.’ So I have to wonder, in many of these experiments, are we seeing a specific cognitive bias or just a general lack of quantitative literacy. Can the average person even properly understand the question? I haven’t seen any experiments establish a baseline ability to solve similar problems when the bias does not come into play.

Statistical significance - odds of such an extreme result occurring by chance under the default model. Abused as a declaration of whether or not the test model is correct. Does not account for other models, poor default model, number of other tests run but not reported, prior beliefs in test model etc.

Correlation. Measure of association between two variables. Pearson correlation only measures linear relationships. Easily distorted by transformation of variables. Prefer shared information?

Correlations which are hard to detect visually can still provide significant information. Example given is correlation of 0.3 between income and IQ. Given IQ in the 84th percentile (mean + 1SD), expect income around the 62nd percentile (mean + 0.3SD).
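
The percentile translation is just the regression prediction pushed through the normal CDF:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def expected_percentile(z_of_predictor, r):
    # Regression prediction: expected z-score of the outcome is r * z.
    return 100 * phi(r * z_of_predictor)

# IQ one SD above the mean (84th percentile), correlation 0.3 with income:
print(f"expected income percentile: {expected_percentile(1.0, 0.3):.0f}")
# -> 62
```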

Example correlations (useful for mental scale):

0.3 = income <-> IQ, grad school grades <-> college grades, cardiovascular illness <-> weight

0.5 = IQ <-> job performance (for average job)

0.7 = height <-> weight

0.8 = SAT score this year <-> SAT score last year

Correlation != causality. Well known but very hard to resist in practice, especially when causality is plausible. Does this make sense from a Bayesian perspective? How much should correlation increase belief in causation? Especially prior to modern distribution of knowledge and experimental data.

Illusory correlation due to confirmation bias - sample is distorted by prior belief in positive or negative correlation. Similarly, if implausible we are unlikely to even notice evidence of correlation.

In the neutral condition, ie no prior belief, how high does a correlation have to be before we notice? Given all the data at once and when actively looking, it still needs to be around 0.6. When presented in pairs, detection rate decreases sharply as the time interval between pairs increases. Does this apply to earlier results of subconscious pattern recognition? Are there some types of correlations which our subconscious can detect at lower levels?

Key point - your belief in whether or not a correlation exists is highly unreliable. Have to rely on systematic sampling + analysis.

Reliability - correlation between repeated test results. Validity - correlation between test results and reality. Eg astrological sign is a reliable but not valid test of personality. A test cannot be valid but unreliable. If two tests for the same trait do not agree, at least one of them must not be valid.

In examples given, trait samples are much more variable than ability samples but people make predictions as if they are less variable. One of the causes of FAE. Claims that coding is important ie people would better understand variability of traits if they had good numerical/categorical measurements.

Rule of thumb - if you can’t even figure out how to code a variable, you are likely underestimating the variability because you can’t see it.

Training in statistical heuristics like this was shown to transfer between tasks in different domains for the author’s students.

Experiments

Multiple regression - find the correlation between two variables that is not explained by listed confounders. Only works if you catch all of the confounders. Often find that randomized control experiments give a different answer to the multiple regression, showing that some confounding factor was missed.
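
A small simulation of that failure mode, omitted-variable bias (not an example from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# x has no direct effect on y; a hidden confounder z drives both.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)

def first_coef(y, *predictors):
    """Least-squares coefficient on the first predictor, with intercept."""
    X = np.column_stack([np.ones(n), *predictors])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"effect of x on y, z omitted:    {first_coef(y, x):.2f}")     # ~0.5, spurious
print(f"effect of x on y, z controlled: {first_coef(y, x, z):.2f}")  # ~0.0
```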

Measurements > self-reports.

Ignore the HiPPO

A/B testing - continuous live experiments with similar alternatives to move towards local optimum.

Within designs vs between designs. Between designs vary the condition across different people/places/whatever. Within designs vary the condition for each person/place/whatever, which controls for differences between people/places/whatever. Before/after designs vary the condition over time - remember to vary order across sites.

Try to vary only one thing, keeping everything else constant (otherwise we are back to the problems with multiple regression).

In tests of significance, cases only count when they are statistically independent (follows from deriving the significance test).

Summary:

Experiments natural and experiments proper

Natural experiment - data in the wild that contains presumably independent cases of the variable.

Correlational evidence - not actually clear on the definition given, but I think it is correlation found between cases that are not presumed to be independent, thus requiring multiple regression.

Randomized control experiments - variable chosen at random so guaranteed not to have any confounding factor.

Natural experiments are useful for identifying hypotheses. Randomized experiments are more reliable but usually costlier. Some experiments are too expensive, impractical or unethical to perform, so have to rely on natural experiments.

We often undertake huge programs without any experiments. Many examples where actual effects of a program are the opposite of expected. Experiments are cheap relative to the cost of being wrong.

Summary:

Eekonomics

Place low confidence in studies that rely on correlation alone. Multiple regression is useful and sometimes the only tool available, but it is very error-prone.

MRA can provide weak evidence for causality via lagged correlations.

Dose-dependence - correlation increases for higher dose - somewhat stronger evidence than correlation alone.

Some effects cannot be proven by MRA eg is there bias against hiring people who have been unemployed for a long time - lack of hiring could be caused by bias, or unsuitability for job could cause long-term unemployment.

Experiments which may withhold intervention from the control group often viewed as unethical. Probably more unethical to apply interventions when you don’t know if they help or harm. Similar story in evidence-based medicine - doctors in the 50s were opposed to randomized control experiments of new medicines. History is not on their side.

Lack of correlation doesn’t prove lack of causation. Confounding variables may hide the correlation eg suppose some new diet is not correlated with weight loss - it may be because the people who are struggling most with weight gain are more likely to try the diet - the diet is bringing their high numbers down to the average numbers so no obvious effect appears.
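
A toy simulation of that diet example (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# People with the fastest underlying weight gain are the ones who try the diet.
underlying_gain = rng.normal(2.0, 1.0, size=n)  # kg/year if they did nothing
on_diet = underlying_gain > 2.5                 # self-selection, the confounder
diet_effect = -2.0                              # the diet genuinely works

observed_gain = underlying_gain + diet_effect * on_diet

naive = observed_gain[on_diet].mean() - observed_gain[~on_diet].mean()
print(f"naive dieters-minus-non-dieters difference: {naive:+.2f} kg/year")
print(f"true causal effect of the diet:             {diet_effect:+.2f} kg/year")
# Self-selection hides most of the diet's real benefit.
```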

Summary:

Don’t ask, can’t tell

Self-reports on internal state are entirely unreliable. Very heavily influenced by priming/framing. Lack of introspection into mental processes means that in practice the model of one’s own internal state is often only about as accurate as the model of other people’s internal states.

Stumbling on Happiness has a lot of good examples of this. In particular, makes a strong case that the outside view is better, ie ‘would I be happy in situation X’ is better answered by ‘are other people happy in situation X’ than by introspection/simulation.

Reference group effect - when asked to rate some trait/attribute/effect, people rate it relative to what they believe to be the norm. Causes problems if different subjects have different reference groups. Often vanishes when the reference group is made explicit.

Self-enhancement bias - people believe themselves to be better than average in most areas. Varies across cultures. In some cultures the effect is negative - a modesty bias.

Agreement response bias - more likely to agree with a question than disagree. Varies by culture. Controlled by counter-balancing questions - same quantity of normal and reversed questions for each category.

Behavioral measures are more accurate, eg better to measure conscientiousness by grades, timeliness etc. How is this established?

Rule of thumb - behavioral measures > self-reports for concrete scenarios > self-reports for abstract properties.

Self-experiments. Try to only change one thing. Simple measures often good enough eg three-point scale for mood. Record data immediately, don’t try to recall.

In some sense, self-experiments are easier because there are many confounders you just don’t care about. Maybe coffee only works for you because of your genetics. Don’t care - still works. Similar to A/B testing in that you can improve the goal variable without necessarily understanding the causal structure.

Summary:

Thinking, straight and curved

Deductive logic - given premises and rules of valid inference can determine the truth of a statement. Assumes closed world, reliable knowledge.

Inductive logic - given samples, reach (probabilistic) conclusions about the distribution. Open world. Assumes independent samples.

Deductive and inductive logic both give frameworks on what inferences are valid.

Dialectical reasoning - less about refereeing inference and more about ways to structure search for truth.

Deductive/inductive logic are concerned with validity. Dialectical reasoning is concerned with truth.

Logic

Propositional logic etc.

Pragmatic reasoning schemas - rule-sets for commonly encountered situations. Examples where people fail a logical puzzle but solve it easily when it is encoded in a familiar setting. Some schemas are valid (in terms of formal logic). Some are more like heuristics.

Deontic reasoning - not concerned with validity or truth but with whether a person’s conduct is proper. Often signaled by use of the word ‘should’.

Argues that purely logical schema are not useful, because Confucian China was successful without developing them.

Summary:

Dialectical reasoning

Confucian China did not develop formal logic. Instead relied on dialectical reasoning.

Three rules of thumb described as underlying Eastern dialecticism:

Experimental evidence that people raised in this tradition are more focused on context than individuals, less adept at applying formal logic to real-world examples, more comfortable with uncertainty.

Speaking of the weaknesses of natural experiments, I find much of this chapter unconvincing. Superforecasting covers what seems to me to be similar material (eg practical value of synthesizing multiple viewpoints) without pulling in the huge confounders of Eastern vs Western culture.

Separately, while I didn’t find much practical value in Trying not to try, it’s a very accessible and enjoyable introduction to Confucian-era philosophy.

Knowing the world

Epistemics - discipline which fuses theory of knowledge, cognitive science, philosophy of science - study of knowledge and how it is obtained

KISS and tell

Occam’s razor. Simpler theories preferred because they are easier to test. Define simple.

Compares Occam’s razor to ‘release early, release often’. Investigating simple hypotheses first provides faster feedback. Because they are easier to falsify? Why not skip the middleman and recommend picking the most easily falsified hypothesis?

Reductionism - reductionist theories are those that assert that the whole is nothing more than the sum of the parts. My preferred definition - reductionism is a strategy that assumes that the properties of the whole follow more from the properties of the parts than from their organization. A reductionist approach to understanding a computer might be able to model one perfectly, but would not notice the high-level patterns in the bits in memory, would not realise that there is a simpler way to understand the behavior than a full simulation.

NIMH refuses to support research in behavioral science, preferring to focus on neuroscience and genetics. This reductionist approach has produced no new treatments. There may be high-level structure in the brain that can be understood without understanding the parts, and understanding the parts may not cast any light on this high-level structure. To continue the computer analogy, even a full working model of a computer will not tell you what the program is supposed to do or how it behaves in general. It can only tell you how it will behave under the specific conditions of each simulation run. Entanglement with the world provides meaning to the bits in memory and we need to understand that entanglement.

We are too good at generating truthy theories. What makes a good theory?

Falsifiability. Putting the focus on falsification counters confirmation bias. Interesting observation from Rationality: From AI to Zombies is that Bayes’ rule requires that if outcome X raises your confidence in a theory then outcome ¬X must lower your confidence. If you believe that both outcomes confirm your theory then you can raise your confidence without even looking.
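
A minimal derivation of that claim from the law of total probability:

$$P(H) = P(H \mid X)\,P(X) + P(H \mid \neg X)\,P(\neg X)$$

P(H) is a weighted average of P(H | X) and P(H | ¬X), so they cannot both be greater than P(H): if observing X raises your confidence in H, observing ¬X must lower it.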

Is evolution falsifiable? No way to run direct experiments. This is why the above formulation of falsifiability is interesting - there are observations that would lower our confidence in evolution, even if we can’t test the theory directly.

Induction and black swans. Maybe the sun won’t rise tomorrow.

Significance alone is not enough. Have to consider prior beliefs. Some experiments confirm ESP and some deny it, but the prior is so low that we would need significantly more evidence to be convinced, even if many experiments reach the magical 0.05 significance.

Ad-hoc amendments to theories - seemingly unfounded additions to a theory that enable explaining some additional, inconvenient fact. Post-hoc amendments - rendered seemingly reasonable by hindsight bias - prevents noticing falsification.

Rationality: From AI to Zombies has a thorough, novel and convincing treatment of this whole topic that heavily influenced my thinking. I’m increasingly struck by how many concepts can be usefully replaced by Bayesian reasoning and information theory.

Keeping it real

Theories often generated and pursued because of intuition.

Paradigm shifts. Exciting new theory comes along. Scientists jump on board and start experimenting, eventually finding evidence that supplants the old theory. It’s like a new hammer is discovered and for a while everything looks like a nail. Eventually it settles down and only the useful applications remain.

Dominant theories are often heavily influenced by culture.

On the whole, the scientific process in reality is not nearly as disinterested and impartial as it sounds on paper.

Thoughts

Some very useful ideas, mixed in with lots of interesting stories and experiments.

East vs West digressions were unconvincing, especially since they came immediately after the section explaining all the possible problems with the methodology. Frequently made the leap from ‘Eastern students display behavior X’ to ‘Eastern students display behavior X because of cultural trait Y’ with little more than truthy feels.

For the later chapters on the scientific process, I much prefer the treatment in Rationality: From AI to Zombies.

The sections on statistics and behavioral economics were mostly not new to me, but I did learn from the warnings about MRA.

Main things I want to remember from this:

Lastly, there were also some minor mentions of some of these skills transferring across different domains, which is something I’m on the lookout for.