Quick and dirty review of Psychology of Programming Interest Group 1989-2015

Inspired by Ji Yi's homework assignment, I decided to skim all 360-odd papers in the Psychology of Programming Interest Group archives.

Yi reports his students taking ~10 hours to skim 100 papers over a single weekend. His students are clearly way more disciplined than I am. I dragged it out for months. It was brutal.

Towards the end I did finally get up to about 50 papers a day, so if I do this again for other journals I'll probably just take a random sample of 50 papers and try to do the whole lot in one go.

What did I learn?

Not a great deal about the psychology of programming itself. For the most part the field doesn't feel like a stumbling progression towards enlightenment, but just plain stumbling.

Here are some common failure modes that frustrated me:

Plain bad science, especially in the early years where a lot of the experiments are 'my pet project vs the world' and somehow the pet project always comes out looking good. My favorite example has a graph where the control group are clearly performing better, and the author explains this away, saying it's because the control group were cheating, and in the conclusion of the paper declares the treatment a success. And it was published!
Failing to validate instruments. In particular, a lot of papers that involve coding qualitative data didn't bother to have two people code the data independently to check for agreement.
'We have to do some science, and this is science'. Many of the experiments are from the start clearly incapable of answering the original question. I realize that getting good data for these subjects is hard, but the opportunity cost still stings. Do a different experiment!
Theoryless science. For example, one paper had programmers read programs under an eye-tracker and found there was a significant difference in the gaze patterns between the more experienced group and the less experienced group. So what? There's no suggestions as to what it means or how it can be used or under what conditions it's expected to be replicable. These papers typically end with "further research is required" and then no further research materializes.

The last point is often detectable right at the beginning. If I read the description of the experiment and I can't even make up an interesting result, that's not a good sign.

There were also a lot of papers that read like "I found this in a book, we should apply it to programming". I don't know if it's fair to call that a failure mode, but I rarely found anything enlightening in these and most of them were not followed up with papers where they actually apply the idea.

On the positive side:

There are a ton of negative results. PPIG seems to be totally willing to publish failed experiments, which is awesome.
Things get better over time. Somewhere around the mid-2010s I started finding papers that actually seemed like results. The series of papers on testing whether students form consistent models is a particular highlight - it starts with an interesting correlation, then some failed replications, then refinements of the test, then some successful replications, then more refinements, then combining it with interviews to test validity. It's not sexy, but it does seem to be actual progress towards a reliable measure of one specific leak in the education pipeline.

I have a sort of vague idea that most of this work is just attacking things at too high a level. Programming is a big, diverse, complex skill and we don't really understand really basic subskills yet, or even know how to break it down into subskills. Without that it's impossible to know whether it makes sense to extrapolate any one set of results. If barely trained students benefit from syntax highlighting, does that mean professional programmers will? Or programmers in other languages? Or programmers dealing with programs longer than 50 lines? We have no idea.

Much of this reminds me of Feynman's stories about rat experiments.

...his papers are not referred to, because he didn't discover anything about the rats. In fact, he discovered all the things you have to do to discover something about rats.

What are the things we have to discover before we can discover things about programming? I have no idea.

Anyway, here are the full notes. They're highly opinionated and I made no attempt to be fair to the authors. If I said something mean about your research, don't worry - I barely read it anyway :)

EDIT PPIG broke all their urls :(

1989

1990

Using Systematic Errors to Investigate the Developing Knowledge of Programming Language Learners

Two year programming class. 100 engineers. Errors in semicolon use concentrated on specific contexts, even though students have learned explicit syntax rules. Later errors become uniformly distributed ie attention lapses rather than misunderstanding.

Students don't learn syntax from specification of syntax - need contextualized experience. Parallel to natural language learning.

Opportunistic Planning in the Specification Stage of Design (full paper)

Single engineer observed writing spec in the wild for three weeks. Claimed to be following hierarchical plan but was observed deviating opportunistically.

Tools should allow interrupting planning of one component to jump to another.

Empirical Study of Design in an Object-Oriented Environment

8 experienced programmers - 4 OOP beginners, 4 OOP experienced. Up to one day to solve two problems. Observed heavy resuse of solutions/schemas once discovered.

1991

1992

ACE - An Application Construction Environment

Framework for building end-user programming environments. Aims for progressive specialization where programmers hand-off partially finished environments to end-users. No study on actual users.

Addressing the psychology of programming in programming language design

Judge programming language by breaking writing down into steps, and look at how much knowledge required to choose between options at each step. Novel language, observe four paper writers working through problems. Example of a macro feature where the correct choice is not clear.

Provides a method for evaluating how natural/obvious a given language or feature is.

An analysis of novice programmers learning a second language

Studies transfer of 'plans' learned in Pascal to Ada or Icon. 13 students with Pascal experience. 3 work in Pascal, 5 in Ada, 5 in Icon. Observed, broken into episodes, episodes classified by consensus.

GPT's preparation of students for programming in the real world

Two week industrial course for new graduate hires. Subjects forced to work only in company hours - no last-minute rush - teaches important of planning/scheduling. Role-play vague and unhelpful customer. Uni courses have to be fair => perfect environment - better learning from realistic disruptions.

Possible extensions to the Byrd Box tracer aimed at experts

Proposes study of how expert Prolog programmers use tracer. Didn't actually perform study.

Programming in the Real World: Computer Science students' perceptions of the values and difficulties of learning formal methods

2nd and 3rd year students. 51 returned questionnaire and some were interviewed? Focused on four courses that use formal methods.

Students struggle to manipulate formal expressions by hand. Need automated tools. Only realized relevance of early course much later, when it was too late. Don't believe that formal methods are used 'in the real world'.

Restricting manipulations within a device space: Effects upon strategy, errors and display-based problem solving

Do expert programmers develop better working memory, or do they learn to use external memory more effectively? Notice that experts jump around more often, while beginners tend towards linear program generation.

10 professionals. 12 2nd year undergrads.

Experiment 1. Have to program while speaking strings of digits - strains working memory. Expert performance did not suffer. Novices have more errors under suppression, and jump around code more often.

Experiment 2. Write a program using a program that doesn't allow lines to be edited after hitting enter. Experts produce more errors than novices in this environment.

Suggests that experts don't develop more working memory, but instead switch to strategies that use external memory more.

Some thoughts on designing an intelligent system for discovery programming

Plan for a dynamic visualization of code execution.

Teaching formal software engineering at Loughborough

Incoherent.

Text vs graphics in Prolog tracers

13 subjects solve problems using one of three tracers. No clear differences noted.

The Design and Application of Visually-Oriented Tools for the Use During Software Development

Seems to be research proposal, not a paper.

The interpretation of states: a new foundation for computation?

Proposes a model of computation based on agents which make discrete observations of state?

Visual programming & visualisation of program execution in Prolog

Graphical version of prolog. Slides only.

1993

1994

An Experimental Evaluation of Different Proposals for Teaching Prolog

Two axes. #1 whether common patterns are taught. #2 whether explicit mappings are given between example and exercises. Shipped experiment as free online class. No results yet.

Cognitive and Organisational Issues in Programming in the Large: Preliminary Findings from a Case Study

Case study in UK bank. Large team. Interviews and questionnaire. Early results only. Nothing notable.

Computer Programming for Noughts and Crosses: New Frontiers

Agent-based model of programming for OXO games. No experimental results.

Learning Computer Programming: The Effects of Collaborative Explanations and Metacognition on Skill Acquisition

Self-paced LISP course. 25 students, no programming experience. Taped sessions and students think aloud. Analyses student explanations.

Better-performing students made more complex and more connected explanations of LISP concepts, and asked more questions of other students. No data or statistics given.

Moilère: A Visual Programming Environment Based on a Theatre Metaphor

Visual OOP env built on smalltalk. Based around theatre metaphor. Abstract only.

Towards an Experiential Description of Programming and Learning to Program

14 students x 6 interviews. Self-reported experience of programming. No results.

1995

A model of programming

Studies of visual programming useless because they focus on toy problems. Personal model of how OOP programming works.

A pilot study on novice Pascal programmers on vocational courses in further education in Northern Ireland

Questionnaire to explore connection between programming ability and math ability. No results.

An investigation into strategies employed in solving a programming task using Prolog

32 2nd year undergrads. Experience with procedural programming. Asked to solve problem in Prolog.

Two different strategies observed. Both were procedural.

Comparing program comprehension in different cultures and different representations

R-technology - visual control flow developed in Soviet Union. 24 professional R-tech programmers. Comprehension test for two 150loc programs. Then asked to write a program in either C or R-tech.

No results.

Control strategies used by expert program designers

Wrong link.

Courseware design support

Case library tool for existing courseware design suite. Designed by observing users on mockups. Tool was not effective.

Do diagrams make us smart(ER)?

Study how multiple representations of logic helps. No body.

Documentation skills in novice and expert programmers: an empirical comparison

20 novice students. 7 faculty members. Add comments to ADA program.

Falsified hypothesis that experts would produce semantic comments and beginners would produce paraphrases of code.

Facilitating the acquisition of mental models of programming with GIL: an integrated planning and debugging learning environment

Graphical version of LISP supporting execution of partial programs. No body.

Forms/3, a declarative graphical language

No body.

Knowledge exploration in design: communicating across boundaries

Case studies of multi-disciplinary design problems. Unclear results.

Learning graphical programming: an evaluation of KidSim

Kids programming environment. 56 children, 10-13yo. Varied problems, class schedule and instruction.

Clear enjoyment, but no good evidence of transferable learning.

MADLab: masking and multiple bug diagnosis

Toolkit for studying debugging of programs with multiple bugs. Little information.

Maintenance of object-oriented systems: an empirical analysis of the performance and strategies of programmers new to object-oriented techniques

14 grad students. Experienced in structured programming, 6 months experience with OO. Procedural and OO program with identical spec. Asked to make changes.

Worse performance on 4/6 tasks for the OO group.

Mental representation and computer use

No body.

Prolog without tears: an evaluation of the effectiveness of a non Byrd Box model for students

New Prolog tracer based on different mental model of evaluation. 48 cogsci students. Given printed program, worked through modified versions in one of three tracers and asked to identify differences.

Exposure to language predicts success more than exposure to tracer tool. Students were more successful with Plater tracer.

Software Design using GOOSE

Dev environment based on observing what programmers actually do, rather than what models prescribe.

No body.

The evaluation of TED, a techniques editor for Prolog programming

32 students. 10 week course. Standardized teaching methodology. Group 1 got standard course. Group 2 got standard course with different method of teaching recursion. Group 3 got same course as 2 but also used the TED editor.

Group 3 made less mistakes overall. Mixed results for different problems. TED editor not described.

Transforming verbal descriptions into mathematical formulas in spreadsheet calculation

Argues that this process goes through an intermediate imagery step.

No body.

Using episodic knowledge in design

No body.

Validating knowledge based systems with software visualization technology

No body.

1996

1997

1998

The Software Factory

Sarcastic faux history.

Software reuse from an external memory: the cognitve issues of support tools

Examines process of finding software to reuse. No data.

1999

Skipping the doctoral consortium - no bodies.

The need for computer scientists to receive training on people skills

Two environments experienced by the authors where people skills were lacking.

Spatial measures of software complexity

Proposes metrics on callable-from graph. No evaluation.

Representation and structure in the re-use of design rationale by novice analysts

18 1st year undergrads. Graphical vs tabular vs narrative representation of design rationale. Options-based vs criteria-based (unclear on meaning).

Narrative format had slower response times, fewer correct answers. Graphical and tabular were matched. Criteria-based had similar times to option-based but more correct answers.

Programming with a purpose: Hank, gardening and schema theory

Visual language for cognitive modeling. Focus is not on building models but on understanding them. Used for student project. Questionnaire at end of course, qualitative analysis.

Students liked having no textual syntax. Found it easier to understand Hank models than Prolog models. Able to understand each others programs. Could understand execution process. Hank was useable with pen and paper.

Mental representation and imagery in program comprehension

32 c programmers. 32 spreadsheet programmers. Checking whether programmers build mental models based on control flow or data flow. Used recognition task with priming to see whether recognition is improved by primes nearby in control flow or nearby in data flow. Not clear what language the test program was in or how distance is judged in each model.

Interpretation of results is not clear to me.

Investment of attention as an analytic approach to cognitive dimensions

Considering user action in terms of choosing where to invest limited resource of attention eg spend attention looking up api call vs just hope that memory is correct.

Getting a GRiP on the comprehension of data-flow visual programming languages

Investigating visual programming languages for novices.

No body.

Focal structures in Prolog

Program comprehension in Prolog. 10 experts, 10 novices, 10 non-programmers. Asked to read program, reconstruct from memory and then explain purpose. Programs are very short.

Considered 4 different ways of choosing key points of program. Only way for which difference in recall was significant was 'schemas' - common patterns of computation in prolog eg build list and then aggregate.

Evaluating Hank

Describing future studies proposed for Hank. No results yet.

EPSRC and support for the psychology of programming

Something about funding decisions.

ENCAL: a prototype computer-based learning environment for teaching calculator representations

Piaget-ian construction of mental imagery. Traditional calculators and algebra notation don't provide mapping to any concrete imagery, so children struggle to map classroom math to their own experiences. Proposes a computer environment with multiple representations, at different levels of abstraction, running simultaneously.

Desirable features of educational theorem provers - a cognitive dimensions viewpoint

Questionnaire on usability of theorem provers. Subjects complained about vague and sometimes inapplicable questions.

Computer science undergraduates learning logic using a proof editor: work in progress

170 1st year students in logic course. Post-course questionnaire.

Students preferred to memorize rules rather than practice with Jape. No other results yet.

Bricolage Forever!

10 science teachers. Five problems in Word, where knowledge of underlying implementation would allow easy solution.

Most subjects resorted to trial and error, and out-loud gave incorrect explanations of underlying mechanics. No attempts to falsify their models.

Proposes that experiments in programming would produce similar results.

2000

Uncovering effects of programming paradigms: Errors in two spreadsheet systems

Want to see if changing naming scheme in spreadsheets from individual cells to named blocks of cells changes the kinds of errors that users make. 154 1st year undergrads, various subjects. Four different spreadsheet tasks, given in various orders. Compared Excel vs Basset (home-grown system). Errors are categorised by author.

Lower number of errors for some tasks, higher for others. No clear numbers for categories.

Tools for observing study behaviour

Tool for recording student interactions with a Smalltalk environment.

The influence of the psychology of programming on a language design: Project status report

Developed a language based on review of past psychology of programming results. Experimented with different forumulations of boolean queries. Found confusion over 'AND', precedence/grouping and users totally ignoring parens.

The effect of programming language on error rates of novice programmers

GRAIL vs LOGO. 26 1st year undergrads, no experience. Similar exercises for both groups, but not identical.

Fewer errors for GRAIL. Difference larger for syntax errors than for logic errors. Presumably graded by author.

Some problems of programming in robotics

Theorising about how to teach robotics to children.

Programmer education in Arts and Humanities course degree

Questionnaire given to 14 students. Self-reported background / computer use. Programming quiz.

Structured interview given to ? students assessing learning style, development process.

Concludes that sample is too small to say anything of interest.

On the use of functional and interactional approaches for the analysis of technical review meetings

Transcribed 7 design meetings. Cut up into speech acts, classified according to coding scheme developed during analysis. Classifies into 5 kinds of exchanges, using unclear methodology.

How do people check polymorphic types?

Questions: 34 what is the type of this, 34 why is this an error. 6 subjects, all of whom at least post-grad. Video categorised by the methods subjects used to answer, unclear methodology.

How a visualization tool can be used: Evaluating a tool in a research & development project

Jeliot - animations for understanding algorithms.

Programming intro w/ 564 undergrads, didn't regularly use Jeliot. Short programming course w/ 37 high school students, course based around Jeliot. Semi-structured interviews for small numbers of students in each group, not explained how they were selected.

Rejected by group 1 lecturer because of library issues. Students found UI confusing.

Gotos Considered Harmful and Other Programmers' Taboos

Discusses how taboos spread socially. Argues that all the notable taboos are about crossing abstraction boundaries.

Expertise and the comprehension of object-oriented programs

16 novices. 16 experts. Showed a program for 2 or 10 seconds. Asked 5 comprehension questions. Experts perform better in both 2s and 10s cases. Data- and control- flow questions benefit from 10s more than function, operation and state questions.

Experiences with novices: The importance of graphical representations in supporting mental mode

Trying to teach beginners the mental representation of recursion used by experts.

49 students taught to use a diagrammatic representation of recursive programs. Test mental model by giving a list of possible solutions and asking which ones work correctly. Students who were tested with diagrams made fewer errors than students who tested with diagrams ie model was learned but failed to transfer to normal code.

Cognitive Dimensions: An experience report

Design of real-time temporal logic language using cognitive dimensions.

Cognitive Dimensions of Use Cases: Feedback from a student questionnaire

Questionnaire on UML use cases, covering cognitive dimensions. 14 students.

An assessment of visual representations for the 'flow of control'

Testing whether to use arrows, lines or juxtaposition for control flow. 84 students. Given maze in each style and have to follow paths. Measured response time. Arrows were fastest and most accurate.

A framework for knowledge: Analysing high school students' understanding of data modelling

How well are mental representations transmitted to students via verbal discussions? 3x groups of 7-9 high school students, oral interview + written questionnaire, questions on syllabus material.

High school students suck at definitions.

A Cognitive Dimensions view of the differences between designers and users of theorem proving assistants

Questionnaire sent to users and designers of a prover assistant. Answers on numerical scale. Minor differences in responses between the two groups.

A Cognitive Dimensions questionnaire optimised for users

Generalised CD questionnaire. Tested on users of various systems. Not clear whether this is useful.

2001

Using a Graphical Design Tool for Formal Specification

Adapt UML for formal specs. No evaluation.

The Usability of Formal Specification Representations

51 students, 4 questions, each a choice of two implementations of simple problem. After four weeks of Z classes, given same set of questions described in Z.

Claims strong shift in approach, but just eyeballing the numbers doesn't look like much.

The Science of Web-Programming

Functional programming in templates. Maybe misunderstood, but seems incredibly trivial.

The Model Matters: Constructing and Reasoning with Heterarchical Structural Models

Compares 4 formal modeling methods. Evaluated directly by authors on a single example. Prefer tables over trees.

The Coach - Supporting students in the area of error reports

Smalltalk env. Made a 'coach' with list of recent actions and error messages. No evaluation yet.

The Application of reflective Practitioner to Software Engineering

Advocates for using this practice. Not totally clear on process or goal, but seems related to the problem of not begin able to teach implicit knowledge.

Team Performance Factors in Distributed Collaborative Software Development

Comparing two teams of students, high vs low performance. Coded email and irc histories.

Concludes that communication is important.

Some Evidence for Graphical Readership, Paradigm Preference, and the Match-Mismatch Conjecture in Graphical Programs

Experiments with graphical languages.

Different graphical layouts for programs. 21 students tested on comprehension of graphical vs textual. 60 students tested on comprehension of different graphical layouts. No main effect in either.

Additional cognitive skills test, no correlation with main experiment.

Salvation for Bricoleurs

Switching between RLT and LTR writing in Word is confusing. Gave students an explicit conceptual model.

Control group exam was under different conditions. Used to explain their better performance.

Research Agenda for Computer Science Education

Argues for more empirical research in CSE.

Observations of student working practices in an online distance education learning environment in relation to time

Patterns in working hours for remote learners in an OU course.

Native-End User Languages: A Design Framework

Plans for end-user programming in languages other than English.

Long Term Comprehension of Software Systems: A Methodology for Study

there is little in a way of a conclusion, other than the common oft repeated call of, ‘we need to do more research in this area

Human and "human-like" type explanations

Comparing human approach to explaining type errors to approach of authors' new type-checker. Coded by author, without explanation.

Evaluation of the Cognitive Dimensions Questionnaire and Some Thoughts about the Cognitive Dimensions of Spreadsheet Calculation

Gave (translation of) general CD questionnaire to 10 spreadsheet users.

Evaluating a new programming language

CD questionnaire to evaluate new language, given to 5 professional programmers.

Designing a Programming Language for Home Automation

Media Cubes. Physical cubes that correspond to (dataflow?) operators. Placed together to create program. Provides direct referencing eg place cube on tv to reference tv.

Iota.HAN. ML / Pi-calc based language.

Cognitive Dimensions Profiles: A Cautionary Tale

Ignoring some cognitive dimensions in analysis can lead to missing effects on those dimensions.

An Open-Source Analysis Schema for Identifying Software Comprehension Processes

Protocol for analyzing spoken transcripts to determine whether programmers are understanding a given program by comparing against preexisting domain knowledge or by recognizing patterns of code.

A Study of Human Solutions in eXtreme Programming

Mostly XP advocacy. Little in the way of actual numbers.

A phenomenographic view on the socio-cultural activity theory in research concerning university students' learning of computer science in an internationally distributed environment

Planned data collection from student projects.

"It's just like the whole picture, but smaller": Expressions of gradualism, self-similarity, and other pre-conceptions while classifying recursive phenomena

Studying language high-school students use.

2002

Shared Data or Message-Passing - A Human Factor in Technical Choices?

How do people choose between the two? Give parallel programming problems to students along with Myers-Brigg test.

Hugely forking paths. Only 0-3 students per MBTI / shared-mv-messaging intersection.

What is Programming?

Trying to define programming in light of rising end-users.

Ideas on difficulties - loss of direct manipulation, use of notation, abstraction.

Visualizing Roles of Variables to Novice Programmers

Break variables into different types - constant, counter, control etc. Apply different graphics when visualizing program.

No evaluation.

The Roles Beacons Play in Comprehension for Novice and Expert Programmers

45 students. Given single line, judge how likely it is to have come from a binary search. Ratings(?) influenced by expertise.

30 students. Given single line, map it to one of three algorithms. Unclear results.

19 students. Shown algorithm and asked to comprehend/memorize (under eyetracker). Then given comprehension test. Divide sections of code into categories, normalize time in each category by screen area. Experienced group focused more on 'complex' category.

The Misplaced Comma: Programmers' Tales and Traditions

Speculation on programmer folklore.

Softening the Complexity of Intelligent Systems Programming

Students struggle surprisingly with symbolic systems. May be caused by overhead of manipulating data-structures. Suggest pattern-matching support in language to help.

Revitalising Old Thoughts: Class diagrams in Light of the Early Wittgenstein

...

Programming Aptitude Testing as a Prediction of Learning to Program

33 students. No significant correlation between VB course final exam grade and high-school grades or SAT grades. Huoman programming aptitude test explains 25% of final grade.

Patterns for HCI and Cognitive Dimensions: Two Halves of the Same Story?

HCI researcher compares pattern languages to cognitive dimensions.

On Concurrency in Educational Software Authoring Systems

Educators given animations of systems and asked to describe how they work.

No coding or statistical analysis.

Inexperienced subjects tended to describe each entity until influenced by external event. Suggests describing systems in terms of individual behaviors and cross-entity conditions for behavior change.

Modelling Software Organisations

Proposes agent-based modeling of software organizations.

Making the Analogy: Alternative Delivery Techniques for First Year Programming Courses

Anecdotal reports from lecturer of first-year programming course.

3 main problems. Students have no prior exposure to computational thinking, or even basic logic. No familiar experience to compare most programming concepts to. Unused to rigid syntax rules.

Suggests several analogies for use in teaching.

Learning Styles in Distance Education Students Learning to Program

Distance learning students. Given questionnaire to assess learning style.

Many forks. Significant gender differences.

HASTI: A Lightweight Framework for Cognitive Reengineering Analysis

Presents simple computational model of human cognition. Suggests studying software tools in terms of how their use maps onto this model.

Evaluating Languages and Environments for Novice Programmers

Common-sense ideas for comparing languages and IDEs.

Dimension Driven Re-Design - Applying Systematic Dimensional Analysis

Unclear. Map language spec into this tool and automatically compute CDs?

Class Libraries: A Challenge for Programming Usability Research

Library design should be treated as a usability problem.

A Study of Usability of Z Formalism Based on Cognitive Dimensions

CD questionnaire given to students in Z course reveals many issues, too many to list here.

A Comparison of Empirical Study and Cognitive Dimensions Analysis in the Evaluation of UML Diagrams

Two different notations for UML diagrams. CD analysis favours one. Multiple-choice exam given to students shows no significant difference.

2003

A development study of cogntive problems in learning to program

Anecdotal reports from programming course.

Students can often give the correct code without understanding how it works. Often mislearn concepts, while making the correct noises.

An ethnography of XP practice

Very similar to previous paper.

Applying Cognitive load theory to computer science education

Ideas for how to reduce cognitive load when teaching programming.

Characterising software comprehension for programmers in practice

Empirical studies of programmers tend to take place in very artificial settings. Plans to study programmers in-situ.

Cognitive Dimensions of tangible programming techniques

At lexical level TUIs provide no advantage - too small a vocabulary, no established conventional symbols, no significant improvement in interaction speed or recall.

At syntactic level - more possible kinds of relationships (position, orientation, proximity...), input-only interface (eg no undo).

Cognitive Dimensions questionnaire applied to visual languages evaluation - a case study

CD questionnaire given to 2(!) subjects. Both were critical of the questionnaire.

Does the empirical evidence support visualisation?

Brief survey finds mixed support for visualization in programming tools.

First results of an experiment on using roles of variables in teaching

Categorize variables into one of 10 roles.

Taught 80 students with a) normal methods b) variable roles c) variable roles + animated simulator which shows roles.

Conclusion is confusing - raw data looks to me like roles group did worse in their final exam, but author claims that poor grading was the cause.

Investigating the influence of structure of user performance with UML interaction diagrams

Tries again with the two different UML notation. Three different studies show no difference in performance.

Java Debugging strategies in multi-representational environments

Java debugger with ghetto eye-tracking. 49 novice students. Given spec and test cases, then given 10 mins in debugger to fix program.

Numbers are in a different paper. This one only discusses verbalizations from 2 subjects.

Subjects started by reading code almost top-to-bottom. Debugging switched between forward and backward reasoning.

Little Languages for Little Robots

Teaching language design by working with students to build a language for programming Mindstorms robots.

Software Effort Estimation: unstructured group discussion as a method to reduce individual biasis

Groups of programmers produce less over-optimistic estimates than individual programmers (effect size 1.25).

Seems similar to proposed mechanism in Superforecasters - if estimate is ideal time + list of things-that-might-go-wrong, then compiling things-that-might-go-wrong from whole group should provide better coverage.

Some parallels between empirical software engineering and research in human computer interaction

Current experiments in psych-prog do not inform practice. HCI faced similar criticism in previous decades and changed focus from lab experiments to field studies.

Team coordination through externalised mental imagery

Several observed cases of mental imagery originating in one dev and spreading throughout an entire team.

Tensions in the adoption and evolution of software quality management systems

4 companies. (4?) quality managers. Semi-structured interviews on department structure, history, responsibilities, practices.

Not clear what the conclusions are.

Towards authentic measures of program comprehension

Scheme for grading student explanations of programs.

Using cognitive dimensions to compare prototyping techniques

CD analysis suggests that existing categorization of prototyping tools into lo- and hi-fidelity is not enough.

Identifies 4 key activities: authoring, validation, implementation, confirmation. Not clear how these relate to categorization.

Using laddering and on-line self-report to elicit design rationale for software

Design rationale may not be available as explicit, conscious, verbalizable knowledge.

Authors elicit rationale from students designing web pages using real-time self-reporting and laddering. (Laddering seems similar to root-cause analysis, but for exploring knowledge rather than causes).

Using the cognitive dimensions framework to measure the usability of a class library

Suggests a variant CD specifically designed for evaluating libraries.

2004

Roles of variables and strategic programming knowledge

Variable roles again. No new results.

Programming without code: a work in-progress paper

Proposal for natural language programming for kids.

Extreme programming: all of the elegance but none of the models?

Plans for studying extreme programming.

XP: Taking the psychology of programming to the eXtreme

Review of previous XP studies.

Visual attention and representation switching in Java program debugging: a study using eye movement tracking

Same as previous paper, but with a real eye-tracker this time. Similar results.

Understanding our students: incorporating the results of several experiments into a student learning environment

Collaborative IDE for teaching. Anonymous group work was popular.

Towards the development of a cognitive model of programming: a software engineering proposal

Proposes teaching system that tracks students weaknesses down to the level of individual concepts. (Not unlike Khan academy).

PicoVis: a dynamic visualisation tool for simulating a Bluetooth communication environment enhancing student understanding

Bluetooth network simulation for teaching. No evaluation.

Metaphors we program by

Many examples of metaphorical language used by devs.

Learning object-oriented programming

Anecdotal observation of students. Students were reluctant to draw object diagrams before coding. Instructor didn't approve of their modelling choices.

Learning and using formal language

36 psych students, later 64 vaguely-sourced adults. Tested on 'does regex match string' and 'make a regex to match this class'. Exposure to former doesn't improve performance in latter (but only 6 questions per person?).

Investigating patterns and task type correlations in open source mailing lists for programmer comprehension

Collected program snippets from Java mailing lists (mostly from bug reports and patches). Don't really follow the results.

Factors affecting course outcomes in introductory programming

75 students. Computer Programming Self-Efficacy Scale questionnaire. Program comprehension test. Program recall test. Weak correlations between self-efficacy at end of course and test scores.

Evaluating algorithm animation for concurrent systems: a comprehension-based approach

Proposes evaluating understanding of concurrent systems with a particular talk-aloud protocol and coding system.

Dynamic rich-data capture and analysis of debugging processes

Proposes using restricted focus viewer and other recording methods.

Design diagrams for multi-agent systems

Proposes an extension of UML to handle multiple agents.

CORBAview: a visualisation tool to aid in the understanding of CORBA-based distributed applications

Visualisation tool for CORBA. Records events, allows replay. No evaluation.

Comparison of three eye tracking devices in psychology of programming research

3 commercial eye-trackers. 12 subjects given program comprehension tests. Head-mounted device took longer to setup and was less accurate but allowed subjects to move around.

Aspects of cognitive style and programming

~150 students. Modified Witkin field-dependency test. Digit span test for working memory. Correlation to exam results 0.1-0.2 for working memory, 0.2-0.4 for field-independency.

(Would effect of field-independency be explained away by spatial IQ?)

An Inter-Rater Reliability Analysis of Good's Program Summary Analysis Scheme

Coding scheme for program summaries shows ~80% agreement between 3 users. Suggests refinements to the scheme to reduce differences.

An Examination of E-Commerce Homepage Design Guidelines by Measuring Eye Movements

11 subjects visit ecommerce pagers under eye-tracker. Self-reported data agreed with eye-tracker - users look at top-left or top-middle first.

A first look at novice compilation behavior using BlueJ

63 students. Record full source code every time they hit compile. More than half of errors accounted for by missing semicolons, misspelled variables, missing brackets, illegal start of expression, misspelled class. Majority of recompiles after error take place within 20s. Vast majority of recompiles after success take place after > 5 minutes.

2005

A Framework for Evaluating Qualitative Research Methods in Computer Programming Education

'Grounded Theory'. Framework for guiding qualitative research.

Attitudes Toward Computers, the Introductory Course and Recruiting New Majors: Preliminary Results

Computer literacy course. Computer Attitude Scale questionnaire. Attitudes became increasingly negative over the duration of the course.

Attuning: A Social and Technical Study of Artist-Programmer Collaborations

Interviews with 4 programmers and 2 artists. Conclusions unclear.

Concretising Computational Abstractions: What works, what doesn't, and what is lost

ToonTalk. No body.

Effects of Experience on Gaze Behavior during Program Animation

18 high-school students in undergrad programming course. Animate three short programs. Comprehension task. Gaze-tracking.

No significant variation in behaviour wrt experience.

Factors Affecting the Perceived Effectiveness of Pair Programming in Higher Education

58 students pair-programming. Observation, questionnaires, semi-structure interviews, field notes.

Differences in skill level affect collaboration. Debugging tasks reported as particularly tiring / unenjoyable.

Graphical Visualisations and Debugging: A Detailed Process Analysis

29 undergrads. 1 modification task, 1 comprehension task, 6 debugging tasks.

Forking paths. Questionable interpretations of results.

Introducing #Dasher, A Continuous Gesture IDE

Language model for writing C# with Dasher. No evaluation.

Mining Qualitative Behavioral Data from Quantitative Data: A Case Study from the Gender HCI Project

27 male and 24 female subjects. Self-efficacy questionnaire. Spreadsheet with extensions for testing. 2 spreadsheets with bugs.

Female subjects had lower self-efficacy about debugging ability. Less likely to use new debugging features, although no difference in learning time. Introduced more new bugs, but also correlates highly with use of new debugging features.

Pair Programming: When and Why it Works

Ethnographic study of pair programming at two companies. No results yet.

PP2SS - From the Psychology of Programming to Social Software

No body.

Preliminary Study to Empirically Investigate the Comprehensibility of Requirements Specifications

Pilot to study whether Java implementation of Irish electoral system is easier to understand than the legal language.

8 software eng postgrads. Comprehension questionnaire. Java group scored worse and expressed more confusion and frustration in talk-aloud.

Psychometric Assessment of Computing Undergraduates

Aptitude Profile Test Series (sounds like an IQ test) administered to 34 students. Correlates 0.4 with exam results.

Demographic survey of 197 students. Country of birth and first language significantly affecte exam results - non-english students fare worse.

APTS and demographic survey of 80 students. Students with prior experience in programming fared better. APTS correlates 0.3-0.4 with exam results.

Notes sample bias - students that volunteered are much more motivated than those that didn't.

Rating Expertise in Collaborative Software Development

45 pair programmers asked to rate ability and experience for themselves and for peers. Unclear results.

Representation-Oriented Software Development: A Cognitive Approach to Software Engineering

Deja vu. Published effectively the same paper a few years earlier.

Roles of Variables in Experts' Programming Knowledge

13 professional programmers asked to sort variables into groups based on similarity. Claims that grouping agrees with roles, but not clearly falsifiable.

Short-Term Effects of Graphical versus Textual Visualisation of Variables on Program Perception

12 students. Field-dependency test. Taught variable roles. Visualisation of pascal program, asked to write program summaries. Questionnaire about tools.

Unconvincing analysis.

Sidebrain: A Sidekick for the Programmer's Brain

Tool that records stack of open tasks, notes and queue of pending tasks.

(Particularly interesting, because I made a similar tool for myself but didn't use it for very long.)

No evaluation.

Software Authoring as Design Conversation

Talk-aloud of subjects working on sheep-dog game. No real results yet.

The influence of Intra-Team Relationships on the Systems Development Process: A Theoretical Framework of Intra-Group Dynamics

Proposes framework for studying social dynamics in teams.

The Influence of Motivation and Comfort-Level on Learning to Program

57 students. Motivated Strategies For Learning Questionnaire. Modified version of Rosenberg Self-esteem questionnaire. Modified version of Computer Programming Self-Efficacy Scale.

Samples test scores were representative of whole class.

Intrinsic motivation and self-efficacy both correlated with performance at ~0.5.

The Programming Language as a Musical Instrument

Argues that live coding is an interesting topic for PPIG.

The Psychology of Invention in Computer Science

Mines existing published interviews with famous computer scientists for quotes on creativity.

The Role of Source Code within Program Summaries describing Maintenance Activities

Studying code excerpts from Java mailing lists. Couldn't reject the null hypothesis. Not sure I understand the null hypothesis anyway.

Theoretical Considerations on Navigating Codespace with Spatial Cognition

Plans to study whether mental models of code build on existing spatial navigation strategies.

Using Roles of Variables in Teaching: Effects on Program Construction

20 students. Group 1 taught traditionally. Group 2 taught using roles. Group 3 taught using roles and animator. Only group 3 had any students who successfully completed programming task.

2006

Cognitive Perspectives on the Role of Naming in Computer Programs

Unsurprising observations on names.

Comparing API Design Choices with Usability Studies: A Case Study and Future Directions

Plans to study general principles of api design rather than just specific apis.

Empirically Refining a Model of Programmers' Information-Seeking Behavior during Software Maintenance

2 programmers talk-aloud at work. Try to generalize from this a model of how programmers look for information.

Initial Experiences of Using Grounded Theory Research in Computer Programming Education

Still talking about grounded theory without yet reporting any results.

Metaphors we Program By: Space, Action and Society in Java

Studying spatial and social metaphors in javadocs. Don't see any actual examples, just graphs.

Program Visualization: Comparing Eye-Tracking Patterns with Comprehension Summaries

Does gaze predict comprehension? Highly forking, no strong signal.

Reading as an Ordinary and Available Skill in Computer Programming

Ethnographic studies at software company. I don't really know how to summarize this, but it was interesting.

Stories from the Mobile Workplace: An Emerging Narrative Ethnography

Ethnographic study. No real information.

Subsetability as a New Cognitive Dimension?

Studying programming environments by the extent to which one can make useful programs without leaving various subsets of features.

Demonstrates that eg hello world in C or Java involves a large subset of the language features.

Arrange feature subsets as a lattice and examine what can be done in each subset.

Teaching Programming: Going beyond "Objects First"

Object-first teaching leads to students who can't code for-loops. Proposes variable-first teaching - start with teaching variable roles and control flow.

Testing Programming Aptitude

Previous study noted that consistency of mental model in early tests was predictive of long-term success in programming courses. Criticized for being too vague to replicate. Introduces a new marking scheme that doesn't require marker judgment.

The Development Designer Perspective

15 devs and product managers interviewed on design decision process. Little input from professional designers during process. Main factor in decisions was consistency with existing design.

The Effect of Using Problem-Solving Tutors on the Self-Confidence of Students

Tutor program combines explanation with multiple-choice tests. Improvement in test results after use was significant but effect size is small.

Threshold for the Introduction of Programming: Providing Learners with a Simple Computer Model

Somewhat unclear. Talks about 'threhold concepts' - concepts that once learnt change the way the student views computing. Then presents a simplified model of a computer.

Towards understanding Source and Configuration Management tools as a method of introducing learners to the culture of software development

Proposes vocab for discussing scm in terms of Vygotskyian model. I don't actually see any vocab though.

Users' participation to the design process in a Free Open Source Software online community

Studies Python Enhancement Proposals. Interviewed 10 active python community members. No obvious conclusions.

Why Don't They Do What We Want Them to Do

Interviewed (all 60?) students. Asked why they didn't use state diagrams for their concurrent systems homework.

Students who used state diagrams did better on their homework.

Widespread reports that state diagrams take too much effort.

Collaboration in mature XP teams

Ethnographic study of XP team. Nothing surprising.

Open source software communities: current issues

Argues OSS communities need to be studied more.

XP and Pair Programming practices

Survey of existing work on XP and pairing.

Agile Stories: Agile Systems and Narrative Research

Anecdote about a programmer with poor communication skills.

A gentle overview of software visualisation

Does what it says on the tin.

Challenges, Motivations, and Success Factors in the Creation of Hurricane Katrina "Person Locator" Web Sites

Interviewed team leaders of 6 person locator websites. None looked for existing sites or teams before starting. All resisted aggregators combining their content.

An Experiment on the Effects of Program Code Highlighting on Visual Search for Local Patterns

21 students. Asked to find various program fragments. Two syntax highlighting schemes had no significant effect vs black-on-white control.

Abstraction levels in editing programs

Comparing structured editing to text editing. No results yet.

A Competence Model for Object-Interaction in Introductory Programming

Proposes hierarchy of understanding for OOP execution model. 125 students given questionnaire followed by test. Results appear to agree with hierarchy, but some questionable change to data before analysis.

2007

Usability Assessment of a UML-based Formal Modelling Method

Model that combines UML and B. Compared to B alone. 41 students assigned comprehension and modification tasks. UML-B scored higher.

10 students given CD questionnaire on UML-B.

The Learning of Recursive Algorithms from a Psychogenetic Perspective

Give students recursive definition of a language, ask them various questions about it and then ask them to write various recursive functions on the language.

Talks up the benefits of this approach but it's not clear to me from a quick reading what the difference is.

Student Attitude Towards Automatic and Manual Exercise and Evaluation Systems

Survey given to 455 students with 23% reply rate. Students trust automated systems more. Worried that human contact would be reduced.

SQ Minus EQ can Predict Programming Aptitude

19 students given Autism Research Centres EQ and SQ tests. Both correlate with programming test (.44, -.45, p~=.05), combined score correlates more (.67, p=.002). (Combined score also known to correlate with autism spectrum).

Suggested explanation is that high SQ-EQ makes students more likely to rack up hours messing with computers.

Spatial Ability and Learning to Program

49 students in MSc IT. Mental rotation test correlates with success at 0.48. Acknowledges that students with high spatial ability are known to be more likely to pick engineering courses.

Problem Solving in Programming

Argues that students struggly in programming because they lack basic problem solving skills ala Polya. Propses tool that nudges students through the basic process eg making them list what data is known and what unknowns need to be solved.

Moods and Programmers' Performance

Make 72 programmers watch short video clips before debugging test. Small effect on arousal axis, but poor power.

Introducing Learning into Automatic Program Comprehension

Use machine learning for understanding programs. Example of extracting variable roles. Not clear what the learning algorithm is or how it's applied.

From Procedures to Objects: What Have We (Not) Done?

Procedural to OO has been a big shift for teaching. Can't assume that psych results will carry over.

Some of the that is claims are totally unstudied seem to have been published in previous PPIG papers.

Expert strategies for dealing with complex and intractable problems

In-field observation and interviews of 10 programmers working on large complex problems. No body.

Example of Using Narratives in Teaching Programming: Roles of Variables

Anecdotal observations of teaching kids to program using narratives and variable roles - placing kids in the role of a particular variable and playing out live.

ESCAPE Meta Modelling in Software Engineering: When Premature Commitment is Useful in Representations

Advocates for process of explicitly stating models and (manually or automatically) looking for contradictions in reality to refine the model.

Main example is software tool for expressing explicit models of how software operates. User maps model to code. Tool shows where system violates their model eg calls between two components that don't have a dependency in the model.

Assisting Concept Location in Software Comprehension

Code search that is aware of documentation? Too much jargon, no idea what the tool actually does.

Analysing and Interpreting Quantitative Eye-Tracking Data in Studies of Programming: Phases of Debugging with Multiple Representations

Eye-tracking data is hard to analyze. Complex tasks, difficult to map into simple components for hypothesis testing.

Broke previous data into small chunks and looked at trends over time and switching patterns between activities.

An Experiment on the Effects of Engagement and Representation in Program Animation Perception

PlanAni again. 24 students. Two tests under eye-tracking - 1) predict variable values 2) choose inputs to produce particular values.

Various numbers produced but no idea what they were actually trying to find out.

An Experiential Report on the Limitations of Experimentation as a Means of Empirical Investigation

Failed experiment. Subject sampling/participation messed up by other commitments. Much of the data is missing because subjects didn't get around to writing it down. Two groups behaved so differently in collaborative stage as to be effectively different experiments.

A Roles-Based Approach to Variable-Oriented Programming

Proposes language design to focus on variable roles.

A multidimensional framework for analysing collaborative design: emergence and balance of roles

No body.

A Coding Scheme Development Methodology Using Grounded Theory for Qualitative Analysis of Pair Programming

Attempting to use GT initially failed. Transcription of recording was impractical so directly annotated the video. Overwhelmed by possible choice of features because they had no actual goal.

Suggests choosing feature categories before starting coding, using structured names for concepts, making a UML model for results, pair coding.

A Categorization of Novice Programmers: A Cluster Analysis Study

Want to be able to group students by current skill level for better targeted teaching.

Give a test to 254 students that follows Blooms Taxonomy. Cluster students by results - observed non-linear skill progression.

2008

XP Team Psychology - An Inside View

Anecdotal report from team of PhD students using XP.

What Happens During Pair Programming

GT concepts for pair programming.

Using Mapping Studies in Software Engineering

Several examples of mapping studies. Notes that coverage is low, both in the mapping studies themselves and in the areas they address.

Towards a Computer Interaction-Based Mood Measure Instrument

Trying to determine mood from mouse and keyboard logs. Analysis is awful.

Thinking about Thinking in Objects: Methods, Findings and Implications from a Psychological Perspective

OOPy folks argue that thinking in terms of objects is innate. Relates various psych findings that don't seem to actually be related to OOP objects. Confusion over the word.

The Stores Model of Code Cognition

Models code cognition in terms of various memory types.

The Importance of Cognitive and Usability Elements in Designing Software Visualization Tools

Designing software with multiple views. Not strongly justified.

The Abstract is an Enemy: Alternative Perspectives to Computational Thinking

Reality is messy, has to be abstracted away to fit in tidy computers. What could go wrong?

Not very convincing examples.

Structured Text Modification Using Guided Inference

Tools that infers regexes from positive/negative examples. Evaluation with 6 subjects vs manual editing with Word and vs manual rename with Adobe Bridge. Large effect size.

Scientists and Software Engineers: A Tale of Two Cultures

Culture clashes between devs and scientists, especially in waterfall style dev.

Observing Open Source Programmers' Information Seeking

More mailing list data-mining. Conclusions are banal.

MBTI Personality Type and Student Code Comprehension Skill

74 students take MBTI test and code comprehension test. Introversion slightly predicts success.

Intuition in Software Development Revisited

Philosophizing.

Integrating Extreme Programming and User-Centered Design

Largely seems to be about integrating mock-ups and end-user testing into XP workflow.

An MCL Algorithm Based Technique for Comprehending Spreadsheets

Using Markov Clustering to try to retrieve logical cell blocks. Minimal evaluation.

A Study of Visualization in Introductory Programming

LOGO clone. Slight improvement in course participation in same year.

A Loop is a Compression

Suggests that loops are hard because there is not a 1-1 mapping between text and execution.

A Longitudinal Study of Depth of Inheritance and its Effects on Programmer Maintenance Effort

Data-mining Eclipse repo. Inheritance depth did not predict refactorings.

A Lightweight Systematic Literature Review of Studies about the Use of Pair Programming to Teach Introductory Programming

Testing out systematic lit review with 1 student. Seems to work.

A Comparison Between Student and Professional Pair Programmers

10 students and 6 professionals recorded pair programming. Significant differences in their behavior mean that past studies on students probably can't be extrapolated.

2009

Using computerized procedures for testing and training abstract comparative relations

Subjects given two comparisons eg A>B and B>C and have to report what mapping from A,B,C to small,medium,large can be deduced. Success rates between 50% and 100% depending on class of problem.

10 subjects. Three sessions. Small but significant increase in performance over time.

10 subjects. Replaced middle session with 'training session' - same interface but explains answers. Near 100% success rates on last session.

Good Programmers: Nature or Nurture? (The bed of Procrustes)

General discussion of nature vs nurture as it pertains to psychometrics. Doesn't really say anything specific about programming.

Types of Cooperation Episodes in Side-by-Side Programming

Observation of 10 grad students learning Java. GT concepts again.

Software Architects: A Different Type of Software Practitioner

Interviewed software architects about projects. Found that they weren't really focused on the projects so the interviews were pointless.

Mining Programming Language Vocabularies from Source Code

Text-mining JDK. Doesn't seem to have any new results from their past paper.

Meta-analysis of the effect of consistency on success in early learning of programming

6 attempted replications of consistent-mental-model-predicts-programming-ability experiment.

First failure had too very high pass rates - almost all of the students were consistent. Second failure was a test given after the course - expect that a programming course should teach students to pass the test (this objection doesn't actually seem to me to make sense).

Improved protocol, detailed in previous paper.

Next four replications from author and collaborators. All seem to be somewhat successful.

Initial Exploration of Eye Movements in Collaborative Work: Case Pair Programming

Eye-tracking during pair programming. Method and results are not really clear.

Further Observation of Open Source Programmers’ Information Seeking

Yet more mailing list mining. Low response rate. No other interesting observations.

Examining the Structural Features of Systems Developed in C++ and Java

10 C++ and 10 Java projects out of top 100 sourceforge downloads. Conclusions seem to be far too strong given the metrics examined eg low number of protected methods -> not enough use of information hiding.

Design Requirements for an Architecture Consistency Tool

Reflexion modeling for fighting architectural drift. 2 year longitudinal study at IBM Dublin. Discovered many violations of the model, but they generally weren't removed. Designed a CI version of the tool to catch violations as they happen.

Concrete Thoughts on Abstraction

Tries to break effective use of abstraction down into subskills.

Computer Code as a Medium for Human Communication: Are Programming Languages Improving?

12 subjects given code comprehension task in either Java or Scala under eye-tracker. The dense Scala code was more quickly understood.

Communication in Testing: Improvements for Testing Management

Survey answered by 23/60 professionals in single company. Reported poor overall visibility of test state, poor coordination between departments.

Cognitive levels and Software Maintenance Sub-tasks

6 professionals talk-aloud in maintenance tasks at work. Tries to code utterances into Bloom taxonomy. Not clear what they are looking for.

Can Named Ranges Improve the Debugging Performance of Novice Spreadsheet Users?

21 students asked to debug spreadsheet that uses named ranges. Uses previous experiment as control. Worse performance than control.

Tries to justify use of separate control by comparing error distribution to control group in a different (and properly randomized) experiment.

An Evaluation of inline source code browsing

Tool opens source code inline instead of jumping to different file.

7 subjects given various comprehension tasks. 14% faster with inline. No significance test.

A Course Dedicated to Developing Algorithmic Problem Solving Skills - Design and Experiment

Problem-solving course. Students reportedly approve. No other evaluation.

2010

A Cognitive Neuroscience Perspective on Memory for Programming Tasks

Basic overview of memory models.

A Logical Mind, not a Programming Mind: Psychology of a Professional End-User

Case study - 3 years, electronic patient record system in ICU. Customised in the field by users. Hired a professional programmer for 5 days to contrast their mental model with users.

Modifications made live with little version control tooling. Users willing to make long-term learning efforts, but only when there is a clear need/payoff. Not interested in understanding for it's own sake.

Programmer spent a lot of time trying to teach inheritance and talking about OOPy models of cars. Users completely exasperated at this apparent waste of time.

Bricolage Programming in the Creative Arts

Philosophizing.

Characterizing Comprehension of Concurrency Concepts

UML diagrams for concurrency. 15 students given comprehension test on concurrent system. Miscomprehensions are tangled chains of mistakes. Tries to categorize them by semantic level at which they occur.

Confirmation Bias in Software Development and Testing: An Analysis of the Effects of Company Size, Experience and Reasoning Skills

Tendency to write tests that confirm hypotheses rather than refute them.

88 subjects drawn from 4 companies and from grad school. Watson's rule discovery task and selection task. Students did better. Don't see a control for IQ or similar.

Empirically-Observed End-User Programming Behaviors in Yahoo! Pipes

Mined 30k pipes. Most pipes are DAGs (ie no use of looping constructs). Most pipes use a small subset of the available constructs. Most pipes are hardwired - number of exposed parameters fits exponential distribution.

Enhancing Comprehension by Using Random Access Memory (RAM) Diagrams in Teaching Programming: Class Experiment

39 students in control group taught with trace tables. 61 students taught with diagrams of memory layout. Different teachers. (Numbers suggest not randomly assigned?).

Simple programming exam. Tiny but significant effect size.

Enhancing User-Centredness in Agile Teams: A Study on Programmer's Values for a better Understanding on how to Position Usability Methods in XP

Interviewing XP programmers to determine their goals. No attempt to determine reliability or validity of their methods.

Evaluating Scratch to Introduce Younger Schoolchildren to Programming

Case study with Scratch. Given LOGO test at middle and end - small improvement. Main upside is more enthusiasm, less frustration (compared to?).

Liveness in Notation Use: From Music to Programming

Breaks liveness into levels. Compares different tasks in programming and music.

Neat diagrams of feedback loops in various systems.

Perceived Self-Efficacy and APIs

Designed self-efficacy questionnaire for APIs. Don't understand their attempt to prove that the test is reliable. They note that their other experiments have terrible power.

Project Kick-off with Distributed Pair Programming

Case study of distributed pair programming. Small number of sessions, small number of subjects.

Students’ Early Attitudes and Possible Misconceptions about Programming

Ask students about recollections of efficacy at various points in time. Results are confused.

Teaching Novice Programmers Programming Wisdom

List of heuristics novices might benefit from.

The Construction of the Concept of Binary Search Algorithm

Teaching binary search. Same as previous paper, I can't figure out what the contribution is.

The use of MBTI in Software Engineering

Looks at previous studies of distribution of MBTI types in devs.

Usability Requirements of User Interface Tools

Programming interactive apps is hard. Proposed advances from academia have not taken off. Big list of properties that make interactive software different to purely computational software.

2011

Study attitudes and behaviour of CS1 students – two realities

IACHE inventory given to 72 high school students in Brazil and 258 in Portugal. No conclusive differences between the two.

The cognitive dimensions questionnaire: adapting for non-expert users

Specializing CD questionnaire to POS software.

The influence of class structure on program comprehension

211 undergrads from 3 schools. Comprehension test on OOPy and non-OOPy programs in VB and Java. Students given non-OOPy did worse, but looking at the graph the only major difference is in the 'Class' category of questions, which seems like an obvious conclusion?

The programming-like-analysis of a innovative media tool

Media designers often need to reuse 'code' eg designing same dvd menu for many different markets. Presents a tool that separates style from content.

Understanding program complexity: an approach for study

Proposes trying to relate eye-tracker data to complexity metrics.

Useful but tedious: An evaluation of mobile spreadsheets

Subjects asked to query or edit typical office spreadsheets on a phone. Problems with cell selection, character selection, inconsistency between platforms, having to focus on keyboard while editing.

User configurable machine vision for mobiles

Software that lets users train classifier on their own visual syntax, map properties of syntax to synth inputs and then play music by pointing the camera at notation.

Many usability complaints.

Using formal logic to define the grammar for memory transfer language (MTL) on the mould of register transfer language (RTL) and high level languages

Proposes a sort of high-level virtual machine with visualization + replay as a teaching tool.

What makes software engineers go that extra mile?

Interviews with 13 professionals. Motivated by the work itself, and the aesthetics for their code. Demotivated by external obstacles to producing satisfying code.

Why PPIG matters beyond the P

No body

Systematically Improving Software Reliability: Considering Human Errors of Software Practitioners

CS should borrow ideas from human error research.

A case study on the usability of NXT-G programming language

12 students of various ages. Even experienced programmers struggled with the tasks. Loop blocks confused most subjects.

An empirical study of the influence of OCL on early learners

36 students taught an OOPy UML. When asked to choose between two models, more preference for simple models when expressed as UML vs informal.

Automatic algorithm recognition based on programming schemas

Classifier for sort algorithms with 87% accuracy on student submissions. Wants to improve it so it can give automatic feedback to students.

(One of the MIT MOOCs uses a semi-interactive classifier to help grade and give feedback on the huge number of student submissions, so it's plausible in practice.)

Class participation and shyness: affect and learning to program

CS students are often shy, and report difficulties with asking questions in class or asking for help from tutors.

Evaluation of multimedia stream processing modeling language from the perspective of cognitive dimensions

Applying CD to design to a language.

Investigation of qualitative human oracle costs

Make generative testing map to more concrete natural inputs where appropriate eg random names rather than random strings. No evaluation yet.

Measurement and visualisation of software timing properties

Timing visualisation tool for hard real-time. Proposes researching such.

On the cognitive foundations of modularity

Trying to relate OOP to cognitive uses of abstraction, I think?

Origins of poor code readability

Speculates on reasons for poor code. Nothing surprising.

Programming with the user in mind

Live Sequence Charts. Seems like programming in terms of user stories.

Interviews with students suggest that it leads them to view the system from the outside rather than from the inside.

Robot dance: edutainment or engaging learning

Single session outreach. 135 school students in total.

Pre/post programming tests show improvements in most areas, but suspicious that it had to be broken down into areas to isolate the negative area.

Focus on knowledge seems weird for outreach anyway, would have expected them to survey attitudes.

Self-Reporting emotional experiences in computing lab sessions: an emotional regulation perspective

Students report emotional state in class either via desktop widget or via a hand-held ball gadget thing. Students liked the ball as a fidget toy and as a signaling mechanism.

I kind of want one.

2012

A Field Experiment on Gamiﬁcation of Code Quality in Agile Development

Tool that assigns reputation to code, and then to programmers based on their interaction history with that code.

Past experiments found correlations of .64 and .88 with actual reputation (from surveying peers?).

Field study in postgrad lab. Does not appear to increase actual code quality. Students reported perception of unfairness and opacity. Reputation score not seen as a high priority.

A Study about Students’ Knowledge of Inductive Structures

Piaget strikes again. Still have no idea what I'm supposed to be learning from these stories.

Computer Anxiety and the Big Five

Computer Anxiety Rating Scale and five-factor personality test given to >100 biz students. Agreeableness and emotional stablity negatively correlated with computer anxiety, explaining 38% of variance.

Conducting Field Studies in Software Engineering: An Experience Report

Common problems with field studies. Some are surprising, eg the need for extension cables.

Evaluating application programming interfaces as communication artefacts

Not clear.

Evaluation of Subject-Specific Heuristics for Initial Learning Environments: A Pilot Study

Proposes list of qualities to bear in mind when designing learning environments.

Gave either old list or new list to 9 students and ask them to do usability reviews. No clear conclusions.

Exploring the design of compiler feedback

Prototype compilers that give feedback on possible performance problems. Hard to evaluate because users mostly disabled the popups.

Gaze Evidence for Different Activities in Program Understanding

Pair programming under eye-tracking. Different gaze patterns between experts and novices.

In search of practitioner perspectives on ‘good code’

Software craftmanship.

Investigating the role of programmers’ peripheral vision: a gaze-contingent tool and an experiment proposal

Proposes testing whether restricted focus tools alter programmers behavior.

Learning Programming by using Memory Transfer Language (MTL) without the Intervention of an Instructor

92 students get normal instructor. 14 volunteer students get MTL tool instead. Course score very slightly higher for volunteers.

Observing Mental Models in Novice Programmers

Adjusted consistent-model test for online testing.

126 high school students in UK. Almost a third scored highly, mostly with an underlying model of parallel execution of lines. Post-test interviews confirmed the models recognized by the test by were accurate. Students who were labeled unrecognized reported multiple models, switching models partway or 'shrug'.

92 undergrad students in Mexico. Around half marked algorithmic, similar to original experiment.

Interviews used to refine questions and marking scheme for future tests.

Schema Detection and Beacon-Based Classiﬁcation for Algorithm Recognition

Expanding their algorithm classifier. On 222 implementations from textbooks and webpages, gets 94% accuracy.

Sketching by Programming in the Choreographic Language Agent

Interesting. Not sure what to summarize.

Some Reflections on Knowledge Representation in the Semantic Web

Semantic web ontologies organized around strict hierarchies. Probably not a good map to human knowledge.

Teaching Novices Programming Using a Robot Simulator: Case Study Protocol

Planning to use virtual robots for teaching.

Thrashing, Tolerating and Compromising in Software Development

Interviews with 7 devs from a single (academic?) institution. Tries to categorize response to errors. Thrashing - undirected, random. Tolerating - just ignore the error. Compromising - hack something together for the sake of progress.

2013

I don't know why there is only one paper listed for 2013.

Using meta-ethnography to synthesize research: A worked example of the relations between personality and software team processes

Not totally clear on what meta-ethnography entails. Description is surprisingly similar to the research process described in How To Read A Book.

2014

Applying Educational Data Mining to the Study of the Novice Programmer, within a Neo-Piagetian Theoretical Perspective

Proposed data-collection PhD project.

Google Sheets v Microsoft Excel: A Comparison of the Behaviour and Performance of Spreadsheet Users

Biz students given spreadsheet tasks. Text and number entry was faster in google sheets, eyeballed at about 2/3 mean time. Formula entry faster in Excel.

Kind of a pointless comparison.

Neo-Piagetian Theory and the Novice Programmer

No new results in here.

Self-explaining from Videos as a Methodology for Learning Programming

Proposed PhD project.

A case of computational thinking: The subtle effect of hidden dependencies on the user experience of version control

Interviews with devs at Google and Autodesk on their use of VCS. Common themes: high understanding of concepts, ritualized interaction staying with narrow paths, signs of fear and uncertainty. (Slapping down the oft repeated assertion that git is only confusing if you don't understand the underlying model).

Speculates that dependencies on undisplayed state (eg staging area, current branch) and premature commitment (eg many operations are potentially destructive to the working tree) are the main culprits, which is why existing guis don't alleviate the problem.

A cognitive dimensions analysis of interaction design for algorithmic composition software

Suggests representation of time as the most problematic area.

Activation and the Comprehension of Indirect Anaphors in Source Code

Proposed experiment.

Affective Learning with Online Software Tutors for Programming

Two online tutors. >2000 students from 56 schools. Affective learning questionnaire - no significant interactions with gender, race or subject. Male students and caucasian/asian students scored better on pre-tests for arithmetic tutor. Non-CS students improved more between pre- and post-test.

Blinded by their Plight: Tracing and the Preoperational Programmer

Ask students to trace code, and what the code is supposed to do. Identifies a group of students who can trace code and can't explain it. In talk-aloud, these students struggle to move from concrete value to abstract sets.

(Implies that laddering up is a additional skill on top of execution models.)

Cognitive Flexibility and Programming Performance

298 subjects. Task-switching test. Results comparable to similar published tests.

65 students with some experience + 45 novices. Significant correlation between task switching test score and average grade at -.34, but no significant correlation with final exam grade or credits received.

Concept Vocabularies in Programmer Sociolects

40 C++ samples from 14 students. Extract identifiers. No correlation between years of experience and number of not-in-stdlib concepts.

Developing Coding Schemes for Program Comprehension using Eye Movements

We don't know how to interpret all this eye-tracking data. Proposes a scheme for developing coding schemes for eye-tracking data.

Educational Programming Languages: The Motivation to Learn with Sonic Pi

12 novices. Given tutorials on either Sonic Pi or Kids Ruby. Questionnaire on motivation and background. Sonic Pi users typed more, recalled more commands. No notable differences in qualitative questions.

Evaluation of a Live Visual Constraint Language with Professional Artists

Anecdotal reports. Not summarizable.

Exploring Core Cognitive Skills of Computational Thinking

Looking for tests that predict computational thinking. Tried D.48 (abstraction, relation, analagy), spatial reasoning test, GATB (arithmetic reasoning and tool matching subsets only).

12 students. Used raw test scores only. Sample too small to obtain significant results, but D.48 and spatial reasoning show correlations ~.5 with exam grades.

Exploring Creative Learning for the Internet of Things era

Project helping 5 artists work with RPi. Frustration with setup cost and lack of portability.

Exploring Problem Solving Paths in a Java Programming Course

101 students in intro course with 100 assignments, broken into 170 tasks. Automated test system collects snapshots every time it is run.

Majority of students build programs up incrementally and pass tests one by one.

Falling Behind Early and Staying Behind When Learning to Program

Similar consistent-model test. 360 students at 2 unis. High and significant correlations with eventual grades.

Ghosts of programming past, present and yet to come

Live programming, computers as personal expression.

Learning Syntax as Notational Expertise when using DrawBridge

Programming environment that supports both direct manipulation, block editing and raw text editing.

21 school students. Taught in 4 groups, each introducing interactions in different orders. Pre/post-test of syntax knowledge.

Analysis is confusing, but conclusion suggests that introducing block editing before text editing was beneficial.

Linking Linguistics and Programming: How to start?

Sociolinguistics? Demonstrates that natural language approaches don't carry over directly.

Nonvisual Visual Programming

Noodle, a VPL for blind children. Work in progress.

Reasoning about Complexity - Software Models as External Representations

Distributed cognition. Anecdotal reports from users of NetLogo.

The Object-Relational impedance mismatch from a cognitive point of view

Trying to tease out what people in various domains mean by 'object' by showing them tables of data and asking how many objects are in view.

2015

An empirical investigation of code completion usage by professional software developers

6 devs, first given programming task and then later asked to narrate the recording. Code completion used for checking method names, api exploration, catching bugs (eg if completion list looks wrong, look for bugs in code leading up this line).

Analysing Java Identifier Names in the Wild

Mining names in 60 java projects. Not sure what the point is.

Confidence, command, complexity: metamodels for structured interaction with machine intelligence

Properties that need to be exposed in end-user machine learning tools. How sure is the answer, how well does the program understand the domain, how complex was the method used to arrive at the answer.

Evaluation of Mental Workload and Familiarity in Human Computer Interaction with Integrated Development Environments using Single-Channel EEG

Using EEG to measure cognitive load. Not clear how useful it is compared to just asking.

Harmonious Authorship from Different Representations

Describes a programming model meant to allow many different editing representations. Without either examples or formal specs it's hard to understand how it's intended to work.

Improving Readability of Automatically Generated Unit Tests

Trains a readability model for tests. Outperforms generic readability models. Develop optimization process that produces average readability improvement of 1.9%.

Intuitive NUIs for Speech Editing of Structured Content

Proposed redesign of speech-controlled tool for writing math formulae (which is a weird idea in the first place - even between humans face-to-face there is a still a preference for writing formulae).

Tales from the Workshops: A sequence of vignettes, episodes, observations, and reflections on many years of trying to teach people programming

Hard to find small problems - even experienced programmers can't do as much as they think in 40 min class.

Adults struggle to learn because afraid of / unwilling to make mistakes.

The construction of knowledge of basic algorithms and data structures by novice learners

Again, where is the content.

The dream of a lifetime: an opportunity to shape how our children learn computing

Ideas > tech. Computational thinking > programming. Progress in new UK curriculum.

The impact of syntax colouring on program comprehension

10 subjects under eye-tracker. Mental execution task. Syntax highlighting reduces completion time. Stats are a bit suspect.

Notably, under syntax highlighting there are much fewer fixations on keywords. (Maybe colour enables peripheral vision?)

The impact of Syntax Highlighting in Sonic Pi

10 subjects. Writing and debugging tasks. Significantly faster completion with syntax highlighting (looks like ~25% faster, weird lack of numbers in this section). Programming and musical experience have no significant effect.

Understanding code quality for introductory courses

How to teach students about quality of code. Not much here yet, all looking forward to future work.