


Dennis Norris - Research page


Most of my research involves developing computational models of perceptual and cognitive processes such as reading, speech recognition and memory. This is complemented by both behavioural and neuroimaging research.


Why build computational models?

Even very simple theories often behave in quite complex ways. The only way to be absolutely sure what a theory predicts is to implement it as a computer program. As with any other theory, a good computational model should make testable predictions that drive further research. The importance of a good theory is exemplified in this quotation from Charles Darwin: "About 30 years ago there was much talk that geologists ought only to observe and not theorise; and I well remember someone saying that at this rate a man might as well go into a gravel pit and count the pebbles and describe the colours. How odd it is that anyone should not see that all observation must be for or against some view if it is to be of any service!" (Charles Darwin, September 18, 1861). A book chapter I wrote a few years ago (Norris, 2005) provides an informal discussion of the role of computational modelling in developing cognitive theories.


What kind of computational models?

Bayesian models. All of my recent work has been devoted to Bayesian modelling. The general idea has been to see how well we can explain aspects of perception or memory by assuming that people use the available perceptual resources in a near-optimal manner. This doesn't mean that I believe that people really do behave optimally - clearly they don't - but this is a good place to start, and turns out to give some very simple explanations for phenomena that have been difficult to explain in other frameworks.




The Bayesian Reader

How do people read? Let's start by pretending that human perception is perfect. How would a perfect, or optimally designed, system work? Might people come pretty close to behaving like an optimal system? Perhaps rather surprisingly, it seems that they do. If we assume that perception works by collecting noisy evidence from the input (in this case, from the earliest stages of the visual system), we can construct a formal model of how people should behave when reading individual words, or when performing common laboratory tasks such as deciding whether letters form real words or nonsense words. This is the principle behind the Bayesian Reader model (Norris, 2006, 2009; Norris & Kinoshita, 2012). This simple idea turns out to give a principled explanation of a wide range of experimental data on reading.
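As a rough illustration of the evidence-accumulation idea (a toy sketch, not the published Bayesian Reader code), the example below treats each word as a point in a letter-by-position space, draws noisy samples around the word actually presented, and updates a posterior over a small lexicon. The one-hot letter coding, the Gaussian noise, the tiny lexicon and the names `encode` and `posterior_after_samples` are all assumptions made for this illustration.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def encode(word):
    """Toy input code (an assumption): one-hot letter identity per position."""
    return [1.0 if ch == a else 0.0 for ch in word for a in ALPHABET]


def posterior_after_samples(target, lexicon, n_samples, noise_sd=2.0, seed=0):
    """Posterior over words after n_samples noisy samples of `target`.

    `lexicon` maps equal-length words to frequency counts, which act as
    priors. Each sample is the target's vector plus Gaussian noise.
    """
    rng = random.Random(seed)
    total = sum(lexicon.values())
    # Start from the log prior, proportional to word frequency.
    log_post = {w: math.log(f / total) for w, f in lexicon.items()}
    vectors = {w: encode(w) for w in lexicon}
    true_vec = vectors[target]

    for _ in range(n_samples):
        sample = [x + rng.gauss(0.0, noise_sd) for x in true_vec]
        for w, vec in vectors.items():
            sq_dist = sum((s - v) ** 2 for s, v in zip(sample, vec))
            # Gaussian log-likelihood; the constant term is the same for
            # every word and cancels when the posterior is normalised.
            log_post[w] -= sq_dist / (2.0 * noise_sd ** 2)

    # Normalise in a numerically stable way.
    top = max(log_post.values())
    unnorm = {w: math.exp(lp - top) for w, lp in log_post.items()}
    z = sum(unnorm.values())
    return {w: u / z for w, u in unnorm.items()}


if __name__ == "__main__":
    toy_lexicon = {"cat": 50, "cot": 20, "car": 30, "dog": 40}
    for n in (0, 5, 50, 200):
        post = posterior_after_samples("cot", toy_lexicon, n)
        print(n, {w: round(p, 3) for w, p in post.items()})
```

With no samples the posterior is just the frequency-based prior, so the high-frequency word is favoured; as samples accumulate, the evidence should come to dominate and the presented word should win even if it is less frequent.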


The latest version of this model (Norris & Kinoshita, 2012, Psychological Review) addresses the question of how people represent the order of letters in words. The model explains how we can read the famous "Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy" email.


Matt Davis and I have written a brief analysis of the claims in the "Cambridge email" here, and we've collected examples of the email translated into a number of other languages here.


Spoken word recognition: Shortlist A & B


Much of this work is conducted in collaboration with Anne Cutler and James McQueen. A central theme running through much of our research on spoken word recognition has been the problem of how we recognise continuous speech. In contrast to written language, where there are white spaces between the words, spoken language contains few reliable cues to the location of word boundaries. How do listeners solve the problem of recognising spoken words without first knowing where the words actually are in the input? How do listeners avoid thinking that they hear 'cat', 'a', and 'log' when someone says the word 'catalogue'? We have been tackling these and related problems in spoken word recognition through a mixture of conventional experimental work, both on English and on other languages, and the development of a large-scale connectionist model of spoken word recognition called Shortlist (Norris, 1994). When we listen to someone speaking we get the impression of hearing a discrete series of words. However, computational models of human speech recognition such as Shortlist and TRACE (McClelland & Elman, 1986) assume that listeners unconsciously have to consider 'cat', 'a', and 'log' - and maybe 'cattle' - when they hear 'catalogue'. They then have to discover which of these alternatives forms the most likely interpretation of what they hear. This process of discovering the best interpretation is often described as lexical competition: different candidate words compete with each other to determine the optimum interpretation. Very much the same thing happens in automatic speech recognition systems. The main difference is that, if you are using a speech recognition program like Dragon Dictate, you can actually ask it to list the alternative interpretations it is considering. Human listeners are not so obliging. However, in a number of experiments we have shown that listeners really are considering many alternatives - even though we are never aware of it (McQueen, Norris and Cutler, 1994; Norris, McQueen and Cutler, 1995).
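The segmentation problem described above can be made concrete with a small sketch. It is not how Shortlist or TRACE work internally (they resolve competition through activation dynamics, not exhaustive search); it simply lists every lexical candidate embedded in an unsegmented input and every way of carving the input into words. Letters stand in for phonemes, and the spelling "catalog", the toy lexicon and the function names `embedded_words` and `parses` are assumptions for illustration.

```python
def embedded_words(signal, lexicon):
    """Return (start_index, word) for every lexicon word found in signal."""
    found = []
    for start in range(len(signal)):
        for end in range(start + 1, len(signal) + 1):
            chunk = signal[start:end]
            if chunk in lexicon:
                found.append((start, chunk))
    return found


def parses(signal, lexicon):
    """Return every way of splitting signal into a sequence of lexicon words."""
    if not signal:
        return [[]]  # one parse of the empty string: no words left
    results = []
    for end in range(1, len(signal) + 1):
        head = signal[:end]
        if head in lexicon:
            for rest in parses(signal[end:], lexicon):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    toy_lexicon = {"cat", "a", "at", "log", "catalog"}
    print(embedded_words("catalog", toy_lexicon))
    print(parses("catalog", toy_lexicon))
```

Even this tiny lexicon yields several embedded candidates and two complete parses, one of which is the unwanted 'cat' + 'a' + 'log'. A recognition model has to choose among such alternatives; this sketch only enumerates them.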


Shortlist B: A Bayesian model of continuous speech recognition.

James McQueen and I (Norris & McQueen, 2008) have developed a Bayesian version of the Shortlist model - Shortlist B. This paper introduces two major advances over the original Shortlist model. First, it uses a more realistic input derived from perceptual confusion data. Second, and much more importantly, it replaces the interactive activation framework of the original Shortlist model (now known as Shortlist A for 'activation') with Bayesian methods. The model's behaviour follows almost entirely from the simple assumption that listeners approximate optimal Bayesian recognisers. One consequence of this is that the model is much simpler than either the original Shortlist model or TRACE - it requires far fewer parameters. The model simulates data on speech segmentation, word frequency, and perceptual similarity. The paper also describes a Bayesian implementation of the Merge model (Norris, McQueen and Cutler, 2000), which is based on the procedures described in the Bayesian Reader model (Norris, 2006). The code and documentation for Shortlist B can be found here.
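To give a feel for how word frequency and perceptual confusability can combine in a Bayesian recogniser, here is a minimal sketch. It is not the Shortlist B code: the confusion weights are invented rather than taken from perceptual confusion data, letters stand in for phonemes, words are matched whole rather than in continuous speech, and the names `confusion_weight` and `word_posteriors` are hypothetical.

```python
# Hypothetical pairs of easily confused segments (illustrative only).
SIMILAR = {frozenset("bp"), frozenset("td")}


def confusion_weight(intended, heard):
    """Toy stand-in for P(heard | intended); values are invented."""
    if intended == heard:
        return 0.7
    if frozenset((intended, heard)) in SIMILAR:
        return 0.2
    return 0.01


def word_posteriors(heard, lexicon_freq):
    """P(word | heard), proportional to frequency times segment likelihoods."""
    scores = {}
    for word, freq in lexicon_freq.items():
        if len(word) != len(heard):
            continue  # this sketch only compares equal-length words
        likelihood = 1.0
        for w_seg, h_seg in zip(word, heard):
            likelihood *= confusion_weight(w_seg, h_seg)
        scores[word] = freq * likelihood  # prior times likelihood
    total = sum(scores.values())
    if total == 0:
        return {}
    return {w: s / total for w, s in scores.items()}


if __name__ == "__main__":
    toy_freq = {"bat": 10, "pat": 40, "cat": 100, "bad": 20}
    for word, p in sorted(word_posteriors("bat", toy_freq).items()):
        print(word, round(p, 3))
```

With these made-up numbers, hearing "bat" leaves the more frequent, perceptually similar "pat" slightly more probable than "bat" itself, which shows how a frequency prior and perceptual similarity can interact in a single posterior.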



My work on short-term memory combines computational modelling (for example, see section below on the Primacy model) with both behavioural and neuroimaging studies. Much of this work is with Kristjan Kalm at the CBU and with Susan Gathercole. Our current projects investigate the relationship between long- and short-term memory, how people learn order (e.g. learn your telephone number or the alphabet), how information is coded in memory, and whether memory can be improved with training (see Gathercole, Dunning, Holmes & Norris, 2019). Adopting the general Bayesian framework, much of this modelling tries to show that the way people behave is much as would be expected if they are trying to make the best use of limited-capacity memory systems.

Are short-term and long-term memory really different?

A very old question in memory research is whether there really are separate long- and short-term memory systems. Many of us thought that this was indeed a very old question that had a clear answer - the two are separate.  However, recently there has been a move to claim that short-term memory is nothing more than activated long-term memory.  In two recent papers (Norris, 2017; Norris, in press) I've explained why the idea of short-term memory as activated long-term memory just won't work.



We've recently taken a new look at the old issue of chunking in STM.  In Miller's (1956) famous paper "The magical number seven ..", Miller suggested that the capacity of STM was determined by chunks and not information or items. But what's a chunk and how does having one help? We reach a conclusion that's rather different from Miller's. You can find out more about this work in these two papers:

Chunking and redintegration in verbal short-term memory

What’s in a chunk? Chunking and data compression in verbal short-term memory


Bayesian filtering

A new paper (June 2024) takes the concept of Bayesian filtering and applies it to learning sequences in the context of the Hebb task:

Sequence learning as Bayesian filtering



The Primacy Model.

How is it that we can remember sentences or telephone numbers? More specifically, how can we remember the digits in a telephone number in the correct order? Although short-term memory (STM) is one of the most extensively studied topics in cognitive psychology, we still don't have definitive answers to these questions. But we have made considerable progress on this topic in recent years, and much of that progress is attributable to the development of new computational models. For many years, the most influential and productive theory of STM has been the Working Memory model of Baddeley and Hitch (1986). However, the Working Memory model only made qualitative predictions, and didn't provide a detailed explanation of how order could be represented in memory. To address this shortcoming, Mike Page and I developed a computational model of STM called the Primacy model (Page and Norris, 1998). Based on quite simple assumptions, the Primacy model gives a detailed quantitative account of a wide range of empirical data. In more recent work we have focussed on understanding the relation between long- and short-term memory. In particular, we are interested in the role that short-term memory plays in learning new words (Cumming, Page, & Norris, 2003; Page, Cumming, Norris, Hitch, & McNeil, 2006).






[full publication list here]


Kinoshita, S., & Norris, D. (2009). Transposed-letter priming of prelexical orthographic representations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(1), 1-18.  [pdf]


McQueen, J. M., Jesse, A., & Norris, D. (2009) No lexical-prelexical feedback during speech perception or: Is it time to stop playing those Christmas tapes? Journal of Memory and Language, 61(1), 1-18. [pdf]


Norris, D. (1994). Shortlist: A connectionist model of continuous speech recognition. Cognition, 52(3), 189-234.


Norris, D. (2006). The Bayesian Reader: Explaining word recognition as an optimal Bayesian decision process. Psychological Review, 113(2), 327-357. [pdf]


Norris, D. (2005) How do computational models help us build better theories? In A. Cutler, (Ed.) Twenty-First Century Psycholinguistics: Four Cornerstones. [pdf]


Norris, D. (2009) Putting it all together: A unified account of word recognition and reaction-time distributions. Psychological Review, 116(1), 207-216. [pdf]


Norris, D. & Kinoshita, S. (2008) Perception as evidence accumulation and Bayesian inference: Insights from masked priming. Journal of Experimental Psychology: General, 137(3), 434-455.  [pdf]


Norris, D. & Kinoshita, S. (2012) Reading through a noisy channel: Why there's nothing special about the perception of orthography. Psychological Review, 119(3), 517-545.


Norris, D., Kinoshita, S. & van Casteren, M. (2010) A stimulus sampling theory of letter identity and order. Journal of Memory and Language. [pdf]


Norris, D. & McQueen, J.M. (2008) Shortlist B: A Bayesian model of continuous speech recognition. Psychological Review, 115(2), 357-395. [pdf]


Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences, 23(3), 299-370.


Page, M. P. A., & Norris, D. (2008) Is there a common mechanism underlying word-form learning and the Hebb repetition effect? Experimental data and a modelling framework. In A. Thorn & M. P. A. Page (Eds.), Interactions Between Short-Term and Long-Term Memory in the Verbal Domain. [pdf]


Page, M. P. A., & Norris, D. (1998). The primacy model: A new model of immediate serial recall. Psychological Review, 105(4), 761-781. [pdf]


All of the empirical work here relies on support from Jane Hall.