
The Bayesian Reader

The BayesVisual program


This document gives a brief account of how to use the version of the
Bayesian Reader program described in the paper "The Bayesian Reader:
Explaining word recognition as an optimal Bayesian decision process."
The simulations reported in the paper were run using the program BayesVisual.exe
running under Windows. The usability of this version of the program leaves
a lot to be desired, and some of the parameter names are misleading,
but the main parameters one might want to vary are those controlling what
output is produced (thresholds or points in time/steps where output should
be printed). Note that there is no facility for producing summary statistics
across a set of words. We do this with awk scripts on Unix.
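
The awk scripts themselves are not reproduced here. As a rough stand-in, the
Python sketch below averages the mean Yes RT across words for each threshold
pair. It assumes the PooledWordThreshold output lines have the layout shown in
the Output section; the function name and the field positions are assumptions
made for illustration and are not part of BayesVisual.

```python
from collections import defaultdict


def summarise_thresholds(lines):
    """Average the per-word mean Yes RT for each (yes, no) threshold pair.

    Assumes output lines such as
    'PooledWordThreshold 0.8 0.8 150 50 50 0 0 Yes 50 290.44 No 0 0 TO 0'.
    Words with no Yes responses at a threshold are skipped for that threshold.
    """
    totals = defaultdict(lambda: [0.0, 0])  # thresholds -> [sum of RTs, number of words]
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "PooledWordThreshold" or "Yes" not in fields:
            continue  # not a threshold result line
        yes_at = fields.index("Yes")
        n_yes = int(fields[yes_at + 1])
        mean_yes_rt = float(fields[yes_at + 2])
        if n_yes == 0:
            continue
        key = (float(fields[1]), float(fields[2]))
        totals[key][0] += mean_yes_rt
        totals[key][1] += 1
    return {key: rt_sum / n for key, (rt_sum, n) in totals.items()}


# Two result lines taken from the sample output for the word "type".
sample = [
    "PooledWordThreshold 0.8 0.8 150 50 50 0 0 Yes 50 290.44 No 0 0 TO 0",
    "PooledWordThreshold 0.85 0.85 150 50 50 0 0 Yes 50 313.4 No 0 0 TO 0",
]
print(summarise_thresholds(sample))
```

In practice the lines would come from one or more output files, so that each
threshold pair is averaged over every word in the stimulus set.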

Because of the amount of computation required, the program runs very
slowly, even on fast machines, so we break the simulations up into
sets of a few words and run them on multiple machines using Condor.
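
The splitting step can be sketched in Python as follows. This is only an
illustration: the helper names, the chunk size, and the output file naming are
assumptions, and submitting the resulting jobs to Condor is not shown.

```python
def split_stimuli(words, chunk_size=5):
    """Split a list of stimuli into consecutive chunks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def write_chunks(words, prefix, chunk_size=5):
    """Write each chunk to '<prefix>_<n>.txt', one stimulus per line.

    Returns the list of filenames written, in order.
    """
    names = []
    for n, chunk in enumerate(split_stimuli(words, chunk_size), start=1):
        name = f"{prefix}_{n}.txt"
        with open(name, "w") as handle:
            handle.write("\n".join(chunk) + "\n")
        names.append(name)
    return names
```

Each file written by write_chunks can then be passed as the stimulus_file of a
separate BayesVisual run.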


This version of the program is being made available because it is
what was used for the simulations. In our current work we use a new
version of the program in which the probabilities are initially computed
at the letter level. The new program allows us to investigate various positional
coding schemes, and can also simulate the masked priming task and generate
reaction-time distributions.

Lexical decision mode



Use: BayesVisual script_file lexicon_file stimulus_file output_file

A script_file to perform lexical decision looks something like this:

=== start of script file ======

Average 50
PooledWordThreshold 0.80 0.80 150
PooledWordThreshold 0.85 0.85 150
PooledWordThreshold 0.90 0.90 150
PooledWordThreshold 0.95 0.95 150
PooledWordThreshold 0.97 0.97 150
PooledWordThreshold 0.990 0.99 150


MaxSteps 2000
Steps 800 1000 1400
RankSize 10
RunInBackGround
CalcNonWordDensity
VirtualPseudoWord 2.0 500
StandardError 1.0

LexiconFile %1
CharacterFile letter_set
VLD %2
OutputFile %3

=== end of script file ======

The same script, with a comment explaining each parameter:


Average 50 // average over 50 runs for each word
PooledWordThreshold 0.80 0.80 150 // Lexical decision threshold:
// P(a word) threshold is 0.80,
// P(a non-word) threshold is 0.80; don't produce any output/decisions before step 150

PooledWordThreshold 0.85 0.85 150
PooledWordThreshold 0.90 0.90 150
PooledWordThreshold 0.95 0.95 150
PooledWordThreshold 0.97 0.97 150
PooledWordThreshold 0.990 0.99 150


MaxSteps 2000 // terminate after 2000 steps
Steps 800 1000 1400 // print details after these steps
RankSize 10 // print best 10 words
RunInBackGround // set to low priority in windows
CalcNonWordDensity // do the calculations to estimate the non-word
// density to correct for background non-words.
VirtualPseudoWord 2.0 500 // 2.0 is the distance of the virtual non-word.
// 500 is the frequency of the virtual non-word when only a single
// non-word is used. The frequency only applies when CalcNonWordDensity
// is not used.
StandardError 1.0 // standard deviation of sampling noise

LexiconFile %1 // first argument after the script_file is the lexicon_file
CharacterFile letter_set // file specifying letter vectors
VLD %2 // second argument after the script_file is the stimulus_file
OutputFile %3 // third argument after the script_file is the output_file

// The %Ns just indicate that the Nth argument should be assigned to this parameter.
// The %Ns could therefore be replaced by the full filenames and
// only the name of the script_file would need to be supplied as an argument.
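
The substitution amounts to plain text replacement. The Python sketch below
mimics it to produce a script with the filenames already filled in. It only
illustrates the convention described above and is not the program's own code;
the function name and the stimulus and output filenames are hypothetical.

```python
import re


def expand_script(script_text, args):
    """Replace each %N in script_text with the Nth argument (counting from 1)."""
    def lookup(match):
        index = int(match.group(1))
        if not 1 <= index <= len(args):
            raise ValueError(f"no argument supplied for %{index}")
        return args[index - 1]
    return re.sub(r"%(\d+)", lookup, script_text)


template = "LexiconFile %1\nVLD %2\nOutputFile %3\n"
# stimuli.txt and out.txt are placeholder names chosen for this example.
print(expand_script(template, ["lex4.txt", "stimuli.txt", "out.txt"]))
```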

Identification mode



To make the program print times for identification rather than lexical decision, use the
IDThreshold parameter, e.g.

IDThreshold 0.8 50 // threshold, minimum number of steps.
IDThreshold 0.85 50
IDThreshold 0.9 50
IDThreshold 0.93 50
IDThreshold 0.95 50
IDThreshold 0.97 50
IDThreshold 0.99 50

Neither CalcNonWordDensity nor VirtualPseudoWord has any effect in
identification mode.



In general, all you might want to change are the thresholds, VirtualPseudoWord, and StandardError.

File formats



The CharacterFile contains one row of real numbers per letter.
All of the simulations in the paper use the following CharacterFile:

a 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
b 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
e 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
g 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
j 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
k 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
l 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
m 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
n 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
o 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
p 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
r 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
s 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
t 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
u 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
v 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
w 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
x 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

You can use fewer or more columns/features if required.
Each letter in the lexicon must have a feature vector.
Feature vectors are the same for all letter positions.
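
A file in this layout can be generated rather than typed by hand. The Python
sketch below does so for any alphabet. The function name is an assumption, and
it covers only the one-feature-per-letter scheme shown above, not other
feature codings.

```python
import string


def one_hot_character_file(letters=string.ascii_lowercase):
    """Return CharacterFile text: one row per letter, with a 1 in that letter's column."""
    rows = []
    for position, letter in enumerate(letters):
        features = ["1" if column == position else "0" for column in range(len(letters))]
        rows.append(letter + " " + " ".join(features))
    return "\n".join(rows) + "\n"


# Print the first three rows of the default 26-letter file.
print("\n".join(one_hot_character_file().splitlines()[:3]))
```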


The lexicon file contains just two columns, with words in the first
column and frequencies in the second, e.g.

shred 37
begun 1114
alter 316
duels 8
...


The stimulus_file (VLD) is just a column of words/non-words.
All words/non-words in both the LexiconFile and the VLD file must
be the same length.
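
These constraints are easy to check before starting a long run. The Python
sketch below is one way to do it, assuming the whitespace-separated formats
described above. The function name and messages are illustrative, and applying
the feature-vector check to the stimuli as well as the lexicon is an assumption.

```python
def check_inputs(lexicon_lines, stimulus_lines, letters):
    """Return a list of problems found in the lexicon and stimulus data.

    lexicon_lines  : 'word frequency' strings
    stimulus_lines : one word or non-word per line
    letters        : the letters that have a row in the CharacterFile
    """
    problems = []
    words = []
    for line in lexicon_lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        try:
            if len(fields) != 2:
                raise ValueError
            float(fields[1])  # the frequency must be numeric
        except ValueError:
            problems.append(f"bad lexicon line: {line!r}")
            continue
        words.append(fields[0])
    items = words + [line.strip() for line in stimulus_lines if line.strip()]
    lengths = {len(item) for item in items}
    if len(lengths) > 1:
        problems.append(f"mixed lengths: {sorted(lengths)}")
    for item in items:
        missing = set(item) - set(letters)
        if missing:
            problems.append(f"{item}: no feature vector for {sorted(missing)}")
    return problems
```

An empty list means no problems were found.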

Output




The output looks like this (//comments do not appear in the actual file):


BayesVisual 1.05, compiled Sep 24 2004 11:48:01 started at Tue Sep 28 21:38:06 2004


Calculated Pseudoword neighbourhood density (2235 / 456975):
1: 100 => 0.489086, d = 1.41421
2: 3750 => 18.3407, d = 2
3: 62500 => 305.679, d = 2.44949
4: 390625 => 1910.49, d = 2.82843

StandardError 1
Average 50
MaxSteps 2000
SelectBestEntries 100
ThresholdCheckGrain 0
VirtualPseudoWordDistance 2
VirtualPseudoWordFrequency 1331.22
NonWordDensity ON
LexFile lex4.txt
CharFile letter_set


type closest: type (1678) at 0 // the closest word in the lexicon to the input,
// its frequency, and its distance from the input.


// first we get the output at the specified response thresholds:

PooledWordThreshold 0.8 0.8 150 50 50 0 0 Yes 50 290.44 No 0 0 TO 0
// yes and no thresholds, minimum steps,
// "Yes 50" - 50 responses were Yes, followed by the mean Yes RT;
// "No 0" - 0 responses were No, followed by the mean No RT;
// "TO 0" - 0 time-outs

PooledWordThreshold 0.85 0.85 150 50 50 0 0 Yes 50 313.4 No 0 0 TO 0
PooledWordThreshold 0.9 0.9 150 50 50 0 0 Yes 50 342.38 No 0 0 TO 0
PooledWordThreshold 0.95 0.95 150 50 50 0 0 Yes 50 390.02 No 0 0 TO 0
PooledWordThreshold 0.97 0.97 150 50 50 0 0 Yes 50 425.72 No 0 0 TO 0
PooledWordThreshold 0.99 0.99 150 50 50 0 0 Yes 50 508.14 No 0 0 TO 0



// now we get the output at the various steps:

Step 800, P(i) = 0.000191737 StdErr 0.360477, pnd 50, pi 0
Average # words within 2 stderr of sample mean: 1
LAST Lex entry closest to ID: type, f = 1678, d = 0.331871
LAST Lex entry closest to nw: type, f = 1678, d = 2
Average Odds calculation, P(Yes): 0.707755
Average entropy: 0.00195679


// DistD is distance of each word from the input mean.
// DistErr is the distance in std errors.
// _nw_ refers to the virtual non-word which is always placed
// the same distance from the mean as specified by VirtualPseudoWordDistance

Freq DistD DistErr P(w|i) P(i|w)
type 1678 0.355963 0.987476 0.99948 0.679588
_nw_ 1331 2 5.5482 0.000335008 3.49658e-005
tape 508 1.46932 4.07605 0.000130979 0.000294524
tyre 84 1.45138 4.02627 2.66432e-005 0.000361332
tope 64 1.46433 4.06219 1.74104e-005 0.000310093
time 32093 2.03297 5.63966 4.49098e-006 1.59264e-007
take 13804 2.0329 5.63948 1.91686e-006 1.58868e-007
tyke 3 1.44906 4.01983 9.66515e-007 0.000367703
true 4639 2.02855 5.6274 6.81822e-007 1.66997e-007
hope 3333 2.03566 5.64713 4.36466e-007 1.48989e-007


Step 1000, P(i) = 0.000212466 StdErr 0.322485, pnd 50, pi 0
Average # words within 2 stderr of sample mean: 1
LAST Lex entry closest to ID: type, f = 1678, d = 0.290914
LAST Lex entry closest to nw: type, f = 1678, d = 2
Average Odds calculation, P(Yes): 0.944
Average entropy: 0.000327739
Freq DistD DistErr P(w|i) P(i|w)
type 1678 0.321139 0.995825 0.999929 0.753396
_nw_ 1331 2 6.20184 4.46444e-005 1.71023e-006
tape 508 1.45946 4.52566 1.96376e-005 4.89675e-005
tyre 84 1.44371 4.47683 3.988e-006 5.98682e-005
tope 64 1.45548 4.51333 2.57027e-006 5.06915e-005
tyke 3 1.44027 4.46616 1.4887e-007 6.27665e-005
time 32093 2.02646 6.2839 1.0115e-007 3.98116e-009
take 13804 2.02528 6.28024 4.53871e-008 4.17302e-009
true 4639 2.02192 6.2698 1.54953e-008 4.19612e-009
hope 3333 2.02729 6.28647 9.97565e-009 3.76597e-009


Step 1400, P(i) = 0.000252238 StdErr 0.272529, pnd 50, pi 0
Average # words within 2 stderr of sample mean: 1
LAST Lex entry closest to ID: type, f = 1678, d = 0.239103
LAST Lex entry closest to nw: type, f = 1678, d = 2
Average Odds calculation, P(Yes): 0.998796
Average entropy: 8.75114e-006
Freq DistD DistErr P(w|i) P(i|w)
type 1678 0.270435 0.992317 0.999999 0.894486
_nw_ 1331 2 7.33866 9.10865e-007 2.93307e-009
tape 508 1.45044 5.32213 3.94923e-007 1.16178e-006
tyre 84 1.43662 5.27141 8.51763e-008 1.51444e-006
tope 64 1.44515 5.30273 5.62256e-008 1.30895e-006
tyke 3 1.43512 5.26593 3.1325e-009 1.56996e-006
time 32093 2.02192 7.41911 4.30934e-011 2.01185e-012
take 13804 2.02246 7.42106 1.91007e-011 2.07107e-012
true 4639 2.01792 7.40441 6.75829e-012 2.17532e-012
hope 3333 2.0201 7.4124 4.9892e-012 2.22275e-012
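
The figures in the "Calculated Pseudoword neighbourhood density" block near the
top of the output appear to follow from a short counting argument, which may
help in reading them. The Python sketch below assumes 4-letter strings over a
26-letter alphabet with the one-hot letter vectors, and assumes that 2235 is
the number of entries in the lexicon. These are inferences from the sample
output, not documented behaviour of the program, and the function name is
hypothetical.

```python
from math import comb, sqrt


def density_table(word_length=4, alphabet_size=26, lexicon_size=2235):
    """For k = 1..word_length differing letters, return (k, count, scaled, distance).

    count    : number of letter strings differing from the input in exactly k positions
    scaled   : count multiplied by lexicon_size / (number of strings other than the input)
    distance : Euclidean distance between one-hot codes that differ in k letters
    """
    total_other = alphabet_size ** word_length - 1
    rows = []
    for k in range(1, word_length + 1):
        count = comb(word_length, k) * (alphabet_size - 1) ** k
        scaled = count * lexicon_size / total_other
        rows.append((k, count, scaled, sqrt(2 * k)))
    return rows


for k, count, scaled, distance in density_table():
    print(f"{k}: {count} => {scaled:.6g}, d = {distance:.6g}")
```

Under these assumptions the counts sum to 456975, the denominator printed in
the block's header, and the distances are the square root of twice the number
of differing letters.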










