_______________________________________________________________________________
LINEAR-NONLINEAR-POISSON MODELS OF PRIMATE CHOICE DYNAMICS
GREG S. CORRADO, LEO P. SUGRUE, H. SEBASTIAN SEUNG, AND WILLIAM T. NEWSOME
HOWARD HUGHES MEDICAL INSTITUTE,
STANFORD UNIVERSITY SCHOOL OF MEDICINE,
AND MASSACHUSETTS INSTITUTE OF TECHNOLOGY
The equilibrium phenomenon of matching behavior traditionally has been studied in stationary
environments. Here we attempt to uncover the local mechanism of choice that gives rise to matching by
studying behavior in a highly dynamic foraging environment. In our experiments, 2 rhesus monkeys
(Macaca mulatta) foraged for juice rewards by making eye movements to one of two colored icons
presented on a computer monitor, each rewarded on dynamic variable-interval schedules. Using
a generalization of Wiener kernel analysis, we recover a compact mechanistic description of the impact
of past reward on future choice in the form of a Linear-Nonlinear-Poisson model. We validate this
model through rigorous predictive and generative testing. Compared to our earlier work with this same
data set, this model proves to be a better description of choice behavior and is more tightly correlated
with putative neural value signals. Refinements over previous models include hyperbolic (as opposed to
exponential) temporal discounting of past rewards, and differential (as opposed to fractional)
comparisons of option value. Through numerical simulation we find that within this class of strategies,
the model parameters employed by animals are very close to those that maximize reward harvesting
efficiency.
Key words: matching, choice, decision theory, neuroeconomics, reward, LNP models, hyperbolic
discounting, eye movements, monkey
_______________________________________________________________________________
In this journal, over a decade before the
birth of the first two authors of this article,
Richard Herrnstein published a simple obser-
vation about the choice behavior of animals in
a key-pressing task: The relative frequency of
responding on a given key closely approximat-
ed the relative frequency of reinforcement on
that key (Herrnstein, 1961). If, for example,
a pigeon received two thirds of its food-pellet
rewards for pressing a particular key, the
pigeon came to press that key two thirds of
the time. By 1970, this observation had grown
into a general law relating choice behavior to
reward history, now commonly referred to as
Herrnstein's Matching Law, which he also
published here in the most widely cited
scientific article in JEAB's history (JEAB,
1993). The matching law asserts that:

$$\frac{r_k}{\sum_i r_i} = \frac{c_k}{\sum_i c_i}, \qquad (1)$$

where $r_k$ is the number of rewards earned on
any particular option $k$, $c_k$ is the number of
choices made to that option, and the summa-
tions in the denominator are over all available
options. In words, this expression states that
the fraction of total choices that an animal
allocates to an option will match the fraction of
total rewards it earns on that option. This
correspondence between reward and choice
fractions is the central prediction of the
matching law. The research presented in this
article follows directly upon Herrnstein's,
testifying to the continuing impact of his
seminal work on animal choice.
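
As a small numerical illustration of Equation 1, the sketch below computes reward and choice fractions for two options and reports how far the choice fractions depart from the reward fractions. The counts are hypothetical values chosen to mirror the two-thirds pigeon example above; they are not data from this study, and the helper name `fractions` is ours.

```python
from typing import Sequence


def fractions(counts: Sequence[float]) -> list[float]:
    """Return each option's share of the total: counts[k] / sum(counts)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no events recorded")
    return [c / total for c in counts]


# Hypothetical session totals for two options (illustrative only).
rewards = [40, 20]    # r_k: rewards earned on each option
choices = [200, 100]  # c_k: choices made to each option

reward_frac = fractions(rewards)
choice_frac = fractions(choices)

# Under perfect matching, every entry of this list is zero.
matching_gap = [c - r for c, r in zip(choice_frac, reward_frac)]
print(reward_frac, choice_frac, matching_gap)
```

With these counts the animal earns two thirds of its rewards on the first option and allocates two thirds of its choices to it, so the gap is zero for both options.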
Over the intervening decades, most studies
of matching behavior have focused on the
steady state, gathering data only after an
animal's choice behavior equilibrates to any
manipulation of reward contingencies. As
shown by Davison and Baum (Baum &
Davison, 2004; Davison & Baum, 2000) and
by Gallistel and colleagues (Gallistel, Mark,
King, & Latham, 2001; Mark & Gallistel, 1994),
however, important mechanistic insights can
be gained by examining the dynamics of the
system as it operates in a state of flux. We
therefore designed a dynamic foraging paradigm
wherein animals' behavior must adapt to
frequent changes in environmental conditions
in order to gather rewards efficiently. As in the
previous studies by Davison and Baum and by
Gallistel and colleagues, our primary goal is to
gain insight into the mechanisms underlying
matching behavior by studying the system as it
operates near the limits of its adaptability.

Greg Corrado, Leo Sugrue, and William Newsome are at
Howard Hughes Medical Institute and Stanford University
School of Medicine. Sebastian Seung is at Howard Hughes
Medical Institute and Massachusetts Institute of Technology.
Correspondence should be addressed to G. S. Corrado,
Department of Neurobiology, Stanford University, D200
Fairchild Building, 299 Campus Drive West, Stanford,
California 94309 (e-mail: gcorrado@alumni.princeton.edu).
doi: 10.1901/jeab.2005.23-05
JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR, 2005, 84, 581-617, NUMBER 3 (NOVEMBER)

We collected substantial behavioral data sets
from 2 rhesus monkeys, some of the most
flexible and tenacious reward harvesters in the
animal kingdom (Southwick & Siddiqi, 1985).
In earlier work with this data set, we employed
a local formulation of the matching law,
incorporating leaky integration of reward
history, to model the animals' behavior (Sugrue,
Corrado, & Newsome, 2004). This local
matching rule captured the essential features
of the data well and was more than adequate
for the purposes of our earlier analysis, which
focused on the interpretation of neurophysio-
logical data. Our use of that model, however,
was motivated primarily by its simplicity and
formal similarity to the matching law, not by
a principled exploration of possible alterna-
tives.
We now take a very different approach.
Rather than selecting a specific model and
fitting it as well as possible to the data, we allow
the data themselves to suggest the most
appropriate model within a broad class of
possibilities. Thus we aim to infer more
directly the computations underlying choice
behavior from the data themselves: to estimate
rather than to fit. Our specific goal is to
capture the dynamics of choice behavior
within the broad framework of Linear-Non-
linear-Poisson (LNP) models (e.g., Chichil-
nisky, 2001). This class of models, which
includes the leaky matching rule from our
previous study, describes choice in terms of
a feed-forward, three-stage process. However,
rather than assume a specific functional form
for each of these stages, here we reconstruct
the function that best describes each stage
directly from the raw data. To accomplish this,
we employ an established sequential estima-
tion procedure based on a general form of
Wiener kernel analysis (Dayan & Abbott,
2001). Although our solution is constrained
to lie within the LNP framework, this frame-
work is far more general than our earlier
casting of the data in the form of a leaky
matching rule.
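
To make the three-stage structure concrete, the following sketch chains a linear filter, a static nonlinearity, and a stochastic output stage. Everything specific in it is an assumption made for illustration: the placeholder kernel, the logistic function of the value difference, the per-trial Bernoulli draw standing in for the Poisson stage, and the function names. It is not the estimation procedure used here, and it runs open loop on a fixed reward history, whereas in the real task rewards depend on the choices made.

```python
import numpy as np


def linear_stage(rewards, kernel):
    """Filter a 0/1 reward history with a causal kernel.

    value[t] depends only on rewards from trials before t;
    kernel[0] weights the immediately preceding trial.
    """
    rewards = np.asarray(rewards, dtype=float)
    filtered = np.convolve(rewards, kernel)[: len(rewards)]
    # Shift by one trial so the outcome of trial t does not inform trial t.
    return np.concatenate(([0.0], filtered[:-1]))


def nonlinear_stage(v1, v2, slope=1.0):
    """Map filtered values to P(choose option 1).

    Assumption: a logistic function of the value difference.
    """
    return 1.0 / (1.0 + np.exp(-slope * (v1 - v2)))


def poisson_stage(p_choose_1, rng):
    """Draw one stochastic choice per trial: 1 or 2 (Bernoulli stand-in)."""
    return np.where(rng.random(len(p_choose_1)) < p_choose_1, 1, 2)


rng = np.random.default_rng(0)

# Placeholder kernel: normalized, decaying weights over the last 10 trials.
kernel = np.exp(-np.arange(10) / 3.0)
kernel /= kernel.sum()

# Hypothetical reward histories (1 = reward earned on that option on that trial).
r1 = np.array([1, 0, 0, 1, 1, 0, 0, 0, 1, 0])
r2 = np.array([0, 1, 0, 0, 0, 0, 1, 0, 0, 0])

p1 = nonlinear_stage(linear_stage(r1, kernel), linear_stage(r2, kernel), slope=4.0)
choices = poisson_stage(p1, rng)
print(np.round(p1, 3), choices)
```

Keeping the stages as separate functions mirrors the feed-forward description: any one of them can be swapped for a different functional form without touching the other two.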
Linear systems analysis has been applied
successfully to the analysis of reward-choice
relations in elegant work by several research
groups in the past. Linear techniques have
been employed, for example, in studying the
dynamics of extinction (Palya, Walter, Kessel, & Lucke, 1996, 2002),
session-to-session
changes in behavior under concurrent vari-
able-interval (VI) reward schedules (Hunter &
Davison, 1985), integration of reward effects
over time (Horner, Staddon, & Lozano, 1997),
and in establishing a theoretical basis for the
steady-state matching relation first enunciated
by Herrnstein (McDowell, 1980; McDowell,
Bass, & Kessel, 1983; McDowell & Kessel,
1979). Our use of the more general LNP
framework both extends these methods and
applies them in a new behavioral context.
As we will show, this approach ultimately
recovers an LNP choice model that resembles
our earlier leaky matching rule in several
respects, but that also contains a number of
key differences. These refinements include
hyperbolic (as opposed to exponential) tem-
poral weighting of past rewards, and differen-
tial (as opposed to fractional) comparisons of
option value. We will demonstrate that this
revised model successfully predicts single
behavioral choices and independently gener-
ates realistic synthetic behavior. Finally, we will
show how we can use this model to evaluate
the optimality of our animals' behavioral
strategy in terms of net rewards harvested.
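
The two refinements named above can be contrasted with a short sketch. The functional forms below are generic textbook versions chosen for illustration (an exponential decay, a simple hyperbola, a ratio, and a logistic function of the difference), and the time constant and slope are arbitrary; none of these are the functions or parameters recovered from the data.

```python
import math


def exponential_weight(lag, tau):
    """Weight on a reward `lag` trials in the past under exponential decay."""
    return math.exp(-lag / tau)


def hyperbolic_weight(lag, tau):
    """Weight under a simple hyperbolic form (assumed: 1 / (1 + lag / tau))."""
    return 1.0 / (1.0 + lag / tau)


def fractional_comparison(v1, v2):
    """P(choose option 1) as option 1's share of total value."""
    total = v1 + v2
    return 0.5 if total == 0 else v1 / total


def differential_comparison(v1, v2, slope=1.0):
    """P(choose option 1) as a logistic function of the value difference."""
    return 1.0 / (1.0 + math.exp(-slope * (v1 - v2)))


tau = 5.0
for lag in (0, 5, 20):
    # Both kernels start at 1; the hyperbolic kernel has the heavier tail.
    print(lag, exponential_weight(lag, tau), hyperbolic_weight(lag, tau))

# Scaling both values by 10 leaves the fractional rule unchanged
# but changes the differential rule, which depends on v1 - v2.
print(fractional_comparison(1.0, 2.0), fractional_comparison(10.0, 20.0))
print(differential_comparison(1.0, 2.0), differential_comparison(10.0, 20.0))
```

The comparison makes the qualitative distinctions visible: a hyperbolic kernel gives distant rewards more weight than an exponential kernel with the same time constant, and a differential rule is sensitive to the absolute scale of the values whereas a fractional rule is not.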
EXPERIMENT
Figure 1A depicts our dynamic foraging task.
In this task, two colored icons, or targets,
appear on a computer screen, one red and
one green. A computer monitors the animal's
ga