Rational Inattention and State Dependent Stochastic Choice"

Transcription

Rational Inattention and State Dependent Stochastic Choice"
Rational Inattention and State Dependent Stochastic Choice
Andrew Caplinyand Mark Deanz
June 2013
Abstract
Economists are increasingly interested in how attention impacts behavior, and therefore
in models of rational inattention. We characterize choice behaviors associated with a general
model of rational inattention. Our model encompasses all those currently in the literature.
The necessary and su¢ cient conditions on the data are simple and intuitive. We experimentally
elicit “state dependent”stochastic choice data of the form required to perform the corresponding
tests. Rational inattention theory matches these data better than do random utility models that
ignore the link between incentives and attention.
1
Introduction
Limits on attention impact choice. Shoppers may buy unnecessarily expensive products due to
their failure to notice whether or not sales tax is included in stated prices (Chetty et al. [2009]).
Buyers of second-hand cars focus their attention on the leftmost digit of the odometer (Lacetera et
al. [2012]). Purchasers limit their attention to a relatively small number of websites when buying
over the internet (Santos et al. [2012]).
Given its e¤ect on choice, the forces that determine attentional e¤ort are being intensively studied. The rational inattention model (Sims [1998, 2003]) has been particularly in‡uential, capturing
as it does the balancing act between the improved decision quality that attention produces and the
entailed costs in terms of time and e¤ort. Rational inattention theory is being applied to capture a
wide array of phenomena, from price stickiness to consumption dynamics to portfolio choice.1 Yet
understanding the implications of rational inattention theory can be challenging. The information
that a decision maker internalizes is not directly observable, making it di¢ cult to di¤erentiate
between rationally inattentive choices and internally inconsistent mistakes.
In this paper we develop a simple characterization of the stochastic choice behavior consistent
with rational inattention and implement the corresponding tests in a laboratory experiment. We
make no assumption on the nature of informational costs or informational constraints, and so
We thank Roland Benabou, Federico Echenique, Andrew Ellis, Daniel Martin, Filip Matejka, Alisdair McKay,
Stephen Morris, Pietro Ortoleva, Daphna Shohamy, Severine Toussaert, Isabel Trevino, Laura Veldkamp and Michael
Woodford for their constructive contributions. We also thank Samuel Brown for his exceptional research assistance.
y
Center for Experimental Social Science and Department of Economics, New York University. Email: [email protected]
z
Department of Economics, Brown University. Email: [email protected]
1
See Sims [2006], Tutino [2008], Woodford [2008], van Nieuwerburgh and Veldkamp [2009], Mackowiak and Wiederholt [2009], Mackowiak and Wiederholt [2010], Matejka [2010] and Paciello and Wiederholt [2011] for applications
of rational inattention. See Marschak [1971] for an earlier formulation of similar ideas.
1
our model represents a signi…cant generalization of those in the current literature.2 The key to
our approach is the observation that a decision maker’s pattern of stochastic choice can be used to
recover information about their attention strategy. The resulting necessary and su¢ cient conditions
for rational inattention are simple and intuitive. A “no improving action switch”(NIAS) condition
ensures that choices are optimal given what was learned about the state of the world (see Caplin
and Martin [2011]). A “no improving attention cycles”(NIAC) condition ensures that total utility
cannot be raised by reassigning attention strategies across decision problems. These conditions
are robust. One can insist that costs associated with more informative attention strategies (in
the sense of Blackwell) are no lower than those associated with less informative strategies. One
can insist on the feasibility of mixed attention strategies. One can set inattention to be costless.
Even with these additional restrictions, the NIAS and NIAC conditions fully characterize rationally
inattentive behavior.
As with many existing models of rational inattention, our model allows for stochastic choice.
We consider a data set which consists of “state dependent” stochastic choice data in which choice
probabilities depend on the underlying state of the world.34 We assume that, while perfectly
observable to the researcher, it is potentially costly for a decision maker to learn the true state.5
For example, the econometrician may know whether or not sales taxes are included in stated prices
even if consumers do not. As we demonstrate, state dependent data of this form is readily available
in the experimental laboratory. Under appropriate conditions, it can also be extracted from …eld
data.6
Current models of stochastic choice are predominantly based on the concept of random utility, in the manner of McFadden [1974] and Luce and Suppes [1965].7 In Random Utility Models
(RUMs), stochastic choice derives from variance in the tastes of the decision maker. Unlike models
of rational inattention, this underlying uncertainty does not respond to the incentives in a given
decision problem. Matejka and McKay [2011] establish an important link between rational inattention and random utility models. When attention costs are proportionate to Shannon mutual
information, demand functions in the rational inattention model are of a generalized logit form,
and are therefore functionally similar to an important class of RUM.8 Yet there are also important
di¤erences in implied behavior. On the one hand, random utility models can violate NIAS and
NIAC. On the other, as pointed out by Matejka and McKay [2011],9 rational inattention can lead
to violations of a monotonicity condition central to random utility models: when a newly available
act incentivizes additional attention, the resulting knowledge may make it optimal to choose acts
that were previously unchosen.
2
Followings Sims [2003], much of the literature assumes that costs are based on the Shannon mutual information
between prior and posterior information states. Woodford [2012] suggests an alternative information cost function,
based on the concept of a Shannon capacity, which is consistent with certain psychological experiments. Gul et al.
[2012] and Ellis [2012] take a di¤erent approach, assuming limits on the partitional structures of information available
to the decision maker.
3
While little studied in economics, state dependent stochastic choice data has featured in psychometric experiments
on perception dating back at least to Weber (see Murray [1993]).
4
Other authors have used di¤erent data sets to capture the implications of rational inattention. Ellis [2012] uses
state dependent deterministic choice, while Mihm and Ozbek [2012] use choice over menus.
5
This inverts the interpretation that stochasticity in choice arises from the opposite asymmetry: factors that are
unobserved to the econometrician yet observed by the decision maker (Manski [1977]).
6
For example, under a representative agent assumption, the distribution of consumption patterns under di¤erent
tax regimes can be interpreted as state dependent stochastic choice.
7
See also Falmagne [1978], McFadden and Richter [1990], Clark [1996], McFadden [2005] and Gul and Pesendorfer
[2006].
8
Caplin and Dean [2013] characterize stochastic choice behavior for a broad class of entropy-based cost functions.
9
Ergin [2003] presents a similar example.
2
Our ability to distinguish between randomness in choice induced by imperfect attention and
that induced by stochastic utility relates to a long-standing challenge in the literature. Block and
Marschak [1960] stressed both the importance and the di¢ culty of separating out “perceptibility” and “desirability” based explanations for randomness in choice. Our results imply that state
dependent stochastic choice data is su¢ ciently rich to provide just such a separation.
As an application of the NIAS and NIAC conditions, we experimentally elicit state dependent
stochastic choice data in simple decision problems. We present subjects with a screen containing
only red and blue dots, with the proportion of red dots determining the state of the world. They
then choose from a set of available acts, the payo¤s to which depend on the state. We repeatedly
present subjects with the same decision problem to estimate their probability of choosing each act
in each state. Unlike most psychology experiments (for example Navalpakkam and Perona [2009]),
our experiment has no time limit, so that subjects can in principle perfectly determine the state of
the world. The fact that they choose not to results precisely from their internal trade o¤ between
additional input of cognitive e¤ort and time on the one hand and monetary reward on the other.
We use the state dependent stochastic choice data from our experiment to recover subjects’
revealed posterior beliefs and attention strategies. We use these in determining how well the NIAS
and NIAC conditions …t the data. We …nd that aggregate data aligns well with both conditions,
and furthermore that losses due to violations are small at the individual level. We also use our
experiment to discriminate between rational inattention theory and random utility theory. We …nd
behavior that is inconsistent with the latter model as it exhibits monotonicity violations of the type
suggested by Matejka and McKay [2011]. Overall, we conclude that rational inattention theory is
a reasonable starting point for modeling stochasticity in simple choices when attentional e¤ort is
chosen.
To our knowledge, ours is the …rst paper to experimentally collect state dependent stochastic
choice data to test models of rational inattention. In theoretical terms, two papers close in spirit
to ours are Matejka and McKay [2011] and Ellis [2012]. The former analyzes the implications of
rational inattention for state dependent stochastic choice data with costs based on Shannon mutual
information. The latter provides an axiomatic characterization of a model of rational inattention in
which a decision maker is equipped with a set of possible information partitions, and must choose
the most appropriate for a given decision problem. We generalize the results of both papers since
we put no a priori restrictions on the type of costs or information constraints that the decision
maker faces.
Section 2 introduces our formulation of the rational inattention model with an unrestricted cost
function. Section 3 derives the testable implications of this general model for state dependent choice
data. Section 4 shows that this characterization is unchanged by addition of natural restrictions
on the cost function. Section 5 contrasts rational inattention theory with random utility theory.
Section 6 details our experimental design, while section 7 presents experimental results. Section 8
relates our work to the broader literature. Section 9 concludes.
2
2.1
A General Model of Rational Inattention
Model
We consider a decision making environment comprising a set of possible states of the world and a
set of acts, the payo¤s of which are state dependent. A given decision problem consists of a prior
distribution over states of the world and the subset of acts from which the decision maker (DM)
3
must choose.10 The key assumption underlying models of rational inattention is that the true state
of the world is knowable in practice, but obtaining (or processing) information is costly. Thus,
prior to choosing which act to take, the DM chooses an attention strategy. In line with the rational
inattention literature, an attention strategy is described in an abstract manner - as a stochastic
mapping from states of the world to subjective signals, the probabilities of which depend on the
state of the world. Subject only to Bayes’ rule, the DM is free to choose any attention strategy
they wish, but will face a utility cost of doing so. Having selected an attention strategy, the DM
can condition choice of act only on the subjective signal they receive.11 Their payo¤ is then given
by the expected utility of the chosen act less the costs of attention.
Figure 1 illustrates an attention strategy of a …rm that is choosing which of two prices to charge,
high or low. Pro…ts depend on the price chosen and the underlying state of the economy, which
can be good, medium or bad (G, M and B) with probabilities 61 , 12 and 13 respectively.
Figure 1: An Example of an Attentional Strategy
In this example, the …rm has chosen an attention strategy that produces two subjective signals,
R and S. If demand is good they receive signal R for sure, while if demand is bad they receive
signal S for sure. If demand is medium, they receive each signal with probability 12 . Upon receiving
signal R, the …rm sets prices high. Conditional on receiving this signal the probability of state G
is 52 and of state M is 35 . The expected utility of action H is therefore 52 times the utility that H
gives in state G and 35 times the utility that H gives in M . The …rm sets price L upon receipt of
signal S, where the conditional probabilities are 74 M and 37 L. The net bene…t to the …rm from
this attention strategy is this expected “gross bene…t” of the acts chosen net of attention costs.
Formally, a decision making environment is identi…ed by a …nite set of states of the world,
= fm 2 Nj1
m
M g and a …nite set of acts F , with F = 2F =; comprising all non-empty
f
subsets. We take as given a state dependent utility function U :
F ! R and let Um
denote
the utility of act f in state m. De…ne = ( ) as the set of probability distributions over states.
Given 2 , m denotes the probability of state m. A decision problem consists of a prior belief
2 over states of the world and a non-empty set of acts A 2 F from which the DM must choose.
For any prior , de…ne
= fm 2 j m > 0g as the associated support.
10
11
Both of which are assumed known to the DM.
And possibly an independent randomizing device, by which we allow for mixed strategies.
4
An attention strategy maps each state of the world to a probability distribution over subjective
signals. Since we will be characterizing expected utility maximizers, we identify a subjective signal
with its associated posteriors beliefs 2 , which is equivalent to the subjective information state
the DM …nds them in following the receipt of that signal. For a particular prior, the set of feasible
attention strategies is the set of stochastic mappings from objective information states to subjective
signals that satisfy Bayes’law.
De…nition 1 Given 2 ; the set of feasible attention strategies
comprises all mappings
:
! ( ) that have …nite support ( )
and that satisfy Bayes’ law, so that for all m 2
and 2 ( ),
m m( )
,
m = M
X
j j( )
j=1
where
m(
)
(m)(f g). De…ne
=[
2
.
Note that m ( ) can be interpreted as the probability of signal conditional on state of the
world m.12
Rational inattention theory models a DM who chooses attention strategies to maximize gross
payo¤s net of information costs. The gross payo¤ associated with attention strategy 2
in
decision problem ( ; A) is calculated assuming that acts are chosen optimally at each posterior
state. Let G :
F
! R denote the gross payo¤ of using a particular attention strategy in a
particular decision problem:
#
" M
X X
G( ; A; ) =
m m ( ) g( ; A)
2 ( )
where g( ; A) = max
f 2A
m=1
M
X
f
m Um :
m=1
An attention cost function maps priors and attention strategies to the corresponding level of
disutility. We allow costs to be in…nite for some strategies for two reasons. As a technical convenience, it enables us to ignore the Bayesian constraint by setting the cost of strategies inconsistent
with the speci…ed prior to be in…nitely high.13 More importantly, it means that our model nests
those that rule out certain types of information acquisition - for example by putting a hard limit
on the mutual information between priors and posteriors (Sims [2003]), by allowing only partitional information structures (Ellis [2012]), or by allowing only normal signals (Mondria [2010]).
These models can be accommodated in our framework by setting to in…nity the cost of infeasible
strategies.
To avoid triviality we assume that feasible attention strategies exist for all priors.
De…nition 2 An attention cost function is a mapping K :
! R with K( ; ) = 1 for
2
=
and K( ; ) 2 R for some 2
. We let K denote the class of such functions.
We make the standard assumption that attention costs are additively separable from the prizebased utility derived from the acts taken. We let ^ : K
F !
map cost functions and
12
Formally, attention strategies are equivalent to compound lotteries or temporal lotteries as analyzed in models
that capture preferences over the timing of the resolution of uncertainty (Kreps and Porteus [1978]).
13
As in Mihm and Ozbek [2012].
5
decision problems into rationality inattentive strategies. These are the strategies that maximize
gross payo¤s minus attention costs,
^ (K; ; A) = arg sup fG( ; A; )
K( ; )g :
2
2.2
State Dependent Stochastic Choice Data
The idealized data set that we use to test the model of rational inattention is state dependent
stochastic choice data. For a given decision problem ( ; A) we observe the probability of choosing
each act f 2 A in each state of the world m 2
. It is the fact that we observe choice probabilities
in each state that di¤erentiates this from standard stochastic choice data.
Formally, given 2 , let Q be the set of mappings from
to probabilities over F , with
f
Q = [ 2 Q . Given q 2 Q , we let qm denote the probability of the DM choosing act f in state m
and denote as F (q) F the set of acts chosen with non-zero probability in some state of the world
under stochastic choice function q,
o
n
f
:
F (q) = f 2 F jqm
> 0 for some m 2
Data of this general form is standard in psychometric research and can be readily gathered in the
laboratory, as we demonstrate in section 6.
We are interested in when the conditions under which we can …nd an attention cost function
and resulting optimal attention strategy that would generate the pattern of stochastic choice that
we in fact observe.
De…nition 3 Given decision problem ( ; A) 2
F, an attention strategy
with q 2 Q if there exists a choice strategy C : ( ) ! (A), such that:
2
is consistent
1. Final choices are optimal:
f 2 supp (C( )) =)
M
X
f
m Um
m=1
M
X
g
m Um
m=1
all g 2 A.
2. The attention and choice functions match the data,
X
f
f
qm
=
m ( )C ( ):
2 ( )
Note that we do not restrict the DM to pure strategies at any posterior. However, any act
chosen with strictly positive probability must be utility maximizing given beliefs.
We assume that data has been gathered on some …nite set D of decision problems. The main
theoretical aim of the paper is to characterize the conditions under which all such state dependent
stochastic choice data is consistent with rational inattention.
De…nition 4 A state dependent stochastic choice data set (D; q) comprises a …nite set of decision
problems D
F and a function q : D ! Q, with q( ; A) 2 Q and F ( ; A) F [q( ; A)] A.
~ ~ ) if there exists K
~ 2 K and
Such a data set has a rational inattention representation (K;
( ;A)
~ :D!
such that, for all ( ; A) 2 D, ~ ( ; A)
~
is consistent with q( ; A) and satis…es
( ;A)
^
~
~
2 (K; ; A).
6
The existence of such a representation implies that the cost function is well-behaved, in the
sense that an optimal attention strategy exists.
Throughout this paper, we assume that the DM’s expected utility function and prior beliefs
over states of the world are both known to the researcher - only attention strategies and costs are
not directly observable. This assumption is in line with the focus of the paper, but is not central
to our approach. By enriching the data set, we could recover beliefs and preferences from the
choice data of the DM, and use these as a starting point for our representation. In order to recover
utility, we could replace the “Savage style” acts we use in this paper (which map deterministically
from states of the world to prizes) with “Anscombe-Aumann”acts that map states of the world to
probability distributions over the prize space. Assuming the DM does maximize expected utility,
U could then be recovered by observing choices over degenerate acts (i.e. acts whose payo¤ are
state independent).14 If we further add to our data set the choices of the DM over acts before the
state of the world is determined (or at least in a situation in which they cannot exert any e¤ort to
determine that state) then we can also recover the DM’s prior over objective states (again assuming
expected utility maximization).
3
The Characterization
The main theorem of this section establishes two conditions as necessary and su¢ cient for (D; q)
to have a rational inattention representation. We use simple examples to illustrate the role of
these conditions before stating the main theorem, the proof of which is in the appendix. The …rst
condition - which establishes optimality of …nal choice given an attention strategy - applies to each
decision problem separately. The second - which establishes optimality of the attention strategy applies to all decision problems that share a speci…c prior.
3.1
Minimal Attention Strategies
The key to our approach is the observation that, if a DM is rationally inattentive, then one can
learn much about their attention strategy from state dependent stochastic choice data. To begin
with, one can identify the average posterior beliefs that a DM must have had when choosing each
act.
De…nition 5 Given
2
and q 2 Q , de…ne the revealed posteriors r( ; q) : F (q) !
[r( ; q)(f )]m
f
rm
( ; q) =
f
m qm
M
X
by,
:
f
j qj
j=1
f
The revealed posterior rm
( ; q) is just the probability of state of the world m conditional on act
f being chosen given prior belief and state dependent stochastic choice data q. Note that this is
well de…ned whatever decision making procedure the DM is using. If the DM chooses each act in
at most one subjective information state then the revealed posteriors will in fact be the posteriors
that de…ne their attention strategy.15 If they choose the same act in more than one subjective state
14
This is the approach taken by Ellis [2012]. Caplin and Martin [2011] present an alternative approach that allows
for an unknown utility function.
15
As they would do if more informative information structures are more costly.
7
then the revealed posterior will be equivalent to the weighted average belief across all posteriors at
which that act is chosen.
We can use the revealed posteriors to construct a “revealed”attention strategy for each decision
problem. We do so by assuming that any act is chosen in at most one subjective state. Under
this assumption we can identify the resulting attention strategy directly from the state dependent
stochastic choice data. To do so, we …rst associate a subjective signal with each revealed posterior.
The probability of receiving a given signal from an objective state m is calculated by …rst identifying
all acts which have the revealed posterior associated with that signal, then by calculating the
probability of choosing any such act in objective state m.
f
De…nition 6 Given 2 , q 2 Q, m 2
, and = rm
( ; q) for some f 2 F (q), de…ne the
(
;q)
minimal attention strategy
2
to satisfy,
X
f
( ;q)
qm
:
m ( )=
ff 2F (q)jr( ;q)= g
The following examples illustrate the concept of revealed posteriors and the minimal attention
strategy.
Example 1 Consider the following state dependent stochastic choice data generated by a decision
problem in which there are two states with 1 = 0:3, and two chosen acts ff; gg
q1f
q1g
=
0:6
,
0:4
q2f
q2g
=
0:4
0:6
The revealed posterior of state 1 given the choice of f is given by
r1f ( ; q) =
f
1 q1
f
1 q1
f
2 q2
+
=
0:18
9
=
0:46
23
In like manner, we can identify the revealed posteriors associated with each act,
r1f ( ; q)
r2f ( ; q)
=
9
23
14
23
,
r1g ( ; q)
r2g ( ; q)
=
2
9
7
9
The resulting minimal attention strategy would therefore have two posterior states,
2
=
2
9
7
9
1
=
9
23
14
23
and
. The conditional signal probabilities for each posterior state are given by the associated
choice probabilities. For example,
( ;q) 2
( )
1
=
X
fk2F (q)jr( ;q)=
q1k = q1g = 0:4:
2g
Now consider a di¤ erent set of choice data for the same two state problem, in which three acts
are chosen,
0 f 1 0
1 0 f 1 0
1
0:6
0:4
q1
q2
@ q g A = @ 0:3 A , @ q g A = @ 0:45 A
1
2
0:1
0:15
q1h
q2h
In this case, the revealed posteriors for acts g and h are identical and equal to 2 . The minimal
attention strategy therefore again has two posterior states ( 1 and 2 ). The probability of receiving
8
the signal associated with state 2 from any state of the world is equal to the probability of choosing
g or h, as both have 2 as their revealed posterior
X
( ;q) 2
q1k = q1g + q1h = 0:4
(
)
=
1
fk2F (q)jr( ;q)=
2g
If a rationally inattentive DM chose each act in at most one subjective state, and did not
use mixed strategies at any posterior, then the minimal attention strategy would be their true
strategy. More generally, any attention strategy consistent with the data must be weakly more
informative than the minimal attention strategy, in the sense of statistical su¢ ciency. Intuitively,
this means that the minimal attention strategy can be obtained by “adding noise” to the true
attention strategy.
De…nition 7 Attentional strategy 2 is su¢ cient for attention strategy 2 P(equivalently
ij = 1
is a garbling of ) if there exists a j ( )j j ( )j stochastic matrix B 0 with
j2 ( ) b
all i and such that, for all j 2 ( ) and m 2 ,
X
j
bij m ( i ):
m( ) =
i2
( )
Lemma 1 establishes that any consistent attention strategy must be su¢ cient for the minimal
attention strategy.
Lemma 1 If
2
is consistent with ( ; A) 2
F and q 2 Q , then it is su¢ cient for
( ;q) :
In particular, this means that if the DM is indeed rationally inattentive, then their ‘true’
attention strategy must be more informative than their revealed attention strategy.
Blackwell’s theorem establishes the equivalence of the statistical notion of “more informative
than” (su¢ ciency) and the economic notion “more valuable than”. If attention strategy
is
16
su¢ cient for strategy , then it yields (weakly) higher gross payo¤s in any decision problem.
This result plays a signi…cant role in our characterization.
Remark 1 Given decision problem ( ; A) 2
F and ; 2
G( ; A; )
3.2
with
su¢ cient for ,
G( ; A; ):
No Improving Actions Switches
Our …rst condition ensures that the choice of acts at any given posterior is optimal. Consider the
trivial case in which only one decision problem ( ; A) is observed. Suppose that there are two states
with 1 = 2 = 0:5, two available acts, A = ff; gg, and state dependent payo¤s (U1f ; U2f ) = (0; 10)
and (U1g ; U2g ) = (20; 0). Finally suppose that the behavioral data set speci…es:
1
q1g = q1f = ;
2
2 g
1
f
q2 =
; q2 = :
3
3
16
See Cremer [1982] for a statement and simple proof of Blackwell’s theorem.
9
One can readily con…rm that there is no attention strategy consistent with this behavioral
data. This is because, at the revealed posterior when f is chosen, act g would have given higher
utility. The posterior probability that the true state is 1 when f is chosen is
f
1 q1
f
1 q1
+
=
f
2 q2
3
7
Given these beliefs, the payo¤ of taking act g is 73 :20 + 47 :0 = 60
7 , while the payo¤ of act f is
3
4
40
:0
+
:10
=
.
7
7
7
If the minimal attention strategy was in fact the one that the DM used, the revealed posterior
associated with each act would be the only posterior at which this act was chosen. In this case
existence of such an improving switch would be a direct violation of the rationally inattentive
model. With a more general attention strategy, rational inattention implies that f must be weakly
preferred to g at each state in which f is chosen. This means that f must also be weakly preferred
to g at the weighted average of these posteriors which forms the revealed posterior.17
The NIAS condition rules out cases in which there are improving switches of this form. It
speci…es that, when one identi…es in the data the revealed posterior associated with any chosen act,
this act must be optimal at that posterior.
Condition D1 (No Improving Action Switches) Data set (D; q) satis…es NIAS if, for every
( ; A) 2 D and f 2 F ( ; A),
M
X
f
f
rm
( ; q) Um
m=1
M
X
f
g
rm
( ; q) Um
;
m=1
all g 2 A.
The NIAS condition was introduced by Caplin and Martin [2011]. A similar condition is introduced in a game theoretic setting by Bergemann and Morris [2011].
3.3
No Improving Attention Cycles
Our second condition restricts choice of attention strategy across decision problems that share
the same prior. Essentially, it cannot be the case that the total gross utility can be increased
by reassigning attention strategies across decision problems.18 The following example illustrates a
violation of this condition.
Consider again the decision problem above with two equiprobable states and two available acts,
A = ff; gg, and with the state dependent payo¤s,
(U1f ; U2f ) = (10; 0); (U1g ; U2g ) = (0; 20):
Suppose now that the observed choice behavior is as follows (using the choice set A as an argument),
q1A (f ) =
q2A (f ) =
2
=1
3
1
=1
3
17
q1A (g);
q2A (g):
See the proof of theorem 1 for details of this argument.
Because we allow the attention cost function to vary arbitrarily with the prior, we need check only that there are
no such reassignments amongst decision problems with the same prior.
18
10
Now consider a second decision problem di¤ering only in that the act set is B = ff; hg, with
(U1h ; U2h ) = (0; 10), with the corresponding state dependent data set,
3
=1
4
1
=1
4
q1B (f ) =
q2B (f ) =
q1B (g);
q2B (g):
The speci…ed data looks problematic with respect to rational inattention. Act set A provides
greater reward for discriminating between states, yet the DM is more discerning under act set B. To
crystallize the resulting problem, note that, for behavior to be consistent with rational inattention
for some cost function K it must be the case that,
G( ; A;
A
G( ; B;
B
)
)
K( ;
A
K( ;
B
)
)
G( ; A;
B
G( ; B;
A
)
)
K( ;
B
);
K( ;
A
):
While we do not observe attention strategies directly, it is immediate that G( ; i; i ) = G( ; i; i )
for i 2 fA; Bg. Furthermore, as i is su¢ cient for i , Blackwell’s theorem tells us that G( ; i; j )
G( ; i; j ) for i:j 2 fA; Bg (see Remark 1): Thus we can insert the minimal attention strategies in
the calculation of gross bene…ts and the above inequalities must still hold.
Substituting in for minimal attention strategies and rearranging implies
G( ; A;
A
)
G( ; A;
B
)
K( ;
A
)
K( ;
B
)
G( ; B;
A
)
G( ; B;
B
)
Clearly, for any such cost function to exist, it must be the case that the …rst term is greater
than the last term, which in turn implies that
G( ; A;
A
) + G( ; B;
B
)
G( ; A;
B
) + G( ; B;
A
);
(1)
indicating that total gross bene…t must be maximized by the assignment of minimal attention
strategies to the speci…c decision problem with which they are associated in the data. Plugging in
the minimal attention strategies from our example data, we …nd that G( ; A; A ) + G( ; B; B ) =
17 12 , while G( ; A; B ) + G( ; B; A ) = 17 11
12 . Thus, there is no cost function that can be used to
rationalize this data.
The following general assumption ensures that there are no such cycles of gross utility improving
strategy switches.
Condition D2 (No Improving Attention Cycles) Data set (D; q) satis…es NIAC if, for any
2 and any set of decision problems ( ; A1 ); ( ; A2 ); :::::( ; AK ) 2 D with AK = A1 ,
K
X1
G( ; Ak ;
k
)
k=1
where
k
=
K
X1
G( ; Ak ;
k+1
);
k=1
[ ;q( ;Ak )] .
The NIAC condition has certain analogies with acyclicality conditions that are common in
revealed preference theory (such as the Strict Axiom of Revealed Preference). Consider the two
decision problem case described in equation 1. A further rearrangement of this condition implies
11
that
G( ; A;
A
)
G( ; A;
B
) + G( ; B;
B
)
G( ; B;
A
)
0
We can interpret G( ; A; A ) G( ; A; B ) < 0 as B being “revealed more costly” than A :
would have given higher gross value in decision problem ( ; A) than would A , so the fact that
it was not chosen must mean that it is more costly. Thus, this condition can be interpreted as
saying that if B has been revealed more costly than A , we cannot also have that A is revealed
more costly that B . In fact, the condition says something more than that because, unlike in the
standard revealed preference exercise, we have information about how much more costly B is than
A in terms of expected utility. Therefore if we have data revealing that B is more costly than
A by at least some amount x, we cannot also have information that implies that this di¤erence
must be less that x.
B
3.4
Characterization
Our main result is that NIAC and NIAS together are necessary and su¢ cient for a data set to have
a rational inattention representation.
Theorem 1 Data set (D; q) has a rational inattention representation if and only if it satis…es
NIAS and NIAC.
The key step in the proof that the NIAS and NIAC conditions are su¢ cient for (D; q) to have
a rational inattention representation is to connect this model with the linear allocation problem
analyzed by Koopmans and Beckmann [1957]. The cost function that we introduce is based on the
shadow prices that decentralize the optimal allocation in that model. Our result is also strongly
related to rationalizability conditions for quasi-linear preferences in the mechanism design literature
(see Rochet [1987]).
4
Monotonicity, Mixtures and Normalization
Theorem 1 states only that, if NIAS and NIAC hold, we can …nd some attention cost function that
rationalizes the data. No restrictions are placed on the form of the function. In this section we
consider three natural restrictions on attention cost functions: weak monotonicity with respect to
su¢ ciency; feasibility of mixed strategies; and costless inattention. In principle these restrictions
might tighten requirements for rationalizability of stochastic choice data, since they constrain costs
of unchosen strategies. Theorem 2 establishes that this is not the case: if state dependent stochastic choice is rationalizable, then it is rationalizable by a cost function that satis…es these three
conditions.
4.1
Weak Monotonicity
A partial ranking of the informativeness of attention strategies is provided by the notion of statistical
su¢ ciency (see de…nition 7). A natural condition for an attention cost function is that more
information is (weakly) more costly.
Condition K1 K 2 K satis…es weak monotonicity in information if, for any
; 2
with su¢ cient for ,
K( ; )
12
K( ; ):
2
and
Free disposal of information would imply this property, as would a ranking based on Shannon
mutual information (see also Mihm and Ozbek [2012] and Yang [2011]).19
4.2
Mixture Feasibility
In addition to using pure attention strategies, it may be feasible for the DM to mix these strategies
using some randomizing device.
De…nition 8 Given
+ (1
)
2 , attention strategies
2
is de…ned by,
m(
all 1
m
M and
)=
m(
;
2
2 [0; 1], the mixture strategy
, and
) + (1
)
m(
);
2 ( ) [ ( ).
The de…nition implies that the mixing is not of the posteriors themselves, but of the odds of the
given posteriors. To illustrate, consider again a case with two equiprobable states. Let attention
strategy be equally likely to produce posteriors (:3; :7) and (:7; :3), with equally likely to produce
posteriors (:1; :9) and (:9; :1). Then the mixture strategy 0:5
+ 0:5
is equally likely to produce
all four posteriors.
A natural assumption is that DMs can choose to mix attention strategies and pay the corresponding expected costs. They could ‡ip a coin and choose strategy if the coin comes down heads
and strategy if it comes down tails. In expectation the cost of this strategy would be half that of
and half that of . Allowing such mixtures puts an upper bound on the cost of the strategy 0:5
+ 0:5
. However, it does not pin down the cost precisely, since there may be a more e¢ cient
way of constructing the mixed attention strategy.
Condition K2 Mixture Feasibility: for all
2 (0; 1), the cost of the mixture strategy
K( ; )
4.3
2
=
and for any two strategies ; 2
+ (1
)
2
satis…es,
K( ; ) + (1
and
)K( ; ):
Normalization
It is typical in the applied literature to allow inattention at no cost, and otherwise to have costs be
non-negative. Given weak monotonicity, non-negativity of the entire function follows immediately
if one ensures that inattention is costless.
Condition K3 Given 2 , de…ne I 2
as the strategy in which m ( ) = 1 for 1 m M .
Attentional cost function K 2 K satis…es normalization if it is non-negative where realvalued, with K(I ) = 0 all 2 .
4.4
Theorem 2
Theorem 2 states that, whenever a rational inattention representation exists, one also exists in
which the cost function satis…es conditions K1 through K3. Whatever one thinks of the above
19
While in many ways intuitively attractive, this assumption may not be universally valid. In a world with discrete
signals it may be very costly or even impossible to generate continous changes in information. Moreover the DM may
be restricted to some …xed set of signals in which case less informative structures are essentially disallowed. It may
not be possible to automatically and freely dispose of information once learned.
13
assumptions on intuitive grounds, even if any one or all of them are in fact false, any data set that
can be rationalized can equally be rationalized by a function that satis…es them all.
Theorem 2 Data set (D; q) satis…es NIAS and NIAC if and only if it has a rational inattention
representation with conditions K1 to K3 satis…ed.
This result has the ‡avor of the Afriat characterization of rationality of choice from budget sets
(Afriat [1967]), which states that choices can be rationalized by some utility function if and only if
they can be rationalized by a non-satiated, continuous, monotone, and concave utility function.
Note that the representation need not be unique. Conditions for recoverability are left for future
research.
4.5
No Strong Blackwell
Not all restrictions on the form of the cost function can be so readily absorbed as K1 through
K3. For example, there are data sets satisfying NIAS and NIAC yet for which there exists no cost
function that produces a rational inattention representation with a cost function that is strictly
monotonic with the informativeness of the information structure (i.e. if is su¢ cient for 0 but 0
is not su¢ cient for , then K( ; ) > K( ; 0 )).
A simple example with data on one decision problem with two equally likely states illustrates
that one cannot further strengthen the result in this dimension. Suppose that there are three
available acts A = ff; g; hg with corresponding utilities,
(U1f ; U2f ) = (10; 0) ; (U1g ; U2g ) = (0; 10) ; (U1h ; U2h ) = (7:5; 7:5) :
Consider the following state dependent stochastic choice data in which the only two chosen acts
are f and g,
3
q1f = q2g = = 1 q1g = 1 q2f :
4
Note that this data satis…es NIAS; given posterior beliefs when f is chosen, f is superior to g
and indi¤erent to h, and when g is chosen it is superior to f and indi¤erent to h. It trivially satis…es
NIAC since there is only one decision problem observed. We know from theorem 2 that is has a
rational inattention representation with the cost of the minimal attention strategy K ( ; ) 0 and
that of the inattentive strategy being zero, K( ; I ) = 0. Note that is su¢ cient for I but not
vice versa, hence any strictly monotone cost function would have to satisfy K ( ; ) > 0. In fact
it is not possible to …nd a representation with this property. To see this, note that both strategies
have the same gross utility,
G( ; A; ) =
1
2
3
4
10 +
1
2
3
4
10 = 1 7:5 = G( ; A; I );
where we use the fact that the inattentive strategy involves picking act h for sure. In order to
rationalize selection of the inattentive strategy, it must therefore be that is no more expensive
than I , contradicting strict monotonicity.
5
Rational Inattention and Random Utility
In this section we compare the rational inattention model with alternative models in which stochasticity in choice stems from randomness in the utility function (e.g. McFadden [1974], Loomes and
14
Sugden [1995]).20 Such random utility models (RUMs) take as given a probability measure over
some family of utility functions. Prior to making a choice, one utility function gets drawn from this
set according to the speci…ed measure. The DM then chooses to maximize this utility function.21
There is a …rst order di¤erence between standard RUMs and the rational inattention model.
While the latter produces choice behavior that di¤ers across states of the world, the former typically
conditions out all observable states, as a result giving rise to state independent choice (e.g. Falmagne
[1978], McFadden and Richter [1991], Clark [1996], McFadden [2005], and Gul and Pesendorfer
[2006]). In order to compare our general model of rational inattention to RUMs one must therefore
take a stand on what the DM knows about the true state of the world when they make their choice.
One extreme assumption is that the DM receives no signal about the objective state of the world,
and so makes choices based only on their prior beliefs, a model we refer to as the “Uninformed
RUM”. Formally, we de…ne the class of utility functions U to be all functions :
F ! R, with
22
( ; f ) the utility assigned by 2 U to act f 2 A at prior 2 . Letting denote a probability
measure over U, a data set (D; q) generated by an uninformed RUM is de…ned by,
f
qm
( ; A) =
f 2 Uj ( ; f ) > ( ; g) 8 g 2 Ag ; 23
for every ( ; A) 2 D and m 2 .
Such a model is easily di¤erentiated from the rational inattention model, as the probability of
choosing an act is invariant with the true state of the world: generally this is not the case in the
rational inattention model. On the other hand, as we demonstrate in the supplementary materiel,
the uninformed RUM allows for violations of NIAS. This is obvious from the fact that such a
DM has a single posterior belief, yet chooses many acts with distinct utilities according to any
non-trivial utility function.
A second extreme assumption is that the DM is fully informed about the state of the world
and chooses in each state according to a RUM. Letting m 2
be the degenerate probability
distribution specifying state m for sure, a perfectly informed RUM generates data of the following
form:
f
qm
( ; A) = f 2 Uj ( m ; f ) > ( m ; g) 8 g 2 Ag :
Again, the perfectly informed RUM can be easily di¤erentiated from the rational inattention
model.24 It implies that the probability of choosing an act in a particular state depends only on its
payo¤ in that state, and is independent of the payo¤s of any act in other states of the world and of
prior beliefs. This is generally not true of the rational inattention model. On the other hand, in the
20
Random choice models have been used in psychology since the work of Thurstone [1927] and Zermelo [1929].
In the case of choice over lotteries, the family of utility functions can be over the lotteries themselves or, following
Gul and Pesendorfer [2006], over the underlying prize space, with the utility of a lottery equal to its expectation
according to the selected utility function.
22
Note that random expected utility in the manner of Gul and Pesendorfer [2006] is a special case in which the
utility can be computed by weighting some underlying utilities on a prize space according to the prior probabilities.
23
In the case of utility functions in which the utility maximizing elements of A are not unique, a tie breaking rule
is needed. See Gul and Pesendorfer [2006].
24
Matejka and McKay [2011] identify an intriguing connection between the perfectly informed RUM and rational
inattention. One heavily used variant of the RUM is the logit model, which assumes that
is constructed using a
…xed “base” function 2 U which is then perturbed by an error term distributed according to an extreme value
type-I distribution. Applying this assumption to the perfectly informed RUM leads to a state dependent stochastic
choice function of the form,
21
f
qm
( ; A) = P
e
( m ;f )
g2A
e
( m ;g)
:
MM show that one can provide an analogous characterization of the state dependent stochastic choice associated
15
supplemental materiel we show how perfectly informed RUMs can violate both NIAS and NIAC.
In the former case this is because the DM can still choose inferior options with positive probability
under the RUM. In the latter, it is because the perfectly informed RUM can require the DM to
acquire a large amount of information even when that information is not instrumental for choice.
An intermediate version of the RUM is one in which the DM receives an exogenous and imperfect
signal about the state of the world before a utility function is randomly drawn. In formal terms,
starting with prior 2 A, the DM receives a …xed informative signal :
! ( ) that determines
the information at the point that the random utility function is drawn. In this case the generated
data set for any ( ; A) 2 D will be of the form,
X
f
qm
( ; A) =
m ( ) f 2 Ujv( ; f ) > ( ; g) 8 g 2 Ag ;
2 ( )
where ( ) is the set of possible posteriors associated with signal .
The partially informed RUM does not imply either of the obvious restrictions on the data that
we have so far discussed: choices can vary with the underlying state, and the payo¤s of an act in
one state can a¤ect behavior in another. This makes di¤erentiating between the partially informed
RUM and rational inattention more subtle. We pursue two approaches in the experiment that
follows, as now detailed.
Our …rst approach involves testing a simple monotonicity axiom that is an essentially universal
feature of all RUMs, including those in which signals are received prior to making a decision. This
axiom states that the addition of a new act to the set of available choices cannot increase the
probability that one of the pre-existing options will be chosen (Gul and Pesendorfer [2006] and
Luce and Suppes [1965]).
Monotonicity Axiom Given ( ; A) 2
F, h 2 F nA and m 2 ,
f
qm
( ; A)
f
qm
( ; A [ h) :
As Matejka and McKay [2011] show, the rational inattention model can lead to robust violations
of monotonicity. Their canonical example involves two states of the world, with prior 1 on state
1. There are three acts that may be available, with the following payo¤s in expected utility units,
U1f ; U2f
= (0; 1) ; (U1g ; U2g ) =
1 1
;
2 2
; and
U1h ; U2h = (Y; Y ) ;
where Y > 0. Assuming attention costs based on Shannon mutual information, Matejka and McKay
[2011] show that it is optimal to pay no attention and choose the safe act g when only f and g
are available, provided 1 is high enough. It is simply too expensive to bother trying to overturn
the prior. However, with h available also, it becomes more important to learn the true state
- increasingly so the higher is Y: The rationally inattentive agent may therefore select a more
informative attention strategy. If this learning suggests to the DM that state 2 is more likely, then
with the rational inattention model when attention costs are linear in Shannon mutual information,
f
q~m
( ; A) = P
Pfe
g2A
f
Um
k
P ge
g
Um
k
;
where P f is the unconditional probability of choosing act f and k > 0 scales the disutility of attention relative to
prizes. Thus, rational inattention with Shannon costs looks in certain respects like a state dependent logistic RUM.
16
it is optimal to choose act f , producing a violation. In section 7.4 we report on an experiment
designed to capture the intuition of this example.
The second method for distinguishing the partially informed RUM from the rational inattention
model involves identifying cases in which all utility functions have the same ranking over acts. Such
cases allow us to identify the signal underlying the partially informed RUM, which by assumption
should not vary with incentives. To make this precise, note that in many re…nements of the general
RUM (such as the random expected utility model of Gul and Pesendorfer [2006]), all possible utility
functions have the property that choice between stochastically ranked alternatives is deterministic.
Moreover, many simple decision problems generate posteriors in which acts are typically so ranked.
Consider simple cases with two states and two acts ff; gg with variation in the incentive to learn:
(U1f ; U2f ) = (0; Y ) and (U1g ; U2g ) = (Y; 0) for Y > 0. For any posterior belief 1 > 0:5, act f
stochastically dominates g, while for 1 < 0:5, act g dominates f , regardless of the reward value, Y:
Thus, if all utility functions obey stochastic dominance, any randomness in choice must be due to
the signal which, by assumption, is invariant to Y . Thus data generated by a partially informed
RUM should not respond, for example, to doubling the value of Y . General a rationally inattentive
agent will respond to such changes.
6
Experimental Design
6.1
Overview
We introduce an experimental design that produces state dependent stochastic choice data at the
subject level. Subjects are shown a screen on which there are displayed 100 balls, some of which
are red and some of which are blue. The state of the world is determined by the number of red
balls on the screen. Prior to seeing the screen, subjects are informed of the probability distribution
over states. Having seen the screen they choose from a number of di¤erent acts whose payo¤s are
state dependent. There is no external limit (such as a time constraint) on a subject’s ability to
collect information about the state of the world. At the same time, there is no extrinsic cost to
the subject of gathering information. Therefore the extent to which subjects fail to discern the true
state of the world is due to their unwillingness to trade cognitive e¤ort for monetary reward.
A decision problem is de…ned by the prior information and the set of available acts, as it is in
section 2.1. A subject faces each decision problem 50 times, allowing us to approximate their state
dependent stochastic choice function.25 In any given experiment, the subject faces four di¤erent
problems. All occurrences of the same problem are grouped, but the order of the problems is
block-randomized. In estimating the state dependent stochastic choice function we treat the 50
times that a subject faces the same decision making environment as 50 independent repetitions of
the same event.
6.2
The Experiments
Our study consists of four di¤erent experiments each run on between 23 and 44 subjects recruited
from the New York University student population.26 Each subject answered 200 questions as well
as 1 practice question. At the end of the session, one question was selected at random for payment,
25
To prevent subjects from learning to recognize patterns, we randomize the position of the balls. The implicit
assumption is that the perceptual cost of determining the state is the same for each possible con…guration of balls.
26
29 subjects took part in experiment 1, 33 in experiment 2, 24 in experiment 3 and 44 in experiment 4. Each
subject took part in only one experiment.
17
which reward was added to the show up fee of $10. Subjects took on average about 45 minutes to
complete a session. Instructions are included in the supplemental materiel.
The …rst three experiments were designed to study NIAC and NIAS in di¤erent types of decision
problem. Experiment 1 comprises four decision problems with two equally likely states: in state
1 there are 48 red balls and in state 2 there are 52. In each decision problem there are two acts
available ff i ; g i g with i indexing the decision problem. In each case, f is superior in state 1 while
g is superior in state 2. Across decision problems the reward for making the correct choice in state
1 (U1f U1g ) varies, while the reward for making the correct choice in state 2 (U2g U2f ) remains
…xed. This experiment therefore studies NIAS and NIAC in the context of asymmetric changes to
the value of attention. Table 1 describes the available acts in the four decision problems in this
experiment (payo¤s are in US$).
Table
Decision
Problem
1
2
3
4
1: Experiment 1
Payo¤s
U1f U2f U1g
10
0
0
20
0
0
10
0
5
30
0
0
Table
Decision
Problem
5
6
7
8
U2g
10
10
10
10
2: Experiment 2
Payo¤s
U1f U2f U1g
10
0
0
2
0
0
20
0
0
30
0
0
U2g
10
2
20
30
Experiment 2 also has four decision problems, each with two equally likely states (48 and 52
red balls) and two available acts ff i ; g i g, whose payo¤s vary between decision problems. The
key distinction with experiment 1 is that the reward to correctly identifying the state changes
symmetrically across decision problems, as described in table 2.
Experiment 3 studies whether subjects can focus their attention where it is most valuable. All
decision problems in this experiment involve four equally likely states comprising two identi…able
groupings. States 1 and 2 are perceptually hard to distinguish from one another, being de…ned
respectively by 29 and 31 red balls. States 3 and 4 are also hard to distinguish from another, being
de…ned respectively by 69 and 71 red balls. There are four decision problems with two possible
acts, still labelled f and g. The decision problems di¤er according to whether it is important to
di¤erentiate between states 1 and 2 (problem 10), states 3 and 4 (problem 9), neither (problem
11), or both (problem 12).
Table 3: Experiment 3
Payo¤s
f
f
f
Decision Problem U1 U2 U3 U4f U1g
9
1
0
10
0
0
10
10
0
1
0
0
11
1
0
1
0
0
12
10
0
10
0
0
U2g
1
10
1
10
U3g
0
0
0
0
U4g
10
1
1
10
Experiment 4 sets up the possible counter-example to monotonicity suggested by Matejka and
McKay [2011]. As in experiments 1 and 2 there are two equiprobable states, in this case with 49
red balls in state 1 and 51 in state 2.27 However there are now three acts, which we label fa; b; cg.
In decision problem 13 the subject must choose between safe act a which pays out 23 in each state
and act b which pays out 25 in state 2, but only 20 in state 1. Decision problem 14 introduces
27
We used 49 and 51 red balls to increase the cost of attentiveness in decision problem 13, making the increase in
attentiveness from adding act c more marked.
18
a third alternative c which pays out 30 in state 1 and 10 in state 2. Decision problems 15 and
16 increase the payo¤ of act c in state 1 but decrease it in state 2, further increasing the value of
attention.
Table 4: Experiment 4
Payo¤s
a
a
Decision Problem U1 U2 U1b U2b
U1c
U2c
13
23 23 20 25 n/a n/a
14
23 23 20 25
30
10
15
23 23 20 25
35
5
16
23 23 20 25
40
0
6.3
Preliminary Data Analysis
We use the simple two act, two state structure of experiments 1 and 2 to establish some key
properties of our data. First, our subjects produce choice data that is both stochastic and state
dependent. In aggregate subjects made mistakes (choosing the wrong act on 35% of all trials), yet
also make signi…cantly di¤erent choices in the two di¤erent states (act f was chosen 68% of the
time in state 1 and 38% of the time in state 2 - the hypothesis that choice behavior is the same
in both states can be rejected at the 0.001% level).28 These results suggest that our subjects are
absorbing some information about the state of the world, but are not fully informed when they
make their choice. Second, our data does not exhibit strong order e¤ects that might be associated
with learning or fatigue. Averaging over decision problems, the percentage of correct responses
in blocks 1-4 was 68%, 65%, 66%, and 65% respectively.29 Hence we ignore order e¤ects in the
remaining analysis.
7
Experimental Results
Overall, the results are supportive of NIAC and NIAS. Both condition are extremely close to holding
in the aggregate data. At the individual level, monetary losses due to violations of both conditions
are small. In contrast, we …nd that the monotonicity condition central to the RUM model is violated
in the manner predicted by rational inattention.
7.1
Experiment 1
In the two state/two act set up of experiment 1, NIAS implies existence of a cuto¤ posterior
probability of state 1 that determines the optimal act. For higher posteriors, act f is optimal,
while for lower posteriors, act g is optimal. Assuming risk neutrality, this cuto¤ is 12 in decision
problem (DP) 1, 13 in DP 2; 23 in DP 3; and 14 DP 4. These are shown in …gure 2 together with the
revealed posteriors associated with the choice of act f and act g at the aggregate level (treating all
data as if it was generated by a single subject). The data satisfy risk neutral NIAS in all but one
28
Estimated using a linear probability model with individual-level …xed e¤ects and standard errors clustered at the
individual level. All statistical tests reported use this method. These patterns hold true at the individual level. Of
the 62 subjects that took part in experiments 1 and 2, only 6% made mistakes in less than 10% of questions, while
84% had choice behavior that was signi…cantly di¤erent between the two states at the 10% level.
29
Regressing a binary variable indicating whether the correct choice was made on decision problem dummies and
a dummy indicating the order in which the treatment was seen by the subject suggests that these di¤erences are not
signi…cant: A test of the linear restriction that the block dummies are simultaneously equal to zero fails to reject the
null hypothesis at the 17% level.
19
case (the posterior of state 1 is 1% too high when g is chosen in DP 2).30 Figure 2 also demonstrates
that subjects’behavior varies signi…cantly with the decision problem.
Figure 2
Figure 3
At the individual level, while only 42% of subjects satisfy NIAS for every decision, the violations
that we observe lead to small losses. Figure 4 shows the distribution of costs of NIAS failures
in experiment 1. For each subject it measures the actual expected value of their choices minus
the expected value of the optimal choices given their revealed posteriors across all four decision
problems. As a benchmark, these losses are compared to those that would have been observed from
a population of subjects choosing at random.31 The observed distribution is signi…cantly di¤erent
from the simulated distribution at the 0.01% level (Kolmogorov-Smirnov test) (equivalent …gures
30
In order to calculate posterior beliefs we combine the conditional probabilities of choosing each act from each
state estimated from the data with prior beliefs about the likelhood of each state, rather than the empirical likelihood
of each state.
31
The use of random benchmarks has been discussed by, for example, Beatty and Crawford [2011]. The precise
procedure used to construct the random behavior is as follows: for each decision problem and for each state, a random
number is drawn for each available act. The probability of choosing each act from that state is then calculated as
the value of the random number associated with that act divided by the sum of all random numbers.
20
for experiments 2-4 are shown in the supplemental materiel.).
Figure 4
Figure 5
The binary version of the NIAC condition (i.e. considering cycles consisting of only two decision
problems) in this experiment relates the change in incentives between decision problems to the
change in i the percentage of time that the correct decision is taken in state i = 1; 2 (act f in
state 1, act g in state 2),
1
(U1f
U1g ) +
2
(U2g
U2f )
0,
(2)
where (x) indicates the change in x between the two decision problems. This expression has a
natural interpretation. The …rst term is the change in the probability of choosing the correct act
in state 1 multiplied by the change in the bene…t of choosing the right act - i.e. the di¤erence
between the payo¤ of act f and g in that state. The second term is the change in the probability
of choosing the right act in state 2 multiplied by the bene…t of so doing.
Experiment 1 is designed so that (U2g U2f ) = 0 for every pair of decision problems. Thus
this condition implies a ranking of j1 across decision problems 1 j 4,
4
1
2
1
1
1
3
1:
Figure 3 shows that this ranking holds true in the aggregate. The di¤erence between decision
problems is signi…cant at the 1% level.32
At the individual level, 48% of subjects satisfy NIAC, and again the losses due to NIAC violations are small. Figure 5 plots the distribution of actual surplus minus the maximal surplus
possible by reassigning attention strategies to decision problems for each individual. The NIAC
condition demands this number to be zero. As a comparator, we show the distribution obtained
from random choice. Again, the observed distribution is signi…cantly di¤erent from the simulated
distribution at the 0.01% level.
32
This test considers only bilateral comparisons of attention strategies. The NIAC condition requires more than this:
it must be impossible to increase total surplus through any reassignment of attention strategies between problems.
This condition holds on the aggregate data conditional on actual choice at each posterior, but not given optimal
choice at each posterior, as NIAS is violated in DP 2.
21
7.2
Experiment 2
Experiment 2 explores subjects’responses to symmetric changes in the value of making the correct
decision. Symmetry implies that, for each decision problem, the threshold posterior probability of
state 1 above which it is optimal to choose act f is 0.5. Table s1 in the supplemental materiel
shows that these conditions hold in aggregate. At the individual level, 67% of subjects obey NIAS,
the losses of those that do not are again relatively small (see …gure s1) and signi…cantly di¤erent
from those of the random benchmark (signi…cant at the 0.01% level).
With regard to NIAC, experiment 2 has the symmetry property whereby for every pair of acts
in each decision problem,
(U1f U1g ) = (U2g U2f ):
Applying this observation to equation 2 implies a ranking for
choosing the right action across both states,
8
1
+
8
2
7
1
+
7
2
5
1
+
5
2
6
1
1
+
+
2,
or the total probability of
6
2:
Figure 6 shows that these implications broadly hold in the aggregate data. The total proportion of
correct responses in DP 8 is higher than in DP 7 which is in turn higher than in DP 5. The total
proportion of errors in DP 6 is slightly higher than that in DP 5 (by 0.5%) but the di¤erence is
not statistically signi…cant.33
While the ordering of decision problems according to the proportion of correct choices is in line
with the theory, the elasticity in the response of information gathering to monetary incentives is
quite low. The probability of making a correct choice rises from 65% in the $2 treatment to 70%
in the $30 treatment (signi…cant at the 10% level). We discuss the low degree of responsiveness
further in Caplin and Dean [2013].
Figure 6
At the individual level, the distribution of losses due to NIAC violations in the experimental data
is shown in …gure s2 in the supplemental materiel. Again it is signi…cantly di¤erent from that of
the random benchmark at the 0.01% level.
33
Again errors bar that are shown take into account clustering at the subject level.
22
7.3
Experiment 3
The NIAS conditions take a somewhat di¤erent form in experiment 3. Act f is always superior in
states 1 and 3 and act g is superior in states 2 and 4. For it to be optimal to choose option f , it
must be the case that,
U1f 1 + U3f 3 U2g 2 + U4g 4 :
Given that U1f = U2g and U3f = U4g in experiment 3, provided
of f requires,
U1f
1
2
:
U3f
4
3
1
>
2
when f is chosen, optimality
Table s2 in the supplemental materiel shows that this condition holds at the aggregate level. At the
individual level, 58% of subjects obey NIAS and, as shown in …gure s1, the losses amongst those
that do not are again small and signi…cantly di¤erent from the random benchmark at the 0.01%
level.
With regard to the NIAC condition, the equivalent of condition 2 for experiment 3 is,
1
(U1f
U1g ) +
(U3f
3
U3g ) +
2
(U2g
U2f ) +
(U4g
4
U4f )
0:
This equation generates a set of six inequalities based on binary comparisons of the 4 decision
problems. For example, comparing decision problem 9 and 10 we have,
10
1
+
10
2
+
9
3
+
9
4
10
3
+
10
4
+
9
1
+
9
2
;
meaning that the increase in the proportion of correct choices in states 1 and 2 must be bigger than
the increase in proportion of correct choices in state 3 and 4 when shifting from decision problem
9 to 10. This makes sense, as relative to problem 9, problem 10 has higher rewards for correct
decisions in the former two states than in the latter two states, meaning that this is where subjects
should focus their attention. In the aggregate data, the average proportion of correct responses on
the left hand side of the equation is 72.8% compared to 64.9% on the right hand side, signi…cantly
di¤erent at the 1% level. Table s3 in the supplemental materiel shows that the other 5 conditions
also hold. Moreover losses from NIAC violations are again signi…cantly below those associated with
the random benchmark at the 0.01% level (see …gure s2).
7.4
Experiment 4
The primary purpose of experiment 4 was to test for violations of monotonicity in the manner
described by Matejka and McKay [2011]: we would expect that the additional incentive to discover
the true state generated by the introduction of act c would increase the likelihood of choosing
act b in state 2: Table 7 shows that this is fact the case. The introduction of act c increases the
probability of choosing act b in state 2 from 23% to an average of 35% across decision problems
14-16. While small, this increase is signi…cant at the 5% level.
Table 7
Decision problem
13
14
15
16
23
q1 (b)
21%
16%
20%
17%
q2 (b)
23%
30%
33%
42%
The supplemental materiel describes the NIAC and NIAS tests for experiment 4. At the aggregate level, NIAC holds, while NIAS holds apart from for the choice of b in which case the revealed
posterior puts somewhat too high a weight on state 1.
Further evidence against RUMs comes from our other experiments. Uninformed RUMs imply
that choice probabilities should be independent of the state, which is patently false in all cases.
Perfectly informed RUMs imply that choice probabilities in each state should be independent of
payo¤s in other states. Hence in experiment 1 choice probabilities in state 2 should be identical
across decision problems, when in fact they range from 15% to 74% (see …gure 3). Further evidence
against the partially informed RUM derives from tests of stochastic dominance. We tested 24
subject in experiment 1 for violations of stochastic dominance. We asked 10 question in which they
were to choose between act f which paid $10 only in state 1 and g in which they were paid $10 only
in state 2. The prior probability of state 1 varied from 10% to 90%. However, for these questions,
subjects did not get to see the screen with the balls, and had to make their decisions based purely
on the prior. Thus, subjects obeyed stochastic dominance if and only if they chose act f when the
prior on state 1 was greater than or equal to 50%. We found that indeed 83% of subjects obeyed
stochastic dominance, suggesting that for the majority of subjects all randomness we observe would
have to be due to the exogenous signal. This cannot be squared with the increase in attentiveness
as rewards increase in experiment 2: by assumption, the signal in the partially informed RUM is
exogenous.
8
Existing Literature
Much of the work on rational inattention in economics can be traced back to Sims [1998, 2003] which
characterize the behavioral impact of constraints on information processing in linear quadratic control problems.34 The rate of information ‡ow is measured by Shannon mutual information. Sims
[2003] shows that such a constraint generates behavior similar to that generated by an agent who
observes the state of the world only noisily. However, the type of noise is determined endogenously,
based on the incentives in the environment. Following this paper, mutual information constraints
have been incorporated in an increasing number of economic settings, including consumptionsavings problems (Sims [2006], Tutino [2008], and Mackowiak and Wiederholt [2010]), pricing
problems (Mackowiak and Wiederholt [2009], Matejka [2010], Martin [2013]), monetary policy
(Paciello and Wiederholt [2011]), and portfolio choice (Mondria [2010]).
In part, the focus on mutual information to measure the cost of information is justi…ed by
its central position in the information theory literature. The Shannon mutual information of two
random variables is related to the expected length in bits of the optimally encoded signal needed to
generate one from the other. It also has an axiomatic characterization which shows that information
costs must be of this form if they are to obey certain intuitive properties (see for example Csiszár
[2008]). Shannon mutual information costs also have interesting properties from an economic
standpoint. As discussed in the text, Matejka and McKay [2011] demonstrate a strong relationship
between mutual information based rational inattention and logit-style random choice. Cabrales et
al. [2013] demonstrate a further interesting link between mutual information and economic behavior.
They consider a “ruin averse”investor facing a class of no-arbitrage investment problems and show
that the ranking of information structures based on willingness to pay is equivalent to that provided
by mutual information.
While much of the rational inattention literature has focussed on mutual information costs,
34
Although the study of costly information aquisition in economics goes back much further - for example Stigler
[1961], Marschak [1971] and Milgrom [1981].
24
a variety of other cost functions and constraints have been studied. Woodford [2012] points out
that mutual information does not imply that less attention will be paid to rare events (as such
attention is cheap in expectation), in violation of experimental results due to Shaw and Shaw
[1977]. He therefore proposes an alternative measure in which the cost of an information structure
is evaluated according to the related concept of the Shannon capacity. Gul et al. [2012] consider
the behavior of households who are restricted to having “crude”consumption plans i.e. plans that
are restricted to having at most n realizations. van Nieuwerburgh and Veldkamp [2009] consider a
more general information cost function, based on the distance between prior and posterior variance.
Saint-Paul [2011] considers the case in which DMs face Shannon cost, but can only choose discrete
policy functions (essentially combining the approaches of Sims [2003] and Gul et al. [2012]). Reis
[2006] considers the case of a binary information choice: in any given period either attention can be
paid, and the state is fully revealed, or not, in which case no information is gathered. Even many
of the articles that use mutual information costs e¤ectively restrict the decision maker to choose
Gaussian signals, implying additional constraints (see Sims [2006] for a discussion).
A key strength of our approach is that our model nests all of the above costs functions. The costs
of allowable attention strategies can be captured by K, while the cost of inadmissible strategies
can be set to in…nity. The NIAS and NIAC conditions therefore provide a test of the entire class
of rational inattention models currently in use.35
A recent wave of decision theoretic literature shares the goal of capturing the observable implications of inattention, both rational and otherwise. Closest in spirit to our work is Ellis [2012], who
works with a data set similar to ours - state dependent choice functions. Ellis [2012] asks under
what conditions can such data be rationalized by a model in which a DM has a set of available
partitions on the state space, and selects the best partition for the decision problem at hand. The
characterization is based on the identi…cation of cells in the partition with all objective states in
which the same choice is made - an approach similar to that taken in our paper. This allows for
the identi…cation of the preferences revealed by choices.
There are three key di¤erences between the theoretical section of our paper and Ellis [2012].
On the one hand, he places weaker requirements on the data: unlike our approach, the DM’s utility
function and prior beliefs are derived from behavior rather than directly observable. On the other
hand he considers a more restrictive class of information restrictions: DMs e¤ectively face a cost
function which is zero for allowable partitions and in…nity for all other information structures. This
restriction rules out any stochasticity in choice, as well as many commonly used information cost
functions (such as those based on Shannon mutual information). Moreover, the conditions of Ellis
[2012] require the data to be observed from essentially all decision problems to be both su¢ cient
and necessary. In contrast, our tests work on data collected from any arbitrary collection of decision
problems.
A second decision theoretic approach to identifying rational attention is to examine choice over
menus. Ergin and Sarver [2010] consider a model in which a DM makes choices within choice sets by
optimally selecting a partition on (subjective and unobservable) states of the world, then choosing
the best action conditional on that partition.36 They characterize the implications of such a model
for choices between choice sets. Costly contemplation is characterized by an aversion to contingent
planning: an agent would prefer to …nd out which set they are choosing from and then choose from
that set, rather that have to make contingent plans. Mihm and Ozbek [2012] extend this approach
to the case in which there are observable states of the world, resulting in a representation similar
to that considered in this paper.
35
Note that we consider only the instrumental value of information, not any intrinsic value that information might
have as in Grant et al. [1998].
36
Ortoleva [2013] provides an alternative model of ‘thinking aversion’.
25
Our work forms part of an ongoing project aimed at characterizing choice behavior when the
internal information state of the agent is not directly observable. Block and Marschak [1960] early
on stressed the di¢ culty in separating out theories of stochastic choice. van Zandt [1996] provided
an explicit negative result in this regard, showing that any choice behavior is rationalizable in
a model that allows for unobserved costly information acquisition if the state of the world is
not observable. Caplin and Dean [2011] and Caplin et al. [2011] consider the case of sequential
information search, using an extended data set to derive behavioral restrictions of search of this
kind as well as of satis…cing behavior. Gabaix et al. [2006] use Mouselab data to test a near-optimal
model of sequential information search. Caplin and Martin [2011] introduce the NIAS condition to
characterize subjective rationality in a single decision problem. Masatlioglu et al. [2012] characterize
“revealed attention”, using the identifying assumption that removing an unattended item from the
choice set does not a¤ect attention. Dillenberger et al. [2012] consider a dynamic problem in which
the DM receives information in each period which is externally unobservable, characterizing the
resulting preference over menus. Fudenberg and Strzalecki [2012] also consider a dynamic problem,
characterizing dynamic stochastic choice rules that are consistent with rational inattention and
Shannon mutual information costs.
In the psychology literature, theories to which we are close in spirit are signal detection theory
(Green and Swets [1966]) and categorization theory. A common feature is that the DM receives a
signal and must choose the optimal action at each resulting posterior. These theories are connected
to enormous experimental literatures in psychology that capture state dependent stochastic choice
data.37 The chief distinction is that, unlike the rational inattention model, signal detection theory
generally …xes the attention strategy independent of the incentive to learn implied by the decision
making environment. These models are therefore a special case of our general formulation.
Despite the powerful psychological precedents, there is little experimental work on state dependent stochastic choice data within economics. There is so far as we know no work in either …eld
that tests NIAC and NIAS directly. One related paper is Cheremukhin et al. [2011], which uses
a formulation similar to Matejka and McKay [2011] to estimate a rationally inattentive model on
subject’s choice over lotteries. However they do not analyze the state dependence in the resulting
stochastic choice data.
9
Conclusions
As economists increasingly focus on attentional constraints, so the importance of rational inattention theory has grown. We characterize a general model of rational inattention which encompasses
all models currently in the literature. The necessary and su¢ cient conditions are simple and readily testable. We …nd the model to do a qualitatively good job of explaining subject behavior in a
simple experimental implementation. In contrast, traditional random utility models fail to capture
important data features.
In addition to further investigating the comparison with random utility models, we are currently
exploring the behavioral content of more structured models of attention costs, in particular the
Shannon model.38 We are also exploring the implications of the nature of attentional costs for
economic applications.
37
38
We do not attempt to summarize the literature here - see Verghese [2003] for a review.
See Caplin and Dean [2013].
26
References
S. N. Afriat. The Construction of Utility Functions from Expenditure Data. International Economic
Review, 8(1), 1967.
Timothy K. M. Beatty and Ian A. Crawford. How demanding is the revealed preference approach
to demand? American Economic Review, 101(6):2782–95, October 2011.
Dirk Bergemann and Stephen Morris. Correlated equilibrium in games with incomplete information.
Cowles Foundation Discussion Papers 1822, Cowles Foundation for Research in Economics, Yale
University, October 2011.
H. D. Block and J. Marschak. Contributions to Probability and Statistics, chapter Random Orderings and Stochastic Theories of Response. Stanford University Press, Stanford, 1960.
Antonio Cabrales, Olivier Gossner, and Roberto Serrano. Entropy and the value of information for
investors. American Economic Review, 103(1):360–77, February 2013.
Andrew Caplin and Mark Dean. Search, choice, and revealed preference. Theoretical Economics,
6(1), January 2011.
Andrew Caplin and Mark Dean. Rational inattention, entropy, and choice: The posterior-based
approach. Mimeo, Center for Experimental Social Science, New York University, 2013.
Andrew Caplin and Daniel Martin. A testable theory of imperfect perception. NBER Working
Papers 17163, National Bureau of Economic Research, Inc, June 2011.
Andrew Caplin, Mark Dean, and Daniel Martin. Search and satis…cing. American Economic
Review, 101(7):2899–2922, December 2011.
Anton Cheremukhin, Anna Popova, and Antonella Tutino. Experimental evidence on rational
inattention. Technical report, 2011.
Raj Chetty, Adam Looney, and Kory Kroft. Salience and taxation: Theory and evidence. American
Economic Review, 99(4):1145–77, September 2009.
Stephen A. Clark. The random utility model with an in…nite choice space. Economic Theory,
7:179–189, 1996.
Jacques Cremer. A simple proof of blackwell’s comparison of experiments theorem. Journal of
Economic Theory, 27(2):439–443, August 1982.
Imre Csiszár. Axiomatic Characterizations of Information Measures. Entropy, 10:261–273, 2008.
David Dillenberger, Juan Sebastian Lleras, Philipp Sadowski, and Norio Takeoka. A theory of
subjective learning. Technical report, 2012.
Andrew Ellis. Foundations for optimal attention. Mimeo, Boston University, 2012.
Haluk Ergin and Todd Sarver. A Unique Costly Contemplation Representation. Econometrica,
78(4):1285–1339, 2010.
Haluk Ergin. Costly Contemplation. 2003.
27
J C Falmagne. A Representation Theorem for Random Finite Scale Systems. Journal of Mathematical Psychology, 18:52–72, 1978.
Drew Fudenberg and Tomasz Strzalecki. Recursive stochastic choice. Mimeo, Harvard University,
2012.
Xavier Gabaix, David Laibson, Guillermo Moloche, and Stephen Weinberg. Costly information
acquisition: Experimental analysis of a boundedly rational model. American Economic Review,
96(4):1043–1068, September 2006.
Simon Grant, Atsushi Kajii, and Ben Polak. Intrinsic preference for information. Journal of
Economic Theory, 83(2):233–259, December 1998.
D. M. Green and J. A. Swets. Signal detection theory and psychophysics. Wiley, New York, 1966.
Faruk Gul and Wolfgang Pesendorfer. Random Expected Utility. Econometrica, 74(1):121–146,
2006.
Faruk Gul, Wolfgang Pesendorfer, and Tomasz Strzalecki. Behavioral competitive equilibrium and
extreme prices. Mimeo, Princeton University, 2012.
Tjalling C. Koopmans and Martin Beckmann. Assignment problems and the location of economic
activities. Econometrica, 25(1):pp. 53–76, 1957.
D.M. Kreps and E.L. Porteus. Temporal resolution of uncertainty and dynamic choice theory.
Econometrica, 46(1):185–200, 1978.
Nicola Lacetera, Devin G. Pope, and Justin R. Sydnor. Heuristic thinking and limited attention in
the car market. American Economic Review, 102(5):2206–36, September 2012.
Graham Loomes and Robert Sugden. Incorporating a stochastic element into decision theories.
European Economic Review, 39(3-4):641–648, April 1995.
R. Luce and P. Suppes. Handbook of Mathematical Psychology Vol. III, chapter Preference, utility,
and subjective probability. Wiley, New York, 1965.
Bartosz Mackowiak and Mirko Wiederholt. Optimal sticky prices under rational inattention. American Economic Review, 99(3):769–803, June 2009.
Bartosz Adam Mackowiak and Mirko Wiederholt. Business cycle dynamics under rational inattention. CEPR Discussion Papers 7691, C.E.P.R. Discussion Papers, February 2010.
Charles F. Manski. The structure of random utility models. Theory and Decision, 8:229–254, 1977.
Jacob Marschak. Economics of information systems. Journal of the American Statistical Association, 66(333):192–219, March 1971.
Daniel Martin. Strategic pricing and rational inattention to quality. Mimeo, New York University,
2013.
Yusufcan Masatlioglu, Daisuke Nakajima, and Erkut Y. Ozbay. Revealed attention. American
Economic Review, 102(5):2183–2205, August 2012.
28
Filip Matejka and Alisdair McKay. Rational inattention to discrete choices: A new foundation
for the multinomial logit model. CERGE-EI Working Papers wp442, The Center for Economic
Research and Graduate Education - Economic Institute, Prague, June 2011.
Filip Matejka. Rationally inattentive seller: Sales and discrete pricing. CERGE-EI Working Papers
wp408, The Center for Economic Research and Graduate Education - Economic Institute, Prague,
March 2010.
Daniel McFadden and Marcel Richter. Preferences, Uncertainty and Rationality, chapter Stochastic
Rationality and Revealed Stochastic Preference. Westview Press, 1991.
Daniel McFadden. Frontiers in Econometrics, chapter Conditional logit analysis of qualitative
choice behavior. Academic Press, New York, 1974.
Daniel McFadden. Revealed stochastic preference: a synthesis. Economic Theory, 26(2):245–264,
08 2005.
Maximilian Mihm and M. Kemal Ozbek. Decision making with rational inattention. Working
paper, Social Science Research Network, 2012.
Paul R Milgrom. Rational expectations, information acquisition, and competitive bidding. Econometrica, 49(4):921–43, June 1981.
Jordi Mondria. Portfolio choice, attention allocation, and price comovement. Journal of Economic
Theory, 145(5):1837–1864, September 2010.
David J. Murray. A perspective for viewing the history of psychophysics. Behavioral and Brain
Sciences, 16:115–137, 2 1993.
Koch C. Navalpakkam, V. and P Perona. Homo economicus in visual search. Journal of Vision,
9(1):1–16, January 2009.
Pietro Ortoleva. The price of ‡exibility: Towards a theory of thinking aversion. J. Economic
Theory, 148(3):903–934, 2013.
Luigi Paciello and Mirko Wiederholt. Exogenous information, endogenous information and optimal
monetary policy. Technical report, 2011.
Ricardo Reis. Inattentive Consumers. Journal of Monetary Economics, 53(8):1761–1800, 2006.
Jean-Charles Rochet. A necessary and su¢ cient condition for rationalizability in a quasi-linear
context. Journal of Mathematical Economics, 16(2):191–200, April 1987.
Gilles Saint-Paul. A "quantized" approach to rational inattention. TSE Working Papers 10-144,
Toulouse School of Economics (TSE), 2011.
Babur De Los Santos, Ali Hortacsu, and Matthijs R. Wildenbeest. Testing models of consumer
search using data on web browsing and purchasing behavior. American Economic Review,
102(6):2955–80, October 2012.
M. L. Shaw and P. Shaw. Optimal allocation of cognitive resources to spatial locations. J Exp
Psychol Hum Percept Perform, 3(2):201–211, May 1977.
29
Christopher A. Sims. Stickiness. Carnegie-Rochester Conference Series on Public Policy, 49(1):317–
356, December 1998.
Christopher Sims. Implications of Rational Inattention. Journal of Monetary Economics, 50(3):665–
690, 2003.
Christopher A. Sims. Rational inattention: Beyond the linear-quadratic case. American Economic
Review, 96(2):158–163, May 2006.
George J. Stigler. The economics of information. Journal of Political Economy, 69:213, 1961.
Antonella Tutino. The rigidity of choice: Lifecycle savings with information-processing limits.
Technical report, 2008.
Stijn van Nieuwerburgh and Laura Veldkamp. Information immobility and the home bias puzzle.
Journal of Finance, 64(3):1187–1215, 06 2009.
Timothy van Zandt. Hidden information acquisition and static choice. Theory and Decision,
40:235–247, 1996.
Preeti Verghese. Visual search and attention: A signal detection theory approach.
31(4):523–535, 2003.
Neuron,
Michael Woodford. Information-constrained state-dependent pricing. NBER Working Papers 14620,
National Bureau of Economic Research, Inc, December 2008.
Michael Woodford. Inattentive valuation and reference-dependent choice. Mimeo, Columbia University, 2012.
Ming Yang. Coordination with rational inattention. Technical report, 2011.
30
10
10.1
Appendix A: Proofs
Lemma 1
Lemma 1 If
2
is consistent with ( ; A) 2
F and q 2 Q , then it is su¢ cient for
( ;q) :
Proof. Let 2
be an attention strategy that is consistent with q 2 Q in decision problem
( ; A) . First, we list in order all distinct posteriors i 2 ( ) for 1 i j ( )j. Given that
is consistent with q, there exists a corresponding optimal choice strategy C : f1; :::; Ig !
(A), with C i (f ) denoting the probability of choosing act f 2 F (q) with posterior i , such
that the attention and choice functions match the data,
f
qm
=
I
X
i
m(
)C i (f ):
i=1
We also list in order all possible posteriors j 2
all chosen acts that are associated with posterior
Fj
( ( ;q) ), 1
as F j ,
j
ff 2 F jrf ( ; q) =
j
j
j (
( ;q) )j,
and identify
g:
The garbling matrix bij sets the probability of j 2 given
all choices associated with acts f 2 F j .
X
bij =
C i (f ):
i
2 ( ) as the probability of
f 2F j
Note that this is indeed a j ( )j j ( )j stochastic matrix B
Given j 2 ( ) and m 2 , note that,
I
X
bij
i=1
i
m( ) =
I
X
i
m( )
i=1
X
C i (f ) =
f 2F j
0 with
X
PJ
j=1 b
ij
= 1 all i.
f
qm
;
f 2F j
by the data matching property. It is de…nitional that m ( j ) is precisely equal to this, as the
observed probability of all acts associated with posterior j 2 . Hence,
j
m( ) =
I
X
bij
m(
i
);
i=1
as required for su¢ ciency.
10.2
Theorem 1
Theorem 1 Data set (D; q) has a rational inattention representation if and only if it satis…es NIAS
and NIAC.
Proof of Necessity. To con…rm that existence of a rational inattention representation (K; ) of
(D; q) implies that the NIAS condition is satis…ed, consider ( ; A) 2 D and a corresponding choice
31
strategy C : (
( ;A) )
!
(A) such that …nal choices are optimal and that matches the data,
X
f
f
qm
=
m ( )C ( ):
( ;A) )
2 (
Given f 2 q (
;A) ,
for which C f ( ) > 0,
n
o
(f ) =
(( ( ; A))jC f ( ) > 0 :
consider all posteriors
Note that it must be the case that
f
2
M
X
M
X
f
m Um
m=1
g
m Um
m=1
all g 2 A for all posteriors
2 ,
Note also that we can rewrite the revealed posterior beliefs as a weighted average of the posterior
beliefs in the underlying attention strategy,
2
3
X
f
4
m ( )C ( )5
m
f
qm
2 ( ( ;A) )
f
=:
rm
( ; q) = M m
M
X
X
f
f
j qj
j qj
j=1
=
X
2 (
(
j=1
2
6
6
6
6
6
;A) ) 4
M
X
j=1
m
3
)C f ( ) 7
7
7
7=
M
7
X
f
5
q
j j
j m(
X
2 (
mP
f
( ):
( ;A) )
j=1
The second line above is obtained by dividing and multiplying each term in the sum by
M
X
j m(
)C f ( ),
j=1
or the probability of state , and P f ( ) indicates the probability of state
The value of choosing act f at its revealed posterior is therefore,
M
X
f
rm
(
f
; q) Um
=
m=1
X
2 (
=
M
X
m=1
P ( )
( ;A) )
M
X
f
m Um
m=1
( ;A) )
X
2 (
f
given that f was chosen
Pf( )
M
X
g
m Um
m=1
f
g
rm
( ; q) Um
8 g 2 A:
This establishes NIAS. The middle inequality above stems from the fact that
M
X
m=1
f
m Um
M
X
g
m Um
m=1
all g 2 A for every 2 ( ( ;A) ).
Next we show that, if there exists a rational attention representation (K; ) of (D; q), then
NIAC must hold For any sequence ( ; A1 ); ( ; A2 ); :::::( ; AJ ) 2 D with AJ = A1 , it must be the
32
case that,
J
X1
G( ; Aj ;
( ;Aj )
)
K( ;
( ;Aj )
J
X1
)
G( ; Aj ;
( ;Aj+1 )
)
K( ;
( ;Aj+1 )
j=1
j=1
J
X1
G( ; Aj ;
( ;Aj )
)
G( ; Aj ;
( ;Aj+1 )
J
X1
)
j=1
K( ;
( ;Aj )
)
K( ;
( ;Aj+1 )
))
) = 0:
j=1
where the last equality stems from the fact that K( ;
J
X1
j=1
G( ; Aj ;
( ;Aj )
)
J
X1
( ;A1 ) )
G( ; Aj ;
= K( ;
( ;Aj+1 )
( ;AJ ) ).
This implies that
)
j=1
It is immediate that G( ; Aj ; ( ;Aj ) ) = G( ; Aj ; ( ;Aj ) ) 8 j, as the minimal attention strategy
implies the same pattern of stochastic choice as does the original attention strategy. Moreover,
by lemma 1, we know that ( ;Aj ) is su¢ cient for ( ;Aj ) 8 j, and so, by Blackwell’s theorem
G( ; Aj ; ( ;Aj+1 ) )
G( ; Aj ; ( ;Aj+1 ) ) 8 j (see remark 1). Thus the true attention strategies
( ;Aj ) can be replaced by the minimal attention strategies ( ;Aj ) in the above expression and the
inequality will still hold, implying NIAC.
Proof of Su¢ ciency. There are three steps in the proof that the NIAS and NIAC conditions
are su¢ cient for (D; q) to have a rational inattention representation. The …rst step is to establish
that the NIAC conditions ensures that there is no global reassignment of the minimal attention
strategies observed in the data to decision problems ( ; A) 2 D that raises total gross surplus.
The second step is use this observation to de…ne a candidate cost function on attention strategies,
K:
! R [ 1. The key is to note that, as the solution to the classical allocation problem of
Koopmans and Beckmann [1957], this assignment is supported by “prices” set in expected utility
units. It is these prices that de…ne the proposed cost function. The …nal step is to apply the NIAS
represents a rational inattention representation of (D; q), where
conditions to show that K;
comprises minimal attention strategies.
Consider any prior 2
such that there exists two or more sets A such that ( ; A) 2 D.
Enumerate these decision problems as ( ; Aj ) for 1 j J and let D be the set of such decision
problems. De…ne the corresponding minimal attention strategies j for 1 j
J as revealed in
(
;A)
the corresponding data and let
[( ;A)2D
be the set of all such decision problems, with
a slight caveat to ensure that there are precisely as many strategies as there are decision problems.
If all minimal attention strategies are di¤erent, the set as just de…ned will have cardinality J. If
there is repetition, then retain the decision problem index with which they are associated so as
to make them distinct, and thereby to ensure that the resulting set
has precisely J elements.
j
Index elements
2
in order of the decision problem ( ; Aj ) with which they are associated.
We now allow for arbitrary matchings of attention strategies to decision problems. First, let bjl
denote the gross utility of decision problem j combined with minimal attention strategy l,
bjl = G( ; Aj ;
l
);
with B the corresponding matrix. De…ne M to be the set of all matching functions m : f1; ::; Jg !
33
f1; ::; Jg that are 1-1 and onto and identify the corresponding sum of gross payo¤s,
S(m) =
J
X
bjm(j) :
j=1
It is simple to see that the NIAC condition implies that the identify map mI (j) = j maximizes
the sum over all matching functions m 2 M. Suppose to the contrary that there exists some
alternative matching function that achieves a strictly higher sum, and denote this match m 2 M.
In this case construct a …rst sub-cycle as follows: start with the lowest index j1 such that m (j1 ) 6=
j1 . De…ne m (j1 ) = j2 and now …nd m(j2 ), noting by construction that m(j2 ) 6= j2 . Given that
the domain is …nite, this process will terminate after some K J steps with m (jK ) = j1 . If it is
the case that m (j) = j outside of the set [K
k=1 jk , then we know the increase in the value of the
sum is associated only with this cycle, hence,
K
X1
bjk jk <
K
X1
bjk jk+1 ;
j=1
k=1
directly in contradiction to NIAC. If this inequality does not hold strictly, then we know that there
0
0
exists some j 0 outside of the set [K
k=1 jk such that m (j ) 6= j . We can therefore iterate the process,
knowing that the above strict inequality must be true for at least one such cycle to explain the
strict increase in overall gross utility. Hence the identity map mI (j) = j indeed maximizes S(m)
amongst all matching functions m 2 M.
With this, we have established that the identity map solves an allocation problem of precisely
the form analyzed by Koopmans and Beckmann [1957]. They characterize precisely those matching
functions m : f1; ::Jg ! f1; ::Jg that maximize the sum of payo¤s de…ned by a square payo¤ matrix
such as B that identi…es the reward to matching objects of one set (decision problems in our case)
to a corresponding number of objects in a second set (minimal attention strategies in our case).
They show that the solution is the same as that of the linear program obtained by ignoring integer
constraints,
J
J
X
X
X
xjl = 1:
max
bjl xjl s.t.
xjl =
xjl 0
j=1
j;l
l=1
Standard duality theory implies that the optimal assignment mI (j) = j is associated with a system
of prices on minimal attention strategies such that the increment in net payo¤ from any move of
any decision problem is not more than the increment in the cost of the attention strategy.
De…ning these prices as Kj , their result implies that,
bjl
bjj = G( ; Aj ;
l
)
G( ; Aj ;
j
)
Kl
Kj ;
or,
G( ; Aj ;
j
)
Kj
G( ; Aj ;
l
)
Kl :
The result of Koopmans and Beckmann therefore implies existence of a function K :
!R
that decentralizes the problem from the viewpoint of the owner of the decision problems, seeking
to identify surplus maximizing attention strategies to match to their particular problems. Note
that if there are two distinct decision problems with the same revealed posterior, the result directly
implies that they must have the same cost, so that one can in fact ignore the reference to the
decision problem and retain only the posterior in the domain. To complete the de…nition of the
34
cost function, consider all 2 such that there exist a unique decision problem ( ; A) 2 D, and
( ;A) = 0 Now complete
set the cost of the corresponding minimal attention strategy to zero K
the function across all 2 such that there exist one or more ( ; A) 2 D by setting K( ; ) = 1
for 6= ( ;A) . Finally, for all 2 such that there is no ( ; A) 2 D, set K( ) = 0 all 2
and
K( ; ) = 1 for 2
=
.
Note that we have now completed construction of a qualifying cost function K :
! R[1
that satis…es K( ; ) = 1 for 2
=
and K( ; ) 2 R for some 2
. The entire construction
was aimed at ensuring that the observed attention strategy choices were always maximal, ( ;A) 2
^ (K; ; A) for all ( ; A) 2 D. It remains to prove that ( ;A) is consistent with q( ; A) for all
( ; A) 2 D. This requires us to show that, for all ( ; A) 2 D, the choice rule that associates with
each 2 ( ( ;A) ) the certainty of choosing the associated act f 2 F ( ; A) as observed in the
data is both optimal and matches the data. That it is optimal is the precise content of the NIAS
constraint,
M
M
X
X
f
f
f
g
rm ( ; A)Um
rm
( ; A)Um
;
m=1
m=1
for all g 2 A. That this choice rule and the corresponding minimal attention function match the
data holds by construction.
10.3
Theorem 2
Theorem 2 Data set (D; q) satis…es NIAS and NIAC if and only if it has a rational inattention
representation with conditions K1 to K3 satis…ed.
Proof. The proof of necessity is immediate from theorem 1. The proof of su¢ ciency proceeds in
four steps, starting with a rational inattention representation K;
of (D; q) of the form produced
in theorem 1 based on satisfaction of the NIAS and NIAC conditions. A key feature of this function
is that, given 2 such that ( ; A) 2 D some A 2 F, the function K is real-valued only on the
minimal information strategies
f ( ;A) j ; A) 2 Dg associated with all corresponding decision
problems, otherwise being in…nite. The …rst step is the proof is to construct for each such 2 a
larger domain
on which K is real-valued to satisfy four properties: to include all minimal
attention strategies,
; to include the inattentive strategy, I 2
; to be closed under
mixtures so that ; 2
and 2 (0; 1) implies
(1
)
2
; and to be “closed under
garbling,” so that if 2
is su¢ cient for attention strategy 2
, then 2
. The second
step is, for each 2 , to de…ne a new function K that preserves the essential elements of K
while being real-valued on the larger domain
, and thereby to construct the full candidate
cost function K :
! R [ 1. The third step is to con…rm that K 2 K and that K satis…es
the required conditions K1 through K3. The …nal step is to con…rm that K;
forms a rational
inattention representation of (D; q).
Given 2 such that ( ; A) 2 D some A 2 F, we construct the domain
in two stages. First,
we de…ne all attention strategies for which some minimal attention strategy 2
is su¢ cient;
S
=f 2
j9 2
su¢ cient for g:
Note that this is a superset of
and that it contains I . The second step is to identify
as the
smallest mixture set containing S : this is itself a mixture set since the arbitrary intersection of
mixture sets is itself a mixture set.
By construction,
has three of the four desired properties: it is closed under mixing; it
contains
, and it contains the inattentive strategy. The only condition that needs to be checked
35
is that it retains the property of being closed under su¢ ciency:
2
su¢ cient for
2
=)
2
To establish this, it is useful …rst to establish certain properties of
S is closed under garbling:
2
S
su¢ cient for
2
=)
2
.
S
and of
. The …rst is that
S.
Intuitively, this is because the garbling of a garbling is a garbling. In technical terms, the product
of the corresponding garbling matrices is itself a garbling matrix. The second is that one can
explicitly express
as the set of all …nite mixtures of elements on S ,
8
9
J
<
=
X
j
J 1
j
=
=
jJ
2
N;
(
;
::
)
2
S
;
2
j
1
J
S; ;
:
j=1
where S J 1 is the unit simplex in RJ: To make this identi…cation, note that the set as de…ned on
the RHS certainly contains S and is a mixture set, hence is a superset of
. Note moreover that
all elements in the RHS set are necessarily contained in any mixture set containing S by a process
of iteration, making is also a subset of
, hence …nally one and the same set.
We now establish that if 2
is a garbling of some 2
, then indeed 2
. The …rst
step is to express 2
as an appropriate convex combination of elements of S as we now know
we can,
J
X
j
=
:
j
j=1
with all weights strictly positive, j > 0 all j. Lemma 2 below establishes that in this case there
exist garblings j of j 2 S such that,
=
J
X
j
j
;
j=1
establishing that indeed 2
since, with S closed under garbling, j 2 S and j a garbling
j
j
of
implies
2 S.
Given 2 such that ( ; A) 2 D some A 2 F, we de…ne the function K on
in three stages.
First we de…ne the function KS on the domain S by identifying for any 2 S the corresponding
set of minimal attention strategies 2
of which is a garbling, and assigning to it the lowest
such cost. Formally, given 2 S ,
KS ( )
min
f 2
j
K( ):
su¢ cient for g
Note that KS ( ) = K( ) all 2
. To see this, consider ( ; A); ( ; A0 ) 2 D with
(
;A)
su¢ cient for
. By the Blackwell property, expected utility is at least as high using (
(
;A)
using
for which it is su¢ cient,
G( ; A;
( ;A0 )
)
G( ; A;
36
( ;A)
):
( ;A0 )
;A0 )
as
At the same time, since K;
( ;A) 2 ^ (K; ; A), so that,
G( ; A;
is a rational attention representation of (D; q), we know that
( ;A)
)
K(
( ;A)
)
( ;A0 )
G( ; A;
)
K(
( ;A0 )
):
0
Together these imply that K( ( ;A) ) K( ( ;A ) ), which in turn implies that KS ( ) = K( ) all
2
.
Note that KS ( ) also satis…es weak monotonicity on S on this domain, since if we are given
; 2 S with su¢ cient for , then we know that any strategy 2
that is su¢ cient for is
also su¢ cient for , so that the minimum de…ning KS ( ) can be no lower than that de…ning KS ( ).
The second stage in the construction is extend the domain of the cost function from S to
.
As noted above, this set comprises all …nite mixtures of elements of S ,
8
9
J
<
=
X
j
J 1
j
=
=
jJ
2
N;
(
;
::
)
2
S
;
and
2
j
1
J
S; :
:
j=1
Given 2
, we take the set of all such mixtures that generate it and de…ne K ( ) to be the
corresponding in…mum,
K ( )=8
>
<
>
:
inf
X
J
J2N; 2S J
1 ;f j gJ
j=1 2
Sj
=
j
j
9
>
= j=1
>
;
j=1
J
X
j KS (
j
):
Note that this function is well de…ned since KS is bounded below by the cost of inattentive strategies
and the feasible set is non-empty by de…nition of
. We establish in Lemma 3 that the in…mum
is achieved. Hence, given
2
, there exists J 2 N; 2 S J 1 ; and elements j 2 S with
J
X
j such that,
=
j
j=1
K ( )=
J
X
j KS (
j
):
j=1
We show now that K satis…es K2, mixture feasibility. Consider distinct strategies 6= 2
.
;
;
;
We know by Lemma 3 that we can …nd J
2 N; corresponding probability weights
2S
J
J
X
X
j,
j , and such that,
and elements j ; j 2 S with =
=
j
j
j=1
K ( ) =
j=1
J
X
j KS (
j
j KS (
j
);
j=1
K ( ) =
J
X
):
j=1
Given
2 (0; 1), consider now the mixture strategy de…ned by taking each strategy j with
j with probability (1
probability
) j . By construction, this mixture
j and each strategy
37
strategy generates
K ( ) that,
K ( )
=[
J
X
+ (1
j KS (
j
)
)+
j=1
J
X
] 2
and hence we know by the in…mum feature of
(1
)
j
j KS (
) = K ( ) + (1
)K ( );
j=1
con…rming mixture feasibility.
We show also that K satis…es K3, weak monotonicity in information. Consider ; 2
with
su¢ cient for . We know by Lemma 3 that we can …nd J 2 N; 2 S J 1 ; and corresponding
J
X
J
j
j
j and such that,
elements
2 S with …xed range ( ) = ( ) such that =
j
j=1
j=1
K ( )=
J
X
j KS (
j
):
j=1
j J
j=1
We know also from Lemma 2 that we can construct
such that each
on its domain
j
is a garbling of the corresponding
S , we conclude that,
KS (
j
j.
2
S
such that
=
J
X
j
j
and
j=1
Given that KS satis…es weak monotonicity
KS ( j ):
)
By the in…mum feature of K ( ) we therefore know that,
K ( )
J
X
j KS (
j
J
X
)
j=1
j KS (
j
) = K ( );
j=1
con…rming weak monotonicity.
a rational inattention
We show now that we have retained the properties that made K;
representation of (D; q) for prior
2 . It is immediate that
and the choice function that
involves picking act f 2 F ( ; A) for sure in revealed posterior r( ;A) (f ) is consistent with the data,
since this was part of the initial de…nition. What needs to be con…rmed is only that the revealed
minimal attention strategies are optimal. Suppose to the contrary that there exists ( ; A) 2 D
such that,
G( ; A; ) K ( ) > G( ; A; ( ;A) ) K ( ( ;A) );
for some
2
. By Lemma 3 we can …nd J 2 N; a strictly positive vector
J
X
j J
j and such that,
2
,
such
that
=
corresponding elements
j
S
j=1
j=1
K ( )=
J
X
j=1
38
j KS (
j
):
2 SJ
1;
and
By the fact that
=
J
X
j
j
and by construction of the mixture strategy,
j=1
G( ; A; ) =
J
X
j G(
j
; A;
);
j=1
so that,
J
X
j
G( ; A;
j
)
j
KS (
) > G( ; A;
( ;A)
)
( ;A)
K (
):
j=1
We conclude that there exists j such that,
j
G( ; A;
)
KS (
j
) > G( ; A;
( ;A)
)
K (
( ;A)
):
Note that each j 2 S inherits its cost KS ( j ) from an element j 2
that is the lowest
cost minimal attention strategy according to K on set
that is su¢ cient for j ,
KS (
j
)=K (
j
);
where the last equality stems from the fact (established above) that KS ( ) = K ( ) on 2
.
Note by the Blackwell property that each strategy j 2
o¤ers at least as high gross value as the
strategy
for which it is su¢ cient, so that overall,
G( ; A;
j
)
K (
j
)
G( ; A;
j
)
KS (
j
) > G( ; A;
( ;A)
)
K (
( ;A)
):
To complete the proof it is su¢ cient to show that,
K ( ) = K ( );
on
2
: With this we derive the contradiction that,
G( ; A;
j
)
K (
j
) > G( ; A;
( ;A)
)
K (
( ;A)
);
in contradiction to the assumption that K;
formed a rational inattention representation of
(D; q).
To establish that K ( ) = K ( ) on 2
, note that we know already that KS ( ) = K ( )
on 2
. If this did not extend to K ( ), then we would be able to identify a mixture strategy
2
su¢ cient for ( ;A) with strictly lower expected costs, K ( ) < K ( ). This see that this is
not possible, note …rst from Lemma 1 that all strategies that are consistent with ( ; A) and q( ; A)
are su¢ cient for ( ;A) . Weak monotonicity of K on
then implies that the cost K ( ) of any
mixture strategy su¢ cient for ( ;A) satis…es K ( ) K ( ), as required.
The …nal and most trivial stage of the proof is to ensure that normalization holds. We …rst
normalize each function K ( ) for those 2 for which ( ; A) 2 D for some A 2 F. In such
cases we note that I 2 S , so that KS (I ) 2 R according to the rule immediately above. If we
renormalize this function by subtracting KS (I ) from the cost function for all attention strategies
associated with this prior then we impact no margin of choice and do not interfere with mixture
feasibility, weak monotonicity, or whether or not we have a rational inattention representation.
Hence we can avoid pointless complication by assuming that K (I ) = 0 from the outset so that
39
this normalization is vacuous. We have now fully-speci…ed a cost function K :
! R for all
2 such that ( ; A) 2 D some A 2 F.
All that remains to complete our de…nition of the candidate cost function K is to expand the
domain to include all inattentive strategies for the irrelevant priors 2 for which there is no
corresponding ( ; A) 2 D. We de…ne
= I in such cases and set and K (I ) = 0. Note that
this single element domain is trivially closed under garbling and under mixtures. With this, we
de…ne the candidate cost function K :
! R [ 1 by patching together the set of prior-based
cost functions as de…ned above,
K ( ) if 2
1 if 2
=
:
K( ; ) =
Note that weak monotonicity implies that the function is non-negative on its entire domain.
It is immediate that K 2 K, since K( ; ) = 1 for 2
=
and all domains
for 2 contain
the corresponding inattentive strategy I on which K( ; ) is real-valued. It is also immediate that
K satis…es K3, since K (I ) = 0 by construction. It also satis…es K1 and K2, and represents a
rational inattention representation, completing the proof.
Lemma 2 If
=
J
X
1
any garbling
j
j
2 SJ
with J 2 N;
of , there exist garblings
j
j
of
=
1
2
J
X
j
with
j
> 0 all j, and
j J
j=1
2
, then for
such that,
j
;
j=1
Proof. By assumption, there exists a j ( )j j ( )j matrix B with
for all k 2 ( ),
X
k
bik m ( i ):
m( ) =
i2
Since
=
J
X
j
j,
we know that (
j)
P
k
bik = 1 all i and such that,
( )
( ). Now de…ne compressed matrix B j as the unique
1
submatrix of B obtained by …rst deleting all rows corresponding to posteriors i 2 ( )n ( j ),
and then deleting all columns corresponding to posteriors k such that bik = 0 all i 2 ( )n ( j ).
De…ne j 2 to be the strategy that has as its support the set of all posteriors that are possible
given the garbling j of j ,
( j) = f
k
2 ( )jbik > 0 some
i
2 (
j
)g;
and in which state dependent probabilities of all posteriors are generated by the compressed matrix
Bj ,
X
j
k
(
)
=
bik jm ( i );
m
i2
(
j)
for all k 2 ( j ).
j
j 2 ,
Note by construction that each attention strategy
P ik is a garbling iof the corresponding
j
j
since each B is itself a garbing matrix for which k b = 1 for all 2 ( ). It remains only to
40
verify that
=
J
X
j
j.
This follows since,
j=1
m(
k
)=
X
i2
b
ik
m(
i
X
)=
i2
( )
Lemma 3 Given
2
b
ik
J
X
j
i
j m( )
=
j=1
( )
J
X
j
j=1
, there exists J 2 N;
2
SJ 1,
X
i2
(
bik jm ( i )
=
J
X
j
k
j m ( ):
j=1
j)
j
and elements
2
S
with
=
K ( )=
j KS (
j
J
X
j KS (
j)
j
j
j=1
such that,
J
X
J
X
):
j=1
Proof. By de…nition K ( ) is the in…mum of
over all lists
j J
j=1
j=1
=
J
X
j
j.
2
S
such that
We now consider a sequence of such lists, indicating the order in this sequence
j=1
j (n) J(n) ,
j=1
in parentheses,
J(n)
with
=
X
j (n)
j (n)
such that in all cases there are corresponding weights (n) 2 S J(n)
1
and that achieve a value that is heading in the limit to the in…mum,
j=1
J(n)
lim
n !1
X
j (n)KS (
j
(n)) = K ( ).
j=1
A …rst issue that we wish to avoid is limitless growth in the cardinality J(n). The …rst key
observation is that, by Charateodory’s theorem, we can reduce the number of strictly positive
J (n)
X
j (n) to have cardinality J (n)
weights in a convex combination =
M + 1. We
j (n)
j=1
J (n)
con…rm now that we can do this without raising the corresponding costs,
X
j (n)KS (
j (n)).
j=1
Suppose that there is some integer n such that the original set of attention strategies has strictly
higher cardinality J(n) > M + 1. Suppose further that the …rst selection of J 1 (n)
M +1
such posteriors for which there exists a strictly positive probability weights 1j (n) such that =
J 1 (n)
X
1
j (n)
j (n)
has higher such costs (note WLOG that we are treating these as the …rst J 1 (n)
j=1
attention strategies in the original list). It is convenient to de…ne
so that we can express this inequality in the simplest terms,
J(n)
X
1
j (n)
= 0 for J 1 (n)+1
J(n)
1
j
j (n)KS ( (n))
>
X
j=1
j=1
41
j (n)KS (
j
(n):
j
J(n)
1
This inequality sets up an iteration. We …rst take the smallest scalar
1 1
j (n)
=
j (n):
J 1 (n)
That such a scalar exists follows from the fact that
2 (0; 1) such that,
X
J(n)
1
j (n)
=
j=1
X
j (n)
= 1, with all components
j=1
in both sums strictly positive and with J(n) > J 1 (n). We now de…ne a second set of probability
weights 2j (n),
1 1 (n)
j (n)
j
2
:
j (n) =
1
1
J(n)
for 1
j
J(n). Note that these weights have the property that
=
X
2
j (n)
j (n)
and that,
j=1
J(n)
J(n)
X
j
2
j (n)KS ( (n))
=
X
j=1
j=1
"
1 1 (n)
j
1
j (n)
1
#
J(n)
KS (
j
(n)) <
X
j (n)KS (
j
(n):
j=1
By construction, note that we have reduced the number of strictly positive weights 2j (n) by at
least one to J(n) 1 or less. Iterating the process establishes that indeed there exists a set of no
more than M + 1 posteriors such that a mixture produces that …rst strategy and in which this
mixture has no higher weighted average costs than the original strategy. Given this, there is no
loss of generality in assuming that J(n) M + 1 in our original sequence.
With this bound on cardinality, we know that we can …nd a subsequence of attention strategies
j (n) which all have precisely the same cardinality J(n) = J
M + 1 all n. Going further, we
j (n) 1 . First, we can select
can impose properties on all of the J corresponding sequences
n=1
subsequences in which the ranges of all corresponding attention functions have the same cardinality
independent of n,
( j (n)) = K j
for 1
j
J. Note we can do this because, for all j and n, the number of posteriors in the
attention strategy j (n) is bounded above by the number of posteriors in the strategy , which is
…nite
k
K j and
With this, we can index the possible posteriors jk (n) 2 ( j (n)) in order, 1
then select further subsequences in which these posteriors themselves converge to limit posteriors,
jk
(L) = lim
n!1
jk
(n) 2 :
which is possible because the sequence of posteriors lives in a compact set, and so have a convergent
subsequence.
We ensure also that both the associated state dependent probabilities themselves and the weights
J(n)
X
j (n) converge,
=
j (n) in the expression
j (n)
j=1
lim
n!1
m
lim
jk
n!1
(n)
=
jk
m (L);
j (n)
=
j (L):
42
Again, this is possible because the state dependent probabilities and weights live in a compact
set.
The …nal selection of a subsequence ensures that, given 1
j
J, each j (n) 2 S has its
j
value de…ned by precisely the same minimal attention strategy
2 as the least expensive among
those that were su¢ cient for it and hence whose cost it was assigned in function KS . Technically,
for each 1 j J,
KS ( j (n)) = K ( j );
for 1 n 1: this is possible because the data set and hence the number of minimal attention
strategies is …nite.
We …rst use these limit properties to construct a list of limit attention strategies j (L) 2 S
J
X
j for 1
with =
j J. Strategy j (L) has range,
j
j=1
(
j
j
(L)) = [K
k=1
jk
(L);
with state dependent probabilities,
j
(L)
jk
Note that the construction ensures that
=
jk
m (L):
(L) =
m
J
X
j (L).
j (L)
To complete the proof we must
j=1
establish only that,
K ( )=
J
X
j
j (L)KS (
(L)):
j=1
We know from the construction that, for each n;
J
X
j (n)KS (
j
(n)) =
J
X
j (n)K
(
j
):
j=1
j=1
Hence the result is established provided only,
KS (
j
(L))
K (
j
);
which is true provided j being su¢ cient for all j (n) implies that j is su¢ cient for the corresponding limit vector j (L). That this is so follows by de…ning B j (L) = [bik (L)]j to be the limit of
any subsequence of the j ( j )j K j stochastic matrices B j (n) = [bik (n)]j which have the de…ning
property of su¢ ciency,
X
j
i
(n) m ( jk (n)) =
[bik (n)]j
m ( );
i2
(
j)
for all jk (n) 2 ( j (n)) and 1 m M . It is immediate that the equality holds up in the limit,
establishing that indeed j is su¢ cient for each corresponding limit vector j (L), con…rming …nally
that KS ( j (L)) K ( j ) and with it establishing the Lemma.
43