Rational Inattention and State Dependent Stochastic Choice"
Transcription
Rational Inattention and State Dependent Stochastic Choice"
Rational Inattention and State Dependent Stochastic Choice Andrew Caplinyand Mark Deanz June 2013 Abstract Economists are increasingly interested in how attention impacts behavior, and therefore in models of rational inattention. We characterize choice behaviors associated with a general model of rational inattention. Our model encompasses all those currently in the literature. The necessary and su¢ cient conditions on the data are simple and intuitive. We experimentally elicit “state dependent”stochastic choice data of the form required to perform the corresponding tests. Rational inattention theory matches these data better than do random utility models that ignore the link between incentives and attention. 1 Introduction Limits on attention impact choice. Shoppers may buy unnecessarily expensive products due to their failure to notice whether or not sales tax is included in stated prices (Chetty et al. [2009]). Buyers of second-hand cars focus their attention on the leftmost digit of the odometer (Lacetera et al. [2012]). Purchasers limit their attention to a relatively small number of websites when buying over the internet (Santos et al. [2012]). Given its e¤ect on choice, the forces that determine attentional e¤ort are being intensively studied. The rational inattention model (Sims [1998, 2003]) has been particularly in‡uential, capturing as it does the balancing act between the improved decision quality that attention produces and the entailed costs in terms of time and e¤ort. Rational inattention theory is being applied to capture a wide array of phenomena, from price stickiness to consumption dynamics to portfolio choice.1 Yet understanding the implications of rational inattention theory can be challenging. The information that a decision maker internalizes is not directly observable, making it di¢ cult to di¤erentiate between rationally inattentive choices and internally inconsistent mistakes. In this paper we develop a simple characterization of the stochastic choice behavior consistent with rational inattention and implement the corresponding tests in a laboratory experiment. We make no assumption on the nature of informational costs or informational constraints, and so We thank Roland Benabou, Federico Echenique, Andrew Ellis, Daniel Martin, Filip Matejka, Alisdair McKay, Stephen Morris, Pietro Ortoleva, Daphna Shohamy, Severine Toussaert, Isabel Trevino, Laura Veldkamp and Michael Woodford for their constructive contributions. We also thank Samuel Brown for his exceptional research assistance. y Center for Experimental Social Science and Department of Economics, New York University. Email: [email protected] z Department of Economics, Brown University. Email: [email protected] 1 See Sims [2006], Tutino [2008], Woodford [2008], van Nieuwerburgh and Veldkamp [2009], Mackowiak and Wiederholt [2009], Mackowiak and Wiederholt [2010], Matejka [2010] and Paciello and Wiederholt [2011] for applications of rational inattention. See Marschak [1971] for an earlier formulation of similar ideas. 1 our model represents a signi…cant generalization of those in the current literature.2 The key to our approach is the observation that a decision maker’s pattern of stochastic choice can be used to recover information about their attention strategy. The resulting necessary and su¢ cient conditions for rational inattention are simple and intuitive. A “no improving action switch”(NIAS) condition ensures that choices are optimal given what was learned about the state of the world (see Caplin and Martin [2011]). A “no improving attention cycles”(NIAC) condition ensures that total utility cannot be raised by reassigning attention strategies across decision problems. These conditions are robust. One can insist that costs associated with more informative attention strategies (in the sense of Blackwell) are no lower than those associated with less informative strategies. One can insist on the feasibility of mixed attention strategies. One can set inattention to be costless. Even with these additional restrictions, the NIAS and NIAC conditions fully characterize rationally inattentive behavior. As with many existing models of rational inattention, our model allows for stochastic choice. We consider a data set which consists of “state dependent” stochastic choice data in which choice probabilities depend on the underlying state of the world.34 We assume that, while perfectly observable to the researcher, it is potentially costly for a decision maker to learn the true state.5 For example, the econometrician may know whether or not sales taxes are included in stated prices even if consumers do not. As we demonstrate, state dependent data of this form is readily available in the experimental laboratory. Under appropriate conditions, it can also be extracted from …eld data.6 Current models of stochastic choice are predominantly based on the concept of random utility, in the manner of McFadden [1974] and Luce and Suppes [1965].7 In Random Utility Models (RUMs), stochastic choice derives from variance in the tastes of the decision maker. Unlike models of rational inattention, this underlying uncertainty does not respond to the incentives in a given decision problem. Matejka and McKay [2011] establish an important link between rational inattention and random utility models. When attention costs are proportionate to Shannon mutual information, demand functions in the rational inattention model are of a generalized logit form, and are therefore functionally similar to an important class of RUM.8 Yet there are also important di¤erences in implied behavior. On the one hand, random utility models can violate NIAS and NIAC. On the other, as pointed out by Matejka and McKay [2011],9 rational inattention can lead to violations of a monotonicity condition central to random utility models: when a newly available act incentivizes additional attention, the resulting knowledge may make it optimal to choose acts that were previously unchosen. 2 Followings Sims [2003], much of the literature assumes that costs are based on the Shannon mutual information between prior and posterior information states. Woodford [2012] suggests an alternative information cost function, based on the concept of a Shannon capacity, which is consistent with certain psychological experiments. Gul et al. [2012] and Ellis [2012] take a di¤erent approach, assuming limits on the partitional structures of information available to the decision maker. 3 While little studied in economics, state dependent stochastic choice data has featured in psychometric experiments on perception dating back at least to Weber (see Murray [1993]). 4 Other authors have used di¤erent data sets to capture the implications of rational inattention. Ellis [2012] uses state dependent deterministic choice, while Mihm and Ozbek [2012] use choice over menus. 5 This inverts the interpretation that stochasticity in choice arises from the opposite asymmetry: factors that are unobserved to the econometrician yet observed by the decision maker (Manski [1977]). 6 For example, under a representative agent assumption, the distribution of consumption patterns under di¤erent tax regimes can be interpreted as state dependent stochastic choice. 7 See also Falmagne [1978], McFadden and Richter [1990], Clark [1996], McFadden [2005] and Gul and Pesendorfer [2006]. 8 Caplin and Dean [2013] characterize stochastic choice behavior for a broad class of entropy-based cost functions. 9 Ergin [2003] presents a similar example. 2 Our ability to distinguish between randomness in choice induced by imperfect attention and that induced by stochastic utility relates to a long-standing challenge in the literature. Block and Marschak [1960] stressed both the importance and the di¢ culty of separating out “perceptibility” and “desirability” based explanations for randomness in choice. Our results imply that state dependent stochastic choice data is su¢ ciently rich to provide just such a separation. As an application of the NIAS and NIAC conditions, we experimentally elicit state dependent stochastic choice data in simple decision problems. We present subjects with a screen containing only red and blue dots, with the proportion of red dots determining the state of the world. They then choose from a set of available acts, the payo¤s to which depend on the state. We repeatedly present subjects with the same decision problem to estimate their probability of choosing each act in each state. Unlike most psychology experiments (for example Navalpakkam and Perona [2009]), our experiment has no time limit, so that subjects can in principle perfectly determine the state of the world. The fact that they choose not to results precisely from their internal trade o¤ between additional input of cognitive e¤ort and time on the one hand and monetary reward on the other. We use the state dependent stochastic choice data from our experiment to recover subjects’ revealed posterior beliefs and attention strategies. We use these in determining how well the NIAS and NIAC conditions …t the data. We …nd that aggregate data aligns well with both conditions, and furthermore that losses due to violations are small at the individual level. We also use our experiment to discriminate between rational inattention theory and random utility theory. We …nd behavior that is inconsistent with the latter model as it exhibits monotonicity violations of the type suggested by Matejka and McKay [2011]. Overall, we conclude that rational inattention theory is a reasonable starting point for modeling stochasticity in simple choices when attentional e¤ort is chosen. To our knowledge, ours is the …rst paper to experimentally collect state dependent stochastic choice data to test models of rational inattention. In theoretical terms, two papers close in spirit to ours are Matejka and McKay [2011] and Ellis [2012]. The former analyzes the implications of rational inattention for state dependent stochastic choice data with costs based on Shannon mutual information. The latter provides an axiomatic characterization of a model of rational inattention in which a decision maker is equipped with a set of possible information partitions, and must choose the most appropriate for a given decision problem. We generalize the results of both papers since we put no a priori restrictions on the type of costs or information constraints that the decision maker faces. Section 2 introduces our formulation of the rational inattention model with an unrestricted cost function. Section 3 derives the testable implications of this general model for state dependent choice data. Section 4 shows that this characterization is unchanged by addition of natural restrictions on the cost function. Section 5 contrasts rational inattention theory with random utility theory. Section 6 details our experimental design, while section 7 presents experimental results. Section 8 relates our work to the broader literature. Section 9 concludes. 2 2.1 A General Model of Rational Inattention Model We consider a decision making environment comprising a set of possible states of the world and a set of acts, the payo¤s of which are state dependent. A given decision problem consists of a prior distribution over states of the world and the subset of acts from which the decision maker (DM) 3 must choose.10 The key assumption underlying models of rational inattention is that the true state of the world is knowable in practice, but obtaining (or processing) information is costly. Thus, prior to choosing which act to take, the DM chooses an attention strategy. In line with the rational inattention literature, an attention strategy is described in an abstract manner - as a stochastic mapping from states of the world to subjective signals, the probabilities of which depend on the state of the world. Subject only to Bayes’ rule, the DM is free to choose any attention strategy they wish, but will face a utility cost of doing so. Having selected an attention strategy, the DM can condition choice of act only on the subjective signal they receive.11 Their payo¤ is then given by the expected utility of the chosen act less the costs of attention. Figure 1 illustrates an attention strategy of a …rm that is choosing which of two prices to charge, high or low. Pro…ts depend on the price chosen and the underlying state of the economy, which can be good, medium or bad (G, M and B) with probabilities 61 , 12 and 13 respectively. Figure 1: An Example of an Attentional Strategy In this example, the …rm has chosen an attention strategy that produces two subjective signals, R and S. If demand is good they receive signal R for sure, while if demand is bad they receive signal S for sure. If demand is medium, they receive each signal with probability 12 . Upon receiving signal R, the …rm sets prices high. Conditional on receiving this signal the probability of state G is 52 and of state M is 35 . The expected utility of action H is therefore 52 times the utility that H gives in state G and 35 times the utility that H gives in M . The …rm sets price L upon receipt of signal S, where the conditional probabilities are 74 M and 37 L. The net bene…t to the …rm from this attention strategy is this expected “gross bene…t” of the acts chosen net of attention costs. Formally, a decision making environment is identi…ed by a …nite set of states of the world, = fm 2 Nj1 m M g and a …nite set of acts F , with F = 2F =; comprising all non-empty f subsets. We take as given a state dependent utility function U : F ! R and let Um denote the utility of act f in state m. De…ne = ( ) as the set of probability distributions over states. Given 2 , m denotes the probability of state m. A decision problem consists of a prior belief 2 over states of the world and a non-empty set of acts A 2 F from which the DM must choose. For any prior , de…ne = fm 2 j m > 0g as the associated support. 10 11 Both of which are assumed known to the DM. And possibly an independent randomizing device, by which we allow for mixed strategies. 4 An attention strategy maps each state of the world to a probability distribution over subjective signals. Since we will be characterizing expected utility maximizers, we identify a subjective signal with its associated posteriors beliefs 2 , which is equivalent to the subjective information state the DM …nds them in following the receipt of that signal. For a particular prior, the set of feasible attention strategies is the set of stochastic mappings from objective information states to subjective signals that satisfy Bayes’law. De…nition 1 Given 2 ; the set of feasible attention strategies comprises all mappings : ! ( ) that have …nite support ( ) and that satisfy Bayes’ law, so that for all m 2 and 2 ( ), m m( ) , m = M X j j( ) j=1 where m( ) (m)(f g). De…ne =[ 2 . Note that m ( ) can be interpreted as the probability of signal conditional on state of the world m.12 Rational inattention theory models a DM who chooses attention strategies to maximize gross payo¤s net of information costs. The gross payo¤ associated with attention strategy 2 in decision problem ( ; A) is calculated assuming that acts are chosen optimally at each posterior state. Let G : F ! R denote the gross payo¤ of using a particular attention strategy in a particular decision problem: # " M X X G( ; A; ) = m m ( ) g( ; A) 2 ( ) where g( ; A) = max f 2A m=1 M X f m Um : m=1 An attention cost function maps priors and attention strategies to the corresponding level of disutility. We allow costs to be in…nite for some strategies for two reasons. As a technical convenience, it enables us to ignore the Bayesian constraint by setting the cost of strategies inconsistent with the speci…ed prior to be in…nitely high.13 More importantly, it means that our model nests those that rule out certain types of information acquisition - for example by putting a hard limit on the mutual information between priors and posteriors (Sims [2003]), by allowing only partitional information structures (Ellis [2012]), or by allowing only normal signals (Mondria [2010]). These models can be accommodated in our framework by setting to in…nity the cost of infeasible strategies. To avoid triviality we assume that feasible attention strategies exist for all priors. De…nition 2 An attention cost function is a mapping K : ! R with K( ; ) = 1 for 2 = and K( ; ) 2 R for some 2 . We let K denote the class of such functions. We make the standard assumption that attention costs are additively separable from the prizebased utility derived from the acts taken. We let ^ : K F ! map cost functions and 12 Formally, attention strategies are equivalent to compound lotteries or temporal lotteries as analyzed in models that capture preferences over the timing of the resolution of uncertainty (Kreps and Porteus [1978]). 13 As in Mihm and Ozbek [2012]. 5 decision problems into rationality inattentive strategies. These are the strategies that maximize gross payo¤s minus attention costs, ^ (K; ; A) = arg sup fG( ; A; ) K( ; )g : 2 2.2 State Dependent Stochastic Choice Data The idealized data set that we use to test the model of rational inattention is state dependent stochastic choice data. For a given decision problem ( ; A) we observe the probability of choosing each act f 2 A in each state of the world m 2 . It is the fact that we observe choice probabilities in each state that di¤erentiates this from standard stochastic choice data. Formally, given 2 , let Q be the set of mappings from to probabilities over F , with f Q = [ 2 Q . Given q 2 Q , we let qm denote the probability of the DM choosing act f in state m and denote as F (q) F the set of acts chosen with non-zero probability in some state of the world under stochastic choice function q, o n f : F (q) = f 2 F jqm > 0 for some m 2 Data of this general form is standard in psychometric research and can be readily gathered in the laboratory, as we demonstrate in section 6. We are interested in when the conditions under which we can …nd an attention cost function and resulting optimal attention strategy that would generate the pattern of stochastic choice that we in fact observe. De…nition 3 Given decision problem ( ; A) 2 F, an attention strategy with q 2 Q if there exists a choice strategy C : ( ) ! (A), such that: 2 is consistent 1. Final choices are optimal: f 2 supp (C( )) =) M X f m Um m=1 M X g m Um m=1 all g 2 A. 2. The attention and choice functions match the data, X f f qm = m ( )C ( ): 2 ( ) Note that we do not restrict the DM to pure strategies at any posterior. However, any act chosen with strictly positive probability must be utility maximizing given beliefs. We assume that data has been gathered on some …nite set D of decision problems. The main theoretical aim of the paper is to characterize the conditions under which all such state dependent stochastic choice data is consistent with rational inattention. De…nition 4 A state dependent stochastic choice data set (D; q) comprises a …nite set of decision problems D F and a function q : D ! Q, with q( ; A) 2 Q and F ( ; A) F [q( ; A)] A. ~ ~ ) if there exists K ~ 2 K and Such a data set has a rational inattention representation (K; ( ;A) ~ :D! such that, for all ( ; A) 2 D, ~ ( ; A) ~ is consistent with q( ; A) and satis…es ( ;A) ^ ~ ~ 2 (K; ; A). 6 The existence of such a representation implies that the cost function is well-behaved, in the sense that an optimal attention strategy exists. Throughout this paper, we assume that the DM’s expected utility function and prior beliefs over states of the world are both known to the researcher - only attention strategies and costs are not directly observable. This assumption is in line with the focus of the paper, but is not central to our approach. By enriching the data set, we could recover beliefs and preferences from the choice data of the DM, and use these as a starting point for our representation. In order to recover utility, we could replace the “Savage style” acts we use in this paper (which map deterministically from states of the world to prizes) with “Anscombe-Aumann”acts that map states of the world to probability distributions over the prize space. Assuming the DM does maximize expected utility, U could then be recovered by observing choices over degenerate acts (i.e. acts whose payo¤ are state independent).14 If we further add to our data set the choices of the DM over acts before the state of the world is determined (or at least in a situation in which they cannot exert any e¤ort to determine that state) then we can also recover the DM’s prior over objective states (again assuming expected utility maximization). 3 The Characterization The main theorem of this section establishes two conditions as necessary and su¢ cient for (D; q) to have a rational inattention representation. We use simple examples to illustrate the role of these conditions before stating the main theorem, the proof of which is in the appendix. The …rst condition - which establishes optimality of …nal choice given an attention strategy - applies to each decision problem separately. The second - which establishes optimality of the attention strategy applies to all decision problems that share a speci…c prior. 3.1 Minimal Attention Strategies The key to our approach is the observation that, if a DM is rationally inattentive, then one can learn much about their attention strategy from state dependent stochastic choice data. To begin with, one can identify the average posterior beliefs that a DM must have had when choosing each act. De…nition 5 Given 2 and q 2 Q , de…ne the revealed posteriors r( ; q) : F (q) ! [r( ; q)(f )]m f rm ( ; q) = f m qm M X by, : f j qj j=1 f The revealed posterior rm ( ; q) is just the probability of state of the world m conditional on act f being chosen given prior belief and state dependent stochastic choice data q. Note that this is well de…ned whatever decision making procedure the DM is using. If the DM chooses each act in at most one subjective information state then the revealed posteriors will in fact be the posteriors that de…ne their attention strategy.15 If they choose the same act in more than one subjective state 14 This is the approach taken by Ellis [2012]. Caplin and Martin [2011] present an alternative approach that allows for an unknown utility function. 15 As they would do if more informative information structures are more costly. 7 then the revealed posterior will be equivalent to the weighted average belief across all posteriors at which that act is chosen. We can use the revealed posteriors to construct a “revealed”attention strategy for each decision problem. We do so by assuming that any act is chosen in at most one subjective state. Under this assumption we can identify the resulting attention strategy directly from the state dependent stochastic choice data. To do so, we …rst associate a subjective signal with each revealed posterior. The probability of receiving a given signal from an objective state m is calculated by …rst identifying all acts which have the revealed posterior associated with that signal, then by calculating the probability of choosing any such act in objective state m. f De…nition 6 Given 2 , q 2 Q, m 2 , and = rm ( ; q) for some f 2 F (q), de…ne the ( ;q) minimal attention strategy 2 to satisfy, X f ( ;q) qm : m ( )= ff 2F (q)jr( ;q)= g The following examples illustrate the concept of revealed posteriors and the minimal attention strategy. Example 1 Consider the following state dependent stochastic choice data generated by a decision problem in which there are two states with 1 = 0:3, and two chosen acts ff; gg q1f q1g = 0:6 , 0:4 q2f q2g = 0:4 0:6 The revealed posterior of state 1 given the choice of f is given by r1f ( ; q) = f 1 q1 f 1 q1 f 2 q2 + = 0:18 9 = 0:46 23 In like manner, we can identify the revealed posteriors associated with each act, r1f ( ; q) r2f ( ; q) = 9 23 14 23 , r1g ( ; q) r2g ( ; q) = 2 9 7 9 The resulting minimal attention strategy would therefore have two posterior states, 2 = 2 9 7 9 1 = 9 23 14 23 and . The conditional signal probabilities for each posterior state are given by the associated choice probabilities. For example, ( ;q) 2 ( ) 1 = X fk2F (q)jr( ;q)= q1k = q1g = 0:4: 2g Now consider a di¤ erent set of choice data for the same two state problem, in which three acts are chosen, 0 f 1 0 1 0 f 1 0 1 0:6 0:4 q1 q2 @ q g A = @ 0:3 A , @ q g A = @ 0:45 A 1 2 0:1 0:15 q1h q2h In this case, the revealed posteriors for acts g and h are identical and equal to 2 . The minimal attention strategy therefore again has two posterior states ( 1 and 2 ). The probability of receiving 8 the signal associated with state 2 from any state of the world is equal to the probability of choosing g or h, as both have 2 as their revealed posterior X ( ;q) 2 q1k = q1g + q1h = 0:4 ( ) = 1 fk2F (q)jr( ;q)= 2g If a rationally inattentive DM chose each act in at most one subjective state, and did not use mixed strategies at any posterior, then the minimal attention strategy would be their true strategy. More generally, any attention strategy consistent with the data must be weakly more informative than the minimal attention strategy, in the sense of statistical su¢ ciency. Intuitively, this means that the minimal attention strategy can be obtained by “adding noise” to the true attention strategy. De…nition 7 Attentional strategy 2 is su¢ cient for attention strategy 2 P(equivalently ij = 1 is a garbling of ) if there exists a j ( )j j ( )j stochastic matrix B 0 with j2 ( ) b all i and such that, for all j 2 ( ) and m 2 , X j bij m ( i ): m( ) = i2 ( ) Lemma 1 establishes that any consistent attention strategy must be su¢ cient for the minimal attention strategy. Lemma 1 If 2 is consistent with ( ; A) 2 F and q 2 Q , then it is su¢ cient for ( ;q) : In particular, this means that if the DM is indeed rationally inattentive, then their ‘true’ attention strategy must be more informative than their revealed attention strategy. Blackwell’s theorem establishes the equivalence of the statistical notion of “more informative than” (su¢ ciency) and the economic notion “more valuable than”. If attention strategy is 16 su¢ cient for strategy , then it yields (weakly) higher gross payo¤s in any decision problem. This result plays a signi…cant role in our characterization. Remark 1 Given decision problem ( ; A) 2 F and ; 2 G( ; A; ) 3.2 with su¢ cient for , G( ; A; ): No Improving Actions Switches Our …rst condition ensures that the choice of acts at any given posterior is optimal. Consider the trivial case in which only one decision problem ( ; A) is observed. Suppose that there are two states with 1 = 2 = 0:5, two available acts, A = ff; gg, and state dependent payo¤s (U1f ; U2f ) = (0; 10) and (U1g ; U2g ) = (20; 0). Finally suppose that the behavioral data set speci…es: 1 q1g = q1f = ; 2 2 g 1 f q2 = ; q2 = : 3 3 16 See Cremer [1982] for a statement and simple proof of Blackwell’s theorem. 9 One can readily con…rm that there is no attention strategy consistent with this behavioral data. This is because, at the revealed posterior when f is chosen, act g would have given higher utility. The posterior probability that the true state is 1 when f is chosen is f 1 q1 f 1 q1 + = f 2 q2 3 7 Given these beliefs, the payo¤ of taking act g is 73 :20 + 47 :0 = 60 7 , while the payo¤ of act f is 3 4 40 :0 + :10 = . 7 7 7 If the minimal attention strategy was in fact the one that the DM used, the revealed posterior associated with each act would be the only posterior at which this act was chosen. In this case existence of such an improving switch would be a direct violation of the rationally inattentive model. With a more general attention strategy, rational inattention implies that f must be weakly preferred to g at each state in which f is chosen. This means that f must also be weakly preferred to g at the weighted average of these posteriors which forms the revealed posterior.17 The NIAS condition rules out cases in which there are improving switches of this form. It speci…es that, when one identi…es in the data the revealed posterior associated with any chosen act, this act must be optimal at that posterior. Condition D1 (No Improving Action Switches) Data set (D; q) satis…es NIAS if, for every ( ; A) 2 D and f 2 F ( ; A), M X f f rm ( ; q) Um m=1 M X f g rm ( ; q) Um ; m=1 all g 2 A. The NIAS condition was introduced by Caplin and Martin [2011]. A similar condition is introduced in a game theoretic setting by Bergemann and Morris [2011]. 3.3 No Improving Attention Cycles Our second condition restricts choice of attention strategy across decision problems that share the same prior. Essentially, it cannot be the case that the total gross utility can be increased by reassigning attention strategies across decision problems.18 The following example illustrates a violation of this condition. Consider again the decision problem above with two equiprobable states and two available acts, A = ff; gg, and with the state dependent payo¤s, (U1f ; U2f ) = (10; 0); (U1g ; U2g ) = (0; 20): Suppose now that the observed choice behavior is as follows (using the choice set A as an argument), q1A (f ) = q2A (f ) = 2 =1 3 1 =1 3 17 q1A (g); q2A (g): See the proof of theorem 1 for details of this argument. Because we allow the attention cost function to vary arbitrarily with the prior, we need check only that there are no such reassignments amongst decision problems with the same prior. 18 10 Now consider a second decision problem di¤ering only in that the act set is B = ff; hg, with (U1h ; U2h ) = (0; 10), with the corresponding state dependent data set, 3 =1 4 1 =1 4 q1B (f ) = q2B (f ) = q1B (g); q2B (g): The speci…ed data looks problematic with respect to rational inattention. Act set A provides greater reward for discriminating between states, yet the DM is more discerning under act set B. To crystallize the resulting problem, note that, for behavior to be consistent with rational inattention for some cost function K it must be the case that, G( ; A; A G( ; B; B ) ) K( ; A K( ; B ) ) G( ; A; B G( ; B; A ) ) K( ; B ); K( ; A ): While we do not observe attention strategies directly, it is immediate that G( ; i; i ) = G( ; i; i ) for i 2 fA; Bg. Furthermore, as i is su¢ cient for i , Blackwell’s theorem tells us that G( ; i; j ) G( ; i; j ) for i:j 2 fA; Bg (see Remark 1): Thus we can insert the minimal attention strategies in the calculation of gross bene…ts and the above inequalities must still hold. Substituting in for minimal attention strategies and rearranging implies G( ; A; A ) G( ; A; B ) K( ; A ) K( ; B ) G( ; B; A ) G( ; B; B ) Clearly, for any such cost function to exist, it must be the case that the …rst term is greater than the last term, which in turn implies that G( ; A; A ) + G( ; B; B ) G( ; A; B ) + G( ; B; A ); (1) indicating that total gross bene…t must be maximized by the assignment of minimal attention strategies to the speci…c decision problem with which they are associated in the data. Plugging in the minimal attention strategies from our example data, we …nd that G( ; A; A ) + G( ; B; B ) = 17 12 , while G( ; A; B ) + G( ; B; A ) = 17 11 12 . Thus, there is no cost function that can be used to rationalize this data. The following general assumption ensures that there are no such cycles of gross utility improving strategy switches. Condition D2 (No Improving Attention Cycles) Data set (D; q) satis…es NIAC if, for any 2 and any set of decision problems ( ; A1 ); ( ; A2 ); :::::( ; AK ) 2 D with AK = A1 , K X1 G( ; Ak ; k ) k=1 where k = K X1 G( ; Ak ; k+1 ); k=1 [ ;q( ;Ak )] . The NIAC condition has certain analogies with acyclicality conditions that are common in revealed preference theory (such as the Strict Axiom of Revealed Preference). Consider the two decision problem case described in equation 1. A further rearrangement of this condition implies 11 that G( ; A; A ) G( ; A; B ) + G( ; B; B ) G( ; B; A ) 0 We can interpret G( ; A; A ) G( ; A; B ) < 0 as B being “revealed more costly” than A : would have given higher gross value in decision problem ( ; A) than would A , so the fact that it was not chosen must mean that it is more costly. Thus, this condition can be interpreted as saying that if B has been revealed more costly than A , we cannot also have that A is revealed more costly that B . In fact, the condition says something more than that because, unlike in the standard revealed preference exercise, we have information about how much more costly B is than A in terms of expected utility. Therefore if we have data revealing that B is more costly than A by at least some amount x, we cannot also have information that implies that this di¤erence must be less that x. B 3.4 Characterization Our main result is that NIAC and NIAS together are necessary and su¢ cient for a data set to have a rational inattention representation. Theorem 1 Data set (D; q) has a rational inattention representation if and only if it satis…es NIAS and NIAC. The key step in the proof that the NIAS and NIAC conditions are su¢ cient for (D; q) to have a rational inattention representation is to connect this model with the linear allocation problem analyzed by Koopmans and Beckmann [1957]. The cost function that we introduce is based on the shadow prices that decentralize the optimal allocation in that model. Our result is also strongly related to rationalizability conditions for quasi-linear preferences in the mechanism design literature (see Rochet [1987]). 4 Monotonicity, Mixtures and Normalization Theorem 1 states only that, if NIAS and NIAC hold, we can …nd some attention cost function that rationalizes the data. No restrictions are placed on the form of the function. In this section we consider three natural restrictions on attention cost functions: weak monotonicity with respect to su¢ ciency; feasibility of mixed strategies; and costless inattention. In principle these restrictions might tighten requirements for rationalizability of stochastic choice data, since they constrain costs of unchosen strategies. Theorem 2 establishes that this is not the case: if state dependent stochastic choice is rationalizable, then it is rationalizable by a cost function that satis…es these three conditions. 4.1 Weak Monotonicity A partial ranking of the informativeness of attention strategies is provided by the notion of statistical su¢ ciency (see de…nition 7). A natural condition for an attention cost function is that more information is (weakly) more costly. Condition K1 K 2 K satis…es weak monotonicity in information if, for any ; 2 with su¢ cient for , K( ; ) 12 K( ; ): 2 and Free disposal of information would imply this property, as would a ranking based on Shannon mutual information (see also Mihm and Ozbek [2012] and Yang [2011]).19 4.2 Mixture Feasibility In addition to using pure attention strategies, it may be feasible for the DM to mix these strategies using some randomizing device. De…nition 8 Given + (1 ) 2 , attention strategies 2 is de…ned by, m( all 1 m M and )= m( ; 2 2 [0; 1], the mixture strategy , and ) + (1 ) m( ); 2 ( ) [ ( ). The de…nition implies that the mixing is not of the posteriors themselves, but of the odds of the given posteriors. To illustrate, consider again a case with two equiprobable states. Let attention strategy be equally likely to produce posteriors (:3; :7) and (:7; :3), with equally likely to produce posteriors (:1; :9) and (:9; :1). Then the mixture strategy 0:5 + 0:5 is equally likely to produce all four posteriors. A natural assumption is that DMs can choose to mix attention strategies and pay the corresponding expected costs. They could ‡ip a coin and choose strategy if the coin comes down heads and strategy if it comes down tails. In expectation the cost of this strategy would be half that of and half that of . Allowing such mixtures puts an upper bound on the cost of the strategy 0:5 + 0:5 . However, it does not pin down the cost precisely, since there may be a more e¢ cient way of constructing the mixed attention strategy. Condition K2 Mixture Feasibility: for all 2 (0; 1), the cost of the mixture strategy K( ; ) 4.3 2 = and for any two strategies ; 2 + (1 ) 2 satis…es, K( ; ) + (1 and )K( ; ): Normalization It is typical in the applied literature to allow inattention at no cost, and otherwise to have costs be non-negative. Given weak monotonicity, non-negativity of the entire function follows immediately if one ensures that inattention is costless. Condition K3 Given 2 , de…ne I 2 as the strategy in which m ( ) = 1 for 1 m M . Attentional cost function K 2 K satis…es normalization if it is non-negative where realvalued, with K(I ) = 0 all 2 . 4.4 Theorem 2 Theorem 2 states that, whenever a rational inattention representation exists, one also exists in which the cost function satis…es conditions K1 through K3. Whatever one thinks of the above 19 While in many ways intuitively attractive, this assumption may not be universally valid. In a world with discrete signals it may be very costly or even impossible to generate continous changes in information. Moreover the DM may be restricted to some …xed set of signals in which case less informative structures are essentially disallowed. It may not be possible to automatically and freely dispose of information once learned. 13 assumptions on intuitive grounds, even if any one or all of them are in fact false, any data set that can be rationalized can equally be rationalized by a function that satis…es them all. Theorem 2 Data set (D; q) satis…es NIAS and NIAC if and only if it has a rational inattention representation with conditions K1 to K3 satis…ed. This result has the ‡avor of the Afriat characterization of rationality of choice from budget sets (Afriat [1967]), which states that choices can be rationalized by some utility function if and only if they can be rationalized by a non-satiated, continuous, monotone, and concave utility function. Note that the representation need not be unique. Conditions for recoverability are left for future research. 4.5 No Strong Blackwell Not all restrictions on the form of the cost function can be so readily absorbed as K1 through K3. For example, there are data sets satisfying NIAS and NIAC yet for which there exists no cost function that produces a rational inattention representation with a cost function that is strictly monotonic with the informativeness of the information structure (i.e. if is su¢ cient for 0 but 0 is not su¢ cient for , then K( ; ) > K( ; 0 )). A simple example with data on one decision problem with two equally likely states illustrates that one cannot further strengthen the result in this dimension. Suppose that there are three available acts A = ff; g; hg with corresponding utilities, (U1f ; U2f ) = (10; 0) ; (U1g ; U2g ) = (0; 10) ; (U1h ; U2h ) = (7:5; 7:5) : Consider the following state dependent stochastic choice data in which the only two chosen acts are f and g, 3 q1f = q2g = = 1 q1g = 1 q2f : 4 Note that this data satis…es NIAS; given posterior beliefs when f is chosen, f is superior to g and indi¤erent to h, and when g is chosen it is superior to f and indi¤erent to h. It trivially satis…es NIAC since there is only one decision problem observed. We know from theorem 2 that is has a rational inattention representation with the cost of the minimal attention strategy K ( ; ) 0 and that of the inattentive strategy being zero, K( ; I ) = 0. Note that is su¢ cient for I but not vice versa, hence any strictly monotone cost function would have to satisfy K ( ; ) > 0. In fact it is not possible to …nd a representation with this property. To see this, note that both strategies have the same gross utility, G( ; A; ) = 1 2 3 4 10 + 1 2 3 4 10 = 1 7:5 = G( ; A; I ); where we use the fact that the inattentive strategy involves picking act h for sure. In order to rationalize selection of the inattentive strategy, it must therefore be that is no more expensive than I , contradicting strict monotonicity. 5 Rational Inattention and Random Utility In this section we compare the rational inattention model with alternative models in which stochasticity in choice stems from randomness in the utility function (e.g. McFadden [1974], Loomes and 14 Sugden [1995]).20 Such random utility models (RUMs) take as given a probability measure over some family of utility functions. Prior to making a choice, one utility function gets drawn from this set according to the speci…ed measure. The DM then chooses to maximize this utility function.21 There is a …rst order di¤erence between standard RUMs and the rational inattention model. While the latter produces choice behavior that di¤ers across states of the world, the former typically conditions out all observable states, as a result giving rise to state independent choice (e.g. Falmagne [1978], McFadden and Richter [1991], Clark [1996], McFadden [2005], and Gul and Pesendorfer [2006]). In order to compare our general model of rational inattention to RUMs one must therefore take a stand on what the DM knows about the true state of the world when they make their choice. One extreme assumption is that the DM receives no signal about the objective state of the world, and so makes choices based only on their prior beliefs, a model we refer to as the “Uninformed RUM”. Formally, we de…ne the class of utility functions U to be all functions : F ! R, with 22 ( ; f ) the utility assigned by 2 U to act f 2 A at prior 2 . Letting denote a probability measure over U, a data set (D; q) generated by an uninformed RUM is de…ned by, f qm ( ; A) = f 2 Uj ( ; f ) > ( ; g) 8 g 2 Ag ; 23 for every ( ; A) 2 D and m 2 . Such a model is easily di¤erentiated from the rational inattention model, as the probability of choosing an act is invariant with the true state of the world: generally this is not the case in the rational inattention model. On the other hand, as we demonstrate in the supplementary materiel, the uninformed RUM allows for violations of NIAS. This is obvious from the fact that such a DM has a single posterior belief, yet chooses many acts with distinct utilities according to any non-trivial utility function. A second extreme assumption is that the DM is fully informed about the state of the world and chooses in each state according to a RUM. Letting m 2 be the degenerate probability distribution specifying state m for sure, a perfectly informed RUM generates data of the following form: f qm ( ; A) = f 2 Uj ( m ; f ) > ( m ; g) 8 g 2 Ag : Again, the perfectly informed RUM can be easily di¤erentiated from the rational inattention model.24 It implies that the probability of choosing an act in a particular state depends only on its payo¤ in that state, and is independent of the payo¤s of any act in other states of the world and of prior beliefs. This is generally not true of the rational inattention model. On the other hand, in the 20 Random choice models have been used in psychology since the work of Thurstone [1927] and Zermelo [1929]. In the case of choice over lotteries, the family of utility functions can be over the lotteries themselves or, following Gul and Pesendorfer [2006], over the underlying prize space, with the utility of a lottery equal to its expectation according to the selected utility function. 22 Note that random expected utility in the manner of Gul and Pesendorfer [2006] is a special case in which the utility can be computed by weighting some underlying utilities on a prize space according to the prior probabilities. 23 In the case of utility functions in which the utility maximizing elements of A are not unique, a tie breaking rule is needed. See Gul and Pesendorfer [2006]. 24 Matejka and McKay [2011] identify an intriguing connection between the perfectly informed RUM and rational inattention. One heavily used variant of the RUM is the logit model, which assumes that is constructed using a …xed “base” function 2 U which is then perturbed by an error term distributed according to an extreme value type-I distribution. Applying this assumption to the perfectly informed RUM leads to a state dependent stochastic choice function of the form, 21 f qm ( ; A) = P e ( m ;f ) g2A e ( m ;g) : MM show that one can provide an analogous characterization of the state dependent stochastic choice associated 15 supplemental materiel we show how perfectly informed RUMs can violate both NIAS and NIAC. In the former case this is because the DM can still choose inferior options with positive probability under the RUM. In the latter, it is because the perfectly informed RUM can require the DM to acquire a large amount of information even when that information is not instrumental for choice. An intermediate version of the RUM is one in which the DM receives an exogenous and imperfect signal about the state of the world before a utility function is randomly drawn. In formal terms, starting with prior 2 A, the DM receives a …xed informative signal : ! ( ) that determines the information at the point that the random utility function is drawn. In this case the generated data set for any ( ; A) 2 D will be of the form, X f qm ( ; A) = m ( ) f 2 Ujv( ; f ) > ( ; g) 8 g 2 Ag ; 2 ( ) where ( ) is the set of possible posteriors associated with signal . The partially informed RUM does not imply either of the obvious restrictions on the data that we have so far discussed: choices can vary with the underlying state, and the payo¤s of an act in one state can a¤ect behavior in another. This makes di¤erentiating between the partially informed RUM and rational inattention more subtle. We pursue two approaches in the experiment that follows, as now detailed. Our …rst approach involves testing a simple monotonicity axiom that is an essentially universal feature of all RUMs, including those in which signals are received prior to making a decision. This axiom states that the addition of a new act to the set of available choices cannot increase the probability that one of the pre-existing options will be chosen (Gul and Pesendorfer [2006] and Luce and Suppes [1965]). Monotonicity Axiom Given ( ; A) 2 F, h 2 F nA and m 2 , f qm ( ; A) f qm ( ; A [ h) : As Matejka and McKay [2011] show, the rational inattention model can lead to robust violations of monotonicity. Their canonical example involves two states of the world, with prior 1 on state 1. There are three acts that may be available, with the following payo¤s in expected utility units, U1f ; U2f = (0; 1) ; (U1g ; U2g ) = 1 1 ; 2 2 ; and U1h ; U2h = (Y; Y ) ; where Y > 0. Assuming attention costs based on Shannon mutual information, Matejka and McKay [2011] show that it is optimal to pay no attention and choose the safe act g when only f and g are available, provided 1 is high enough. It is simply too expensive to bother trying to overturn the prior. However, with h available also, it becomes more important to learn the true state - increasingly so the higher is Y: The rationally inattentive agent may therefore select a more informative attention strategy. If this learning suggests to the DM that state 2 is more likely, then with the rational inattention model when attention costs are linear in Shannon mutual information, f q~m ( ; A) = P Pfe g2A f Um k P ge g Um k ; where P f is the unconditional probability of choosing act f and k > 0 scales the disutility of attention relative to prizes. Thus, rational inattention with Shannon costs looks in certain respects like a state dependent logistic RUM. 16 it is optimal to choose act f , producing a violation. In section 7.4 we report on an experiment designed to capture the intuition of this example. The second method for distinguishing the partially informed RUM from the rational inattention model involves identifying cases in which all utility functions have the same ranking over acts. Such cases allow us to identify the signal underlying the partially informed RUM, which by assumption should not vary with incentives. To make this precise, note that in many re…nements of the general RUM (such as the random expected utility model of Gul and Pesendorfer [2006]), all possible utility functions have the property that choice between stochastically ranked alternatives is deterministic. Moreover, many simple decision problems generate posteriors in which acts are typically so ranked. Consider simple cases with two states and two acts ff; gg with variation in the incentive to learn: (U1f ; U2f ) = (0; Y ) and (U1g ; U2g ) = (Y; 0) for Y > 0. For any posterior belief 1 > 0:5, act f stochastically dominates g, while for 1 < 0:5, act g dominates f , regardless of the reward value, Y: Thus, if all utility functions obey stochastic dominance, any randomness in choice must be due to the signal which, by assumption, is invariant to Y . Thus data generated by a partially informed RUM should not respond, for example, to doubling the value of Y . General a rationally inattentive agent will respond to such changes. 6 Experimental Design 6.1 Overview We introduce an experimental design that produces state dependent stochastic choice data at the subject level. Subjects are shown a screen on which there are displayed 100 balls, some of which are red and some of which are blue. The state of the world is determined by the number of red balls on the screen. Prior to seeing the screen, subjects are informed of the probability distribution over states. Having seen the screen they choose from a number of di¤erent acts whose payo¤s are state dependent. There is no external limit (such as a time constraint) on a subject’s ability to collect information about the state of the world. At the same time, there is no extrinsic cost to the subject of gathering information. Therefore the extent to which subjects fail to discern the true state of the world is due to their unwillingness to trade cognitive e¤ort for monetary reward. A decision problem is de…ned by the prior information and the set of available acts, as it is in section 2.1. A subject faces each decision problem 50 times, allowing us to approximate their state dependent stochastic choice function.25 In any given experiment, the subject faces four di¤erent problems. All occurrences of the same problem are grouped, but the order of the problems is block-randomized. In estimating the state dependent stochastic choice function we treat the 50 times that a subject faces the same decision making environment as 50 independent repetitions of the same event. 6.2 The Experiments Our study consists of four di¤erent experiments each run on between 23 and 44 subjects recruited from the New York University student population.26 Each subject answered 200 questions as well as 1 practice question. At the end of the session, one question was selected at random for payment, 25 To prevent subjects from learning to recognize patterns, we randomize the position of the balls. The implicit assumption is that the perceptual cost of determining the state is the same for each possible con…guration of balls. 26 29 subjects took part in experiment 1, 33 in experiment 2, 24 in experiment 3 and 44 in experiment 4. Each subject took part in only one experiment. 17 which reward was added to the show up fee of $10. Subjects took on average about 45 minutes to complete a session. Instructions are included in the supplemental materiel. The …rst three experiments were designed to study NIAC and NIAS in di¤erent types of decision problem. Experiment 1 comprises four decision problems with two equally likely states: in state 1 there are 48 red balls and in state 2 there are 52. In each decision problem there are two acts available ff i ; g i g with i indexing the decision problem. In each case, f is superior in state 1 while g is superior in state 2. Across decision problems the reward for making the correct choice in state 1 (U1f U1g ) varies, while the reward for making the correct choice in state 2 (U2g U2f ) remains …xed. This experiment therefore studies NIAS and NIAC in the context of asymmetric changes to the value of attention. Table 1 describes the available acts in the four decision problems in this experiment (payo¤s are in US$). Table Decision Problem 1 2 3 4 1: Experiment 1 Payo¤s U1f U2f U1g 10 0 0 20 0 0 10 0 5 30 0 0 Table Decision Problem 5 6 7 8 U2g 10 10 10 10 2: Experiment 2 Payo¤s U1f U2f U1g 10 0 0 2 0 0 20 0 0 30 0 0 U2g 10 2 20 30 Experiment 2 also has four decision problems, each with two equally likely states (48 and 52 red balls) and two available acts ff i ; g i g, whose payo¤s vary between decision problems. The key distinction with experiment 1 is that the reward to correctly identifying the state changes symmetrically across decision problems, as described in table 2. Experiment 3 studies whether subjects can focus their attention where it is most valuable. All decision problems in this experiment involve four equally likely states comprising two identi…able groupings. States 1 and 2 are perceptually hard to distinguish from one another, being de…ned respectively by 29 and 31 red balls. States 3 and 4 are also hard to distinguish from another, being de…ned respectively by 69 and 71 red balls. There are four decision problems with two possible acts, still labelled f and g. The decision problems di¤er according to whether it is important to di¤erentiate between states 1 and 2 (problem 10), states 3 and 4 (problem 9), neither (problem 11), or both (problem 12). Table 3: Experiment 3 Payo¤s f f f Decision Problem U1 U2 U3 U4f U1g 9 1 0 10 0 0 10 10 0 1 0 0 11 1 0 1 0 0 12 10 0 10 0 0 U2g 1 10 1 10 U3g 0 0 0 0 U4g 10 1 1 10 Experiment 4 sets up the possible counter-example to monotonicity suggested by Matejka and McKay [2011]. As in experiments 1 and 2 there are two equiprobable states, in this case with 49 red balls in state 1 and 51 in state 2.27 However there are now three acts, which we label fa; b; cg. In decision problem 13 the subject must choose between safe act a which pays out 23 in each state and act b which pays out 25 in state 2, but only 20 in state 1. Decision problem 14 introduces 27 We used 49 and 51 red balls to increase the cost of attentiveness in decision problem 13, making the increase in attentiveness from adding act c more marked. 18 a third alternative c which pays out 30 in state 1 and 10 in state 2. Decision problems 15 and 16 increase the payo¤ of act c in state 1 but decrease it in state 2, further increasing the value of attention. Table 4: Experiment 4 Payo¤s a a Decision Problem U1 U2 U1b U2b U1c U2c 13 23 23 20 25 n/a n/a 14 23 23 20 25 30 10 15 23 23 20 25 35 5 16 23 23 20 25 40 0 6.3 Preliminary Data Analysis We use the simple two act, two state structure of experiments 1 and 2 to establish some key properties of our data. First, our subjects produce choice data that is both stochastic and state dependent. In aggregate subjects made mistakes (choosing the wrong act on 35% of all trials), yet also make signi…cantly di¤erent choices in the two di¤erent states (act f was chosen 68% of the time in state 1 and 38% of the time in state 2 - the hypothesis that choice behavior is the same in both states can be rejected at the 0.001% level).28 These results suggest that our subjects are absorbing some information about the state of the world, but are not fully informed when they make their choice. Second, our data does not exhibit strong order e¤ects that might be associated with learning or fatigue. Averaging over decision problems, the percentage of correct responses in blocks 1-4 was 68%, 65%, 66%, and 65% respectively.29 Hence we ignore order e¤ects in the remaining analysis. 7 Experimental Results Overall, the results are supportive of NIAC and NIAS. Both condition are extremely close to holding in the aggregate data. At the individual level, monetary losses due to violations of both conditions are small. In contrast, we …nd that the monotonicity condition central to the RUM model is violated in the manner predicted by rational inattention. 7.1 Experiment 1 In the two state/two act set up of experiment 1, NIAS implies existence of a cuto¤ posterior probability of state 1 that determines the optimal act. For higher posteriors, act f is optimal, while for lower posteriors, act g is optimal. Assuming risk neutrality, this cuto¤ is 12 in decision problem (DP) 1, 13 in DP 2; 23 in DP 3; and 14 DP 4. These are shown in …gure 2 together with the revealed posteriors associated with the choice of act f and act g at the aggregate level (treating all data as if it was generated by a single subject). The data satisfy risk neutral NIAS in all but one 28 Estimated using a linear probability model with individual-level …xed e¤ects and standard errors clustered at the individual level. All statistical tests reported use this method. These patterns hold true at the individual level. Of the 62 subjects that took part in experiments 1 and 2, only 6% made mistakes in less than 10% of questions, while 84% had choice behavior that was signi…cantly di¤erent between the two states at the 10% level. 29 Regressing a binary variable indicating whether the correct choice was made on decision problem dummies and a dummy indicating the order in which the treatment was seen by the subject suggests that these di¤erences are not signi…cant: A test of the linear restriction that the block dummies are simultaneously equal to zero fails to reject the null hypothesis at the 17% level. 19 case (the posterior of state 1 is 1% too high when g is chosen in DP 2).30 Figure 2 also demonstrates that subjects’behavior varies signi…cantly with the decision problem. Figure 2 Figure 3 At the individual level, while only 42% of subjects satisfy NIAS for every decision, the violations that we observe lead to small losses. Figure 4 shows the distribution of costs of NIAS failures in experiment 1. For each subject it measures the actual expected value of their choices minus the expected value of the optimal choices given their revealed posteriors across all four decision problems. As a benchmark, these losses are compared to those that would have been observed from a population of subjects choosing at random.31 The observed distribution is signi…cantly di¤erent from the simulated distribution at the 0.01% level (Kolmogorov-Smirnov test) (equivalent …gures 30 In order to calculate posterior beliefs we combine the conditional probabilities of choosing each act from each state estimated from the data with prior beliefs about the likelhood of each state, rather than the empirical likelihood of each state. 31 The use of random benchmarks has been discussed by, for example, Beatty and Crawford [2011]. The precise procedure used to construct the random behavior is as follows: for each decision problem and for each state, a random number is drawn for each available act. The probability of choosing each act from that state is then calculated as the value of the random number associated with that act divided by the sum of all random numbers. 20 for experiments 2-4 are shown in the supplemental materiel.). Figure 4 Figure 5 The binary version of the NIAC condition (i.e. considering cycles consisting of only two decision problems) in this experiment relates the change in incentives between decision problems to the change in i the percentage of time that the correct decision is taken in state i = 1; 2 (act f in state 1, act g in state 2), 1 (U1f U1g ) + 2 (U2g U2f ) 0, (2) where (x) indicates the change in x between the two decision problems. This expression has a natural interpretation. The …rst term is the change in the probability of choosing the correct act in state 1 multiplied by the change in the bene…t of choosing the right act - i.e. the di¤erence between the payo¤ of act f and g in that state. The second term is the change in the probability of choosing the right act in state 2 multiplied by the bene…t of so doing. Experiment 1 is designed so that (U2g U2f ) = 0 for every pair of decision problems. Thus this condition implies a ranking of j1 across decision problems 1 j 4, 4 1 2 1 1 1 3 1: Figure 3 shows that this ranking holds true in the aggregate. The di¤erence between decision problems is signi…cant at the 1% level.32 At the individual level, 48% of subjects satisfy NIAC, and again the losses due to NIAC violations are small. Figure 5 plots the distribution of actual surplus minus the maximal surplus possible by reassigning attention strategies to decision problems for each individual. The NIAC condition demands this number to be zero. As a comparator, we show the distribution obtained from random choice. Again, the observed distribution is signi…cantly di¤erent from the simulated distribution at the 0.01% level. 32 This test considers only bilateral comparisons of attention strategies. The NIAC condition requires more than this: it must be impossible to increase total surplus through any reassignment of attention strategies between problems. This condition holds on the aggregate data conditional on actual choice at each posterior, but not given optimal choice at each posterior, as NIAS is violated in DP 2. 21 7.2 Experiment 2 Experiment 2 explores subjects’responses to symmetric changes in the value of making the correct decision. Symmetry implies that, for each decision problem, the threshold posterior probability of state 1 above which it is optimal to choose act f is 0.5. Table s1 in the supplemental materiel shows that these conditions hold in aggregate. At the individual level, 67% of subjects obey NIAS, the losses of those that do not are again relatively small (see …gure s1) and signi…cantly di¤erent from those of the random benchmark (signi…cant at the 0.01% level). With regard to NIAC, experiment 2 has the symmetry property whereby for every pair of acts in each decision problem, (U1f U1g ) = (U2g U2f ): Applying this observation to equation 2 implies a ranking for choosing the right action across both states, 8 1 + 8 2 7 1 + 7 2 5 1 + 5 2 6 1 1 + + 2, or the total probability of 6 2: Figure 6 shows that these implications broadly hold in the aggregate data. The total proportion of correct responses in DP 8 is higher than in DP 7 which is in turn higher than in DP 5. The total proportion of errors in DP 6 is slightly higher than that in DP 5 (by 0.5%) but the di¤erence is not statistically signi…cant.33 While the ordering of decision problems according to the proportion of correct choices is in line with the theory, the elasticity in the response of information gathering to monetary incentives is quite low. The probability of making a correct choice rises from 65% in the $2 treatment to 70% in the $30 treatment (signi…cant at the 10% level). We discuss the low degree of responsiveness further in Caplin and Dean [2013]. Figure 6 At the individual level, the distribution of losses due to NIAC violations in the experimental data is shown in …gure s2 in the supplemental materiel. Again it is signi…cantly di¤erent from that of the random benchmark at the 0.01% level. 33 Again errors bar that are shown take into account clustering at the subject level. 22 7.3 Experiment 3 The NIAS conditions take a somewhat di¤erent form in experiment 3. Act f is always superior in states 1 and 3 and act g is superior in states 2 and 4. For it to be optimal to choose option f , it must be the case that, U1f 1 + U3f 3 U2g 2 + U4g 4 : Given that U1f = U2g and U3f = U4g in experiment 3, provided of f requires, U1f 1 2 : U3f 4 3 1 > 2 when f is chosen, optimality Table s2 in the supplemental materiel shows that this condition holds at the aggregate level. At the individual level, 58% of subjects obey NIAS and, as shown in …gure s1, the losses amongst those that do not are again small and signi…cantly di¤erent from the random benchmark at the 0.01% level. With regard to the NIAC condition, the equivalent of condition 2 for experiment 3 is, 1 (U1f U1g ) + (U3f 3 U3g ) + 2 (U2g U2f ) + (U4g 4 U4f ) 0: This equation generates a set of six inequalities based on binary comparisons of the 4 decision problems. For example, comparing decision problem 9 and 10 we have, 10 1 + 10 2 + 9 3 + 9 4 10 3 + 10 4 + 9 1 + 9 2 ; meaning that the increase in the proportion of correct choices in states 1 and 2 must be bigger than the increase in proportion of correct choices in state 3 and 4 when shifting from decision problem 9 to 10. This makes sense, as relative to problem 9, problem 10 has higher rewards for correct decisions in the former two states than in the latter two states, meaning that this is where subjects should focus their attention. In the aggregate data, the average proportion of correct responses on the left hand side of the equation is 72.8% compared to 64.9% on the right hand side, signi…cantly di¤erent at the 1% level. Table s3 in the supplemental materiel shows that the other 5 conditions also hold. Moreover losses from NIAC violations are again signi…cantly below those associated with the random benchmark at the 0.01% level (see …gure s2). 7.4 Experiment 4 The primary purpose of experiment 4 was to test for violations of monotonicity in the manner described by Matejka and McKay [2011]: we would expect that the additional incentive to discover the true state generated by the introduction of act c would increase the likelihood of choosing act b in state 2: Table 7 shows that this is fact the case. The introduction of act c increases the probability of choosing act b in state 2 from 23% to an average of 35% across decision problems 14-16. While small, this increase is signi…cant at the 5% level. Table 7 Decision problem 13 14 15 16 23 q1 (b) 21% 16% 20% 17% q2 (b) 23% 30% 33% 42% The supplemental materiel describes the NIAC and NIAS tests for experiment 4. At the aggregate level, NIAC holds, while NIAS holds apart from for the choice of b in which case the revealed posterior puts somewhat too high a weight on state 1. Further evidence against RUMs comes from our other experiments. Uninformed RUMs imply that choice probabilities should be independent of the state, which is patently false in all cases. Perfectly informed RUMs imply that choice probabilities in each state should be independent of payo¤s in other states. Hence in experiment 1 choice probabilities in state 2 should be identical across decision problems, when in fact they range from 15% to 74% (see …gure 3). Further evidence against the partially informed RUM derives from tests of stochastic dominance. We tested 24 subject in experiment 1 for violations of stochastic dominance. We asked 10 question in which they were to choose between act f which paid $10 only in state 1 and g in which they were paid $10 only in state 2. The prior probability of state 1 varied from 10% to 90%. However, for these questions, subjects did not get to see the screen with the balls, and had to make their decisions based purely on the prior. Thus, subjects obeyed stochastic dominance if and only if they chose act f when the prior on state 1 was greater than or equal to 50%. We found that indeed 83% of subjects obeyed stochastic dominance, suggesting that for the majority of subjects all randomness we observe would have to be due to the exogenous signal. This cannot be squared with the increase in attentiveness as rewards increase in experiment 2: by assumption, the signal in the partially informed RUM is exogenous. 8 Existing Literature Much of the work on rational inattention in economics can be traced back to Sims [1998, 2003] which characterize the behavioral impact of constraints on information processing in linear quadratic control problems.34 The rate of information ‡ow is measured by Shannon mutual information. Sims [2003] shows that such a constraint generates behavior similar to that generated by an agent who observes the state of the world only noisily. However, the type of noise is determined endogenously, based on the incentives in the environment. Following this paper, mutual information constraints have been incorporated in an increasing number of economic settings, including consumptionsavings problems (Sims [2006], Tutino [2008], and Mackowiak and Wiederholt [2010]), pricing problems (Mackowiak and Wiederholt [2009], Matejka [2010], Martin [2013]), monetary policy (Paciello and Wiederholt [2011]), and portfolio choice (Mondria [2010]). In part, the focus on mutual information to measure the cost of information is justi…ed by its central position in the information theory literature. The Shannon mutual information of two random variables is related to the expected length in bits of the optimally encoded signal needed to generate one from the other. It also has an axiomatic characterization which shows that information costs must be of this form if they are to obey certain intuitive properties (see for example Csiszár [2008]). Shannon mutual information costs also have interesting properties from an economic standpoint. As discussed in the text, Matejka and McKay [2011] demonstrate a strong relationship between mutual information based rational inattention and logit-style random choice. Cabrales et al. [2013] demonstrate a further interesting link between mutual information and economic behavior. They consider a “ruin averse”investor facing a class of no-arbitrage investment problems and show that the ranking of information structures based on willingness to pay is equivalent to that provided by mutual information. While much of the rational inattention literature has focussed on mutual information costs, 34 Although the study of costly information aquisition in economics goes back much further - for example Stigler [1961], Marschak [1971] and Milgrom [1981]. 24 a variety of other cost functions and constraints have been studied. Woodford [2012] points out that mutual information does not imply that less attention will be paid to rare events (as such attention is cheap in expectation), in violation of experimental results due to Shaw and Shaw [1977]. He therefore proposes an alternative measure in which the cost of an information structure is evaluated according to the related concept of the Shannon capacity. Gul et al. [2012] consider the behavior of households who are restricted to having “crude”consumption plans i.e. plans that are restricted to having at most n realizations. van Nieuwerburgh and Veldkamp [2009] consider a more general information cost function, based on the distance between prior and posterior variance. Saint-Paul [2011] considers the case in which DMs face Shannon cost, but can only choose discrete policy functions (essentially combining the approaches of Sims [2003] and Gul et al. [2012]). Reis [2006] considers the case of a binary information choice: in any given period either attention can be paid, and the state is fully revealed, or not, in which case no information is gathered. Even many of the articles that use mutual information costs e¤ectively restrict the decision maker to choose Gaussian signals, implying additional constraints (see Sims [2006] for a discussion). A key strength of our approach is that our model nests all of the above costs functions. The costs of allowable attention strategies can be captured by K, while the cost of inadmissible strategies can be set to in…nity. The NIAS and NIAC conditions therefore provide a test of the entire class of rational inattention models currently in use.35 A recent wave of decision theoretic literature shares the goal of capturing the observable implications of inattention, both rational and otherwise. Closest in spirit to our work is Ellis [2012], who works with a data set similar to ours - state dependent choice functions. Ellis [2012] asks under what conditions can such data be rationalized by a model in which a DM has a set of available partitions on the state space, and selects the best partition for the decision problem at hand. The characterization is based on the identi…cation of cells in the partition with all objective states in which the same choice is made - an approach similar to that taken in our paper. This allows for the identi…cation of the preferences revealed by choices. There are three key di¤erences between the theoretical section of our paper and Ellis [2012]. On the one hand, he places weaker requirements on the data: unlike our approach, the DM’s utility function and prior beliefs are derived from behavior rather than directly observable. On the other hand he considers a more restrictive class of information restrictions: DMs e¤ectively face a cost function which is zero for allowable partitions and in…nity for all other information structures. This restriction rules out any stochasticity in choice, as well as many commonly used information cost functions (such as those based on Shannon mutual information). Moreover, the conditions of Ellis [2012] require the data to be observed from essentially all decision problems to be both su¢ cient and necessary. In contrast, our tests work on data collected from any arbitrary collection of decision problems. A second decision theoretic approach to identifying rational attention is to examine choice over menus. Ergin and Sarver [2010] consider a model in which a DM makes choices within choice sets by optimally selecting a partition on (subjective and unobservable) states of the world, then choosing the best action conditional on that partition.36 They characterize the implications of such a model for choices between choice sets. Costly contemplation is characterized by an aversion to contingent planning: an agent would prefer to …nd out which set they are choosing from and then choose from that set, rather that have to make contingent plans. Mihm and Ozbek [2012] extend this approach to the case in which there are observable states of the world, resulting in a representation similar to that considered in this paper. 35 Note that we consider only the instrumental value of information, not any intrinsic value that information might have as in Grant et al. [1998]. 36 Ortoleva [2013] provides an alternative model of ‘thinking aversion’. 25 Our work forms part of an ongoing project aimed at characterizing choice behavior when the internal information state of the agent is not directly observable. Block and Marschak [1960] early on stressed the di¢ culty in separating out theories of stochastic choice. van Zandt [1996] provided an explicit negative result in this regard, showing that any choice behavior is rationalizable in a model that allows for unobserved costly information acquisition if the state of the world is not observable. Caplin and Dean [2011] and Caplin et al. [2011] consider the case of sequential information search, using an extended data set to derive behavioral restrictions of search of this kind as well as of satis…cing behavior. Gabaix et al. [2006] use Mouselab data to test a near-optimal model of sequential information search. Caplin and Martin [2011] introduce the NIAS condition to characterize subjective rationality in a single decision problem. Masatlioglu et al. [2012] characterize “revealed attention”, using the identifying assumption that removing an unattended item from the choice set does not a¤ect attention. Dillenberger et al. [2012] consider a dynamic problem in which the DM receives information in each period which is externally unobservable, characterizing the resulting preference over menus. Fudenberg and Strzalecki [2012] also consider a dynamic problem, characterizing dynamic stochastic choice rules that are consistent with rational inattention and Shannon mutual information costs. In the psychology literature, theories to which we are close in spirit are signal detection theory (Green and Swets [1966]) and categorization theory. A common feature is that the DM receives a signal and must choose the optimal action at each resulting posterior. These theories are connected to enormous experimental literatures in psychology that capture state dependent stochastic choice data.37 The chief distinction is that, unlike the rational inattention model, signal detection theory generally …xes the attention strategy independent of the incentive to learn implied by the decision making environment. These models are therefore a special case of our general formulation. Despite the powerful psychological precedents, there is little experimental work on state dependent stochastic choice data within economics. There is so far as we know no work in either …eld that tests NIAC and NIAS directly. One related paper is Cheremukhin et al. [2011], which uses a formulation similar to Matejka and McKay [2011] to estimate a rationally inattentive model on subject’s choice over lotteries. However they do not analyze the state dependence in the resulting stochastic choice data. 9 Conclusions As economists increasingly focus on attentional constraints, so the importance of rational inattention theory has grown. We characterize a general model of rational inattention which encompasses all models currently in the literature. The necessary and su¢ cient conditions are simple and readily testable. We …nd the model to do a qualitatively good job of explaining subject behavior in a simple experimental implementation. In contrast, traditional random utility models fail to capture important data features. In addition to further investigating the comparison with random utility models, we are currently exploring the behavioral content of more structured models of attention costs, in particular the Shannon model.38 We are also exploring the implications of the nature of attentional costs for economic applications. 37 38 We do not attempt to summarize the literature here - see Verghese [2003] for a review. See Caplin and Dean [2013]. 26 References S. N. Afriat. The Construction of Utility Functions from Expenditure Data. International Economic Review, 8(1), 1967. Timothy K. M. Beatty and Ian A. Crawford. How demanding is the revealed preference approach to demand? American Economic Review, 101(6):2782–95, October 2011. Dirk Bergemann and Stephen Morris. Correlated equilibrium in games with incomplete information. Cowles Foundation Discussion Papers 1822, Cowles Foundation for Research in Economics, Yale University, October 2011. H. D. Block and J. Marschak. Contributions to Probability and Statistics, chapter Random Orderings and Stochastic Theories of Response. Stanford University Press, Stanford, 1960. Antonio Cabrales, Olivier Gossner, and Roberto Serrano. Entropy and the value of information for investors. American Economic Review, 103(1):360–77, February 2013. Andrew Caplin and Mark Dean. Search, choice, and revealed preference. Theoretical Economics, 6(1), January 2011. Andrew Caplin and Mark Dean. Rational inattention, entropy, and choice: The posterior-based approach. Mimeo, Center for Experimental Social Science, New York University, 2013. Andrew Caplin and Daniel Martin. A testable theory of imperfect perception. NBER Working Papers 17163, National Bureau of Economic Research, Inc, June 2011. Andrew Caplin, Mark Dean, and Daniel Martin. Search and satis…cing. American Economic Review, 101(7):2899–2922, December 2011. Anton Cheremukhin, Anna Popova, and Antonella Tutino. Experimental evidence on rational inattention. Technical report, 2011. Raj Chetty, Adam Looney, and Kory Kroft. Salience and taxation: Theory and evidence. American Economic Review, 99(4):1145–77, September 2009. Stephen A. Clark. The random utility model with an in…nite choice space. Economic Theory, 7:179–189, 1996. Jacques Cremer. A simple proof of blackwell’s comparison of experiments theorem. Journal of Economic Theory, 27(2):439–443, August 1982. Imre Csiszár. Axiomatic Characterizations of Information Measures. Entropy, 10:261–273, 2008. David Dillenberger, Juan Sebastian Lleras, Philipp Sadowski, and Norio Takeoka. A theory of subjective learning. Technical report, 2012. Andrew Ellis. Foundations for optimal attention. Mimeo, Boston University, 2012. Haluk Ergin and Todd Sarver. A Unique Costly Contemplation Representation. Econometrica, 78(4):1285–1339, 2010. Haluk Ergin. Costly Contemplation. 2003. 27 J C Falmagne. A Representation Theorem for Random Finite Scale Systems. Journal of Mathematical Psychology, 18:52–72, 1978. Drew Fudenberg and Tomasz Strzalecki. Recursive stochastic choice. Mimeo, Harvard University, 2012. Xavier Gabaix, David Laibson, Guillermo Moloche, and Stephen Weinberg. Costly information acquisition: Experimental analysis of a boundedly rational model. American Economic Review, 96(4):1043–1068, September 2006. Simon Grant, Atsushi Kajii, and Ben Polak. Intrinsic preference for information. Journal of Economic Theory, 83(2):233–259, December 1998. D. M. Green and J. A. Swets. Signal detection theory and psychophysics. Wiley, New York, 1966. Faruk Gul and Wolfgang Pesendorfer. Random Expected Utility. Econometrica, 74(1):121–146, 2006. Faruk Gul, Wolfgang Pesendorfer, and Tomasz Strzalecki. Behavioral competitive equilibrium and extreme prices. Mimeo, Princeton University, 2012. Tjalling C. Koopmans and Martin Beckmann. Assignment problems and the location of economic activities. Econometrica, 25(1):pp. 53–76, 1957. D.M. Kreps and E.L. Porteus. Temporal resolution of uncertainty and dynamic choice theory. Econometrica, 46(1):185–200, 1978. Nicola Lacetera, Devin G. Pope, and Justin R. Sydnor. Heuristic thinking and limited attention in the car market. American Economic Review, 102(5):2206–36, September 2012. Graham Loomes and Robert Sugden. Incorporating a stochastic element into decision theories. European Economic Review, 39(3-4):641–648, April 1995. R. Luce and P. Suppes. Handbook of Mathematical Psychology Vol. III, chapter Preference, utility, and subjective probability. Wiley, New York, 1965. Bartosz Mackowiak and Mirko Wiederholt. Optimal sticky prices under rational inattention. American Economic Review, 99(3):769–803, June 2009. Bartosz Adam Mackowiak and Mirko Wiederholt. Business cycle dynamics under rational inattention. CEPR Discussion Papers 7691, C.E.P.R. Discussion Papers, February 2010. Charles F. Manski. The structure of random utility models. Theory and Decision, 8:229–254, 1977. Jacob Marschak. Economics of information systems. Journal of the American Statistical Association, 66(333):192–219, March 1971. Daniel Martin. Strategic pricing and rational inattention to quality. Mimeo, New York University, 2013. Yusufcan Masatlioglu, Daisuke Nakajima, and Erkut Y. Ozbay. Revealed attention. American Economic Review, 102(5):2183–2205, August 2012. 28 Filip Matejka and Alisdair McKay. Rational inattention to discrete choices: A new foundation for the multinomial logit model. CERGE-EI Working Papers wp442, The Center for Economic Research and Graduate Education - Economic Institute, Prague, June 2011. Filip Matejka. Rationally inattentive seller: Sales and discrete pricing. CERGE-EI Working Papers wp408, The Center for Economic Research and Graduate Education - Economic Institute, Prague, March 2010. Daniel McFadden and Marcel Richter. Preferences, Uncertainty and Rationality, chapter Stochastic Rationality and Revealed Stochastic Preference. Westview Press, 1991. Daniel McFadden. Frontiers in Econometrics, chapter Conditional logit analysis of qualitative choice behavior. Academic Press, New York, 1974. Daniel McFadden. Revealed stochastic preference: a synthesis. Economic Theory, 26(2):245–264, 08 2005. Maximilian Mihm and M. Kemal Ozbek. Decision making with rational inattention. Working paper, Social Science Research Network, 2012. Paul R Milgrom. Rational expectations, information acquisition, and competitive bidding. Econometrica, 49(4):921–43, June 1981. Jordi Mondria. Portfolio choice, attention allocation, and price comovement. Journal of Economic Theory, 145(5):1837–1864, September 2010. David J. Murray. A perspective for viewing the history of psychophysics. Behavioral and Brain Sciences, 16:115–137, 2 1993. Koch C. Navalpakkam, V. and P Perona. Homo economicus in visual search. Journal of Vision, 9(1):1–16, January 2009. Pietro Ortoleva. The price of ‡exibility: Towards a theory of thinking aversion. J. Economic Theory, 148(3):903–934, 2013. Luigi Paciello and Mirko Wiederholt. Exogenous information, endogenous information and optimal monetary policy. Technical report, 2011. Ricardo Reis. Inattentive Consumers. Journal of Monetary Economics, 53(8):1761–1800, 2006. Jean-Charles Rochet. A necessary and su¢ cient condition for rationalizability in a quasi-linear context. Journal of Mathematical Economics, 16(2):191–200, April 1987. Gilles Saint-Paul. A "quantized" approach to rational inattention. TSE Working Papers 10-144, Toulouse School of Economics (TSE), 2011. Babur De Los Santos, Ali Hortacsu, and Matthijs R. Wildenbeest. Testing models of consumer search using data on web browsing and purchasing behavior. American Economic Review, 102(6):2955–80, October 2012. M. L. Shaw and P. Shaw. Optimal allocation of cognitive resources to spatial locations. J Exp Psychol Hum Percept Perform, 3(2):201–211, May 1977. 29 Christopher A. Sims. Stickiness. Carnegie-Rochester Conference Series on Public Policy, 49(1):317– 356, December 1998. Christopher Sims. Implications of Rational Inattention. Journal of Monetary Economics, 50(3):665– 690, 2003. Christopher A. Sims. Rational inattention: Beyond the linear-quadratic case. American Economic Review, 96(2):158–163, May 2006. George J. Stigler. The economics of information. Journal of Political Economy, 69:213, 1961. Antonella Tutino. The rigidity of choice: Lifecycle savings with information-processing limits. Technical report, 2008. Stijn van Nieuwerburgh and Laura Veldkamp. Information immobility and the home bias puzzle. Journal of Finance, 64(3):1187–1215, 06 2009. Timothy van Zandt. Hidden information acquisition and static choice. Theory and Decision, 40:235–247, 1996. Preeti Verghese. Visual search and attention: A signal detection theory approach. 31(4):523–535, 2003. Neuron, Michael Woodford. Information-constrained state-dependent pricing. NBER Working Papers 14620, National Bureau of Economic Research, Inc, December 2008. Michael Woodford. Inattentive valuation and reference-dependent choice. Mimeo, Columbia University, 2012. Ming Yang. Coordination with rational inattention. Technical report, 2011. 30 10 10.1 Appendix A: Proofs Lemma 1 Lemma 1 If 2 is consistent with ( ; A) 2 F and q 2 Q , then it is su¢ cient for ( ;q) : Proof. Let 2 be an attention strategy that is consistent with q 2 Q in decision problem ( ; A) . First, we list in order all distinct posteriors i 2 ( ) for 1 i j ( )j. Given that is consistent with q, there exists a corresponding optimal choice strategy C : f1; :::; Ig ! (A), with C i (f ) denoting the probability of choosing act f 2 F (q) with posterior i , such that the attention and choice functions match the data, f qm = I X i m( )C i (f ): i=1 We also list in order all possible posteriors j 2 all chosen acts that are associated with posterior Fj ( ( ;q) ), 1 as F j , j ff 2 F jrf ( ; q) = j j j ( ( ;q) )j, and identify g: The garbling matrix bij sets the probability of j 2 given all choices associated with acts f 2 F j . X bij = C i (f ): i 2 ( ) as the probability of f 2F j Note that this is indeed a j ( )j j ( )j stochastic matrix B Given j 2 ( ) and m 2 , note that, I X bij i=1 i m( ) = I X i m( ) i=1 X C i (f ) = f 2F j 0 with X PJ j=1 b ij = 1 all i. f qm ; f 2F j by the data matching property. It is de…nitional that m ( j ) is precisely equal to this, as the observed probability of all acts associated with posterior j 2 . Hence, j m( ) = I X bij m( i ); i=1 as required for su¢ ciency. 10.2 Theorem 1 Theorem 1 Data set (D; q) has a rational inattention representation if and only if it satis…es NIAS and NIAC. Proof of Necessity. To con…rm that existence of a rational inattention representation (K; ) of (D; q) implies that the NIAS condition is satis…ed, consider ( ; A) 2 D and a corresponding choice 31 strategy C : ( ( ;A) ) ! (A) such that …nal choices are optimal and that matches the data, X f f qm = m ( )C ( ): ( ;A) ) 2 ( Given f 2 q ( ;A) , for which C f ( ) > 0, n o (f ) = (( ( ; A))jC f ( ) > 0 : consider all posteriors Note that it must be the case that f 2 M X M X f m Um m=1 g m Um m=1 all g 2 A for all posteriors 2 , Note also that we can rewrite the revealed posterior beliefs as a weighted average of the posterior beliefs in the underlying attention strategy, 2 3 X f 4 m ( )C ( )5 m f qm 2 ( ( ;A) ) f =: rm ( ; q) = M m M X X f f j qj j qj j=1 = X 2 ( ( j=1 2 6 6 6 6 6 ;A) ) 4 M X j=1 m 3 )C f ( ) 7 7 7 7= M 7 X f 5 q j j j m( X 2 ( mP f ( ): ( ;A) ) j=1 The second line above is obtained by dividing and multiplying each term in the sum by M X j m( )C f ( ), j=1 or the probability of state , and P f ( ) indicates the probability of state The value of choosing act f at its revealed posterior is therefore, M X f rm ( f ; q) Um = m=1 X 2 ( = M X m=1 P ( ) ( ;A) ) M X f m Um m=1 ( ;A) ) X 2 ( f given that f was chosen Pf( ) M X g m Um m=1 f g rm ( ; q) Um 8 g 2 A: This establishes NIAS. The middle inequality above stems from the fact that M X m=1 f m Um M X g m Um m=1 all g 2 A for every 2 ( ( ;A) ). Next we show that, if there exists a rational attention representation (K; ) of (D; q), then NIAC must hold For any sequence ( ; A1 ); ( ; A2 ); :::::( ; AJ ) 2 D with AJ = A1 , it must be the 32 case that, J X1 G( ; Aj ; ( ;Aj ) ) K( ; ( ;Aj ) J X1 ) G( ; Aj ; ( ;Aj+1 ) ) K( ; ( ;Aj+1 ) j=1 j=1 J X1 G( ; Aj ; ( ;Aj ) ) G( ; Aj ; ( ;Aj+1 ) J X1 ) j=1 K( ; ( ;Aj ) ) K( ; ( ;Aj+1 ) )) ) = 0: j=1 where the last equality stems from the fact that K( ; J X1 j=1 G( ; Aj ; ( ;Aj ) ) J X1 ( ;A1 ) ) G( ; Aj ; = K( ; ( ;Aj+1 ) ( ;AJ ) ). This implies that ) j=1 It is immediate that G( ; Aj ; ( ;Aj ) ) = G( ; Aj ; ( ;Aj ) ) 8 j, as the minimal attention strategy implies the same pattern of stochastic choice as does the original attention strategy. Moreover, by lemma 1, we know that ( ;Aj ) is su¢ cient for ( ;Aj ) 8 j, and so, by Blackwell’s theorem G( ; Aj ; ( ;Aj+1 ) ) G( ; Aj ; ( ;Aj+1 ) ) 8 j (see remark 1). Thus the true attention strategies ( ;Aj ) can be replaced by the minimal attention strategies ( ;Aj ) in the above expression and the inequality will still hold, implying NIAC. Proof of Su¢ ciency. There are three steps in the proof that the NIAS and NIAC conditions are su¢ cient for (D; q) to have a rational inattention representation. The …rst step is to establish that the NIAC conditions ensures that there is no global reassignment of the minimal attention strategies observed in the data to decision problems ( ; A) 2 D that raises total gross surplus. The second step is use this observation to de…ne a candidate cost function on attention strategies, K: ! R [ 1. The key is to note that, as the solution to the classical allocation problem of Koopmans and Beckmann [1957], this assignment is supported by “prices” set in expected utility units. It is these prices that de…ne the proposed cost function. The …nal step is to apply the NIAS represents a rational inattention representation of (D; q), where conditions to show that K; comprises minimal attention strategies. Consider any prior 2 such that there exists two or more sets A such that ( ; A) 2 D. Enumerate these decision problems as ( ; Aj ) for 1 j J and let D be the set of such decision problems. De…ne the corresponding minimal attention strategies j for 1 j J as revealed in ( ;A) the corresponding data and let [( ;A)2D be the set of all such decision problems, with a slight caveat to ensure that there are precisely as many strategies as there are decision problems. If all minimal attention strategies are di¤erent, the set as just de…ned will have cardinality J. If there is repetition, then retain the decision problem index with which they are associated so as to make them distinct, and thereby to ensure that the resulting set has precisely J elements. j Index elements 2 in order of the decision problem ( ; Aj ) with which they are associated. We now allow for arbitrary matchings of attention strategies to decision problems. First, let bjl denote the gross utility of decision problem j combined with minimal attention strategy l, bjl = G( ; Aj ; l ); with B the corresponding matrix. De…ne M to be the set of all matching functions m : f1; ::; Jg ! 33 f1; ::; Jg that are 1-1 and onto and identify the corresponding sum of gross payo¤s, S(m) = J X bjm(j) : j=1 It is simple to see that the NIAC condition implies that the identify map mI (j) = j maximizes the sum over all matching functions m 2 M. Suppose to the contrary that there exists some alternative matching function that achieves a strictly higher sum, and denote this match m 2 M. In this case construct a …rst sub-cycle as follows: start with the lowest index j1 such that m (j1 ) 6= j1 . De…ne m (j1 ) = j2 and now …nd m(j2 ), noting by construction that m(j2 ) 6= j2 . Given that the domain is …nite, this process will terminate after some K J steps with m (jK ) = j1 . If it is the case that m (j) = j outside of the set [K k=1 jk , then we know the increase in the value of the sum is associated only with this cycle, hence, K X1 bjk jk < K X1 bjk jk+1 ; j=1 k=1 directly in contradiction to NIAC. If this inequality does not hold strictly, then we know that there 0 0 exists some j 0 outside of the set [K k=1 jk such that m (j ) 6= j . We can therefore iterate the process, knowing that the above strict inequality must be true for at least one such cycle to explain the strict increase in overall gross utility. Hence the identity map mI (j) = j indeed maximizes S(m) amongst all matching functions m 2 M. With this, we have established that the identity map solves an allocation problem of precisely the form analyzed by Koopmans and Beckmann [1957]. They characterize precisely those matching functions m : f1; ::Jg ! f1; ::Jg that maximize the sum of payo¤s de…ned by a square payo¤ matrix such as B that identi…es the reward to matching objects of one set (decision problems in our case) to a corresponding number of objects in a second set (minimal attention strategies in our case). They show that the solution is the same as that of the linear program obtained by ignoring integer constraints, J J X X X xjl = 1: max bjl xjl s.t. xjl = xjl 0 j=1 j;l l=1 Standard duality theory implies that the optimal assignment mI (j) = j is associated with a system of prices on minimal attention strategies such that the increment in net payo¤ from any move of any decision problem is not more than the increment in the cost of the attention strategy. De…ning these prices as Kj , their result implies that, bjl bjj = G( ; Aj ; l ) G( ; Aj ; j ) Kl Kj ; or, G( ; Aj ; j ) Kj G( ; Aj ; l ) Kl : The result of Koopmans and Beckmann therefore implies existence of a function K : !R that decentralizes the problem from the viewpoint of the owner of the decision problems, seeking to identify surplus maximizing attention strategies to match to their particular problems. Note that if there are two distinct decision problems with the same revealed posterior, the result directly implies that they must have the same cost, so that one can in fact ignore the reference to the decision problem and retain only the posterior in the domain. To complete the de…nition of the 34 cost function, consider all 2 such that there exist a unique decision problem ( ; A) 2 D, and ( ;A) = 0 Now complete set the cost of the corresponding minimal attention strategy to zero K the function across all 2 such that there exist one or more ( ; A) 2 D by setting K( ; ) = 1 for 6= ( ;A) . Finally, for all 2 such that there is no ( ; A) 2 D, set K( ) = 0 all 2 and K( ; ) = 1 for 2 = . Note that we have now completed construction of a qualifying cost function K : ! R[1 that satis…es K( ; ) = 1 for 2 = and K( ; ) 2 R for some 2 . The entire construction was aimed at ensuring that the observed attention strategy choices were always maximal, ( ;A) 2 ^ (K; ; A) for all ( ; A) 2 D. It remains to prove that ( ;A) is consistent with q( ; A) for all ( ; A) 2 D. This requires us to show that, for all ( ; A) 2 D, the choice rule that associates with each 2 ( ( ;A) ) the certainty of choosing the associated act f 2 F ( ; A) as observed in the data is both optimal and matches the data. That it is optimal is the precise content of the NIAS constraint, M M X X f f f g rm ( ; A)Um rm ( ; A)Um ; m=1 m=1 for all g 2 A. That this choice rule and the corresponding minimal attention function match the data holds by construction. 10.3 Theorem 2 Theorem 2 Data set (D; q) satis…es NIAS and NIAC if and only if it has a rational inattention representation with conditions K1 to K3 satis…ed. Proof. The proof of necessity is immediate from theorem 1. The proof of su¢ ciency proceeds in four steps, starting with a rational inattention representation K; of (D; q) of the form produced in theorem 1 based on satisfaction of the NIAS and NIAC conditions. A key feature of this function is that, given 2 such that ( ; A) 2 D some A 2 F, the function K is real-valued only on the minimal information strategies f ( ;A) j ; A) 2 Dg associated with all corresponding decision problems, otherwise being in…nite. The …rst step is the proof is to construct for each such 2 a larger domain on which K is real-valued to satisfy four properties: to include all minimal attention strategies, ; to include the inattentive strategy, I 2 ; to be closed under mixtures so that ; 2 and 2 (0; 1) implies (1 ) 2 ; and to be “closed under garbling,” so that if 2 is su¢ cient for attention strategy 2 , then 2 . The second step is, for each 2 , to de…ne a new function K that preserves the essential elements of K while being real-valued on the larger domain , and thereby to construct the full candidate cost function K : ! R [ 1. The third step is to con…rm that K 2 K and that K satis…es the required conditions K1 through K3. The …nal step is to con…rm that K; forms a rational inattention representation of (D; q). Given 2 such that ( ; A) 2 D some A 2 F, we construct the domain in two stages. First, we de…ne all attention strategies for which some minimal attention strategy 2 is su¢ cient; S =f 2 j9 2 su¢ cient for g: Note that this is a superset of and that it contains I . The second step is to identify as the smallest mixture set containing S : this is itself a mixture set since the arbitrary intersection of mixture sets is itself a mixture set. By construction, has three of the four desired properties: it is closed under mixing; it contains , and it contains the inattentive strategy. The only condition that needs to be checked 35 is that it retains the property of being closed under su¢ ciency: 2 su¢ cient for 2 =) 2 To establish this, it is useful …rst to establish certain properties of S is closed under garbling: 2 S su¢ cient for 2 =) 2 . S and of . The …rst is that S. Intuitively, this is because the garbling of a garbling is a garbling. In technical terms, the product of the corresponding garbling matrices is itself a garbling matrix. The second is that one can explicitly express as the set of all …nite mixtures of elements on S , 8 9 J < = X j J 1 j = = jJ 2 N; ( ; :: ) 2 S ; 2 j 1 J S; ; : j=1 where S J 1 is the unit simplex in RJ: To make this identi…cation, note that the set as de…ned on the RHS certainly contains S and is a mixture set, hence is a superset of . Note moreover that all elements in the RHS set are necessarily contained in any mixture set containing S by a process of iteration, making is also a subset of , hence …nally one and the same set. We now establish that if 2 is a garbling of some 2 , then indeed 2 . The …rst step is to express 2 as an appropriate convex combination of elements of S as we now know we can, J X j = : j j=1 with all weights strictly positive, j > 0 all j. Lemma 2 below establishes that in this case there exist garblings j of j 2 S such that, = J X j j ; j=1 establishing that indeed 2 since, with S closed under garbling, j 2 S and j a garbling j j of implies 2 S. Given 2 such that ( ; A) 2 D some A 2 F, we de…ne the function K on in three stages. First we de…ne the function KS on the domain S by identifying for any 2 S the corresponding set of minimal attention strategies 2 of which is a garbling, and assigning to it the lowest such cost. Formally, given 2 S , KS ( ) min f 2 j K( ): su¢ cient for g Note that KS ( ) = K( ) all 2 . To see this, consider ( ; A); ( ; A0 ) 2 D with ( ;A) su¢ cient for . By the Blackwell property, expected utility is at least as high using ( ( ;A) using for which it is su¢ cient, G( ; A; ( ;A0 ) ) G( ; A; 36 ( ;A) ): ( ;A0 ) ;A0 ) as At the same time, since K; ( ;A) 2 ^ (K; ; A), so that, G( ; A; is a rational attention representation of (D; q), we know that ( ;A) ) K( ( ;A) ) ( ;A0 ) G( ; A; ) K( ( ;A0 ) ): 0 Together these imply that K( ( ;A) ) K( ( ;A ) ), which in turn implies that KS ( ) = K( ) all 2 . Note that KS ( ) also satis…es weak monotonicity on S on this domain, since if we are given ; 2 S with su¢ cient for , then we know that any strategy 2 that is su¢ cient for is also su¢ cient for , so that the minimum de…ning KS ( ) can be no lower than that de…ning KS ( ). The second stage in the construction is extend the domain of the cost function from S to . As noted above, this set comprises all …nite mixtures of elements of S , 8 9 J < = X j J 1 j = = jJ 2 N; ( ; :: ) 2 S ; and 2 j 1 J S; : : j=1 Given 2 , we take the set of all such mixtures that generate it and de…ne K ( ) to be the corresponding in…mum, K ( )=8 > < > : inf X J J2N; 2S J 1 ;f j gJ j=1 2 Sj = j j 9 > = j=1 > ; j=1 J X j KS ( j ): Note that this function is well de…ned since KS is bounded below by the cost of inattentive strategies and the feasible set is non-empty by de…nition of . We establish in Lemma 3 that the in…mum is achieved. Hence, given 2 , there exists J 2 N; 2 S J 1 ; and elements j 2 S with J X j such that, = j j=1 K ( )= J X j KS ( j ): j=1 We show now that K satis…es K2, mixture feasibility. Consider distinct strategies 6= 2 . ; ; ; We know by Lemma 3 that we can …nd J 2 N; corresponding probability weights 2S J J X X j, j , and such that, and elements j ; j 2 S with = = j j j=1 K ( ) = j=1 J X j KS ( j j KS ( j ); j=1 K ( ) = J X ): j=1 Given 2 (0; 1), consider now the mixture strategy de…ned by taking each strategy j with j with probability (1 probability ) j . By construction, this mixture j and each strategy 37 strategy generates K ( ) that, K ( ) =[ J X + (1 j KS ( j ) )+ j=1 J X ] 2 and hence we know by the in…mum feature of (1 ) j j KS ( ) = K ( ) + (1 )K ( ); j=1 con…rming mixture feasibility. We show also that K satis…es K3, weak monotonicity in information. Consider ; 2 with su¢ cient for . We know by Lemma 3 that we can …nd J 2 N; 2 S J 1 ; and corresponding J X J j j j and such that, elements 2 S with …xed range ( ) = ( ) such that = j j=1 j=1 K ( )= J X j KS ( j ): j=1 j J j=1 We know also from Lemma 2 that we can construct such that each on its domain j is a garbling of the corresponding S , we conclude that, KS ( j j. 2 S such that = J X j j and j=1 Given that KS satis…es weak monotonicity KS ( j ): ) By the in…mum feature of K ( ) we therefore know that, K ( ) J X j KS ( j J X ) j=1 j KS ( j ) = K ( ); j=1 con…rming weak monotonicity. a rational inattention We show now that we have retained the properties that made K; representation of (D; q) for prior 2 . It is immediate that and the choice function that involves picking act f 2 F ( ; A) for sure in revealed posterior r( ;A) (f ) is consistent with the data, since this was part of the initial de…nition. What needs to be con…rmed is only that the revealed minimal attention strategies are optimal. Suppose to the contrary that there exists ( ; A) 2 D such that, G( ; A; ) K ( ) > G( ; A; ( ;A) ) K ( ( ;A) ); for some 2 . By Lemma 3 we can …nd J 2 N; a strictly positive vector J X j J j and such that, 2 , such that = corresponding elements j S j=1 j=1 K ( )= J X j=1 38 j KS ( j ): 2 SJ 1; and By the fact that = J X j j and by construction of the mixture strategy, j=1 G( ; A; ) = J X j G( j ; A; ); j=1 so that, J X j G( ; A; j ) j KS ( ) > G( ; A; ( ;A) ) ( ;A) K ( ): j=1 We conclude that there exists j such that, j G( ; A; ) KS ( j ) > G( ; A; ( ;A) ) K ( ( ;A) ): Note that each j 2 S inherits its cost KS ( j ) from an element j 2 that is the lowest cost minimal attention strategy according to K on set that is su¢ cient for j , KS ( j )=K ( j ); where the last equality stems from the fact (established above) that KS ( ) = K ( ) on 2 . Note by the Blackwell property that each strategy j 2 o¤ers at least as high gross value as the strategy for which it is su¢ cient, so that overall, G( ; A; j ) K ( j ) G( ; A; j ) KS ( j ) > G( ; A; ( ;A) ) K ( ( ;A) ): To complete the proof it is su¢ cient to show that, K ( ) = K ( ); on 2 : With this we derive the contradiction that, G( ; A; j ) K ( j ) > G( ; A; ( ;A) ) K ( ( ;A) ); in contradiction to the assumption that K; formed a rational inattention representation of (D; q). To establish that K ( ) = K ( ) on 2 , note that we know already that KS ( ) = K ( ) on 2 . If this did not extend to K ( ), then we would be able to identify a mixture strategy 2 su¢ cient for ( ;A) with strictly lower expected costs, K ( ) < K ( ). This see that this is not possible, note …rst from Lemma 1 that all strategies that are consistent with ( ; A) and q( ; A) are su¢ cient for ( ;A) . Weak monotonicity of K on then implies that the cost K ( ) of any mixture strategy su¢ cient for ( ;A) satis…es K ( ) K ( ), as required. The …nal and most trivial stage of the proof is to ensure that normalization holds. We …rst normalize each function K ( ) for those 2 for which ( ; A) 2 D for some A 2 F. In such cases we note that I 2 S , so that KS (I ) 2 R according to the rule immediately above. If we renormalize this function by subtracting KS (I ) from the cost function for all attention strategies associated with this prior then we impact no margin of choice and do not interfere with mixture feasibility, weak monotonicity, or whether or not we have a rational inattention representation. Hence we can avoid pointless complication by assuming that K (I ) = 0 from the outset so that 39 this normalization is vacuous. We have now fully-speci…ed a cost function K : ! R for all 2 such that ( ; A) 2 D some A 2 F. All that remains to complete our de…nition of the candidate cost function K is to expand the domain to include all inattentive strategies for the irrelevant priors 2 for which there is no corresponding ( ; A) 2 D. We de…ne = I in such cases and set and K (I ) = 0. Note that this single element domain is trivially closed under garbling and under mixtures. With this, we de…ne the candidate cost function K : ! R [ 1 by patching together the set of prior-based cost functions as de…ned above, K ( ) if 2 1 if 2 = : K( ; ) = Note that weak monotonicity implies that the function is non-negative on its entire domain. It is immediate that K 2 K, since K( ; ) = 1 for 2 = and all domains for 2 contain the corresponding inattentive strategy I on which K( ; ) is real-valued. It is also immediate that K satis…es K3, since K (I ) = 0 by construction. It also satis…es K1 and K2, and represents a rational inattention representation, completing the proof. Lemma 2 If = J X 1 any garbling j j 2 SJ with J 2 N; of , there exist garblings j j of = 1 2 J X j with j > 0 all j, and j J j=1 2 , then for such that, j ; j=1 Proof. By assumption, there exists a j ( )j j ( )j matrix B with for all k 2 ( ), X k bik m ( i ): m( ) = i2 Since = J X j j, we know that ( j) P k bik = 1 all i and such that, ( ) ( ). Now de…ne compressed matrix B j as the unique 1 submatrix of B obtained by …rst deleting all rows corresponding to posteriors i 2 ( )n ( j ), and then deleting all columns corresponding to posteriors k such that bik = 0 all i 2 ( )n ( j ). De…ne j 2 to be the strategy that has as its support the set of all posteriors that are possible given the garbling j of j , ( j) = f k 2 ( )jbik > 0 some i 2 ( j )g; and in which state dependent probabilities of all posteriors are generated by the compressed matrix Bj , X j k ( ) = bik jm ( i ); m i2 ( j) for all k 2 ( j ). j j 2 , Note by construction that each attention strategy P ik is a garbling iof the corresponding j j since each B is itself a garbing matrix for which k b = 1 for all 2 ( ). It remains only to 40 verify that = J X j j. This follows since, j=1 m( k )= X i2 b ik m( i X )= i2 ( ) Lemma 3 Given 2 b ik J X j i j m( ) = j=1 ( ) J X j j=1 , there exists J 2 N; 2 SJ 1, X i2 ( bik jm ( i ) = J X j k j m ( ): j=1 j) j and elements 2 S with = K ( )= j KS ( j J X j KS ( j) j j j=1 such that, J X J X ): j=1 Proof. By de…nition K ( ) is the in…mum of over all lists j J j=1 j=1 = J X j j. 2 S such that We now consider a sequence of such lists, indicating the order in this sequence j=1 j (n) J(n) , j=1 in parentheses, J(n) with = X j (n) j (n) such that in all cases there are corresponding weights (n) 2 S J(n) 1 and that achieve a value that is heading in the limit to the in…mum, j=1 J(n) lim n !1 X j (n)KS ( j (n)) = K ( ). j=1 A …rst issue that we wish to avoid is limitless growth in the cardinality J(n). The …rst key observation is that, by Charateodory’s theorem, we can reduce the number of strictly positive J (n) X j (n) to have cardinality J (n) weights in a convex combination = M + 1. We j (n) j=1 J (n) con…rm now that we can do this without raising the corresponding costs, X j (n)KS ( j (n)). j=1 Suppose that there is some integer n such that the original set of attention strategies has strictly higher cardinality J(n) > M + 1. Suppose further that the …rst selection of J 1 (n) M +1 such posteriors for which there exists a strictly positive probability weights 1j (n) such that = J 1 (n) X 1 j (n) j (n) has higher such costs (note WLOG that we are treating these as the …rst J 1 (n) j=1 attention strategies in the original list). It is convenient to de…ne so that we can express this inequality in the simplest terms, J(n) X 1 j (n) = 0 for J 1 (n)+1 J(n) 1 j j (n)KS ( (n)) > X j=1 j=1 41 j (n)KS ( j (n): j J(n) 1 This inequality sets up an iteration. We …rst take the smallest scalar 1 1 j (n) = j (n): J 1 (n) That such a scalar exists follows from the fact that 2 (0; 1) such that, X J(n) 1 j (n) = j=1 X j (n) = 1, with all components j=1 in both sums strictly positive and with J(n) > J 1 (n). We now de…ne a second set of probability weights 2j (n), 1 1 (n) j (n) j 2 : j (n) = 1 1 J(n) for 1 j J(n). Note that these weights have the property that = X 2 j (n) j (n) and that, j=1 J(n) J(n) X j 2 j (n)KS ( (n)) = X j=1 j=1 " 1 1 (n) j 1 j (n) 1 # J(n) KS ( j (n)) < X j (n)KS ( j (n): j=1 By construction, note that we have reduced the number of strictly positive weights 2j (n) by at least one to J(n) 1 or less. Iterating the process establishes that indeed there exists a set of no more than M + 1 posteriors such that a mixture produces that …rst strategy and in which this mixture has no higher weighted average costs than the original strategy. Given this, there is no loss of generality in assuming that J(n) M + 1 in our original sequence. With this bound on cardinality, we know that we can …nd a subsequence of attention strategies j (n) which all have precisely the same cardinality J(n) = J M + 1 all n. Going further, we j (n) 1 . First, we can select can impose properties on all of the J corresponding sequences n=1 subsequences in which the ranges of all corresponding attention functions have the same cardinality independent of n, ( j (n)) = K j for 1 j J. Note we can do this because, for all j and n, the number of posteriors in the attention strategy j (n) is bounded above by the number of posteriors in the strategy , which is …nite k K j and With this, we can index the possible posteriors jk (n) 2 ( j (n)) in order, 1 then select further subsequences in which these posteriors themselves converge to limit posteriors, jk (L) = lim n!1 jk (n) 2 : which is possible because the sequence of posteriors lives in a compact set, and so have a convergent subsequence. We ensure also that both the associated state dependent probabilities themselves and the weights J(n) X j (n) converge, = j (n) in the expression j (n) j=1 lim n!1 m lim jk n!1 (n) = jk m (L); j (n) = j (L): 42 Again, this is possible because the state dependent probabilities and weights live in a compact set. The …nal selection of a subsequence ensures that, given 1 j J, each j (n) 2 S has its j value de…ned by precisely the same minimal attention strategy 2 as the least expensive among those that were su¢ cient for it and hence whose cost it was assigned in function KS . Technically, for each 1 j J, KS ( j (n)) = K ( j ); for 1 n 1: this is possible because the data set and hence the number of minimal attention strategies is …nite. We …rst use these limit properties to construct a list of limit attention strategies j (L) 2 S J X j for 1 with = j J. Strategy j (L) has range, j j=1 ( j j (L)) = [K k=1 jk (L); with state dependent probabilities, j (L) jk Note that the construction ensures that = jk m (L): (L) = m J X j (L). j (L) To complete the proof we must j=1 establish only that, K ( )= J X j j (L)KS ( (L)): j=1 We know from the construction that, for each n; J X j (n)KS ( j (n)) = J X j (n)K ( j ): j=1 j=1 Hence the result is established provided only, KS ( j (L)) K ( j ); which is true provided j being su¢ cient for all j (n) implies that j is su¢ cient for the corresponding limit vector j (L). That this is so follows by de…ning B j (L) = [bik (L)]j to be the limit of any subsequence of the j ( j )j K j stochastic matrices B j (n) = [bik (n)]j which have the de…ning property of su¢ ciency, X j i (n) m ( jk (n)) = [bik (n)]j m ( ); i2 ( j) for all jk (n) 2 ( j (n)) and 1 m M . It is immediate that the equality holds up in the limit, establishing that indeed j is su¢ cient for each corresponding limit vector j (L), con…rming …nally that KS ( j (L)) K ( j ) and with it establishing the Lemma. 43