Scratch paper

November 20, 2009

This is me brainstorming. It’s probably not going to make much sense.

So. I’m able to use attitude/belief/method, and I have reason to believe that this approach will work in arbitrary environments. In some cases, due to properties of the game (zero-sum, identical payouts, non-interacting actions) learning can’t take place, but in all cases attitude is either irrelevant (zero-sum, identical payouts) or learnable (non-interacting actions – in which case belief and method aren’t learnable).

Now, the question is, given this ability to learn, how do you use it. The environment we’re trying to deal with is extremely complex. It is, essentially, an indefinitely repeated sequence of normal-form games.

Given this efficient learner, it is tempting to use it to predict opponent actions, and then respond to those actions in whatever manner you feel is most appropriate. This doesn’t work because it’s too short-sighted. The goal is not simply to optimize your action in this particular case, but also to alter the state of the opponent so as to allow you to achieve your objectives in future interactions.

The obvious problem here is that you don’t know the internal state of the opponent, or how it will be affected by your actions, or how it will affect their actions. Given the amount of possible strategies, and the fact that you don’t have the opportunity to repeat observations, there is no way to learn arbitrary strategies (obviously). So the question is, what assumptions are reasonable to make in this situation?

Rationality immediately leaps to mind. Unfortunately, the Folk theorem combined with an indefinite number of repetitions implies that any course of action at all can be rational, depending on the beliefs they hold about their opponent’s behavior.

I will begin with the assumption that an agents actions in each individual game are rational, in that they attempt to maximize some quantity in the presence of an opponent attempting to maximize another quantity, but the quantities maximized are not necessarily the agents own payoff. Specifically, I will assume that an agent will attempt to maximize a linear combination of their own payoff and their opponent’s payoff.

Note that this does not completely solve the problem, because there may be multiple Nash equilibria. In some senses, the Nash equilibrium selection problem is intractable (indeed, the failure of game theory to provide a means of distinguishing a single equilibrium provides support for that proposition). The simplest way to deal with that problem is to assume that the opponent uses a fixed method to select an equilibrium, and attempt to learn that method. There are some vulnerabilities (opponent deliberately selecting favorable equilibria) and some inaccuracies (opponent will also be learning you in self-play), but this is the approach I will use for now.

So – to recap: assumptions so far are that-
1 – problems of Nash equilibria selection are to be dealt with by learning, and adopting the equilibria selection method used by the opponent. (not strictly accurate in a co-learning situation, but will still result in convergence).
2 – each game will be dealt with by both players as an isolated rational decision in which each player is attempting to maximize a quantity linearly dependent on the payoffs of the agents.

The question of how to behave under these assumptions is still open, but I’ve written enough for now.

Gift Exchange

November 2, 2009

Consider a game of iterated Prisoner’s Dilemma . Reciprocation is a good model of cooperation in this environment. It’s used in the Tit-for-Tat strategy, it’s a very intuitive approach for people, and it has a number of nice properties. However, it’s not the only equilibrium available in the iterated game. Using tradition payoffs (5,3,1,0), any equilibrium in which both players achieve a payoff higher than 1 is an equilibrium of the iterated game. One example of a different equilibrium would be the inefficient equilibrium, where players take turns defecting while the other cooperates. This is Pareto-dominated by mutual cooperation, but there are also non-Pareto-dominated equlibria different from mutual cooperation. For example, an equilibrium where one player cooperates all the time, and the other cooperates half the time gives one player an average payoff of 4 and the other one an average of 1.5. The symmetry of the Prisoner’s Dilemma game tends to direct one to the focal point of mutual cooperation, but you can’t always count on symmetry.

Consider a situation of gift exchange. Each player begins with an initial endowment of 1 point. Each player can choose (secretly and simultaneously) to sacrifice a fraction of their initial endowment to provide a benefit to the other equal to the amount they sacrifice times their gift-giving efficiency. This produces a situation similar to Prisoner’s Dilemma. For example, if each players efficiency is 2, then you have a game of Prisoner’s Dilemma with payoffs (3,2,1,0) (with the modification that players can play a fractional strategy, and receive their expected payoffs.

With equal efficiencies, it is clear that for any efficiency above 1, the only symmetric Pareto optimal equilibrium is for both players to sacrifice their full endowment. Unequal equilibria are still possible though (just like in Prisoner’s Dilemma). For example, with efficiency c, full sacrifice by 1 player, and sacrifice of 1/c + e by the other is an equilibrium paying (1+ce,1+c-e). The interesting situation in the gift exchange game comes when the gift giving efficiencies are not equal.

If the product of they players efficiencies is less than 1, then no exchange is profitable – it’s better for each player to simply keep their initial endowment. If the product is exactly 1, exchange is possible, but pointless – any equilibrium which includes some exchange will give payouts of 1 to each player, and will be risk-dominated by the equilibrium involving no exchange. When the product of the efficiencies is greater than 1, things become interesting.

Fairness in exchange is a nice principal, but it’s loosely defined. First let’s consider those equilibria which provide the greatest social good. If both agents have an efficiency greater than one, that is the equilibrium where each agent donates their entire endowment. If one agent has an efficiency less than one, the greatest social benefit arises when that agent donates it’s entire endowment and the other agent donates enough to bring it’s payout down to 1. For efficiencies a1, ab>1, this gives (b*a,1).

That equilibrium provides the greatest social benefit, but it disturbs our sense of fairness, because the agent which is providing the greatest good to the community (the efficient giver) receives no payoff at all. In the case where both agent’s efficiencies are greater than 1, the more efficient agent does receive a payoff, but not as much as the less efficient agent. In addition to upsetting our sense of fairness, this approach is a matter of concern because it provides a motivation for the more efficient agent to lie about it’s efficiency.

Consider the case where the agent’s efficiencies are 2 and 3. The more efficient agent would do better to claim an efficiency of 2, and then to provide its opponent with a payoff of 2 by only spending 2/3s of it’s endowment, giving it a payoff of 2 1/3 (assuming that agents can only observe the amount they receive, and not the amount sacrificed by the opponent).

The Nash Bargaining Solution is worth exploring in this context. First consider agents with efficiencies a and b with a>1, b>1. The Nash Bargaining Solution is found by maximizing (bd+1-c-1)*(ac+1-d-1), where c and d are the amount spent by the agents. When the effort is constrained to lie between 0 and 1, the optimal value is found by the less efficient agent applying full effort, and the more efficient agent applying (ab+1)/2a where a is the agent’s own efficiency, and b is its opponent’s. (That value may be more than 1 – if it is, both agents give their entire endowment). For example, with efficiencies (1/2,4) effort will be (1,3/8), giving payoffs (3/2,9/8). Another alternative is to insist on equal benefit to each player – this is even less efficient than the Nash Bargaining solution, but more fair. With efficiencies (1/2,4) effort will be (1,3/10) giving payoffs of 1.2. The far extreme of this spectrum is when all the reward accrues to the more efficient agent. This policy gives effort (1,1/4) for payoffs (1,5/4).

The question of how to allocate effort appropriately is very complex. Nash Bargaining solution seems good, but doesn’t elicit truthful disclosure of efficiency. Strict equality does elicit truthfulness, but can be significantly less efficient than the Nash Bargaining solution (to say nothing about the socially optimal solution). A more definitive answer will require a better description of the problem. I suspect that the fairness concerns that people exhibit are a combination of a desire to avoid being exploited (expending effort to someone who is not willing to reciprocate), and a desire to promote social welfare (which will, in general, benefit them). A more precise definition of exploitation is clearly desirable, and a better description of the environment so that opportunities for reciprocation are better defined.

Risk Aversion II

October 25, 2009

So, a standard approach to risk aversion is to decide that peoples valuation of money is not linear – instead it varies as a concave function of the amount of money. One sample function is to value money as the log of a constant plus the amount of money. To go back to the problem from the previous post, if we take your total assets as the constant, then the amount you should gamble with your friend is 1/4 of your total assets.

This seems a much more reasonable amount to bet to me, but the justification for it is weaker. There are a couple other ways you can approach this problem, which also lead to the conclusion that you should bet 1/4 of your total assets.

One approach is to treat the process of gambling with your friend as a series of multiplications to your wealth instead of a series of additions. At each step, your wealth is multiplied by either (1-x) or (1+2x) where x is the fraction of your wealth you choose to gamble. In the limit, your total wealth will be (1-x)^l*(1+2x)^w where l is the number of times you lose, and w is the number of times you win. Since you’re equally likely to win or lose, you maximize your wealth by maximizing (1-x)^(g/2)*(1+2x)^(g/2) where g is the number of games you play. g is effectively fixed, so we can maximize wealth by maximizing (1-x)*(1+2x), which also maximizes at 1/4.

Another approach is to treat the problem in a more evolutionary manner. Suppose you were in competition with another person who was also gambling with your friend. Instead of maximizing the cash you finish with personally, you wish to maximize your share of the total cash owned by you and your opponent. Note that this is not a straightforward maximization problem because your opponent is also choosing an amount to gamble. Following this policy again leads you to choose to gamble 1/4 of your total wealth.

In this example both approaches lead to gambling 1/4 of your assets, however they are not guaranteed to produce identical results. The first approach can only be used in situations which you can treat as a multiplicative process, which doesn’t always apply. The second option will give different results in any situation where there is a positive chance of losing all your money (the logarithmic approach will pay anything to avoid that chance, but the evolutionary approach will not).

The first approach seems specialized to the current example, but the second approach is more flexible. In the next post I will discuss some implications of the second approach.

Risk Aversion

October 24, 2009

Suppose you had an incredibly rich, incredibly stupid friend. Your friend is so stupid that he will give 2 to 1 odds that a fair coin will come up heads. Your friend is so rich that that no matter how much he gambles, he never runs out of money. He will cheerfully gamble with you, as much as you like – his only shortcoming as a friend is that he will not give you credit.

So the question arises – how much should you bet? Obviously, gambling with him has a positive expected value. The more money you bet, the more you can expect to win. Following a strict maximization principle would lead you to bet all your money every time you get a chance to gamble with him. This holds even if you take into account your opportunity to gamble multiple times.

Betting all your money maximizes your expected rate of return, however it will inevitably lead to your bankruptcy. Since your friend doesn’t run out of money, he will cheerfully keep flipping coins with you until one comes up heads. Since your friend will not give you credit, at that point you can’t bet anymore – you will be stuck with no money.

The odd thing about this situation is that maximizing your expected money leads you to gambling all of it. This holds whether you consider the problem as a single gamble, or as any finite sequence of gambles. But following this policy almost certainly leads to bankruptcy. The question is – what principle is more valid than expectation maximization in this situation, and what decision does it lead you to? I will discuss that in a subsequent post – I’ve spent too much time writing this one already.

Poker Ideas

October 14, 2009

I want to consider the question of automated poker play. When I consider this problem my instinctive approach is to attempt to put a probability distribution over your opponents hand, based on betting history and the cards on the table. In order to do this, you need to be able to quickly evaluate the strength of all hands given the cards on the table.

This is fairly complicated once all the cards are down, but it’s even worse when there are still cards to come. For any pair of hands and set of cards on the table, you can calculate a probability of victory for each hand, but it is prohibitively expensive.

This probability-of-victory relationship is not necessarily a nice relationship. I don’t have an example in mind, but a brief search on the web reveals that, pre-flop, ACKD > JS10S > 4S4C > ACKD.
This makes calculating the strength of a hand problematic.

I propose the following approach. Assume that each hand has a real-valued strength. The odds of one hand beating another can be found using the logistic function of the difference of their strength values.
(1/(1+e^-x) – btw, is there a math formatting function for wordpress?). Now, we already know that these assumptions are false from the set of non-transitive hands, but they may provide a reasonable approximation. The question of how to calculate hand strengths then arises.

First, note that we will not be able to calculate a set of strength values which is fully consistent with the actual probabilities of victory for the various hands. This is good, because it stops us from the outset attempting to find a perfect value. I propose to approximate this value by conducting a number of trials between pairs of hands, and then finding the set of strength values which gives the highest likelihood of the observed outcome of the trials. Conducting the trials shouldn’t be too computationally expensive, because hands only need to be evaluated for a specific set of flop, turn, and river. It will require further exploration to determine how many trials are needed to provide a good sense of the strength of the various hands. I’m not sure of a perfect way to optimize the strength values, but I believe I have some Matlab code somewhere that does something like that. It’s probably worth coding this.

Damned Statistics

October 7, 2009

This is just a brief thought about an example that I ran across.  It was in a book about Newcomb’s paradox, and it goes like this:  There is a set of kings, each king chooses to be either strict, or lenient.  Each king either has a long reign, or is deposed. The number of observed kings follows these counts:

      Strict Lenient
Short  24     24
Long   26     26

From this, it would seem that kings are strict or lenient equally often, and that this does not have any effect on the length of their reigns. However, if you add a measurement classifying each king as type A or type B, and classify the data correctly, you can get these results:

A/B   Strict Lenient
Short  16/8  2/22
Long   24/2  8/18

These values make it seem that if you’re a type A king, you can improve your odds by choosing leniency (from 60% to 80%) and if you’re a type B king, you can improve your odds by choosing leniency (from 20% to 45%). This kind of operation can be done with any pair of apparently independent variables. In theory, given enough data, you can find a variable with the right properties just by sheer chance. I’m not sure what lesson should be drawn from this.

Newcomb’s Paradox

October 6, 2009

I’ve been reading a book which relates Newcomb’s paradox with the Prisoner’s Dilemma, and I am becoming more and more disillusioned with that comparison.  So I have decided to outline my objections to Newcomb’s Paradox.


Newcomb’s Paradox goes like this:  There are 2 boxes in front of you, one open, and one shut.  You have the option to take both boxes, or only the closed box.  Before you arrived, a superintelligent psychologist put 1000$ in the open box, and may have put 1000000$ in the closed box.  They only put 1000000$ in the closed box if they predict that you are going to only take the closed box – if they predict you are going to take both boxes, they leave the closed box empty.  You have observed them making predictions before, and seen that they were always accurate.  The question is, should you take both boxes, or should you just take the closed box?


The argument in favor of taking both boxes is that it cannot possibly hurt to take the extra box.  By the time you arrive, the psychologist has already put the money in the box, or he hasn’t.  In either case, the amount of money in both boxes is guaranteed to be higher than the amount of money in just the closed box.


On the other hand, since you believe that the psychologist is a perfect predictor, you believe that there are only 2 possible outcomes – you take both boxes, and received 1000$, and you take the closed box, and receive 1000000$.  Of those outcomes, you obviously prefer the second, so you should only take the closed box.


I think that the problem here is that it is assumed that you are able to make a choice.  If there is some entity which can accurately predict your actions, how can you be said to be able to make a choice at all?  When you drop a rock, we don’t say it chooses to fall down – it can only fall down.  In the presence of an entity which can accurately predict your actions, I don’t think it’s useful to say that you have free will at all.  It’s misleading to ask what you would choose, because obviously you wouldn’t choose anything at all – you would simply take one box or both, depending on some deterministic process which the psychologist can model with perfect accuracy.


Now, I actually believe that there’s nothing wrong with the perspective that people obey deterministic processes.  In principle, I would agree that it is theoretically possible to predict the behavior of someone to that degree of accuracy (although I think that it is also practically impossible).  I would say that free will is a useful concept used to model the behavior of entities which cannot be accurately modeled.  If it’s useful to speak of the behavior of an entity in terms of things like goals and intentions, then that entity is said to have free will.  Of course, this means that a single entity may or may not have free will, depending on your point of view 🙂  (practically speaking, I don’t think this comes up, because anything sufficiently simple for people to model in that degree of detail is too simple to think).

Hello world!

October 6, 2009

I find it difficult to write about things – especially ideas that I’m trying to get across.  I find it stressful to write for publication or for other people to read (may it’s associate with fear of public speaking?).  Anyway, I have created this blog to have a place to write things which are more formal than notes that I make for myself, but less formal than something that will be submitted for publication.  I may or may not read the comments, or depending on the level of noise, I may just turn them off.