On Probability, Cost and Risk

Events in life unfold as a chain of action and consequence. While many actions are predictable, sometimes even controllable, many of the contributing factors are unpredictable, uncontrollable, or potentially unknowable. In such situations, the rational person can only maintain an uncertain outlook on the consequence. That uncertainty leaves room for eventualities that were previously unaccounted for, and it can be represented as a probability over possible outcomes.

[ 07/5/2023 - Bob] In this particular conversation, I think we are talking about an everyday approach to how one perceives their experiences as they happen. One could think of it as not just uncertainty, but an open-minded and informed mindset as events unfold.

Aside: The astute reader might realize that the “possible outcomes” can only ever include the known outcomes. To avoid further complications in defining the uncertainty of outcomes, we will include a non-zero probability for “other outcome”. This provides sufficient strain relief for the forthcoming arguments. [ 07/5/2023 - Bob] Your choice of “strain relief” reminds me of the many mechanical analogies we use during our running discussions that seem to work well when talking about behavior. Here ends my aside to an aside :D.

With probabilities assigned to outcomes, one does not impose predictions on the future (not predictions in the logical sense, but in the sense of a fortune teller). One merely sets up a landscape of expectations before observing the event. In other words, one’s worldview, experience, and perhaps wishful thinking are encoded within a set of possibilities. Once we assign a relative probability to each of these possibilities, we have a probability distribution over possible outcomes. Such a probability distribution is referred to as the prior distribution.

In Bayesian philosophy, the next step is to observe the event and use observations to update this probability distribution. Without digressing into the mechanics of how such an update is made, let’s move on to discuss this updated probability distribution. This distribution is traditionally referred to as the posterior distribution.

The key difference between the prior distribution and the posterior distribution is that the posterior accounts for the event having occurred. Once the posterior probability is available, we have an updated worldview based on the observation of this event. However, one instance of the event may not be sufficient evidence to increase certainty or confidence in any of the outcomes. It is prudent to observe several occurrences of the event and keep updating the probability distribution. The beauty of the Bayesian update is that the posterior calculated from the first observation now serves as the prior for subsequent observations. In this way, we can iteratively refine the probability distribution with each observation.
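The iteration described above can be sketched in a few lines of code. This is a minimal illustration, not anything from the text: the two hypotheses about the die and their likelihoods are made-up numbers chosen only to show the posterior of one observation becoming the prior of the next.

```python
from fractions import Fraction

def bayes_update(prior, likelihood, observation):
    """One Bayesian update: weight each hypothesis by how likely it
    makes the observation, then renormalize so the weights sum to 1."""
    unnorm = {h: p * likelihood(h, observation) for h, p in prior.items()}
    total = sum(unnorm.values())
    return {h: w / total for h, w in unnorm.items()}

# Two illustrative hypotheses about a six-sided die (numbers invented):
#   "fair"   -> every face has probability 1/6
#   "loaded" -> face 3 shows half the time, the other faces 1/10 each
def likelihood(hypothesis, face):
    if hypothesis == "fair":
        return Fraction(1, 6)
    return Fraction(1, 2) if face == 3 else Fraction(1, 10)

belief = {"fair": Fraction(1, 2), "loaded": Fraction(1, 2)}  # the prior

# Each posterior serves as the prior for the next observation.
for face in [3, 3, 5, 3]:
    belief = bayes_update(belief, likelihood, face)
```

After a run of mostly threes, the belief has shifted heavily toward the loaded hypothesis, yet the single 5 kept the fair hypothesis alive at a small but non-zero probability.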

On the choice of prior

The Bayesian update combines observations with the prior probability distribution to compute a posterior distribution. The prior distribution therefore has a sizable impact on the posterior. For example, if one chose a prior for the roll of a single die that assigns probability 1 to the face coming up as 3, and 0 to all other numbers, then no amount of observation is going to update the distribution to allow non-zero probabilities for “not-three” outcomes. In other words, if the observer only expects 3 to show up (perhaps the observer firmly believes every face on the die has exactly 3 pips), then seeing any other number is such a total surprise that they cannot account for the observation.
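The degenerate prior above can be made concrete. In this sketch (the hypotheses and their names are invented for illustration), each hypothesis says “the die always lands on face f”; a prior that puts zero mass everywhere except face 3 cannot explain a 5, and the update itself breaks down:

```python
def bayes_update(prior, likelihoods, observation):
    """One discrete Bayes update. Note that a hypothesis with prior 0
    stays at 0 forever: it is multiplied by zero whatever the evidence."""
    unnorm = {h: prior[h] * likelihoods[h](observation) for h in prior}
    total = sum(unnorm.values())
    if total == 0:
        # No hypothesis with non-zero prior can explain this observation:
        # the normalizing constant is 0 and the posterior is undefined.
        raise ValueError("no hypothesis can account for this observation")
    return {h: w / total for h, w in unnorm.items()}

# Hypothetical setup: hypothesis f means "the die always shows face f".
likelihoods = {f: (lambda obs, f=f: 1.0 if obs == f else 0.0)
               for f in range(1, 7)}

# The observer from the text, certain that every face has 3 pips:
certain_prior = {f: (1.0 if f == 3 else 0.0) for f in range(1, 7)}

# Observing a 5 is a "total surprise" the update cannot absorb.
try:
    bayes_update(certain_prior, likelihoods, 5)
except ValueError as surprise:
    print(surprise)
```

With any prior that spreads even a little probability over the other faces, the same observation is handled gracefully; the mass simply flows to the hypotheses that can explain it.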

Indeed, in the real world humans are equipped to learn quickly (although not always) from such surprises: the observer in the example above will henceforth choose a more appropriate prior for future rolls of the die, perhaps one with non-zero probabilities for numbers other than 3. The caution applies not just to the extreme choice (assigning prior probabilities of 0) but also to highly biased priors. Shifting a strongly biased prior typically requires far more observations than starting from an unbiased one. In a world where data is abundant, this may seem like an absurd concern, but when decisions are made using posterior distributions, the number of observations needed for a decision may directly correlate with the time available (cost!) to make that decision.
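The cost of a biased prior can be shown with a small experiment. All numbers here are made up for illustration: a “loaded” hypothesis that favors face 3, a fixed sequence of fair-looking rolls, and two starting priors, one unbiased and one heavily tilted toward the loaded hypothesis. We count how many observations each prior needs before it concedes the die is fair.

```python
def bayes_update(belief, face):
    """Multiply each hypothesis by the likelihood of the observed face,
    then renormalize."""
    likelihood = {
        "fair":   1 / 6,
        "loaded": 0.9 if face == 3 else 0.02,  # illustrative loaded die
    }
    unnorm = {h: belief[h] * likelihood[h] for h in belief}
    total = sum(unnorm.values())
    return {h: w / total for h, w in unnorm.items()}

def rolls_until_fair_wins(prior, rolls):
    """Observations needed before P(fair) exceeds 0.5 -- a stand-in for
    the time (cost!) a decision has to wait on evidence."""
    belief = dict(prior)
    for n, face in enumerate(rolls, start=1):
        belief = bayes_update(belief, face)
        if belief["fair"] > 0.5:
            return n
    return None

rolls = [1, 2, 3, 4, 5, 6] * 5  # the die is in fact behaving fairly

n_unbiased = rolls_until_fair_wins({"fair": 0.5, "loaded": 0.5}, rolls)
n_biased = rolls_until_fair_wins({"fair": 0.001, "loaded": 0.999}, rolls)
```

The unbiased prior concedes after a single non-three roll, while the biased prior needs several more observations of the same evidence before crossing the same threshold; that gap is the cost the paragraph above warns about.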

[ 07/5/2023 - Bob ] Interesting, I bet we often fall into this trap of only adding the unexpected outcome we just witnessed to our posterior (which becomes the next prior), when the opportunity is there to catch yourself and re-examine the prior framework even more. Like, “Wait, this has 6 sides and I just went from 1 possible outcome to 2. What am I missing?”

Decision Making

If observing events and learning from them were all we were interested in doing, the Bayesian update would be a batteries-included framework for the job. However, in most real-world scenarios, observations and prior information are used to make decisions. In this context, a decision refers to a specific course of action taken to benefit the observer. An important point to note here is that this decision is influenced by the posterior but not entirely defined by it. Continuing with our example of rolling a die, the probability updates occur with each observed roll, but perhaps the decision to be made is a bet on what the next roll will be.

[ 07/5/2023 - Bob] I wonder how the framework changes between observing a “random” event and playing an active roll (pun intended) in an outcome.
