Joshua Wynn and Sean McPhillips - 24/10/2018
Introduction
In August, Lowes Financial Management surveyed a selection of clients to gather data on their perceptions of what different phrases indicate in terms of probability.
The survey included ten phrases in no logical order, so as to have minimal influence on the results; respondents were asked to mark two points on a line numbered from 0% - 100% to specify the range which they considered each phrase represented in terms of probability. Below is the example we provided:
To encourage participation and as thanks for taking part, we promised to give £10 to one of five charities selected by each participant, as well as entering each participant into a prize draw with the chance to win a £200 Amazon voucher. The closing date for entries was 28th September and the two most supported charities were Dementia UK and Macmillan Cancer Support. The winner of the prize draw was Mr. Millington from Lancashire.
Context and Research Rationale
Figure 1
In 1964, Sherman Kent, an American academic in the employ of the CIA, published a study much along the same lines as our own, only with a sample size of 23 NATO military intelligence officers.[1] Kent asked his participants what single percentage figure each phrase implied. The results table from the 1964 study can be seen in Figure 1. The black dots represent individual results and the grey rectangles are Kent’s proposed ranges according to the findings.
For some time, structured product providers have been publishing both back-tested and implied probability of outcomes for their various product offerings and Lowes sought to establish whether it could realistically and accurately translate these numerical statistics into appropriate words and phrases. We set out to create an up-to-date study of probability, with a much larger sample size from our client base. Our ambition for the end of the project was to be able to construct a system of defining the likely outcomes of the investments on our website, based on what the consensus of the site’s own users specified each phrase meant.
[1] Sherman Kent, ‘Words of Estimative Probability’ (1964).
Our own study was to differ from its predecessor in some significant ways. Firstly, it would be undertaken in a different century, a full fifty-four years later, which some would say throws into question the value of contrasting results between the two. But chance and predictability have been a perennial line of human inquiry. Indeed, many of those who subscribe to the Efficient Market Hypothesis could probably find solace in Sophocles: ‘What should a mortal man fear, for whom the decrees of Fortune are supreme, and who has clear foresight of nothing? It is best to live at random, as one may.’ The future is ever uncertain until it is behind us. On the other hand, if there were no investors willing to cast the die, taking instead to quivering and lamenting the anarchic forces at work in the world, we would not have the booming modern economy that sustains the prosperity of millions, and all other progress would once again move at medieval speed. Fifty-four years does not seem like such a long time when considered in the wider historic context.
Secondly, where Kent asked for a single percentage and extrapolated a range from those results, we would do the opposite, requesting that respondents mark a range from which we could take an average and then a median point. We thought this a significant improvement on the original method, hypothesizing that when people think about the probability of an event occurring, they do so with a degree of flexibility. For example, something might be extremely likely to happen, so the upper end of the range might be 100%, but what would the lowest percentage be that can still encompass ‘extremely likely’?
Finally, the 2018 survey was to assess fewer phrases (10), in comparison to Kent’s 17, although those selected would still represent a broad range of probabilities. The phrases we decided to drop were those which we thought – and Kent’s results reinforced this assessment – were already covered by others and were therefore inessential. Examples include ‘Probable’, ‘Probably’, ‘Improbable’ and ‘Probably Not’.
Results
By the closing date on 28th September, 118 surveys had been returned to us; of those, five were discounted due to misunderstanding of instructions or illegibility.
Figure 2 is a table of the average percentages provided by respondents. For each phrase it shows the Average Minimum Probability and the Average Maximum Probability, calculated by taking the mean of the minimum marks and the mean of the maximum marks respectively. The Average Probability was calculated by taking the mean of all the results (both maximum and minimum) for each phrase.
Figure 3 shows the results in Figure 2 as a bar chart, with the X axis listing the phrases and the Y axis representing the average probability. Each phrase has three vertical bars: Average Minimum Probability; Average Probability (Mean of Minimum and Maximum); Average Maximum Probability.
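The three averages described above can be illustrated with a short Python sketch. The response data here is purely hypothetical (the individual survey returns are not reproduced in this report), and the function name `summarise` is our own choice for illustration.

```python
from statistics import mean

# Hypothetical (minimum %, maximum %) marks from four respondents for one phrase.
# These are illustrative values only, not actual survey returns.
responses = [(80, 95), (75, 100), (85, 99), (90, 100)]


def summarise(ranges):
    """Return the three averages reported for a phrase."""
    minimums = [low for low, _ in ranges]
    maximums = [high for _, high in ranges]
    return {
        "average_minimum": mean(minimums),
        # Mean of every mark, minimum and maximum alike.
        "average_probability": mean(minimums + maximums),
        "average_maximum": mean(maximums),
    }


summary = summarise(responses)
print(summary)
```

When every respondent supplies both a minimum and a maximum, the mean of all marks is the same as the midpoint of the average minimum and the average maximum.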
Table 1
| Phrase | Average Minimum Probability | Average Probability (Mean of Minimum and Maximum) | Average Maximum Probability |
| --- | --- | --- | --- |
| Almost Certainly | 81% | 89% | 96% |
| Highly Likely | 73% | 83% | 92% |
| Very Good Chance | 65% | 77% | 88% |
| We Believe | 57% | 74% | 91% |
| Likely | 60% | 71% | 82% |
| Doubtful | 17% | 28% | 38% |
| Unlikely | 15% | 25% | 35% |
| Highly Unlikely | 12% | 20% | 28% |
| Little Chance | 9% | 19% | 28% |
| Almost no Chance | 7% | 13% | 18% |
Figure 3
Analysis
The most immediately striking aspect of the results is the clear polarization we can observe between positive and negative terminology in all cases. A stark contrast exists between the two centermost phrases, ‘Likely’ (71% avg.) and ‘Doubtful’ (28% avg.): a difference of 43 percentage points. As previously stated, the phrases were arranged in no logical order, to prevent results that were the product of design bias. However, it is important to note the absence of a phrase indicating a balance of probability, unlike Kent’s ‘About Even’, and the drop seems more dramatic because of it. Even so, we might reasonably expect that ‘Likely’ and ‘Unlikely’ would be roughly equidistant from 50%, but this is clearly not the case, with the former being treated with an optimism (71%) disproportionate to the latter’s pessimism (25%).
The average of all the negative terms was 21%; the average of the positives was 79%, which coincidentally places each group 21 percentage points from its end of the scale. We should not read too much into this, though, as not every positive phrase had a parallel negative counterpart (e.g. ‘Very Likely’ and ‘Very Unlikely’), and vice versa.
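As a quick check of these two group averages, the sketch below recomputes them from the rounded Average Probability figures published in the results table. Because the inputs are already rounded, the positive mean comes out at 78.8%, which rounds to the 79% quoted.

```python
from statistics import mean

# Average Probability (%) per phrase, taken from the results table.
positive = {
    "Almost Certainly": 89,
    "Highly Likely": 83,
    "Very Good Chance": 77,
    "We Believe": 74,
    "Likely": 71,
}
negative = {
    "Doubtful": 28,
    "Unlikely": 25,
    "Highly Unlikely": 20,
    "Little Chance": 19,
    "Almost no Chance": 13,
}

positive_mean = mean(positive.values())
negative_mean = mean(negative.values())

print(f"Positive phrases: {positive_mean:.1f}%")
print(f"Negative phrases: {negative_mean:.1f}%")
print(f"Distance from 100%: {100 - positive_mean:.1f} points")
```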
Whilst the results of Kent’s study were affected by polarisation by negativity and reluctance to commit to positivity (look again at Figure 1), it would seem that the much larger sample size of the 2018 study (roughly five times that of the original) and the use of averaging in presenting the results have gone some way to diminishing the effect of outliers. The results in Kent’s graph seem far more arbitrary, often exhibiting little correlation, so the justification for the ranges he proposed (the grey rectangles) as the percentage range specified by each phrase is debatable.
Generally, it was those phrases with a meaning closest to neutrality in probabilistic terms which had the most dispersed ranges. This is not surprising as, arguably, they imply more subjectivity than phrases such as ‘About Even’, ‘Almost Certainly’ and ‘Almost no Chance’, which is exactly what Sherman Kent concluded in his own work. None was more varied in response, however, than ‘We Believe’; in this case, many respondents designated very wide ranges, with one as broad as 0%-100%. Whilst averaging was able to provide a mean of 74%, the Average Minimum Probability (57%) and Average Maximum Probability (91%) of ‘We Believe’ had the largest spread between them by over 10 percentage points; a difference of 34 percentage points made it the least consistently defined phrase of the survey.
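The spread comparison can be reproduced from the published averages. The following sketch takes the Average Minimum and Average Maximum figures from the results table, computes the spread for each phrase, and compares the widest with the runner-up.

```python
# (Average Minimum %, Average Maximum %) per phrase, from the results table.
table = {
    "Almost Certainly": (81, 96),
    "Highly Likely": (73, 92),
    "Very Good Chance": (65, 88),
    "We Believe": (57, 91),
    "Likely": (60, 82),
    "Doubtful": (17, 38),
    "Unlikely": (15, 35),
    "Highly Unlikely": (12, 28),
    "Little Chance": (9, 28),
    "Almost no Chance": (7, 18),
}

# Spread in percentage points between the average maximum and average minimum.
spreads = {phrase: high - low for phrase, (low, high) in table.items()}

# Rank phrases from widest to narrowest spread.
ranked = sorted(spreads.items(), key=lambda item: item[1], reverse=True)
widest, runner_up = ranked[0], ranked[1]
gap = widest[1] - runner_up[1]

print(f"Widest: {widest[0]} ({widest[1]} points)")
print(f"Next widest: {runner_up[0]} ({runner_up[1]} points)")
print(f"Gap: {gap} points")
```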
Figure 6
Figure 7
Figure 8
Conclusion
Whilst the study has provided valuable results based upon a larger sample size than Kent’s original and the collection of ranges rather than single percentages, it also has some shortcomings. For example, the effect of respondents’ personal circumstances, and indeed their demography, has not been considered. Overall, though, it has been a useful process.
We have, however, found it difficult to translate the results into a scale that we believe is appropriate to use to describe outcomes. To explain, let us look again at a selection of the results:
Table 2
| Phrase | Average Minimum Probability | Average Probability (Mean of Minimum and Maximum) | Average Maximum Probability |
| --- | --- | --- | --- |
| Almost Certainly | 81% | 89% | 96% |
| Highly Likely | 73% | 83% | 92% |
| Very Good Chance | 65% | 77% | 88% |
| Unlikely | 15% | 25% | 35% |
| Highly Unlikely | 12% | 20% | 28% |
| Little Chance | 9% | 19% | 28% |
| Almost no Chance | 7% | 13% | 18% |
As can be seen, the broad range of responses demonstrates that individual interpretation of each phrase plays so large a part that the resulting ranges overlap heavily and offer no clean logical order, yet it would be wrong to shift the definitions to more logical positions, thereby forcing consensus.
If we were to use such a table in financial services to explain probability of outcomes, we might be inclined to skew the scale to accommodate the most pessimistic perceptions. We could therefore take the average minimum probability to describe outcomes. So, ‘Almost Certainly’ would be 81% rather than the mean of 89%, and ‘Almost no Chance’ would be 7% and below as opposed to the average of 13%, giving an indication of probability that succeeds only in being as pessimistic, and therefore as safe, as possible.
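To make the idea concrete, here is a minimal sketch of how such a pessimistic scale might be applied. It is one possible reading rather than a method we adopted: the function name `describe_conservatively` is hypothetical, and treating each positive phrase's average minimum as a floor and each negative phrase's average minimum as a ceiling is an assumption extended from the two examples given above.

```python
# Average Minimum Probability (%) per phrase, from the selection of results above.
# Positive phrases are ordered strongest first; a probability must reach the floor.
POSITIVE_FLOORS = [
    (81, "Almost Certainly"),
    (73, "Highly Likely"),
    (65, "Very Good Chance"),
]
# Negative phrases are ordered strongest first; a probability must not exceed the ceiling.
NEGATIVE_CEILINGS = [
    (7, "Almost no Chance"),
    (9, "Little Chance"),
    (12, "Highly Unlikely"),
    (15, "Unlikely"),
]


def describe_conservatively(probability_pct):
    """Return the strongest phrase the probability qualifies for, or None."""
    for floor, phrase in POSITIVE_FLOORS:
        if probability_pct >= floor:
            return phrase
    for ceiling, phrase in NEGATIVE_CEILINGS:
        if probability_pct <= ceiling:
            return phrase
    # Assumption: probabilities between the two groups receive no phrase at all.
    return None


for value in (89, 80, 50, 13, 7):
    print(value, describe_conservatively(value))
```

Under this reading, an outcome with an 80% probability would be described only as ‘Highly Likely’, and a large middle band of probabilities would have no phrase at all, which illustrates how cautious the resulting scale would be.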
One of our respondents kindly pointed to an accepted probability scale defined by the Intergovernmental Panel on Climate Change:
Table 3: IPCC Probability Scale
| Verbal Expression | Chance (per cent) | Chance (Fraction) |
| --- | --- | --- |
| Virtually Certain | More than 99% chance that the result is true | ≥ 99 out of 100 |
| Very Likely | 90-99% chance that the result is true | ≥ 9 out of 10 and ≤ 99 out of 100 |
| Likely | 66-90% chance that the result is true | ≥ 2 out of 3 and ≤ 9 out of 10 |
| Medium Likelihood | 33-66% chance that the result is true | Between 1 and 2 out of 3 |
| Unlikely | 10-33% chance that the result is true | ≤ 1 out of 3 and ≥ 1 out of 10 |
| Very Unlikely | 1-10% chance that the result is true | ≤ 1 out of 10 and ≥ 1 out of 100 |
| Exceptionally Unlikely | Less than 1% chance the result is true | ≤ 1 out of 100 |
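A defined scale of this kind is straightforward to apply mechanically, as the sketch below suggests. The band edges in the table overlap (90%, for instance, appears in two rows), so the code assumes that a boundary value belongs to the higher band, except at 99%, where the table says "more than 99%" for the top band. The function name `ipcc_phrase` is hypothetical.

```python
# Lower bound (%) of each band, highest first, using the percentages in the table.
# Assumption: a value on a shared boundary is assigned to the higher band.
IPCC_BANDS = [
    (90, "Very Likely"),
    (66, "Likely"),
    (33, "Medium Likelihood"),
    (10, "Unlikely"),
    (1, "Very Unlikely"),
]


def ipcc_phrase(probability_pct):
    """Map a percentage (0-100) to the verbal expression in the table."""
    if not 0 <= probability_pct <= 100:
        raise ValueError("probability must be between 0 and 100")
    if probability_pct > 99:
        return "Virtually Certain"
    for lower_bound, phrase in IPCC_BANDS:
        if probability_pct >= lower_bound:
            return phrase
    return "Exceptionally Unlikely"


# Survey averages for three phrases, looked up on the IPCC scale.
for label, value in (("Almost Certainly", 89), ("Likely", 71), ("Unlikely", 25)):
    print(f"Survey '{label}' ({value}%) -> IPCC '{ipcc_phrase(value)}'")
```

On these assumptions, the 89% that our respondents attached on average to ‘Almost Certainly’ would only qualify as ‘Likely’ on the IPCC scale, which shows how far a defined scale can sit from everyday interpretation.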
As an internationally accepted scale, this would seem to be a useful tool if we were to use a probability scale in explaining risk and reward but, again, it has to be acknowledged that the interpretation of each phrase on the scale would differ between people and with individual circumstances.
Final Thoughts
Upon deep consideration, we have concluded that it would be best not to refer to such a scale when describing the risks involved when investing. There are three main reasons.
One commentator on Kent’s study, Richards J. Heuer Jr., asserted that the main issue with perceptions of probability ‘was not a major difference of opinion, but the ambiguity of the term probable.’ To illustrate, imagine two people, one colourblind and one not, both stating what is to them ‘almost certainly’ a true description of the colour red. Clearly there would be disagreement, yet in this case neither party is necessarily wrong; rather, we have a complication because of the situation in which the language is used.
Secondly, and looking to Figure 6, it must be accepted that ‘We Believe’ is not an indicator of probability; it only admits the acceptance of an opinion that an outcome may occur. But shouldn’t we consider that to be the case for all qualitative measures of probability? Even the phrase ‘Almost Certainly’ admits the possibility that the outcome in question might not happen. All probabilistic phrases are subjective to an extent, some more so than others, because probability is essentially the product of chaotic forces. Kent concluded in 1964:
'What we consciously or subconsciously seek is an expression
which conveys a definite meaning but at the same time either
absolves us completely of the responsibility or makes the estimate
at enough removes from ourselves as not to implicate us.'
Just so, Figures 7 and 8 show that more than 20% of respondents did not regard the two most obviously positive and negative phrases in our survey without some caution, with 21% selecting minimum percentages at or below 80% for ‘Almost Certainly’ and 23% marking maximum percentages above 20% for ‘Almost No Chance’.
The final reason is this: context is everything. To use an analogy, if you were given a scratch card and told that the chance of not winning a prize was ‘highly unlikely’ (which equates to one-in-five, according to our respondents), you would, like most people, take those odds. However, if you were passed a five-round revolver with one bullet in the cylinder, even though the chance of lethal injury is still ‘highly unlikely’, the effect of a negative outcome is amplified to the point that one-in-five does not seem quite so friendly a probability. In the same way, the loss of a one-thousand-pound investment would be viewed differently by an average earner and a millionaire.
With all this in mind, we feel that it could arguably be misleading to attempt a quantification of what is a subjective matter for investment purposes, since each phrase represents a different thing to every individual and changes from situation to situation.
By most probability assessments, almost every structured investment offered on the market today has a ‘very good chance’ of maturing with a gain and is ‘highly unlikely’ to mature with a loss. Such statements can be backed up as ‘clear, fair and not misleading’, but we are concerned that they would raise expectations to the extent that, if the unlikely outcome transpired, an investor making a loss could seek to argue they were misled. That is clearly a defensible position, but one that is avoidable.
There is no doubt that this has been an extremely interesting exercise and has provided valuable results. However, it has perhaps understandably proven to be too subjective a matter to be utilised for investment purposes without referring to a defined scale in every instance. As we have established, any such scale could be controversial as it would not be universally accepted, so we will not be introducing such a scale in the short term.