Cricket's Birthday Problem

 

by Arunabha Sengupta

It was one of those quarter of a million IPL matches, and a chance look at a graphic on a screen led a member of a cricket group to make the observation: “Sanju Samson and Robin Uthappa are both born on 11 November. Wonder how many times that has happened in cricket. (Of course we can ignore the Mark and Steve matches)”
What he meant was: how common is it for two players in the same game to share a birthday? Of course, we were talking about the day and not the year. Samson was born in 1994, Uthappa in 1985.

It was pointed out that it was not really that uncommon. Given that two teams have 22 cricketers between them, the probability of there being at least one pair with a common birthday is just a shade under fifty per cent. So there is almost a 50% chance (47.57%, actually) of two players sharing a birthday in any match.

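The formula is simple enough. The probability of n people all having different birthdays is:

(365/365) × (364/365) × (363/365) × … × ((365 − n + 1)/365)

The probability of at least one shared birthday is 1 minus this.
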
The birthday curve


The graph of how this probability changes with n is given alongside.
For n = 22, the probability of at least one shared birthday is 47.57%, and for n = 23 it pushes beyond 50%.
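
As a rough check on those figures, here is a minimal Python sketch of the calculation. It assumes 365 equally likely birthdays and ignores 29 February. The function name `shared_birthday_probability` is purely illustrative.

```python
def shared_birthday_probability(n: int, days: int = 365) -> float:
    """Chance that at least two of n people share a birthday.

    Assumes every day is equally likely and ignores 29 February.
    """
    all_different = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        all_different *= (days - k) / days
    return 1.0 - all_different


# 11 players (one XI), 22 players (a full match), and the famous 23.
for n in (11, 22, 23):
    print(n, f"{shared_birthday_probability(n):.2%}")
```

Run as written, it should give a figure just under 48% for 22 players and just over 50% for 23.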

This, as always, proved to be thoroughly counterintuitive, especially the news that a group of 23 people pushes the probability of a common birthday over ½. Yes, the odds lean favourably towards getting a common pair of birthdays among 23 people.
With 365 days in a year, it is really, really hard to believe that just 23 people will send the probability of a matching birthday past 50%.


In fact we soon heard the assertion: “We can see the formula, but it seems way off. I’m sure if we go through scorecards, we will have way less matches.”
Let me point out that this statement was not made by some archetypal cricket fan who decides that ‘numbers don’t mean anything’ the moment they contradict his perception of his chosen hero/villain on the field. It was uttered by a man who very frequently uses statistics to prove his point, often quite astutely.

Which just goes to show that patterns and probability distributions of numbers are almost impossible to ascertain through intuition.

This led me to perform this experiment.
Time did not permit me to do it for all Test matches, but I picked a random sample of 25 Tests across time, taking due care not to choose more or less the same cohort of cricketers twice.
Why 25? It is generally considered a decent enough sample size to do away with the vagaries of small samples.
The matches and the instances of same DOB are provided below.

[Image: Birthday Experiment, the 25 sampled Tests and the instances of matching birthdays]

Here are the results.

Total Tests : 25
Tests with matching DOB: 12

The probability formula tells us that 47.57% of matches should feature a shared birthday. (Concussion substitutes push it slightly higher.)
The real-world experiment gives us 12/25, that is, 48%.
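
The same comparison can be run on imaginary scorecards. The sketch below is a simulation, not the scorecard study itself: it draws 22 random birthdays per match, assuming 365 equally likely days, and counts how often a match contains a repeat. The function name, the number of trials and the seed are arbitrary choices for illustration.

```python
import random


def simulate_shared_birthdays(players: int = 22, trials: int = 20_000,
                              days: int = 365, seed: int = 2024) -> float:
    """Fraction of simulated matches in which at least two players share a birthday."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(players)]
        # A repeated birthday makes the set smaller than the list.
        if len(set(birthdays)) < players:
            hits += 1
    return hits / trials


print(f"{simulate_shared_birthdays():.1%}")
```

With this many simulated matches the fraction should settle close to the value the formula gives for 22 players.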

What is more, the matching birthdays are distributed across teams as well.
Now let us consider the probability of matching birthdays within the same team, that is, among 11 players.

We have 25 Tests, which amounts to 50 XIs.
To count the XIs in which two players share a birthday, we have to ignore birthdays shared across teams. So, we accept Jadeja and Bumrah at Christchurch in 2019-20, but reject Darling and Jackson at The Oval in 1896.

Total XIs: 50
XIs with matching DOB: 7

The probability formula tells us that there should be matching birthdays in 14.11% of cases. The real-world experiment gives us 7/50 = 14%.
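
To put both comparisons side by side, the sketch below turns the formula's probabilities into expected counts for a sample of this size. It assumes 365 equally likely birthdays, and the function and variable names are illustrative only.

```python
def shared_birthday_probability(n: int, days: int = 365) -> float:
    """Chance that at least two of n people share a birthday (equally likely days)."""
    all_different = 1.0
    for k in range(n):
        all_different *= (days - k) / days
    return 1.0 - all_different


tests, xis = 25, 50  # sample sizes used in the experiment above

# Expected count = sample size x probability of a shared birthday.
expected_tests = tests * shared_birthday_probability(22)
expected_xis = xis * shared_birthday_probability(11)

print(f"Expected Tests with a shared birthday: {expected_tests:.1f} of {tests}")
print(f"Expected XIs with a shared birthday: {expected_xis:.1f} of {xis}")
```

Under these assumptions the formula expects roughly 12 of the 25 Tests and roughly 7 of the 50 XIs to contain a shared birthday, which is what the scorecards showed.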

The exercise proves quite beautifully that numbers not only make sense, they do so as elegantly as a perfect David Gower cover drive.
And for all those who think the numerically minded cannot appreciate the beauty of cricket, it is not really true. They can see beauty in numbers as much as they can see it in cricket … a fascinating delight that those who refuse to look at data are often deprived of.

And it also points out the pitfalls of intuition.
Our assessment is always dependent on our perceptions, and they are shaped by a lot of things: impressionability, media, the age at which something was experienced, likes, dislikes, mood, etc.
Hence, we tend to have all sorts of interpretations of the world that are often not quite what goes on.
Especially so in cricket.
I fully believe no one is that invested in the distribution of birthdays. So, in the face of evidence, very few people will pick one match as a counterexample and claim that it is actually very rare to have matching birthdays (anecdotal evidence).
Similarly, very few will mouth the ‘numbers don’t give the full picture’ drivel (cognitive dissonance).

However, when the same irrefutable numerical truths about cricket performances are demonstrated … perceptions and intuitions, fatally flawed in human beings, take root, the clamours of ‘numbers don’t mean anything’ are shouted from rooftops, and anecdotal evidence is furnished to the hilt.

Well … look at these facts and judge for yourself. Or rather, try to stop yourself from taking recourse to the above two ‘lines of argument’:
- Larwood averaged 35 outside the Bodyline series
- Harvey was rather ordinary against the best sides
- Both Lillee-Thomson and Hall-Griffith are hyped because of their performances against England and were ordinary against the others
- Dravid averaged 33 against SA and 38 against Aus (the two best teams of his time) across 54 Tests
- Simon Katich had a better record as an opener than Gavaskar, Boycott, Sehwag and Langer