World Cup T20 Cricket – Pakistan vs. India – How Can Data Science Help?
Have you ever heard of Moneyball? Or of Nate Silver, who runs FiveThirtyEight? I was wondering whether I could do for cricket something similar to what Tim Chartier has done for baseball. How can data science help us understand upcoming series, matches, opposing teams, players, grounds, home versus foreign crowds and grounds, the weather and its impact and, most importantly, which players we should pick against a particular team to increase our chances of winning?
I went to sleep last night thinking about all these mind-boggling questions and woke up in the middle of the night determined to answer at least a few. Thanks to the ESPNcricinfo API and R packages like cricketr and yorkr, all the data is readily available. Many thanks also to my wonderful team at PredictifyMe for helping me dig into the data and visualize it for this article.
Like every true Pakistani, my heart hopes, and foolishly wants to believe, that Pakistan will and should win against India. Sadly, data science tells a completely different story. India is six times more likely to win today’s match. Whichever team bats first will have twice the probability of winning, and if it scores more than 200, its chance of winning goes up to almost 93%. If the weather (with a 30% chance of rain) forces the match to be cancelled, both teams will get one point and India will go out of the tournament, as they have already lost their first match against New Zealand.
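One way to read a statement such as "six times more likely" is as odds of 6:1, which can be turned into a win probability. The sketch below assumes that reading and assumes the match produces a winner (no tie and no washout); the function name is ours, for illustration only.

```python
def odds_to_probability(odds_ratio: float) -> float:
    """Convert 'X times more likely' odds (X:1) into a win probability."""
    if odds_ratio < 0:
        raise ValueError("odds ratio must be non-negative")
    return odds_ratio / (1.0 + odds_ratio)


# Assumed reading: India at 6:1, the side batting first at 2:1.
india_win = odds_to_probability(6)
bat_first_win = odds_to_probability(2)
print(f"India: {india_win:.1%}, batting first: {bat_first_win:.1%}")
```

Under these assumptions, 6:1 odds correspond to a probability of 6/7 and 2:1 odds to 2/3.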
Pakistan has played 96 T20 matches so far and won 57 (59.4%). Pakistan has played seven T20 matches against India and won only one (14.3%). Although Pakistan has played four ODI matches at Eden Gardens in Kolkata and won all four, this is the first time Pakistan is playing a T20 against India at this ground. Figure 1 shows the scores of all seven T20 encounters between India and Pakistan. Pakistan’s only win came on December 25, 2012.
Figure 1: T20 History – Pakistan Vs. India
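The win percentages above are simply wins divided by matches played. A minimal sketch of that arithmetic, using the match counts quoted in the text (the helper name is ours):

```python
def win_percentage(wins: int, matches: int) -> float:
    """Return wins as a percentage of matches played."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    return 100.0 * wins / matches


overall = win_percentage(57, 96)   # all T20 matches played by Pakistan
vs_india = win_percentage(1, 7)    # T20 matches against India only
print(f"Overall: {overall:.1f}%  vs India: {vs_india:.1f}%")
```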
The squads are as follows (players marked with an asterisk have never played against the opponent in a T20):
Pakistan: Shahid Afridi (c), Ahmed Shehzad, Anwar Ali*, Imad Wasim*, Khalid Latif*, Mohammad Amir, Mohammad Hafeez, Mohammad Irfan, Mohammad Nawaz, Sarfraz Ahmed (wk), Mohammad Sami, Sharjeel Khan, Shoaib Malik, Umar Akmal, and Wahab Riaz
India: MS Dhoni (c/wk), Ravichandran Ashwin, Jasprit Bumrah, Shikhar Dhawan, Harbhajan Singh, Ravindra Jadeja, Virat Kohli, Mohammed Shami, Pawan Negi*, Ashish Nehra, Hardik Pandya, Ajinkya Rahane, Suresh Raina, Rohit Sharma, and Yuvraj Singh
Strike rate matters in the short T20 format. Figure 2 shows the strike rates of key players in both teams. Collectively, Pakistan has a better strike rate than India. Mohammad Hafeez and Sharjeel Khan have better strike rates than Rohit Sharma and MS Dhoni.
Figure 2: Strike Rate by batsmen
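For readers new to the metric, a batting strike rate is runs scored per 100 balls faced. The sketch below shows the standard formula with made-up figures; the player names and numbers are purely illustrative, and pooling runs and balls for a team-level rate is our assumption, not necessarily how the figure was produced.

```python
from dataclasses import dataclass


@dataclass
class BattingRecord:
    name: str
    runs: int
    balls_faced: int


def strike_rate(runs: int, balls_faced: int) -> float:
    """Runs scored per 100 balls faced (0.0 if no balls were faced)."""
    if balls_faced == 0:
        return 0.0
    return 100.0 * runs / balls_faced


def team_strike_rate(records: list) -> float:
    """Pooled strike rate: total runs over total balls, not a mean of rates."""
    total_runs = sum(r.runs for r in records)
    total_balls = sum(r.balls_faced for r in records)
    return strike_rate(total_runs, total_balls)


# Hypothetical sample data, not taken from the article's dataset.
sample = [
    BattingRecord("Batsman A", 120, 100),
    BattingRecord("Batsman B", 45, 30),
]
for rec in sample:
    print(rec.name, round(strike_rate(rec.runs, rec.balls_faced), 1))
print("Team", round(team_strike_rate(sample), 1))
```

Pooling totals weights each batsman by the balls he actually faced, so a short cameo does not distort the team figure.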
When it comes to the number of 4s and 6s, however, no one matches Kohli and Yuvraj. Figure 3 presents the chart.
Figure 3: 4s and 6s Strikes
As for 50s, Hafeez has scored two against India, while Shoaib Malik, Kohli and Yuvraj have scored one each.
Figure 4 presents the bowlers’ economy rates against the opponent. India has a far better bowling lineup.
Figure 4: IND vs. PAK Bowling Economy Rate
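Economy rate is the average number of runs a bowler concedes per over. The sketch below shows the usual calculation, including the cricket convention that "3.4" overs means three overs and four balls; the helper names and the example figures are ours.

```python
def overs_to_balls(overs: str) -> int:
    """Convert cricket overs notation, e.g. '3.4' (3 overs, 4 balls), to balls."""
    whole, _, part = overs.partition(".")
    extra_balls = int(part) if part else 0
    if not 0 <= extra_balls < 6:
        raise ValueError("the part after the dot must be between 0 and 5 balls")
    return int(whole) * 6 + extra_balls


def economy_rate(runs_conceded: int, balls_bowled: int) -> float:
    """Runs conceded per six-ball over."""
    if balls_bowled <= 0:
        raise ValueError("balls_bowled must be positive")
    return 6.0 * runs_conceded / balls_bowled


# Hypothetical spell: 28 runs conceded in 4 overs.
print(economy_rate(28, overs_to_balls("4.0")))
```

A lower economy rate is better for the bowling side, which is the opposite direction from strike rate for batsmen.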
Figure 5 presents the bowlers’ T20 performance in terms of runs and wickets. Since India has the better bowling performance, they might opt to field first if they win the toss.
Figure 5: Runs vs. Wickets
The following four figures show the runs and strike rates of all key players from both teams.
Figure 6: Team Pakistan Performance on Runs
Figure 7: Team Pakistan Striking Capability
Figure 8: Team India Performance on Runs
Figure 9: Team India Striking Capability
Figures 10 and 11 show the players’ performance by the batting position they are sent in at. Both Afridi and Dhoni perform best when sent in at position six.
Figure 10: Pakistani Players’ Performance on Position
Figure 11: Indian Players’ Performance on Position
The following charts show the number of runs and the strike rate for key players from both teams:
Figure 12: Pakistan – Individuals’ Performances on Runs
Figure 13: Pakistan – Individuals’ Performances on Strike Rate
Figure 14: India – Individuals’ Performances on Runs
Figure 15: India – Individuals’ Performances on Strike Rate
The next four charts present the bowlers’ performances from both teams on wickets taken and economy rate:
Figure 16: Pakistan – Individuals’ Performances on Wickets
Figure 17: Pakistan – Individuals’ Performances on Economy Rate
Figure 18: India – Individuals’ Performances on Wickets
Figure 19: India – Individuals’ Performances on Economy Rate
Going beyond exploratory data analysis, we wanted a scoring method to quantify each team’s strength and hence its chances of winning the match. The highest-performing players set the base (highest) score, and everyone else is scored relative to them. We scored players on the percentage of runs from boundaries, strike rate, average score across all matches, opponent score (performance against India or Pakistan), present form (performance in the last year), home versus foreign ground score, frequency of wins and losses, dismissal score (how long a player can stay at the crease), position score (performance by batting position), and innings score (performance in a given innings). For example, Imad has the highest score in the Pakistan team, while Dhoni has the best score. We were also able to compute each player’s performance against each bowler from the other side, and we included the difference in performance between the first and last 10 overs.
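To make the relative-scoring idea concrete, here is a minimal sketch: on each metric the best performer gets 1.0 and everyone else is scored as a fraction of that, and the per-metric scores are then combined. The equal weighting, the function names and the sample numbers are all assumptions for illustration; the actual weights and data behind the article's scores are not given here.

```python
from __future__ import annotations


def relative_scores(values: dict[str, float]) -> dict[str, float]:
    """Score each player relative to the best performer (best = 1.0)."""
    best = max(values.values())
    if best <= 0:
        raise ValueError("the best value must be positive")
    return {player: value / best for player, value in values.items()}


def composite_scores(
    metrics: dict[str, dict[str, float]],
    weights: dict[str, float] | None = None,
) -> dict[str, float]:
    """Weighted average of relative scores across metrics (equal weights by default)."""
    if weights is None:
        weights = {name: 1.0 for name in metrics}
    total_weight = sum(weights[name] for name in metrics)
    combined: dict[str, float] = {}
    for name, values in metrics.items():
        for player, rel in relative_scores(values).items():
            combined[player] = combined.get(player, 0.0) + weights[name] * rel
    return {player: score / total_weight for player, score in combined.items()}


# Hypothetical players and numbers, for illustration only.
metrics = {
    "strike_rate": {"Player A": 150.0, "Player B": 120.0},
    "average": {"Player A": 20.0, "Player B": 40.0},
}
scores = composite_scores(metrics)
print(scores)
```

This sketch treats every metric as "higher is better"; a metric such as economy rate, where lower is better, would need to be inverted before scoring.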
Based on the data we have seen and analyzed, we predict that India is six times more likely to win. We don’t expect either team to bowl the other out. We project a combined total of 350 to 390 runs for the day across both teams. We also expect at least one 50 from each side, and we predict sixteen or more 4s and eight or more 6s in the day. After running this short exercise, we believe that data science can help this sport tremendously, and that both the BCCI and the PCB should take advantage of the technologies and data readily available today to make far better and smarter decisions for the collective good.
Having said this, our hearts and prayers still go out to Team Green. As a nation we believe in miracles, so all the best, dear team; let us all prove that national vigor, so many heartfelt prayers and the Boom Boom magic can beat any science, data science and the like included. Fingers crossed! (That would be an anomaly, by the way, but we love anomalies. Bring it on, please.)
Reproduced with permission from LinkedIn.