Simple example of pandas in action
Football crowds get excited when their team is on the attack only to have it break down and be awarded a corner. Sure you are still in possession, but statistically you are very unlikely to score. Something like a 3% chance. How many goals are scored with a Messi style free quick? And how long to we have to wait whilst the Ref footers about with shaving foam only to see the ball disappear into Row Z.
Football data is hard to get. Most data is proprietary and has to be paid for. Alan Morrison on (celticbynumbers.ipage.com) collects all his own data on Celtic by going through each match. I presume he has his own videos. His work is excellent and I thoroughly recommend checking out his site. The site (Modern fitba) have the benefit of ORTEC data. Again, well worth a visit.
I found a site: (http://football-data.co.uk) which collects data and gives it for free as a CSV or excel formats. They have an excel file which can be downloaded to analyse their data (for betting purposes) but what’s the fun in that? This is a python article so let’s use Pandas! Click on any links surrounded in () for further information and instruction!
The aim of this little article is to pick a team from the main SCO file, extract all the Home and Away games of said team, and write this data to a new excel file. Click on links that are enclosed in () for more explanation.
Pandas creates a Data Frame from the excel file which mirrors the Row and Column structure of an excel file. Our first step is to import the pandas modules. I switch between the spyder IDE supplied with (anaconda) and the (VSCode) IDE from Microsoft on a whim as both have different features I like and dislike! I recommend installing anaconda first as it gives access to hundreds of Python packages and modules and then VSCode setup in the Conda environment..
I have imported os for later use. Pandas is imported as pd to save time typing and to write an excel file, we import (ExcelWriter )from pandas. the first thing to do is to create a variable (df) which is a data Frame of the excel file. I should say spreadsheet file, since I use Libre Calc!
The variable, labels, is another use of pandas. It creates a variable which holds the column ‘labels’ of our DF by referencing [df.columns].
Most of the work done in this tutorial is done by python functions. they allow us to write blocks of code which can be easily re-used and are self-encapsulating. We want to get the data for just one Team from the original excel file. First of all we have to know which teams are in the file and get the name of the team we are interested in.
The two functions are intertwined, so to speak.
get_team_name() calls get_teams() to give us a list of teams to choose from.
get_teams() loops through all the df[‘HomeTeam’] column using the (range) and len functions, and adds an entry to the (set) type variable, teams. (Lines 43-44). get_team_name() calls this function and assigns it to the temporary variable x.
We then just loop through the variable x, using a (for loop) and print the Teams to the screen.
get_team_name() uses a (while loop) that allows the users to make an arse of selecting his team before he finally gets the spelling right.
The next function is :
This is a bit messy for my liking, but the data is split into Home and Away. Therefore, our Team’s away games are structured in a different way. Away games Away Goals are actually our Team’s goals etc. However, we are selecting two sets of data based on our Team being Home or Away and merging them into one Data Frame using (pd.concat). The TeamStats variable is then simply sorted by using (pd.sort_values) by date.
The temp variables select data from the main Data Frame based on our team selection and are then concatenated into one Data Frame, TeamStats, to give the frame we need.
The function, create_excel_file(), asks the user if he actually wants to create an output file for the data. If he answers anything but (‘Y’ or ‘y’), it finishes.
If he answers (‘Y’ or ‘y’) the the input file extention, ‘.xlsx’, being 5 characters long is used by write_output_file() to remove the last 5 characters from the input_file variable and add the Team name and the extension ‘.xlsx’. output_file = input_file[:-5]+Team + ‘.xlsx’. This just removes the last 5 chars from input_file and concatenates the ‘Team’ name and file extension. The Sheet Name in the file would be the name of your team.
An output file example name would then be SCOFinalRangers.xlsx.
An example of this file would be:
You can then footer about with spreadsheet formulas to your heart’s content – or use Python for more analysis!
I mentioned that we had imported (os) at the start of our programme. This module allows many Operating Sysytem commands to be run from within python. here we can use a simple ls in Linux or DIR in DOS to give the user a reminder of what file he has in his Home Directory to choose to analyse.
This little function uses (os.listdir) to list the directory and gives the user a hint as to what files are available.
I’ve covered a few topics in this article. I hope you have enjoyed it and I am sure that the code can be much improved by user comments! Please subscribe and follow for more.
You can find the code for this on my github. Click Me!