LAEGAP Methodology
August 28, 2013
This piece details my methodology for calculating LAEGAP. If you hate math or would just like a basic explanation, head over to my original post, which also includes some of my insights into the resulting data. All data is from the 2012-13 season.
Fixing Recorder Bias
The NHL has an infamous lack of consistency in its statistics across arenas, one that is almost embarrassing compared to other professional sports leagues in North America. Hits, takeaways, giveaways, blocked shots, and shot locations are all recorded differently in different arenas. Before using the shot location data provided by the NHL, we will first attempt to fix these inconsistencies.
The NHL records shot location data on an (x, y) plane. The X axis spans from -99 to 99 and the Y axis spans from -42 to 42. A standard NHL rink is 200 feet by 85 feet, so each X unit represents roughly 1 foot (200 feet over 199 values, about 1.005 feet each) and each Y unit represents exactly 1 foot (85 feet over 85 values).
Past studies have attempted to fix the recording bias by using distance as a reference, or have simply ignored it and tried to soften it by regressing each point locally. Since shot location data is recorded as x, y values rather than as distance and angle from the net (the approach of the 1st study), measuring the bias of positive and negative x and y values works much better than measuring the bias of distance and angle from the net. We will use regression to improve our data, but we won’t rely on it to fix the recording bias.
I measured each team’s average positive and negative x, y values in away games, using no more than two games’ data recorded in the same arena. I chose to limit the number of games per arena so that division rivals with a heavy arena bias, which a team visits more often, would not skew the team average.
I then recorded each arena’s average positive and negative x, y values for its visiting teams. This is compared with the arena’s expected coordinate average, which is the average of its visitors’ team coordinates weighted by their number of visits, to arrive at each arena’s coordinate bias. Each arena’s coordinate bias is later used to adjust the data.
|Los Angeles Kings||1.47619||-1.52381||0||0.714286||-1.19223|
|New York Islanders||-4.31579||3.36842||2.47368||-1.94737||4.23246|
|Tampa Bay Lightning||1.52632||-0.736842||-4.84211||-0.105263||-2.96685|
|St. Louis Blues||-3.09091||1.31818||0.954545||-0.227273||2.14256|
|New York Rangers||-4||2||4.55||-1.7||4.62993|
|Toronto Maple Leafs||2.31579||-1.36842||-0.315789||0.0526316||-1.49276|
|Columbus Blue Jackets||-0.25||-1.6||0.5||3.7||-1.11734|
|New Jersey Devils||-1.18182||0.227273||-0.318182||1.04545||0.176491|
|Detroit Red Wings||1.2||-2.4||2.55||-0.45||0.103993|
|San Jose Sharks||3.59091||0.272727||-0.227273||0.590909||-1.83297|
As expected, Madison Square Garden (MSG), notorious for its inaccuracy, records each shot on average 4.6 feet closer to the net than it actually is. Meanwhile, Consol Energy Center records shot location data most accurately in the league, according to the aggregate average of the entire Eastern Conference and some of the Western Conference.
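The arena bias calculation described above could be sketched roughly as follows. This is only an illustration for a single coordinate category (say, positive x values): the function name, the dictionary layout, and the sign convention (observed minus expected) are assumptions, not the exact procedure used for the table.

```python
def arena_bias(team_avg, arena_visits, arena_observed):
    """Estimate each arena's coordinate bias for one coordinate category.

    team_avg:       {team: average coordinate over that team's away games}
    arena_visits:   {arena: {visiting team: number of visits}}
    arena_observed: {arena: average coordinate recorded for visiting teams}

    Returns {arena: observed - expected}; the sign convention is an assumption.
    """
    bias = {}
    for arena, visits in arena_visits.items():
        total_visits = sum(visits.values())
        # Expected average: visitors' team averages weighted by visit count.
        expected = sum(team_avg[team] * n for team, n in visits.items()) / total_visits
        bias[arena] = arena_observed[arena] - expected
    return bias


# Made-up numbers, purely for illustration.
team_avg = {"A": 60.0, "B": 64.0}
arena_visits = {"Rink": {"A": 1, "B": 3}}
arena_observed = {"Rink": 66.0}
bias = arena_bias(team_avg, arena_visits, arena_observed)
print(bias)
```

Here the expected average is (60 × 1 + 64 × 3) / 4 = 63, so the hypothetical rink records values 3 units higher than its visitors' averages would suggest.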
Location Specific Shooting Percentage
From the 634 games of data available from the NHL, every even-strength, non-empty-net shot and goal is recorded, its location is adjusted for arena bias, and the shooting percentage at each location (goals/(shots+goals)) is calculated. Every x, y value is flipped so that all events fall in the east half of the rink. Occasionally a shot taken from a player’s own half is flipped when it wasn’t supposed to be, and vice versa, but since scoring from that far out relies almost entirely on luck, I consider these few data points insignificant.
Because there is little data in low-scoring areas, the shooting percentage varies greatly from point to point at those locations. For example, at (30, 9), five feet inside the blue line, the shooting percentage is 50% (1/(1+1)). Its neighbor, (30, 11), has a shooting percentage of 0% (0/(0+1)). Clearly, the 50% shooting percentage is not sustainable. To fix this, I regressed each point with its neighbors within a radius of 5 feet of the original point, using an exponential weighting function.
w = 0.75^d, where w = weight and d = distance in feet from the original point
This means that each point has 75% of the weight of a point that is 1 foot closer to the original. Here is the heat map of the shooting percentage across one half of the rink, with red being the highest shooting percentage and blue the lowest. Missing squares indicate that no data is available at that location.
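The neighbor-weighted regression described above might look something like the sketch below. It makes several assumptions that the post does not spell out: goals and attempts are pooled with the distance weights (rather than averaging per-point percentages), one coordinate unit is treated as one foot, and the function name and data layout are hypothetical.

```python
import math


def smoothed_shooting_pct(counts, x, y, radius=5.0, decay=0.75):
    """Distance-weighted shooting percentage around (x, y).

    counts: {(x, y): (goals, attempts)} where attempts = shots + goals.
    Each point within `radius` gets weight decay ** distance, so a point
    1 foot farther away carries 75% of the weight (with decay=0.75).
    Returns None when no data falls inside the radius.
    """
    weighted_goals = 0.0
    weighted_attempts = 0.0
    for (px, py), (goals, attempts) in counts.items():
        d = math.hypot(px - x, py - y)
        if d <= radius and attempts > 0:
            w = decay ** d
            weighted_goals += w * goals
            weighted_attempts += w * attempts
    if weighted_attempts == 0:
        return None
    return weighted_goals / weighted_attempts


# The two neighboring points from the example: 1 goal on 2 attempts at
# (30, 9) and 0 goals on 1 attempt at (30, 11).
counts = {(30, 9): (1, 2), (30, 11): (0, 1)}
smoothed = smoothed_shooting_pct(counts, 30, 9)
print(smoothed)
```

With only these two points, the raw 50% at (30, 9) is pulled down toward its neighbor; with real data, every point within 5 feet would contribute.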
Expected Goals For/Against
I created Location Adjusted Expected GoAls Percentage (LAEGAP) to address Corsi’s failure to account for shot quality. Corsi itself was invented to address the small sample size of +/-. It is natural, then, that LAEGAP resembles a form of +/-, which is essentially what EGF/EGA is.
Expected Goals For (EGF) is an approximation of the number of goals scored by a player’s team while that player is on the ice, if every shot taken had the league-average shooting percentage (chance of scoring) for its location. Expected Goals Against (EGA) is an approximation of the number of goals scored by the opposing team while the player is on the ice, under the same conditions. Essentially, it is a measure of shot quantity multiplied by shot quality.
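In code, the definition above amounts to summing the location-specific shooting percentage over every on-ice shot. The following is a minimal sketch under assumed names and data layout; in particular, treating locations with no data as 0% is an assumption made only for this illustration.

```python
def expected_goals(shot_locations, pct_by_location):
    """Sum the league-average shooting percentage at each shot location.

    shot_locations:  iterable of (x, y) for on-ice shots (for or against)
    pct_by_location: {(x, y): league-average shooting percentage as a fraction}
    Locations with no data count as 0.0 (an assumption for this sketch).
    """
    return sum(pct_by_location.get(loc, 0.0) for loc in shot_locations)


# Hypothetical percentages and shots, purely for illustration.
pct_by_location = {(80, 0): 0.10, (40, 20): 0.05}
shots_for = [(80, 0)] * 100        # 100 on-ice shots for from a 10% spot
shots_against = [(40, 20)] * 100   # 100 on-ice shots against from a 5% spot

egf = expected_goals(shots_for, pct_by_location)
ega = expected_goals(shots_against, pct_by_location)
print(egf, ega)
```

Passing the shots taken by the player's team gives EGF, and passing the opponent's shots gives EGA.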
Now that we have EGF, a measure of on-ice offensive events, and EGA, a measure of on-ice defensive events (or lack thereof), we can compare the two, because, ultimately, hockey is a game you win by scoring more than your opponent. Simply subtracting EGA from EGF is not good enough, because it gives high-event players an advantage or a disadvantage depending on whether they are positive possession players. For example, if our imaginary player, John, was on the ice for one shot for at a 10% shooting percentage location and one shot against at a 5% shooting percentage location every shift, for 100 shifts, he would have an EGF of 10 and an EGA of 5, a difference of 5 goals. Now imagine a second player, Ethan, who had the same percentages every shift but played 1000 shifts. His expected goal difference (expected +/-) would be 50 goals, even though he and John performed equally, possession-wise, every time they were on the ice.
The solution is simple: calculate EGF as a percentage of total expected goal events (EGF + EGA). Now both John and Ethan have the same EG%: 66.7% (10/(10+5) = 100/(100+50)). This raises another problem. What if a third imaginary player, Jacob, logged a single shift with one shot for at 5% before suffering a season-ending pinky toe strain? He would have an EG% of 100% (0.05/(0.05+0)). Clearly that’s not sustainable once the sample size increases (more shifts), so how do we tell small-sample noise apart from actual performance?
We will use a variation of the binomial proportion confidence interval, the Wilson score interval, introduced by mathematician Edwin Bidwell Wilson in 1927. LAEGAP is the lower bound of the Wilson score interval, with n being the total EG events (EGF + EGA), p being the proportion of EGF in n, and a confidence level of 95%.
By taking the lower bound of the interval, low-event but positive possession (>0.50 EG%) players receive a penalty and, likewise, high-event but negative possession players receive an advantage, but this is preferable to the alternative.
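As a rough sketch, the lower bound can be computed with the standard Wilson score formula. The function name is hypothetical, and z = 1.96 is assumed as the critical value for 95% confidence; note that n here is a sum of expected goals rather than an integer count of trials.

```python
import math


def laegap(egf, ega, z=1.96):
    """Lower bound of the Wilson score interval for EGF / (EGF + EGA).

    z=1.96 is the usual two-sided 95% critical value (an assumption here).
    Returns 0.0 when there are no expected goal events at all.
    """
    n = egf + ega
    if n <= 0:
        return 0.0
    p = egf / n
    center = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center - margin) / (1 + z * z / n)


# The three imaginary players from the text.
john = laegap(10, 5)       # 100 shifts
ethan = laegap(100, 50)    # 1000 shifts
jacob = laegap(0.05, 0)    # a single shift
print(john, ethan, jacob)

# EGF and EGA taken from the first row of the table below.
boyle = laegap(37.9177, 23.3686)
print(boyle)
```

John and Ethan share a 66.7% EG%, but Ethan's larger sample earns him the higher score (roughly 0.59 against 0.42), while Jacob's 100% EG% collapses to about 0.01. Under these assumptions, Dan Boyle's EGF and EGA come out to roughly his listed LAEGAP.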
Below are the 30 players with the best LAEGAP:
|name||position||team||games||LAEGAP||EGF||EGA||shots for||shots against||shot diff|
|Dan Boyle||Defenseman||San Jose Sharks||41||0.493526||37.9177||23.3686||437||312||125|
|Brendan Gallagher||Right Wing||Montréal Canadiens||41||0.492882||27.3263||15.1572||311||186||125|
|David Clarkson||Right Wing||New Jersey Devils||43||0.491641||29.4998||16.9234||340||209||131|
|Jonathan Toews||Center||Chicago Blackhawks||43||0.487849||35.4267||21.9762||388||272||116|
|Jake Muzzin||Defenseman||Los Angeles Kings||39||0.485027||28.0701||16.3438||356||200||156|
|Max Pacioretty||Left Wing||Montréal Canadiens||41||0.484933||27.5553||15.9449||336||203||133|
|Marian Hossa||Right Wing||Chicago Blackhawks||36||0.475947||27.4272||16.5591||294||207||87|
|Joe Thornton||Center||San Jose Sharks||43||0.472452||30.4976||19.4209||332||248||84|
|Patrick Marleau||Left Wing||San Jose Sharks||43||0.472194||30.7604||19.6678||360||248||112|
|Brandon Saad||Left Wing||Chicago Blackhawks||42||0.468371||30.411||19.7317||343||246||97|
|Lubomir Visnovsky||Defenseman||New York Islanders||29||0.465838||27.1036||17.1213||314||205||109|
|Patrik Elias||Left Wing||New Jersey Devils||43||0.465605||27.2111||17.2328||298||210||88|
|Justin Williams||Right Wing||Los Angeles Kings||42||0.463848||29.9752||19.7833||373||241||132|
|Tyler Seguin||Center||Boston Bruins||42||0.462289||31.7253||21.4821||396||275||121|
|Zach Parise||Left Wing||Minnesota Wild||43||0.461388||31.1538||21.0652||377||281||96|
|Logan Couture||Center||San Jose Sharks||43||0.459966||29.7073||19.9178||339||254||85|
|Mark Fayne||Defenseman||New Jersey Devils||27||0.459596||17.7727||9.748||193||136||57|
|Evgeni Malkin||Center||Pittsburgh Penguins||26||0.459346||21.0059||12.4287||239||159||80|
|Mikko Koivu||Center||Minnesota Wild||43||0.456033||30.4973||21.0142||375||281||94|
|Henrik Sedin||Center||Vancouver Canucks||40||0.452746||29.5615||20.4875||350||268||82|
|Ryan Getzlaf||Center||Anaheim Ducks||38||0.450954||28.5721||19.7529||306||234||72|
|Anton Stralman||Defenseman||New York Rangers||41||0.45036||30.8343||21.9066||375||274||101|
|Marc-Edouard Vlasic||Defenseman||San Jose Sharks||43||0.447783||33.4906||24.6981||395||327||68|
|Patrice Bergeron||Center||Boston Bruins||36||0.446412||25.8487||17.658||346||228||118|
|Andy Greene||Defenseman||New Jersey Devils||43||0.445498||28.1978||19.9277||344||275||69|
|Matt Irwin||Defenseman||San Jose Sharks||35||0.444544||27.63||19.4851||337||251||86|
|Ryan McDonagh||Defenseman||New York Rangers||41||0.444263||37.3756||28.9067||475||369||106|
|Derek Stepan||Center||New York Rangers||41||0.442697||31.42||23.2883||377||267||110|
|Alexandre Burrows||Right Wing||Vancouver Canucks||40||0.441986||25.1964||17.432||298||244||54|
|P.K. Subban||Defenseman||Montréal Canadiens||40||0.441122||28.8511||20.9794||360||259||101|
|EGF||Expected Goals For|
|EGA||Expected Goals Against|
|Shots For||Number of shots on goal for player's team while player is on ice|
|Shots Against||Number of shots on goal against player's team while player is on ice|
|Shots Diff||Shot Differential = Shots For – Shots Against. Pretty much the same as Corsi, except missed and blocked shots don't count.|
|LAEGAP||Location Adjusted Expected Goals Percentage|
For the record, Crosby is 37th.