Join time-series data with relational data for deeper analysis

TimescaleDB is packaged as a PostgreSQL extension. As such, TimescaleDB is PostgreSQL with super-powers. You can do anything in TimescaleDB that you can in PostgreSQL, including joining tables and combining data for further analysis.

The Mile-High advantage

In professional football, it’s a well-understood phenomenon that teams that visit Denver’s Mile-High stadium are at a disadvantage because unlike the home team (the Denver Broncos), they are not accustomed to playing in high altitude.

Earlier we ingested stadium data. Now we can run a query to see the performance of players when they are playing at Mile High Stadium.

Like many of the queries in our analysis section, for this example you use the relational nature of this data. You join the tracking, player, and game tables to compare the average acceleration and yards run of individual players when they are performing in stadiums outside of Denver versus when they are playing within Denver. The columns avg_acc_den and avg_yards_den represent the acceleration and yard statistics while in Denver.

  1. WITH stat_vals AS (
  2. -- This table collects the summed yard and avg acceleration data of a player during one game
  3. SELECT a.player_id, displayname, SUM(yards) AS yards, AVG(acc) AS acc, team, gameid
  4. FROM player_yards_by_game a
  5. LEFT JOIN player p ON a.player_id = p.player_id
  6. GROUP BY a.player_id, displayname, gameid, team
  7. ), team_data AS (
  8. -- This table gets us the team information so that we can filter on teams
  9. SELECT a.player_id, acc, yards, a.gameid,
  10. CASE
  11. WHEN a.team = 'away' THEN g.visitor_team
  12. WHEN a.team = 'home' THEN g.home_team
  13. ELSE NULL
  14. END AS team_name,
  15. g.home_team
  16. FROM stat_vals a
  17. LEFT JOIN game g ON a.gameid = g.game_id
  18. ), avg_stats AS (
  19. -- This table takes the avg acceleration and yards run for players when they are not in denver
  20. -- and then when they are in denver
  21. SELECT p.player_id, p.displayname,
  22. AVG(acc) FILTER (WHERE team_name != 'DEN' AND home_team !='DEN') AS avg_acc,
  23. AVG(acc) FILTER (WHERE team_name != 'DEN' AND home_team = 'DEN') AS avg_acc_den,
  24. AVG(yards) FILTER (WHERE team_name != 'DEN' AND home_team !='DEN') AS avg_yards,
  25. AVG(yards) FILTER (WHERE team_name != 'DEN' AND home_team = 'DEN') AS avg_yards_den,
  26. COUNT(gameid) FILTER (WHERE team_name != 'DEN' AND home_team !='DEN') AS games,
  27. COUNT(gameid) FILTER (WHERE team_name != 'DEN' AND home_team ='DEN') AS games_den
  28. FROM team_data t
  29. LEFT JOIN player p ON t.player_id = p.player_id
  30. GROUP BY p.player_id, p.displayname
  31. )
  32. SELECT * FROM avg_stats
  33. WHERE avg_acc IS NOT NULL AND avg_acc_den IS NOT NULL
  34. ORDER BY avg_acc DESC, avg_acc_den DESC

You should see this:

player_iddisplaynameavg_accavg_acc_denavg_yardsavg_yards_dengamesgames_den
2552597Breshad Perriman2.322.06408.79461.0891
2560988Antonio Callaway2.272.30778.91741.06151
2556214Tyreek Hill2.272.191007.311004.26141
2558194Josh Reynolds2.26]2.40]527.80]529.16151
2543498Brandin Cooks2.262.26975.61875.90151

You can see that generally, it appears many players may have worse acceleration and average number of yards run per game while playing in Denver. However, it is good to note that you only have one sample point showing Denver averages which effects statistical significance.

Grass vs. turf, the eternal (football) question

Players often say they “feel” faster on artificial turf. How much faster are they in reality?

Using this query you join the tracking, stadium_info, game, and player tables, to extract the average acceleration that a player has while using turf verses grass. The column avg_acc_turf represents the players average acceleration while using artificial turf, and avg_acc_grass represents their average acceleration while on grass.

  1. WITH acceleration AS (
  2. -- This table gets the average acceleration for a player over each game
  3. SELECT a.player_id, avg(acc) AS acc, a.gameid, a.team
  4. FROM player_yards_by_game a
  5. GROUP BY a.player_id, a.gameid, a.team
  6. ), team_data AS (
  7. -- This table gets us the surface information so that we can filter on turf type
  8. SELECT a.player_id, acc, g.game_id, si."location", si.surface
  9. FROM acceleration a
  10. LEFT JOIN game g ON a.gameid = g.game_id
  11. LEFT JOIN stadium_info si on g.home_team = si.team_abbreviation
  12. ), avg_stats AS (
  13. -- This table takes the avg acceleration and yards run for players when they are not in denver
  14. -- and then when they are in denver
  15. SELECT p.player_id, p.displayname,
  16. AVG(acc) FILTER (WHERE surface LIKE '%Turf%') AS avg_acc_turf,
  17. AVG(acc) FILTER (WHERE surface NOT LIKE '%Turf%') AS avg_acc_grass
  18. FROM team_data t
  19. LEFT JOIN player p ON t.player_id = p.player_id
  20. GROUP BY p.player_id, p.displayname
  21. )
  22. SELECT * FROM avg_stats
  23. WHERE avg_acc_turf IS NOT NULL AND avg_acc_grass IS NOT NULL AND player_id IS NOT NULL
  24. ORDER BY avg_acc_turf DESC, avg_acc_grass DESC

You should see this:

player_iddisplaynameavg_acc_turfavg_acc_grass
2559066Gehrig Dieter3.252.29
2507948Frank Zombo2.932.30
2555173Eric Murray2.781.86
2552374Ameer Abdullah2.762.48
2552408Darren Waller2.692.83

For many players, it appears that they are indeed faster on artificial turf. This ‘feeling’ of increased speed may in fact be grounded in reality.

We’re going to overtime!

Sometimes it’s helpful to visualize time-series data in order to fully understand how a system is performing. The NFL dataset includes play-by-play data. It is possible to visualize this information so that you can see how a play unfolds.