Series of functions to help clean the play-by-play data for analysis
Source: R/cfbd_pbp_data.R, R/helper_pbp_add_player_cols.R, R/helper_pbp_add_yardage.R, and 2 more
helpers_pbp.Rd
- add_play_counts(): Adds play counts to Play-by-Play data pulled from the API's raw game data.
- add_yardage(): Add yardage extracted from play text.
- add_player_cols(): Add player columns extracted from play text.
- clean_drive_dat(): Create new Drive results and id data.
- clean_pbp_dat(): Clean Play-by-Play data.
- penalty_detection(): Adds penalty columns to Play-by-Play data pulled from the API.
- prep_epa_df_after(): Creates the post-play inputs for the Expected Points model to predict on for each game.
- clean_drive_info(): Cleans CFB (D-I) Drive-By-Drive Data to create the pts_drive column.
Cleans Play-by-Play data pulled from the API's raw game data
Usage
add_play_counts(play_df)
clean_drive_dat(play_df)
prep_epa_df_after(dat)
clean_drive_info(drive_df)
add_player_cols(pbp)
add_yardage(play_df)
clean_pbp_dat(play_df)
penalty_detection(raw_df)
Arguments
- play_df
(data.frame, required) Play-by-Play data frame to be cleaned, as pulled from cfbd_pbp_data().
- dat
(data.frame, required) Clean Play-by-Play data frame pulled from cfbd_pbp_data().
- drive_df
(data.frame, required) Drive data frame pulled from the API via the cfbd_drives() function.
- raw_df
(data.frame, required) Play-by-Play data frame to be cleaned, as pulled from cfbd_pbp_data().
Value
The original play_df with the following columns appended/redefined:
- game_play_number
- half_clock.minutes
- TimeSecsRem
- Under_two
- half
- kickoff_play
- pos_team
- def_pos_team
- receives_2H_kickoff
- pos_score_diff
- lag_pos_score_diff
- lag_pos_team
- lead_pos_team
- lead_pos_team2
- pos_score_pts
- pos_score_diff_start
- score_diff
- lag_score_diff
- lag_offense_play
- lead_offense_play
- lead_offense_play2
- score_pts
- score_diff_start
- offense_receives_2H_kickoff
- half_play_number
- lag_off_timeouts
- lag_def_timeouts
- off_timeouts_rem_before
- def_timeouts_rem_before
- off_timeout_called
- def_timeout_called
- lead_TimeSecsRem
- lead_TimeSecsRem2
- lead_yards_to_goal
- lead_yards_to_goal2
- lead_down
- lead_down2
- lag_distance3
- lag_distance2
- lag_distance
- lead_distance
- lead_distance2
- end_of_half
- lag_play_type3
- lag_play_type2
- lag_play_type
- lead_play_type
- lead_play_type2
- lead_play_type3
- change_of_poss
- change_of_pos_team
- pos_team_timeouts
- def_pos_team_timeouts
- pos_team_timeouts_rem_before
- def_pos_team_timeouts_rem_before
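Many of the columns above are shifted copies of an existing column (for example lag_play_type and lead_play_type2). The following is a minimal base R sketch of that idea, not the package's implementation: lag_vec() and lead_vec() are hypothetical helpers written for this illustration, the toy data holds a single game, and real data would presumably need the shift applied within each game_id.

```r
# Hypothetical helpers (not part of the package): shift a vector by n
# positions and pad the exposed positions with NA.
lag_vec <- function(x, n = 1) c(rep(NA, n), x)[seq_along(x)]
lead_vec <- function(x, n = 1) c(x, rep(NA, n))[seq_along(x) + n]

# Toy single-game play-by-play data, already in play order.
toy_df <- data.frame(
  id_play = 1:4,
  play_type = c("Kickoff", "Rush", "Pass", "Punt"),
  stringsAsFactors = FALSE
)

# Previous play, next play, and the play two ahead.
toy_df$lag_play_type <- lag_vec(toy_df$play_type)
toy_df$lead_play_type <- lead_vec(toy_df$play_type)
toy_df$lead_play_type2 <- lead_vec(toy_df$play_type, 2)

print(toy_df)
```

The first row has no previous play and the last rows have no following play, so those cells are NA.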
The original play_df with the following columns appended/redefined:
- lag_change_of_poss
- lag_punt
- lag_scoring_play
- lag_turnover_vec
- lag_downs_turnover
- lead_play_type
- lead_play_type2
- lead_play_type3
- drive_numbers
- number_of_drives
- pts_scored
- drive_result_detailed
- drive_result_detailed_flag
- drive_result2
- lag_new_drive_pts
- lag_drive_result_detailed
- lead_drive_result_detailed
- new_drive_pts
- drive_scoring
- drive_play
- drive_play_number
- drive_event
- drive_event_number
- new_id
- log_ydstogo
- down
- distance
- yards_to_goal
- yards_gained
- Goal_To_Go
dat with the following columns appended/modified:
- turnover_indicator
- down
- new_id
- new_down
- distance
- yards_to_goal
- yards_gained
- turnover
- drive_start_yards_to_goal
- end_of_half
- new_yardline
- new_distance
- new_log_ydstogo
- new_Goal_To_Go
- new_TimeSecsRem
- new_Under_two
- first_by_penalty
- lag_first_by_penalty
- lag_first_by_penalty2
- first_by_yards
- lag_first_by_yards
- lag_first_by_yards2
- row
- new_series
- firstD_by_kickoff
- firstD_by_poss
- firstD_by_yards
- firstD_by_penalty
- yds_punted
- yds_punt_gained
- missing_yard_flag
The original drive_df with the following columns appended to it:
- drive_id: Returned as drive_id from the original variable drive_id.
- pts_drive: End result of the drive.
- scoring: Updated logical flag for whether the drive was a scoring drive.
The original pbp with the following columns appended to it:
- rusher_player_name
- receiver_player_name
- passer_player_name
- sack_player_name
- sack_player_name2
- pass_breakup_player_name
- interception_player_name
- fg_kicker_player_name
- fg_block_player_name
- fg_return_player_name
- kickoff_player_name
- kickoff_returner_player_name
- punter_player_name
- punt_block_player_name
- punt_returner_player_name
- punt_block_return_player_name
- fumble_player_name
- fumble_forced_player_name
- fumble_recovered_player_name
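To illustrate what extracting a player column from play text can look like, here is a minimal base R sketch for a rusher name. It is an assumption-laden illustration, not the package's parsing logic: the play text wording and the " run for " pattern are invented for the toy data, and only the column name rusher_player_name comes from the list above.

```r
# Toy play text; the wording is an assumption made for this sketch.
toy_text <- c(
  "Smith run for 5 yds",
  "J. Jones run for a loss of 3 yards",
  "Lee pass complete to Kim for 12 yds"
)

# Treat everything before " run for " as the rusher; leave other plays NA.
is_run <- grepl(" run for ", toy_text, fixed = TRUE)
rusher_player_name <- ifelse(
  is_run,
  sub(" run for .*$", "", toy_text),
  NA_character_
)

print(rusher_player_name)
```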
The original play_df with the following columns appended to it:
- yds_rushed
- yds_receiving
- yds_int_return
- yds_kickoff
- yds_kickoff_return
- yds_punted
- yds_fumble_return
- yds_sacked
- yds_penalty
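As a rough illustration of pulling yardage out of play text, the base R sketch below parses rushing yards from a few toy strings. The helper extract_rush_yards(), the regular expression, and the play text wording are all assumptions made for this example; the package's actual patterns may differ.

```r
# Hypothetical helper (not part of the package): parse rushing yards from
# play text, returning NA when the text is not a recognised run.
extract_rush_yards <- function(play_text) {
  pattern <- "run for (a loss of )?([0-9]+) (yd|yard)s?"
  hit <- regexpr(pattern, play_text, perl = TRUE)
  matched <- as.vector(hit) != -1

  yards <- rep(NA_real_, length(play_text))
  token <- regmatches(play_text, hit)            # matched rows only
  amount <- as.numeric(gsub("[^0-9]", "", token))
  direction <- ifelse(grepl("loss", token), -1, 1)
  yards[matched] <- direction * amount

  # "no gain" carries no number, so handle it separately.
  yards[grepl("run for no gain", play_text, fixed = TRUE)] <- 0
  yards
}

toy_text <- c(
  "Smith run for 5 yds to the State 30",
  "Jones run for a loss of 3 yards",
  "Lee run for no gain",
  "Kim kickoff for 65 yds"
)

yds_rushed <- extract_rush_yards(toy_text)
print(yds_rushed)
```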
The original play_df with the following columns appended/redefined:
- scoring_play
- td_play
- touchdown
- safety
- fumble_vec
- kickoff_play
- kickoff_tb
- kickoff_onside
- kickoff_oob
- kickoff_fair_catch
- kickoff_downed
- kick_play
- kickoff_safety
- punt
- punt_play
- punt_tb
- punt_oob
- punt_fair_catch
- punt_downed
- rush
- pass
- sack_vec
- play_type
- td_check
- id_play
- sack
- int
- int_td
- completion
- pass_attempt
- target
- pass_td
- rush_td
- turnover_vec
- offense_score_play
- defense_score_play
- downs_turnover
- scoring_play
- fg_inds
- yds_fg
- yards_to_goal
- lag_play_type3
- lag_play_type2
- lag_play_type
- lead_play_type
- lead_play_type2
- lead_play_type3
The original raw_df with the following columns appended/redefined:
- penalty_flag: TRUE/FALSE flag for penalty play types or penalty in play text plays.
- penalty_declined: TRUE/FALSE flag for 'declined' in penalty play types or penalty in play text plays.
- penalty_no_play: TRUE/FALSE flag for 'no play' in penalty play types or penalty in play text plays.
- penalty_offset: TRUE/FALSE flag for 'off-setting' in penalty play types or penalty in play text plays.
- penalty_1st_conv: TRUE/FALSE flag for 1st Down in penalty play types or penalty in play text plays.
- penalty_text: TRUE/FALSE flag for penalty in text but not a penalty play type.
- orig_play_type: Copy of the original play_type label prior to any changes by the subsequent functions.
- down: Defines kickoff downs and penalties on kickoffs and converts them from 5 (as from the API) to 1.
- play_type: Defines the play_type "Penalty (Kickoff)" for penalties on kickoffs with a repeat kick.
- half: Defines the half variable (1, 2).
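The penalty flags described above can be pictured as pattern matches on play_type and play_text. The base R sketch below shows one way such flags could be derived; it is not the package's implementation, and the toy play text, the has() helper, and the exact patterns are assumptions made for illustration.

```r
# Toy play-by-play rows with only the two columns this sketch needs.
toy_df <- data.frame(
  play_type = c("Rush", "Penalty", "Pass Reception", "Penalty"),
  play_text = c(
    "Smith run for 5 yds",
    "Penalty on State, holding, declined",
    "Jones pass complete to Lee for 12 yds, penalty on Tech offsetting",
    "False start (5 yards), no play"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive pattern test.
has <- function(pattern, x) grepl(pattern, x, ignore.case = TRUE)

is_penalty_type <- has("penalty", toy_df$play_type)
is_penalty_text <- has("penalty", toy_df$play_text)

# A play is flagged if either its type or its text mentions a penalty.
toy_df$penalty_flag <- is_penalty_type | is_penalty_text
toy_df$penalty_declined <- toy_df$penalty_flag & has("declined", toy_df$play_text)
toy_df$penalty_no_play <- toy_df$penalty_flag & has("no play", toy_df$play_text)
toy_df$penalty_offset <- toy_df$penalty_flag & has("off-?setting", toy_df$play_text)

# Penalty mentioned in the text while the play type is not a penalty type.
toy_df$penalty_text <- is_penalty_text & !is_penalty_type

print(toy_df[, c("play_type", "penalty_flag", "penalty_text")])
```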
Details
Requires the following columns to be present:
- game_id
- id_play
- clock.minutes
- clock.seconds
- half
- period
- offense_play
- defense_play
- home
- away
- offense_score
- defense_score
- offense_timeouts
- defense_timeouts
- play_text
- play_type
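Because each helper expects specific input columns, it can be useful to check a data frame before calling one. The base R sketch below does this for the column list immediately above; missing_cols() is a hypothetical helper written for this example and is not part of the package.

```r
# Hypothetical helper (not part of the package): report which required
# columns are absent from a data frame.
missing_cols <- function(df, required) {
  setdiff(required, names(df))
}

# Required columns copied from the list above.
required_cols <- c(
  "game_id", "id_play", "clock.minutes", "clock.seconds", "half", "period",
  "offense_play", "defense_play", "home", "away", "offense_score",
  "defense_score", "offense_timeouts", "defense_timeouts",
  "play_text", "play_type"
)

# Toy data frame that lacks most of the required columns.
toy_df <- data.frame(game_id = 1L, id_play = 101L, period = 1L)

absent <- missing_cols(toy_df, required_cols)
if (length(absent) > 0) {
  message("Missing columns: ", paste(absent, collapse = ", "))
}
```

The same helper can be reused with any of the other required-column lists in this section.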
Prep for EPA calculations at the end of the play. Requires the following columns be present:
- game_id
- id_play
- drive_id
- down
- distance
- period
- yards_to_goal
- play_type
Cleans CFB (D-I) Drive-By-Drive Data to create the pts_drive column. Requires the following columns be present:
- drive_id: Returned as drive_id.
- drive_result: End result of the drive.
- scoring: Logical flag for whether the drive was a scoring drive.
- game_id: Unique game identifier.
Cleans CFB (D-I) player Data to create player name columns. Requires the following columns be present:
- rush
- pass
- play_text
- play_type
- sack
- fumble_vec
Cleans CFB (D-I) Play-by-Play Data to create yardage columns. Requires the following columns be present:
- play_text
- play_type
- rush
- pass
- int
- int_td
- kickoff_play
- kickoff_tb
- kickoff_downed
- kickoff_fair_catch
- fumble_vec
- sack
- punt
- punt_tb
- punt_downed
- punt_fair_catch
- punt_oob
- punt_blocked
- penalty_detail
Requires the following columns to be present:
- game_id
- id_play
- offense_play
- defense_play
- home
- away
- play_type
- play_text
- kickoff_play
- down
- distance
- yards_gained
- yards_to_goal
- change_of_poss
- penalty_1st_conv
- off_timeouts_rem_before
- def_timeouts_rem_before
Runs penalty detection on the play text and play types. Requires the following columns be present:
- game_id: Referencing game id.
- period: Game period (quarter).
- down: Down of the play.
- play_type: Categorical play type.
- play_text: A description of the play.