
Series of functions to help clean the play-by-play data for analysis
Source: R/cfbd_pbp_data.R, R/helper_pbp_add_player_cols.R, R/helper_pbp_add_yardage.R, and 2 more
helpers_pbp.Rd
add_play_counts(): Adds play counts to Play-by-Play data pulled from the API's raw game data.
add_yardage(): Adds yardage extracted from play text.
add_player_cols(): Adds player columns extracted from play text.
clean_drive_dat(): Creates new drive result and ID data.
clean_pbp_dat(): Cleans Play-by-Play data.
penalty_detection(): Adds penalty columns to Play-by-Play data pulled from the API.
prep_epa_df_after(): Creates the post-play inputs for the Expected Points model to predict on for each game.
clean_drive_info(): Cleans CFB (D-I) Drive-By-Drive data to create the pts_drive column.
Cleans Play-by-Play data pulled from the API's raw game data.
Usage
add_play_counts(play_df)
clean_drive_dat(play_df)
prep_epa_df_after(dat)
clean_drive_info(drive_df)
add_player_cols(pbp)
add_yardage(play_df)
clean_pbp_dat(play_df)
penalty_detection(raw_df)
Arguments
- play_df
(data.frame, required): Play-by-Play data frame to be cleaned, as pulled from cfbd_pbp_data().
- dat
(data.frame, required): Cleaned Play-by-Play data frame pulled from cfbd_pbp_data().
- drive_df
(data.frame, required): Drive data frame pulled from the API via the cfbd_drives() function.
- raw_df
(data.frame, required): Play-by-Play data frame to be cleaned, as pulled from cfbd_pbp_data().
Value
add_play_counts(): The original play_df with the following columns appended/redefined:
game_play_number.
half_clock_minutes.
TimeSecsRem.
Under_two.
half.
kickoff_play.
pos_team.
def_pos_team.
receives_2H_kickoff.
pos_score_diff.
lag_pos_score_diff.
lag_pos_team.
lead_pos_team.
lead_pos_team2.
pos_score_pts.
pos_score_diff_start.
score_diff.
lag_score_diff.
lag_offense_play.
lead_offense_play.
lead_offense_play2.
score_pts.
score_diff_start.
offense_receives_2H_kickoff.
half_play_number.
lag_off_timeouts.
lag_def_timeouts.
off_timeouts_rem_before.
def_timeouts_rem_before.
off_timeout_called.
def_timeout_called.
lead_TimeSecsRem.
lead_TimeSecsRem2.
lead_yards_to_goal.
lead_yards_to_goal2.
lead_down.
lead_down2.
lag_distance3.
lag_distance2.
lag_distance.
lead_distance.
lead_distance2.
end_of_half.
lag_play_type3.
lag_play_type2.
lag_play_type.
lead_play_type.
lead_play_type2.
lead_play_type3.
change_of_poss.
change_of_pos_team.
pos_team_timeouts.
def_pos_team_timeouts.
pos_team_timeouts_rem_before.
def_pos_team_timeouts_rem_before.
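A minimal base-R sketch of how some of the lag/lead columns above can be built: it numbers plays within each game and shifts pos_team one row back and forward. The function sketch_play_counts() is hypothetical, assumes the rows are already sorted in play order within each game_id, and is not the add_play_counts() implementation.

```r
# Hypothetical sketch, not the add_play_counts() source.
# Assumes rows are sorted in play order within each game_id.
sketch_play_counts <- function(play_df) {
  pos <- as.character(play_df$pos_team)  # avoid factor coercion surprises
  # Shift a vector one row back/forward, padding the gap with NA
  lag1  <- function(x) c(NA, head(x, -1))
  lead1 <- function(x) c(tail(x, -1), NA)
  # Running play count within each game
  play_df$game_play_number <- ave(seq_len(nrow(play_df)), play_df$game_id,
                                  FUN = seq_along)
  # Possession team on the previous and next play of the same game
  play_df$lag_pos_team  <- ave(pos, play_df$game_id, FUN = lag1)
  play_df$lead_pos_team <- ave(pos, play_df$game_id, FUN = lead1)
  play_df
}
```

Grouping with ave() keeps the shift from leaking across games: the first play of each game gets an NA lag and the last play gets an NA lead.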
clean_drive_dat(): The original play_df with the following columns appended/redefined:
lag_change_of_poss.
lag_punt.
lag_scoring_play.
lag_turnover_vec.
lag_downs_turnover.
lead_play_type.
lead_play_type2.
lead_play_type3.
drive_numbers.
number_of_drives.
pts_scored.
drive_result_detailed.
drive_result_detailed_flag.
drive_result2.
lag_new_drive_pts.
lag_drive_result_detailed.
lead_drive_result_detailed.
new_drive_pts.
drive_scoring.
drive_play.
drive_play_number.
drive_event.
drive_event_number.
new_id.
log_ydstogo.
down.
distance.
yards_to_goal.
yards_gained.
Goal_To_Go.
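The drive counters above can be approximated by a cumulative sum over possession changes. This sketch is hypothetical (sketch_drive_numbers() is not part of the package) and assumes change_of_poss is TRUE on the first play of each new possession; the package's actual convention may differ.

```r
# Hypothetical sketch, not the clean_drive_dat() source.
# Assumption: change_of_poss is TRUE on the first play of a new possession.
sketch_drive_numbers <- function(play_df) {
  starts <- as.integer(play_df$change_of_poss)
  starts[is.na(starts)] <- 0L  # treat unknown flags as "same drive"
  # Drive counter within each game: 1 + number of possession changes so far
  play_df$drive_numbers <- ave(starts, play_df$game_id,
                               FUN = function(x) cumsum(x) + 1L)
  # Total drives per game, repeated on every row of that game
  play_df$number_of_drives <- ave(play_df$drive_numbers, play_df$game_id,
                                  FUN = max)
  # Play counter within each (game, drive) pair
  play_df$drive_play_number <- ave(rep(1L, nrow(play_df)), play_df$game_id,
                                   play_df$drive_numbers, FUN = cumsum)
  play_df
}
```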
prep_epa_df_after(): The original dat with the following columns appended/modified:
turnover_indicator.
down.
new_id.
new_down.
distance.
yards_to_goal.
yards_gained.
turnover.
drive_start_yards_to_goal.
end_of_half.
new_yardline.
new_distance.
new_log_ydstogo.
new_Goal_To_Go.
new_TimeSecsRem.
new_Under_two.
first_by_penalty.
lag_first_by_penalty.
lag_first_by_penalty2.
first_by_yards.
lag_first_by_yards.
lag_first_by_yards2.
row.
new_series.
firstD_by_kickoff.
firstD_by_poss.
firstD_by_yards.
firstD_by_penalty.
yds_punted.
yds_punt_gained.
missing_yard_flag.
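For a plain play with no turnover, score, or penalty, the post-play state follows from simple arithmetic on down, distance, and yards_to_goal. The hypothetical sketch_next_state() below shows only that simplified case; it ignores the turnovers (including turnovers on downs), touchdowns, penalties, and end-of-half logic that prep_epa_df_after() handles.

```r
# Hypothetical, simplified sketch, not the prep_epa_df_after() source.
# Valid only for non-scoring, non-turnover, non-penalty plays.
sketch_next_state <- function(down, distance, yards_to_goal, yards_gained) {
  new_yardline <- yards_to_goal - yards_gained
  first_down   <- yards_gained >= distance
  new_down     <- ifelse(first_down, 1L, as.integer(down) + 1L)
  # After a first down the line to gain is 10 yards or the goal line, whichever is closer
  new_distance <- ifelse(first_down, pmin(10, new_yardline), distance - yards_gained)
  data.frame(
    new_down        = new_down,
    new_distance    = new_distance,
    new_yardline    = new_yardline,
    new_log_ydstogo = log(new_distance),
    new_Goal_To_Go  = new_yardline <= new_distance
  )
}
```

For example, 1st-and-10 at the opponent's 75 with a 4-yard gain gives 2nd-and-6 at the 71.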
clean_drive_info(): The original drive_df with the following columns appended to it:
drive_id: Returned as drive_id from the original variable drive_id.
pts_drive: Point value of the drive's end result.
scoring: Logical flag indicating whether the drive was a scoring drive (updated).
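One way to derive a pts_drive-style column is a lookup from drive_result to points. In this hypothetical sketch, both the result labels ("TD", "FG", and so on) and the point values are assumptions rather than the values clean_drive_info() uses.

```r
# Hypothetical sketch, not the clean_drive_info() source.
# The drive_result labels and point values below are assumptions.
sketch_pts_drive <- function(drive_result) {
  pts_lookup <- c("TD" = 7, "FG" = 3, "INT TD" = -7, "FUMBLE TD" = -7, "SF" = -2)
  pts <- unname(pts_lookup[drive_result])  # unknown or NA labels give NA
  pts[is.na(pts)] <- 0  # any other result (punt, downs, end of half, ...) scores 0
  pts
}
```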
add_player_cols(): The original pbp with the following columns appended to it:
rusher_player_name.
receiver_player_name.
passer_player_name.
sack_player_name.
sack_player_name2.
pass_breakup_player_name.
interception_player_name.
fg_kicker_player_name.
fg_block_player_name.
fg_return_player_name.
kickoff_player_name.
kickoff_returner_player_name.
punter_player_name.
punt_block_player_name.
punt_returner_player_name.
punt_block_return_player_name.
fumble_player_name.
fumble_forced_player_name.
fumble_recovered_player_name.
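Player names are pulled from play_text with regular expressions. The hypothetical sketch below assumes play_text reads like "Jane Doe run for 5 yds" or "John Roe pass complete to Sam Poe for 12 yds"; real play text has many more variants than these two patterns cover.

```r
# Hypothetical sketch, not the add_player_cols() source.
# Assumes the two play_text phrasings described above.
extract_name <- function(text, pattern) {
  hit <- grepl(pattern, text, perl = TRUE)  # FALSE for NA text
  out <- rep(NA_character_, length(text))
  # The pattern spans the whole string, so sub() keeps only capture group 1
  out[hit] <- sub(pattern, "\\1", text[hit], perl = TRUE)
  out
}

sketch_player_cols <- function(play_text) {
  data.frame(
    rusher_player_name = extract_name(play_text, "^(.+?) run for .*$"),
    passer_player_name = extract_name(play_text, "^(.+?) pass (?:complete|incomplete).*$"),
    stringsAsFactors = FALSE
  )
}
```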
add_yardage(): The original play_df with the following columns appended to it:
yds_rushed.
yds_receiving.
yds_int_return.
yds_kickoff.
yds_kickoff_return.
yds_punted.
yds_fumble_return.
yds_sacked.
yds_penalty.
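Yardage extraction follows the same idea. The hypothetical sketch_yds_rushed() below handles only three assumed phrasings of a run (a gain, a loss, and no gain) and returns NA for everything else.

```r
# Hypothetical sketch, not the add_yardage() source.
# Only three assumed run phrasings are recognized.
sketch_yds_rushed <- function(play_text) {
  yds  <- rep(NA_real_, length(play_text))
  gain <- grepl("run for [0-9]+ y", play_text)
  loss <- grepl("run for a loss of [0-9]+ y", play_text)
  zero <- grepl("run for no gain", play_text)
  # Pull the captured number out of each matching description
  yds[gain] <- as.numeric(sub(".*run for ([0-9]+) y.*", "\\1", play_text[gain]))
  yds[loss] <- -as.numeric(sub(".*run for a loss of ([0-9]+) y.*", "\\1", play_text[loss]))
  yds[zero] <- 0
  yds
}
```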
clean_pbp_dat(): The original play_df with the following columns appended/redefined:
scoring_play.
td_play.
touchdown.
safety.
fumble_vec.
kickoff_play.
kickoff_tb.
kickoff_onside.
kickoff_oob.
kickoff_fair_catch.
kickoff_downed.
kick_play.
kickoff_safety.
punt.
punt_play.
punt_tb.
punt_oob.
punt_fair_catch.
punt_downed.
rush.
pass.
sack_vec.
play_type.
td_check.
id_play.
sack.
int.
int_td.
completion.
pass_attempt.
target.
pass_td.
rush_td.
turnover_vec.
offense_score_play.
defense_score_play.
downs_turnover.
fg_inds.
yds_fg.
yards_to_goal.
lag_play_type3.
lag_play_type2.
lag_play_type.
lead_play_type.
lead_play_type2.
lead_play_type3.
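Many of the indicator columns above reduce to matching play_type labels. The label sets in this hypothetical sketch are assumptions, not an exhaustive list, and counting a sack as a pass play is a modeling choice made here for illustration.

```r
# Hypothetical sketch, not the clean_pbp_dat() source.
# The play_type labels listed here are assumed, not exhaustive.
sketch_play_flags <- function(play_type) {
  rush_types <- c("Rush", "Rushing Touchdown")
  pass_types <- c("Pass Reception", "Pass Incompletion", "Passing Touchdown",
                  "Pass Interception Return", "Sack")
  data.frame(
    rush      = play_type %in% rush_types,
    pass      = play_type %in% pass_types,
    touchdown = grepl("Touchdown", play_type, fixed = TRUE),
    punt      = grepl("Punt", play_type, fixed = TRUE)
  )
}
```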
penalty_detection(): The original raw_df with the following columns appended/redefined:
penalty_flag: TRUE/FALSE flag for penalty play types or plays with a penalty in the play text.
penalty_declined: TRUE/FALSE flag for 'declined' in penalty play types or plays with a penalty in the play text.
penalty_no_play: TRUE/FALSE flag for 'no play' in penalty play types or plays with a penalty in the play text.
penalty_offset: TRUE/FALSE flag for 'off-setting' in penalty play types or plays with a penalty in the play text.
penalty_1st_conv: TRUE/FALSE flag for a 1st down in penalty play types or plays with a penalty in the play text.
penalty_text: TRUE/FALSE flag for a penalty in the play text on a play that is not a penalty play type.
orig_play_type: Copy of the original play_type label, taken before any changes by subsequent functions.
down: Converts the down on kickoffs, and on penalties on kickoffs, from 5 (as reported by the API) to 1.
play_type: Defines the play_type "Penalty (Kickoff)" for penalties on kickoffs with a repeat kick.
half: Defines the half variable (1, 2).
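Penalty detection combines the play type with keyword searches in the play text. This hypothetical sketch assumes a play_type label of "Penalty" and lower-cases the text before matching; it does not reproduce the kickoff down and play_type adjustments described above.

```r
# Hypothetical sketch, not the penalty_detection() source.
sketch_penalty_flags <- function(play_type, play_text) {
  text <- tolower(play_text)
  penalty_type <- play_type %in% "Penalty"  # assumed label; NA-safe comparison
  in_text <- grepl("penalty", text, fixed = TRUE)
  flag <- penalty_type | in_text
  data.frame(
    penalty_flag     = flag,
    penalty_declined = flag & grepl("declined", text, fixed = TRUE),
    penalty_no_play  = flag & grepl("no play", text, fixed = TRUE),
    penalty_offset   = flag & grepl("off-setting", text, fixed = TRUE),
    # Penalty mentioned in the text, but the play is typed as something else
    penalty_text     = in_text & !penalty_type
  )
}
```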
Details
Requires the following columns to be present:
game_id.
id_play.
clock_minutes.
clock_seconds.
half.
period.
offense_play.
defense_play.
home.
away.
offense_score.
defense_score.
offense_timeouts.
defense_timeouts.
play_text.
play_type.
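Several clock-related columns listed under Value (half, TimeSecsRem, half_clock_minutes, Under_two) can be illustrated from the period, clock_minutes, and clock_seconds columns required above. This hypothetical sketch assumes 15-minute quarters with the game clock counting down within each quarter and treats TimeSecsRem as seconds left in the half; it does not handle overtime periods, and the two-minute cutoff is also an assumption.

```r
# Hypothetical sketch; the package's exact definitions may differ.
# Assumes 15-minute quarters and no overtime periods.
sketch_clock_cols <- function(period, clock_minutes, clock_seconds) {
  half <- ifelse(period <= 2, 1L, 2L)
  # Quarters 1 and 3 still have a full 15-minute (900 s) quarter left in the half
  TimeSecsRem <- clock_minutes * 60 + clock_seconds +
    ifelse(period %in% c(1, 3), 900, 0)
  data.frame(
    half               = half,
    TimeSecsRem        = TimeSecsRem,
    half_clock_minutes = TimeSecsRem %/% 60,
    Under_two          = TimeSecsRem <= 120
  )
}
```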
Prepares the inputs for EPA calculations at the end of the play. Requires the following columns to be present:
game_id.
id_play.
drive_id.
down.
distance.
period.
yards_to_goal.
play_type.
Cleans CFB (D-I) Drive-By-Drive data to create the pts_drive column. Requires the following columns to be present:
drive_id: Returned as drive_id.
drive_result: End result of the drive.
scoring: Logical flag indicating whether the drive was a scoring drive.
game_id: Unique game identifier.
Cleans CFB (D-I) Play-by-Play data to create player name columns. Requires the following columns to be present:
rush.
pass.
play_text.
play_type.
sack.
fumble_vec.
Cleans CFB (D-I) Play-by-Play data to create yardage columns. Requires the following columns to be present:
play_text.
play_type.
rush.
pass.
int.
int_td.
kickoff_play.
kickoff_tb.
kickoff_downed.
kickoff_fair_catch.
fumble_vec.
sack.
punt.
punt_tb.
punt_downed.
punt_fair_catch.
punt_oob.
punt_blocked.
penalty_detail.
Requires the following columns to be present:
game_id.
id_play.
offense_play.
defense_play.
home.
away.
play_type.
play_text.
kickoff_play.
down.
distance.
yards_gained.
yards_to_goal.
change_of_poss.
penalty_1st_conv.
off_timeouts_rem_before.
def_timeouts_rem_before.
Runs penalty detection on the play text and play types. Requires the following columns to be present:
game_id: Game identifier.
period: Game period (quarter).
down: Down of the play.
play_type: Categorical play type.
play_text: A description of the play.
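Each function expects the columns listed above. A small guard like the hypothetical sketch_check_cols() below fails early with a readable message when a column is missing; the package itself may validate its inputs differently or not at all.

```r
# Hypothetical input guard; not part of the package.
sketch_check_cols <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}
```

For example, the penalty detection inputs could be checked with sketch_check_cols(raw_df, c("game_id", "period", "down", "play_type", "play_text")) before calling penalty_detection(raw_df).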