Skip to contents
add_play_counts(): function

Adds play counts to Play-by-Play data pulled from the API's raw game data.

add_yardage(): function

Add yardage extracted from play text.

add_player_cols(): function

Add player columns extracted from play text.

clean_drive_dat(): function

Create new Drive results and id data.

clean_pbp_dat(): function

Clean Play-by-Play data.

penalty_detection(): function

Adds penalty columns to Play-by-Play data pulled from the API.

prep_epa_df_after(): function

Creates the post-play inputs for the Expected Points model to predict on for each game.

clean_drive_info(): function

Cleans CFB (D-I) Drive-By-Drive Data to create pts_drive column.

Cleans Play-by-Play data pulled from the API's raw game data

Usage

add_play_counts(play_df)

clean_drive_dat(play_df)

prep_epa_df_after(dat)

clean_drive_info(drive_df)

add_player_cols(pbp)

add_yardage(play_df)

clean_pbp_dat(play_df)

penalty_detection(raw_df)

Arguments

play_df

(data.frame required): Performs data cleansing on Play-by-Play DataFrame, as pulled from cfbd_pbp_data()

dat

(Data.Frame required) Clean Play-by-Play DataFrame pulled from cfbd_pbp_dat()

drive_df

(data.frame required) Drive dataframe pulled from API via the cfbd_drives() function

raw_df

(data.frame required): Performs data cleansing on Play-by-Play DataFrame, as pulled from cfbd_pbp_data()

Value

The original play_df with the following columns appended/redefined:

game_play_number.

.

half_clock.minutes.

.

TimeSecsRem.

.

Under_two.

.

half.

.

kickoff_play.

.

pos_team.

.

def_pos_team.

.

receives_2H_kickoff.

.

pos_score_diff.

.

lag_pos_score_diff.

.

lag_pos_team.

.

lead_pos_team.

.

lead_pos_team2.

.

pos_score_pts.

.

pos_score_diff_start.

.

score_diff.

.

lag_score_diff.

.

lag_offense_play.

.

lead_offense_play.

.

lead_offense_play2.

.

score_pts.

.

score_diff_start.

.

offense_receives_2H_kickoff.

.

half_play_number.

.

lag_off_timeouts.

.

lag_def_timeouts.

.

off_timeouts_rem_before.

.

def_timeouts_rem_before.

.

off_timeout_called.

.

def_timeout_called.

.

lead_TimeSecsRem.

.

lead_TimeSecsRem2.

.

lead_yards_to_goal.

.

lead_yards_to_goal2.

.

lead_down.

.

lead_down2.

.

lag_distance3.

.

lag_distance2.

.

lag_distance.

.

lead_distance.

.

lead_distance2.

.

end_of_half.

.

lag_play_type3.

.

lag_play_type2.

.

lag_play_type.

.

lead_play_type.

.

lead_play_type2.

.

lead_play_type3.

.

change_of_poss.

.

change_of_pos_team.

.

pos_team_timeouts.

.

def_pos_team_timeouts.

.

pos_team_timeouts_rem_before.

.

def_pos_team_timeouts_rem_before.

.

The original play_df with the following columns appended/redefined:

lag_change_of_poss.

.

lag_punt.

.

lag_scoring_play.

.

lag_turnover_vec.

.

lag_downs_turnover.

.

lead_play_type.

.

lead_play_type2.

.

lead_play_type3.

.

drive_numbers.

.

number_of_drives.

.

pts_scored.

.

drive_result_detailed.

.

drive_result_detailed_flag.

.

drive_result2.

.

lag_new_drive_pts.

.

lag_drive_result_detailed.

.

lead_drive_result_detailed.

.

new_drive_pts.

.

drive_scoring.

.

drive_play.

.

drive_play_number.

.

drive_event.

.

drive_event_number.

.

new_id.

.

log_ydstogo.

.

down.

.

distance.

.

yards_to_goal.

.

yards_gained.

.

Goal_To_Go.

.

dat with the following columns appended/modified:

turnover_indicator.

.

down.

.

new_id.

.

new_down.

.

distance.

.

yards_to_goal.

.

yards_gained.

.

turnover.

.

drive_start_yards_to_goal.

.

end_of_half.

.

new_yardline.

.

new_distance.

.

new_log_ydstogo.

.

new_Goal_To_Go.

.

new_TimeSecsRem.

.

new_Under_two.

.

first_by_penalty.

.

lag_first_by_penalty.

.

lag_first_by_penalty2.

.

first_by_yards.

.

lag_first_by_yards.

.

lag_first_by_yards2.

.

row.

.

new_series.

.

firstD_by_kickoff.

.

firstD_by_poss.

.

firstD_by_yards.

.

firstD_by_penalty.

.

yds_punted.

.

yds_punt_gained.

.

missing_yard_flag.

.

The original drive_df with the following columns appended to it:

drive_id: Returned as drive_id from original variable drive_id

.

pts_drive: End result of the drive

.

scoring: Logical flag for if drive was a scoring drive updated

.

The original pbp with the following columns appended to it:

rusher_player_name

.

receiver_player_name

.

passer_player_name

.

sack_player_name

.

sack_player_name2

.

pass_breakup_player_name

.

interception_player_name

.

fg_kicker_player_name

.

fg_block_player_name

.

fg_return_player_name

.

kickoff_player_name

.

kickoff_returner_player_name

.

punter_player_name

.

punt_block_player_name

.

punt_returner_player_name

.

punt_block_return_player_name

.

fumble_player_name

.

fumble_forced_player_name

.

fumble_recovered_player_name

.

The original play_df with the following columns appended to it:

yds_rushed

.

yds_receiving

.

yds_int_return

.

yds_kickoff

.

yds_kickoff_return

.

yds_punted

.

yds_fumble_return

.

yds_sacked

.

yds_penalty

.

The original play_df with the following columns appended/redefined:

scoring_play

.

td_play

.

touchdown

.

safety

.

fumble_vec

.

kickoff_play

.

kickoff_tb

.

kickoff_onside

.

kickoff_oob

.

kickoff_fair_catch

.

kickoff_downed

.

kick_play

.

kickoff_safety

.

punt

.

punt_play

.

punt_tb

.

punt_oob

.

punt_fair_catch

.

punt_downed

.

rush

.

pass

.

sack_vec

.

play_type

.

td_check

.

id_play

.

sack

.

int

.

int_td

.

completion

.

pass_attempt

.

target

.

pass_td

.

rush_td

.

turnover_vec

.

offense_score_play

.

defense_score_play

.

downs_turnover

.

scoring_play

.

fg_inds

.

yds_fg

.

yards_to_goal

.

lag_play_type3

.

lag_play_type2

.

lag_play_type

.

lead_play_type

.

lead_play_type2

.

lead_play_type3

.

The original raw_df with the following columns appended/redefined:

penalty_flag: TRUE/FALSE flag for penalty play types or penalty in play text plays.

.

penalty_declined: TRUE/FALSE flag for 'declined' in penalty play types or penalty in play text plays.

.

penalty_no_play: TRUE/FALSE flag for 'no play' in penalty play types or penalty in play text plays.

.

penalty_offset: TRUE/FALSE flag for 'off-setting' in penalty play types or penalty in play text plays.

.

penalty_1st_conv: TRUE/FALSE flag for 1st Down in penalty play types or penalty in play text plays.

.

penalty_text: TRUE/FALSE flag for penalty in text but not a penalty play type.

.

orig_play_type: Copy of original play_type label prior to any changes by the proceeding functions

.

down: Defines kickoff downs and penalties on kickoffs and converts them from 5 (as from the API) to 1.

.

play_type: Defines play_type, "Penalty (Kickoff)", penalties on kickoffs with a repeat kick.

.

half: Defines the half variable (1, 2).

.

Details

Requires the following columns to be present

game_id

.

id_play

.

clock.minutes

.

clock.seconds

.

half

.

period

.

offense_play

.

defense_play

.

home

.

away

.

offense_score

.

defense_score

.

offense_timeouts

.

defense_timeouts

.

play_text

.

play_type

.

Prep for EPA calculations at the end of the play. Requires the following columns be present:

game_id.

.

id_play.

.

drive_id.

.

down.

.

distance.

.

period.

.

yards_to_goal.

.

play_type.

.

Cleans CFB (D-I) Drive-By-Drive Data to create pts_drive column. Requires the following columns be present:

drive_id: Returned as drive_id

.

drive_result: End result of the drive

.

scoring: Logical flag for if drive was a scoring drive

.

game_id: Unique game identifier

.

Cleans CFB (D-I) player Data to create player name columns. Requires the following columns be present:

  • rush.

  • pass.

  • play_text.

  • play_type.

  • sack.

  • fumble_vec.

Cleans CFB (D-I) Drive-By-Drive Data to create yardage column. Requires the following columns be present:

play_text

.

play_type

.

rush

.

pass

.

int

.

int_td

.

kickoff_play

.

kickoff_tb

.

kickoff_downed

.

kickoff_fair_catch

.

fumble_vec

.

sack

.

punt

.

punt_tb

.

punt_downed

.

punt_fair_catch

.

punt_oob

.

punt_blocked

.

penalty_detail

.

Requires the following columns to be present

game_id

.

id_play

.

offense_play

.

defense_play

.

home

.

away

.

play_type

.

play_text

.

kickoff_play

.

down

.

distance

.

yards_gained

.

yards_to_goal

.

change_of_poss

.

penalty_1st_conv

.

off_timeouts_rem_before

.

def_timeouts_rem_before

.

Runs penalty detection on the play text and play types. Requires the following columns be present:

  • game_idReferencing game id.

  • periodGame period (quarter).

  • downDown of the play.

  • play_typeCategorical play type.

  • play_textA description of the play.