DSFL Prospects to Graduates Similarity (Part 2)

**Hordle** · (This post was last modified: 09-30-2020, 11:26 PM by Evok.)

Intro:

Welcome back to the Art/Hordle co-op judging-everyone party. This time it's the RBs, another small class this season. You'll probably see a lot of similarities between this one and the previous QB article, but with different insight into the position. Unlike the QB article, the two RB prospects have each been linked to a different player.

Methodology recap:

Art here. So let's recap how the methodology works. If you missed it, here’s a link to the first article. In this article I’m instead going to talk about the code behind the analysis and break down, line by line, how it all works.

Just a reminder, first, that this is all done ahead of time so we might miss some prospects who weren’t around when we pulled the data. Apologies ahead of time!

We’ll start by linking the Google Colab notebook that you can view here. It's a bit of a mess, because I’m adjusting the list index as we go along to pull the different positions, but it should give you an idea and you can try to follow along as I explain here.

The first step actually happens outside the notebook, and it's the step every data scientist spends 99% of their time on: data manipulation, cleaning and preparation for the code. Normally I’d import the data straight to the notebook, but the TPE tracker (while awesome) is a little bit of a b**ch to import directly and scrape data from - most likely due to the deployment and backend code in dataframes / datatables. Regardless, I made the decision to forgo any fancy web scraping and just copy and paste the tables I needed from the tracker. It was ridiculously simple given how well formatted and run the TPE tracker is - the prospect categories saved time, and otherwise just filtering for S25 and the right position let me quickly grab the other data I needed for comparisons.

Code:
import numpy as np  # used later by the cosine similarity function

import pandas as pd

df_s25 = pd.read_csv("./Prospect Import - S25_normalized.csv")

df_s26 = pd.read_csv("./Prospect Import - S26_normalized (1).csv")

Great, we’ve got the data and it's normalized (as discussed in the last post). Now we import it into a Pandas DataFrame as outlined above. Unfortunately this requires the CSV files quoted above to be in the Google Colab storage, and that resets after each kernel restart. In other words, you won’t be able to follow along with the Google Colab, and that’s one reason why I’m outlining the code here. (Also cash dollars.)
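The normalization itself was covered in the last post, so what follows is only a guess at its shape. Going by the later remark that a normalized speed of 0.89 is "almost a full standard deviation away from the mean", one plausible reading is a z-score per attribute. Here is a minimal sketch under that assumption, with made-up numbers and a hypothetical Spd_norm column (not the actual tracker data or the actual CSV layout):

```python
import pandas as pd

# Toy raw speed values, invented for illustration only
toy = pd.DataFrame({"Name": ["Player A", "Player B", "Player C"],
                    "Spd": [60.0, 70.0, 80.0]})

# ASSUMPTION: z-score normalization, i.e. (value - class mean) / class std.
# pandas' .std() uses the sample standard deviation (ddof=1) by default.
toy["Spd_norm"] = (toy["Spd"] - toy["Spd"].mean()) / toy["Spd"].std()

spd_norm = toy["Spd_norm"].tolist()
print(spd_norm)
```

Under this reading, a value of 0 sits exactly at the class mean and a value of 1 sits one standard deviation above it.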

Code:
df_s25.dropna(inplace=True)

df_s26.dropna(inplace=True)

Some simple manipulations to the dataframe to remove any null values so we don’t run into errors later with the math.
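To see what that call does, here is a tiny sketch on a made-up frame (names and numbers are invented for illustration, not pulled from the tracker). It uses the non-inplace form so the original toy frame stays around for comparison:

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Spd": [0.9, np.nan, 0.4],
    "Str": [0.1, 0.5, 0.7],
})

# dropna() removes every row that has at least one missing value,
# so Player B (missing Spd) is dropped entirely
cleaned = toy.dropna()
remaining = cleaned["Name"].tolist()
print(remaining)
```

Note that a player with a single blank attribute disappears from the comparison completely, which is worth keeping in mind if someone seems to be missing from the results.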

Code:
s25 = []

s26 = []

all_s26_dfs = []

all_s25_dfs = []

Next we get some empty lists created to hold our S26 / S25 player vectors and another two for the S26 / S25 dataframes. Since we’re doing this by position, the lists will all be 11 in length, each item in the list representing a position. The S26 / S25 lists will be used just for the plain old vectors, just numbers, while the dataframe list will hold more contextual information (like names).

Code:
for position in df_s25['Position'].unique():

  all_s26_dfs.append(df_s26[df_s26['Position'] == position])

  all_s25_dfs.append(df_s25[df_s25['Position'] == position])

  # Heads up: the list names are flipped relative to their contents. s25 holds the S26 prospect vectors...

  s25.append(df_s26[df_s26['Position'] == position][['Str.1', 'Agi.1', 'Arm.1', 'Int.1', 'Thr.1', 'Tck.1', 'Spd.1', 'Hnd.1', 'PBlk.1', 'RBlk.1', 'End.1', 'KPow.1', 'KAcc.1']].values)

  # ...and s26 holds the S25 vectors they get compared against.

  s26.append(df_s25[df_s25['Position'] == position][['Str', 'Agi', 'Arm', 'Int', 'Thr', 'Tck', 'Spd', 'Hnd', 'PBlk', 'RBlk', 'End', 'KPow', 'KAcc']].values)

Alright, now we get into the good stuff. We create a for loop to iterate through every position. We then append to our list of dataframes a dataframe just for that position (df_s26[df_s26['Position'] == position] grabs a slice of the df_s26 dataframe that matches just that position value in the 'Position' column). Then we do the same thing but append just the values (a NumPy array) into the empty S26 / S25 lists we initialized earlier.
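If the boolean slicing is new to you, here is the same pattern on a small made-up frame. The data, the two-attribute column list and the frames / vectors names are all invented for illustration:

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Position": ["RB", "QB", "RB", "QB"],
    "Spd": [0.9, 0.2, 0.4, 0.1],
    "Str": [0.1, 0.5, 0.7, 0.3],
})

frames = []   # one DataFrame per position (keeps names for context)
vectors = []  # one NumPy array per position (just the numbers)

# unique() returns positions in order of first appearance: RB, then QB
for position in toy["Position"].unique():
    mask = toy["Position"] == position        # True/False for each row
    frames.append(toy[mask])                  # rows for this position only
    vectors.append(toy[mask][["Spd", "Str"]].values)

rb_names = frames[0]["Name"].tolist()
rb_shape = vectors[0].shape
print(rb_names, rb_shape)
```

Because both lists are filled in the same loop, index i in one list always refers to the same position as index i in the other.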

Code:
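def numpy_cosine_similarity(u, v):

  u = np.expand_dims(u, 1)  # shape (rows_u, 1, attrs) so each u row broadcasts against every v row

  n = np.sum(u * v, axis=2)  # dot products, shape (rows_u, rows_v)

  d = np.linalg.norm(u, axis=2) * np.linalg.norm(v, axis=1)  # product of the vector lengths, same shape

  return n / d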

results = []

for i in range(len(s25)):

  results.append(numpy_cosine_similarity(s25[i], s26[i]))

And now we get into the maths. This is a function that takes in two matrices (u and v) and manipulates them to compute the cosine similarity between every row of u and every row of v. I recommend checking out the earlier article if you want a refresher on the math behind how this works, but for each S26 prospect it will spit out a vector of cosine similarities, one for each S25 row in that position (i.e. the comparison vectors). You can see at the end that we initialize a new empty results list and append the results as we move along. The results list can then be used to write the results into the dataframe.
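To make the shapes concrete, here is a small worked example using the same broadcasting approach as the function above. The function name cosine_similarity_matrix and the 2-attribute vectors are made up for illustration; real player vectors have 13 attributes.

```python
import numpy as np

def cosine_similarity_matrix(u, v):
    """Cosine similarity of every row of u against every row of v."""
    u3 = np.expand_dims(u, 1)                  # (rows_u, 1, attrs)
    dots = np.sum(u3 * v, axis=2)              # (rows_u, rows_v) dot products
    lengths = np.linalg.norm(u3, axis=2) * np.linalg.norm(v, axis=1)
    return dots / lengths

# Two made-up "prospects" and three made-up "graduates"
prospects = np.array([[1.0, 0.0],
                      [1.0, 1.0]])
graduates = np.array([[2.0, 0.0],
                      [0.0, 3.0],
                      [1.0, 1.0]])

sims = cosine_similarity_matrix(prospects, graduates)
print(sims.shape)  # one row per prospect, one column per graduate
print(sims)
```

The first prospect points the same way as the first graduate (similarity 1) and at a right angle to the second (similarity 0). Only direction matters, not length: [1, 0] and [2, 0] count as a perfect match.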

Code:
for result, df, df2 in zip(results, all_s26_dfs, all_s25_dfs):

  _res = [r for r in result]

  df2.reset_index(inplace=True, drop=True)

  df['Cosine_Similarity'] = _res

  df['Match'] = [list(r).index(max(r)) for r in result]

  df['Player_Match'] = df2.loc[df['Match'], 'Name'].values

  likelihoods = [max(r) for r in result]

  df['Score'] = likelihoods

Now we need to just get it all set up for export and analysis. We take each result, S25 dataframe, and S26 dataframe inside a for loop. We then add to the S26 dataframe the results for Cosine Similarity, the Match index, and the Player Match that matches the Match index (the highest cosine similarity in the vector produced). We also get the highest cosine similarity to produce a score for the Player Match. This is all added to the S26 dataframe as new columns.
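The same bookkeeping can be sketched on toy data with NumPy's argmax and max, which do the job of the list(r).index(max(r)) and max(r) comprehensions in one call each. The similarity numbers and player names below are invented, and this is an alternative phrasing rather than the notebook's exact code:

```python
import numpy as np
import pandas as pd

# Toy similarity matrix: rows = prospects, columns = graduates (made-up numbers)
sims = np.array([[0.9, 0.8, 0.4],
                 [0.5, 0.7, 0.3]])
prospects = pd.DataFrame({"Name": ["Prospect A", "Prospect B"]})
graduates = pd.DataFrame({"Name": ["Grad X", "Grad Y", "Grad Z"]})

best = sims.argmax(axis=1)                 # column index of the top score per row
prospects["Match"] = best
prospects["Player_Match"] = graduates.loc[best, "Name"].values  # index -> name
prospects["Score"] = sims.max(axis=1)      # the top score itself

matches = prospects["Player_Match"].tolist()
scores = prospects["Score"].tolist()
print(prospects)
```

The lookup by index only works because the graduates frame has a clean 0..n-1 index, which is why the original loop calls reset_index on the S25 dataframe first.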

Code:
all_s26_dfs[8].to_csv('rb_prospect_results.csv')

And there we have it. The dataframe for whichever index matches the position we’re analyzing is saved as a CSV file. We can then use it for reference when making these articles. There’s slightly more to the visualization, but we’ll save that for next time.

[Image: 6BNAcqm.png]

Color codes: Mr. Forty-Two - 3, Big Chungus - 0, Mike Rotchburns - 2, Bronko Mills - 1

Explaining the chart:

As mentioned in the last article, read the chart left to right to see how each player (represented by a colored line; the color codes are listed just under the chart) intersects with each attribute (represented by a vertical axis). The spots where lines intersect are spots of commonality between that player, the other draft prospect and the matched S25 players.

-------------------------------

RBs

User: C9Van
Color # in chart above: 3
Draft Year: S26
Position: RB
Name: Mr. Forty-Two
S25 Match: Bronko Mills
Cosine Similarity: 0.913
Other notable matches: Mike Rotchburns (0.893), Jameson Vermillion (0.872)
Distinguishing attributes: Higher SPD

Why it works:

Art - Well, let's get the similarities in first, because this is a match that’s a little strange on paper. Mr. Forty-Two has tremendous speed for a prospect (it's 79 before any uncounted updates), but it only measures at 0.89 after normalization. Both Mr. Forty-Two and Bronko Mills have above average speed, run blocking, pass blocking and endurance. Mr. Forty-Two and Mills also intersect at similar normalized values for intelligence, and are close in strength. But it's stretching things for the match that was chosen, in my personal opinion, as I like Vermillion as a better comparator. But then again I’m not a mathematical function on vectors, am I?

Hordle - Not only am I not a mathematical function on vectors, I barely understand what it’s trying to tell me. But it is interesting to see how it compares the two different types, seeing as I believe there were no speed backs in the last class, leaving the system to pick from what it could. Like you said, they have a lot of similar stats aside from the speed and strength difference, which makes sense between the two archetypes. I think I’m going to have to leave this one to you, Art.

Why it doesn’t:

Art - This is far easier. Take your pick: Mr. Forty-Two is a speed back with a significant normalized speed (0.89, almost a full standard deviation away from the mean), while Bronko Mills is a power back who, as you can see in the chart above, looks like a poor match for Mr. Forty-Two and a better match for Big Chungus. Give me Jameson Vermillion or Darren Pama, as a personal opinion, for a better match with a speed that is more reflective. Looking at the normalized values, however, I can see why they weren’t really matched (both Vermillion and Pama have a normalized speed over 1.2, nearly as far from Mr. Forty-Two’s 0.89 as Mills’ 0.45). I take faith that Vermillion was third with 0.872 as a cosine similarity. But enough about numbers, any other reason they don’t match, Hordle?

Hordle - I mean they’re the same position with different play styles. Mr. Forty-Two isn’t looking to run over people. He’s looking to break ankles and make you pay for underestimating his speed. The dude's fast, like you said, rocking a 79 speed before the draft. He’s going to make a lot of unprepared LBs look silly and might even cause problems for some CBs if they’re not ready for it. He’s not going to be a huge blocking threat, so his options out of shotgun are going to be limited. But I’m sure any team looking at him already knows that. It does open more options with sweeps, as long as his speed and agility continue to climb. He’s going to be able to turn a corner and take off within a second.

What to watch for / build similarities:

Hordle - As of now Mr. Forty-Two is rocking a solid 83 speed, making him only 17 points off of max speed before the draft. If any team is looking to find someone who can break ankles and bust through a hole, this is going to be their guy. With a total of 118 TPE, Mr. Forty-Two is close to capping out on speed and should be very close to reaching the cap by the time he hits the 250 TPE DSFL cap.

Art - You hit the nail on the head, Hordle. I feel like all eyes will be on Mr. Forty-Two if he can keep up this earning and keep putting the TPE into speed. I guess the question becomes: do you focus entirely on the speed back build and double down on what you’re good at, or do you try to diversify? It's a question that I’m sure will come up again during these position breakdowns, but I love when we have a draft class of just 2 prospects and they are different builds with potentially different focuses on how to create a running back in the league. I’m going to be watching Mr. Forty-Two not just on the field but on the update page too.

----------------------------

User: The_Mediocre_Pigeon
Color # in chart above: 0
Draft Year: S26
Position: RB
Name: Big Chungus
S25 Match: Mike Rotchburns
Cosine Similarity: 0.674
Other notable matches: Buster Bawlls (0.669), Bronko Mills (0.644)
Distinguishing attributes: Higher INT, Higher RBLK

Why it works:

Art - The Mediocre Pigeon’s running back Big Chungus gets matched to Mike Rotchburns, the highest TPE earning back in the S25 class (by 1 TPE, but still). Let's see if we can figure out what similarities led to the match. Both Chungus and Rotchburns have a lower speed than their cohorts, but higher strength - lending themselves to their shared running back archetype: power back. Both also share decent, if unspectacular, run blocking - in the case of Chungus he beats out his fellow draft mate Mr. Forty-Two and fits neatly alongside Rotchburns with a normalized run blocking around 0.45 for the overall class. So based on initial archetype decisions, perhaps it's unsurprising that Chungus gets matched to one of only two power backs in the S25 cohort. The other power back in that class is Bronko Mills, who was a close third in terms of cosine similarity for Big Chungus. Any other reasons it works, Hordle?

Hordle - Well Art, you pretty much covered it. It makes sense for a power back to be compared to a power back. Mike is insane with 248 TPE, leading the way as an example of how a power back should be built. It’ll be interesting to see if Chungus follows in his footsteps, where the RBs go with such different archetypes, and whether either will be asked to change based on team needs. With his current strength and the higher blocking potential, it is possible that Chungus can serve as a dual threat later in his career.

Why it doesn’t:

Art - From a pure build perspective, which is somewhat meaningless at this stage of the game with only a few updates logged from the Mediocre Pigeon, Chungus has less agility than his match Rotchburns. His intelligence is also much closer to the draft class mean than Rotchburns' is to his draft class mean, but that’s probably a result of varying intelligence priorities in player development over the course of their first season in the DSFL. All in all, I think the maths did a great job here matching Chungus to someone who makes sense for the build. And I mean, if you’re Mediocre Pigeon, you’d hope it's right - Rotchburns had a terrific season with over 1000 yards rushing (and 5 pancakes).

Hordle - The minimal updates are interesting for sure. I don’t know Pigeon and haven’t really seen him, but that doesn’t necessarily mean anything. Things come up, so it’s possible that he’s just waiting. Besides, the GMs are the ones with the information, so we could see Pigeon drop a huge TPE update bomb come draft day. Either way, having a dual threat in blocking and running would be good for any team with the space to pick him up, and if Pigeon does follow in Rotchburns' footsteps he could end up having a very good season.

What to watch for / build similarities:

Art - Think we’ve pretty much covered what I’m watching here - I want to see if Pigeon can follow in Rotchburns' footsteps and earn all that TPE, making waves both through and at the line of scrimmage. The comparison to C9Van will always exist, so it’ll be great fun to revisit these two at the end of the season and see if they’ve both moved towards the regular RB build - high speed over everything - or whether Big Chungus will follow in the footsteps of his match and Mills and diversify a bit. Should be fun - yet another friendly positional battle to keep an eye on.

Hordle - 100% agree with you, Art. We’ve pretty much covered everything there is to cover here. Mr. Forty-Two is currently running high, pun intended, in current RB draft stock, which isn’t hard when there are just two of them. They’re gonna be a pair that we just have to watch and see how they develop. Though one thing I will note, since we only went back a season: Zoe Watt’s is also a speed back and she’s coming off of a fantastic season. Proof that you don’t have to have power to succeed.

Extra:
I am sorry for what's coming next. - Hordle

Quote:Word count - 2652
Please split between Hordle and wonderful_art