An Ordination Visualisation to Profile Training

Plot relationships between training drills based on athlete activity profile data using non-metric multidimensional scaling

Athlete activity profile data, such as that from global positioning systems (GPS) and accelerometers, is often used to quantify physical output during training1,2. This data may be aggregated to report the mean distances covered and speeds reached across training as a whole and within specific drills. Rather than simply reporting data this way, this edition of Visualising Athlete Data in R covers a technique that formed part of my Honours research (unpublished) to quantify and visualise the (dis)similarity between training drills based on physical output - as measured from GPS and accelerometers. Its hoped what’s covered here may add value to a practitioner’s decision making when designing training around specific activity targets (total distance, high-speed running etc.).

Non-metric multidimensional scaling

Its typical to profile training based on a variety of GPS- and accelerometer-derived variables, and we can use this multivariate data to spatially illustrate the relationships between different drills with non-metric multidimensional scaling (nMDS)3. Often used in ecology, nMDS is an ordination technique that takes a multivariate dataset and calculates the distances between samples and schematically represents the (dis)similarity in a two-dimensional space3. From an nMDS output we can get a sense of “likeness”, whereby similar training drills, for example, are located proximal to each other and dissimilar drills are observed further apart on an ordination plot3. This technique has previously been applied in sport to trace team match performance4, and it may also be useful in highlighting the (dis)similarity between drills that could aid the prescription of training. The following content will provide a (very) brief overview of how we can run nMDS in R using simulated GPS and accelerometer data. For a more comprehensive overview of nMDS, I encourage you to check out other resources online.

Example

Just as I have throughout my Visualising Athlete Data in R series, I’m simulating some data for the purpose of this example. The code below produces a 20 x 6 data frame containing drills and four commonly used variables describing the average physical output from each one. I’ve included the drill_type variable to colour the drill labels by their type when we eventually get to plotting.

library(tidyverse)

set.seed(450)
drill <- paste("Drill", 1:20)
drill_type <- rep(c("Passing & Receiving", "Running", "Defensive", "Tackling"),
                  each = 5)
tot_dist <- c(rnorm(20, 1000, 400))
metres_per_min <- c(rnorm(20, 100, 30))
hsr <- c(rnorm(20, 300, 100))
load_per_min <- c(rnorm(20, 12, 3))

dat <- data.frame(drill, drill_type, tot_dist, metres_per_min, hsr,
                  load_per_min)

dat <- dat %>% mutate_if(is.numeric, round)

head(dat)
##     drill          drill_type tot_dist metres_per_min hsr load_per_min
## 1 Drill 1 Passing & Receiving     1390            101 328           12
## 2 Drill 2 Passing & Receiving      933             86 147           14
## 3 Drill 3 Passing & Receiving     1025             84 265           12
## 4 Drill 4 Passing & Receiving      550            164 358           10
## 5 Drill 5 Passing & Receiving     1075             62 285           11
## 6 Drill 6             Running     1185             91 352           16

Its important to keep in mind here that this dataset isn’t real and its use is only intended to illustrate the functionality of nMDS. If you’re collecting your own GPS and accelerometer data, you’ll most likely have this archived in a neatly formatted spreadsheet where the values will be much more representative of actual training than they are here.

In order to perform nMDS, our data frame needs to contain all numeric variables, so I’m dropping drill and drill_type momentarily using the - symbol in select() and will call on these again soon.

dat_num <- dat %>%
  select(-c(drill, drill_type))

Now we’re ready to run our nMDS analysis. To do this, I’m using the metaMDS() function from the vegan package, so make sure to install this if you haven’t already.

library(vegan)

set.seed(20)
nmds <- metaMDS(dat_num)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1074252 
## Run 1 stress 0.1074252 
## ... Procrustes: rmse 0.0003488619  max resid 0.001203764 
## ... Similar to previous best
## Run 2 stress 0.1074251 
## ... New best solution
## ... Procrustes: rmse 0.0002179825  max resid 0.0007518953 
## ... Similar to previous best
## Run 3 stress 0.1995752 
## Run 4 stress 0.2134182 
## Run 5 stress 0.163605 
## Run 6 stress 0.1074251 
## ... New best solution
## ... Procrustes: rmse 4.492981e-05  max resid 0.0001547674 
## ... Similar to previous best
## Run 7 stress 0.1082238 
## Run 8 stress 0.1890291 
## Run 9 stress 0.1082239 
## Run 10 stress 0.2393549 
## Run 11 stress 0.1074251 
## ... Procrustes: rmse 6.034984e-05  max resid 0.0002079273 
## ... Similar to previous best
## Run 12 stress 0.1074251 
## ... Procrustes: rmse 0.0001228853  max resid 0.000423898 
## ... Similar to previous best
## Run 13 stress 0.1074251 
## ... Procrustes: rmse 6.298803e-05  max resid 0.000215939 
## ... Similar to previous best
## Run 14 stress 0.1074251 
## ... Procrustes: rmse 0.0001134127  max resid 0.0003913605 
## ... Similar to previous best
## Run 15 stress 0.1082238 
## Run 16 stress 0.1082237 
## Run 17 stress 0.2091802 
## Run 18 stress 0.1074251 
## ... Procrustes: rmse 0.0001119139  max resid 0.0003856809 
## ... Similar to previous best
## Run 19 stress 0.163605 
## Run 20 stress 0.1074251 
## ... Procrustes: rmse 0.0001125207  max resid 0.0003853515 
## ... Similar to previous best
## *** Solution reached

The nMDS algorithm runs 20 times to find the smallest stress value which describes the goodness-of-fit of taking multidimensional data and squashing it down to only two dimensions. Lower stress values suggest a better fit of the data3, and here we have a stress of 0.107 representing a fair fit.

By calling str(), you’ll notice that nmds contains a list of a whole bunch of other objects.

str(nmds)
## List of 35
##  $ nobj      : int 20
##  $ nfix      : int 0
##  $ ndim      : num 2
##  $ ndis      : int 190
##  $ ngrp      : int 1
##  $ diss      : num [1:190] 0.0154 0.0213 0.0227 0.0278 0.0286 ...
##  $ iidx      : int [1:190] 12 6 3 6 12 12 5 5 12 14 ...
##  $ jidx      : int [1:190] 6 3 1 5 7 5 3 1 3 1 ...
##  $ xinit     : num [1:40] 0.708 0.868 0.411 0.303 0.888 ...
##  $ istart    : int 1
##  $ isform    : int 1
##  $ ities     : int 1
##  $ iregn     : int 1
##  $ iscal     : int 1
##  $ maxits    : int 200
##  $ sratmx    : num 1
##  $ strmin    : num 1e-04
##  $ sfgrmn    : num 1e-07
##  $ dist      : num [1:190] 0.0307 0.0203 0.0452 0.0563 0.04 ...
##  $ dhat      : num [1:190] 0.0255 0.0255 0.0437 0.0437 0.0437 ...
##  $ points    : num [1:20, 1:2] -0.00833 0.09646 -0.00318 -0.06959 -0.03873 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:20] "1" "2" "3" "4" ...
##   .. ..$ : chr [1:2] "MDS1" "MDS2"
##   ..- attr(*, "centre")= logi TRUE
##   ..- attr(*, "pc")= logi TRUE
##   ..- attr(*, "halfchange")= logi TRUE
##   ..- attr(*, "internalscaling")= num 7.62
##  $ stress    : num 0.107
##  $ grstress  : num 0.107
##  $ iters     : int 145
##  $ icause    : int 3
##  $ call      : language metaMDS(comm = dat_num)
##  $ model     : chr "global"
##  $ distmethod: chr "bray"
##  $ distcall  : chr "vegdist(x = comm, method = distance)"
##  $ data      : chr "wisconsin(sqrt(dat_num))"
##  $ distance  : chr "bray"
##  $ converged : logi TRUE
##  $ tries     : num 20
##  $ engine    : chr "monoMDS"
##  $ species   : num [1:4, 1:2] 0.034001 -0.000461 -0.157215 0.121388 -0.141023 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:4] "tot_dist" "metres_per_min" "hsr" "load_per_min"
##   .. ..$ : chr [1:2] "MDS1" "MDS2"
##   ..- attr(*, "shrinkage")= Named num [1:2] 0.0112 0.0112
##   .. ..- attr(*, "names")= chr [1:2] "MDS1" "MDS2"
##   ..- attr(*, "centre")= Named num [1:2] 2.33e-18 8.67e-19
##   .. ..- attr(*, "names")= chr [1:2] "MDS1" "MDS2"
##  - attr(*, "class")= chr [1:2] "metaMDS" "monoMDS"

For visualising our mock training data, I only need the $points object from nmds which contains the positions in both MDS1 and MDS2 axes for each of the drills. I’m storing these points in a new data frame and rejoining the variables drill and drill_type from the original data.

plot_dat <- as.data.frame(nmds$points)

plot_dat$drill <- dat$drill
plot_dat$drill_type <- dat$drill_type

head(plot_dat)
##           MDS1         MDS2   drill          drill_type
## 1 -0.008325439 -0.058195970 Drill 1 Passing & Receiving
## 2  0.096458690  0.008543030 Drill 2 Passing & Receiving
## 3 -0.003175063 -0.013318578 Drill 3 Passing & Receiving
## 4 -0.069592025  0.123089054 Drill 4 Passing & Receiving
## 5 -0.038730410 -0.052840500 Drill 5 Passing & Receiving
## 6 -0.018653684 -0.000207782 Drill 6             Running

We can now go ahead and start plotting!

Plot

Creating the nMDS plot is relatively straightforward once your data is set out correctly. Here’s the code.

library(RColorBrewer)
library(scales)

ggplot(data = plot_dat, aes(x = MDS1, y = MDS2, label = drill)) +
  geom_path(size = 0.1) +
  geom_label(aes(fill = drill_type), alpha = 0.6, color = "black",
             fontface = "bold") +
  scale_fill_manual(values = brewer.pal(name = "Set1", n = 4)) +
  scale_x_continuous(limits = c(-0.35, 0.15),
                     breaks = pretty_breaks(n = 8),
                     labels = number_format(accuracy = 0.01)) +
  scale_y_continuous(breaks = pretty_breaks(n = 5),
                     labels = number_format(accuracy = 0.01)) +
  theme_minimal() +
  guides(fill = guide_legend(title = "Drill Type",
                             title.position = "top",
                             title.hjust = 0.5,
                             override.aes = aes(label = ""))) +
  theme(panel.grid.minor = element_blank(),
        plot.background = element_rect(fill = "white", colour = NA),
        legend.position = "top",
        legend.title = element_text(face = "bold", size = 10),
        legend.text = element_text(size = 8),
        legend.key = element_rect(colour = "black"))

Here, I’m using geom_label() to plot the name of each drill in their respective position and, as alluded to above, applying a fill based on drill_type so we can see how these interact with each other also. I’m not personally a fan of the default colour scheme in ggplot2, so I’m using a different palette from the RColorBrewer package to customise this within scale_fill_manual(). The n = 4 in brewer.pal() represents the four different drill types so there’s a unique colour for each one. A neat function called number_format() from scales allows you to format the axis labels to the desired number of decimal places, where I’m using two (accuracy = 0.01) for consistency.

Interpretation

Drills that are clustered together on the ordination surface are similar based on the physical output they tend to elicit, whereas drills that are further apart, such as Drill 20 versus Drill 13, are dissimilar. In this mock example, we can say Drills 1, 3, 5, 6, 7, 12, 14 and 19 share an activity profile that is alike and we may expect the distances covered - including at high-speed - and accelerometer load accumulated to be similar amongst these drills. Conversely, despite Drills 16, 17, 18, 19 and 20 being tackling drills, they all produce a different physical output compared to one another based on their proximity on the plot.

Application

This is a relatively simple method to visualise the relationships between training drills using GPS and/or accelerometer data. A drill ordination plot may provide insight to coaches and practitioners about how athlete activity profile differs between drills and may be referred to when prescribing training.

If you collect GPS and/or accelerometer data during training and use this to profile your drills, give this plotting technique a try and hopefully it adds some value to your decision making around training prescription.

1.  Boyd, L.J., K. Ball, and R.J. Aughey, Quantifying external load in Australian football matches and training using accelerometers. International Journal of Sports Physiology and Performance, 2013. 8(1): p. 44-51.
2.  Corbett, D.M., et al., Development of physical and skill training drill prescription systems for elite Australian Rules football. Science and Medicine in Football, 2018. 2(1): p. 51-57.
3.  Hout, M.C., M.H. Papesh, and S.D. Goldinger, Multidimensional scaling. Wiley Interdiscip Rev Cogn Sci, 2013. 4(1): p. 93-103.
4.  Woods, C.T., et al., Non-metric multidimensional performance indicator scaling reveals seasonal and team dissimilarity within the National Rugby League. Journal of Science and Medicine in Sport, 2018. 21(4): p. 410-415.

Ben Horsley
Ben Horsley
PhD Candidate

Related