7.3 Statistical Visualizations

AlgebraOfGraphics.jl can perform statistical transformations as layers with five functions:

  • expectation: calculates the mean (expectation) of the underlying Y-axis column
  • frequency: computes the frequency (raw count) of the underlying X-axis column
  • density: computes the density (distribution) of the underlying X-axis column
  • linear: computes a linear trend relationship between the underlying X- and Y-axis columns
  • smooth: computes a smooth relationship between the underlying X- and Y-axis columns

Let’s first cover expectation:

    using AlgebraOfGraphics, CairoMakie

    plt = data(df) *
        mapping(:name, :grade) *
        expectation()
    draw(plt)

Figure 57: AlgebraOfGraphics bar plot with expectation.

Here, expectation adds a statistical transformation layer that tells AlgebraOfGraphics.jl to compute the mean of the Y-axis values for every unique X-axis value. In our case, it computed the mean of the grades for every student. Note that we can safely omit the visual transformation layer (visual(BarPlot)), since BarPlot is the default visual transformation for expectation.
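To make the aggregation behind expectation concrete, here is a minimal sketch in plain Julia that computes the mean Y value for every unique X value. It is not how AlgebraOfGraphics.jl implements expectation; the vectors `students` and `grades` are hypothetical stand-ins for the :name and :grade columns of df, and `group_means` is a hypothetical helper.

```julia
using Statistics  # for mean

# Hypothetical stand-ins for df.name (X axis) and df.grade (Y axis)
students = ["Bob", "Sally", "Alice", "Bob", "Sally", "Alice"]
grades = [5.0, 1.0, 8.5, 7.0, 2.0, 9.5]

# Hypothetical helper: collect the Y values of every unique X value,
# then average each group (one bar height per unique X value)
function group_means(xs, ys)
    groups = Dict{eltype(xs), Vector{eltype(ys)}}()
    for (x, y) in zip(xs, ys)
        push!(get!(groups, x, Vector{eltype(ys)}()), y)
    end
    return Dict(x => mean(g) for (x, g) in groups)
end

means = group_means(students, grades)
```

Each entry of `means` corresponds to one bar in a plot like Figure 57.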

Next, we’ll show an example with frequency:

    plt = data(df) *
        mapping(:name) *
        frequency()
    draw(plt)

Figure 58: AlgebraOfGraphics bar plot with frequency.

Here we pass just a single positional argument to mapping, since this is the column that frequency uses to calculate the raw count. As before, we can safely omit the visual transformation layer (visual(BarPlot)), since BarPlot is the default visual transformation for frequency.
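The raw count that frequency turns into bar heights can be sketched in plain Julia as a tally of how many rows share each unique value. This is only an illustration of the idea, not the library's implementation; `students` is a hypothetical stand-in for df.name and `raw_counts` is a hypothetical helper.

```julia
# Hypothetical stand-in for df.name, one entry per row
students = ["Bob", "Sally", "Alice", "Bob", "Sally", "Alice", "Bob"]

# Hypothetical helper: count how many rows each unique value appears in
function raw_counts(xs)
    counts = Dict{eltype(xs), Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

counts = raw_counts(students)
```

Only one column is needed, which is why mapping receives a single positional argument for frequency.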

Now, an example with density:

    plt = data(df) *
        mapping(:grade) *
        density()
    draw(plt)

Figure 59: AlgebraOfGraphics bar plot with density estimation.

Analogous to the previous examples, density does not need a visual transformation layer. Additionally, we only need to pass a single continuous variable as the only positional argument to mapping. density computes the distribution density of this variable; we then fuse all the layers together with * and visualize the plot with draw.
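As an illustration of what "distribution density" means here, the sketch below implements a Gaussian kernel density estimate by hand. The kernel, the bandwidth (the normal reference rule of thumb, h = 1.06 · σ · n^(-1/5)) and the evaluation grid are assumptions chosen for simplicity; the estimator inside density() may differ in all three. `grades` is a hypothetical stand-in for df.grade, and `kde` is a hypothetical helper.

```julia
using Statistics  # for std

# Hypothetical stand-in for the continuous df.grade column
grades = [5.0, 1.0, 8.5, 7.0, 2.0, 9.5]

# Standard normal pdf, used as the smoothing kernel
φ(u) = exp(-u^2 / 2) / sqrt(2π)

# Gaussian KDE: f̂(t) = 1/(n*h) * Σᵢ φ((t - xᵢ) / h)
# Default bandwidth: normal reference rule of thumb (an assumption)
function kde(xs, ts; h = 1.06 * std(xs) * length(xs)^(-1 / 5))
    n = length(xs)
    return [sum(φ((t - x) / h) for x in xs) / (n * h) for t in ts]
end

# Evaluate on an evenly spaced grid, as a density curve would be drawn
ts = range(minimum(grades) - 3, maximum(grades) + 3; length = 200)
dens = kde(grades, ts)
```

Because each kernel integrates to one and the sum is divided by n, the estimated curve also integrates to (approximately) one, which is what distinguishes a density from the raw counts of frequency.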

The last two statistical transformations, linear and smooth, are meant to be drawn on top of the raw data points, so they cannot simply be fused with the scatter layer using the * operator. This is because * fuses two or more layers into a single layer, and a single layer cannot show both the raw points and the fitted trend. Hence, we need to superimpose layers with the + operator. First, let's generate some data:

    using DataFrames

    x = rand(1:5, 100)              # 100 random integers between 1 and 5
    y = x .+ rand(100) .* 2         # x plus uniform noise in [0, 2)
    synthetic_df = DataFrame(; x, y)
    first(synthetic_df, 5)
    x     y
    1.0   1.96607080855035
    5.0   6.993454790161161
    5.0   5.709911916334441
    3.0   4.544011304772464
    4.0   5.382914898287261

Let’s begin with linear:

    plt = data(synthetic_df) *
        mapping(:x, :y) *
        (visual(Scatter) + linear())
    draw(plt)

Figure 60: AlgebraOfGraphics scatter plot with linear trend estimation.

We are using the distributive property (Section 7) to write more compact code: a * (b + c) = (a * b) + (a * c), where:

  • a: the data and mapping layers fused into a single layer
  • b: the visual transformation layer
  • c: the statistical linear transformation layer
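To see why the distributive property holds, here is a deliberately simplified toy model in plain Julia, assuming that * merges the attributes of every pair of layers and + places layers side by side. The `Toy` type and the attribute names are hypothetical and are not AlgebraOfGraphics.jl internals; the model only mimics the algebra described above.

```julia
# Toy model (hypothetical, NOT AlgebraOfGraphics internals):
# a Toy value is a list of layer specs; each spec is a Dict of attributes
struct Toy
    layers::Vector{Dict{Symbol,Any}}
end
Toy(spec::Dict{Symbol,Any}) = Toy([spec])

# `+` superimposes: keep every layer from both operands side by side
Base.:+(a::Toy, b::Toy) = Toy(vcat(a.layers, b.layers))
# `*` fuses: merge every layer of `a` with every layer of `b`
Base.:*(a::Toy, b::Toy) = Toy([merge(p, q) for p in a.layers for q in b.layers])
Base.:(==)(a::Toy, b::Toy) = a.layers == b.layers

a = Toy(Dict{Symbol,Any}(:data => "synthetic_df", :mapping => (:x, :y)))
b = Toy(Dict{Symbol,Any}(:visual => "Scatter"))
c = Toy(Dict{Symbol,Any}(:transformation => "linear"))

lhs = a * (b + c)    # compact form, as in the plot code
rhs = a * b + a * c  # expanded form
```

Both sides yield the same two layers: one with the data, mapping, and scatter visual, and one with the data, mapping, and linear transformation. Writing a only once is what makes the compact form shorter.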

linear adds a linear trend between the X- and Y-axis mappings with a 95% confidence interval shaded region.
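The trend line itself is presumably an ordinary least-squares fit of Y on X. The sketch below computes such a fit with Julia's left-division operator, which returns the least-squares solution for a tall matrix. It omits the 95% confidence band, and the noise-free vectors `xs` and `ys` are hypothetical data chosen so the coefficients are easy to check.

```julia
# Hypothetical, noise-free data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = 2 .* xs .+ 1

# Design matrix: a column of ones (intercept) next to the x values
X = [ones(length(xs)) xs]

# `\` on a tall matrix returns the least-squares solution β = [intercept, slope]
β = X \ ys
intercept, slope = β

# Points on the fitted trend line, e.g. for plotting
ŷ = X * β
```

With synthetic_df the recovered coefficients would only be approximate, since y contains random noise.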

Finally, the same example as before but now replacing linear with smooth:

    plt = data(synthetic_df) *
        mapping(:x, :y) *
        (visual(Scatter) + smooth())
    draw(plt)

Figure 61: AlgebraOfGraphics scatter plot with smooth trend estimation.

smooth adds a smooth trend between the X- and Y-axis mappings, estimated with a LOESS (local regression) fit instead of a single straight line.
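The idea behind a local-regression smoother can be sketched as follows: for every point t on a grid, fit a weighted straight line to the nearest `span` fraction of the data, with tricube weights that favor nearby points, and keep the fitted value at t. This is a simplified degree-1 sketch without robustness iterations; smooth() may use a different degree, span, and grid. `local_linear` and `tricube` are hypothetical helpers, and the data lie exactly on a line only so that the result is easy to check.

```julia
# Tricube kernel: weight 1 at distance 0, falling smoothly to 0 at distance 1
tricube(u) = abs(u) < 1 ? (1 - abs(u)^3)^3 : 0.0

# Local linear fit at point t, using the `span` fraction of nearest x values
function local_linear(xs, ys, t; span = 0.75)
    n = length(xs)
    k = clamp(ceil(Int, span * n), 2, n)      # number of neighbours in the window
    d = abs.(xs .- t)                          # distances from t
    h = max(sort(d)[k], eps()) * (1 + 1e-8)    # window radius: k-th smallest distance
    sw = sqrt.(tricube.(d ./ h))               # square roots of the weights
    X = [ones(n) (xs .- t)]                    # centred at t: intercept = fit at t
    β = (sw .* X) \ (sw .* ys)                 # weighted least squares
    return β[1]
end

# Hypothetical sanity-check data lying exactly on y = 3x - 2
xs = collect(1.0:10.0)
ys = 3 .* xs .- 2

# Evaluate the smoother on a grid, as a trend curve would be drawn
ts = range(1, 10; length = 50)
trend = [local_linear(xs, ys, t) for t in ts]
```

On perfectly linear data a local linear smoother reproduces the line exactly; on curved data such as real measurements, each local fit follows the curvature near t, which is what lets smooth bend where linear cannot.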

CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso