Statistics basics and Python: part 1

Some notes about stats and Python

Rafał Łagowski
Jun 21, 2022

Mean, median, mode, average

The most basic concept in statistics is the average, or mean. It is simply the sum of the elements divided by their count.

The second is the median. To calculate it, the elements of the array must be sorted in either non-increasing or non-decreasing order. With an odd number of elements, the median is the single middle element; with an even number, it is the average of the two middle elements.

The last is the mode, which is the most frequently occurring value in a data set. In our case, if several values share the same maximum frequency, we return the smallest one.
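A minimal sketch of the three definitions above in plain Python. The function names and the sample data are illustrative assumptions, not taken from the original post.

```python
from collections import Counter


def mean(values):
    """Sum of the elements divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle element of the sorted data, or the average of the two middle ones."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """Most frequent value; ties are resolved by returning the smallest one."""
    counts = Counter(values)
    highest = max(counts.values())
    return min(value for value, count in counts.items() if count == highest)


data = [4, 1, 2, 2, 3, 3]  # illustrative sample
print(mean(data), median(data), mode(data))  # 2.5 2.5 2
```

Both 2 and 3 occur twice in the sample, so the tie rule picks 2.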

The same is easier with the standard library (the statistics module)
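A short sketch with the standard library's statistics module, using an illustrative sample. One assumption to note: since Python 3.8, statistics.mode() returns the first mode it encounters rather than the smallest, so to keep the "smallest wins" rule this sketch takes the minimum of statistics.multimode().

```python
import statistics

data = [4, 1, 2, 2, 3, 3]  # illustrative sample

average = statistics.mean(data)
middle = statistics.median(data)
# multimode() (Python 3.8+) lists every value with the highest frequency;
# min() applies the "return the smallest one" tie rule.
smallest_mode = min(statistics.multimode(data))

print(average, middle, smallest_mode)  # 2.5 2.5 2
```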

The next concept is the weighted mean, which is very common and very useful. We multiply every value by its weight, sum the products, and divide the result by the sum of the weights.
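A straightforward loop version could look like the sketch below. The function name and the sample values and weights are made up for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    total = 0
    for value, weight in zip(values, weights):
        total += value * weight
    return total / sum(weights)


values = [10, 40, 30]   # illustrative values
weights = [1, 2, 3]     # illustrative weights
print(weighted_mean(values, weights))  # (10 + 80 + 90) / 6 = 30.0
```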

A more elegant version uses a lambda and map
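One possible way to write it with map and a lambda, again with illustrative names and data. When map() receives two iterables, it passes one element from each to the function.

```python
def weighted_mean(values, weights):
    """Weighted mean built from map() and a two-argument lambda."""
    products = map(lambda value, weight: value * weight, values, weights)
    return sum(products) / sum(weights)


values = [10, 40, 30]   # illustrative values
weights = [1, 2, 3]     # illustrative weights
print(weighted_mean(values, weights))  # 30.0
```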

And if we are playing with stats, we are probably using NumPy, and with NumPy it gets shorter still
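With NumPy the weighted mean is a single call to np.average with its weights argument. The sketch assumes NumPy is installed; the sample arrays are illustrative.

```python
import numpy as np

values = np.array([10, 40, 30])   # illustrative values
weights = np.array([1, 2, 3])     # illustrative weights

# np.average computes sum(values * weights) / sum(weights)
result = np.average(values, weights=weights)
print(result)  # 30.0
```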

Quartiles

To compute quartiles we need to perform a few operations.

  1. sort the set and divide it into lower and upper halves
  2. take the median of the lower half (Q1), of the whole set (Q2), and of the upper half (Q3)

A semi-manual solution looks like this
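A sketch of the two steps with a hand-written median. It assumes one common convention: when the number of elements is odd, the middle element belongs to neither half. Other quartile conventions exist and give slightly different results. Names and data are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) for a list with at least two elements."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # Assumption: for an odd count, skip the middle element in both halves.
    upper = ordered[half + 1:] if len(ordered) % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # illustrative sample
print(quartiles(data))  # (6.0, 12, 16.0)
```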

But it is easier to use the median function directly
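The same idea using statistics.median from the standard library, which sorts its input itself. The halves are still cut from a sorted copy, under the same assumption that an odd-length set drops its middle element from both halves.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) for a list with at least two elements."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                  # first `half` elements
    upper = ordered[len(ordered) - half:]   # last `half` elements
    return median(lower), median(ordered), median(upper)


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # illustrative sample
print(quartiles(data))  # (6.0, 12, 16.0)
```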

Interquartile range

The only thing left is to compute the difference between Q3 and Q1. Reusing the previous code, we can do something like this
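A self-contained sketch of the interquartile range, Q3 minus Q1, using the same halving convention as before (middle element excluded for odd-length data). The function name and sample are illustrative.

```python
from statistics import median


def interquartile_range(values):
    """Q3 - Q1, where Q1 and Q3 are the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    return q3 - q1


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # illustrative sample
print(interquartile_range(data))  # 16.0 - 6.0 = 10.0
```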

Standard deviation

For the standard deviation we have to compute the square root of the average squared deviation from the mean
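Assuming the population form (the one that pstdev() computes), with N elements x_1, ..., x_N and mean μ, this can be written as:

```latex
\[
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\]
```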

A simple implementation could be
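A minimal sketch of the population standard deviation, dividing by N rather than N - 1. The function name and the sample are illustrative.

```python
import math


def population_std(values):
    """Square root of the mean squared deviation from the mean."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample, mean is 5
print(population_std(data))  # sqrt(32 / 8) = 2.0
```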

And, as always, there is an easier way. Attention: in this case we use the population standard deviation, that is, the pstdev() function.
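A short sketch with the statistics module on an illustrative sample. The call to stdev() is included only to show the contrast: it divides by N - 1 and therefore returns a slightly larger value.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample

population = statistics.pstdev(data)  # divides by N
sample = statistics.stdev(data)       # divides by N - 1, shown for contrast

print(population)  # 2.0
print(sample)
```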
