Statistics basics and Python, part 1
Some notes about statistics and Python
Mean, median, mode, average
We all know the most basic concept in statistics: the average, or mean. It is simply the sum of the elements divided by how many elements there are.
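A minimal sketch of the mean in plain Python. The function name `mean` and the sample data are hypothetical, chosen only for illustration:

```python
def mean(values):
    """Arithmetic mean: the sum of the elements divided by their count."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


data = [3, 1, 4, 1, 5, 9, 2, 6]
print(mean(data))  # 31 / 8 = 3.875
```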
The second is the median. To calculate it, the elements of the array must be sorted in either non-decreasing or non-increasing order. With an odd number of elements, the median is the single middle element; with an even number, it is the average of the two middle elements.
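A sketch of those rules, assuming numeric input. `sorted()` returns a new list, so the caller's data is left unchanged. The helper name `median` is hypothetical:

```python
def median(values):
    """Middle element of the sorted data, or the average of the two middle ones."""
    if not values:
        raise ValueError("median() requires at least one value")
    s = sorted(values)  # non-decreasing order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:  # odd count: one middle element
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even count: average of the two middle elements


print(median([3, 1, 4, 1, 5, 9, 2, 6]))  # sorted: 1 1 2 3 4 5 6 9 -> (3 + 4) / 2 = 3.5
print(median([7, 1, 5]))                 # sorted: 1 5 7 -> 5
```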
The last is the mode, the most frequently occurring value in a data set. In our case, if several values share the maximum frequency, we return the smallest one.
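One possible sketch, using `collections.Counter` to count frequencies (a choice made here for illustration, not necessarily how the original did it) and `min()` to break ties in favour of the smallest value:

```python
from collections import Counter


def mode(values):
    """Most frequent value; if several values tie, return the smallest of them."""
    if not values:
        raise ValueError("mode() requires at least one value")
    counts = Counter(values)
    top = max(counts.values())  # highest frequency
    return min(v for v, c in counts.items() if c == top)


print(mode([3, 1, 4, 1, 5, 9, 2, 6]))  # 1 appears twice -> 1
print(mode([3, 3, 2, 2, 5]))           # 3 and 2 tie -> the smaller one, 2
```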
The same is easier with Python's built-in statistics module.
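A short sketch with the standard `statistics` module. One caveat worth knowing: `statistics.mode()` does not pick the smallest value on ties (since Python 3.8 it returns the first mode encountered), so taking `min()` of `statistics.multimode()` is one way to keep the "smallest wins" rule:

```python
import statistics

data = [3, 1, 4, 1, 5, 9, 2, 6]

print(statistics.mean(data))    # 3.875
print(statistics.median(data))  # 3.5

# On ties, mode() returns the first most common value it meets (Python 3.8+),
# so min() over multimode() is used to return the smallest one instead.
tied = [3, 3, 2, 2, 5]
print(statistics.mode(tied))            # 3
print(min(statistics.multimode(tied)))  # 2
```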
The next concept is the weighted mean, which is common and very useful. Every value is multiplied by its weight, and the sum of these products is divided by the sum of the weights.
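A straightforward loop version of the weighted mean, sum(x_i * w_i) / sum(w_i). The names `weighted_mean`, `X` and `W` are hypothetical:

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(x_i * w_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = 0
    for x, w in zip(values, weights):
        total += x * w  # multiply every value by its weight
    return total / sum(weights)  # divide by the total weight


X = [2, 4, 6]
W = [1, 1, 2]
print(weighted_mean(X, W))  # (2 + 4 + 12) / 4 = 4.5
```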
It is more elegant with lambda and map.
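A sketch of that idea: when `map()` gets two iterables, it passes one element from each to the lambda, so the products can be summed in one expression:

```python
X = [2, 4, 6]
W = [1, 1, 2]

# map() pairs X[i] with W[i]; the lambda returns their product.
weighted_mean = sum(map(lambda x, w: x * w, X, W)) / sum(W)
print(weighted_mean)  # (2 + 4 + 12) / 4 = 4.5
```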
And if we work with statistics, we are probably using NumPy, and with NumPy …
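One way to do it in NumPy is `np.average()`, which accepts a `weights` argument; `np.mean()` and `np.median()` cover the unweighted cases. The arrays are made-up example data:

```python
import numpy as np

X = np.array([2, 4, 6])
W = np.array([1, 1, 2])

print(np.average(X, weights=W))  # weighted mean: 4.5
print(np.mean(X))                # plain mean: 4.0
print(np.median(X))              # 4.0
```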
Quartiles
Computing quartiles takes a few steps:
- sort the data and divide it into lower and upper halves
- take the median of the lower half (Q1), of the whole set (Q2), and of the upper half (Q3)
A semi-manual solution looks like this.
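A semi-manual sketch with a hand-written median. It assumes the convention where, for an odd number of elements, the middle element is left out of both halves; other quartile definitions include it or interpolate, so results can differ. The helper names are hypothetical:

```python
def median_sorted(values):
    """Median of a list that is already sorted."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, whole set and upper half."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    lower = s[: n // 2]        # lower half
    upper = s[(n + 1) // 2 :]  # upper half (skips the middle element when n is odd)
    return median_sorted(lower), median_sorted(s), median_sorted(upper)


print(quartiles([3, 7, 8, 5, 12, 14, 21, 13, 18]))  # (6.0, 12, 16.0)
```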
But it is easier to use the median function directly.
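The same split, this time delegating to `statistics.median()`. As before, excluding the middle element for an odd count is an assumed convention. The standard library also offers `statistics.quantiles()`, but it interpolates, so its cut points do not always match this halves method:

```python
import statistics


def quartiles(values):
    """Q1, Q2, Q3 via statistics.median (middle element excluded from halves when n is odd)."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    lower, upper = s[: n // 2], s[(n + 1) // 2 :]
    return statistics.median(lower), statistics.median(s), statistics.median(upper)


print(quartiles([3, 7, 8, 5, 12, 14, 21, 13, 18]))  # (6.0, 12, 16.0)
```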
Interquartile range
The only task here is to compute the difference between Q3 and Q1. Using the previous code, we can do something like this.
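A self-contained sketch of IQR = Q3 - Q1, using the same halves convention assumed for the quartiles (middle element excluded when the count is odd). The function name is hypothetical:

```python
import statistics


def interquartile_range(values):
    """IQR = Q3 - Q1, with quartiles taken as medians of the lower and upper halves."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    q1 = statistics.median(s[: n // 2])
    q3 = statistics.median(s[(n + 1) // 2 :])
    return q3 - q1


print(interquartile_range([3, 7, 8, 5, 12, 14, 21, 13, 18]))  # 16.0 - 6.0 = 10.0
```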
Standard deviation
To get the standard deviation, we have to compute the square root of the average squared distance of the values from their mean.
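Written out for N values x_1, ..., x_N, and assuming the population form (the one the pstdev() function computes), the quantity is:

$$
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2}
$$

The sample standard deviation differs only in dividing by N - 1 instead of N.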
A simple implementation could look like this.
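A minimal sketch of the population formula in plain Python; `population_stdev` is a hypothetical name and the data is made up so that the result is easy to check by hand:

```python
import math


def population_stdev(values):
    """Population standard deviation: sqrt(sum((x - mean)^2) / N)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n  # average squared distance
    return math.sqrt(variance)


print(population_stdev([2, 4, 4, 4, 5, 5, 7, 9]))  # mean 5, variance 32 / 8 = 4 -> 2.0
```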
And, as always, there is an easier way (attention: here we use the population standard deviation, the pstdev() function).
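A short comparison of the two standard-library functions on the same made-up data: `pstdev()` divides by N, while `stdev()` divides by N - 1:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.pstdev(data))  # population version (divides by N): 2.0
print(statistics.stdev(data))   # sample version (divides by N - 1): sqrt(32 / 7)
```

If you use NumPy instead, `np.std()` defaults to the population version (`ddof=0`); passing `ddof=1` gives the sample one.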