The following data set is given: [71, 50, 48, 67, 53]
The mean (the arithmetic average) of the data set can be calculated with the following code:
import pandas as pd
dataset = pd.Series([71, 50, 48, 67, 53])
x = dataset.mean()
print("Mean is :", x)
Mean is : 57.8
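The mean is the sum of the values divided by their count, 289 / 5 = 57.8. Here is a minimal sketch that computes it by hand and checks it against pandas. The variable names are illustrative only.

```python
import pandas as pd

values = [71, 50, 48, 67, 53]

# Arithmetic mean by hand: sum of the values divided by their count
manual_mean = sum(values) / len(values)

# The same statistic through pandas, for comparison
pandas_mean = pd.Series(values).mean()

print("Manual mean:", manual_mean)
print("pandas mean:", pandas_mean)
```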
A data set with a repeated value, such as [71, 48, 48, 67, 53], can also be described in table form. The table lists each distinct value x together with its frequency f(x):
x | f(x) |
---|---|
71 | 1 |
48 | 2 |
67 | 1 |
53 | 1 |
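A frequency table like this can be built directly from the raw values. The sketch below uses pandas' Series.value_counts(), which counts how often each distinct value occurs. The column names x and f(x) are chosen only to mirror the table above.

```python
import pandas as pd

values = pd.Series([71, 48, 48, 67, 53])

# Count the occurrences of each distinct value;
# sort=False skips sorting the result by frequency
freq = values.value_counts(sort=False)

# Turn the counts into a two-column table: value x and its frequency f(x)
table = freq.rename_axis("x").reset_index(name="f(x)")
print(table)
```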
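The weighted mean multiplies each value x by its frequency f(x), adds the products, and divides the total by the sum of the frequencies. It can be calculated with the following code: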
import pandas as pd
import numpy as np
ft = pd.DataFrame.from_dict({
'x': [71, 48, 67, 53],
'fx': [1, 2, 1, 1]
})
x = np.average(a = ft['x'], weights = ft['fx'])
print("Weighted mean is :", x)
Weighted mean is : 57.4
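The weighted mean can also be computed directly from its definition, (71·1 + 48·2 + 67·1 + 53·1) / 5 = 287 / 5 = 57.4. The sketch below does that. It then expands the table back into raw values with Series.repeat, to show that the result matches the plain mean of [71, 48, 48, 67, 53]. The table layout is the same assumed one as above.

```python
import pandas as pd

ft = pd.DataFrame({"x": [71, 48, 67, 53], "fx": [1, 2, 1, 1]})

# Weighted mean: sum of x * f(x), divided by the total frequency
weighted_mean = (ft["x"] * ft["fx"]).sum() / ft["fx"].sum()

# Repeat each value f(x) times to recover the raw data (48 appears twice)
expanded = ft["x"].repeat(ft["fx"].to_numpy())
plain_mean = expanded.mean()

print("Weighted mean:", weighted_mean)
print("Mean of expanded data:", plain_mean)
```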
The following data set is given: [71, 50, 48, 67, 53]
The median is the middle value once the data set is sorted in ascending order. It can be calculated with the following code:
import pandas as pd
dataset = pd.Series([71, 50, 48, 67, 53])
x = dataset.median()
print("Median is :", x)
Median is : 53.0
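For an odd number of values, the median can be found by hand: sort the values and take the one at index n // 2. This is a minimal sketch in plain Python, without pandas.

```python
values = [71, 50, 48, 67, 53]

# Sort the values; with an odd count the median sits at index n // 2
ordered = sorted(values)          # [48, 50, 53, 67, 71]
middle_index = len(ordered) // 2  # 5 // 2 == 2
median = ordered[middle_index]

print("Sorted:", ordered)
print("Median is :", median)
```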
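When a data set has an even number of values, such as [48, 50, 53, 65, 67, 71], there is no single middle value. The median is then the mean of the two middle values, (53 + 65) / 2: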
import pandas as pd
dataset = pd.Series([48, 50, 53, 65, 67, 71])
x = dataset.median()
print("Median is :", x)
Median is : 59.0
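The even-count case can also be sketched by hand. After sorting, the two middle values sit at indices n // 2 - 1 and n // 2, and their mean is the median.

```python
values = [48, 50, 53, 65, 67, 71]

# With an even count, average the two values either side of the center
ordered = sorted(values)
upper = len(ordered) // 2   # index 3 -> 65
lower = upper - 1           # index 2 -> 53
median = (ordered[lower] + ordered[upper]) / 2

print("Median is :", median)
```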
The mode is the value that occurs most frequently.
The mode of the data set [71, 50, 48, 48, 53] can be calculated with the following code:
import pandas as pd
dataset = pd.Series([71, 50, 48, 48, 53])
x = dataset.mode()
print("Mode is:", x[0])
Mode is: 48
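To see why 48 is the mode, count each value's occurrences and keep the value(s) with the highest count. This sketch does that with Series.value_counts(). It is an illustration of the definition, not how pandas implements mode() internally.

```python
import pandas as pd

dataset = pd.Series([71, 50, 48, 48, 53])

# Count the occurrences of every distinct value
counts = dataset.value_counts()

# The mode(s) are the values whose count equals the highest count
modes = counts[counts == counts.max()].index.tolist()

print("Counts:", counts.to_dict())
print("Mode(s):", modes)
```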
If every value in a data set occurs the same number of times, no value is more frequent than the others, so there is no single mode.
A data set can also have more than one mode. It is bimodal when two values tie for the highest frequency, as in [71, 48, 48, 50, 67, 67, 53]:
import pandas as pd
dataset = pd.Series([71, 48, 48, 50, 67, 67, 53])
x = dataset.mode()
print("Modes are:", x[0], ",", x[1])
Modes are: 48 , 67
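Series.mode() always returns a Series, so ties are reported rather than hidden. This sketch contrasts two cases: a data set where every value occurs once, for which pandas returns all values in sorted order, and the bimodal set above. Printing the whole result with tolist() avoids assuming a fixed number of modes.

```python
import pandas as pd

# Every value occurs exactly once, so none is more frequent than another
all_unique = pd.Series([71, 50, 48, 67, 53])
print("All values tie:", all_unique.mode().tolist())

# Two values tie for the highest count: a bimodal data set
bimodal = pd.Series([71, 48, 48, 50, 67, 67, 53])
modes = bimodal.mode()
print("Number of modes:", len(modes), "->", modes.tolist())
```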
The range is calculated by subtracting the lowest value in a data set from the highest value in the data set.
For the data set [71, 50, 48, 67, 53], the range is 71 - 48:
import pandas as pd
dataset = pd.Series([71, 50, 48, 67, 53])
x = dataset.max() - dataset.min()
print("Range is:", x)
Range is: 23
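NumPy offers the same computation as a single function. np.ptp ("peak to peak") returns the maximum minus the minimum. This is a minimal sketch on the same data.

```python
import numpy as np

values = np.array([71, 50, 48, 67, 53])

# np.ptp returns max - min, i.e. the range of the values
value_range = np.ptp(values)
print("Range is:", value_range)
```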
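The standard deviation measures how spread out the values are around the mean. By default, pandas' std() returns the sample standard deviation. It divides the sum of squared deviations from the mean by n - 1 and then takes the square root.
The standard deviation of the data set [71, 50, 48, 67, 53] is: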
import pandas as pd
dataset = pd.Series([71, 50, 48, 67, 53])
x = dataset.std()
print("Standard deviation is:", x)
Standard deviation is: 10.473776778220929
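The divisor can be changed with std()'s ddof parameter. ddof=1 (the default) treats the data as a sample and divides by n - 1. ddof=0 treats the data as the whole population and divides by n. The sketch below compares the two on the same data.

```python
import pandas as pd

dataset = pd.Series([71, 50, 48, 67, 53])

# Default ddof=1: sample standard deviation (divide by n - 1)
sample_std = dataset.std()

# ddof=0: population standard deviation (divide by n)
population_std = dataset.std(ddof=0)

print("Sample standard deviation:", sample_std)
print("Population standard deviation:", population_std)
```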
The variance is the square of the standard deviation. Like std(), pandas' var() divides by n - 1 by default. For the data set [71, 50, 48, 67, 53] it is:
import pandas as pd
dataset = pd.Series([71, 50, 48, 67, 53])
x = dataset.var()
print("Variance is:", x)
Variance is: 109.69999999999999
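The variance can be computed by hand from its definition. Sum the squared deviations from the mean (438.8 here), then divide by n - 1 = 4 to match pandas' default. This minimal sketch also confirms that it equals the squared standard deviation.

```python
import pandas as pd

values = [71, 50, 48, 67, 53]
n = len(values)
mean = sum(values) / n

# Sample variance: squared deviations from the mean, summed, divided by n - 1
variance = sum((v - mean) ** 2 for v in values) / (n - 1)

dataset = pd.Series(values)
print("Manual variance:", variance)
print("pandas variance:", dataset.var())
print("Squared std:", dataset.std() ** 2)
```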
The coefficient of variation is the ratio of the standard deviation to the mean. Because it measures spread relative to the mean, it can be used to compare the variability of data sets whose means differ.
Two datasets are given: [71, 50, 48, 67, 53] and [171, 150, 148, 167, 153]. Their coefficients of variation are:
import pandas as pd
def Variation(values):
    # Coefficient of variation: standard deviation divided by the mean
    return values.std() / values.mean()
dataset1 = pd.Series([71, 50, 48, 67, 53])
dataset2 = pd.Series([171, 150, 148, 167, 153])
x1 = Variation(dataset1)
x2 = Variation(dataset2)
print("Coefficient of variation:", x1)
print("Coefficient of variation:", x2)
Coefficient of variation: 0.18120721069586382
Coefficient of variation: 0.06637374384170425
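The second data set is the first one shifted by +100. The two sets therefore have the same standard deviation, and only the mean, and hence the coefficient of variation, differs. This sketch checks that and reports the coefficient as a percentage, a common but optional convention.

```python
import pandas as pd

dataset1 = pd.Series([71, 50, 48, 67, 53])
dataset2 = pd.Series([171, 150, 148, 167, 153])  # dataset1 shifted by +100

# Shifting every value by a constant leaves the standard deviation unchanged
print("Standard deviations:", dataset1.std(), dataset2.std())

# Dividing by the mean makes the set with the larger mean show a smaller
# relative spread; multiply by 100 to express it as a percentage
cv_percent = {}
for name, ds in [("dataset1", dataset1), ("dataset2", dataset2)]:
    cv_percent[name] = ds.std() / ds.mean() * 100
    print(f"{name}: CV = {cv_percent[name]:.2f}%")
```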