FIZ353 - Numerical Analysis | 04/12/2020
Emre S. Tasci emre.tasci@hacettepe.edu.tr
A fair, 6-sided dice has equal probabilities for each of the outcomes between 1-6 and this can be represented in a graph such as:
import numpy as np
import matplotlib.pyplot as plt
fair_pdf = np.ones(6) * 1./6
plt.bar(np.arange(1,7),fair_pdf)
plt.yticks([0,1/12,1/6],["0","1/12","1/6"])
plt.title("PDF of a fair 6-sided dice")
plt.show()
These kind of probability distribution functions (PDFs) where every possible outcome has equal probability are called as Uniform Distributions.
Now consider that we have a "loaded dice" such that its PDF is given by:
loaded_pdf = np.array([1/8, 0, 2/8, 1/8, 3/8, 1/8])
plt.bar(np.arange(1,7),loaded_pdf)
plt.yticks([0,1/8,1/4,3/8],["0","1/8","1/4","3/8"])
plt.title("PDF of a loaded 6-sided dice")
plt.show()
and we are using it to draw our data.
We will assume that we only have a randomizer function that uses uniform distribution.
numpy's random.rand() is such a function that generates a random number uniformly (where each number has an equal probability) between 0 and 1 [0,1), i.e., 0 inclusive, 1 exclusive.
import numpy as np
# Fixing the seed so that the results are repeatable
np.random.seed(353)
# Returns a single random number between 0 and 1 [0,1)
r = np.random.rand()
print("r:",r,"\n")
# Returns a (2x3) matrix filled with random numbers [0,1)
m = np.random.rand(2,3)
print("m:\n",m,"\n")
# To pick a number between a and b,
#all we need to do is multiply the rand by (b-a) and add a, e.g.,
random_number_between_7_and_11 = np.random.rand() * (11-7) + 7
print("r_(7-11): ",random_number_between_7_and_11,"\n")
# To pick a random integer from [a,b):
random_integer_between_7_and_11 = np.floor(np.random.rand() * (11-7) + 7)
print("int_(7-11): ",random_integer_between_7_and_11,"\n")
# ... or we can use the built-in numpy.random.randint(min,max,size):
print(np.random.randint(7,11,[2, 3]))
In our first approach we will put the possible outcomes of our loaded dice into a box, such that the occurence of each of the possible outcomes will reflect the given PDF, e.g., out of a total 8 entries, 1 of them will be 1, no 2, 2 of them will be 3, etc..
loaded_pdf = np.array([1/8, 0, 2/8, 1/8, 3/8, 1/8])
plt.bar(np.arange(1,7),loaded_pdf)
plt.yticks([0,1/8,1/4,3/8],["0","1/8","1/4","3/8"])
plt.title("PDF of a loaded 6-sided dice")
plt.show()
box_contents = np.array([1,3,3,4,5,5,5,6])
So we will pick one from the box, record it and will then put it back. Let's do this operation 10 times:
drawn_numbers = np.array([])
for i in range(1,11):
# We have 8 elements in our box, so randomly pick one
r = np.random.randint(0,8) # r is an integer from [0,7]
rth_element = box_contents[r]
drawn_numbers = np.append(drawn_numbers,rth_element)
print(drawn_numbers)
drawn_numbers
f = drawn_numbers==5
sum(f)
# Let's calculate the frequencies of occurences:
freqs = np.zeros(7)
for i in range(1,7):
freqs[i] = np.sum(drawn_numbers == i)
print ("Number of occurences of %d: %d"%(i,freqs[i]))
# From here on, calculating the occurence probabilities are straightforward:
probs = freqs / 10.
probs = probs[1:7] # Taking out the dummy 0th entry
print(probs)
# We will use the original probability dist for comparison
probs_real = np.array([0, 1/8, 0, 2/8, 1/8, 3/8, 1/8])
plt.bar(range(1,7),probs)
plt.bar(range(1,7),probs_real[1:7],color="none",edgecolor="red")
plt.show()
Not quite what we started from in the first place.
Let's collect all the code we have written so far and also increase the number of drawings from the box:
import numpy as np
import matplotlib.pyplot as plt
# We will use the original probability dist for comparison
probs_real = np.array([0, 1/8, 0, 2/8, 1/8, 3/8, 1/8])
N = 10000; # Number of drawings
box_contents = np.array([1,3,3,4,5,5,5,6])
drawn_numbers = np.array([])
for i in range(0,N):
# We have 8 elements in our box, so randomly pick one
r = np.random.randint(0,8) # r is an integer from [0,7]
rth_element = box_contents[r]
drawn_numbers = np.append(drawn_numbers,rth_element)
# Let's calculate the frequencies of occurences:
freqs = np.zeros(7)
for i in range(1,7):
freqs[i] = np.sum(drawn_numbers == i)
print ("Number of occurences of %d: %d"%(i,freqs[i]))
probs = freqs / N
probs = probs[1:7] # Taking out the dummy 0th entry
print("\nprobabilities: \n",probs)
plt.bar(range(1,7),probs)
plt.bar(range(1,7),probs_real[1:7],color="none",edgecolor="red")
plt.title("PDF of a loaded 6-sided dice after %d rolls"%(N))
plt.show()
And thus, we recover our initial distribution.
Most of the time, we don't have a small number of options (in the limit, we will be talking about continous outcomes), or even rational probabilities meaning that, we are not able to construct a sample box representing the distribution precisely. In those times, we literally employ the probabilities:
Codewise, this translates to:
import numpy as np
import matplotlib.pyplot as plt
# The first 0 is to cover the dummy 0th value
probs = np.array([0, 1/8, 0, 2/8, 1/8, 3/8, 1/8])
N = 10000
drawn_numbers = np.array([])
while drawn_numbers.size<=N:
# First pick a random integer from [1,6]
r = np.random.randint(1,7)
# It's happening probability is probs[r],
# so, roll a dice, if the outcome is less than the
# probability, accept it, else, reject it
if(np.random.rand()<probs[r]):
drawn_numbers = np.append(drawn_numbers,r)
# Calculate the frequencies of occurence & probabilities from the set:
freqs = np.zeros(7)
for i in range(1,7):
freqs[i] = np.sum(drawn_numbers == i)
print ("Number of occurences of %d: %d"%(i,freqs[i]))
probs2 = freqs / N
probs2 = probs2[1:7] # Taking out the dummy 0th entry
print("\nCalculated probabilities: \n",probs2)
plt.bar(range(1,7),probs2)
plt.bar(range(1,7),probs[1:7],color="none",edgecolor="red")
plt.title("PDF of a loaded 6-sided dice after %d rolls"%(N))
plt.show()
A continous distribution is characterized by a normalized PDF p(x) such that: \begin{equation*} \int_{-\infty}^{\infty}{p(x)\,\mathrm{d}x } = 1 \end{equation*}
Gaussian (Normal) distribution is one such distribution that occurs very frequently in nature. Its (normalized) form is given by:
\begin{equation*} G(x|\mu,\sigma) = {{1} \over {\sqrt{2\pi\sigma^2}}}\exp\left[-{{(x-\mu)^2}\over{2\sigma^2}}\right] \end{equation*}where $(\mu,\, \sigma)$ are two variables that characterizes the symmetry center and the spanning of the gaussian curve (later, we'll see that they correspond to the mean & standard deviation of an ideal Gaussian distribution).
We can define and use the Gaussian PDF:
import numpy as np
def gauss(x,mu,sigma):
return 1/(np.sqrt(2*np.pi)*sigma)*np.exp(-(x-mu)**2/(2*sigma**2))
print(gauss(3,4,1.5))
or, we can use the built-in function:
from scipy.stats import norm
x = 3.0
mu = 4.0
sigma = 1.5
print(norm.pdf(x,mu,sigma))
As always, built-in functions are cheaper, more optimized, and faster, thus we'll retain using them whenever we can.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
x = np.linspace(-2,10,1000)
mu = 4.0
sigma = 1.5
g = norm.pdf(x,mu,sigma)
plt.plot(x,g)
plt.show()
norm.pdf([2,6],mu,sigma)
To draw from a continous distribution, we can employ the second approach from the discrete distributions. The corresponding built-in function is the scipy.stats.norm.rvs(mu, sigma, size), e.g.:
from scipy.stats import norm
import matplotlib.pyplot as plt
mu = 4.0
sigma = 1.5
N = 1000
r = norm.rvs(mu,sigma,N)
#print(r)
plt.plot(r,"*-")
plt.show()
plt.hist(r,10)
plt.show()
np.logical_and([True,False],[True,True])
sum(r == 4)
for i in np.arange(0.5,20):
s = sum(np.logical_and((r <= i+1), (r>i)))
print("{:} : {:d}".format(i,s))
Let's work on the probabilities:
The possible paths:
1
L R
LL LR|RL RR
LLL LLR|LRL|RLL LRR|RLR|RRL RRR
Now we're going to simulate this situation. We will N levels. One can see that each level has number of pegs equal to the level number.
import numpy as np
N = 4 # Number of levels.
ball_position = 0 # Center
path = ""
for n in range(1,N):
if(np.random.rand()<0.5):
ball_position -= 1 # Go left:
path += "L"
else:
ball_position += 1 # Go right
path += "R"
print(ball_position)
print(path)
Now let's repeat this experiment with many balls (M = 10000) and higher number of levels (N = 10):
import numpy as np
import matplotlib.pyplot as plt
M = 10000 # Number of balls (/trials)
N = 9 # Number of levels.
storage = np.zeros(N*2)
for m in range(M):
ball_position = 0 # Center
for n in range(1,N):
if(np.random.rand()<0.5):
ball_position -= 1 # Go left:
else:
ball_position += 1 # Go right
#print(ball_position)
storage[ball_position+N] += 1
print(storage)
storage_filtered = storage[storage>0]
print(storage_filtered)
plt.bar(np.arange(1,len(storage_filtered)+1),storage_filtered)
plt.xticks(range(len(storage_filtered)))
plt.show()
When we have a very high number of trials with very low probabilities (like the number of flights & probability of crash, or number of radioactive atoms & probability of decay), the binomial distribution takes the form of:
$$P(k) = \frac{\lambda^k e^{-\lambda}}{k!}$$(here, $\lambda = Np$, with $n$ being the number of trials and $p$ is the probability of the event occuring. As one is very high whereas the other is very low, their product tends to be a reasonable number).
This formula describes the probability of a very-low chance outcome ($p\rightarrow0$) happening k times out of a very high (practically infinite) number of drawings/tries/choosings ($N\rightarrow\infty$) is called as the Poisson Distribution.
import scipy as sp
def poisson(l,k):
return (l**k*np.exp(-l)/sp.special.factorial(k))
137Cs is a radioactive element with a half-life of 27 years. Thus the probability that a single Cs atom will decay now is :
years27toseconds = 27*365*24*60*60 # s
p = 1/years27toseconds #s^(-1)
print(p)
This is a very small probability. But in 1000 ng, there are around 1010 Cs atoms. Each one has a chance to decay, therefore we will have:
$$\lambda = Np = (10^{10})(1.174\times10^{-9}s^{-1})=11.74\text{ decays/s}$$N = 1E10
l = N*p
print("{:g}".format(l))
k=np.arange(0,25)
p_k = poisson(l,k)
print(p_k)
plt.bar(k,p_k)
plt.xticks(k)
plt.show()