# A Course on Elementary Probability Theory

###### Abstract.

(English) This book introduces probability theory from the very beginning. Assuming that the reader has the normal mathematical level acquired at the end of secondary school, we aim to equip him with a solid basis in probability theory. The theory is preceded by a general chapter on counting methods. Then, probability theory is presented in a discrete framework. Two objectives are sought. The first is to give the reader the ability to solve a large number of problems related to probability theory, including application problems in a variety of disciplines. The second is to prepare the reader for the textbook on the mathematical foundations of probability theory. In that book, the reader will concentrate more on mathematical concepts, while the present text mostly dwells on experimental frameworks. If both objectives are met, the reader will have acquired a solid experience in solving problems with the tools of probability theory and, at the same time, will be ready to move on to a theoretical course on probability based on measure theory and integration. The book ends with a chapter that allows the reader to begin an intermediate course in mathematical statistics.


Gane Samb LO

A Course on Elementary Probability Theory

Statistics and Probability African Society (SPAS) Books Series.

Saint-Louis, Calgary, Alberta. 2016.

ISBN 978-2-9559183-3-3

SPAS TEXTBOOKS SERIES

GENERAL EDITOR of SPAS EDITIONS

Prof Gane Samb LO
gane-samb.lo@ugb.edu.sn, gslo@ugb.edu.ng
Gaston Berger University (UGB), Saint-Louis, SENEGAL.
African University of Sciences and Technology, AUST, Abuja, Nigeria.

ASSOCIATED EDITORS

KEHINDE DAHUD SHANGODOYIN
shangodoyink@mopipi.ub.bw
University of Botswana

Blaise SOME
some@univ-ouaga.bf
Chairman of LANIBIO, UFR/SEA
Ouaga I Pr Joseph Ki-Zerbo University.

Gaston Berger University, Senegal.

Tchilabalo Abozou KPANZOU
kpanzout@yahoo.fr
Kara University, Togo.

List of published books

SPAS Textbooks Series.

Weak Convergence (IA). Sequences of random vectors.
Par Gane Samb LO, Modou Ngom and Tchilabalo Abozou KPANZOU. (2016) Doi : 10.16929/sbs/2016.0001. ISBN : 978-2-9559183-1-9 (English).

A Course on Elementary Probability Theory.
Gane Samb LO (March 2017)
Doi : 10.16929/sbs/2016.0003. ISBN : 978-2-9559183-3-3 (English).

www.statpas.net/spaseds/

Gane Samb LO, 1958-

A Course on Elementary Probability Theory.

SPAS Books Series, 2016.

DOI : 10.16929/sbs/2016.0003

ISBN 978-2-9559183-3-3

Author : Gane Samb LO

Emails:
gane-samb.lo@ugb.edu.sn, ganesamblo@ganesamblo.net, gslo@aust.edu.ng

Url’s:
www.ganesamblo@ganesamblo.net
www.statpas.net/cva.php?email.ganesamblo@yahoo.com.

Affiliations.
Main affiliation : University Gaston Berger, UGB, SENEGAL.
African University of Sciences and Technology, AUST, Abuja, Nigeria.
Affiliated as a researcher to : LSTA, Pierre et Marie Curie University, Paris VI, France.

Teaches or has taught at the graduate level in the following universities:
Saint-Louis, Senegal (UGB)
Banjul, Gambia (TUG)
Bamako, Mali (USTTB)
African Institute of Mathematical Sciences, Mbour, SENEGAL, AIMS.
Franceville, Gabon

Acknowledgment of Funding.

The author acknowledges continuous support of the World Bank Excellence Center in Mathematics, Computer Sciences and Intelligence Technology, CEA-MITIC. His research projects in 2014, 2015 and 2016 are funded by the University of Gaston Berger in different forms and by CEA-MITIC.

Keywords. Combinatorics; Discrete counting; Elementary Probability; Equi-probability; Events and operation on events; Independence of events; Conditional probabilities; Bayes’ rules; Random variables; Discrete and continuous random variables; Probability laws; Probability density functions; Cumulative distribution functions; Independence of random variables; Usual probability laws and their parameters; introduction to statistical mathematics.

AMS 2010 Classification Subjects : 60GXX; 62GXX.

Dedication.

To our beloved and late sister Khady Kane LO

27/07/1953 - 7/11/1988

## General Preface

This textbook is part of a series whose ambition is to cover a broad part of Probability Theory and Statistics. These textbooks are intended to help learners and readers of all levels to train themselves.

As well, they may constitute helpful documents for professors and teachers for both courses and exercises. For more ambitious people, they are only starting points towards more advanced and personalized books. So, these texts are kindly put at the disposal of professors and learners.

Our textbooks are classified into categories.

A series of introductory books for beginners. Books of this series are usually accessible to first-year university students. They do not require advanced mathematics. Books on elementary probability theory and descriptive statistics belong to that category. Books of that kind are usually introductions to more advanced and mathematical versions of the same theory. The first prepare for the applications of the second.

A series of books oriented towards applications. Students or researchers in closely related disciplines such as Health studies, Hydrology, Finance, Economics, etc. may be in need of Probability Theory or Statistics. They are not interested in these disciplines for themselves. Rather, they need to apply their findings as tools to solve their specific problems. So adapted books on Probability Theory and Statistics may be composed focusing on the applications in such fields. A perfect example concerns the need for mathematical statistics among economists who do not necessarily have a good background in Measure Theory.

A series of specialized books on Probability Theory and Statistics of high level. This series begins with a book on Measure Theory, its counterpart in probability theory, and an introductory book on topology. On that basis, we will have, as much as possible, a coherent presentation of branches of Probability Theory and Statistics. We will try to make the series as self-contained as possible, so that anything we need will be in it.

Finally, research monographs close this architecture. The architecture should be so large and deep that the readers of monograph booklets will find all needed theories and inputs in it.

We conclude by saying that, with only an undergraduate level, the reader will open the door to anything in Probability Theory and Statistics with Measure Theory and integration. Once this course is validated, eventually combined with two solid courses on topology and functional analysis, he will have all the means to get specialized in any branch of these disciplines.

Our collaborators and former students are invited to keep this effort alive and to develop it so that the center of Saint-Louis becomes, or continues to be, a renowned mathematical school, especially in Probability Theory and Statistics.

## Preface of the first edition 2016

The current series on Probability Theory and Statistics is based on two introductory books for beginners : A Course on Elementary Probability Theory and A Course on Descriptive Statistics.

All the more or less advanced probability courses are preceded by this one. We strongly recommend not skipping it. It has the tremendous advantage of making the reader feel the essence of probability theory by using random experiences extensively. The mathematical concepts come only after a complete description of a random experience.

This book introduces probability theory from the very beginning. Assuming that the reader has the normal mathematical level acquired at the end of secondary school, we aim to equip him with a solid basis in probability theory. The theory is preceded by a general chapter on counting methods. Then, probability theory is presented in a discrete framework.

Two objectives are sought. The first is to give the reader the ability to solve a large number of problems related to probability theory, including application problems in a variety of disciplines. The second is to prepare the reader for the textbook on the mathematical foundations of probability theory. In that book, the reader will concentrate more on mathematical concepts, while the present text mostly dwells on experimental frameworks. If both objectives are met, the reader will have acquired a solid experience in solving problems with the tools of probability theory and, at the same time, will be ready to move on to a theoretical course on probability based on measure theory and integration.

The book ends with a chapter that allows the reader to begin an intermediate course in mathematical statistics.

To my late and beloved sister Khady Kane Lo (1953 - 1988).

Saint-Louis, Calgary, Abuja, Bamako, Ouagadougou, 2017.

Preliminary Remarks and Notations.


## Introduction

There exists a tremendous number of random phenomena in nature, real life and experimental sciences.

Almost everything is random in nature : weather, occurrences of rain and their durations, the number of double stars in a region of the sky, lifetimes of plants, humans, and animals, the life span of a radioactive atom, phenotypes of the offspring of plants or of any biological beings, etc.

The general theory states that each phenomenon has a structural part (that is deterministic) and a random part (called the error or the deviation).

Randomness also appears in conceptual experiments : tossing a coin once or 100 times, throwing three dice, arranging a deck of cards, matching two decks, playing roulette, etc.

Everyday human life is subject to randomness : waiting times for buses, traffic, the number of calls on a telephone, the number of busy lines in a communication network, the sex of a newborn, etc.

The reader is referred to [Feller (1968)] for a more diverse and rich set of examples.

The quantitative study of random phenomena is the objective of Probability Theory and Statistics Theory. Let us give two simple examples to briefly describe each of these two disciplines.

### 0.1. The point of view of Probability Theory

In Probability Theory, one assigns a chance of realization to a random event before its realization, taking into account the available information.

Example : A good coin, that is, a homogeneous and well-balanced one, is tossed. Knowing that the coin cannot stand on its rim, there is a 50% chance of getting a tail.

We base our conclusion on the lack of any reason to favor one of the possible outcomes : head or tail. So we agree that these outcomes are equally probable and thus get a 50% chance of occurrence for each of them.

### 0.2. The point of view of Statistics

Let us start with an example. Suppose that we have a coin and we do not know anything about the material structure of the coin. In particular, we doubt that the coin is homogeneous.

We decide to toss it repeatedly and to progressively monitor the occurring frequency of heads. We denote by N_n the number of heads obtained after n tossings and define the frequency of heads by

 F_n = N_n / n.

It is conceivable to say that the stability of F_n in the neighborhood of some value is an important piece of information about the structure of the coin.

In particular, if we observe that the frequency does not deviate from 1/2 by more than 10^{-3} whenever the number of tossings n is large enough, that is,

 |F_n − 1/2| ≤ 10^{-3}

for large n, we will be keen to say that the probability of occurrence of the head is 1/2 and, by this, we accept that the coin is fair.

Based on the data (also called the statistics), we estimated the probability of having a head at 1/2 and accepted the pre-conceived idea (hypothesis) that the coin is good in the sense of homogeneity.

The reasoning we made and the method we applied are perfect illustrations of Statistical Methodology : estimation and model or hypothesis validation from data.

In conclusion, Statistics Theory enables the use of the data (also called statistics or observations), to estimate the law of a random phenomenon and to use that law to predict the future of the same phenomenon (inference) or to predict any other phenomenon that seems identical to it (extrapolation).
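The frequency-monitoring procedure described above can be sketched numerically. The book's own numerical examples use R; here is an equivalent minimal simulation in Python, where the number of tosses and the seed are arbitrary, hypothetical choices:

```python
import random

random.seed(42)  # fixed seed so the experiment is reproducible

n = 100_000                                            # number of tosses
heads = sum(random.random() < 0.5 for _ in range(n))   # N_n: number of heads
freq = heads / n                                       # F_n = N_n / n

print(abs(freq - 0.5) <= 1e-2)  # the frequency stabilizes near 1/2
```

For a fair coin, F_n settles in the neighborhood of 1/2 as n grows, which is exactly the stability property invoked in the discussion.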

### 0.3. Comparison between the two approaches

(1) The discipline of Statistics Theory and that of Probability Theory are two ways of treating the same random problems.

(2) The first is based primarily on the data to draw conclusions.

(3) The second is based on theoretical and mathematical considerations to establish theoretical formulas.

(4) Nevertheless, the Statistics discipline may be seen as the culmination of Probability Theory.

### 0.4. Brief presentation of the book

This book is an introduction to Elementary Probability Theory. It is the result of many years of teaching the discipline in Universities and High schools, mainly in Gaston Berger Campus of Saint-Louis, SENEGAL.

It is intended for those who have never studied the discipline before. It focuses on the essential points of the theory. It is particularly adapted to undergraduate students in their first years of university.

The basic idea of this book consists of introducing Probability Theory, and the notions of events and random variables, in discrete probability spaces. In such spaces, we discover, as much as possible at this level, the fundamental properties and tools of the theory.

So, we do not need at this stage the elaborate notion of σ-algebras or fields. This useful and brilliant method has already been used, for instance, in Billingsley [Billingsley (1995)] for deep and advanced probability problems.

The book finishes with a bridge chapter towards a medium course of Mathematical Statistics. In this chapter, we use an analogy method to express the former results in a general shape, which will be the first chapter of the aforementioned course.

The reader will have the opportunity to master the tools he will be using in this course through the course on the mathematical foundations of Probability Theory he will find in Lo [Lo (2016)], which is an element of our Probability and Statistics series. But, as explained in the prefaces, the reader will have to prepare himself with a course on measure theory.

In this computer-dominated world, we are lucky to have very powerful and free software packages like R and Scilab, to cite only the most celebrated. We seized this tremendous opportunity to provide numerical examples using the R software throughout the text.

The remainder of the book is organized as follows.

Chapter 1 is a quick but sufficient introduction to combinatorial analysis. The student interested in furthering his knowledge of this subject may refer to a course on general algebra.

Chapter 2 is devoted to an introduction to probability measures in discrete spaces. The notion of equi-probability is also dealt with there.

Conditional probability and independence of random events are addressed in Chapter 3.

Chapter 4 introduces random variables, and presents a review of the usual probability laws.

Chapter 5 is devoted to the computations of the parameters of the usual laws. Mastering the results of this chapter and those in Chapter 4 is in fact of great importance for higher level courses.

Distribution functions of random variables are studied in Chapter 7, which introduces non-discrete random variables.

## Chapter 1 Elements of Combinatorics

Here, we are concerned with counting the cardinalities of subsets of a reference set E, following specific rules. We begin with a general counting principle.

The cardinality of a set E is its number of elements, denoted by Card(E) or #(E). For an infinite set E, we have

 Card(E) = #(E) = +∞.

### 1.1. General Counting principle

Let us consider the following example : A student has two (2) skirts and four (4) pairs of pants in the closet and decides to pick at random a skirt and a pair of pants to get dressed. In how many ways can the outfit be chosen? Surely, there are 2 choices for the skirt and, for each of these choices, four possibilities for the pants. In total, there are

 2 × 4 = 8

ways to dress.
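The outfit count can be checked by brute-force enumeration. A minimal Python sketch, where the item labels are hypothetical:

```python
import itertools

skirts = ["s1", "s2"]                    # 2 skirts (hypothetical labels)
pants = ["p1", "p2", "p3", "p4"]         # 4 pairs of pants

# Every outfit is one (skirt, pants) pair: the Cartesian product
outfits = list(itertools.product(skirts, pants))
print(len(outfits))  # 2 * 4 = 8
```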

We applied the following general counting principle.

Proposition 2.1. Suppose that a set E of size n can be partitioned into n_1 subsets of the same size; that each of these subsets can be split into n_2 subsets of the same size; that each of the new subsets can be divided into n_3 subsets of the same size also; and so on. Suppose that we may proceed like that up to an order k, ending with subsets of common size a.

Then the cardinality of E is given by

 n = n_1 × n_2 × ... × n_k × a.

Proof. Denote by B_h the common cardinality of the subsets generated at step h, for 0 ≤ h ≤ k, with B_0 = n (the whole set) and B_k = a. At step h, each subset of size B_h is partitioned into n_{h+1} subsets of the same size B_{h+1}. Then we have

 B_h = n_{h+1} × B_{h+1} for all 0 ≤ h ≤ k − 1.

Now, the proof is achieved by induction in the following way :

 n = B_0 = n_1 × B_1 = n_1 × n_2 × B_2 = ... = n_1 × n_2 × ... × n_k × B_k,

with B_k = a.

Although this principle is simple, it is a fundamental tool in combinatorics : divide and count.

But it is not always applied in a so simple form. Indeed, partitioning is the most important skill to develop in order to apply the principle successfully.

This is what we will be doing in all this chapter.

### 1.2. Arrangements and permutations

Let E be a set of n elements, with n ≥ 1.

Definition 2.1. A p-tuple of E is an ordered collection of p distinct elements of E. A p-tuple is also called a p-permutation or p-arrangement of the elements of E, and the number of p-permutations of n elements is denoted A_n^p.

We have the result.

Theorem 2.1. For all 1 ≤ p ≤ n, we have

 A_n^p = n(n−1)(n−2)···(n−p+1).

Proof. Set E = {x_1, ..., x_n}. Let A be the class of all ordered subsets of p elements of E. We are going to apply the counting principle to A.

It is clear that A may be divided into n subsets A_1, ..., A_n, where each A_i is the class of ordered subsets of E with first element x_i. Since the first element is fixed to x_i, the cardinality of A_i is the number of ordered subsets of p − 1 elements of E ∖ {x_i}, so that the classes A_i have a common cardinality A_{n−1}^{p−1}, the number of (p−1)-permutations from a set of n − 1 elements. We have proved that

 (1.2.1) A_n^p = n × A_{n−1}^{p−1},

for any 1 ≤ p ≤ n. We get by induction

 A_n^p = n × (n−1) × A_{n−2}^{p−2}

and after h repetitions of (1.2.1), we arrive at

 A_n^p = n × (n−1) × (n−2) × ... × (n−h+1) × A_{n−h}^{p−h}.

For h = p − 1, we have

 A_n^p = n × (n−1) × (n−2) × ... × (n−p+2) × A_{n−p+1}^1.

And, clearly, A_{n−p+1}^1 = n − p + 1, since A_m^1 = m is the number of singletons from a set of m elements.

Remark. Needless to say, we have A_n^p = 0 for p > n.

Here are some remarkable values of A_n^p. For any positive integer n, we have

(i) A_n^1 = n.
(ii) A_n^n = n × (n−1) × ... × 2 × 1.

From an algebraic point of view, the numbers A_n^p also count the number of injections.
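The value of A_n^p can be cross-checked against a brute-force enumeration; in Python's standard library, math.perm computes the same quantity. The values of n and p below are arbitrary:

```python
import math
from itertools import permutations

n, p = 5, 3
# A_n^p = n(n-1)...(n-p+1): the product formula of Theorem 2.1
a_formula = math.prod(range(n - p + 1, n + 1))
# Same quantity via the standard library
a_builtin = math.perm(n, p)
# Brute force: count ordered selections of p distinct items among n
a_enum = sum(1 for _ in permutations(range(n), p))
print(a_formula, a_builtin, a_enum)  # 60 60 60
```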

### 1.3. Number of Injections, Number of Applications

We begin by recalling the following algebraic definitions :

A function from a set A to a set B is a correspondence from A to B such that each element of A has at most one image in B.

An application (mapping) from a set A to a set B is a correspondence from A to B such that each element of A has exactly one image in B.

An injection from a set A to a set B is an application from A to B such that any two distinct elements of A have distinct images in B.

A surjection from a set A to a set B is an application from A to B such that each element of B is the image of at least one element of A.

A bijection from a set A onto a set B is an application from A to B such that each element of B is the image of one and only one element of A.

If there is an injection (respectively a bijection) from A to B, then we have the inequality Card(A) ≤ Card(B) (respectively, the equality Card(A) = Card(B)).

The number of p-arrangements from n elements (p ≤ n) is also the number of injections from a set of p elements to a set of n elements.

The reason is the following. Let A = {a_1, ..., a_p} and B be sets with Card(A) = p and Card(B) = n, with p ≤ n. Forming an injection from A to B is equivalent to choosing a p-tuple (b_1, ..., b_p) in B and setting the correspondence a_i → b_i, i = 1, ..., p. So, we may find as many injections from A to B as p-permutations of the n elements of B.

Thus, A_n^p is also the number of injections from a set of p elements to a set of n elements.

The number of applications from a set of p elements to a set of n elements is n^p.

Indeed, let A = {a_1, ..., a_p} and B be sets with Card(A) = p and Card(B) = n, with no relation between p and n. Forming an application from A to B is equivalent to choosing, for each element of A, one arbitrary element of B and assigning it as its image. For the first element of A, we have n choices, n choices also for the second, n choices also for the third, and so forth. In total, we have n^p ways to form an application from A to B.

Example on vote casting in elections. The k members of some population are required to cast a vote for one of n candidates. The number of possible outcomes is the number of applications from a set of k elements to a set of n elements : n^k.
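The two counts, applications versus injections, can be contrasted by brute force. The domain and codomain sizes below are hypothetical:

```python
from itertools import product, permutations

k, n = 3, 4  # hypothetical sizes: 3 voters, 4 candidates
# Applications (arbitrary mappings): each of the k elements gets any of n images
applications = list(product(range(n), repeat=k))
# Injections: the images must be pairwise distinct
injections = list(permutations(range(n), k))
print(len(applications), len(injections))  # 64 24
```

The application count n^k = 4^3 = 64 allows several voters to pick the same candidate, while the injection count A_4^3 = 24 does not.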

### 1.4. Permutations

Definition 2.2. A permutation of the elements of E is any ordering of all its elements. If n is the cardinality of E, the number of permutations of the elements of E is called n factorial and denoted n! (that is, n followed by an exclamation point).

Before we give the properties of n factorial, we point out that a permutation of n objects is an n-permutation of n elements.

Theorem 2.2. For any n ≥ 1, we have

(i) n! = A_n^n = n × (n−1) × ... × 2 × 1.

(ii) n! = n × (n−1)!.

Proof. By reminding ourselves that a permutation of n objects is an n-permutation of n elements, we see that Point (i) is obtained for p = n in the formula of Theorem 2.1. Point (ii) obviously derives from (i).

Exercise 2.1. What is the number of bijections between two sets of common cardinality n?

Exercise 2.2. Check that for any 0 ≤ p ≤ n, we have

 A_n^p = n!/(n−p)!.

### 1.5. Permutations with repetition

Let E be a collection of n distinct objects. Suppose that E is partitioned into k sub-collections E_1, ..., E_k of respective sizes n_1, ..., n_k, and that the following properties apply :

(i) Two elements of a same sub-collection are indistinguishable between them.

(ii) Two elements from two different sub-collections are distinguishable one from the other.

What is the number of permutations of E? Let us give an example.

Suppose we have 20 balls of the same form (not distinguishable by form, neither at sight nor by touch), with 9 of them in red color, 6 of blue color and 5 of green color, so that we can distinguish them only by their colors. In this context, two balls of the same color are the same to our eyes. An ordering of these balls in which we cannot distinguish the balls of a same color is called a permutation with repetition or a visible permutation.

Suppose that we have realized a permutation of these balls. Permuting, for example, only red balls between them does not change the appearance of the global permutation. Physically, it has changed, but not visibly from our sight. Such an arrangement may be described as visible, or as a permutation with repetition.

In fact, any visible permutation represents all the real permutations where the red balls are permuted between them, the blue balls are permuted between them and the green balls are permuted between them. By the counting principle, a visible permutation represents exactly

 9! × 6! × 5!

real permutations. Hence, the number of permutations with repetition of these balls is

 20! / (9! × 6! × 5!).

Now, we are going to do the same reasoning in the general case.

As we said before, we have two types of permutations here :

(a) The real or physical permutations, where the n objects are supposed to be distinguishable.

(b) The permutations with repetition in which we cannot distinguish between the elements of a same sub-collection.

Theorem 2.3. The number of permutations with repetition of a collection of n objects partitioned into k sub-collections of sizes n_1, ..., n_k, such that only elements from different sub-collections are distinguishable between them, is given by

 B(n_1, n_2, ..., n_k) = n! / (n_1! n_2! ··· n_k!).

Terminology. The numbers of permutations with repetition are also called multinomial coefficients, in reference to Formula (1.5.3).

Proof. Let B(n_1, n_2, ..., n_k) be the number of permutations with repetition. Consider a fixed visible permutation. This visible permutation corresponds exactly to all the real permutations obtained by permuting the n_1 objects of E_1 between them, the n_2 objects of E_2 between them, ..., and the n_k objects of E_k between them. So each visible permutation corresponds to

 n_1! × n_2! × ... × n_k!

real permutations. Since this is true for any visible permutation, we get that

 (n_1! n_2! ··· n_k!) × B(n_1, n_2, ..., n_k) = n!.

This gives the result in the theorem.
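The balls example can be checked numerically. Assuming, as the factorials in the text indicate, 9 red, 6 blue and 5 green balls among 20:

```python
import math

# Number of visible permutations of 20 balls: 9 red, 6 blue, 5 green
visible = math.factorial(20) // (math.factorial(9) * math.factorial(6) * math.factorial(5))
print(visible)  # 77597520
```

The same number can be obtained by picking the 9 red positions, then the 6 blue positions among the remaining 11: C_20^9 × C_11^6.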

#### 1.5.1. Combinations

Let us define the numbers of combinations as follows.

Definition 2.3. Let E be a set of n elements. A combination of p elements of E is a subset of E of size p.

In other words, a subset F of E is a p-combination of elements of E if and only if Card(F) = p.

It is important to remark that we cannot have combinations of more than n elements in a set of n elements.

We have :

Theorem 2.4. The number of combinations of p elements from n elements is given by

 C_n^p = n! / (p!(n−p)!).

Proof. Denote by C_n^p the number of combinations of p elements from n elements.

The collection of p-permutations is exactly obtained by taking all the orderings of the elements of the combinations of p elements from E. Each combination gives p! p-permutations of E. Hence, the cardinality of the collection of p-permutations is exactly p! times that of the class of p-combinations of E, that is

 A_n^p = p! × C_n^p.

By using Exercise 2.2 above, we get C_n^p = A_n^p / p! = n! / (p!(n−p)!).

We also have this definition :

Definition 2.4. The numbers C_n^p, 0 ≤ p ≤ n, are also called binomial coefficients because of Formula (1.5.2) below.
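The relation A_n^p = p! × C_n^p used in the proof can be checked with the standard library; n and p below are arbitrary values:

```python
import math

n, p = 10, 4
c = math.comb(n, p)   # C_n^p: number of p-combinations
a = math.perm(n, p)   # A_n^p: number of p-permutations
# Each p-combination yields p! ordered p-permutations
print(c, a, a == math.factorial(p) * c)  # 210 5040 True
```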

#### 1.5.2. Urn Model

The urn model plays a very important role in discrete Probability Theory and Statistics, especially in sampling theory. The simplest example of urn model is the one where we have a number of balls, distinguishable or not, of different colors.

We are going to apply the concepts seen above in the context of urns.

Suppose that we want to draw p balls at random from an urn containing n balls that are not distinguishable by touch (where touch means hand touch).

We have two ways of drawing.

(i) Drawing without replacement. This means that we draw a first ball and we keep it out of the urn. Now, there are n − 1 balls in the urn. We draw a second one and we have n − 2 balls left in the urn. We repeat this procedure until we have the p balls. Of course, p should be less than or equal to n.

(ii) Drawing with replacement. This means that we draw a ball and take note of its identity or of the characteristics under study, and put it back in the urn. Before each drawing, we have exactly n balls in the urn. A ball can then be drawn several times.

It is clear that the drawing model (i) is exactly equivalent to the following one :

(i-bis) We draw the p balls at the same time, simultaneously, at once.

Now, we are going to see how the p-permutations and the p-combinations occur here, by a series of questions and answers.

Questions. Suppose that an urn contains n distinguishable balls. We draw p balls. In how many ways can the drawing occur? In other words, what is the number of possible outcomes in terms of the subsets formed by the drawn balls?

Solution 1. If we draw the balls without replacement and we take the order into account, the number of possible outcomes is A_n^p, the number of p-permutations.

Solution 2. If we draw the balls without replacement and we do not take the order into account, or there is no possible ordering, the number of outcomes is C_n^p, the number of p-combinations from n.

Solution 3. If we draw the balls with replacement, the number of outcomes is n^p, the number of applications.

Needless to say, the ordering is always assumed if we proceed by a drawing with replacement.

Please, keep in mind these three situations that are the basic keys in Combinatorics.

Now, let us explain the solutions before we continue.

Proof of Solution 1. Here, we draw the balls one by one. We have n choices for the first ball. Once this ball is out, n − 1 balls remain in the urn and we have n − 1 choices for the second ball. Thus, we have

 n × (n−1)

possible outcomes to draw two ordered balls. For three balls, we have the number

 n × (n−1) × (n−2).

Remark that for p balls, the number of possible outcomes is

 n × (n−1) × (n−2) × ... × (n−p+1) = A_n^p.

You will not have any difficulty getting this by induction.

Proof of Solution 2. Here, there is no ordering. So we have to divide the number of ordered outcomes by p! to get the result.

Proof of Solution 3. At each step of the drawing, we have n choices. At the end, we get n^p ways to draw p elements.
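The three basic situations can be checked by enumerating a small urn; the sizes below are arbitrary:

```python
from itertools import combinations, permutations, product

n, p = 5, 2
urn = range(n)  # a hypothetical urn of 5 labeled balls

ordered_no_repl = len(list(permutations(urn, p)))    # Solution 1: A_5^2
unordered_no_repl = len(list(combinations(urn, p)))  # Solution 2: C_5^2
with_repl = len(list(product(urn, repeat=p)))        # Solution 3: 5**2
print(ordered_no_repl, unordered_no_repl, with_repl)  # 20 10 25
```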

We are going to devote a special subsection to the numbers of combinations or binomial coefficients.

#### 1.5.3. Binomial Coefficients

Here, we come back to the number of combinations that we call Binomial Coefficients here. First, let us state their main properties.

Main Properties.

Proposition 2.2. We have
(1) C_n^0 = C_n^n = 1, for all n ≥ 0.
(2) C_n^p = C_n^{n−p}, for all 0 ≤ p ≤ n.
(3) p C_n^p = n C_{n−1}^{p−1}, for 1 ≤ p ≤ n.
(4) C_n^p = C_{n−1}^{p−1} + C_{n−1}^p, for all 1 ≤ p ≤ n − 1.

Proof. Here, we only prove Point (4). The other points are left to the reader as exercises. We have

 C_{n−1}^{p−1} + C_{n−1}^p = p{(n−1)!/(p!(n−p)!)} + (n−p){(n−1)!/(p!(n−p)!)}
 = {(n−1)!/(p!(n−p)!)}{p + n − p} = n(n−1)!/(p!(n−p)!) = n!/(p!(n−p)!) = C_n^p.

Pascal’s Triangle. We are now going to reverse the previous approach by taking some of these properties as characterizations of the binomial coefficients. We have :

Proposition 2.3. The formulas

(i) C_n^0 = C_n^n = 1, for all n ≥ 0;
(ii) C_n^p = C_{n−1}^{p−1} + C_{n−1}^p, for all 1 ≤ p ≤ n − 1;

entirely characterize the binomial coefficients.

Proof. We give the proof by using Pascal’s triangle. Point (ii) gives the following clog rule (règle du sabot in French) : the entry C_n^p at row n and column p of the array is the sum of the two entries above it at row n − 1, columns p − 1 and p,

 C_{n−1}^{p−1}  C_{n−1}^p
      C_n^p

or more simply

 u  v
 u+v

With this rule, we may construct Pascal’s triangle of the numbers C_n^p :

 n/p | 0  1  2  3  4  5  6  7  8
  0  | 1
  1  | 1  1
  2  | 1  2  1
  3  | 1  3  3  1
  4  | 1  4  6  4  1
  5  | 1  5 10 10  5  1
  6  | 1  .  .  .  .  .  1
  7  | 1  .  .  .  .  .  .  1
  8  | 1  .  .  .  .  .  .  .  1

The reader is asked to continue filling this triangle himself. Remark that filling the triangle only requires the first column (p = 0), the diagonal (p = n) and the clog rule. So Points (i) and (ii) are enough to determine all the binomial coefficients. This leads to the following conclusion.

Proposition 2.4. Any array of integers β(p, n), 0 ≤ p ≤ n, such that

(i) β(0, n) = β(n, n) = 1, for all n ≥ 0,

(ii) β(p, n) = β(p−1, n−1) + β(p, n−1), for 1 ≤ p ≤ n − 1,

is exactly the array of binomial coefficients, that is, β(p, n) = C_n^p for 0 ≤ p ≤ n.
We are now going to visit Newton’s formula.

Newton’s Formula.

Let us apply the result just above to the power of a sum of two scalars in a commutative ring (ℝ, for example).

###### Theorem 1.

For any (a, b) ∈ ℝ², for any n ≥ 1, we have

 (1.5.1) (a+b)^n = ∑_{p=0}^n C_n^p a^p b^{n−p}.

Proof. Since ℝ is a commutative ring, we know that (a+b)^n is a polynomial in a and b, and it is written as a linear combination of the terms a^p b^{n−p}, p = 0, ..., n. We have the formula

 (1.5.2) (a+b)^n = ∑_{p=0}^n β(p, n) a^p b^{n−p}.

It will be enough to show that the array is actually that of the binomial coefficients. To begin, we write

 (a+b)n=(a+b)×(a+b)×...×(a+b).

From there, we see that is the number of choices of or in each factor such that is chosen times and is chosen times. Thus since is obtained in the unique case where is chosen in each factor . Likely , since this corresponds to the monomial , that is the unique case where is chosen in each case. So, Point (i) is proved for the array . Next, we have

 $(a+b)^n=(a+b)(a+b)^{n-1}=(a+b)\times\sum_{p=0}^{n-1}\beta(p,n-1)\,a^{p}\,b^{n-1-p}$
 $=(a+b)\big(\cdots+\beta(p-1,n-1)\,a^{p-1}b^{n-p}+\cdots+\beta(p,n-1)\,a^{p}b^{n-p-1}+\cdots\big).$

This means that, when developing $(a+b)(a+b)^{n-1}$, the term $a^{p}b^{n-p}$ can only come out

(1) either from the product of $a$ by the term $\beta(p-1,n-1)\,a^{p-1}b^{n-p}$ of the binomial $(a+b)^{n-1}$,

(2) or from the product of $b$ by the term $\beta(p,n-1)\,a^{p}b^{n-p-1}$ of the binomial $(a+b)^{n-1}$.

Then it comes out that, for $1\leq p\leq n-1$, we get

 β(p−1, n−1)+β(p, n−1)=β(p, n).

We conclude that the array $(\beta(p,n))$ fulfills Points (i) and (ii) above. Then this array is that of the binomial coefficients. QED.

Remark. The name of binomial coefficients comes from this Newton’s formula.
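Newton's formula (1.5.1) is easy to check numerically; the following sketch (an illustration, not from the original text) compares the expansion with the direct power, using Python's `math.comb` for the binomial coefficients:

```python
# Numerical check of Newton's formula (1.5.1):
# (a + b)**n equals the sum of C(n, p) * a**p * b**(n - p) over p = 0..n.
from math import comb

def newton_expansion(a, b, n):
    """Right-hand side of (1.5.1)."""
    return sum(comb(n, p) * a**p * b**(n - p) for p in range(n + 1))

# The expansion agrees with the direct power (a + b)**n.
assert newton_expansion(3, 5, 7) == (3 + 5)**7
```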

Multiple Newton’s Formula.

We are going to generalize Newton's Formula from dimension $k=2$ to an arbitrary dimension $k\geq 2$. The binomial coefficients will then be replaced by the numbers of permutations with repetition.

Let $k\geq 2$, let $a_1, a_2, \dots, a_k$ be given real numbers and let $n$ be a positive integer. Consider

 $\Gamma_n=\{(n_1,\dots,n_k):\ n_1\geq 0,\dots,n_k\geq 0,\ n_1+\dots+n_k=n\}.$

We have

 (1.5.3) $(a_1+\dots+a_k)^{n}=\sum_{(n_1,\dots,n_k)\in\Gamma_n}\frac{n!}{n_1!\,n_2!\cdots n_k!}\ a_1^{n_1}a_2^{n_2}\cdots a_k^{n_k},$

that we may write in a more compact manner in

 (1.5.4) $\left(\sum_{i=1}^{k}a_i\right)^{n}=\sum_{(n_1,\dots,n_k)\in\Gamma_n}\frac{n!}{\prod_{i=1}^{k}n_i!}\ \prod_{i=1}^{k}a_i^{n_i}.$

We will see how this formula is important for the multinomial law, which in turn is so important in Statistics.

Proof. Let us give a simple proof of it.

To develop $(a_1+\dots+a_k)^{n}$, we have to multiply $(a_1+\dots+a_k)$ by itself $n$ times. By distributivity of the product with respect to the sum, the result will be a sum of products

 z1×z2×...×zn

where each $z_i$ is one of the $a_1$, $a_2$, …, $a_k$. By commutativity, each of these products is of the form

 (1.5.5) $a_1^{n_1}a_2^{n_2}\cdots a_k^{n_k},$

where $(n_1,\dots,n_k)\in\Gamma_n$. And for a fixed $(n_1,\dots,n_k)\in\Gamma_n$, the product (1.5.5) is the same as all products

 z1×z2×...×zn,

in which $n_1$ of the $z_i$ are identical to $a_1$, $n_2$ are identical to $a_2$, …, and $n_k$ are identical to $a_k$. These products correspond to the permutations with repetition of $n$ elements such that $n_1$ are identical, $n_2$ are identical, …, and $n_k$ are identical. Then each product (1.5.5) occurs

 $\frac{n!}{n_1!\times n_2!\times\cdots\times n_k!}$

times in the expansion. This puts an end to the proof.
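The multinomial formula (1.5.3) can likewise be verified for small values of $k$ and $n$. Here is a sketch (illustrative only), enumerating $\Gamma_n$ with `itertools`:

```python
# Verify the multinomial formula (1.5.3) by enumerating Gamma_n and summing
# n!/(n1! ... nk!) * a1**n1 * ... * ak**nk.
from math import factorial
from itertools import product

def multinomial_expansion(coeffs, n):
    """Expand (a1 + ... + ak)**n via the multinomial formula."""
    k = len(coeffs)
    total = 0
    # Gamma_n: all (n1, ..., nk) with ni >= 0 and n1 + ... + nk = n
    for exps in product(range(n + 1), repeat=k):
        if sum(exps) != n:
            continue
        weight = factorial(n)
        term = 1
        for a, ni in zip(coeffs, exps):
            weight //= factorial(ni)     # exact: the quotient stays an integer
            term *= a**ni
        total += weight * term
    return total

assert multinomial_expansion([1, 2, 3], 4) == (1 + 2 + 3)**4
```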

### 1.6. Stirling’s Formula

#### 1.6.1. Presentation of Stirling’s Formula (1730)

The number $n!$ grows and becomes huge very quickly. In many situations, it may be handy to have an asymptotically equivalent formula.

This formula is Stirling's one and it is given as follows:

 $n!=(2\pi n)^{1/2}\,\left(\frac{n}{e}\right)^{n}\exp(\theta_n),$

with, for any $\eta>0$ and for $n$ large enough,

 $|\theta_n|\leq\frac{1+\eta}{12n}.$

This implies, in particular, that

 $n!\sim(2\pi n)^{1/2}\,\left(\frac{n}{e}\right)^{n},$

as $n\rightarrow+\infty$.

One can find several proofs (see [Feller (1968)], page 52, for example). In this textbook, we provide a proof along the lines of the one in [Valiron (1956)], page 167, which is based on Wallis integrals. This proof is exposed in the Appendix Chapter 8, Section 8.4.

We think that a first-year University student will be interested in this application of the course on Riemann integration.
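A quick numerical experiment (a sketch, not part of the proof) shows how fast the ratio of $n!$ to the approximation $(2\pi n)^{1/2}(n/e)^{n}$ approaches 1:

```python
# Compare n! with Stirling's approximation (2*pi*n)**(1/2) * (n/e)**n;
# the ratio tends to 1 as n grows.
from math import factorial, sqrt, pi, e

def stirling(n):
    """Stirling's approximation of n!."""
    return sqrt(2 * pi * n) * (n / e)**n

for n in (5, 10, 50):
    ratio = factorial(n) / stirling(n)
    print(n, round(ratio, 6))        # the ratios shrink towards 1
```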

## Chapter 2 Introduction to Probability Measures

### 2.1. An introductory example

Assume that we have a perfect die whose six faces are numbered from 1 to 6. We want to toss it twice. Before we toss it, we know that the outcome will be a couple $(i,j)$, where $i$ is the number that will appear in the first tossing and $j$ the one in the second.

We always keep in mind that, in probability theory, we will be trying to give answers about events that have not occurred yet. In the present example, the possible outcomes form the set

 Ω={1,2,...,6}×{1,2,...,6}={(1,1),(1,2),...,(6,6)}.

$\Omega$ is called the sample space of the experiment or the probability space. Here, the size of the set $\Omega$ is finite and is exactly 36. Parts or subsets of $\Omega$ are called events. For example,

(1) {(3, 4)} is the event : Face 3 comes out in the first tossing and Face 4 in the second,

(2) A = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6)} is the event : Face 1 comes out in the first tossing.

Any element of $\Omega$, as a singleton, is an elementary event. For instance, {(1,1)} is the elementary event : Face 1 appears in both tossings.

In this example, we are going to use the perfectness of the die, and the regularity of its geometry, to arrive at the conviction that

(1) All the elementary events have equal chances of occurring, that is one chance out of 36.

and then

(2) Each event of $\Omega$ has a number of chances of occurring equal to its cardinality.

We remind that an event $A$ occurs if and only if the occurring elementary event lies in $A$.

Denote by $P(A)$ the fraction of the number of chances of $A$ occurring over the total number of chances 36, i.e.,

 $P(A)=\frac{\mathrm{Card}\,A}{36}.$

Here, we say that $P(A)$ is the probability that the event $A$ occurs after the two tossings.

We may easily check that

(1) $0\leq P(A)\leq 1$ for all $A\subset\Omega$.

(2) For all parts $A$ and $B$ of $\Omega$ such that $A\cap B=\emptyset$, we have

 P(A∪B)=P(A)+P(B).

Notation : if $A$ and $B$ are disjoint, we adopt the following convention and write :

 A∪B=A+B.

As well, if $(A_n)_{n\geq 0}$ is a sequence of pairwise disjoint events, we write

 ⋃n≥0An=∑n≥0An.

We summarize this by saying : we may use the symbol $+$ (plus) in place of the symbol $\cup$ (union) when the sets are mutually disjoint, that is pairwise disjoint.

The so-defined application $P$ is called a probability measure because of (1) and (2) above.
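The introductory example can be played out in a few lines of code. This sketch (illustrative only) enumerates $\Omega$, computes $P(A)=\mathrm{Card}\,A/36$, and checks additivity on disjoint events:

```python
# The two-dice sample space and the probability measure P(A) = Card(A)/36.
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))      # the 36 couples (i, j)

def P(event):
    """Probability of an event, i.e. a subset of omega."""
    return Fraction(len(event), len(omega))

A = {(i, j) for (i, j) in omega if i == 1}       # Face 1 in the first tossing
B = {(i, j) for (i, j) in omega if i == 2}       # Face 2 in the first tossing

print(P(A))                                      # 1/6
assert P(A | B) == P(A) + P(B)                   # additivity on disjoint events
```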

If the space $\Omega$ is infinite, (2) is written as follows.

(2) For any sequence $(A_n)_{n\geq 0}$ of pairwise disjoint events, we have

 P(∑n≥0An)=∑n≥0P(An)

Terminology.

(1) The events $A$ and $B$ are said to be mutually exclusive if $A\cap B=\emptyset$. In other words, the events $A$ and $B$ cannot occur simultaneously.

(2) If we have $P(A)=0$, we say that the event $A$ is impossible, or that $A$ is a null-set with respect to $P$.

(3) If we have $P(A)=1$, we say that the event $A$ is a sure event with respect to $P$, or that the event $A$ holds the probability measure $P$.

Nota-Bene. For any event $A$, the number $P(A)$ is a probability (that $A$ occurs). But the application, that is the mapping $A\mapsto P(A)$, is called a probability measure.

Now we are ready to present the notion of probability measures. But we begin with discrete ones.

### 2.2. Discrete Probability Measures

Let $\Omega$ be a set with finite cardinality or with infinite countable cardinality. Let $\mathcal{P}(\Omega)$ be the class of parts of $\Omega$.

Definition 1. An application $P$, defined from $\mathcal{P}(\Omega)$ to $[0,1]$, is called a probability measure on $\Omega$ if and only if :

(1) $P(\Omega)=1$.

(2) For all sequences $(A_i)_{i\in I}$ of pairwise disjoint events of $\mathcal{P}(\Omega)$, we have

 $P\left(\sum_{i}A_i\right)=\sum_{i}P(A_i).$

We say that the triplet $(\Omega,\ \mathcal{P}(\Omega),\ P)$ is a probability space.

Terminology. Point (2) means that $P$ is additive on $\mathcal{P}(\Omega)$. It will be referred to under the name of additivity.

#### 2.2.1. Properties of a probability measure


Let $(\Omega,\ \mathcal{P}(\Omega),\ P)$ be a probability space. We have the following properties. Each of them will be proved just after its statement.

(A) $P(\emptyset)=0$.

Proof : we have $\emptyset=\emptyset+\emptyset$. By additivity, we have

 P(∅)=P(∅+∅)=P(∅)+P(∅)=2P(∅).

Then, we get $P(\emptyset)=0$.

(B) If $A\subset B$, then $P(A)\leq P(B)$.

Proof. Recall the definition of the difference of subsets :

 $B\setminus A=\{x\in\Omega:\ x\in B\ \text{and}\ x\notin A\}=B\cap A^{c}.$

Since $A\subset B$, we have

 $B=(B\setminus A)+A.$

By additivity,

 $P(B)=P((B\setminus A)+A)=P(B\setminus A)+P(A).$

It comes that

 P(B)−P(A)=P(B∖A)≥0.

(C) If $A\subset B$, then $P(B\setminus A)=P(B)-P(A)$.

Proof. This was already proved through (B).

(D) (Continuity Property of a Probability Measure) Let $(A_n)_{n\geq 0}$ be a non-decreasing sequence of subsets of $\Omega$ with limit $A$, that is :

(1) For all $n\geq 0$, $A_n\subset A_{n+1}$,

and

(2) $A=\bigcup_{n\geq 0}A_n$.

Then $P(A_n)\rightarrow P(A)$ as $n\rightarrow+\infty$.

Proof : Since the sequence is non-decreasing, we have

 $A_k=A_0+(A_1\setminus A_0)+(A_2\setminus A_1)+\cdots+(A_k\setminus A_{k-1}),$

for all $k\geq 1$. And then, finally,

 $A=A_0+(A_1\setminus A_0)+(A_2\setminus A_1)+\cdots+(A_k\setminus A_{k-1})+\cdots$

Denote $B_0=A_0$ and $B_j=A_j\setminus A_{j-1}$, for $j\geq 1$. By using the additivity of $P$, we get

 $P(A)=\sum_{j\geq 0}P(B_j)=\lim_{k\rightarrow\infty}\sum_{0\leq j\leq k}P(B_j).$

But

 $\sum_{0\leq j\leq k}P(B_j)=P(A_0)+\sum_{1\leq j\leq k}P(A_j\setminus A_{j-1})$
 $=P(A_0)+\sum_{1\leq j\leq k}\left(P(A_j)-P(A_{j-1})\right)$
 $=P(A_0)+(P(A_1)-P(A_0))+(P(A_2)-P(A_1))+\cdots+(P(A_k)-P(A_{k-1}))$
 $=P(A_k).$

We arrive at

 $P(A)=\sum_{j\geq 0}P(B_j)=\lim_{k\rightarrow\infty}\sum_{0\leq j\leq k}P(B_j)=\lim_{k\rightarrow\infty}P(A_k).$

Then

 $\lim_{n\rightarrow\infty}P(A_n)=P(A).$

Taking the complements of the sets of Point (D), leads to the following point.

(E) (Continuity Property of a Probability Measure) Let $(A_n)_{n\geq 0}$ be a non-increasing sequence of subsets of $\Omega$ decreasing to $A$, that is :

(1) For all $n\geq 0$, $A_{n+1}\subset A_n$,

and

(2) $A=\bigcap_{n\geq 0}A_n$.

Then $P(A_n)\rightarrow P(A)$ as $n\rightarrow+\infty$.
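The continuity property can be illustrated numerically. In the sketch below (an assumed toy example, not from the text), $\Omega=\{0,1,2,\dots\}$ carries the weights $P(\{k\})=(1/2)^{k+1}$, and the events $A_n=\{0,\dots,n\}$ are non-decreasing with union $\Omega$, so $P(A_n)$ increases to $P(\Omega)=1$ as in Point (D):

```python
# Continuity (D) on a toy discrete space: P({k}) = (1/2)**(k+1) on {0, 1, ...},
# with the non-decreasing events A_n = {0, ..., n} increasing to Omega.
def P_An(n):
    """P(A_n) = sum of (1/2)**(k+1) for k = 0..n."""
    return sum(0.5**(k + 1) for k in range(n + 1))

for n in (1, 5, 20):
    print(n, P_An(n))        # P(A_n) climbs towards P(Omega) = 1
```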

At this introductory level, we usually work with discrete probabilities defined on an enumerable space $\Omega$. A probability measure on such a space is said to be discrete.

#### 2.2.2. Characterization of a discrete probability measure

Let Card$(\Omega)\leq$ Card$(\mathbb{N})$, meaning that $\Omega$ is enumerable, meaning also that we may write $\Omega$ in the form : $\Omega=\{\omega_i,\ i\in I\}$, where $I$ is enumerable.

The following theorem allows us to build probability measures on discrete spaces.

Theorem. Defining a discrete probability measure $P$ on $\Omega$ is equivalent to providing numbers $p_i$, $i\in I$, such that $0\leq p_i\leq 1$ and $\sum_{i\in I}p_i=1$, and setting, for any subset $A$ of $\Omega$,

 $P(A)=\sum_{\omega_i\in A}p_i.$

Proof. Let $P$ be a probability measure on $\Omega=\{\omega_i,\ i\in I\}$. Denote

 P({ωi})=pi,i∈I.

We have

 (2.2.1) ∀(i∈I), 0≤pi≤1

and

 (2.2.2) P(Ω)=∑i∈IP({ωi})=∑i∈Ipi=1.

Moreover, if $A=\{\omega_{i_j},\ j\in J\}$, with $J\subset I$, we get by additivity of $P$,

 P(A)=∑j∈JP({ωij})=∑j∈Jpij=∑ωi∈Api.

It is clear that the knowledge of the numbers $p_i$ allows us to compute the probabilities $P(A)$ for all subsets $A$ of $\Omega$.

Conversely, suppose that we are given numbers $p_i$, $i\in I$, such that (2.2.1) and (2.2.2) hold. Then the mapping $P$ defined on $\mathcal{P}(\Omega)$ by

 $P(\{\omega_{i_1},\dots,\omega_{i_k}\})=\sum_{j=1}^{k}p_{i_j}$

is a probability measure on $\Omega$.
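The characterization theorem translates directly into code. The sketch below (with assumed toy weights, not from the text) builds a discrete probability measure from numbers $p_i$ satisfying (2.2.1) and (2.2.2):

```python
# A discrete probability measure built from weights p_i on a finite Omega.
from fractions import Fraction

# Assumed toy weights on Omega = {"a", "b", "c", "d"}; they sum to 1 (2.2.2).
weights = {"a": Fraction(1, 2), "b": Fraction(1, 4),
           "c": Fraction(1, 8), "d": Fraction(1, 8)}
assert sum(weights.values()) == 1

def P(A):
    """P(A) = sum of the p_i over the omega_i in A."""
    return sum(weights[w] for w in A)

print(P({"a", "c"}))         # 5/8
```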

### 2.3. Equi-probability

The notion of equi-probability is very popular for probabilities on finite sample spaces. It means that, on a finite sample space with size $n$, all the elementary events have the same probability of occurring.

This happens in a fair lottery : if the lottery is based on picking $k$ numbers out of $n$ fixed numbers, all choices have the same probability of winning.

We have the following rule.

Theorem. If a probability measure $P$, defined on a finite sample space $\Omega=\{\omega_1,\dots,\omega_n\}$ with cardinality $n$, assigns the same probability to all the elementary events, then for all $i\in\{1,\dots,n\}$, we have

 P({ωi})=1n

and, for any event $A$,

 $P(A)=\frac{\mathrm{Card}\,A}{n}=\frac{\mathrm{Card}\,A}{\mathrm{Card}\,\Omega}$
 (2.3.1) $=\frac{\text{number of favorable cases}}{\text{number of possible cases}}.$

Proof. Suppose that for any $i\in\{1,\dots,n\}$, $P(\{\omega_i\})=p$, where $p$ is a constant number between 0 and 1. This leads to $1=P(\Omega)=np$. Then

 $p=\frac{1}{n}.$

Let $A=\{\omega_{i_1},\dots,\omega_{i_k}\}$. We have

 $P(A)=p_{i_1}+\cdots+p_{i_k}=p+\cdots+p=\frac{k}{n}=\frac{\mathrm{Card}\,A}{\mathrm{Card}\,\Omega}.$
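As an illustration of this rule, here is a sketch computing the winning probability in the fair lottery mentioned earlier; the values $n=49$ and $k=6$ are assumed for illustration:

```python
# Equi-probability in a fair lottery: every choice of k numbers out of n
# has probability 1 / C(n, k); n = 49 and k = 6 are assumed for illustration.
from math import comb
from fractions import Fraction

n, k = 49, 6
p_win = Fraction(1, comb(n, k))      # one favorable case out of C(n, k)
print(comb(n, k), p_win)             # 13983816 1/13983816
```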

Remark. In real situations, such as lotteries and dice tossings, the equi-probability hypothesis is intuitively deduced, based on symmetry, geometry and logical properties. For example, for a newly wed couple, we use logic to say that there is no reason that having a girl as a first child is more likely than having a boy as a first child, and vice-versa. So we conclude that the probability of having a girl as a first child is one half.

In the situation of equi-probability, computing probabilities becomes simpler. It is reduced to counting problems, based on the results of Chapter 1.

In this situation, everything is based on Formula (2.3.1).

This explains the importance of Combinatorics in discrete Probability Theory.

Be careful. Even if equi-probability is popular, the contrary is also very common.

Example. A couple wants to have three children. The sample space is

 $\Omega=\{GGG,\ GGB,\ GBG,\ GBB,\ BGG,\ BGB,\ BBG,\ BBB\},$

where $G$ stands for a girl and $B$ for a boy.

In the notation above, GGG is the elementary event that the couple has three girls, GBG is the event that the couple has first a girl, next a boy and finally a girl, etc.

We suppose that the eight elementary events, which are characterized by the genders of the first, the second and the third child, have equal probabilities of occurring.

Give the probabilities that :

(1) the couple has at least one boy.
(2) there is no girl older than a boy.
(3) the couple has exactly one girl.

Solution. Because of equi-probability, we only have to compute the cardinality of each event and then use Formula (2.3.1).

(1) The event A = (the couple has at least one boy) is :

 A={GGB,GBG,GBB,BGG,BGB,BBG,BBB}.

Then

 $P(A)=\frac{\text{number of favorable cases}}{\text{number of possible cases}}=\frac{7}{8}.$

(2) The event B=(there is no girl older than a boy) is :

 B={GGG,BGG,BBG,BBB}.

Then

 $P(B)=\frac{\text{number of favorable cases}}{\text{number of possible cases}}=\frac{4}{8}=\frac{1}{2}.$
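The whole example, including question (3), can be checked by brute-force enumeration. Here is a sketch, with the outcomes encoded as strings of G and B:

```python
# The three-children example solved by enumerating the 8 equi-probable
# outcomes and applying Formula (2.3.1).
from fractions import Fraction
from itertools import product

omega = ["".join(c) for c in product("GB", repeat=3)]   # the 8 outcomes

def P(event):
    """P(A) = Card(A) / Card(Omega) under equi-probability."""
    return Fraction(len(event), len(omega))

at_least_one_boy = [w for w in omega if "B" in w]
# "no girl older than a boy" means every boy is born before every girl,
# i.e. the string has no G immediately followed by a B.
no_girl_older = [w for w in omega if "GB" not in w]
exactly_one_girl = [w for w in omega if w.count("G") == 1]

print(P(at_least_one_boy))   # 7/8
print(P(no_girl_older))      # 1/2
print(P(exactly_one_girl))   # 3/8
```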