Curso Nptel - Reliability Engineering 4r4x3x

0

a

b x

Figure 4 – A normal distribution X with mean X and standard deviation S2

20

A short notation

N ( x, S x ) is often used for a normal distribution (Figure 14)

A very useful form of the distribution is one with a zero mean and unit standard deviation and is referred to as the ‘standard’ normal distribution. Thus if S is the standard normal random variable (or simply variate), its PDF is (Figure 14)

f s ( s) =

⎡ 1 ⎤ exp ⎢− s 2 ⎥ ,−α < s < α 2π ⎣ 2 ⎦ 1

This is also denoted by N (0, 1) and is symmetrical about zero. Its cumulative distribution function or CDF is often denoted by φ (s ) that is:

φ (s ) =F (s)=p s fs(s)

Note Æ Zero mean and unit standard deviation

Probability = p N(0, 1]

0

s

s

Figure 5 – A standard normal distribution

Where p is the probability p=P(S ≤ s). This probability p represents the area under the standard normal distribution to the left of s, that is, from - o to a. This distribution is available in tables and often values are given only for positive values of the standard normal variate. Thus values will start from 0.5 for s =0 and approach unity for increasing positive values of s. For negative values of the probability is obtained by subtraction from unity (the total area under the distribution being unity). Hence, we have

φ (− s) = 1 − φ ( s) This is obviously correct because the standard distribution is symmetrical about s=0. The reverse calculation, that is determination of the value of the variate s for a given cumulative probability p is often important and one may write

21

s = φ −1 ( p) Returning to tabulated values, as noted earlier the tables usually contain the CDF for positive values of the variate, s. Because of symmetry the CDF for negative values of a can be simply obtained using equation φ ( − s ) = 1 − φ ( s ) . Positive values of the variates are associated with CDF>O.5 or p>0.5. For values of p<0.5, the variates is given by s = φ −1 ( p) = −φ −1 (1 − p) [Note:

In some tables the values of cumulative probability start from zero even though only

positive values of the variate are considered. In such cases the should add 0.5 to the tabulated value for the left symmetrical half of the area]. 2.10.2. Application of standard normal variate

The first step is to obtain the standard variables s from the given mean and standard deviation of the random variable x, The relationship between x and s is obvious from the corresponding expressions for PDF and we have s=

x−x Sx

The probability that the random variable A lies between two limits a and b is given by the probability that the standard normal variate lies between x and s. and we have; P (a ≤ X ≤ b) = φ ( s 2 ) − φ ( s1 ) ⎛b− x⎞ ⎛a− x⎞ ⎟ − Φ⎜ ⎟ = Φ⎜⎜ ⎟ ⎜ S ⎟ S ⎝ x ⎠ ⎝ x ⎠ 2.10.3. Logarithmic normal distribution

Consider a random variable X which does not follow a normal distribution but whose natural logarithm (In X) has a normal distribution. The variable X is then said to have a logarithmic normal or log-normal probability distribution and its density function is fx =

⎡ 1 ⎛ ln x − α ⎞ 2 ⎤ ⎟⎟ ⎥ exp ⎢− ⎜⎜ 2π β x ⎣⎢ 2 ⎝ β ⎠ ⎦⎥ 1

,0 ≤ x ≤ α

In which

22

α = E (ln X ) = ln X and

β = Var (ln X ) = S ln X are respectively the mean and standard deviation of ln X and are the parameters of the distribution. Assumption of a lognormal distribution is often preferred to the assumption of a normal distribution for random variables which must have a positive value. For instance the factor of safety F is, by definition. a positive quantity. Therefore, it appears desirable to adopt F as a lognormal variate than at a normal variate. Figure 16 shows a typical lognormal distribution. It is easy to show that the tabulated values of the CDF of a standard normal distribution can be used for a lognormal distribution as well. The probability of X being in the interval (a,b) is ⎛ ln b − α ⎞ ⎛ ln a − α ⎞ ⎟⎟ − Φ⎜⎜ ⎟⎟ P (a < X ≤ b) = Φ⎜⎜ ⎝ β ⎠ ⎝ β ⎠

The probability of X being less than or equal to unity is ⎛ ln 1 − α ⎞ ⎛ −α ⎞ ⎟⎟ = Φ⎜⎜ ⎟⎟ P ( X < 1) = Φ⎜⎜ ⎝ β ⎠ ⎝ β ⎠

fx(x)

It can be shown that in of x and Sx ,α and β are as

0 x

Figure 6 : Lognormal distribution showing typical shape of PDF

A random variable X has a logarithmic normal probability if lnX is normal; the density function similar to normal distribution is as written as

23

f x (x ) =

⎡ 1 ⎛ ln X − λ ⎞ 2 ⎤ exp ⎢− ⎜ ⎟ ⎥ 2π Gx ⎠ ⎦⎥ ⎣⎢ 2 ⎝ G 1

0≤ x≤∞

where

λ = E (ln X ) = mean

G = Var (ln X )

Where λ and G are the parameters of the distribution. Probability associated with a log-normal variate can be determined from standard normal probabilities. The probability that the variable X will assume values in an interval (a,b) is P(a < x ≤ b ) = ∫

⎡ 1 ⎛ ln X − λ ⎞ 2 ⎤ exp ⎢− ⎜ ⎟ ⎥dx 2π Gx ⎣⎢ 2 ⎝ G ⎠ ⎦⎥ 1

b

a

let s=

ln x − λ , G

then dx = xG ds and

P(a < X ≤ b ) =

1 2π

⎛ λ ⎞ ⎛1⎞ 3 ln ⎜ b − ⎟ −⎜ ⎟ s ds ⎝ G⎠ ⎝2⎠ ⎛ λ⎞ ln ⎜ a − ⎟ ⎝ G⎠

∫

e

⎛ ln b − λ ⎞ ⎛ ln a − λ ⎞ = φ⎜ ⎟ − φ⎜ ⎟ ⎝ G ⎠ ⎝ G ⎠

since log-normal distribution can be evaluated using normal distribution itself and since the value of the random variable are always positive , the log-normal distribution may be useful where the value of the variates are strictly positive Ex: Intensity of rainfall If the log-normal is bounded between y(a) and y(b) it is such that y (a ) = exp(− ∞ ) and y (b ) = exp(+ ∞ )

If E(x) and V(y) are the mean and CoV of a log normal then

24

⎛ ⎝

1 2

⎞ ⎠

µ = exp⎜ X + ξ 2 ⎟ 1 2 ξ = represents CoV as ξ = Var (ln X )

λ = ln µ − ξ 2

ξ=

σ = δ = CoV µ

if µ ≤ 0.3

2.11. Beta distribution

In geotechnical engineering one is often concerned with random variables whose Values are bounded between finite limits. For example, the angle of internal friction for given sand has specific limits to its value depending largely on the relative density (density index) of that sand. Similarly values of unit weight γ and un-drained shear strength cu, lie between finite limits. In such cases it is generally unrealistic to assume variation from - α to +α (as in normal distribution, or 0 to α as in lognormal distribution. Certainly it is true that all distributions lead to results of similar accuracy for values of the random variable in the central region of a distribution. Yet when one is concerned primarily with the tails of a distribution, significant differences in results are obtained depending on the choice of distribution. Putting it differently, if one is concerned with relatively high probabilities, the choice of a distribution may not be critical. However, if one is concerned with relatively low probabilities say <10-2) the choice of a distribution determines the accuracy and even the order of magnitude of the answer. In some civil engineering problems, instance, highway cuttings, high failure probabilities may be acceptable. In others, for example, earth dams and multi-storeyed buildings, low failure probabilities must be ensured. In special cases, for instance, foundations of nuclear power plants and other very sensitive structures, extremely low failure probabilities and low probabilities against settlement or differential settlement of a certain magnitude must be ensured. Therefore in some cases the choice of distribution is critical. A probability distribution is appropriate for a random variable who’s values are bounded between finite limit a and b in the bate distribution. The density function of such a distribution is f x (x ) =

1 (x − q ) (b − x ) B(q, r ) (b − a )(q =r −1) ( q −1)

( r −1)

(a ≤ x ≤ b ) 25

In which q and r are the parameters of the distribution and B(q,r) is the bets function 1

B(q, r ) = ∫ x (q −1) (1 − x ) 0

( r −1)

dx

and is related to gamma function as follows B(q, r ) =

T (q ) * T (r ) T (q + r )

Depending on the parameters q and r the density functions of the bets distribution will have different shapes. If the values of the variate are limited between 0 an d 0.1 (i.e. a=0 and b= 1.0) , then the above equation for fx(x) becomes f x (x ) =

1 ( r −1) x (q −1) (1 − x ) B(q, r )

0 ≤ x ≤ 0.1

The shape of the distribution becomes important when one is concerned with very low probability of failure. For example when one is concerned with high probabilities of safety or high reliability > 0.99, then choice of distribution plays a dominating role

−∞

∞

It also depends as the type of problem foe example for highway shallow cutting and mine works. A reliability of 0.90 to 0.95 is also acceptable where as for earth dams and multi storeyed buildings, the reliability should be in the range of 0.999 to 0.9999.

26

Q=2.0

0

2

4

6

8

10

r=6.0

12

The mean and the variance of the bets distribution are given by

µx = a + σ X2

q (b − a ) q+r qr

(q + r ) (q + r + 1) 2

(b − a )2

a = 5 b = 10 q (10 − 5) = 7 q=r q = 2r 3

5+

We have

qr

(q + r ) (q + r + 1) 2

q = 3.26

(b − a )2 = (0.1 * 7 )2

and

r = 4.89

The beta distribution (Figure 17) is appropriate for random variables which have a finite range. The uniform distribution, already considered earlier, is the simplest example of a beta distribution. Considering the limits a and b for a random variable X. the PDF of a beta distribution may be written in the form

fx (X ) =

1

β ( q,r )

(x − a )q −1 (b − x )r −1 (b − a) q + r −1

a≤ x≤b

in which q and rare the two parameters which determine the shape of the distribution and B(q,r) is the beta function

27

0.2

fx(x)

q = 2.0 : r = 6.0

0.1

0 0

2

4

6

8

10

12

x

Figure 7 – Beta distribution 1

B(q, r ) = ∫ x q −1 (1 − x) r −1 dx 0

The mean and variance of the beta distribution are: q (b − a) 1+ r qγ S x2 = (b − a) 2 2 (q + r ) (q + r + 1) x=a+

The standard beta distribution may be defined as one which has a=0 and b=1. The standard PDF is symmetrical for q=r=3 and it is uniform with a density of unity for q=r=1.0 (Figure 18)

q = 1.0 3

r = 4.0 q = r = 3.0

q = 4.0

fx(x)

r = 2.0 q = r = 1.0

0

0.5

1.0

Figure 8 – Various shapes ofx standard beta distribution

28

Probability calculation are facilitated by the incomplete beta function which is defined as x

B x (q, r ) = ∫ y q −1 (1 − y ) r −1 dy

0 < x < 1 .0

0

The probability of X being between limits c and d is given by P (c ≤ X ≤ d ) =

1 B( q , r )

[Bu (q, r ) − Bv (q, r )]

Where u=

d −a b−a

and

v=

c−a b−a

2.12. Binomial and geometric distributions

Assume that repeated trials of an event are made, the probability p of occurrence in each trial is constant and the trials are statistically independent. Then considering that each trial has only two outcomes either occurrence or non-occurrence, the problem may be modeled as a Bernoulli sequence. One may apply this to rainfall, flooding, earthquakes etc. Which affect the performance

of

geotechnical

structures?

The probability of exactly a occurrences among n trials in a Bernoulli sequence is given by the binomial

distribution:

⎛n⎞ P ( X = x) = ⎜ ⎟ p x (1 − p ) n − x , ⎝ x⎠

the

equation

for

the

PMF

being:

x=0,1,2,…..,n

Where, n! n = x x!(n − x)!

is the binomial coefficient and a and p are parameters. The probability of realizing one particular sequence of exactly x occurrences of the event among n trials is px(l —p)n-x. However, the sequence of trials can be permuted n times; therefore, the

number of sequences with exactly x occurrences is n x

29

The number of trials until an event occurs for the first time is governed by the geometric distribution, Let T be the random variable concerning the number of trials for first occurrence and Let this occur at T=t. Then we have P(T = t ) = pq t −1 ,

t=1,2,…

Where (q=p—1) This is known as the geometric distribution obtained by substituting x=1 and n= t in Equation. The first occurrence time is considered equal to the recurrence time and the latter also has a geometric distribution; the mean recurrence time also known as the return period is: α

T = E (T ) = ∑ t. pq t −1 = p (1 + 2q + 3q 2 + ...) t −1

Since q< 1.0 series summation gives T = 1 / p or average return period is the reciprocal of the probability of the event within one time unit. So far we have considered either the number of trials or time units until the first occurrence. The time until the next occurrence is governed by the negative binomial distribution. The probability of (Tk = t) where Tk is the number of trials until the kth occurrence is ⎛ t − 1 ⎞ k t −k ⎟⎟ p q P(Tk = t ) = ⎜⎜ ⎝ k − 1⎠

for t=k,k+1,…=0 from t < k

Models for simple discrete random trials

A basic situation at certain times is that if outcomes of experiments can be separated into two exclusive categories good or bad, failure or success etc. We are interested in the simplest kind of experiments, the out comes of which can be either failure or success. The two events are mutually exclusive, collectively exhaustive possible outcomes. This is called Bernoulli trial The binomial distribution is given by

()

b(x, N, R) = N R x p N − x x

=

N! R x P (N −x) (N − x )!

30

The binomial distribution models the outcomes of experiments for which the following properties hold are 1. An experiment is repeated N tims with the outcome of each trial being independent of others 2. Only two mutually exclusive outcomes are possible called success ot failure , x is the no of successes 3. The probability of success in each trial denoted by R remains the same for all trials , p is the probability of non-occurrence R + p =1 4. The experiment is performed under the same conditions for all N trials 5. The interest is the number of success x in the N trials and not in the order they occur. Geometric distribution

Assuming independence of trials and a constants values of P, the distribution of N , the number of trials for the first success can be found. The first success will occur only on the nth trial and if only (n-1) are failures P[N = n] = (1 − P )

n −1

P

This is called geometric distribution

FIGURE

The probability that there is at least one occurrence in n trials = 1-P[no. of occurrence in n trials] = 1 − (1 − P )

n

moments of geometric distribution

31

E(N)=1 / P

[ ]

Var [N ] = E N 2 − E 2 [N ] =

1− P P2

Design values and return periods , civil engineering systems must withstand the effect of rare events such as large floods or high winds . it is necessary to consider the risks involved in the choice of design capacity In a design, one can estimate the maximum magnitude of rare events which the structure can withstand (maximum wind velocity) Average return periods The expected value o geometric distribution is α

E [N ] = ∑ nP(1 − P )

n −1

n =1

= P * 2 + 2 P(1 − P ) + 3P(1 − P )

2

= P + 2 P − 2 P 2 + 3 P − 3P 2 = 6 P − 5P 2 = P(6 − 5P ) The probability that there will be no events greater than 50years m-flood in 0years is B[m,1 / m] 1⎞ ⎛ = ⎜1 − ⎟ ⎝ m⎠

m

⎛ 1 ⎞ m(m − 1) ⎛ 1 ⎞ = 1 − m⎜ ⎟ + ⎜ ⎟ + .......... 2 ⎝m⎠ ⎝m⎠ 2 3 4 4 4 = 1− + + + ................. 11 21 31 = 0.368 r

if n = m *

1 m

The probability that one or more events will occur in m years is (1-e-1)=0.632 thus a system can set affected by a rare event within its return period is 0.632

32

2.13. Poisson, exponential and gamma distributions

Many physical process occurrences of fatigue crakes, earthquakes occurring at any tie in an earthquake prone region, occurrence of accidents on a highway may be modeled using Bernaoulli sequence, dividing the time interval r space into smaller intervals and considering whether the event will occur or not occur. If the vent can occur at any instance and again can occur the event may be modeled as a poissons process. Assumptions: 1. An event can occur at random at any time or any point in space. 2. The occurrence of an event in a given time (space) is independent of any other in other interval. 3. The probability o occurrence of an event in a small interval ∆t is proportional to ∆t and can be given by r ∆t .where r is the mean rate of occurrence. The number of occurrence of an event in t is given by Poisson distribution i.e. if Xt is the number of occurrences in time (or space) then,

P( X t = x ) =

(γt )x e −vt

where x = 0,1,2......... x! γ is the mean occurence rate E ( X t ) = γt var inace of X t = γt Poissions distribution (derivation from binomial distribution)

()

PX (x ) = n * P x − (1 − P )

(n − x )

x

when the time distribution are reduced smaller and smaller the number of trials (n) increases and the probability Pof success decreases. But the expected number of events is np. Say np = γ as nÆ ∞ , PÆ 0, npÆ ∞

33

Substituting in the above equation

Px (x ) =

=

=

n! ⎛γ ⎞ ⎜ ⎟ (n − x )! x! ⎝ n ⎠

γx⎛

γ⎞

n

⎜1 − ⎟ x! ⎝ n ⎠

x

⎛ γ⎞ ⎜1 − ⎟ ⎝ n⎠

(n − x )

n! (n − x )!n x ⎛⎜1 − γ ⎞⎟ x ⎝ n⎠

γ⎞ ⎧

1.2...............n ⎫ 1 ⎜1 − ⎟ ⎨ ⎬ x! ⎝ n ⎠ ⎩1....(n − x + 1)(n − x ) ⎭ ⎧ ⎛ γ ⎞⎫ x ⎨ n⎜ 1 − ⎟ ⎬ ⎩ ⎝ n ⎠⎭

γx⎛

n

⎫ ⎧ ⎪ ⎪ n γ x ⎛ γ ⎞ ⎪ (n − x + 1)....(n − 3)(n − 2 )(n − 1)n ⎪ = ⎜1 − ⎟ ⎨ ⎬ x x! ⎝ n ⎠ ⎪ ⎧ ⎛ γ ⎞⎫ ⎪ ⎨ n⎜ 1 − ⎟ ⎬ ⎪ ⎪ ⎩ ⎝ n ⎠⎭ ⎭ ⎩

⎛ γ⎞ For large values of n this term is nearly 1 and ⎜1 − ⎟ = e −γ ⎝ n⎠ n

Hence PX ( X ) =

γ x e −γ x!

x = 0,1,2,...................

this is known as poisson distribution

34

∞

[X ] = ∑

γ x e −γ x!

x =0

∞

= γ∑ x =1

γ ( x −1)e −γ

(x − 1)!

∞

∑ put

y = ( x − 1)∑ [X ] =γ ∑

γ x e −γ

y =0

y!

V [x ] = γ we can also put V = λt and say Px ( x ) =

(λt )e −λt x!

This is called pisson process To be a poisson process 1. The probability of incident in a short interval of time t to t+h is approximately λh for any t 2. The probability of two or more events in a short interval of time is negligible 3. The number of incidents in any interval of time is independent of the number in any nonoverlapping interval Property of Poisson distribution E [xi ] = µ

v[xi ] = µ

β (1) =

1

β (2) =

µ

1 + 3µ

µ

The Poisson process is useful where an event may occur anywhere in a space and time framework and is based on the following assumptions (1) The occurrence of an event in a given interval is independent of its occurrence in other intervals. (2) The probability of occurrence of an event in a small interval is proportional to that interval and (3)

the The

proportionality mean

constant rate

of

v,

describes

the

mean

occurrence

v

is

rate

of

occurrence.

assumed

constant,

35

(4) The probability of two or more occurrences in the chosen small time interval ∆t is negligible. If X, is the number of occurrences in time (or space) interval t. P ( X t = x) =

(vt ) x −vt e , x!

x=0,1,2…..

It is obvious that

E ( X t ) = X t = vt It can be shown that the variance is the same as exception, i.e. S x2 = vt

The occurrences of an event between intervals are statistically independent as are the occurrence of an event between trials in the case of the Bernoulli sequence. An extension of the Poisson process is the important case where the occurrence of an event is influenced by the occurrence in the previous time interval. Thus the probability of occurrence is a conditional one and the model used for determining it is called the Markov process or Markov chain. If the Poisson process governs the occurrence of an event then the time T to first occurrence has an exponential distribution also referred to as the negative exponential. We have for the probability

that

no

event

occurs

in

time

t:

P (T1 > t ) = P ( X t = 0) = e vt

The PDF is

f T1 (t ) = ve −vt

t≥0

The CDF is

FT1 (t ) = P(T1 ≤ t ) = 1 − e − vt If

v

is

E (T1 ) = T1 =

independent

of

t

and

hence

constant

the

mean

of

T1

is

1 v

This is the mean recurrence time or return period and may be compared to 1/p for the Bernoulli sequence. For small time interval the two are nearly equal. If it is desired that the PDF for an exponential distribution should start at a value of greater than

36

zero (i.e. not through the origin), we may use the shifted exponential distribution. The time until the kth occurrence is described by the gamma distribution. If Tk denotes the time until the kth event

f Tk (t ) =

v(vt ) k −1 −vt e (k − 1)!

The exponential and gamma distributions are related to the Poisson process in the same way that the geometric and negative binomial distributions are related to Bernoulli sequence. Exponential distribution

The exponential distribution is related to the poisson distribution as follows. If the events occur as per poisson process, then the time T1 till the first occurrence of the event is an exponential distribution. This means that in the interval (T1>t) no evens occurs

P(T1 > t ) = P( X t = 0) = e −γt this is the first occurrence time in a poisson process Distribution function of T1 fT 1 (t ) = P (T 1 ≤ t ) = 1 − e − γ t fT 1 (t ) =

df = γ e −γt t ≥ 0 dt

if γ is a constant then mean value of T MT1=1 / γ 1 Mean recurrence time = γ

37

Gamma Distribution

If the occurrence of an event constitutes a poisson process, then the time until th kth occurrence of the event is described by gamma distribution. Let Tk denote the time till the kth event , then

(Tk

≤ t ) means that Kor more events occur in time t.

corresponding density function is

γ (γt )k −1 −γt f TK (t ) = e (k − 1)! mean time till the occurrence of kth event = E (Tk ) =

k v

Variance Var (Tk ) =

k v2

2.14. Hyper geometric distribution

In quality control, the use of a distribution for sampling acceptable from unacceptable items is desirable. Let m elements be defective among N elements then if a sample of items is taken randomly, the probability of x defective items in the sample is given by the hypergeometric

distribution.

⎛ m ⎞⎛ N − m ⎞ ⎟ ⎜⎜ ⎟⎟⎜⎜ x ⎠⎝ n − x ⎟⎠ ⎝ P ( X = x) = ⎛N⎞ ⎜⎜ ⎟⎟ ⎝n⎠

This

is

written

as

follows:

x=1,2,…m

N The number of samples of size n in the finite population N is n

⎛ m ⎞⎛ N − m ⎞ ⎜⎜ ⎟⎟⎜⎜ ⎟⎟ x n − x ⎝ ⎝ ⎠ ⎠ The number of samples with x defective elements is

38

Assuming that all samples of size n are equally to be chosen the above equation is obtained. The hyper geometric distribution arises when samples from a finite population (for example, consisting two types of element like good or bad.) are examined. Consider a lot of N items, m of which are defective and the (N-m) are good. If a sample of n items is taken (at random) from this lot, the probability of x defective items in the sample is given by the hyper geometric distribution.

(m)⎛⎜⎜⎝ Nn −− mx ⎞⎟⎟⎠ P ( X = x) = (N ) x

x = 1,2,.............m

n

()

in the lot the number of sample of size n is N n

()

⎛ N − m⎞ ⎟⎟ Number of ways in which x defective samples can be shown = m ⎜⎜ n ⎝ n−x ⎠ 2.15. t distribution, covariance and correlation

If X and Y are two random variables then probabilities associated with any pair of values

x

and

y

may

be

described

by

a

t

distribution

function.

e.g.

FX ,Y ( x, y ) = P ( X ≤ x, Y ≤ y )

For discrete random variables the t PMF (Figure above) may be used p X ,Y ( x, y ) = P ( X = x, Y = y ) y

y

f x , y ( x, y )

p x , y ( x, y )

x (a)

x (b)

Figure 9 – (a) t PMF of X and Y (b) t PDF of X and Y

39

For a continuous random variable the t PDF ( Figure 19) may be defined by: f X ,Y ( x, y ) dxdy = P ( x < X ≤ x + dx. y < Y ≤ y + dy )

The marginal density functions of this t distribution are α

∫f

f x ( x) =

X ,Y

( x, y )dy

X ,Y

( x, y )dx

−α

f Y ( y) =

α

∫f

−α

The CDF is given by the volume under the surface f(x,y) and is given by x

FX ,Y ( x, y ) =

y

∫ ∫f

X ,Y

(u, v)dudv

−α −α

Description of a t distribution of two random variables requires five statistical parameters namely, the mean and standard deviation of each variable and the correlation coefficient between them. This coefficient is denoted by p and is the ratio of the covariance denoted

ρ X ,Y =

by

cov(x,y)

and

the

product

of

the

standard

deviations

cov( x, y ) SxSy

The covariance itself is defined as the t central second moment, that is, the expectation of the product ( X − x )(Y − y ) and hence

cov( X , Y ) = E[( X − x)(Y − y )] = E ( XY ) − E ( X ) E (Y ) If X and Y are statistically independent then E(XY)=E(X)E(Y)

and cov(X,Y)=0

If two variables are statistically independent then the variables are uncorrelated, however, the reverse is not true, if the variables are uncorrelated, they may not be statistically independent. The correlation coefficient p may vary from -1 to +1 and may be regarded as a normalized covariance. It is a measure of the linear relationship between the two random variables (Figure 20).

40

If X and Y (cohesion and friction) are two random variables, then the probability associated with any pair of values x and y may be described by a t distribution function. Eg : F( X ,Y ) ( x, y ) = P( X ≤ x, Y ≤ y ) For discrete random variables the t PDF is defined as PX ,Y (x, y ) = P( X = x, Y = y )

For a continuous random variable the t PDF is defined as f X ,Y ( x, y )dxdy = P( x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy ) Marginal density functions of the t distribution ∞

f x ( x ) = ∫ f x , y ( x, y )dy −∞ ∞

f y ( y ) = ∫ f x , y ( x, y )dx −∞

at any give x at any give y

CDF of x and y is the total volume and are dependent on each other, another variable linking the two variables comes into picture. This is given by

ρ x, y =

CoV (x, y )

ρxρy

CoV = E [( X − x )(Y − y )]

= E ( XY ) − E ( X )E (Y )

This is similar to parallel axis theorem Given Constraints

Assigned probability distribution

∫ f (x )dx = 1

Uniform

∫ f (x )dx = 1 expected value

Exponential

∫ f (x )dx = 1 expected value, SD

Normal

∫ f (x )dx = 1 expected value, SD, min,

Beta

b

a

b

a

b

a

b

a

max

∫ f (x )dx = 1 mean rate of occurrence b

Poissons

a

between two independent events

41

Pearson’s system In binomial model, it is assumed that all the outcomes are equally likely which means that sampling is done with replacement from a finite population of N

⎛N⎞ b(x, N , R ) = ⎜⎜ ⎟⎟ R x P ( N − x ) ⎝x⎠ N! b ( x, N , R ) = R x P (N −x) (N − x )! x!

for binomial distribution

Suppose sampling is done without replacement from a collection N and the lot N contains K number of samples which have a particular characteristic and (N-k) which do not have it. If a sample in selected from this collection, either it is from K or (N-k). Suppose r random samples are drawn without replacement from N items. The probability that x of the r samples are of the type k is given by

⎛ k ⎞⎛ N − k ⎞ ⎟ ⎜⎜ ⎟⎟⎜⎜ x ⎠⎝ r − x ⎟⎠ ⎝ h ( x, N , r , k ) = ⎛N⎞ ⎜⎜ ⎟⎟ ⎝r⎠ This is called hyper geometric distribution Binomial distribution yields Normal distribution. The limit of the hyper geometric distribution must produce a more flexible continuous probability distribution capable of better representing skewed variation. For symmetrical distributions all moments of odd order about the mean are zero, which means that any odd-ordered moment may be used as a measure of the degree of skew ness. Moments N

M = ∑ fi i =1

Consider a system of discrete parallel forces f1, f2, ………..fN actually on a rigid beam at the respective distances x1, x2,………xN From statistics

42

N

M = ∑ fi i =1

and its point of application x is given by N

x=

∑x

i

fi

i =1

M

we can consider that f1, f2, ………..fN represent the probability of all possible occurrences of the N outcomes x1, x2,………xN Since the distribution is exhaustive M=1 N

x = E [x ] = ∑ xi f i i =1

where E[x] is the expected value of x. In general, it denote the central tendency of the distribution The expected value of the distribution can be considered as first moment of the distribution and the concept can be generalized to Kth moment as follows N

E [xi ] = ∑ xik fi k

i =1

we know from the statistics that the moment of inertia (MI)

I y = ∑ ( xi − x ) fi 2

we can have similar concepts and

V [xi ] = ∑ ( xi − x ) fi 2

Ex

(

V [xi ] = E ( xi − x ) = E xi2 + x − 2 xi x

[ ] = E [x ] − 2 x + x = E [x ] − (E [x ])

= E xi2 + E [x ] − 2 x E [xi ]

)

2

2 i 2 i

2

2

2

i

1. if x is a random variable a and b are constant E [ax + b] = aE [x ] + b

2. if x = x1+ x2 + x3…….+ xN then E [x ] = E [x1 ] + E [x 2 ] + E [x3 ] + ......E [x N ]

43

3. If f1(x) and f2(x) are two rvs, E [ f1 ( x ) + f 2 ( x )] = E [

]

4. It is seen that the variance has dimensions of square of the RV

σ [xi ] = V [xi ] 5. If xi is a random variable, a and b are constants V [axi + b] = a γ V1 [xi ] = a 2 (σ [xi ]) γ

2

For symmetrical distribution all moments of odd order about the mean must be zero. Consequently an odd ordered moment may be used as a measure of degree of skew ness . The

[

third moment E ( x1 − x 2 )

3

] of a probability distribution can be considered to represent the skew

ness. Since the units of that central moment are cube of the units of the variable. To provide an absolute measure of skew ness , Pearson designed an absolute term called coefficient of skew ness.

β (1) =

[

E ( xi − x )

3

(σ [xi ])

3

]

if β(1) is positive the corresponding distribution is negative.

44

Poisson also proposed a dimensionless coefficient of Kurtosis

β (2) =

[

E ( xi − x )

(σ [xi ])

4

4

]

This is a measure of peaked ness. A distribution is said to be flat if β(2)<3 f s (s ) =

1

2π φ (S P ) = P

* e (−1 2 )s

2

−∞ < s < ∞

S P = φ −1 (P )

The standard normal density function

45

φ (− S ) = 1 − φ (S ) f s (s )

N (0,1)

Reduction of data to standard Normal variate form Suppose we have a normal variate X with distribution N(µ,σ)

⎡ 1 ⎛ x − µ ⎞2 ⎤ ∫ exp⎢⎢− 2 ⎜⎝ σ ⎟⎠ ⎥⎥ 2π a ⎣ ⎦ 1

P(a ≤ x ≤ b ) = S=

x−µ

σ

b

dx = σ ds

P(a ≤ x ≤ b )

b− µ

1

∫ σσµ exp[− 1 2 s ]ds r

σ 2π

a−

The values of S correspondingly to probability of P<0.5 may be obtained as S = φ −1 (P ) = −φ −1 (1 − P ) Standard normal function: the density function is given by f (s ) =

1 2π

e (−1 2 )S

2

−∞ < S < ∞

Because of its wide usage a special notation φ(s) is commonly used to designate the distribution function of the standard normal variate

46

φ (S ) = P

where

⎛x−µ⎞ S =⎜ ⎟ ⎝ σ ⎠

The tables give the probability of only positive values of the variate .However the probability of negative values of the variate can be obtained as

φ (− S ) = 1 − φ (S )

S = φ −1 (P ) = −φ −1 (1 − P )

2.16. Moments of functions of random variables 2.16.1. Sum of variates x1,x2 etc

Consider a function Y which is dependent on two random variables X1 and X2. thus (a1 and a2 are constants)

Y = a1 X 1 + a 2 X 2 Then it can be shown that y = E (Y ) = a1 x1 + a 2 x 2

and V ( y ) = a12V ( x1 ) + a 22V ( x 2 ) + 2a1 a 2 cov( x1 x 2 )

Y = a1 X 1 + a 2 X 2 Now if y = a1 x1 − a 2 x 2 and V ( y ) = a12V ( x1 ) + a 22V ( x 2 ) − 2a1 a 2 cov( x1 x 2 )

47

Figure 10 – Coefficient of correlation ρ

In general, if n

n

Y = ∑ a1 X i ,

y = ∑ ai xi

i =1

i =1

n

n

n

i =1

i

j

V ( y ) = ∑ ai2V ( xi )∑∑ ai a j cov( xi x j ) Suppose Z is another function of random variable X, i.e. n

Z = ∑ bi X i i =1

n

n

i =1

i≠ j

cov(Y , Z ) = ∑ ai biV ( xi ) + ∑

n

∑a b i

j

cov( xi x j )

Product of independent variates x1,x2,x3 etc Z = X 1 X 2 X 3 ..... X n Z = E ( Z ) = x1 x 2 x3 .........x n 2 2 2 2 2 2 If S Z = E ( X 1 ) E ( X 2 ) E ( X 3 )......E ( X n ) − ( x1 x 2 x3 x 4 .......x n )

First order approximation for general functions Let Y=g(X)

48

Then by expanding g(X) in a Taylor series about the mean value x the following first-order approximation can be made 1 d 2g y = E (Y ) ≈ g ( x) + V ( x) 2 2 dx ⎛ dg ⎞ V ( y ) ≈ V ( x)⎜ ⎟ ⎝ dx ⎠

2

Good approximation of exact moments is obtained if g(X) is approximately linear for the entire range of values of X (even if the second term in the expression for the mean is neglected; this is generally done) Now if Y is a function of several variables X1, X2, X3 etc. Y = g ( X 1 , X 2 , X 3 ,...... X n ) the corresponding first-order approximations are

y = E (Y ) ≈ g ( x1 , x 2 , x3 ........x n ) 1 n + ∑ 2 i =1

⎛ d 2g ⎜ ∑ ⎜ j =1 ⎝ dx i dy j n

⎞ ⎟ cov( xi x j ) ⎟ ⎠

n

n

i =1

i≠ j

V ( y ) ≈ ∑ ci2V ( xi ) + ∑ where

ci,

evaluated at

and

n

∑c c i

cj

j

cov( xi x j )

are

the

partial

derivatives

(dg/dxi)

and

(dg/dxj)

x1 , x 2 , x3 ,.......x n .

The second term of the first equation is generally omitted. The second term of the second equation is not omitted but will vanish if x1,x2…..xn are uncorrelated or statistically independent.

49

Example on PDF and CDF The undrained shear strength cu of a stratum of clay has a uniform probability distribution, the maximum and minimum values of uniform distribution being 25 kN/m2 and 50 kN/m2. What is the probability that the undrained shear strength has magnitude (a)less than 40 kN/m2,(b)less than 30kN/m2 (c)less than 10 kN/m2 and (d) greater than 55 kN/m2.

fx(x)

PDF

h = 1/200 1.00

x

200

CDF

Fx (x)

Fx ( x) = 0

x

x 200 200

Figure 1 – Uniform probability density function with associated CDF The area under the probability density function must be unity. In this case the abscissa or the rectangle is (50-25)=25. Therefore the height or the base of the rectangle (i.e. the uniform probability density px ) is given by equating the area to 1. p cu × 25 = 1,∴ p cu =

1 25

We use this value as follows: (a)

p1 = P(cu ≤ 40) 1

This

probability

is

the

area

of

the

rectangle

between

the

ordinates

at cu=25 (minimum value) and cu=4O ∴ p1 = (40 − 25) ×

(b)

1 = 0.6 25

p 2 = P (cu ≤ 30) = (30 − 25) ×

1 = 0 .2 25

(c) 10 kN/m2 is outside the range 25—50, Accordingly

p3 = P(cu ≤ 10) = 0

(d) 55 kN/m2 is outside the range 25— 50. Accordingly, p 4 = P(cu > 55) = 0

Example on Normal distribution Example 1 From records, the total annual rainfall in a catchments area is estimated to be normal with a mean of 60 inches and standard deviation of 15 inches a. What is the probability that in future years the annual rainfall will be between 40 to 70 inches b. What is the probability that the annual rainfall will be at least 30 inches c. What is the 10 percentile annual rainfall

2

a.

⎡ 1 ⎛ x − µ ⎞2 ⎤ P(a ≤ X ≤ b ) = exp ⎢− ⎜ ⎟ ⎥ σ 2π ∫a ⎢⎣ 2 ⎝ σ ⎠ ⎥⎦ x−µ S= dx = σ ds 1

b

σ

⎛b−µ ⎞ ⎛a−µ ⎞ P(a ≤ X ≤ b ) = φ ⎜ ⎟ − φ⎜ ⎟ ⎝ σ ⎠ ⎝ σ ⎠ ⎛ 70 − 60 ⎞ ⎛ 40 − 60 ⎞ P(40 ≤ X ≤ 70) = φ ⎜ ⎟ − φ⎜ ⎟ ⎝ 15 ⎠ ⎝ 15 ⎠ = φ (0.67 ) − φ (− 1.33) = φ (0.67 ) − [1 − φ (1.33)] = 0.748571 − (1 − 0.908241) 0.6568

from table

b.

⎛ 30 − 60 ⎞ P( X ≥ 30) = φ (∞ ) − φ ⎜ ⎟ ⎝ 15 ⎠ = 1 − φ (− 2.0) = 1 − [1 − φ (2.0)] = 0.9772 P( X ≥ 60) = φ (∞ ) − φ (0) = 1.0 c. P[ X ≤ x0.1 ] = 0.10 ⎛ x 0.1 − 60 ⎞ ⎟ = 0.10 ⎝ 15 ⎠ x 0.1 − 60 = φ −1 (0.10 ) 15 = − φ −1 (0.90 )

φ⎜

= − 1.28 x0.10 = 60 − 19.2 = 40.8 inches

3

Example 2.

A structure is resting on three s A,B and C. Even though the loads from the roof s can be estimated accurately , this soil conditions under A, B and C are not completely predictable. Assume that the settlement ρ A , ρ B and ρ C are independent , normal variates with mean of 2,2.5 and 3 cm and CoV of 20%, 20% and 25% respectively. a. What is the probability that the maximum settlement will exceed 4cm b. If I is known that A and B have settled 2.5cm and 3.5cm respectively . What is the probability that the maximum differential settlement will not exceed .8m and also what if it does not exceed 1.5 cm Answer a. P(max ρ > 4 cm ) = 1 − ρ (max ρ ≤ 4 cm ) SD = Mean * CoV σ A = 2 * 0.2 = 0.4

σ B = 2.5 * 0.2 = 0.5 σ C = 3 * 0.25 = 0.75

= 1 − P(ρ A ≤ 4)P(ρ B ≤ 4)P(ρ C ≤ 4) ⎛ 4 − 2 ⎞ ⎛ 4 − 2.5 ⎞ ⎛ 4 − 3 ⎞ = 1 − φ⎜ ⎟φ ⎜ ⎟φ ⎜ ⎟ ⎝ 0.4 ⎠ ⎝ 0.5 ⎠ ⎝ 0.75 ⎠ = 1 − φ (5)φ (3)φ (1.333) = 1 − 1 * 0.9986 * 0.9088 = 0.0925

4

ρ (max ρ > 3 cm ) = 1 − P(max ρ ≤ 3) = ρ (max ρ > 3 cm ) = 1 − P( ρ A ≤ 3)P(ρ B ≤ 3)P(ρ C ≤ 3) ⎛ 3 − 2 ⎞ ⎛ 3 − 2 .5 ⎞ ⎛ 3 − 3 ⎞ = 1 − φ⎜ ⎟φ ⎜ ⎟φ ⎜ ⎟ ⎝ 0.4 ⎠ ⎝ 0.5 ⎠ ⎝ 0.75 ⎠ = 1 − 0.994 * 0.84 * 0.5 = 0.582

b. The differential settlement between A and B is ∆ AB =3.5cm – 2.5cm = 1 cm has already

( ) occurred. Hence, P ∆ max ≤ 0.8 cm = 0 irrespective of ρ C , however ρ C matters if P(∆ max ≤ 1.5 cm ) , it is necessary if we need the data with 95% or 99% or 99.9% of occurrence , we need to determine the following P(µ − 1.960σ < X ≤ µ + 1.96σ ) = 95% P(µ − 2.58σ < X ≤ µ + 2.58σ ) = 99%

P(µ − 3.29σ < X ≤ µ + 3.29σ ) = 99.9%

For the ∆ max to be more than 1.5cm, ρ A = 2.5 cm , either ρ C should be less than 1 or more than 4. Since ρ B = 3.5 cm , ρ C should be less than 2cm or more than 5 cm . Acceptable region for safety is ρ C should be between 2 to 4.

5

P(∆ max ≤ 1.5cm ) = P(2 cm ≤ ρ C ≤ 4 cm ) ⎛ 2 −3⎞ ⎛ 4 − 3⎞ = φ⎜ ⎟ ⎟ − φ⎜ ⎝ 0.75 ⎠ ⎝ 0.75 ⎠ = φ (− 1.333) − φ (− 1.333) = 0.9088 − 0.0912 = 0.8176 Example 3

The total load on the footing is the sum of dead load of the structure and the live load. Since each load is sum of various components, total dead load (X) and total live load (Y)can be considered as normally distributed. The data from building suggest that

µ x = 100 ton and

σ x = 10 ton and µ y = 40 ton

and

σ y = 10 ton

both x and y

are not correlated. What is the total design load that has 5 % probability? The total load Z = x + y = 100 + 40 = 140 ton and

σ z = σ x2 + σ y2 = 10 2 + 10 2 = 10 2 = 14.1ton We need to determine z such that it has only 5% probability of occurrence 1 − φ ( z ) = 0.05 φ ( z ) = 0.95 ⎛z−µ⎞ ⎟ = 0.95 ⎝ σ ⎠ ⎛ z − 140 ⎞ φ⎜ ⎟ = 0.95 ⎝ 14.1 ⎠ z − 140 = 1.65 14.1 z = 163.3 ton

φ⎜

Example 4

6

The mean and coefficient of variation of the angle of internal friction φ of a soil ing a multi-storeyed structure are φ =20 and V φ =3O%. What is the probability ο ο ο that φ will be less than (a) 16 , (b) 10 , (c) 5 , Assume that φ has a normal distribution

Solution

(a) Vφ =

Sφ

φ

= 0.3

S φ = 0.3 × 20° = 6° s=

φ −φ Sφ

=

16 − 20 − 4 = = −0.666 6 6

Φ (− s ) = 1 − Φ (− s ) = 1 − Φ (−0.666) where Φ () is obtained from tabulated values ∴ P( s ≤ 16°) = 0.253 (b) 10 − 20 5 = − = −1.666 6 3 ∴ Φ (−1.666) = 1 − Φ (1.666) = 1 − 0.952 = 0.048 s=

P ( s ≤ 10°) = 0.048 = 0.048 × 10 −1 (c)

5 − 20 15 5 = − = − = − 2 .5 6 6 2 ∴ Φ (− 2.5) = 1 − Φ (2.5) = 1 − 0.994 = 0.006 = 6 × 10 −3 s=

It should be noted that φ denotes the friction angle and φ (s) is cumulative distribution of standard normal variate. Example on cumulative distribution Example 1

In the case of previous car problem

7

0.00 6 + 0.04 + 0.12 + 0.21 + 0.25 = 0.620 becomes the probability that 5 or less cars take a left turn.

If x is a random variable and r is a real number then the CDF designated as F(r) is the probability that x will take an value equal to or less than r or F (r ) = P[x ≤ r ]

For binomial distribution F (r ) = P[x ≤ r ] = Σ all xt ,C ,r b( xi , N , R ) Though this distribution is quite simple, the model as such is quite useful in many engineering problems. For example in a series of soil boring, boulders may or may not be present. Though the distribution is for discrete variables it can also be applied to continuous variables in space and time with discretisation.

Example on lognormal distribution Example 1

The rainfall has lognormal distribution with mean and SD of 60 inches and 15 inches a. Calculate the probability that in future the annual rainfall in between 40 and 70 b. The probability that the annual rainfall is at least 30” c. What is the 10 percentile annual rainfall

15 = 0.25 60 1 λ = ln 60 − * 0.25 2 = 4.09 − 0.03 = 4.06 2

ξ=

8

a. The probability that the annual rainfall in between 40 and 70 is ⎛ ln 70 − 4.06 ln 40 − 4.06 ⎞ − P(40 < x ≤ 70 ) = φ ⎜ ⎟ 0.25 0.25 ⎝ ⎠ = φ (0.75) − φ (− 1.48) = 0.773 − 0.069 = 0.7039

b. The probability that annual rainfall is atleast 20 inches

⎛ ln 30 − 4.06 ⎞ P( x ≥ 30) = 1 − φ ⎜ ⎟ 0.25 ⎝ ⎠ =1 − φ (2.64) =1 − 0.9959 = 0.041 c. 10 percentile rainfall ⎛ ln x10 − 4.06 ⎞ ⎟ = 0.10 0.25 ⎝ ⎠ ln x10 − 4.06 = −1.28 0.25 ln x0.10 = 4.06 − 1.28 − 0.25 = 3.74

φ⎜

x0.10 = e 3.74 = 42.10

Example 2

A live load of 20ton acts on a structure of the loading is assumed to be log-normal distribution. Estimate the probability that a load of 40 will be exceeded. Assume CoV for live load = 25% We have

9

σ [x] = ln (CoV )2

σ [x] = ln E ( y ) − (CoV )2 σ [x] = ln (1 + 0.25)2 = 0.25 E [x ] = ln 20 − (0.25) / 2 2

As x = lnL the value of the normal variate x equivalent 20 =ln 40 =3.69 Hence

h=

3.69 − 2.96 = 2.92 0.25

P[40 ≤ L] = 0.5 − 4(2.92) = 0.5 − 0.498 = 0.002

Example on beta distribution

The φ of the soil samples in a locality varies between 20º to 40º with the coefficient of variation of 20º with mean value 30º. The design value is 35º. What is the probability ο that φ ≥ 35 .

q (b − a ) (q + r ) q 30 = 20 + * 20 (q + r ) q * 20 = 10 (q + r ) 10q + 10r − 20q = 0

µx = a +

10r − 10q = 0 r=q

10

qr (b − a )2 (q + r ) (q + r + 1) qr 36 = 20 2 2 (q + r ) (q + r + 1)

σ x2 =

2

q2

q2 0.09 = = = (q + r )2 (q + r + 1) (q + q )2 (q + q + 1) 4q 2 * (2q + 1) 1 4 = (2q + 1) * 0.09 qr

q = 5.05 r = 5.05 1.0

0.8

0.6

0.4

0.2

20

25

30

35

40

1− q (b − a ) 2−q−r 1 − 5.05 = 20 + * 20 2 − 5.05 * 2 − 4.05 = 20 + * 20 − 8.1 mode x = = 30 a+

2(q − r ) Coefficient of skew ness = (q + r )(q + r + 2 )σ x q < r , the distribution is positive and skewed to the left

11

q > r , the distribution is negative and skewed to the right

q =1.0 ; r = 4.0

q = r = 3.0 q=4;r=2 q = r = 1.0

a=20 b=40

µx = a +

0.5

1

q + 20 (q + r )

q 6 = ⇒ 6q + 6r − 20q = 0 20 q + r 6r − 14q = 0

σ x2 =

qr (b − a )2 (q + r )(q + r + 1)

SD = CoV * 26 ο (5.2 )

2

qr 20 2 (q + r )(q + r + 1) qr 0.0676 = (q + r )(q + r + 1) 6 r *r 14 * 6r 2 14 = = ⎛ 20r ⎞ ⎛ 20r + 14 ⎞ 20r (20r + 14 ) ⎟ ⎜ ⎟ *⎜ ⎝ 14 ⎠ ⎝ 14 ⎠

5.2 2 =

1.352 * r (20r + 14 ) = 84r 2

27.04r 2 + 18.93r − 84r 2 = 0 − 56.96r 2 = 18.93r r = 0.33 q = 0.14

12

Example on binomial distribution Example 1

Over a period of time it is observed that 40% of the automobiles traveling along a road will take a left turn at a given intersection. What is the probability that given a traffic stream of 10 automobiles 2 will take a left turn. N= 10 ; x=2 ; R= 0.40 Possibility

Probability

0

0.006

1

0.04

2

0.12

3

0.21

4

0.25

5

0.20

6

0.11

7

0.04

8

0.01

9

0.002

10

0.0001

13

Possibility Vs Probability 0.3 Probability

0.25 0.2 0.15 0.1 0.05 0 0

1

2

3

4

5

6

7

8

9 10

Possibility

Example 2

A dam has a projected life of 50years . What is the probability that 100years flood will occur during the life time of the dam? R = 1 / 100 = 0.01 , N = 50 years b(1.50,0.01) = 50 (0.01)1(0.99)49= 0.31 N 10

10*(0.01)1*(0.99)9

0.09

20

20*(0.01)1*(0.99)19

0.165

30

30*(0.01)1*(0.99)29

0.224

40

40*(0.01)1*(0.99)39

0.270

50

50*(0.01)1*(0.99)49

0.305

60

60*(0.01)1*(0.99)59

0.387

70

70*(0.01)1*(0.99)69

0.350

80

80*(0.01)1*(0.99)79

0.362

90

1

90*(0.01) *(0.99)

0.368

100

100*(0.01)1*(0.99)99

89

0.370

14

Example 3

A flood control system for a river , the yearly maximum flood of river is concern. The probability of the annual maximum flood exceeding some specified design level no. is 0.1 in any year. What is the probability that the level no. will be exceeded once in five years. Assuming binomial distribution means that there is only one occurrence or not in the year. Each occurrence or not occurrence is independent of the other events. The probability of occurrence in each trial is also considered.

()

b ( x, N , R ) = n p x q ( n − x ) x

n! 1! p x q (n− x ) = 0! 1! (n − x )! x! =

5! (0.1)1 (0.9)4 4! 1!

= 5 * (0.1) (0.9 ) 1

4

= 0.328

Hence

The probability that at least one exceedance of level no = P ( x ≤ 1) = P( x = 0 ) + P ( x = 1) = 0.590

Example on binomial distribution Example 1

1. Find the variance of the binomial distribution b(xi,N,R) = b(xi , 5 , 0.01)

15

E[ x / b( xi ,5,0.01)] = NR = 5 * 0.1 = 0.5 V [ xi ] = E[( xi − x)]2 2

2

V [ x i ] = E[ x i − 2 x i x + x ] 2

2

V [ xi ] = E[ xi ] − 2 xE[ xi ] + E[ x ] 2

2

V [ x i ] = E[ x i ] − 2 x + x 2

V [ x i ] = E[ x i ] − x

2

2

Variance has the dimension of a square of the random variable ,more meaningful measure of dispersion is the positive square root of variance called standard deviation

σ [ xi ] =

[ xi ]

The equivalent concept of standard deviation in static’s is radius of gyration. Another useful relative measure of scatter of radius of gyration called co-efficient of variation

V ( x) =

σ [ x] E[ x]

* 100%

Coefficient of Variation express a measure of central tendency .For example a mean value of 10 and SD of 1 indicated 10% CoV . 10% < Low 15-30% = moderate 30% = High For symmetrical distribution, all moment’s of odd order are zero. The third central moment E(xi- x )3 represents the degree of skew ness .The fourth moment provides a measure of peaked ness (Kurtosis) of the distribution to make these non-dimensional ,They are divided by standard deviation raised to cube or fourth order respectively.

Co-efficient of skew ness = β(1)=

E[( xi − x) 3 ] (σ [ xi ]) 3

β(1) is +ve ,the long tail is on the right side of the mean , β(2) is negative ,long tail is on the left side

16

Co-efficient of Kurtosis or peaked ness β(2) =

E[( xi − x )]4 σ [ x]4

A distribution is said to be flat if β(2) < 3 Example on geometric distribution Example 1

A structure is designed for a height 8m above the mean sea level. This height corresponds to 10 % probability of being exceeded by sea waves in a year. What is the probability that the structure will be subjected to waves exceeding 8m within return period of design wave.

8m

1 1 = = 10 years P 0.1 10 P[H > 8M in 10 years ] = 1 − (0.9) = 1 − 0.3487 = 0.6513

T=

The number of trials (t) until specified event occurs for the first time is given by geometric distribution

17

b(1,1, p ) =

1! 1 (t −1) pq = p * q (t −1) 0! 1!

Example 2

A transmission tower is designed for 50years period i.e. a wind velocity having a return period of 50yrs What is the probability that the design wind velocity will be exceeded for the first time in 5th year, after the completion of the structure? Every year is considered as scale and the probability of 50 years wind occurrence in any year is p=1 / 50 = 0.02 b ( x, N , p , q ) = p x q ( N − x )

b(1,5,0.02,0.98) = (0.02 ) (0.98) = 0.0184 1

4

what is the probability that first such wind velocity will occur with 5 years after the completion of the structure. 5

P(T ≤ 5) = ∑ (0.02 )(0.98)

t −1

i =1

= 0.02 + 0.0196 + 0.0192 + 0.0188 + 0.0184 = 0.096 Example on Poisson’s distribution Example 1

Record of rain storm in a town indicates that on the average there have been four rainstorms per year over the last years. If the occurrence is assumed to follow a poisons process what is the probability that there is no rainstorm next year?

18

4 0 −4 P( X t = 0) = e = 0.0018 0! 4 4 −4 = e = 0.195 4! The above result indicate that the average yearly occurrence of rainstorm is 4, the probability of having exactly 4 storm in a year is also 4 The probability of 2 or more rainstorms in a year is ∞

4 x −4 e x = 2 x!

P( X t ≥ 2) = ∑

1

4 x −4 e = 1 − 0.0018 − 0.024 = 0.908 x = 0 x!

= 1− ∑

Example 2

The probability that a structure fails is P(f) = 0.001 of 1000 such structures built what is the probability that two will fail b(x, N , P( f )) =

N! x P( f ) R ( N − x ) x! ( N − x )!

Stirlings Formula N !≈ 2πN N N e − N ln (N !) =

1 ln(2πN ) + N ln N − N 2

1000! (0.001)2 (0.999)998 2! 998! 998 * 1000 = = 0.184 2

b(x, N , P( f )) =

b

to use Poisson distributions we need the expected value of binomial distribution = n*p = 1000*0.001=1

19

12 e −1 = 0.184 2! f (0) = 0.37 f (1) = 0.37 f (3) = 0.06 f (4 ) = 0.02 f (2) =

f (2 ) = 0.18

Example 3

During world war II German dropped 54% bombs on London , an analysis was conducted to determine if the bombs were guided or not. It was reasoned that if bombs lacked guidance they should follow Poissons distribution. To check that, London was divided into 180 regions of approximately equal area and the number of hits in each area were recorded. No of hits

Observed No. of

Poissons

areas within xi

distribution with

Theoretical No.

f(xi) with µ=0.943 0

229

0.389

226

1

211

0.367

213

2

93

0.173

100

3

39

0.054

32

4

7

0.013

7

5 or more

1

0.004

2

580

1.00

580

The expected value per area = 547/580=0.943 From the above , one can say that bombing is a Poisson distribution Example on exponential distribution Example 1

Historical records of earthquake in San Francisco, California show that during the period 1836-1961 there were 16 earthquakes of intensity VI or more. If the occurrence of suc 20

high intensity earthquakes in the region is assumed to follow a poisson process then what is the probability that such earthquakes will occur within the next years. 16 = 0.128 quakes per year 125 P(T1 ≤ 2) = 1 − e −0.128(2 ) = 0.226

γ =

The probability that no earthquake of this high intensity will occur in the next 10 years is P(T1 ≥ 0) = e −10(0.128 ) = 0.278 E (T1 ) =

1

γ

=

1 = 7.8 years 0.128

Hence that an earthquake of at least VI intensity can be expected on an average once in very 7.8 years Hence, the general model in the area is P(T1 ≤ t ) = 1 − e −0.128t

P(T1 ≤ 7.8) = 1 − e −0.128*7.8 = 1 − e −1 = 0.632

Example on hyper geometric distribution Example 1

A box contains 25 strain gauges and out of them 4 is known to be defective. If 6 gauges were used in the experiment, what is the probability that there is one defective gauge in the container. N=25 , m=4 , n=6 The probability that one gauge is defective

21

⎛ 4 ⎞⎛ 21⎞ ⎜⎜ ⎟⎟⎜⎜ ⎟⎟ 1 5 P( X = 1) = ⎝ ⎠⎝ ⎠ = 0.46 ⎛ 25 ⎞ ⎜⎜ ⎟⎟ ⎝6⎠ The probability that no gauge is defective 21 6 P( X = 0) = = 0.31 25 6

Example 2

An inspector on an highway project finds that two substandard test per 10 samples are a good measure of contractors ability to perform. Find the probability that I among the five samples selected at random a. There is one substandard test b. There are two such results

⎛ m ⎞⎛ N − m ⎞ ⎜⎜ ⎟⎟⎜⎜ ⎟ x ⎠⎝ n − x ⎟⎠ ⎝ P( X = x ) = ⎛N⎞ ⎜⎜ ⎟⎟ ⎝n⎠ We have a. We have N = 10 , m = 2, x = 1, n = 5

⎛ 2 ⎞⎛ 8 ⎞ 21 81 ⎜⎜ ⎟⎟⎜⎜ ⎟⎟ * 1 ⎠⎝ 4 ⎠ 111 4141 2 * 70 ⎝ = = = = 0.55 101 252 ⎛10 ⎞ ⎜⎜ ⎟⎟ 5151 ⎝5⎠ b. We have N = 50, m = 5, n = 10, x = 1

22

⎛ 5 ⎞⎛ 45 ⎞ 5! 45! ⎜⎜ ⎟⎟⎜⎜ ⎟⎟ * 1 9 411! 3619! = ⎝ ⎠⎝ ⎠ = = 0.43 50! ⎛ 50 ⎞ ⎜⎜ ⎟⎟ 40! 10! ⎝ 10 ⎠ Example 3

In an area chosen for foundation structure at 50 locations were collected and shear strength determined, If the 50,10 are unsuitable from shear strength considerations. In order to improve shear strength insitu grouting is one of the methods proposed for 10 locations. What is the probability that a. the present location b. Two locations were initially unsuitable Answer N = 50, m = 10, n = 10, x = 1 and 2

⎛ m ⎞⎛ N − m ⎞ ⎜⎜ ⎟⎟⎜ ⎟ x ⎠⎝ n − x ⎠ ⎝ f (x ) = N n a. The present location is unsuitable

23

⎛10 ⎞⎛ 50 − 10 ⎞ ⎜⎜ ⎟⎟⎜⎜ ⎟ 1 ⎠⎝ 10 − 1 ⎟⎠ ⎝ f (1) = = ⎛ 50 ⎞ ⎜⎜ ⎟⎟ ⎝ 10 ⎠

⎛10 ⎞⎛ 40 ⎞ ⎜⎜ ⎟⎟⎜⎜ ⎟⎟ ⎝ 1 ⎠⎝ 9 ⎠ = 0.267 ⎛ 50 ⎞ ⎜⎜ ⎟⎟ ⎝ 10 ⎠

⎛10 ⎞⎛ 50 − 10 ⎞ ⎜⎜ ⎟⎟⎜⎜ ⎟ 2 ⎠⎝ 10 − 2 ⎟⎠ ⎝ f (2 ) = = ⎛ 50 ⎞ ⎜⎜ ⎟⎟ ⎝ 10 ⎠

⎛10 ⎞⎛ 40 ⎞ ⎜⎜ ⎟⎟⎜⎜ ⎟⎟ ⎝ 2 ⎠⎝ 8 ⎠ = 0.34 ⎛ 50 ⎞ ⎜⎜ ⎟⎟ ⎝ 10 ⎠

if the location is such that, the one or two locations can be discarded leaving the ones in unsuitable area, the it is considered as a distribution with replacement. Binomial distribution can be chosen for the purpose x

⎛ n ⎞⎛ M ⎞ ⎛ M ⎞ f ( x ) = ⎜⎜ ⎟⎟⎜⎜ ⎟⎟ ⎜1 − ⎟ N⎠ ⎝ x ⎠⎝ N ⎠ ⎝ ⎛10 ⎞⎛ 10 ⎞ f (1) = ⎜⎜ ⎟⎟⎜⎜ ⎟⎟ ⎝ 1 ⎠⎝ 50 ⎠

1

⎛10 ⎞⎛ 10 ⎞ f (2) = ⎜⎜ ⎟⎟⎜⎜ ⎟⎟ ⎝ 2 ⎠⎝ 50 ⎠

n− x

9

⎛ 10 ⎞ ⎜1 − ⎟ = 0.268 ⎝ 50 ⎠

2

9

⎛ 10 ⎞ ⎜1 − ⎟ = 0.30 ⎝ 50 ⎠

Example 4

Records collected by a contractor over 40 years period indicate that there has been 240days of indigent weather, because of which the operations were closed down. On the basis of past record , what is the probability that no days were lost next year Answer Mean value of occurrence λ = 240 / 40 = 6 per year

24

f (x ) =

λx e −λ

x! 6 e −6 f (0 ) = = 2.48 *10 −3 0! 0

No of days

Probability of occurrence of 0.6

0

0.0025

1

0.0149

2

0.0446

3

0.0892

4

0.1339

5

0.1606

6

0.1606

7

0.1377

8

0.1033

9

0.0688

10

0.0413

11

0.0225

12

0.0113

The life period of materials / radioactivity are normally characterized in of exponential distribution.

f T (t ) = λe − λt f T (0) = λ f T (∞ )

the corresponding distribution is f T (T ) = ∫ λe −λt t

0

25

t

⎡ 1 ⎤ = λ ⎢ − e − λt ⎥ ⎣ λ ⎦0

[

]

= − e − λt − 1 = 1 − e − λ t Mean or expected value ∞

E (k ) = ∫ xf x dx −∞

∞

Var ( x ) = ∫ xf ( x )dx −∞

hence ∞

µ = E (T ) = ∫ tλe −λt dt −∞

Example 5

A storm event occurring at a pint in a space is characterized by two variables, duration of the storm X and its intensity Y . if both the variables are exponentially distributed as

Fx ( x ) = 1 − e − ax

x≥0 a>0

Fy ( y ) = 1 − e −by

y≥0 b>0

Assuming that the t distributions of both X and Y is also exponential then,

(

)(

FX ,Y = 1 − e − ax 1 − e − by

)

= 1 − e − ax − e −by + e −(ax + by )

δFX ,Y = + ae − ax − ae −(+ ax + ay ) δx δ 2 FX ,Y = abe −(ax + by ) δ x δy f X (x ) =

∫

∞

0

abe −(ax + by ) dy ∞

⎡ ab −(ax + by ) ⎤ = ⎢− e ⎥⎦ ⎣ b 0

[

= − ae −(ax + by )

]

∞

0

= + ae − ax as y → ∞ f y ( y ) = be −by

e −(ax + by ) → 0

26

f x ( x ) = abe − (ax +by ) dxdy =∫

∞

0

∫

∞

0

abe −(ax +by )

− ( ax + by ) ∞ ⎡ abe ⎤ =∫ ⎢ ⎥ 0 ⎣ −a ⎦

= =

∫ − [0 − e ]dy ∞

−by

0

∫

∞

0

e −by dy

[ ]

1 −by ∞ e 0 b = − [0. − 1] = 1 =

The t PDF for the concentration much of two pollutants (x,y) in ppms If (x,y) = 2-x-y 0 ≤ x, y ≤ 1

Show that a. f(x,y) is a probability distributions b. Determine CDF c. The t probability that x ≤ 1 2

, y≤3 4

d. Marginal distribution of x and y e. Are they independent? f. If the concentration of y is 0.5ppm, determine the probability x ≤ 0.25 Answer a. if f(x,y) in pd the volume has to be

27

⎛

⎞ ⎟ 2 ⎟⎠

y ∫0 ∫0 (2 − x − y )dxdy = ∫0 ⎜⎜ 2 y − xy − 1 1

1

⎝

[

1

2

]

= ∫ (2 − x − 1 2)dx = 2 x − x 2 2 − x 2 0 0

1

= 2 *1 − 1 2 − 1 2 =1 b.

F( x , y ) = ∫

x

0

∫ (2 − 4 − 0) y

0

⎡ ⎤ y2 = ∫ ⎢2 y − − 4 y ⎥ dy 0 2 ⎣ ⎦ 2 ⎡ ⎤ xy = ⎢2 xy − − xy ⎥ 2 ⎣ ⎦ x

c. ⎤ ⎡ 1 2 * 32 4 2 F (1 2 , 3 4) = ⎢2 * 1 2 * 3 4 − − 1 2 * 3 4⎥ 2 ⎦ ⎣ = 0.52

1

⎡ ⎤ x2 1 ⎡ ⎤ − xy ⎥ = ⎢2 − − y ⎥ = 1.5 − y f y ( y ) = ∫ (2 − x − y )dx = ⎢2 x − 0 2 2 ⎦ ⎣ ⎦0 ⎣ 1

1

⎤ ⎡ y2 1 ⎡ ⎤ − xy ⎥ = ⎢2 − − x ⎥ = 1.5 − x f x ( x ) = ∫ (2 − x − y )dx = ⎢2 y − 0 2 2 ⎦ ⎣ ⎦0 ⎣ 1

d.

28

P ( A ∩ B ) = P ( A) * P (B ) f x ( x ) * f y ( y ) = (1.5 − x )(1.5 − y ) should be equal to f ( xy ) = 2.25 − 1.5 y − 1.5 y =2 − x − y similarly f x ( x ) = ae − ax

f y ( y ) = be −by

f x f y = ab e −(ax +by )

29

Learning objectives 1. Understanding the concept of random field theory. 2. Understanding the significance of Auto covariance function. 3. Understanding the various functions of random fields.

1

MODULE - 3

Motivation and Highlights

Motivation: Importance of understanding the random field theory

Highlights: Knowledge of Stationary process, Auto covariance function

1

3 SPATIAL VARIABILITY USING RANDOM FIELDS 3. 1 Need for spatial variability characterization in design Many quantities such as properties of materials, concentrations of pollutants, loads etc in civil engineering have spatial variations. Variations are expressed in of mean or average values and the coefficients of variation defined in of the ratio of standard deviation and mean value expressed as percentage. In addition, the distance over which the variations are well correlated also plays a significant role. A successful design depends largely on how best the designer selects the basic parameters of the loading/site under consideration from in-situ and/or laboratory test results. Probabilistic methods in civil engineering have received considerable attention in the recent years and the incorporation of soil variability in civil/geotechnical designs has become important. Considerable work was carried out in the area of geotechnical engineering. Guidelines such as those of JCSS (2000) have also been developed in this context. Dasaka (2005) presented a comprehensive compilation on spatial variability of soils. In the following sections, spatial soil variability of soils is addressed and the concepts are applicable to any other property variations as well. Soil has high variability compared to manufactured materials like steel or cement, where variability in material properties is less, as they are produced under high quality control. 3.2 Characterization of variability of design parameters It is generally agreed that the variability associated with geotechnical properties should be divided in to three main sources, viz., inherent variability, measurement uncertainty, and transformation uncertainty (Baecher and Christian 2003; Ang and Tang 1984).

3.2.1 Inherent variability

The inherent variability of a soil parameter is attributed to the natural geological processes, which are responsible for depositional behaviour and stress history of soil under consideration. The fluctuations of soil property about the mean can be modelled using a zeromean stationary random field (Vanmarcke 1977). A detailed list of the fluctuations in of coefficients of variation for some of the laboratory and in-situ soil parameters, along with the respective scales of fluctuation in horizontal and vertical directions are presented in Baecher and Christian (2003).

3.2.2 Measurement uncertainty Measurement uncertainty is described in of accuracy and is affected by bias (systematic error) and precision (random error). It arises mainly from three sources, viz., equipment errors, procedural-operator errors, and random testing effects, and can be evaluated from data provided by the manufacturer, operator responsible for laboratory tests and/or scaled tests. Nonetheless the recommendations from regulatory authorities regarding the quality of produced data, the measuring equipment and other devices responsible for the measurement of in-situ or laboratory soil properties often show variations in its geometry, however small it may be. There may be many limitations in the formulation of guidelines for testing, and the understanding and implementation of these guidelines vary from operator to operator and contribute to procedural-operator errors in the measurement. The third factor, which contributes to the measurement uncertainty, random testing error, refers to the remaining scatter in the test results that is not assignable to specific testing parameters and is not caused by inherent soil variability.

3.2.3 Transformation uncertainty 9

Computation models, especially in the geotechnical field contain considerable uncertainties due to various reasons, e.g. simplification of the equilibrium or deformation analysis, ignoring 3-D effects etc. Expected mean values and standard deviations of these factors may be assessed on the basis of empirical or experimental data, on comparison with more advanced computation models. Many design parameters used in geotechnical engineering are obtained from in-situ and laboratory test results. To for this uncertainty, the model or transformation uncertainty parameter is used.

3.3.4 Evaluation design parameter uncertainty The total uncertainty of design parameter from the above three sources of uncertainty is combined in a consistent and logical manner using a simple second-moment probabilistic method. The design parameter may be represented as

ξ d = T (ξ m , ε )

(1)

where ξ m is the measured property of soil parameter obtained from either a laboratory or insitu test. The measured property can be represented in of algebraic sum of nonstationary trend, t, stationary fluctuating component, w, and measurement uncertainty, e. ε is the transformation uncertainty, which arises due to the uncertainty in transforming the in-situ or laboratory measured soil property to the design parameter using a transformation equation of the form shown in Equation 1. Hence, the design property can be represented by Equation 2.

ξ d = T (t + w + e, ε )

(2)

Phoon and Kulhawy (1999b) expressed the above equation in of Taylor series. Linearizing the Taylor series after terminating the higher order at mean values of soil

10

parameters leads to the Equation 3 for soil design property, subsequently the mean and variance of design property are expressed as given in Equations 4 and 5.

ξ d ≈ T (t ,0) + w

∂T ∂T +e ∂w (t ,0 ) ∂e

+ε (t , 0 )

∂T ∂ε

(3) (t , 0 )

mξd ≈ T (t ,0 )

(4) 2

2 ξd

SD

2

2

⎛ ∂T ⎞ ⎛ ∂T ⎞ ⎛ ∂T ⎞ 2 2 2 =⎜ ⎟ SDw + ⎜ ⎟ SDe + ⎜ ⎟ SDε ∂ ∂ ∂ ε w e ⎝ ⎠ ⎝ ⎠ ⎝ ⎠

(5)

The resulting variance of design parameter after incorporating the spatial average is given by 2

2

2

⎛ ∂T ⎞ 2 ⎛ ∂T ⎞ ⎛ ∂T ⎞ 2 2 2 SD = ⎜ ⎟ Γ (L ) SDw + ⎜ ⎟ SDe + ⎜ ⎟ SDε ⎝ ∂ε ⎠ ⎝ ∂w ⎠ ⎝ ∂e ⎠ 2 ξa

(6)

Of the above, the treatment and evaluation of inherent soil variability assumes considerable importance as the uncertainties from measurements and transformation process can be handled if proper testing methods are adopted and transformation errors are quantified. Approaches for evaluation of inherent soil variability are developed based on random fields and a brief description of the theory and its relevance to characterisation of soil spatial variability is described in the following sections.

3.4 Random field Theory Soil properties exhibit an inherent spatial variation, i.e., its value changes from point to point. Vanmarcke (1977a; 1983) provided a major contribution to the study of spatial variability of geotechnical materials using random field theory. In order to describe a soil property stochastically, Vanmarcke (1977a) stated that three parameters are needed to be described: (i) the mean (ii) the standard deviation (or the variance, or the coefficient of variation); and (iii) the scale of fluctuation. He introduced the new parameter, scale of fluctuation, which

11

s for the distance within which the soil property shows relatively strong correlation from point-to-point.

Figure 3.1(a) shows a typical spatially variable soil profile showing the trend, fluctuating component, and vertical scale of fluctuation. Small values of scale of fluctuation imply rapid fluctuations about the mean, whereas large values suggest a slowly varying property, with respect to the average.

(b)

(a)

Figure 3.1(a). Definition of various statistical parameters of a soil property (Phoon and Kulhawy 1999a); (b) approximate definition of the scale of fluctuation (Vanmarcke 1977a) Vanmarcke (1977a) demonstrated a simple procedure to evaluate an approximate value of the vertical scale of fluctuation, as shown in Figure 3.1(b), which shows that the scale of fluctuation is related to the average distance between intersections, or crossings, of the soil property and the mean. A random field is a conceivable model to characterize continuous spatial fluctuations of a soil property within a soil unit. In this concept, the actual value of a soil property at each location within the unit is assumed to be a realization of a random variable. Usually, parameters of the random field model have to be determined from only one realization. Therefore the random

12

field model should satisfy certain ergodicity conditions at least locally. If a time average does not give complete representation of full ensemble, system is non-ergodic. The random field is fully described by the autocovariance function, which can be estimated by fitting empirical autocovariance data using a simple one-parameter theoretical model. This function is commonly normalized by the variance to form the autocorrelation function. Conventionally, the trend function is approximately removed by least square regression analysis. The remaining fluctuating component, x(z), is then assumed to be a zero-mean stationary random field. When the spacing between two sample points exceeds the scale of fluctuation, it can be assumed that little correlation exists between the fluctuations in the measurements. Fenton (1999a & b) observed that the scale of fluctuation often appears to increase with sampling domain.

3.4.1 Statistical homogeneity Statistical homogeneity in a strict sense means that the entire t probability density function (t pdf) of soil property values at an arbitrary number of locations within the soil unit is invariable under an arbitrary common translation of the locations. A more relaxed criterion is that expected mean value and variance of the soil property is constant throughout the soil unit and that the covariance of the soil property values at two locations is a function of the separation distance. Random fields satisfying only the relaxed criteria are called stationary in a weak sense. Statistical homogeneity (or stationarity) of a data set is an important prerequisite for statistical treatment of geotechnical data and subsequent analysis and design of foundations. In physical sense, stationarity arises in soils, which are formed with similar material type and under similar geological processes. Improper qualification of a soil profile in of the statistical homogeneity leads to biased estimate of variance of the mean observation in the

13

soil data. The entire soil profile within the zone of influence is divided into number of statistically homogeneous or stationary sections, and the data within each layer has to be analysed separately for further statistical analysis. Hence, the partition of the soil profile into stationary sections plays a crucial role in the evaluation of soil statistical parameters such as variance.

3.4.2 Tests for statistical homogeneity The methods available for statistical homogeneity are broadly categorised as parametric tests and non-parametric tests. The parametric tests require assumptions about the underlying population distribution. These tests give a precise picture about the stationarity (Phoon et al. 2003a). In geostatistical literature, many classical tests for verification of stationarity have been developed, such as Kendall’s τ test, Statistical run test (Phoon et al. 2003a). Invariably, all these classical tests are based on the important assumption that the data are independent. When these tests are used to the spatially correlated data, a large amount of bias appears in the evaluation of statistical parameters, and misleads the results of the analysis. To overcome this deficiency, Kulathilake and Ghosh (1988), Kulathilake and Um (2003), and Phoon et al. (2003a) proposed advanced methods to evaluate the statistical homogeneous layers in a given soil profile. The method proposed by Kulathilake and Ghosh (1988), Kulathilake and Um (2003) is semi-empirical window based method, and the method proposed by Phoon et al. (2003a) is an extension of the Bartlett test.

3.4.2.1 Kendall’s τ test The Kendall τˆ statistic is frequently used to test whether a data set follows a trend. Kendall’s τˆ is based on the ranks of observations. The test statistic, which is also the measure of association in the sample, is given by

14

τˆ =

S n ( n − 1) / 2

(7)

where n is the number of (X,Y) observations. To obtain S, and consequently τˆ , the following procedure is followed. 1. Arrange the observations (Xi, Yi) in a column according to the magnitude of the X’s, with the smallest X first, the second smallest second, and so on. Then the X’s are said to be in natural order. 2. Compare each Y value, one at a time, with each Y value appearing below it. In making these comparisons, it is said that a pair of Y values (a Y being compared and the Y below it) is in natural order if the Y below is larger than the Y above. Conversely, a pair or Y values is in reverse natural order if the Y below is smaller than the Y above. 3. Let P be the number of pairs in natural order and Q the number of pairs in reverse natural order. 4. S is equal to the difference between P and Q; ⎛ n ⎞ n ( n − 1) possible comparisons of Y values can be made in this manner. If all A total of ⎜⎜ ⎟⎟ = 2 ⎝ 2⎠ the Y pairs are in natural order, then P =

τ=

n (n − 1) n (n − 1) n (n − 1) , Q=0, S = , and hence −0 = 2 2 2

n (n − 1) / 2 = 1 , indicating perfect direct correlation between the observations of X and Y. n (n − 1) / 2

On the other hand, if all the Y pairs are in reverse natural order, we have P=0, Q =

S= 0−

n ( n − 1) − n (n − 1) , and = 2 2

τ=

n (n − 1) , 2

− n (n − 1) / 2 = −1 , indicating a perfect inverse n (n − 1) / 2

correlation between the X and Y observations.

15

Hence τˆ cannot be greater than +1 or smaller than -1, thus, τˆ can be taken as a relative measure of the extent of the disagreement between the observed orders of the Y. The strength of the correlation is indicated by the magnitude of the absolute value of τˆ .

3.4.2.2 Statistical run test In this procedure, a run is defined as a sequence of identical observations that is followed and preceded by a different observation or no observation at all. The number of runs that occur in a sequence of observations gives an indication as to whether or not results are independent random observations of the same random variable. In this the hypothesis of statistical homogeneity, i.e., trend-free data, is tested at any desired level of significance, α, by comparing the observed runs to the interval between rn;1−α / 2 and rn;α / 2 . Here, n=N/2, N being the total number of data points within a soil record. If the observed number of runs falls outside the interval, the hypothesis would be rejected at the α level of significance. Otherwise, the hypothesis would be accepted. For testing a soil record with run test, the soil record is first divided into number of sections, and variance of the data in each section is computed separately. The computed variance in each section is compared with the median of the variances in all sections, and the number of runs (r) is obtained. The record is said to be stationary or statistically homogeneous at significance level of α, if the condition given below is satisfied. rn; 1−α / 2 < r ≤ rn ; α / 2

(8)

3.4.2.3 Bartlett’s approach The classical Bartlett test is one of the important tests, which examines the equality of two or multiple variances of independent data sets. The following steps are involved in the Bartlett’s test.

16

The sampling window is divided into two equal segments and sample variance ( s12 or s 22 ) is calculated from the data within each segment separately. For the case of two sample variances, s12 and s 22 , the Bartlett test statistic is calculated as Bstat =

2.30259(m − 1) 2 log s 2 − log s12 + log s 22 C

[

(

)]

(9)

where m=number of data points used to evaluate s12 or s 22 . The total variance, s2, is defined as s12 + s 22 s = 2 2

(10)

The constant C is given by C = 1+

1 2(m − 1)

(11)

While choosing the segment length, it should be ed that m≥10 (Lacasse and Nadim 1996). In this technique, the Bartlett statistic profile for the whole data within the zone of influence is generated by moving sampling window over the soil profile under consideration. In the continuous Bartlett statistic profile, the sections between the significant peaks are treated as statistically homogeneous or stationary layers, and each layer is treated separately for further analysis.

3.4.2.4 Modified Bartlett technique Phoon et al. (2003a, 2004) developed the Modified Bartlett technique to test the condition of null hypothesis of stationarity of variance for correlated profiles suggested by conventional statistical tests such as Bartlett test, Kendall’s test etc, and to decide whether to accept or reject the null hypothesis of stationarity for the correlated case. The modified Bartlett test statistic can also be used advantageously to identify the potentially stationary layers within a soil profile. This procedure was formulated using a set of numerically simulated correlated

17

soil profiles covering all the possible ranges of autocorrelation functions applicable to soil. In this procedure, the test statistic to reject the null hypothesis of stationarity is taken as the peak value of Bartlett statistic profile. The critical value of modified Bartlett statistic is chosen at 5% significance level, which is calculated from simulated soil profiles using multiple regression approach, following five different autocorrelation functions, viz., single exponential, double exponential, triangular, cosine exponential, and second-order Markov. The data within each layer between the peaks in the Bartlett statistic profile are checked for existence of trend. A particular trend is decided comparing the correlation length obtained by fitting a theoretical function to sample autocorrelation data. If the correlation lengths of two trends of consecutive order are identical, it is not required to go for higher order detrending process. However, it is suggested that no more than quadratic trend is generally required to be removed to transform a non-stationary data set to stationary data set (Jaksa et al. 1999). The following dimensionless factors are obtained from the data within each layer. Number of data points in one scale of fluctuation, k =

Normalized sampling length, I 1 =

T

Normalized segment length, I 2 =

W

δ

δ

δ Δz

(12)

=

nΔz n = kΔz k

(13)

=

mΔz m = kΔz k

(14)

where δ is the scale of fluctuation evaluated, and ‘n’ is the total of data points in a soil record of T. The Bartlett statistic profile is computed from the sample variances computed in two contiguous windows. Hence, the total soil record length, T, should be greater than 2W. To ensure that m≥10, the normalized segment length should be chosen as I2=1 for k≥10 and I2=2 for 5≤k<10 (Phoon et al. 2003a).

18

Equations 15 and 16 show the typical results obtained from regression analysis for I2 equals to 1 and 2 respectively for the single exponential simulated profiles. Similar formulations have also been developed for other commonly encountered autocorrelation functions and reported in Phoon et al. (2003a). Bcrit=(0.23k+0.71) ln(I1)+0.91k+0.23

for I2=1

(15)

Bcrit=(0.36k+0.66) ln(I1)+1.31k-1.77

for I2=2

(16)

B

B

A comparison is made between the peaks of the Bartlett statistic within each layer with Bcrit obtained from the respective layer. If Bmax Bcrit, reject the null hypothesis of stationarity, and treat the sections on either side of the peaks in the Bartlett statistic profile as stationary and repeat the above steps and evaluate whether these sections satisfy the null hypothesis of stationarity. However, while dividing the sections on either side of the peaks in the Bartlett statistic profile, it should be checked for m≥10, where ‘m’ is the number of data points in a segment.

3.4.2.5 Dual-window based method Kulathilake and Ghosh (1988) and Kulathilake and Um (2003) proposed a simple window based method to the statistical homogeneity of the soil profile using cone tip resistance data. In this method, a continuous profile of ‘BC’ distance is generated by moving two contiguous sub-windows throughout the cone tip resistance profile. The distance ‘BC’, whose units are same as qc, is the difference of the means at the interface between two contiguous windows. In this method it is verified whether the mean of the soil property is constant with depth, which is a prerequisite to satisfy the weak stationarity. At first, the elevation of the window is taken at a level that coincides with the level of first data point in the qc profile. After evaluating the BC distance, the whole window is moved down at a shift each time. The

19

computed distance ‘BC’ is noted each time at the elevation coinciding the centre of the window (i.e., the intersection of two contiguous sub-windows). This length of sub-window is selected based on the premise that at least 10 data points are available within the sub-window. The data within the two sub-windows is treated separately, and checked for linear trend in the data of 10 points. The reason behind ing the data with only linear trend is that within 0.2 m profile, higher-order trends are rarely encountered. In addition, in normally consolidated soils, the overburden stress follows a linear trend with depth. Kulathilake and Um (2003) suggested that the demarcation between existence and non-existence of a linear trend in the data be assumed at a determination coefficient (R2) of 0.9. It means that if the R2 value of theoretical linear fit is greater than 0.9, then the data set is said to be having a linearly trend in it, if not the mean value is said to be constant throughout the sub-window. Hence, within a window length (i.e., two contiguous windows) there exist four sets of possibility of trend in the mean values. They are 1. Constant trend in both the contiguous sub-windows 2. Constant trend in upper sub-window and a linear trend in the lower sub-window 3. Linear trend in the upper sub-window and constant trend in the lower sub-window, and 4. Linear trend in both the contiguous sub-windows. The above four sets possibilities of trend within the contiguous windows are shown in Figure 3.2. As the distance ‘BC’ increases, the heterogeneity of the qc at the intersection between two sub-sections increases.

20

A

C

D

B

R2 for linear fit<0.9

R2 for linear fit<0.9

B

A

C R2 for linear fit>0.9

C

A

B

C R2 for linear fit<0.9

A

B

D

D

D

Figure 3.2. Evaluation of ‘BC’ distance in various possible combinations

3.4.3 Trend removal Once the statistically homogeneous layers are identified within a soil profile, the individual statistical layers are checked for the existence of trend, and the same is removed before evaluating the variance and autocorrelation characteristics of the data. In general, all soil properties exhibit a trend with depth. The deterministic trend in the vertical soil profile may be attributed to overburden stress, confining pressure and stress history of soil under study. Generally, a smooth curve can be fitted using the Ordinary Least Square (OLS) method, except in special cases such as varved clays, where periodic trends are clearly visible (Phoon et al. 2003a). In most of the studies, the trend line is simply estimated by regression analysis using either linear or polynomial curve fittings. Other methods have also been applied, such as normalization with respect to some important physical variables, differencing technique, which is routinely used by statisticians for transforming a non-stationary time series to a stationary one. The normalization method of trend removal with respect to a physical quantity s for systematic physical effects on the soil profiles. In general, the detrending process is not unique. Different trend removal

21

procedures will in most cases result in different values of the random fluctuating components and different shapes of the autocorrelation function. Baecher (1987) commented that the selection of a particular trend function is a decision on how much of the spatial variability in the measurements is treated as a deterministic function of space (i.e., trend) and how much is treated statistically and modelled as random processes. However, the detrending process cannot be entirely arbitrary. After all, the fluctuating components remaining in the detrended soil records must be stationary for meaningful statistical analyses to be undertaken on limited data points. Clearly, the chosen trend function should be reasonable in view of this stationary constraint. The scale of fluctuation or autocorrelation distance evaluated from the non-stationary data is always higher than the corresponding stationary data. In other words, the trend removal invariably reduces the scale of fluctuation of the soil properties. One of the simplest methods to evaluate whether a linear or 2nd order polynomial trend is sufficient to be removed from the experimental data is to calculate the scale of fluctuation for the above both detrended data. If the evaluated scales of fluctuation are closer to each other, a detrending process using the lesser degree polynomial is chosen. In the limit, the scale of fluctuation is zero when the entire profile is treated as a ‘‘trend’’ with zero ‘‘random’’ variation (Phoon et al. 2003a). If a trend is evident in the measurements, it should be decided whether or not it should be removed before statistical analysis of a set of raw data. An observed trend that has no physical or geological basis or is not predictable must not be removed prior to statistical analysis, since it is a part of the uncertainty to be characterized (Fenton 1999b). After selecting a proper trend function for the data, the residuals off the trend are calculated. Phoon et al. (2004) pointed out that trend removal is a complex problem, and there is at present no fully satisfactory solution to it. The identified trend in the data is removed by employing any of the following three widely used detrending methods.

22

3.4.3.1 Decomposition technique In this method the data set is divided into stationary random field and nonstationary trend, by using the results obtained from either a non-parametric test or a parametric test discussed in the last section. Initially a linear trend is selected and removed from the original data. The linearly detrended data is tested for the weak stationarity. If the residuals off the linear trend do not satisfy the stationarity hypothesis, the above procedure is repeated by choosing a higher order polynomial. However, it is suggested that no more than quadratic trend is normally sufficient to transform a non-stationary data set to stationary data set (Jaksa et al. 1999), and keep them fairly stationary, as complete removal of the trend in the data is rarely achieved.

3.4.3.2 Normalization technique Normalisation of the data set with respect to a dominant parameter, such as cone tip resistance, qc, effective overburden pressure, σ vo' , is also used in geotechnical engineering to make the data trend free.

3.4.3.3 Differencing technique In this method, a nonstationary data set is made stationary by using first, second or higher order differencing technique. This method of testing a time series is suggested by Bowerman and O'Connell (1983), which is suitable for data containing no seasonal variations. According to Bowerman and O'Connell (1983) if the sample autocorrelation function for experimental data dies down fairly quickly, the original data set can be treated as stationary. However, if the sample autocorrelation function dies down extremely slow, then the original data set can be transformed to a stationary set by taking first or second difference of original data set. However, the term “fairly quickly” is rather subjective and extensive judgment is involved in it. Moreover, it is observed that if no seasonal variations exist in the data, no more than

23

second difference is rarely needed to transform a nonstationary data to stationary data (Jaksa et al. 1999).

2.4.4 Estimation of autocorrelation Available methods for estimating the sample autocorrelation functions differ in their statistical properties such as the degree of bias, sampling variability, ease of use, computational requirements, etc.. The methods that are commonly used for this purpose are method of moments, Bartlett’s approach, method based on maximum likelihood principle, Geostatistics, etc. However, the method of moments is the most common used to estimate sample correlation function of soil properties.

3.4.4.1 Method of moments A classical way of describing random functions is through the autocorrelation function, ρ(Δz). It is the coefficient of correlation between values of a random function at separation of k. The spatial correlation of a soil property can be modelled as the sum of a trend component and a residual term (Vanmarcke 1977a), as shown in Equation 2.17. x=z+e

17)

where x is the measurement at a given location, z is the trend component, and e is the residual (deviation about the trend). The residuals off the trend tend to exhibit spatial correlation. The degree of spatial correlation among the residuals can be expressed through an autocovariance function.

[

]

c(k ) = E [P (Z i ) − t (Z i )] P (Z j ) − t (Z j )

(18)

where k is the vector of separation distance between point i and j, E[.] is the expectation operator, P(Zi) is the data taken at location i, and t(Zi) is the value of the trend at location i.

24

The normalized form of the autocovariance function given in Equation 19 is known as the autocorrelation function. ρ(k)= c[k]/c[0]

(19)

where c[0] is the autocovariance function at zero separation distance, which is nothing but variance data. It is not possible to evaluate ‘ck’ nor ‘ρk’ with any certainty, but only to estimate them from samples obtained from a population. As a result, the sample autocovariance at lag k, c k* , and sample autocorrelation at lag k, rk, are generally evaluated. The sample autocorrelation function (ACF) is the graph of rk for lags k=0,1,2, …h, where ‘h’ is the maximum number of lags allowable. Generally, ‘h’ is taken as a quarter of total number of data points in time series analysis of geotechnical data (Box and Jenkins 1970; Lumb 1975a). Beyond this number, the number of pairs contributing to the autocorrelation function diminishes and produces unreliable results. The sample ACF at lag k, rk, is generally evaluated using

(

)(

N −k 1 ∑ X i − X X i+k − X ( N − k − 1) i =1 rk = N 2 1 Xi − X ∑ (N − 1) i =1

(

) (20)

)

If no measurement error or noise is present, r becomes equal to 1 at a lag distance of zero. Statistically homogeneous data are used to evaluate the sample autocorrelation functions. The autocorrelation characteristics of soil properties can be characterized either by autocorrelation distance, or scale of fluctuation, which is theoretically equal to the area under the correlation function. The scale of fluctuation (or correlation radius) for one dimensional real field is defined as shown in Equation 21 (Vanmarcke 1977a). ∞

δ = 2∫ ρ (τ ) dτ

(21)

0

25

More generally, the scale of fluctuation δ is defined as the radius of an equivalent “unit step” correlation function, i.e., ρ(τ)=1 for τ≤δ and =0 for τ>δ , τ being the Euclidian lag (JCSS 2000). The autocorrelation distance (or scale of fluctuation) is evaluated from the sample autocorrelation function using method of fitting or based on Bartlett limits, which are described in the following sections. 3.4.4.1.1 Method of fitting

Analytical expressions are fitted to the sample autocorrelation functions using regression analysis based on least square error approach. The least square error is generally characterised by the determination coefficient of the fit. Frequently used single-parameter theoretical auto-correlation functions are exponential, squared exponential, though models such as triangular, second order auto-regressive, spherical, etc. are also not uncommon to fit the sample autocorrelation data in geotechnical engineering. Some of these models are given in Table 3.1. Table 3.1. Theoretical autocorrelation functions used to determine the autocorrelation distance and scale of fluctuation, δ (Jaksa et al. 1999)

Theoretical Model autocorrelation No. function 1

2 3

Triangular Single exponential Double exponential

4

Second-order Markov

5

Cosine exponential

Autocorrelation distance, ρ

Scale of fluctuation, δ

a

a

b

2b

c

πc

⎟ ρ Δz = exp(− Δz / d )⎜⎜1 + d ⎟⎠ ⎝

d

4d

⎛ Δz ⎞ ⎟ ⎝ e ⎠

e

e

Autocorrelation function

ρ Δz

⎧ Δz ⎪1 − =⎨ a ⎪0 ⎩

for Δz ≤ a for Δz ≥ a

ρ Δz = exp(− Δz / b )

(

ρ Δz = exp − ( Δz / c )

2

Δz ⎞

⎛

ρ Δz = exp(− Δz / e )cos⎜

26

)

Table 3.1 shows the autocorrelation distance and corresponding scale of fluctuation for theoretical autocorrelation functions. A small scale of fluctuation (δ) implies rapid fluctuations about the mean and vice versa. and a large reduction in variance over any failure plane; this results in a small “spread” of the performance function. Conversely a large δ means much longer variations about the mean and results in smaller reduction in variance over a failure plane (Mostyn and Soo 1992). 3.4.4.1. 2 Bartlett limits

In the field of time series analysis, the most commonly used method to compute the autocorrelation distance is by Bartlett’s approximation. In this method the computed scale of fluctuation corresponds to two standard errors of the estimate, i.e., the lag distance at which the positive Bartlett’s limits given by Equation 2.21, superimposed on the autocorrelation plot crosses the autocorrelation function (Jaksa et al. 1999). rh = ±

1.96

(22)

N

The scale of fluctuation of cone tip resistance varies from site to site. Moreover, it also varies with type of soil, as Jaksa et al. (2004) reports smaller scales of fluctuation in sands than clays due to their nature of formation. Further, Fenton and Vanmarcke (1998) argue that the scale of fluctuation depends largely on the geological processes of transport of raw materials, layer deposition, and common weathering rather than on the actual property studied. Nonetheless, DeGroot and Baecher (1993) observed that the scale of fluctuation is also function of sampling interval on in-situ measured property.

3.4.5 Effect of anisotropy in correlation scales Most soils in nature are usually anisotropic due to their mode of sedimentation and consolidation that cause preferred particle orientations. There are generally two types of

27

anisotropy. Inherent or initial anisotropy manifests itself in the soil deposits as a result of applied stresses at the time of formulation in the form of first-structure on a macroscopic scale or as a fabric orientation on the microscopic scale. Stress or induced anisotropy arises from changes in the effective stress state produced by subsequent loading history. This anisotropy can cause the elastic, strength and compressibility parameters of the soil deposits to vary with direction, and hence cannot be ignored. The soil properties exhibit large variations and their directional behaviour is observed by many researchers (Vanmarcke 1983; Jaksa et al. 1999; Phoon and Kulhawy 1999a; Griffiths and Fenton 2000; Nobahar and Popescu 2002; Fenton and Griffiths 2003; Jaksa et al. 2004; Sivakumar Babu and Mukesh 2004; and Uzielli et al. 2005; Wei et al. 2005). The autocorrelation distances in vertical and horizontal directions are never the same, but in general, differ by an order of magnitude, with horizontal scale of fluctuation being higher than that in the vertical (Uzielli et al. 2005). Attempts have been made in the literature to formulate autocorrelation models for 1, 2, and 3-dimensional soil space (Vanmarcke 1977a; and Kulathilake and Miller 1987). The effect of anisotropy of soil properties on the bearing capacity in a probabilistic framework has not been studied extensively in the literature. Many times, due to economic feasibility, speed of exploration, availability of equipment and time constraints vertical cone penetration data alone is obtained and used in the evaluation of strength properties (Wei et al. 2005). The autocovariance structure is called isotropic if the normalized autocovariance depends on the Euclidian distances between field points only, instead of the axis directional distance components, components, i.e.,

(

ρ (Δx, Δy, Δz ) = ρ Δx 2 + Δy 2 + Δz 2

)

(23)

28

Isotropy implies that the autocorrelation function is invariant to orthonormal transformation of the field coordinates. Also the autocorrelation structure may be partly isotropic, for example with respect to horizontal field directions:

(

ρ (Δx, Δy , Δz ) = ρ Δx 2 + Δy 2 , Δz

)

(24)

For complete anisotropy, the exponential correlation function in 3-D space is ⎛

Δx

⎝

Dx

ρ (Δx, Δy, Δz ) = exp⎜⎜ −

−

Δy Dy

−

Δz ⎞ ⎟ D z ⎟⎠

(25)

If an isotropy in the horizontal direction is assumed, then the exponential correlation function shown in Equation 2.25 is reduced to

⎛ Δx 2 + Δy 2 Δz ⎞⎟ ⎜ − ρ (Δx, Δy, Δz ) = exp − ⎜ Dh Dz ⎟ ⎝ ⎠

(26)

Similar theoretical autocorrelation functions in 3-D field for other distributions can also be formulated on the similar lines shown above.

3.4.6 Spatial averaging Parameters in geotechnical analyses usually refer to averages of a soil property over a sliding surface or a rupture zone in an ultimate failure analysis or significantly strained volumes in a deformation analysis. If the dimensions of such surfaces or volumes exceed the scales of fluctuation of the soil property, spatial averaging of fluctuations is substantial. This implies that the variance of an averaged soil property over a sliding surface or affected volume is likely to be substantially less than the field variance, which is mainly based on small sample tests (e.g. triaxial tests) or small affected volumes in insitu tests (JCSS 2002). Because of the spatial variability of soil properties, encountering a sufficiently low strength to induce failure in localized areas is more likely than such an encounter over the entire zone of

29

influence. Both the conventional analyses based on the factor of safety and the simplified probabilistic analyses fail to address this issue of scale of failure. Over the depth interval ΔZ the spatial average soil property is given as u ( ΔZ ) =

1 ΔZ

∫ u (ΔZ )dz

(27)

ΔZ

The spatial average of the soil property u(x,y,z) over a volume V is given in the same way as

uv =

1 u ( x, y, z )dxdydz V ∫∫∫ V

(28)

Averaging distance depends on the nature of the problem in hand. For design of shallow foundations in shear criterion, this distance is equal to the extent of shear failure zone within the soil mass (Cherubini 2000). This distance for shallow foundations in cohesionless soil subjected to vertical loading is approximately taken as 2B below the base of footing in the vertical direction and 3.5B from the centre of footing in the horizontal direction, where B is the width of the footing.

3.4.7 Evaluation of variance reduction function The combined effect of spatial correlation and spatial averaging of soil properties over the failure domain are beneficially utilized to reduce the variance of the measured data within the zone of interest. The derivation of the variance reduction functions in of spatial correlation and spatial average is described in the following section. JCSS (2002) presents the evaluation of variance reduction function by both exact approach and simplified approach.

3.4.7.1 Variance reduction for data in 1-D space The variability of soil property ui from point to point is measured by standard deviation σi and the standard deviation of the spatial average property uΔZ is by σΔΖ. The larger the length

30

(or the volume) over which the property is averaged, higher is the fluctuation of ui that tends to cancel out in the process of spatial averaging. This causes reduction in standard deviation as the size of the averaging length or volume increases, which is given by

σ Δz σi

Γu (ΔZ ) =

(29)

A simple relationship of the variance reduction function in of scale of fluctuation and averaging distance is given in Equation 2.30 (Vanmarcke 1977a).

δ

L

ΔZ

δ

Γ ( Δ Z ) = 1 .0

L

Γ 2 ( ΔZ ) = 2

δ

> 1 .0

(30) ≤ 1 .0

The Equation 2.30 indicates that with decrease in scale of fluctuation and increase in averaging distance, the value of variance reduction function reduces, which in turn reduces standard deviation of the spatially averaged soil property. In other words, the more erratic the variation (i.e., less correlated the soil property) of the soil property with distance and larger the soil domain considered, larger will be the reduction in variability of the average property. This phenomenon is a result of the increasing likelihood that unusually high property values at some point will be balanced by low values at other point (Vanmarcke 1977a). However, Vanmarcke (1983) emphasized that the variance reduction function γ(T) is related to the autocorrelation function ρ(τ) as given by.

γ (T ) =

1 T2

T T

∫ ∫ ρ (t

1

− t 2 )dt1 dt 2

(31)

0 0

which reduces to

γ (T ) =

2 ⎛ τ⎞ ⎜1 − ⎟ ρ (τ )dτ T ∫0 ⎝ T ⎠ T

(32)

31

From Equation 2.32, the variance reduction functions for triangular, exponential, and squared exponential autocorrelation functions can be worked out as given in Equations 2.33 to 2.35, respectively. ⎧ T ⎪⎪1 − γ (T ) = ⎨ 3a a ⎞ ⎛ a ⎞⎛ ⎪ ⎜ ⎟⎜1 − ⎟ ⎪⎩ ⎝ T ⎠⎝ 3T ⎠

for T ≤ a

(33)

for T ≥ a

2

⎛ b ⎞ ⎛T ⎞ − 1 + exp(− T / b )⎟ ⎝T ⎠ ⎝ b ⎠

γ (T ) = 2⎜ ⎟ ⎜

⎛d⎞ γ (T ) = ⎜ ⎟ ⎝T ⎠

2

(34)

2 ⎛ ⎛ ⎞ ⎞ ⎜ π T E ⎛⎜ T ⎞⎟ + exp⎜ − ⎛⎜ T ⎞⎟ ⎟ − 1⎟ ⎜ ⎝d⎠ ⎟ ⎟ ⎜ d ⎝d⎠ ⎝ ⎠ ⎠ ⎝

(35)

where a, b, d are referred to as the autocorrelation distances, T is the averaging length, the distance over which the geotechnical properties are averaged over a failure surface, and E(·) is the error function, which increases from 0 to 1 as its argument increases from 0 to ∞. In of standard Gaussian cumulative distribution function E(u)=2[FU(u)-0.5]. As the averaging length, T→∞ the variance reduction function, γ(T) →0. In other words, the chances associated with failure of huge volume of soil are very rare. In addition, γ(T) is inversely proportional to T at very large values of T. The variance reduction factor for averaging in one, 2 or 3-D random field may be approximated as given in Equations .

Γ 2 (L1 K Ln ) = 1 =

αn

(L1 K Ln )

for (L1 K Ln ) ≤ α n

(36)

for (L1 K Ln ) ≥ α n

where n=1, 2, 3, and L1, L2 and L3 are the lengths over which averaging takes place and α1, α2, α3 are the correlation radii. In case of “separable” autocorrelation functions, i.e. which can

32

be written as a multiplication of factors for each of the dimensions of a 2- or 3-D surface or volume, the total variance reduction factor can, for the 3-D case be written as: Γ 2 (L1 L2 L3 ) = Γ 2 (L1 ) Γ 2 (L2 ) Γ 2 (L3 )

(37)

Similar to the above, Vanmarcke (1977a) also proposed an approximate and simplified resultant variance reduction factor in 2-D space as the product of individual variance reduction factors in vertical and horizontal directions in of scale of fluctuation (δ) and spatial averaging distance (L) in the respective directions as shown in Equation (38). ΓA2 = Γv2 × Γh2

(29).

The above propositions have been used in the analysis of spatial variability of soils and the influence of spatial variability in foundation design is presented in Sivakumar Babu et al (2005) Dasaka et al (2005) and Sumanta Haldar and Sivakumar Babu (2006).

33

Learning objectives 1. Understanding the concept of sampling 2. Understand the various sampling techniques, sampling plans. 3. Understanding the significance of making decision based on samplings.

1

MODULE – 4


Motivation: Importance of understanding the various sampling

Highlights: Knowledge of sampling techniques, decision based on sampling

1

4. Sampling 4.1. Concepts of Sampling Many variables in Civil engineering are spatially distributed. For example concentration of pollutants, variation of material properties such as strength and stiffness in the case of concrete and soils are spatially distributed. The purpose of sampling is to obtain estimates of population parameters (e.g. means, variances, covariance’s) to characterize the entire population distribution without observing and measuring every element in the sampled population. Sampling theory for spatial processes principally involves evaluation of estimator’s sampling distributions and confidence limits. A very good introduction to these methods and the uses and advantages of sampling is provided by Cochran (1977) and Beacher and Christian (2003). An estimate is the realization of a particular sample statistic for a specific set of sample observations. Estimates are not exact and uncertainty is reflected in the variance of their distribution about the true parameter value they estimate. This variance is, in turn, a function of both the sampling plan and the sampled population. By knowing this variance and making assumptions about the distribution, shape, confidence limits on true population parameters can be set. A sampling plan is a program of action for collecting data from a sampled population. Common plans are grouped into many types: for example, simple random, systematic, stratified random, cluster, traverse, line intersects, and so on. In deciding among plans or in deg a specific program once the type plan has been chosen, one attempts to obtain the highest precision for a fixed sampling cost or the lowest sampling cost for a fixed precision or a specified confidence interval. 4.2. Common Spatial Sampling Plans Statistical sampling is a common activity in many human enterprises, from the national census, to market research, to scientific research. As a result, common situations are encountered in many different endeavors, and a family of sampling plans has grown up to 1

handle these situations. Simple random sampling, systematic sampling, stratified random sampling, and cluster sampling are considered in the following section. 4.2.1. Simple random sampling The characteristic property of simple random sampling is that individual are chosen at random from the sampled population, and each element of population has an equal probability of being observed. An unbiased estimator of the population mean from a simple random x={x1………..xn} is the sample mean x=

1 n ∑ xi n i =1 --------------------------------(1)

This estimator has sampling variance. Var ( x) =

σ2 N −n n

N

--------------------------------(2)

where σ2 is the (true) variance of the sampled population and N is the total sampled population size. The term (N-n)/N is called the finite population factor, which for n less than about 10% of N, can safety be ignored. However, since σ2 is usually unknown. it is estimated by the sample variance s2 =

1 n ∑ ( xi − x) 2 n − 1 i =1 --------------------------------(3)

in which the denominator is taken as n-1 rather than n. reflecting the loss of a degree- offreedom due to estimating the mean from the same data. The estimator is unbiased but does not have minimum variance. The only choice (i.e. allocation) to be made in simple random sampling is the sample size n. Since the sampling variance of the mean is inversely proportional to sample size. Var ( x ) ∝ n −1 , a given estimator precision can be obtained by adjusting the sample size, if σ is known or assumed. A sampling plan can be optimized for total cost by assuming some relationship between Var (x ) and cost in

2

construction or design. A common assumption is that this cost is proportional to the square root of the variance, usually called the standard error of the mean, σ x = Var 1 2 ( x ) . It is usually assumed that the estimates of y and Y are normally distributed about the corresponding population values. If the assumption holds, lower and upper confidence limits for the population mean and total mean are as follows: Mean:

YL = y −

ts 1− f , n

YU = y +

ts 1− f n

Total:

YL = N y −

tNs 1− f , n

YU = N y +

tNs 1− f n

The symbol t is the value of the normal deviate corresponding to the desired confidence probability. The most common values are tabulated below: Confidence probability (%) Normal deviate, t

50

80

90

95

99

0.67

1.28

1.64

1.96

2.58

If the sample size is less than 60, the percentage points may be taken from Student’s t table with (n-1) degrees of freedom, these being the degrees of freedom in the estimated s2. The t distribution holds exactly only if the observations yi are themselves normally distributed and N is infinite. Moderate departures from normality do not affect it greatly. For small samples with very skew distributions, special methods are needed. An example of the application is as follows. Example. In a site, the number of borehole data sheets to characterize the substrata to obtain design parameters is 676. In each borehole data, 42 entries reflecting the various characteristics 3

of soils viz. compressibility, shear strength, compaction control, permeability etc are indicated. In an audit conducted, it was revealed that in some datasheets, all the data are not entered. The audit party verified a random sample of 50 sheets ( 7% sample) and the results are indicated in Table.1 Table 21 Results for a sample of 50 petition sheets Number of signatures, yi Frequency, fi 42 23 41 4 36 1 32 1 29 1 27 2 23 1 19 1 16 2 15 2 14 1 11 1 10 1 9 1 7 1 6 3 5 2 4 1 3 1 ∑ fi 50 We find n = ∑ fi = 50,

y = ∑ fi yi = 1471,

∑ fi yi 2 = 54,497

Hence the estimated total number of signatures is Y =Ny=

( 676 )(1471) = 19,888 50

For the sample variance s2 we have s2 =

(∑ fi yi ) 2 ⎤ 1 1 ⎡ 2 − [∑ fi ( yi − y ) 2 ] = f y ⎢∑ i i ⎥ n −1 n − 1 ⎢⎣ ∑ fi ⎥⎦

4

=

1 ⎡ (1471) 2 ⎤ 54, 497 − = 229.0 49 ⎢⎣ 50 ⎥⎦

The 80% confidence limits are given by 19,888 ±

(1.28)( 676 )(15.13) 1 − 0.0740 tNs 1 − f = 19,888 ± n 50

This gives 18,107 and 21,669 for the 80 % limits. A complete count showed 21,045 entries and is close to the upper estimate.

4.2.2. Systematic sampling In systematic sampling the first observation is chosen at random and subsequent observations are chosen periodically throughout the population. To select a sample of n units, we take a unit at random from the first k units and every kth unit thereafter. The method involves the selection of every kth element from a sampling frame, where k, the sampling interval, is calculated as: k = population size (N) / sample size (n) Using this procedure each element in the population has a known and equal probability of selection. This makes systematic sampling functionally similar to simple random sampling. It is however, much more efficient (if variance within systematic sample is more than variance of population) and much less expensive to carry out. The advantages of this approach are that 1) the mistakes in sampling are minimized and the operation is speedy, 2) it is spread uniformly over the population and is likely to be more precise than the random sampling. An unbiased estimate of the mean from, a systematic sample is the same as above equation .The sampling variance of this estimate is

5

⎛ N − 1 ⎞ 2 ⎛ k (n − 1) ⎞ Var ( x ) = ⎜ ⎟ ⎟σ w + ⎜ ⎝ N ⎠ --------------------------------(4) ⎝ N ⎠

where it is the interval between samples (k = N/n) and σ w2 is the variance of elements within the same systematic sample 2 1 k ∑ j =1 ( xij − x) s = ∑ k i =1 n −1 --------------------------------(5) n

2 w

in which xij is the jth member of the ith interval of the sample. When only one systematic sample has been taken (i.e. one set of n observations at spacing k) the variance of the mean cannot be evaluated unless an assumption is made about the nature of the sampled population. The conventional assumption is that the population can be modeled by a linear expression of the form, xi = μ + ei , in which μ is the mean and ei is a zero-mean random perturbation. For constant mean, this leads to (Cochran 1977) 2 1 ⎛ N − n ⎞⎛⎜ ∑ ( xi − x) Var ( x) = ⎜ ⎟ n ⎝ N ⎠⎜⎝ n −1

⎞ ⎟ ⎟ ⎠ --------------------------------(6)

and for linearly trending mean, μ = μ 0 + bi

Var ( x) =

1 ⎛ N − n ⎞⎛⎜ ∑ ( xi − 2 xi + k + xi + 2 k ) ⎞⎟ ⎜ ⎟ ⎟ n ⎝ N ⎠⎜⎝ 6( n − 2) ⎠

--------------------------------(7)

One must ensure that the chosen sampling interval does not hide a pattern. Any pattern would threaten randomness. A random starting point must also be selected. Systematic sampling is to be applied only if the given population is logically homogeneous, because systematic sample units are uniformly distributed over the population. Example: Suppose the auditor in the previous example wants to use systematic sampling, then he can choose every 25th or 50th sheet and conduct the study on this sample.

6

A starting point is chosen at random, and thereafter at regular intervals. For example, suppose you want to sample 25th sheet from 676 sheets, 676/25=27, so every 27th sheet is chosen after a random starting point between 1 and 15. If the random starting point is 11, then the sheets selected are 11, 28, 65, 92 etc. 4.2.3. Stratified random sampling When sub-populations vary considerably, it is advantageous to sample each subpopulation (stratum) independently. Stratification is the process of grouping of the population into relatively homogeneous subgroups before sampling. The strata should be mutually exclusive: every element in the population must be assigned to only one stratum. The strata should also be collectively exhaustive: no population element can be excluded. Then random or systematic sampling is applied within each stratum. This often improves the representativeness of the sample by reducing sampling error. It can produce a weighted mean that has less variability than the arithmetic mean of a simple random sample of the population. There are several possible strategies: 1) Proportionate allocation uses a sampling fraction in each of the strata that is proportional to that of the total population. If the soil sample consists of 60% of boulders (boulder stratum) and 40% sand (sand stratum), then the relative size of the two types of samples should reflect this proportion. 2) Optimum allocation (or Disproportionate allocation) - Each stratum is proportionate to the standard deviation of the distribution of the variable. Larger samples are taken in the strata with the greatest variability to generate the least possible sampling variance. Estimates of the total population characteristics can be made by combining the individual stratum estimates. For certain populations, stratifying before sampling is more efficient than taking samples directly from the total population. Sampling plans that specify a simple random sample in each stratum are called stratified random sampling plans.

7

An unbiased estimator of the mean of the total sampled population is x=

1 N

m

∑N h =1

h

x h ------------------- (8)

Where x is the population mean, in is the number of strata and h denotes the stratum (i.e. N is the size of the hth stratum, and x h is the corresponding mean). The variance of this estimate is s h2 Var ( x) = ∑ w (1 − f h ) nh h --------------------------------(9) 2 h

where wh = N h / N and f h = nh / N h Since the sample from each stratum is simple random the estimate of the variance within each can be taken from above equation. Then, an estimate of the variance of the total population is Var ( x) =

1 N

∑s

2 h

2 + s amongmeans

h

--------------------------------(10)

4.2.4. Cluster sampling In cluster sampling, aggregates or clusters of elements are selected from the sampled population as units rather than as individual elements, and properties of the clusters are determined. From the properties of the clusters, inferences can be made on the total sampled population. Plans that specify to measure every element within clusters are called single-staged-cluster plans, since they specify only one level of sampling: plans that specify that cluster properties be estimated by simple random sampling are called two-staged cluster plans, since they specify two levels of sampling. Higher order cluster plans are sometimes used. We consider simplest case first: of M possible clusters, in are selected: the ratio

f1 = m M called, as before, the sampling fraction. Each cluster contains the same

8

number of elements, N, some number n of which are selected for measurement (f2= n/N). An unbiased estimate of the average of each cluster is xi =

1 ni

∑x

----------------------------------(11)

ij

j

where, xij is the jth element of the ith cluster. An unbiased estimate of the average of the total population is xi =

1 1 xj = ∑ ∑∑ xij m i mn i j

and the variance of this estimator is

Var ( x) =

(1 − f1 ) 2 f1 (1 − f 2 ) 2 s1 + s2 m mn

in which s12 is the estimated variance among cluster means

2 1

s

∑ (x =

i

−x

)

2

n −1

and s 22 the estimated variance within clusters

s

2 2

∑∑ (x =

ij

− xi

m(n − 1)

)

2

----------------------------(12)

In the more general case, not all of the clusters are of equal size. For example, the numbers of ts appearing in different outcrops are different. With unequal sized clusters the selection plan for clusters is not as obvious as it was previously. The relative probability, of selecting different sized clusters is now a parameter of the plan. Commonly, the zj are either taken all equal (simple random sampling of the clusters) or proportional to size. The precisions of these two plans are different. For selection with equal probability an unbiased estimate of the true total population mean is 9

∑N x x= ∑N i

i

i

i

i

and the variance of this estimator is

var( x ) =

(1 − f1 ) ( N i xi − x m ) 2 f1 + 2 2 ( m − 1) mN m2 N

N i2 (1 − f 2i ) s 22i ∑ ni

in which N is the average value of Ni and x m = ∑i N i x i / m if the assumption is made that ni ∝ N i then the plan is self weighting and this simplifies to

(1 − f i ) Σ(N i xi − xm )

2

Var ( x) =

mN

(m − 1)

2

+

f1 (1 − f 2 ) 2 2 ΣN i s 2i mnN

--------------------------------(14)

This assumption (ni α N i ) is frequently valid: for example, proportionally more ts are typically sampled from larger outcrops than from smaller outcrops. For selection with probability proportional to size, an unbiased estimate of the total population mean is x=

1 ∑ xi n i

and the variance is Σ (x i − x ) m(n − 1)

2

Var ( x ) =

--------------------------------(15)

In all eases, the variance of the total population can be estimated from the variances between elements within clusters and the variance between the means of the clusters:

σ 2 = σ 2 among means + σ 2 within clusters

10

t surveys and many other types of civil engineering sampling may be based on cluster plans because the cost of sampling many individual ts on one outcrop (i.e. a cluster) is less than the cost of traveling between outcrops. The variance of geological populations is often a function of spatial extent. Indeed, this is the principal argument in the geo-statistical literature for favoring variograms over auto covariance functions. If we consider the strength of soil specimens taken dose together, the variance among specimens is usually smaller than the variance among specimens taken from many locations in one area of the site, which, in turn, is smaller than the variance among specimens taken from all across the site. Cluster techniques allow us to evaluate variance as a function of the “extent’’ of the distribution in space by nesting the variances.

11

Learning objectives 1. Understanding importance of reliability analysis.

2. Understanding different levels of reliability.

3. Understanding the various methods adopted such as Moment method (FOSM), Hasofer-Lind approach.

1

MODULE -5 Motivation and Highlights

Motivation: Importance of understanding the concept of reliability analysis

Highlights Knowledge of such as Levels of reliability, FOSM .

1

5. Levels of reliability methods There are different levels of reliability analysis, which can be used in any design methodology depending on the importance of the structure. The term 'level' is characterized by the extent of information about the problem that is used and provided. The methods of safety analysis proposed currently for the attainment of a given limit state can be grouped under four basic “levels” (namely levels IV, III, II, and I ) depending upon the degree of sophistication applied to the treatment of the various problems. 1. In level I methods, the probabilistic aspect of the problem is taken into by introducing into the safety analysis suitable “characteristic values” of the random variables, conceived as fractile of a predefined order of the statistical distributions concerned. These characteristic values are associated with partial safety factors that should be deduced from probabilistic considerations so as to ensure appropriate levels of reliability in the design. In this method, the reliability of the design deviate from the target value, and the objective is to minimize such an error. Load and Resistance Factor Design (LRFD) method comes under this category. 2. Reliability methods, which employ two values of each uncertain parameter (i.e., mean and variance), supplemented with a measure of the correlation between parameters, are classified as level II methods. 3. Level III methods encom complete analysis of the problem and also involve integration of the multidimensional t probability density function of the

1

random variables extended over the safety domain. Reliability is expressed in of suitable safety indices, viz., reliability index, β and failure probabilities. 4. Level IV methods are appropriate for structures that are of major economic importance, involve the principles of engineering economic analysis under uncertainty, and consider costs and benefits of construction, maintenance, repair, consequences of failure, and interest on capital, etc. Foundations for sensitive projects like nuclear power projects, transmission towers, highway bridges, are suitable objects of level IV design. 5. 1. Space of State Variables For analysis, we need to define the state variables of the problem. The state variables are the basic load and resistance parameters used to formulate the performance function. For ‘n’ state variables, the limit state function is a function of ‘n’ parameters . If all loads (or load effects) are represented by the variable Q and total resistance (or capacity) by R, then the space of state variables is a two-dimensional space as shown in Figure 1. Within this space, we can separate the “safe domain” from the “failure domain”; the boundary between the two domains is described by the limit state function g(R,Q)=0. Since both R and Q are random variables, we can define a t density function fRQ (r, q). A general t density function is plotted in Figure 2. Again, the limit state function separates the safe and failure domains. The probability of failure is calculated by integration of the t density function over the failure domain [i.e., the region in which g(R, Q) <0]. As noted earlier, this probability is often very difficult to evaluate, so the concept of a reliability index is used to quantify structural reliability.

2

Figure 1 - Safe domain and failure domain in two dimensional state spaces.

Figure 2 - Three-dimensional representation of a possible t density function fRQ 5.3. RELIABILITY INDEX Reduced Variables It is convenient to convert all random variables to their “standard form;’ which is a non dimensional form of the variables. For the basic variables R and Q, the standard forms can be expressed as

3

ZR =

R − μR

σR R − μQ ZQ = σQ

-------------------------------- (1)

The variables ZR and ZQ, are sometimes called reduced variables. By rearranging Equation no.1, the resistance R and the load Q can be expressed in of the reduced variables as follows:

R = μ R + Z Rσ R R = μ Q + Z Qσ Q

-------------------------------- (2)

The limit state function g(R, Q) = R-Q can be expressed in of the reduced variables by using Eqs. 2. The result is

g ( Z R , Z Q ) = μ R + Z Rσ R − μ Q − Z Qσ Q = ( μ R − μ Q ) + Z Rσ R − Z Qσ Q

------------(3)

For any specific value of g(ZR, ZQ), Equation no.3 represents a straight line in the space reduced variables ZR and ZQ. The line corresponding to g(ZR, ZQ) =0 separates the safe and failure domain in the space of reduced variables. The loads Q and resistances R are some times indicated in of capacity C and demand D as well in literature. Design for a given reliability index

⎛ C−D ⎞ ⎟ Ps = R = 1 − φ ⎜ ⎜ σ 2 +σ 2 ⎟ C D ⎠ ⎝ ⎡ ⎧⎪ C 1 + V 2 ⎫⎪ D ⎢ ln ⎨ ⎬ 2 D VC ⎪⎭ ⎢ ⎪⎩ Ps = 1 − Pf = 1 − φ ⎢ 2 2 ⎢ ln 1 + V D 1 + VC ⎢ ⎢⎣

(

we define CFS =

)(

if var iable are normal

)

− − − − − − (1)

⎤ ⎥ ⎥ ⎥ if C and D are log normal − − − − − − − −(2 ) ⎥ ⎥ ⎥⎦

C and write (1) and (2) in of CFS D

4

β=

CFS − 1

(CFS ) V 2

2 C

+V

for C and D are normal

2 D

⎛ 1 + VD2 ⎞⎟ ⎜ ln CFS ⎜ 1 + VC2 ⎟⎠ ⎝ β= ln 1 + VD2 1 + VC2

(

)(

)

5.3.1. General Definition of the Reliability Index

A version of the reliability index was defined as the inverse of the coefficient of variation The reliability index is the shortest distance from the origin of reduced variables to the is illustrated in Figure 3, line g(ZR, ZQ) = 0 .This definition, which was introduced by Hasofer and Lind (1974) following formula: Using geometry we can calculate the reliability index (shortest distance) from the following formula:

β=

μ R − μQ σ R2 + σ Q2

-------------------------------- (4)

where β is the inverse of the coefficient of variation of the function g(R, Q) = R-Q When R and Q are uncorrelated for normally distributed random variables R and Q, it can be shown that the reliability index is related to the probability of failure by

β = −Φ −1 ( Pf ) or

Pf = Φ ( − β )

5

Figure 3 - Reliability index defined as the shortest distance in the space of reduced variables.

Table 1 provides an indication of how β varies with Pf. Table 1- Reliability index β and probability of failure Pf Pf

β

10-1

1.28

10-2

2.33

10-3

3.09

10-4

3.71

10-5

4.26

-6

10

4.75

10-7

5.19

10-8

5.62

10-9

5.99

The definition for a two variab1e case can be generalized for n variables as follows. Consider a limit state function

g(X1, X2…… Xn), Where the Xi variables are all

uncorrelated. The Hasofer-Lind reliability index is defined as follows: 1. Define the set of reduced variables {Z1, Z2, . . . , Zn} using

6

Zi =

Xi − μX

σX

i

i

--------------------------------(5)

2. Redefine the limit state function by expressing it in of the reduced variables (Z1,Z2,..,,Zn). 3. The reliability index is the shortest distance from the origin in the n-dimensional space of reduced variables to the curve described by g(Z1, Z2, . . . , Zn) = 0.

5.4. First-order second moment method (FOSM) This method is also referred to as mean value first-order second moment (MVFOSM) method, and it is based on the first order Taylor series approximation of the performance function linearized at the mean values of the random variables. It uses only secondmoment statistics (mean and variance) of the random variables. Originally, Cornell (1969) used the simple two variable approaches. On the basic assumption that the resulting probability of Z is a normal distribution, by some relevant virtue of the central limit theorem, Cornell (1969) defined the reliability index as the ratio of the expected value of Z over its standard deviation. The Cornell reliability index (βc) is the absolute value of the ordinate of the point corresponding to Z = 0 on the standardized normal probability plot as given in Figure 4 and equation.

7

z=0

fZ(z) Limit surface

σZ βc

μZ

Figure 4- Definition of limit state and reliability index

βc =

μ − μS μz -----------------------------------(6) = R σz σ R2 + σ S2

On the other hand, if the t probability density function fX(x) is known for the multi variable case, then the probability of failure pf is given by

p f = ∫ f X ( x) dX -----------------------------------(7) L

where L is the domain of X where g(X)<0. In general, the above integral cannot be solved analytically, and an approximation is obtained by the FORM approach. In this approach, the general case is approximated to an ideal situation where X is a vector of independent Gaussian variables with zero mean and unit standard deviation, and where g(X) is a linear function. The probability of failure pf is then: n

p f = P(g(X) < 0) = P(∑ α i X i − β < 0) = Φ(−β) -----------------------------------(8) i =1

8

where αi is the direction cosine of random variable Xi, β is the distance between the origin and the hyper plane g(X)=0, n is the number of basic random variables X, and Φ is the standard normal distribution function. The above formulations can be generalized for many random variables denoted by the vector X. Let the performance function is in the form given as Z= g(X) = g (X1, X2….Xn) -----------------------------------(9) A Taylor series expansion of the performance function about the mean value is given by equation.

∂g 1 n n ∂2g ( X i − μ X i ) + ∑∑ ( X i − μ X i )( X j − μ X j ) + ......... ------(10) 2 i =1 j =1 ∂X i ∂X j i =1 ∂X i n

Z = g (μ X ) + ∑

where derivatives are evaluated at the mean values of the random variables (X1, X2… Xn) and μ X i is the mean value of Xi. Truncating the series in linear , the first order mean and variance of Z can be obtained as

μ Z ≈ g ( μ X , μ X ,.....μ X ) 1

2

n

n

∂g ∂g var( X i , X j ) -----------------------------------(11) j =1 ∂X i ∂X j n

and, σ Z2 ≈ ∑∑ i =1

where var (Xi, Xj) is the covariance of Xi and Xj. If the variances are uncorrelated, then the variance for z is given as

9

⎛ ∂g σ ≈ ∑ ⎜⎜ i =1 ⎝ ∂X i 2 Z

n

2

⎞ ⎟⎟ var( X i ) ----------------------------------- (12) ⎠

The reliability index can be calculated by taking the ratio of mean (μ Z ) and standard deviation of Z (σ z ) as in Equation 2.66.

β=

μz σz

5.4.1. Linear limit state functions

Consider a linear limit state function of the form n

g ( X 1 , X 2 , X 3 .......... X n ) = a0 + a1 X 1 + a2 X 2 + a3 X 3 ............an X n = a0 + ∑ ai X i i =1

where the ai (i = 0, 1, 2……. n) are constants and the Xi are uncorrelated random variables. If we apply the three-step procedure outlined above for determining the Hasofer-Lind reliability index, we would obtain the following expression for β: n

β=

a 0 + ∑ ai μ X i =1

n

∑ (a σ i =1

i

Xi

i

)2 -------------------------------- (13)

Observe that the reliability index ,β , in Eq. 5.18 depends only on the deviations of the random variables Therefore this is called a second moment measure of structural safety because

only

the

first

two

moments

(mean

and

variance)

to calculate . There is no explicit relationship between β and the distributions of the type of probability to calculate β variables. If the random variables are all normally distributed

10

and uncorrelated then this formula is exact in the sense that β and Pf are related by Eq. 5.15. Otherwise, Eq. 5.15 Provides only an approximate means of Probability of failure. The method discussed above has some limitations and deficiencies. It does not use the distribution information about the variable and function g( ) is linearized at the mean values of the Xi variables. If g( ) is non-linear, neglecting of higher order term in Taylor series expansion introduces significant error in the calculation of reliability index. The more important observation is that the Equations 2.58 and 2.64 do not give constant value of reliability index for mechanically equivalent formulations of the same performance function. For example, safety margin R-S<0 and R/S <1 are mechanically equivalent yet these safety margins will not lead to same value of probability of failure. Moreover, MVFOSM approach does not use the distribution information about the variables when it is available. Nonlinear limit state functions

Now consider the case of a nonlinear limit state function. When the function is nonlinear, we can obtain an approximate answer by linearizing the nonlinear function using a Taylor series expansion. The result is n

g ( X 1 , X 2 , X 3 ........ X n ) ≈ g ( x1* , x2* .....xn* ) + ∑ ( X i − xi* ) i =1

where

∂g ∂X i

evaluated at ( x1* , x2* ..... xn* )

( x1* , x2*.....xn* ) is the point about which the expansion is performed. One choice for

this linearization point is the point corresponding to the mean values of the random variables.

Thus

Eq.

5.19 n

g( X1 , X 2 , X 3 ........X n ) ≈ g(μx1 , μx2 ,.............μxn ) + ∑( X i − μxi ) i =1

becomes

∂g ∂X i

evaluatedat ( x1* , x2* .....xn* )

11

Since Eq. 5.20 is a linear function of the Xi variables, it can be rewritten to be exactly like Eq. 5.17. Thus Eq. 5.18 can be used as an approximate solution for ti reliability index . After some algebraic manipulations, the following expression for results:

β=

g ( μ X , μ X ,.......μ X ) 1

2

n

n

∑ (a σ i =1

i

Xi

)2

where ai =

∂g ∂X i

evaluated at ( x1* , x2* ..... xn* )

The reliability index defined in Eq. 5.21 is called a first-order second-moment mean value reliability index. It is a long name, but the underlying meaning of each part of the name is very important: First order because we use first-order in the Taylor series expansion , Second moment because only means and variances are needed. Mean value because the Taylor series expansion is about the mean values. 5.4.2. Comments on the First-Order Second-Moment Mean Value Index

The first-order second-moment mean value method is based on approximating non normal CDFs of the state variables by normal variables, as shown in Figure 5.15 for the

12

Figure 5 - Mean value second - moment formulation.

simple case in which g(R, Q) = R - Q. The method has both advantages and disadvantages in structural reliability analysis. Among its advantages: 1. It is easy to use. 2. It does not require knowledge of the distributions of the random variables.

Among its disadvantages: 1. Results are inaccurate if the tails of the distribution functions cannot be approximate by a normal distribution. 2. 2. There is an invariance problem: the value of the reliability index depends on the specific form of the limit state function. The invariance problem is best clarified by an example.

13

5.5. Advanced first-order second moment method (AFOSM) It is essential that irrespective of method of evaluation of reliability of a limit state, all the mechanically equivalent performance functions must produce same safety indices. However the MVFOSM method fails to satisfy the above condition in some cases, such as in case of correlated variables and nonlinear limit state formulations. Hence, a new approach, called Hasofer-Lind reliability index (Hasofer and Lind 1974) was developed to tackle the problem of variant reliability indices produced using Cornell index. In this method the reduced variables are defined as given in Equation 2.67.

X i' =

Xi − μXi

σX

, i=1, 2 …n

-----------------------------------(14)

i

where X i' is a random variable with zero mean and unit standard deviation. The above equation is used to transform the original limit state g(X) =0 to reduced limit state g’(X) =0. X is referred to as the original co-ordinate system and X′ reduced co-ordinate system. Note that if Xi is normal in original co-ordinate system it will be standard normal in reduced co-ordinate system. The Hasofer-Lind reliability index (βHL) can be defined as the minimum distance from the origin of the axes in the reduced co-ordinate system to the limit state surface. The minimum distance point on the limit state surface is called the design point or checking point. Considering the limit state function in two variables as given in Equation 2.68, wherein R and S should be normal variables, the reduced variables can be written as given in equations.

14

Z = R−S =0 R' =

R − μR

S' =

S − μS

σR

σS

Substituting values of R ′ and S ′ in the above equation, the limit state equation in the reduced co-ordinate system can be written as g ( ) = σ R R ' − σ S S ' + μ R − μ S = 0 -----------------------------------(15)

The position of the limit state surface relative to the origin in the reduced coordinate system is a measure of the reliability of the system. By simple trigonometry, the distance of the limit state line from the origin can be calculated and it will give the reliability index value.

β HL =

μR − μS σ R2 + σ S2

-----------------------------------(16)

This is same as the reliability index defined by the MVFOSM method, if both R and S are normal. In this definition the reliability index is invariant, because regardless of the form in which the limit state equation is written, its geometric shape and the distance from the origin remains constant. To be specific, β is the First-order second moment reliability index, defined as the minimum distance from the origin of the standard, independent normal variable space to the failure surface as discussed in detail by Hasofer and Lind (1974). Figure 2.5 shows the plot depicting the functional relationship between probability of failure (pf) and

15

reliability index (β), and classifies the performance of designs based on these two values. As seen from the figure, the performance is high if the reliability index is equal to 5, which corresponds to a probability of failure of approximately 3×10-7. 5.5.1. Hasofer-Lind Reliability Index

Hasofer and Lind proposed a modified reliability index that did not exhibit the invariance problem illustrated in Example 5.3. The “correction” is to evaluate the limit state function at a point known as the “design point” instead of the mean values. The design point is a point on the failure surface g = 0. Since this design point is generally not known a priori, an iteration technique must be used (in general) to solve for the reliability index. 5.5.2. AFOSM Method for Normal Variables

The Hasofer-Lind (H-L) method is applicable for normal random variables. It first defines the reduced variables as

X i' =

Xi − μX

σX

i

(i = 1,2,......., n) -----------------------------------(17)

i

'

where X i is a random variable with zero mean and unit standard deviation. Above equation is used to transform the original limit state g(X) = 0 to the reduced limit state g(X`)= 0. The X coordinate system is referred to as the original coordinate syatem. The X` coordinate system is referred to as the transformed or reduced coordinate system. `

Note that if Xi is normal, X i is standard normal.. The safety index

β Hi is defined as the

minimum distance from the origin of the axes in the reduced coordinate system to the limit state surface (failure surface). It can he expressed as

β Hi = ( x `* ) I ( x `* ) -----------------------------------(18)

16

The minimum distance point on the limit state surface is called the design point or checking point. It is denoted by vector x* in the original coordinate system and by vector x`* in the reduced coordinate system. These vectors represent the values of all the random variables, that is, X1, X2. ..., Xn, at the design point corresponding to the coordinate system being used. This method can be explained with the help of figure shown Consider the linear limit state equation in two variables. Z=R—S=0 This equation is similar to above equation. Note that R and S need not be normal variables. A set of reduced variables is introduced as

R`=

R − μR

σR

-----------------------------------(19)

and

S `=

S − μS

σS

-----------------------------------(20)

Figure 6 - Hasofer -Lind Reliability Index: Linear Performance Function

17

If we substitute these into above equation the limit state equation in the reduced coordinate system becomes g ( ) = σ R R'−σ S S '+ μ R − μ S = 0 -----------------------------------(21) The transformation of the limit state equation from the original to the reduced coordinate system is shown in above figure. The safe and failure regions are also shown. From the figure, it is apparent that if the failure line (limit state line) is closer to the origin in the reduced coordinate system the failure region is larger and if it is farther away from the origin, the failure region is smaller. Thus, the position of the limit state surface relative to the origin in the reduced coordinate system is a measure of the reliability of the system. The coordinates of the intercepts of above equation on the R` and S` axes can be shown to

be

[− (μ R − μ S ) / σ R ,0] and [0, (μ R − μ S ) / σ S ] ,

respectively

.Using

simple

trigonometry, we can calculate the distance of the limit state line from the origin as

β HL =

μR − μS σ R2 + σ S2

-----------------------------------(22)

This distance is referred to as the reliability index or safety index. It is the same as the reliability index defined by the MVFOSM method in above equation if both R and S are normal variables. However, it is obtained in a completely different way based on geometry. It indicates that if the limit state is linear and if the random variables R and S arc normal, both methods will give an identical reliability or safety index. In general, for many random variables represented by the vector X = (x1,x2,……..xn) in the originates the safe state and g(X’) < 0 denotes the failure state, Again, the HasoferLind reliability index is defined coordinated system and X = X 1 , X 2 , X 3 ........X n in `

`

`

`

`

the reduced coordinate system the limit state g(X`) = 0 is a nonlinear function as shown `

in the reduced coordinates for two variables in figure .At this stage, X i ' s are assumed to be uncorrelated. Here g(X`) > 0 denoted as the minimum distance from the origin to the design point on the limit state in the reduced coordinates and can be expressed by above equation , where x` - represents the coordinates of the design point or the point of minimum distance from the origin to the limit state. In this definition the reliability index is invariant, because regardless of the form in which the limit state equation is written, its

18

geometric shape and the distance from the origin remain constant. For the limit state surface where the failure region is away from the origin,

Figure 7 - Hasofer - Lind Reliability Index: Nonlinear Performance Function it is easy to see from figure that x` is the most probable failure point. The Hasofer-Lind reliability index can be used to calculate a first-order approximation of the failure probability as

This is the integral of the standard normal density function

along the ray ing the origin and x`* it is obvious that the nearer x`* is to the origin, the larger is the failure probability. Thus the minimum distance point on the limit state surface is also the most probable failure point. The point of minimum distance from the origin to the limit state surface, x`*, represents the worst combination of the stochastic variables and is appropriately named the design point or the most probable point (MPP) of failure. For nonlinear limit states, the computation of the minimum distance becomes an optimization problem: Minimized D = x 'l x'

Subjected to the constra int g ( x ) = g ( x') = 0 where x` represents the coordinates of the checking point on the limit state equation in the reduced coordinates to be estimated. Using the method of Lagrange multipliers, we can obtain the minimum distance as

19

⎛ ∂g ⎞ ⎟ x ⎜⎜ ∑ ∂X i' ⎟⎠ i =1 ⎝ =− n

β HL

*

'* i

⎛ ∂g ⎞ ⎟ ' ⎟ i =1 ⎝ i ⎠ n

∑ ⎜⎜ ∂X

-----------------------------------(23)

2*

(∂g / ∂X ) is the ith partial derivative evaluated at the design point with coordinates (x , x ..... x ) . The asterisk after the derivative indicates that it is evaluated at (x , x .....x ) . The design point in the reduced coordinates is given by: ' * i

Where

'* 1

'* 1

'* 2

'* 2

'* n

'* n

xi'* = −α i β HL (i = 1,2,.......n )

Where

αi =

⎛ ∂g ⎞ ⎜⎜ ⎟ ' ⎟ ⎝ ∂X i ⎠

*

⎛ ∂g ⎞ ⎟ ' ⎟ i =1 ⎝ i ⎠ n

∑ ⎜⎜ ∂X

2*

- ----------------------------------(24)

are the direction cosines along the coordinate axes X i' in the space of the original coordinates and using equation, we find the design point to be

xi* = μ X i − α iσ X i β HL An algorithm was formulated by Rackwitz (1976) to compute β HL and xi'* as follows: • Step 1. Define the appropriate limit state equation. • Step 2. Assume initial values of the design point xi'* , i = 1,2,...........n . Typically, the initial design point may be assumed to be at the mean values of the random variables.

(

)

Obtain the reduced variates xi'* = xi − μ X i σ X i . • Step 3. Evaluate (∂g ∂X i'* ) and α i at xi'* • Step 4. Obtain the new design point xi'* in of β HL as in equation. • Step 5. Substitute the new xi'* in the limit state equation g(x`*) = 0 and solve for β HL .

20

• Step 6. Using the value β HL obtained in Steps 5. Re-evaluate xi'* = −α i β HL • Step 7. Repeat Steps 3 through 6 until converges. This algorithm is shown geometrically in figure. The algorithm constructs a linear approximation to the limit state at every search point and finds the distance from the origin to the limit state. In figure, Point B represents the initial design point. usually assumed to be at the mean values of the random variables, as noted in Step 2. Note that B is not on the limit state equation g ( X ' ) = 0 the tangent to the limit state at B is represented by the line BC. Then AD will give an estimate of β HL in the first iteration, as noted in Step 5. As the iteration continues,

value converges.

Ditlevsen (1979a) showed that for a nonlinear limit state surface lacks comparability; the ordering of β HL values may not be consistent with the ordering of actual reliabilities. An example of this is shown in figure with two limit state surfaces: one flat and the other curved. The shaded region to the right of each limit state represents the corresponding failure region. Clearly, the structure with the flat limit state surface has a different reliability than the one with the curved limit state surface; however, the β HL values are identical for both surfaces and suggest equal reliability. To overcome this inconsistency, Ditlevsen (1979a) introduced the generalized reliability index, β g defined as

⎡

⎤ ' ' ' ' ' ' ......... φ x φ x ........ φ x d x dx ....... x ⎥ -----------------------------------(25) 1 2 n 1 2 n ∫ ∫ ⎢⎣ g (X ' )>0 ⎥⎦

β g = Φ −1 ⎢

( )( )

( )

where Φ and φ are the cumulative distribution function and the probability density function of a standard normal variable. Respectively, Because the reliability index in this definition includes the entire sale region, it provides a consistent ordering of secondmoment reliability. The integral in the equation looks similar to that in equation and is difficult to compute directly. Hence, Ditlevsen (1979a) proposed approximating the nonlinear limit state by a polyhedral surface consisting of tangent hyper-planes at selected points on the surface.

21

Figure 8 - Algorithm for finding βHL Note: A number in parentheses indicates iteration numbers

Consider a limit state function g(X1, X2, . . . , Xn) where the random variables Xi are all uncorrelated. (If the variables are correlated, then a transformation can be used to obtain uncorrelated variables. See Example 5.15.) The limit state function is rewritten in of the standard form of the variables (reduced variables) using

Zi =

Xi − μX

σX

i

i

As before, the Hasofer-Lind reliablity index is defined as the shortest distance from the origin of the reduced variable space to the limit state function g = 0. Thus far nothing has changed from the previous presentation of the reliability index. In fact, if the limit state function is linear, then the reliability index is still calculated as in Eq 5.18 n

β=

a 0 + ∑ ai μ X i =1

n

∑ (a σ i =1

i

Xi

i

)2

-------------------------------- (26)

22

If the limit state function is nonlinear , however, iteration is required to find the design point

{Z1* , Z 2* ......Z n* } in reduced variable space such that still corresponds to the

Shortest distance. This concept is illustrated in Figures 5.17 through 5.19 for the case of two random variables.

Figure 9 - Hasofer - Lind reliability index

Figure 10 - Design point on the failure boundary for the linear limit state function g = R – Q

23

Figure 11 - Design point and reliability index for highly nonlinear limit state function.

The iterative procedure requires us to solve a set of (2n + 1) simultaneous equation with (2n + I) unknowns: β,α1, α2 ……., αn , Z 1* , Z 2* .......Z n* where

−

αi =

∂g ∂Z i

⎛ ∂g ⎜ ∑ ⎜ k =1 ∂Z k ⎝ n

evaluated at design po int

⎞ ⎟ ⎟ evaluated at design po int ⎠

2

∂g ∂g ∂X i ∂g = = σ Xi ∂Z i ∂X i ∂Z i ∂X i n

∑ (α ) i =1

2

i

z i* = βα i

(

)

g z1* , z 2* .............z n* = 0

Equation 5.23b is just an application of the chain rule of differentiation Equation 5.23c is a requirement on the values of the αi variables, which can be confirmed by looking at

24

Eq.5.23a. Equation 5.25 is a mathematical statement of the require that the design point must be on the failure boundary . There are two alternative procedures for Performing the iterative analysis: the simultaneous equation procedure and the matrix procedure The steps in the simultaneous

equation procedure are as follows: 1. Formulate the limit state function and appropriate parameters for all random variables involved 2.

Express

the

limit

state

function

in

of

reduced

variates

zi.

3. Use Eq. 5.24 to express the limit state function in of β and αi. 4 Calculation then a values. Use Eq. 5.24 here also to express each αi as a function of all β and αi. 5. Conduct the initial cycle: Assume numerical values of β and αi , noting that the αi values must satisfy Eq. 5.23c. 6. Use the numerical values of β and αi on the right-hand sides of the equations formed in Steps 3 and 4 above. 7. Solve the n + 1 simultaneous equations in Step 6 for β and αi. 8. Go back to Step 6 and repeat. Iterate until the β and αi values converge.

The matrix procedure consists of the following steps: 1. Formulate the limit state function and appropriate parameters for all random variables Xi(i = 1,2, . . . , n) involved. 2. Obtain an initial design point {xi* } by assuming values for n-1 of the random variables Xi. (Mean values are often a reasonable initial choice.) Solve the limit state equation g = 0 for the remaining random variable. This ensures that the design point is on the failure boundary. 25

3. Determine the reduced variates {Z i* } corresponding to the design point {xi* } using

Z = * i

xi* − μ X

σX

i

i

4. Determine the partial derivatives of the limit state function with respect to the reduced variates using Eq. 5.23b. For convenience, define a column vector {G} as the vector whose elements are these partial derivatives multiplied by -1:

⎧ G1 ⎫ ⎪G ⎪ {G} = ⎪⎨ 2 ⎪⎬ where Gi = − ∂g ∂Z i ⎪ ⎪ ⎪⎩Gn ⎪⎭

-------------------- (27) evaluated at design po int

5. calculate an estimate of β using the following formula:

⎧ z1* ⎫ ⎪ *⎪ T { G} {z *} ⎪z ⎪ where {z *} = ⎨ 2 ⎬ -------------------------(28) β= {G}T {G} ⎪ ⎪ ⎪⎩ z n* ⎪⎭ The superscript T denotes transpose. If the limit state equation is linear, then Eq 5.28 reduces to Eq. 5.18. 6. Calculate a column vector containing the sensitivity factors using

{α } =

{G} -------------------------------{G}T {G}

(29)

7. Determine a new design point in reduced variates for n-1 of the variables using Z i* = α i β

26

8. Determine the corresponding design point values in original coordinates for the n-1 values in Step 7 using

X i* = μ X + Z i*σ X i

i

9. Determine the value of the remaining random variable (i.e., the one not found in Steps 7 and 8) by solving the limit state function g = 0. 10. Repeat Steps 3 to 9 until and the design point {xi* }converge.

27

Problems 1. Most phytoplankton in lakes are too small to be individually seen with the unaided eye. However, when present in high enough numbers, they may appear as a green discoloration of the water due to the presence of chlorophyll within their cells. Climate and water quality are among the factors influencing the quantity of phytoplankton in shallow lakes. Assume that the rate of increase of phytoplankton can be expressed as a linear function, g(X1, X2, X3) of three variables, namely X1 temperature of water, X2 global radiations and X3 concentrations of nutrients. X1, X2, X3 can be modeled as normal random variables. Positive growth rates must be avoided. Although it is observed that temperature and radiation have no effect on effect on concentration of nutrients, so that ρ13 = ρ23 = 0, mutually they are highly correlated with ρ12 = 0.8. The equilibrium function is given by g(X1, X2, X3) = a0 + a1*X1 + a2* X2 + a3 *X3, where a0 = -1.5 mg/cubic m, a1 = 0.08 mg/ (cubic m. degree Celsius), a2 = 0.01 mg/mW and a3 = 0.05 Solution. Equilibrium (limit state), g(X1, X2, X3) = 0 RANDOM VARIABLE,X X1,degree Celsius X2,W/square M X3,mg/cubic m

Mean ,µ

Coefficient of Variation, V 0.5 0.3 0.7

16 150 100

Standard Deviation, σ 8 45 70

Other variables are included in a0 because of difficulty in computing separately. Reliability index β = a 0 + ∑ ai * µi

β = 6.28

13.32

∑ ai 2σi 2

= 1.72

γ = Φ(1.72) = 0.957

There is a chance that the equilibrium situation 96 percent. Hence the risk that algal biomass will increase is only 4 percent. 2. The economic performance of the irrigation barrage located at a place in along a river could be improved by installing a hydropower station to meet the local energy demand. An engineer estimates the power demand X3 to be 600kW on average with variability µ 600kw. If standard turbo axial turbine was installed, power output can be estimated as 7.5X1X2.Discharge X1 is measured in cubic m/s and hydraulic head in m;7.5 is coefficient ing for gravity, density of water, and overall efficiency of installed equipment. Accordingly, power is given in units of kW. Although average discharge of 22 cubic m/s and an average head of 5.2m are available, discharge head availability depends on natural flow variability; it is also subjected to the construction of barrage handling, which is operated with priority for irrigation demand. Discharge and head can be assumed to be independent normal variables, X1 and X2, with coefficients of variation 0.2 and 0.15 respectively. Assuming that demand X3 normal and independent of discharge and head, evaluate that reliability of the plant.

Performance function g(X1, X2, X3) = 7.5*X1*X2 – X3 RANDOM VARIABLE

UNIT

MEAN

Normal Discharge X1 Normal Hydraulic Head X2 Normal Power Demand X3

Cubic m/s

22

COEFFICIENT STANDARD OF DEVAITAION VARIATION 0.22 4.4

m

5.2

0.15

0.78

kW

600

0.10

60

Step1.Partial differentiation of performance functions with respect to each random variable.

⎛⎜ ∂ g ⎞⎟ f = 7 . 5 * x 2 * σ 1 x ∂ 1 ⎝ ⎠ ⎛⎜ ∂ g ⎞⎟ f = 7 . 5 * x 1 * σ 2 x ∂ 2 ⎝ ⎠ ⎛⎜ ∂ g ⎞ f = −σ 3 ∂ x 3 ⎟⎠ ⎝ Step2.computation of direction cosines, α .

α =

⎛⎜ ∂g ⎞⎟ f ∂ Xi ⎝ ⎠ 2

⎛ ∑⎛ ∂g ⎞ ⎛ ∂g ⎞⎞ ⎜ ⎜ ∂Xi ⎟ * ⎜ ∂Xi ⎟ ⎟ ⎠ ⎝ ⎠⎠ ⎝ ⎝

Step3.calcualtion of new Xif Xif = α β Step4.using estimated values of mean and standard deviation, calculate Xif new Xifnew = µ i + σ Xif Step5.repeat the iteration until reliability index value converge to single value. Evaluation of reliability Limiting state of interest g(X1, X2, X3) = 7.5*X1*X2 – X3 = 0. Iteration process is illustrated in the following sections. MEAN INITIAL x1f INITIAL x2f INITIAL X3f ⎛⎜ ∂g ⎞⎟ f ∂ X 1 ⎝ ⎠

22.0 5.2 600 171.6

17.8 4.6 620 153.2

COEFFICINET OF VARIATION 17.7 4.7 623 154.6

STANDARD DEVIATION 17.7 4.7 623 154.8

⎛⎜ ∂g ⎞⎟ f ⎝ ∂X 2 ⎠ ⎛⎜ ∂g ⎞⎟ f ⎝ ∂X 3 ⎠ ⎞⎟ f ∑⎛⎜ ∂g ∂ Xi ⎝ ⎠

128.7

104.2

103.7

103.6

-60.0

-60.0

-60.0

-60.0

49,610

37,922

38,254

38,276

α1 f α2 f α3 f

0.770 0.578 -0.269 17.8 4.6 620 -4.5*10e-5

0.787 0.535 -0.308 17.7 4.7 622.8 7.5*10e-6

0.790 0.530 -0.307 17.7 4.7 622.7 1.7*10e-5

0.791 0.529 -0.307 17.7 4.7 622.7 1.8*10e-5

1.24

1.23

1.23

1.23

NEW x1f NEW x2f NEW X3f G(.) 7.5 x1f * x2f - X3f

β

It can be noted that last two iterations give identical value. This corresponds to reliability of 89% and the risk of failure of 11%.

3. Consider a harbor breakwater constructed with massive concrete tanks filled with sand. It is necessary to evaluate the risk that the breakwater will slide under pressure of a large wave during major storm.

The following information/data is necessary for analysis.

Resultant horizontal force, Rh, depends on the balance between the static and dynamic

pressure components, and it can be taken as quadratic function of Hb (indicated in Figure) under simplified hypothesis on the depth of the breakwater. Random deep water value X4 = Hs, which is found from frequency analysis of extreme

storms in the area. Resultant vertical force, Rv = X2 - FV

Where X2, weight of the tank reduced for buoyancy. FV , a vertical component of dynamic uplift pressure due to the braking wave. It is

proportional to height of the height of the design wave, Hb, when the slope of sea bottom is known. Coefficient of friction, cf, can interpret as a random variable, X1, which represents

inherent uncertainty associated with its field evaluation. if Rh

Rv

< cf

,

stability against sliding will exist.

Additional variate X3 is introduced to represent the uncertainties caused the

simplifications adopted to model the dynamic forces FV and Rh . Simplification of the shoaling effects indicates that the height Hb of the design wave is proportional to random deepwater value X4. All random variables are assumed to be independent. The constants a1, a2, a3 are depends on geometry of system. ing for the sea-bottom profile and the geometry, one estimate constants, a1=7, a2 =17m/KN, a3=145. Limiting state equation

g(X1, X2,X3.X4) = X1X2- 70X1X3X4 -17X3X4 -17X3X4X4 -145X3X4 = 0

(1)

Random variables X1 X2 X3 X4

Mean

Coefficient of variation 0.64 0.15 3400 KiloNewton/m 0.05 1 0.20 5.16 0.18

Standard deviation 0.096 108.80 0.2 0.93

Partial derivatives of performance function with respect to each random variable evaluated at failure point. ⎞⎟ = (x -70x x ) σ ⎛⎜ ∂g f 2 3* 4 1 ⎝ ∂X 1⎠ ⎛⎜ ∂g ⎞⎟ f = x1* σ2 ∂ X 2 ⎝ ⎠ ⎛⎜ ∂g ⎞⎟ f = - (70*x1*x4 +17x4*x4+145 x4) σ3 ∂ X 3 ⎝ ⎠ ⎛⎜ ∂g ⎞⎟ f = - (70*x1*x3+34*x3*x4+145*x3) σ4 ∂ X 4 ⎝ ⎠

For first iteration, we should take expectations as the initial values. ⎛⎜ ∂g ⎞⎟ = (3400-70*1*5.09)0.096=292.19, f ∂ X 1 ⎝ ⎠ ⎛⎜ ∂g ⎞⎟ f = 0.64*170 = 108.80. X 2 ∂ ⎝ ⎠ ⎛⎜ ∂g ⎞⎟ f = - (70*0.64*5.09+17*5.09*5.09+145*5.09)0.02 = -283.97. X 3 ∂ ⎝ ⎠ ⎛⎜ ∂g ⎞⎟ f = - (70*0.64*1.0+34*1.0*5.09+145*1.0)0.889 = -322.67 ⎝ ∂X 4 ⎠

Direction cosines ,αi

⎛⎜∂g ⎞⎟f ∂Xi⎠ αi = ⎝ 2

⎛∑⎛∂g ⎞*⎛∂g ⎞⎞ ⎜ ⎜ ∂Xi⎟ ⎜ ∂Xi⎟⎟ ⎠⎝ ⎠⎠ ⎝ ⎝

α1 = 292.19

(2.8 *10e5)

= 0.550

α2 =108.80

(2.8 *10e5)

= 0.205

α3 = -283.97

(2.8 *10e5)

α4 = -322.67

(2.8 *10e5)

= -0.535

= -0.608

New failure point is given by x1(new) =µ1- α1 * σ1 * β =0.64-0.053 β

- (2)

x2(new) =µ2- α2 * σ2 * β =3400-34.85 β

-(3)

x3(new)=µ3- α3 * σ3 * β =1+0.107 β

- (4)

x4(new)=µ4- α4 * σ4 * β =5.09+0.541 β

- (5)

By substituting (2), (3), (4), (5) in limit state equation (1), we get solution for reliability index, β =1.379. Iteration process Initial x1f Initial x2f Initial x3f Initial x4f F(x*4f)

I iteration o.64 3400 1.00 5.16 0.570

II iteration 0.576 3352 1.147 5.825 0.799

III iteration 0.603 3378 1.088 5.637 0.784

IV iteration 0.594 3370 1.105 5.704 0.767

V iteration 0.597 3373 1.009 5.681 0.760

F(x*4f) invФ[F(x*4f)] Ø{invФ[F(x*4f)]} Mean of X*4f Standard deviation of X*4f ⎞⎟ ⎛⎜ ∂g f ∂ X 1 ⎝ ⎠ ⎞⎟ f ⎛⎜ ∂g ∂ X 2 ⎝ ⎠ ⎞⎟ f ⎛⎜ ∂g ⎝ ∂X 3 ⎠ ⎞⎟ f ⎛⎜ ∂g ⎝ ∂X 4 ⎠ ⎞⎟ f ∑⎛⎜ ∂g ⎝ ∂Xi ⎠ α1 α2 α3 α4 New x1f New x2f Newx3f Newx4f g β

4.4*10e-1 0.177 0.393 5.090 0.899

2.5*10e-1 0.838 0.281 5.590 1.136

3.0*10e-1 0.668 0.319 5.424 1.065

2.8*10e-1 0.729 0.306 5.481 1.090

2.9*10e-1 0.708 0.311 5.461 1.081

292.19

278.69

284.59

282.86

283.47

108.80

96.42

102.58

100.94

101.53

-283.97

-312.40

-311.15

-314.95

-313.6

-322.67

-488.34

-430.68

-449.30

-442.7

2.8*10e5

2.8*10e5

2.8*10e5

2.8*10e5

2.8*10e5

0.550 0.205 -0.535 -0.608 0.567 3352 1.147 5.836 4*10e-5 1.379

0.525 0.182 -0.605 -0.920 0.603 3378 1.088 6.348 6.3*10e-5 0.726

0.536 0.193 -0.586 -0.811 0.594 3370 1.105 6.201 -3*10e-5 0.899

0.533 0.190 -0.593 -0.846 0.597 3373 1.099 6.525 -2.1*10e-5 0.837

0.534 0.191 -0.590 -0.831 0.591 3372 1.101 6.234 -2.5*10e-5 0.83

Reliability Ф(β) = 0.805 Risk 1- Ф(β) = 0.195

Learning objectives 1. Understanding

importance

of

sampling

simulation

method. 2. Understanding the method of generation of random numbers. 3. Understanding various variance reduction method.

1

MODULE – 6


Motivation: Importance of understanding the importance of simulation methods

Highlights Knowledge of such as simulation methods , random numbers and variance reduction methods.

1

6. Simulation Methods It is well established that simulation techniques have proven their value especially for problems where the representation of the state function is associated with difficulties. Such cases are e.g. when the limit state function is not differentiable or when several design points contribute to the failure probability. The basis for simulation techniques is well illustrated by rewriting the probability integral in equation

Pf =

∫f

X

( X )dX

g ( x )≤0

by means of an indicator function as shown in Equation

Pf =

∫f

g ( x )≤0

X

( X )dX = ∫ I [ g ( X ) ≤ 0] f X ( X )dX

where the integration domain is changed from the part of the sample space of the vector X =(X1, X2, .. Xn)T for which g(x)≤0 to the entire sample space of X and where I[g(x)≤0] is an indicator function equal to 1 if g(x)≤0 and otherwise equal to zero. Above equation is in this way seen to yield the expected value of the indicator function I[g(x)≤0]. Therefore if now N realizations of the vector X, i.e.

xˆ j , j = 1,2,.........N

are sampled it

follows from sample statistics that

1 N Pf = ∑ I [ g ( x ) ≤ 0] N j =1 Is an unbiased estimator of the failure probability Pf.

6.1.1. Crude Monte Carlo Simulation The principle of the crude Monte Carlo simulation technique rests directly on the application of above equation. A large number of realisations of the basic random variables X, i.e.

xˆ j , j = 1,2,.........N

are generated (or simulated) and for each of the

1

outcomes

xˆ j

, it is checked whether or not the limit state function taken in

xˆ j

is positive.

All the simulations for which this is not the case are counted (nf) and after N simulations the

pf =

failure

probability

pf

may

be

estimated

through

nf N

which then may be considered a sample expected value of the probability of failure, In fact for N → ∞ , the estimate of the failure probability becomes exact. However, simulations are often costly in computation time and the uncertainty of estimate is thus of interest. It is easily realized that the coefficient of variation of the estimate is proportional to

1

nf

meaning that if Monte Carlo simulation is pursued to estimate a probability in

the order of 10-6 it must be expected that approximately 108 simulations are necessary to achieve an estimate with a coefficient of variance in the order of 10%. A large number of simulations are required using Monte Carlo simulation and all refinements of this crude techniques have the purpose of reducing the variance of the estimate. Such methods are for this reason often referred to as variance reduction methods. Simulation of the N outcomes of the t density functions, in above equation principle simple and may be seen as consisting of two steps. Here we will illustrate the steps assuming

that

the

n

components

of

the

random

vector

X

are

independent. In the first step a “pseudo random” number between 0 and 1 is generated for each of the components in

xˆ j xˆ ji , j = 1,....., n .

The generation of such numbers may be facilitated by

build-in functions of basically all programming languages and spreadsheet software In the second step the outcomes of the “pseudo random” numbers Zji are transformed to outcomes of

xˆ j

by

xˆ ji = F ji−1 ( Z ji )

2

where

FX

i

is the probability distribution function for the random variable Xi, principle is

also illustrated in Figure.

This process is continued until all components of the vector

xˆ j

have been generated.

6.1.2. Importance of Sampling Simulation Method As already mentioned the problem in using above equation is that the sampling function f(x) typically is located in a region far away from the region where the indicator function I[g(x)≤0] attains contributions. The success rate in the performed simulations are thus low. In practical reliability assessment problems where typical failure probabilities are in the order of 10-3 to 10-6 this in turn leads to the effect that the variance of the estimate of failure probability will be rather large unless a substantial amount of simulations are performed. To overcome this problem different variance reduction techniques have been proposed aiming at, with the same number of simulations to reduce the variance of the probability estimate. The importance sampling method takes basis in the utilization of prior information about the domain contribution to the probability integral, i.e. the region that contributes to the indicator function. Let us first assume that we know which point in the sample space x* contributes the most to the failure probability. Then by centering the simulations on this point, the important point, we would obtain a higher success rate in the simulations and

3

the variance of the estimated failure probability would be reduced. Sampling centered on an important point may be accomplished by rewriting above equation 1in the following way Pf = ∫ I [g (x ) ≤ 0] f X ( X )dX = ∫ I [g ( x ) ≤ 0]

f X (X ) f S ( X )dX f S (X )

which fs(x) is denoted the importance sampling density function. It is seen the integral in above equation represents the expected value of the term I [g ( x ) ≤ 0]

f X (X ) where the f S (X )

components of s are distributed according to fv(x).The question in regard to the choice of an appropriate importance sampling function fs(x), however, remains open One approach to the selection of an importance sampling density function fs(x) to select a n-dimensional t Normal probability density function with uncorrelated components, mean values equal to the design point as obtained from FORM analysis, i.e. µs=x* and standard deviations standard deviation of the component of X i.e. σs= σX. In this case the above equation may written as Pf = ∫ I [g ( x ) ≤ 0]

f X (X ) f (X ) f S ( X )dX = ∫ I [g [x ] ≤ 0] X ϕ ( X )dX f S (X ) ϕ (X )

In the equivalence of the equation leading to Pf =

1 N

N

f X (s )

∑ I [g (s ) ≤ 0] ϕ (s ) j =1

which may be assessed by sampling over realizations of s as described in the above. Application of these equations greatly enhances effiency of the simulations. If the limit state function is not too non-linear around the design point x* the success rate of the simulations will be close to 50%. If the design point is known in advance in a reliability problem where the probability of failure is in the order of 10-6 the number of simulations required to achieve a coefficient of variance in the order of 10% is thus around 200. This number stands in strong contrast to the 108 required using the crude Monte Method discussed before, but of course also requires knowledge about the design point.

4

6.2. GENERATION OF RANDOM NUMBERS 6.2.1. Random Outcomes from Standard Uniform Variates The probability integral transform indicates that generation of uniform (0, 1) random numbers is the basic generation process used to derive the outcomes from a variate with known probability distribution. Current methods of generating standard uniform variates are deterministic, in the sense that systematic procedures are used after one or more initial values are randomly selected. For example, system-supplied random number generators in most digital computers are almost always linear congruent generators. This algorithm is based on recursive calculation of sequence of integers k1, k2, k3,.., each between 0 and m-1 (a large number) from a linear transformation: k i −1 = (ak i + c )(mod ulo m ) Here a and c are positive integers called the multiplier and the increment respectively, and the notation (modulo m) signifies that k, is the remainder obtained after dividing

(ak i + c )by m

where m denotes a (large) positive integer, hence denoting

η i = Int [(ak i + c )m] the corresponding residual is defined as k i +1 = ak i + c − mη i Hence, u i +1 = k i +1 m = ak i x m + c x m = Int [(ak i + c )m] where the ui are uniform(0,1 ). Because these numbers are repeated with a given period, they are usually called pseudo-random numbers. The quality at the results depends on the magnitudes of the constants a,c and m and their relationships ,but the particular computer used will impose constraints. Because the period of the cycle is not greater than m and it increases with m, the main criterion is that the periods alter which the original numbers are unavoidably repeated should be as long as possible .In practice, in is set equal to the word length that is, the number of bits retained as a unit in the computer. Moreover, the constants c and m should not have any common factors, and the value of a should be sufficiently high. Because all possible integers between 0 and in m-1 occur after sonic interval of time, regardless of the generator used, any initial choice of the seed k0 is as good as any other.

5

Linear congruential algorithm Suppose we assume low values for the constants in equation, a = 5 , c =1 and m = 8. Let k0 = 1 be the seed for generating a sequence of random integers ki i=1,2,3……. For i=1. One has k1 = ak 0 + c − m Int [(ak 0 + c ] = 5 x 1 + 1 − 8 x Int [(5 x 1 + 1)8] = 5 + 1 − 8 x Int (0.75) = 5 + 1 − 8 x 0 = 6 and from equation a1 = k1 m = 6 x 8 = 0.75

The sec ond iteration yeilds k 2 = ak1 + c − m Int [(ak1 + c ] = 5 x 6 + 1 − 8 x Int [(5 x 6 + 1)8] = 30 + 1 − 8 x Int (3.875) = 30 − 1 − 8 x3 = 7 u 2 = k 2 m = 7 x 8 = 0.875

The subsequent iterations yeild the following sequence 0.5, 0.625, 0.25, 0.375, 0, 0.125 , 0.75, 0.875, 0.5, 0.625, 0.25 , 0.375, 0 0.125, 0.75, 0.875, 0.5, 0.625, 0.25, 0.375, 0, 0.125, 0.75, 0.875, 0.5, 0.625, 0.25, 0.375, 0, 0.125 which is seen to be cyclic with a period of 8, because the underlined sequence of 8 values is repeated indefinitely. This is clearly shown by plotting ui+1 against u, in

Trajectory of 100 sequentially generated standard uniform random numbers with a = 5, c = 1, and m = 8 (soild line) and with a = 2+1, c = 1 and m = 235 (dotted line)

Also shown in figure are results from the generator a = 2 7 + 1, c = 1 and m = 2 35 , which

yields a much larger period of cyclicity. This choice gives satisfactory results for binary computers: and a 101,c = 1, and m = 2b for a decimal computer with a word length b. 6

The advantage of the linear congruencies method when applied through a digital computer is the speed of implementation. Because only a few operations are required each time, its use has become widespread. A disadvantage is that once the seed is specified, the entire series is predictable. The pseudorandom numbers generated by these procedures can be tested for uniform distribution and for statistical independence. Goodness-of-fit tests, such as the chisquared and the Kolmogorov-Smiruov tests can be used to that these numbers are uniformly distributed. Both parametric and nonparametric methods, such as the runs test, can be used to check for randomness between successive numbers in a sequence in spite of the fact that these procedures are essentially deterministic. Pseudo-random numbers generated with large m and accurate choices of a and c generally appear to be uniformly distributed, and stochastically independent, so that they can be properly used to perform Monte Carlo simulations. Algorithms to generate pseudo-random numbers which closely approximate mutually independent standard uniform variates are a standard feature in statistical software. Standard uniform random numbers are available as a system-supplied function in digital computers, as well as in most customary computational and data management facilities such as spreadsheets and data bases. Now a days, software tools MATLAB, MS Excel and Mathcad have built in options to generate random numbers and evaluation of probability of failure. Problems solved using Hasofer –Lind approach and FORM methods earlier can be reworked using simulation procedures.

7

1. The problem solved in reliability can be solved using simulations as follows. The following are the variables with the values of parameters along with the performance function. Performance function g(X1, X2, X3) = 7.5*X1*X2 – X3. RANDOM VARIABLE

UNIT

MEAN

Normal Discharge X1 Normal Hydraulic Head X2 Normal Power Demand X3

Cubic m/s

22

COEFFICIENT STANDARD OF DEVAITAION VARIATION 0.22 4.4

M

5.2

0.15

0.78

kW

600

0.10

60

The MATLAB program with the following commands is used to evaluate probability of failure. Number of simulations is taken as 100000 and all the three variables are normally distributed. n = 10000; x1 = ( randn(n,1) * 4.4) + 22; x2 = (rand(n,1)*0.78)+5.2; x3=(rand(n,1)*60)+600; y = 7.5.*x1.*x2-x3; hist(y,100); y_mean = mean(y) y_std = std(y) beta1 = y_mean/y_std pf = normcdf(-beta1,0,1)

Answer: Probability density distribution of G is shown in Figure. Mean value of performance function G_mean =292.8984 G_std = 188.5695 Reliability index and probability of failure are 1 .5533 and 0.0602 respectively which is close to the result obtained in the previous case.

350

300

250

200

150

100

50

0 -600

-400

-200

0

200

400

600

800

1000

1200

Consider the harbor breakwater problem in the previous case. It is necessary to evaluate the risk that the breakwater will slide under pressure of a large wave during major storm. Resultant horizontal force, Rh, depends on the balance between the static and dynamic pressure components, and it can be taken as quadratic function of Hb under simplified hypothesis on the depth of the breakwater. Random deep water value X4 = Hs, which is found from frequency analysis of extreme storms in the area. Resultant vertical force, Rv = X2 - FV Where X2, weight of the tank reduced for buoyancy. FV , a vertical component of dynamic uplift pressure due to the braking wave. It is proportional to height of the height of the design wave, Hb, when the slope of sea bottom is known.

Coefficient of friction, cf, can interpret as a random variable, X1, which represents inherent uncertainty associated with its field evaluation.

if Rh

Rv

< cf

,

stability against sliding will exist.

Additional variate X3 is introduced to represent the uncertainties caused the simplifications adopted to model the dynamic forces FV and Rh . Simplification of the shoaling effects indicates that the height Hb of the design wave is proportional to random deepwater value X4. All random variables are assumed to be independent. The constants a1, a2, a3 are depends on geometry of system. ing for the sea-bottom profile and the geometry, one estimate constants, a1=7, a2 =17m/KN, a3=145. Limiting state equation g(X1, X2,X3.X4) = X1X2- 70X1X3X4 -17X3X4 -17X3X4X4 -145X3X4 = 0

Random variables X1 X2 X3 X4

Mean

Coefficient of variation 0.64 0.15 3400 KiloNewton/m 0.05 1 0.20 5.16 0.18

Standard deviation 0.096 108.80 0.2 0.93

Solution: calculation of reliability index using Monte Carlo simulation All random variables are in normal distribution.

Mean of y =340.4513 Standard deviation of y =327.4135 Reliability index =1.0398 Risk =0.1492

(1)

Learning objectives 1. Understanding importance of logical tree, fault tree analyses and event tree analysis 2. Understanding the symbols used in fault tree analyses and fault tree representation. 3. Understanding the symbols used in event tree analyses and event tree representation. 4. Understanding the representation of decision tree .

1

MODULE – 7


Motivation: Importance of understanding the use of fault tree analysis.

Highlights Knowledge of such as logical tree, fault tree analysis, event tree, decision tree and cause consequence charts.

1

Module 7 Logical Trees Different sources of risk for an engineering system and / or activity can be analyzed with respect to their chronological and causal components using logical trees. They are useful for the analysis of the overall risk as well as for the assessment of the risk contribution from the Individual components. Fault tree and event tree diagrams are the most well known and most widely applied type of logical trees in both qualitative and quantitative risk analysis. Even though more modern risk analysis techniques such as eg. Bayesian probabilistic nets have been developed over last years, fault tree and event tree are still main methods recommended (for US nuclear safety studies). Fault trees and event trees are in many ways similar and the choice of using one other or a combination of both in reality depends more on the traditions/preferences within a given industry than the specific characteristics of the logical tree. A significant difference between the two types of trees is though that the fault trees take basis in deductive (looking backwards) logic and the event trees are inductive (looking forward). In practical applications, a combination of fault trees and event trees is typically used. In this case, the fault tree part of the analysis is concerned about the representation of the sequences of failures, which may lead events with consequences and the event tree part of the analysis is concerned with the representation of the subsequent evolution of the consequence inducing events. Intersection between the fault tree and the event tree is in reality a matter of preference of the engineer performing the study. Small event tree / large fault tree and large event tree / small fault tree techniques may be applied to the same problem to supplement each other and provide additional insight with regard to the reliability of the considered system. Decision trees are often seen as a special type of event tree, but may be seen in much wider perspective and if applied consistently within the framework of decision theory, provide the theoretical basis for risk analysis. The detailed analysis of the various types of logical trees

1

requires that the performance of the individual components of the trees already has been assessed in of failure rates and or failure probabilities.

7.1. Fault Tree Analysis As mentioned earlier a fault tree is based on a deductive logic starting by considering an event of system failure and then aims to deduct which causal sequence of component failures could lead to the system failure. The system is thus often referred to as a top event. The logical interrelation of the sequences of component failures is represented through logical connections

(logical

gates)

and

the

fault

tree

forms

in

effect

a

tree-like structure with the top event in the top and basic events at its extremities. The basic events are those events, for which failure rate data or failure probabilities are available and which cannot be dissected further. Sometimes the events are differentiated into initiating (or triggering) events and enabling events, where the initiating events are always the first event in a sequence of the enabling events are events, which may increase the severity of the initiated failure. The fault tree is a Boolean logical diagram comprised primarily of AND and OR gates. The output event of an AND gate occurs only if all of the input events occur simultaneously and the output event of an OR gate occur if any one of the input occurs. Figure 1 illustrates different commonly used symbols for AND and OR gates.

Figure 1 – Illustration of commonly used symbols for AND and OR gates

2

Several other types of logical gates exists such as QUANTIFICATION and COMPARISON, however, these will not be elaborated in present text. Top events and basic events also have their specific symbols as shown in figure below

Figure 2 – Symbols commonly used in fault tree representations In the Figure 2 diamond shaped symbol represents an undeveloped scenario which has not been developed in to a system of sub events due to lack of information and data.

Figure 3 – Principal shape of fault tree

3

It is noted that a fault tree comprising an AND gate represents a parallel system, i.e. all components must fail for the system to fail. Such a system thus represents some degree of redundancy because the system will still function after one component has failed. Fault trees comprising an OR gate on the other hand represents a series system, i.e. a system without any redundancy in the sense that it fails as soon as any one of its components has failed. Such as system is often denoted a weakest component system. Systems may be represented alternatively by reliability block diagrams, see Figure 4.

Figure 4 – Reliability block diagrams for OR and AND gates In accordance with the rules of probability theory the probability of the event for an AND gate is evaluated by n

P = ∏ pi i =1

And for an OR gate by n

P = 1 − ∏ (1 − pi ) i =1

Where n is the number of ingoing event to the gate .Pi are the probabilities of the failure of ingoing events and it is assumed that the ingoing are independent. System failure modes are defined by so-called cut-sets, i.e. combinations of basic events, which with certainty will lead to the top event. The number of such combinations can be rather large several hundreds for a logical tree with about 50 basic events. It is important to note that the top event may still occur event though not all basic events in a cut set occur. A minimal cut set is the 4

cut set that represents the smallest combination of basic events leading to the top event, sometimes denoted the critical path. The top event will only occur if all events in die minimal cut set occur. An important aspect of fault tree analysis is the identification of the minimal cut sets as this greatly facilitates the numerical evaluations involved.

7.2. Event trees An event tree is a representation of the logical order of events leading to some (normally adverse) condition of interest for a considered system. It should be noted that several different states for the considered system could be associated the important consequences. In contrast to the fault tree it starts from a basic initiating event and develops from there in time until all possible states with adverse consequences have been reached. The initiating events may typically arise as top events from fault tree analysis. The event tree is constructed from event definitions and logical vertices (outcomes of events), which may have a discrete sample space as well as a continuous sample space. Typical graphical representations of event trees are shown in Figure 5.

Figure 5 – Illustration of the principal appearance of an event tree Event trees can become rather complex to analyze, This is easily realized by noting that for a system with n two-state components the total number of paths is 2n. If each component has m states the total number of branches is mn.

5

7.3. Cause Consequence Charts Cause consequence charts are in essence yet another representation of combined fault trees and event trees in the sense that the interrelation between the fault tree and the event tree, namely the top event for the fault tree (or the initiating event- for the event tree) is represented by a rectangular gate with output event being either YES or NO, each of which will lead to different consequences. The benefit of the cause consequence chart is that the fault tree need not be expanded in the representation, enhancing the overview of the risk analysis greatly. An example of a gate in a cause consequence chart is shown in Figure 6.

Figure 6 – Gate in a cause consequence chart

7.4. Decision trees As already indicated, decision trees applied within the framework of the decision theory form the basic framework for risk analysis. This may be realized by recognition of the fact that risk analysis serves the purpose of decision-making. Either the risk analysis shows that the risks are acceptable and one does nothing, or it is found that the risks are not acceptable and one has to do something. The decision analysis is the framework for the assessment of the risks as well as for

6

the evaluation of how to reduce the risks most efficiently. An example of a decision tree is Figure 7

Figure 7 – Principle representation of a decision tree The decision tree is constructed as a consecutive row of decisions followed by uncertain events thus reflecting the uncertain out come of the possible actions may follow from the decisions. In the end of the decision tree, consequences (or utilities) are assigned in accordance with the decisions and the outcomes of the uncertain events. Depending on the number of decisions and or action involved in the decision analysis and thus represented in the decision tree various types of decision analysis are required, ranging from the most simple so called prior decision analysis to the most advanced pre-posterior analysis. It is important to note that the probabilities for the different events represented in the decision tree may be assessed by fault tree analysis, event tree analysis, reliability analysis or any combination of these and thus the decision tree in effect includes all these aspects of systems and component modeling in addition to providing a framework for the decision making.

7

Example 1 A power supply system is composed of an engine, a main fuel supply for the engine and electrical cables distributing the power to the consumers. Furthermore, as a backup fuel a reserve fuel with limited capacity is installed. The power supply system fails if the consumer is cut of from the power supply. This in turn will happen if either the power supply cables fail or the engine stops, which in turn is assumed only to occur if the fuel supply to the engine fails.

Solution A fault tree system model for the power supply is illustrated in figure below also the probabilities for the basic events are illustrated.

Figure 1 – Illustration of fault tree for a power supply system Use the rules of probability calculus we obtain that the probability of engine failure PEF is equal to (AND gate)

PEF =0.01*0.01 = 0.000 Along the same lines we obtain that the probability of engine failure PEF is equal to (OR gate) equal to (OR gate) PEF =0.0001+0.01-0.0001*0.01 = 0.0101 Example 2 The event tree in figure below models the event scenarios in connection with nondestructive testing of a concrete structure. Corrosion of the reinforcement may be present and the inspection method applied may or may not detect the corrosion, given corrosion is present and given that corrosion is not present.

Figure 2 – Illustration of event tree for the modeling of inspection of a reinforced concrete structure In the Figure 7 the event C/ denote that corrosion is present, and the event / that the corrosion is found by inspection. The bars over the events denote the complementary events. On the basis of such event trees e.g. the probability that corrosion is present given that it is found by inspection may be evaluated. In many cases the event trees may be reduced significantly after some preliminary evaluations. This is e.g. the case when it can be shown that the branching probabilities become negligible. This is often utilized e.g. when event trees are used in connection with inspection and maintenance planning. In such cases the branches corresponding to

failure events after repair events may often omitted at least for systems with highly reliable components. In Figure 6.14a combined fault tree and event tree is illustrated showing how fault trees often constitute the modeling of the initiating event for event tree.

Figure 3 – Illustration of combined fault tree and event tree

Learning objectives 1. Understanding

importance

of

system

reliability

assessment. 2. Understanding the various steps involved in system reliability assessment. 3. Understanding the methods adopted in system reliability assessment.

1

MODULE -8


Motivation: Importance of understanding the concept of system reliability.

Highlights Knowledge of such as system reliability.

1

Ang, A.H.-S. and Tang, W.H. (1975). Probability Concepts in Engineering Planning and Design, Vol. 1, Basic Principles, John Wiley, New York. Ang, A.H.-S. and Tang, W.H. (1984). Probability concepts in engineering planning and design. Volume II – decision, risk and reliability, John Wiley & Sons, Inc., New York. Baecher, G.B. and Christian, J.T. (2003). Reliability and Statistics in Geotechnical Engineering, John Wiley and Sons, London and New York. Becker, D.E. (1996a). Eighteenth Canadian Geotechnical Colloquium: Limit states Design for foundations, Part I. An overview of the foundation design process, Canadian Geotechnical Journal, Vol. 33, pp.956-983. Becker, D.E. (1996b). Eighteenth Canadian Geotechnical Colloquium: Limit States Design for Foundations. Part II. Development for the National Building Code of Canada, Canadian Geotechnical Journal, Vol. 33, pp. 984-1007. Benjamin, J. R. and Cornell, C. A. (1970). Probability, Statistics, and Decision for Civil Engineers. McGraw-HillBook Co., New York. Cherubini, C. (2000). Reliability evaluation of shallow foundation bearing capacity on c ,

soils, Canadian Geotechnical Journal, No. 37, pp. 264-269.

Christian, J.T., Ladd, C.C., and Baecher, G.B. (1992). Reliability and probability in stability analysis, In Proceedings of Stability and Performance of Slopes and Embankments, ASCE, GSP No.31, Vol. 2, pp. 1071-1111. Christian, J. T., Ladd, C. C., and Baecher, G. B. (1994). Reliability applied to slope stability analysis. Journal of Geotechnical Engineering, ASCE, Vol. 120, No. 12, pp. 2180-2207. Christian, J.T. and Urzua, A. (1998). Probabilistic evaluation of earthquake-induced slope failure, Journal of Geotechnical and Geoenvironmental Engineering, ASCE, Vol. 124, No.11, pp.1140-1143.

References

Christian, J.T. (2004). Geotechnical Engineering Reliability: How well do we know what we are doing?, Geotechnical and Geoenvironmental Engineering, ASCE, Vol. 130, No.10, pp. 985-1003. William G. Cochran: Sampling Techniques, 3rd Edition. John Wiley 1977. Dasaka S N Murthy (2006) Probabilistic site characterization and reliability analysis of shallow foundations and slopes (Doctoral thesis, unpublished) Indian Institute of Science, Bangalore. Dasaka, Satyanarayana Murthy and Rao, Rajaparthy Seshagiri and Babu, Sivakumar GL (2005) Reliability Analysis of Allowable Pressure of Strip Footing in Spatially Varying Cohesionless Soil. In Augusti, G and Schueller, GI and Ciampoli, M, Eds. Proceedings International Conference on Structural Safety and Reliability, pages 909915, Rome, Italy. De Mello, V.F.B. (1977). Reflections on design decisions of practical significance to embankment dams, Geotechnique, Vol. 27, No. 3, pp. 279-355. Duncan, J.M. (2000). Factors of safety and reliability in geotechnical engineering, Journal of Geotechnical and Geoenvironmental Engineering, ASCE, Vol.126, No.4, pp. 307-316. Fenton, G.A. (1999a). Estimation for Stochastic Soil Models, Journal of Geotechnical and Geoenvironmental Engineering, ASCE, Vol. 125, No.6, pp. 470-485. Fenton, G.A. (1999b). Random Field Modeling of T Data, Journal of Geotechnical and Geoenvironmental Engineering, ASCE, Vol. 125, No.6, pp. 486-498. Haldar, A. and Mahadevan, S. (2000). Probability, Reliability and Statistical Methods in Engineering Design, John Wiley & Sons, 304 p. Harr, M.E. (1987). Reliability based Design in Civil Engineering, McGraw-Hill Publishers, New York. JCSS (2000). Probabilistic model code, part 1: basis of design, 12th draft, t Committee on Structural Safety.

250

References

Kulatilake, P.H.S.W. and Ghosh, A. (1988). An investigation into accuracy of spatial variation estimation using static cone penetrometer data, In Proceedings of First International Symposium on Penetration Testing, Orlando, Fla., pp. 815–821. Kulathilake, P.H.S.W. and Um, J.G. (2003). Spatial variation of cone tip resistance for the clay site at Texas A&M University, Geotechnical and Geological Engineering, Vol. 21, pp. 149–165. Lacasse, S. and Nadim, F. (1996). Uncertainties in characterising soil properties, Plenary paper for ASCE Conference on Uncertainties, Madison, Wisconsin, USA. Lacasse, S. and Nadim, F. (1998). Risk and reliability in geotechnical engineering, In Proceedings 4th International Conference on Case Histories in Geotechnical Engineering, St. Louis, MO. Nathabandu T. Kottegoda and Renzo Rosso (1998) Statistics, Probability, and Reliability for Civil and Environmental Engineers, McGraw-Hill International edition. Phoon, K. K., Quek, S. T., Chow, Y. K., and Lee, S. L. (1990). Reliability analysis of pile Settlement. Journal of Geotechnical Engineering, ASCE, Vol. 116, No. 11, pp. 1717–1735. Phoon, K.K. and Kulhawy, F.H. (1999a). Characterization of geotechnical variability, Canadian Geotechnical Journal, Vol. 36, pp. 612-624. Phoon, K.K. and Kulhawy, F.H. (1999b). Evaluation of geotechnical property variability, Canadian Geotechnical Journal, Vol. 36, pp. 625-639. Phoon, K.K. (2002). Potential application of reliability-based design to geotechnical engineering, In Proceedings of 4th Colombian Geotechnical Seminar, Medellin, pp. 122. Phoon, K.K., Quek, S.T., and An, P. (2003a). Identification of Statistically Homogeneous Soil Layers using Modified Bartlett Statistics, Journal of Geotechnical and Geoenvironmental Engineering, ASCE, Vol. 129, No. 7, pp. 649-659.

251

References

USACE (1997). Risk-based analysis in Geotechnical Engineering for of Planning Studies, Engineering and Design. US Army Corps of Engineers, Department of Army, Washington D. C., 20314-100. Vanmarcke, E.H. (1977). Probabilistic modeling of soil profiles, Journal of the Geotechnical Engineering Division, ASCE, Vol.103, No.GT11, pp.1227-1246. Vanmarcke, E.H. (1983). Random fields: analysis and synthesis. MIT Press, Cambridge. Whitman, R.V. (1984). Evaluating calculated risk in Geotechnical Engineering, The Seventeenth Terzaghi Lecture, Journal of Geotechnical Engineering, ASCE, Vol. 110, No. 2, pp. 145-188. Whitman, R. V. (2000). Organizing and evaluating uncertainty in geotechnical engineering, Journal of Geotechnical and Geoenvironmental Engineering, ASCE, Vol. 126, No.7, pp. 583-593.

252

Curso Nptel - Reliability Engineering 4r4x3x

Overview 26281t

More details 6y5l6z

Related Documents 3h463d

Curso Nptel - Reliability Engineering 4r4x3x

Practical Reliability Engineering 43255w

Reliability Engineering Notes j59o

Life Cycle Reliability Engineering 5p124i

Curso Nptel - Finite Element Analysis a4557

Nptel 2tv12