
# Analysis of Type-II Hybrid Censored Competing Risks Data

Arnab Koley, Debasis Kundu & Ayon Ganguly
###### Abstract

Kundu and Gupta [22] provided the analysis of Type-I hybrid censored competing risks data when the lifetime distributions of the competing causes of failure are exponential. In this paper we consider the analysis of Type-II hybrid censored competing risks data. It is assumed that the latent lifetimes of the competing causes of failure follow independent exponential distributions with different scale parameters. It is observed that the maximum likelihood estimators of the unknown parameters do not always exist. We propose modified estimators of the scale parameters, which coincide with the corresponding maximum likelihood estimators when the latter exist, and are asymptotically equivalent to them. We obtain the exact distributions of the proposed estimators, and using these exact distributions, the associated confidence intervals are obtained. The asymptotic and bootstrap confidence intervals of the unknown parameters are also provided. Further, Bayesian inference of some unknown parametric functions under a very flexible Beta-Gamma prior is considered. Bayes estimators and associated credible intervals of the unknown parameters are obtained using a Monte Carlo method. Extensive Monte Carlo simulations are performed to assess the effectiveness of the proposed estimators, and one real data set has been analyzed for illustrative purposes. It is observed that the proposed model and the method work quite well for this data set.

Key Words and Phrases: Competing risks; Type-II hybrid censoring; Beta-Gamma distribution; Maximum likelihood estimator; Bootstrap; Asymptotic distribution.

Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, Pin 208016, India.

Corresponding author. E-mail: kundu@iitk.ac.in, Phone no. 91-512-2597141, Fax no. 91-512-2597500.

Department of Mathematics, Indian Institute of Technology Guwahati, Guwahati, Pin 781039, India.

## 1 Introduction

In medical or reliability studies, it is often observed that an item can fail due to different causes. For example, in a medical study a person may die due to different diseases, or in a reliability experiment an automobile may fail due to different factors. In such a situation, an investigator is often interested in the assessment of a specific cause in the presence of the other causes. In the statistical literature this is well known as the competing risks problem. In a competing risks problem, the data consist of a failure time and an indicator denoting the cause of failure. Among different methods, the two most popular approaches to analyze competing risks data are the following: (i) the latent failure time model as suggested by Cox [9], and (ii) the cause specific hazard functions model as suggested by Prentice et al. [25]. Several studies have been carried out over the last three decades under the above model assumptions in both parametric and non-parametric set-ups. In the parametric set-up, it is assumed that the lifetimes follow some specific distributions, whereas in the non-parametric set-up no specific distributional assumptions are needed. Interested readers are referred to the monograph of Crowder [10] for a comprehensive review of the analysis of different competing risks models.

Type-I and Type-II are the two most common censoring schemes used in practice. A mixture of the Type-I and Type-II censoring schemes is known as the hybrid censoring scheme (HCS). Epstein [13] first introduced this HCS, which is now known as the Type-I HCS. Since the introduction of the Type-I HCS of Epstein [13], extensive work has been done on Type-I hybrid censoring schemes; see, for example, Fairbanks et al. [14], Chen and Bhattacharyya [6], Gupta and Kundu [15], Dube et al. [11], Kundu [20], and the references cited therein. Childs et al. [7] introduced a new hybrid censoring scheme, which can be described as follows. Suppose n items are put on a test at time point 0. Let the ordered lifetimes of these experimental units be denoted by Z_{1:n} ≤ Z_{2:n} ≤ … ≤ Z_{n:n}, respectively. Suppose R is a pre-fixed integer and T is a pre-fixed time point. The test is terminated at the random time T* = max{Z_{R:n}, T}. This hybrid censoring scheme is named the Type-II HCS. The main advantage of the Type-II HCS is that it guarantees at least R failures before the end of the experiment; if R failures occur before time T, the experiment continues until time T, which might lead to more than R failures before the experiment stops. A detailed discussion of the Type-I and Type-II hybrid censoring schemes can be found in Balakrishnan and Kundu [2].

Kundu and Gupta [22] provided the analysis of Type-I hybrid censored competing risks data. Based on the latent failure time model assumption of Cox [9], and assuming the latent failure distributions to be exponential, the maximum likelihood estimators (MLEs) of the unknown parameters are obtained when they exist. It is observed that the MLEs of the scale parameters may not always exist. The exact conditional distributions of the MLEs are also provided. The authors further considered the Bayesian inference of the unknown parameters based on independent gamma priors, and obtained the Bayes estimates and the associated credible intervals. For some recent references on this topic, interested readers are referred to Iliopoulos [17] and Balakrishnan et al. [1].

The aim of this paper is to provide the analysis of Type-II hybrid censored competing risks data. In this paper also we make the latent failure time model assumption of Cox [9]. In latent failure time modeling, it is assumed that the competing causes of failure are independent random variables. In this paper it is assumed that there are only two competing causes of failure, and the lifetimes of the competing causes of failure follow exponential distributions with different scale parameters. Therefore, if Z denotes the lifetime of an item, then

 Z=min{T1,T2},

where T1 and T2 are the latent failure times of the two different causes of failure of the item. It may be mentioned that although the assumption of independence of the two latent failure times T1 and T2 seems very restrictive, it has been shown by Tsiatis [26] that without the presence of covariates the independence between T1 and T2 cannot be tested using the data alone; see also Kalbfleisch and Prentice [18] in this respect. Moreover, it is observed by Kundu [19] that in the case of exponential or Weibull lifetime distributions, both approaches, namely the latent failure time model of Cox [9] and the cause specific hazard functions model of Prentice et al. [25], provide the same likelihood function, although their interpretations are different.

It is observed that in this case also the MLEs may not always exist. We propose new estimators of the scale parameters which always exist. They coincide with the MLEs when the latter exist, and asymptotically they are equivalent. We obtain the exact distributions of the proposed estimators, which can be written as generalized mixtures of shifted gamma distributions. Based on the monotonicity assumption as in Chen and Bhattacharyya [6], confidence intervals based on the exact distributions are also provided. It may be mentioned that the main purpose of proposing the new estimators is to provide unconditional inference, and also to provide a confidence set of the scale parameters even when the MLEs do not exist.

For comparison purposes, we also provide the asymptotic and bootstrap confidence intervals. We further consider the Bayesian inference of the unknown parameters. For the Bayesian analysis, we need to assume certain priors on the unknown parameters. In this case we consider a very flexible Beta-Gamma prior, as suggested by Pena and Gupta [24], for the scale parameters. The Bayes estimates can be obtained in explicit forms, and we also provide a joint credible set of the unknown parameters. Extensive simulations are performed to see the effectiveness of the different methods, and one data set has been analyzed for illustrative purposes.

The main differences between the present paper and Kundu and Gupta [22] are the following. In Kundu and Gupta [22], the authors considered the analysis of Type-I hybrid censored competing risks data, whereas in this paper we consider the analysis of Type-II hybrid censored competing risks data, and the exact distributions of the estimators are quite different. Moreover, all the results available to date are based on conditional distributions, whereas in this paper the results are obtained without any conditioning argument. Using the exact distributions of the proposed estimators, we provide confidence intervals of the unknown parameters. It is observed that even when the MLEs do not exist, it is possible to provide a confidence set of the parameters. Finally, in this paper we consider the Bayesian inference of the unknown parameters based on a very general Beta-Gamma prior, whereas most of the existing results are based on independent gamma priors on the scale parameters, which are a special case of the Beta-Gamma prior. Based on this general prior, we provide the Bayes estimates and the associated credible intervals. We also provide a Gibbs sampling procedure to compute the Bayes estimate of any function of the unknown parameters and the associated credible interval.

The rest of the paper is organized as follows. In Section 2, we describe the model, provide the prior assumptions, and present the definitions and notations used throughout this paper. The proposed estimators and their exact distributions are derived in Section 3. In Section 4, we present different confidence intervals. The Bayesian analysis is considered in Section 5. Monte Carlo simulation results and the analysis of a real data set are presented in Section 6, and finally we conclude the paper in Section 7.

## 2 Notations, Model Descriptions & Prior

Without loss of generality, we assume that there are only two causes of failure. We use the following notations throughout this paper.

### 2.1 Notations

 PDF: probability density function
 CDF: cumulative distribution function
 MLE: maximum likelihood estimator
 HPD: highest posterior density
 MGF: moment generating function
 T_{ji}: latent failure time of the i-th individual under cause j, for j = 1, 2
 Z_i: min{T_{1i}, T_{2i}}
 Z_{i:n}: i-th order statistic of Z_1, …, Z_n, i = 1, …, n; we define Z_{n+1:n} = ∞
 T*: max{Z_{R:n}, T}
 J: the total number of failures before T*
 D_i: the number of failures observed due to cause i, i = 1, 2
 δ_i: indicator variable denoting the cause of failure of the i-th ordered individual
 D: {(z_{1:n}, δ_1), …, (z_{J:n}, δ_J)}, the observations before the experiment stops
 I_j: {z_{i:n} : δ_i = j}, j = 1, 2
 GA(α, λ): gamma random variable with PDF (λ^α/Γ(α)) x^{α−1} e^{−λx}, x > 0
 exp(λ): exponential random variable with PDF λ e^{−λx}, x > 0, λ > 0
 Bin(N, p): binomial random variable with probability mass function C(N, i) p^i (1−p)^{N−i}, i = 0, 1, …, N
 Beta(a, b): beta random variable with PDF (Γ(a+b)/(Γ(a)Γ(b))) x^{a−1} (1−x)^{b−1}, 0 < x < 1
 f_G(x; μ, α, λ): PDF of a gamma random variable shifted by μ, i.e., (λ^α/Γ(α)) (x−μ)^{α−1} e^{−λ(x−μ)}, x > μ
 F̄_G(x; μ, α, λ): ∫_x^∞ f_G(z; μ, α, λ) dz
 BG(b0, a0, a1, a2): random variable having a Beta-Gamma distribution with the PDF given by (2).

### 2.2 Model Description

Consider the following lifetime experiment in which n items are put on a test. Each unit is exposed to two risks. Here Z_i denotes the lifetime of the i-th unit and

 Zi=min{T1i,T2i}.

Based on Cox's latent failure time model assumptions, it is assumed that T_{1i} and T_{2i} are independently distributed for each i. Moreover, it is assumed that T_{1i} ∼ exp(λ1) and T_{2i} ∼ exp(λ2), for i = 1, …, n. The test is terminated at the time point T* = max{Z_{R:n}, T}. It is immediate that the PDF of Z_i has the following form

 f(z; λ1, λ2) = (λ1+λ2) e^{−z(λ1+λ2)},    z > 0, (1)

and 0, otherwise. We use the following notation

 λ=λ1+λ2.

Moreover, we have the following observations at the end of the experiment.
Case-I: {(z_{1:n}, δ_1), …, (z_{R:n}, δ_R)}, if Z_{R:n} > T;
Case-II: {(z_{1:n}, δ_1), …, (z_{J:n}, δ_J)}, if Z_{R:n} ≤ T and Z_{J:n} ≤ T < Z_{J+1:n} for some J ≥ R.
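The observation scheme above can be illustrated with a short simulation. The following Python sketch (the function name and parameter values are ours, not from the paper) draws the latent exponential pairs, forms Z_i = min{T_{1i}, T_{2i}} with its cause indicator, and truncates the ordered sample at T* = max{Z_{R:n}, T}:

```python
import random

def simulate_type2_hcs(n, R, T, lam1, lam2, seed=42):
    """Simulate one Type-II hybrid censored competing-risks sample.

    Latent lifetimes T1i ~ exp(lam1) and T2i ~ exp(lam2) are independent;
    the observed lifetime is Zi = min(T1i, T2i) with cause indicator 1 or 2.
    The experiment stops at T* = max(Z_{R:n}, T)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        t1 = rng.expovariate(lam1)
        t2 = rng.expovariate(lam2)
        pairs.append((min(t1, t2), 1 if t1 <= t2 else 2))
    pairs.sort()                      # order statistics Z_{1:n} <= ... <= Z_{n:n}
    t_star = max(pairs[R - 1][0], T)  # stopping time T* = max(Z_{R:n}, T)
    data = [(z, c) for (z, c) in pairs if z <= t_star]  # observed (z, cause) pairs
    return data, t_star

data, t_star = simulate_type2_hcs(n=30, R=10, T=1.0, lam1=0.8, lam2=0.4)
```

In this sketch, Case-I corresponds to `t_star` equal to the R-th ordered lifetime, and Case-II to `t_star == T`.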

### 2.3 Prior Assumption

Borrowing the idea from Pena and Gupta [24], we assume the following joint conjugate prior on λ1 and λ2. For b0 > 0, a0 > 0, a1 > 0, a2 > 0, the joint prior of (λ1, λ2) has the following PDF for λ1 > 0, λ2 > 0:

 π(λ1, λ2 | b0, a0, a1, a2) = [Γ(a1+a2)/Γ(a0)] (b0(λ1+λ2))^{a0−a1−a2} × [b0^{a1}/Γ(a1)] λ1^{a1−1} e^{−b0 λ1} × [b0^{a2}/Γ(a2)] λ2^{a2−1} e^{−b0 λ2}. (2)

The joint PDF (2) is known as the PDF of a Beta-Gamma distribution.

The joint PDF (2) can take a variety of shapes. The correlation between λ1 and λ2 can be both positive and negative, depending on the values of a0, a1, and a2. If a0 = a1 + a2, the prior distributions of λ1 and λ2 become independent gamma distributions. The following results will be useful for further development.

Result 1: If (λ1, λ2) ∼ BG(b0, a0, a1, a2), then for i = 1, 2,

 E(λi) = a0 ai / (b0(a1+a2))   and   V(λi) = [a0 ai / (b0^2(a1+a2))] × {(ai+1)(a0+1)/(a1+a2+1) − a0 ai/(a1+a2)}. (3)

Proof:

 E(λ1) = ∫_0^∞ ∫_0^∞ [Γ(a1+a2)/Γ(a0)] (b0(λ1+λ2))^{a0−a1−a2} × [b0^{a1}/Γ(a1)] λ1^{a1} e^{−b0 λ1} × [b0^{a2}/Γ(a2)] λ2^{a2−1} e^{−b0 λ2} dλ1 dλ2
 = [Γ(a0+1) Γ(a1+a2) Γ(a1+1)] / [Γ(a0) Γ(a1+1+a2) Γ(a1) b0] ∫_0^∞ ∫_0^∞ [Γ(a1+1+a2)/Γ(a0+1)] (b0(λ1+λ2))^{(a0+1)−(a1+1)−a2} × [b0^{a1+1}/Γ(a1+1)] λ1^{a1} e^{−b0 λ1} × [b0^{a2}/Γ(a2)] λ2^{a2−1} e^{−b0 λ2} dλ1 dλ2
 = a0 a1 / (b0(a1+a2)).

The integral in the second step above equals 1, because the integrand is the PDF (2) of a BG(b0, a0+1, a1+1, a2) distribution. Similarly we get E(λ2) = a0 a2 / (b0(a1+a2)). Next we derive V(λ1). Note that V(λ1) = E(λ1^2) − (E(λ1))^2. We derive E(λ1^2) below.

 E(λ1^2) = ∫_0^∞ ∫_0^∞ [Γ(a1+a2)/Γ(a0)] (b0(λ1+λ2))^{a0−a1−a2} × [b0^{a1}/Γ(a1)] λ1^{a1+1} e^{−b0 λ1} × [b0^{a2}/Γ(a2)] λ2^{a2−1} e^{−b0 λ2} dλ1 dλ2
 = [Γ(a0+2) Γ(a1+a2) Γ(a1+2)] / [Γ(a0) Γ(a1+2+a2) Γ(a1) b0^2] ∫_0^∞ ∫_0^∞ [Γ(a1+2+a2)/Γ(a0+2)] (b0(λ1+λ2))^{(a0+2)−(a1+2)−a2} × [b0^{a1+2}/Γ(a1+2)] λ1^{a1+1} e^{−b0 λ1} × [b0^{a2}/Γ(a2)] λ2^{a2−1} e^{−b0 λ2} dλ1 dλ2
 = a0 (a0+1) a1 (a1+1) / (b0^2 (a1+a2)(a1+a2+1)).

Thus

 V(λ1) = [a0 a1 / (b0^2(a1+a2))] × {(a1+1)(a0+1)/(a1+a2+1) − a0 a1/(a1+a2)},
 V(λ2) = [a0 a2 / (b0^2(a1+a2))] × {(a2+1)(a0+1)/(a1+a2+1) − a0 a2/(a1+a2)}.

Next we provide the steps to generate samples from the BG(b0, a0, a1, a2) distribution. We need the following lemma; the proof is quite straightforward, hence the details are omitted.

Lemma 1: If (λ1, λ2) ∼ BG(b0, a0, a1, a2), then

 U=λ1+λ2∼GA(a0,b0)   and   V=λ1λ1+λ2∼Beta(a1,a2).

Moreover, U and V are independent.

Using the same algorithm as suggested in Kundu and Pradhan [23], the following steps can be used to generate samples from a Beta-Gamma distribution.

• Step-1: Generate U from GA(a0, b0).

• Step-2: Generate V from Beta(a1, a2).

• Step-3: Obtain λ1 = UV and λ2 = U(1−V).
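Steps 1–3 above can be sketched directly with standard library generators. A minimal Python sketch (the function name is ours and the hyper-parameter values are arbitrary) that also checks the sample mean against E(λ1) from Result 1:

```python
import random

def sample_beta_gamma(b0, a0, a1, a2, size, seed=1):
    """Draw (lambda1, lambda2) pairs from BG(b0, a0, a1, a2) via Lemma 1:
    U = lambda1 + lambda2 ~ GA(a0, b0) and V = lambda1/(lambda1 + lambda2)
    ~ Beta(a1, a2) are independent, so lambda1 = U*V and lambda2 = U*(1-V)."""
    rng = random.Random(seed)
    out = []
    for _ in range(size):
        u = rng.gammavariate(a0, 1.0 / b0)  # GA(a0, b0): shape a0, rate b0
        v = rng.betavariate(a1, a2)
        out.append((u * v, u * (1.0 - v)))
    return out

b0, a0, a1, a2 = 2.0, 3.0, 1.5, 2.5
draws = sample_beta_gamma(b0, a0, a1, a2, size=20000)
mean1 = sum(l1 for l1, _ in draws) / len(draws)
expected1 = a0 * a1 / (b0 * (a1 + a2))  # E(lambda1) from Result 1
```

Note that `gammavariate` takes a scale parameter, so the rate b0 enters as 1/b0.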

## 3 Estimators of λ1 and λ2 and their Distributions

### 3.1 Estimators of λ1 and λ2

The likelihood contribution of the data point (z, δ = j), for j = 1, 2, is given by

 L(λ1,λ2|(z,δ=j))=λje−λ1ze−λ2z=λje−(λ1+λ2)z.

Thus, the likelihood function of the observation is given by

 L(λ1, λ2 | Data) = { [n!/(D1! D2! (n−R)!)] λ1^{D1} λ2^{D2} e^{−W(λ1+λ2)}   if Z_{R:n} > T (Case-I),
                      [n!/(D1! D2! (n−J)!)] λ1^{D1} λ2^{D2} e^{−W(λ1+λ2)}   if Z_{R:n} ≤ T (Case-II);

hence, for

 W = { Σ_{i=1}^{R} Z_{i:n} + Z_{R:n}(n−R)   for Case-I,
       Σ_{i=1}^{J} Z_{i:n} + T(n−J)         for Case-II,

the log-likelihood function, without the additive constant, is

 l(λ1, λ2) = D1 ln λ1 + D2 ln λ2 − W(λ1 + λ2). (4)

Clearly, the MLEs of λ1 and λ2 are given by

 λ̂1,MLE = D1/W  if D1 > 0    and    λ̂2,MLE = D2/W  if D2 > 0.

Note that when D1 = 0, the MLE of λ1 does not exist, and similarly, when D2 = 0, the MLE of λ2 does not exist. We define estimators of λ1 and λ2 which will be useful for constructing confidence intervals even when the MLEs do not exist. The proposed estimators are as follows:

 λ̂1 = { λ̂1,MLE if D1 > 0;  0 if D1 = 0 }    and    λ̂2 = { λ̂2,MLE if D2 > 0;  0 if D2 = 0 }.

Therefore, although the MLEs of λ1 and λ2 may not always exist, λ̂1 and λ̂2 always exist; see, for example, the definition of an estimator in Definition 7.1.1 of Casella and Berger [4]. Now we obtain the exact distributions of λ̂1 and λ̂2, and based on these exact distributions, exact confidence intervals can be constructed. We will also show that even when D1 = 0 or D2 = 0, it is possible to obtain an exact confidence set of (λ1, λ2).
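The estimators above depend on the data only through D1, D2, and W. A small Python sketch (the function name and the toy numbers are ours) computing W for the two cases and then the proposed estimators:

```python
def hybrid_estimates(data, n, R, T):
    """Proposed estimators: lambda1-hat = D1/W if D1 > 0 (else 0), and
    lambda2-hat = D2/W if D2 > 0 (else 0).  `data` is the ordered list of
    observed (z, cause) pairs collected before the experiment stops."""
    J = len(data)
    d1 = sum(1 for _, c in data if c == 1)
    d2 = J - d1
    zs = [z for z, _ in data]
    if zs[R - 1] > T:                      # Case-I: experiment stopped at Z_{R:n}
        W = sum(zs) + zs[R - 1] * (n - R)
    else:                                  # Case-II: experiment stopped at T
        W = sum(zs) + T * (n - J)
    lam1_hat = d1 / W if d1 > 0 else 0.0
    lam2_hat = d2 / W if d2 > 0 else 0.0
    return lam1_hat, lam2_hat, W

# toy Case-II sample: 4 failures out of n = 6 observed before T = 0.5
toy = [(0.10, 1), (0.20, 2), (0.35, 1), (0.45, 1)]
l1, l2, W = hybrid_estimates(toy, n=6, R=3, T=0.5)
```

For the toy sample, W = 1.10 + 0.5 × 2 = 2.10, so λ̂1 = 3/2.10 and λ̂2 = 1/2.10.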

### 3.2 Distributions of ˆλ1 and ˆλ2

In this section we provide the exact distributions of λ̂1 and λ̂2. For x > 0, the distribution function of λ̂1 is given by

 P(λ̂1 ≤ x) = P(λ̂1 ≤ x | D1 = 0) P(D1 = 0) + P(0 < λ̂1 ≤ x | D1 > 0) P(D1 > 0)
 = P(D1 = 0) + P(0 < λ̂1 ≤ x | D1 > 0) P(D1 > 0)
 = Σ_{i=0}^{n} c_i + Σ_{i=1}^{R} Σ_{s=0}^{R−1} c_{i,s}(x) + Σ_{j=R}^{n} Σ_{i=1}^{j} Σ_{s=0}^{j} d_{j,i,s}(x), (5)

where
(a)

(b) for i = 1, …, R and s = 0, …, R−1,

 c_{i,s}(x) = n C(n−1, R−1) C(R−1, s) C(R, i) (λ1/λ)^i (λ2/λ)^{R−i} [(−1)^s/(n−R+s+1)] e^{−Tλ(n−R+1+s)} F̄_G(1/x; T(n−R+s+1)/i, R, iλ),

(c) for j = R, …, n, i = 1, …, j, and s = 0, …, j,

 d_{j,i,s}(x) = C(n, j) C(j, i) C(j, s) (λ1/λ)^i (λ2/λ)^{j−i} (−1)^s e^{−Tλ(n−j+s)} F̄_G(1/x; T(n−j+s)/i, j, iλ). (6)

Proof: See the Appendix.

It is clear that the distribution of λ̂1 is a mixture of a degenerate and an absolutely continuous distribution. The PDF of the absolutely continuous part of the distribution of λ̂1 can be written as

 f_{λ̂1 | D1 > 0}(x) = [1/P(D1 > 0)] [ Σ_{i=1}^{R} Σ_{s=0}^{R−1} (d/dx) c_{i,s}(x) + Σ_{j=R}^{n} Σ_{i=1}^{j} Σ_{s=0}^{j} (d/dx) d_{j,i,s}(x) ],

where,
(i) for i = 1, …, R and s = 0, …, R−1,

 (d/dx) c_{i,s}(x) = n C(n−1, R−1) C(R−1, s) C(R, i) (λ1/λ)^i (λ2/λ)^{R−i} [(−1)^s/(n−R+s+1)] e^{−Tλ(n−R+1+s)} (1/x^2) f_G(1/x; T(n−R+s+1)/i, R, iλ),

(ii) for j = R, …, n, i = 1, …, j, and s = 0, …, j,

 (d/dx) d_{j,i,s}(x) = C(n, j) C(j, i) C(j, s) (λ1/λ)^i (λ2/λ)^{j−i} (−1)^s e^{−Tλ(n−j+s)} (1/x^2) f_G(1/x; T(n−j+s)/i, j, iλ). (7)

Similarly, the distribution function of λ̂2 and the PDF of the absolutely continuous part of the distribution of λ̂2 can be obtained by interchanging λ1 and λ2 in (5)–(7).

Comment: Note that the MLEs of λ1 and λ2 exist if D1 > 0 and D2 > 0, respectively. The MLE of λ1, say λ̂1,MLE, given D1 > 0, is D1/W; similarly, the MLE of λ2, given D2 > 0, is D2/W. Therefore, the distributions of λ̂1,MLE given D1 > 0 and of λ̂2,MLE given D2 > 0 can be easily obtained from (5).

## 4 Confidence Intervals

In this section we present different methods for constructing a 100(1−α)% confidence interval of λ1. Similar methods can be applied to construct a confidence interval of λ2 also, and hence they are not presented here.

### 4.1 Confidence Intervals Based on Exact Distributions

First we consider the case when D1 > 0 and D2 > 0. The case when D1 = 0 or D2 = 0 will be discussed later. The construction of the confidence interval of λ1 based on the exact distribution of λ̂1 rests on an assumption similar to those of Chen and Bhattacharyya [6] or Kundu and Basu [21]. First let us assume that λ2 is known. Suppose P_{λ1}(λ̂1 ≥ x) is a strictly increasing function of λ1 for all x, and let b(λ1) be a function such that P_{λ1}[λ̂1 ≥ b(λ1)] = α/2. Therefore, for any λ1′,

 α/2 = P_{λ1′}[λ̂1 ≥ b(λ1′)] = P_{λ1}[λ̂1 ≥ b(λ1)], (8)

which implies that b(·) is a strictly increasing function, since P_{λ1}(λ̂1 ≥ x) is strictly increasing in λ1. Hence b^{−1}(·) exists, and it is also an increasing function. Now from (8), we have

 1 − α/2 = P_{λ1}[λ̂1 ≤ b(λ1)] = P_{λ1}[b^{−1}(λ̂1) ≤ λ1]. (9)

Clearly, (9) indicates that b^{−1}(λ̂1) is the symmetric lower bound of the 100(1−α)% confidence interval of λ1. Therefore, if λ̂1,obs denotes the observed value of λ̂1, then we need to find λ1L such that

 α/2 = P_{λ1L}(λ̂1 ≥ λ̂1,obs). (10)

Note that (10) is equivalent to finding λ1L such that

 1 − α/2 = P_{λ1L}(λ̂1 ≤ λ̂1,obs). (11)

Similarly, we can obtain λ1U, the symmetric upper bound of the 100(1−α)% confidence interval of λ1, by solving the following equation:

 α/2 = P_{λ1U}(λ̂1 ≤ λ̂1,obs). (12)

Since it is not possible to obtain closed form expressions for the solutions of (11) and (12), we need to use an iterative method to compute λ1L and λ1U, respectively. In practice, since λ2 is also unknown, we replace it by its MLE.

The construction of the confidence interval of λ1 is based on the assumption that P_{λ1}(λ̂1 ≥ x) is a strictly increasing function of λ1 for all x. Unfortunately, due to the complicated nature of the distribution of λ̂1, we could not establish this property theoretically. It may be mentioned that many authors, including Chen and Bhattacharyya [6], Gupta and Kundu [15], and Childs et al. [7], used this property to find the confidence interval of the scale parameter of an exponential distribution. Although it is difficult to check the assumption theoretically, a numerical study supports the monotonicity assumption. We present the graphs of P_{λ1}(λ̂1 ≥ x) in Figure 1 and Figure 2, and they support our claim. Moreover, heuristically it may be argued that since λ1 is a scale parameter, λ̂1 is stochastically increasing in λ1, which justifies the assumption. Based on the assumption that P_{λ1}(λ̂1 ≥ x) is strictly increasing in λ1 for all x, we have the following result.

Lemma 5: For D1 > 0 and D2 > 0, the solutions of (11) and (12) always exist.

Proof: See the Appendix.
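The iterative solution of (11) and (12) can be sketched as a simple bisection that exploits the assumed monotonicity. In the Python sketch below, `cdf_given_lambda` is a hypothetical callable standing in for the exact CDF (5) evaluated at the observed estimate; the toy CDF used in the demonstration is only for illustration:

```python
import math

def solve_confidence_bound(cdf_given_lambda, x_obs, target, lo=1e-8, hi=100.0, tol=1e-10):
    """Bisection for equations (11)/(12): find lambda1 such that
    P_{lambda1}(hat-lambda1 <= x_obs) = target, where target is 1 - alpha/2
    for the lower bound (11) and alpha/2 for the upper bound (12).
    Assumes cdf_given_lambda(lam, x) is strictly decreasing in lam,
    which is the monotonicity assumption discussed above."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf_given_lambda(mid, x_obs) > target:
            lo = mid   # CDF still too large: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# toy CDF, decreasing in lambda, for illustration only
root = solve_confidence_bound(lambda lam, x: math.exp(-lam * x), x_obs=2.0, target=0.5)
```

Any bracketing root-finder would do; bisection is shown because it needs only the monotonicity assumed in this section.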

Now let us consider the case when either D1 = 0 or D2 = 0. Note that when D1 = 0, λ̂1 = 0, and vice versa. Now when D1 = 0, a 100(1−α)% confidence set of (λ1, λ2) can be obtained as follows:

 A={(λ1,λ2):Pλ1,λ2(D1=0)>1−α}.

Similarly, when D2 = 0, a 100(1−α)% confidence set of (λ1, λ2) can be obtained as follows:

 B={(λ1,λ2):Pλ1,λ2(D2=0)>1−α}.

### 4.2 Asymptotic and Bootstrap Confidence Intervals

Since the construction of the confidence intervals of λ1 and λ2 based on the exact distributions of the estimators is computationally quite involved, we propose two alternative confidence intervals which can be obtained more conveniently. Based on the asymptotic normality of the MLEs, the 100(1−α)% asymptotic confidence intervals of λ1 and λ2 can be obtained as

 (λ̂1 − z_{α/2} √D1/W,  λ̂1 + z_{α/2} √D1/W)   and   (λ̂2 − z_{α/2} √D2/W,  λ̂2 + z_{α/2} √D2/W), (13)

respectively. Note that the asymptotic interval of λ1 does not exist when D1 = 0; a similar problem holds for λ2 also.
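The intervals in (13) are straightforward to compute. A minimal Python sketch (the function name is ours) for λ1, using the standard normal quantile from the standard library:

```python
import math
import statistics

def asymptotic_ci(d1, W, alpha=0.05):
    """Asymptotic 100(1 - alpha)% CI for lambda1 from (13):
    lambda1-hat -/+ z_{alpha/2} * sqrt(D1) / W.  Undefined when D1 = 0."""
    if d1 == 0:
        raise ValueError("asymptotic interval does not exist when D1 = 0")
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    lam1_hat = d1 / W
    half = z * math.sqrt(d1) / W
    return lam1_hat - half, lam1_hat + half

lo, hi = asymptotic_ci(d1=8, W=10.0)
```

The lower limit can be negative for small D1; in practice it is often truncated at zero.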

We propose to use the bootstrap method for constructing confidence intervals of the unknown parameters. The steps for constructing the bootstrap confidence intervals are as follows.

• Step 1: Define λ̂1M = λ̂1 if D1 > 0 and λ̂1M = λ11 if D1 = 0; similarly, λ̂2M = λ̂2 if D2 > 0 and λ̂2M = λ22 if D2 = 0. Here λ11 and λ22 are obtained from the following two non-linear equations:

 P_{λ11, λ̂2}(D1 = 0) = 0.5   and   P_{λ̂1, λ22}(D2 = 0) = 0.5.
• Step 2:
Case-I:

• (a) Generate a sample of size R from the distribution

 f(x) = λ̂M e^{−λ̂M x} / (1 − e^{−λ̂M Z_{R:n}}),      0 < x < Z_{R:n},

where λ̂M = λ̂1M + λ̂2M. If the largest value of the sample is greater than T, perform Step (b); otherwise, repeat Step (a).

• (b) Assign Cause-1 or Cause-2 to each failure with probability λ̂1M/λ̂M and λ̂2M/λ̂M, respectively.

Case-II:

• (a) Generate a sample of size J from the distribution

 f(x) = λ̂M e^{−λ̂M x} / (1 − e^{−λ̂M T}),      0 < x < T.

• (b) Assign Cause-1 or Cause-2 to each failure with probability λ̂1M/λ̂M and λ̂2M/λ̂M, respectively.

• Step 3: Compute λ̂1* and λ̂2* from the bootstrap sample. Repeat the process B times.

• Step 4: Let F̂*(·) denote the empirical distribution function of λ̂1*. Define λ̂1*(γ) = F̂*^{−1}(γ). Then an approximate 100(1−α)% bootstrap confidence interval of λ1 is given by (λ̂1*(α/2), λ̂1*(1−α/2)). Similarly, we can obtain the bootstrap confidence interval of λ2 also.
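Step 4 is a plain percentile interval. A Python sketch (the function name is ours) of this final step, taking the B bootstrap estimates produced by Steps 1–3 as input:

```python
import math

def percentile_ci(boot_estimates, alpha=0.05):
    """Step 4: percentile bootstrap interval from the B resampled
    estimates of lambda1 produced by Steps 1-3."""
    b = sorted(boot_estimates)          # empirical distribution of lambda1*
    B = len(b)
    lo_idx = max(int(math.floor(alpha / 2.0 * B)), 0)
    hi_idx = min(int(math.ceil((1.0 - alpha / 2.0) * B)) - 1, B - 1)
    return b[lo_idx], b[hi_idx]
```

The same function applied to the bootstrap estimates of λ2 gives its interval.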

## 5 Bayesian analysis

It is assumed that (λ1, λ2) has the joint Beta-Gamma prior given in (2). Based on this joint prior, we provide the Bayes estimates and the associated credible sets of the unknown parameters. The joint posterior density of λ1 and λ2 is easily seen to be

 π(λ1, λ2 | data) ∝ e^{−(W+b0)(λ1+λ2)} λ1^{a1+D1−1} λ2^{a2+D2−1} (λ1+λ2)^{a0−a1−a2},   λ1 > 0, λ2 > 0. (14)

Hence,

 (λ1, λ2) | data ∼ BG(b0+W, a0+J, a1+D1, a2+D2). (15)

Therefore, under the squared error loss function the Bayes estimates of λ1 and λ2 are

 λ̂1B = E(λ1 | data) = (a0+J)(a1+D1) / ((b0+W)(a1+a2+J)),    λ̂2B = E(λ2 | data) = (a0+J)(a2+D2) / ((b0+W)(a1+a2+J)),

respectively. Similarly, we can obtain the corresponding posterior variances as follows:

 V(λ1 | data) = A1 B1,    V(λ2 | data) = A2 B2.

Here, for k = 1, 2,

 Ak = (a0+J)(ak+Dk) / ((b0+W)^2 (a1+a2+J))   and   Bk = (ak+Dk+1)(a0+J+1)/(a1+a2+J+1) − (a0+J)(ak+Dk)/(a1+a2+J),

which follow from Result 1 applied to the posterior distribution (15).
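Since the posterior (15) is again Beta-Gamma, the Bayes estimates are available in closed form. The sketch below (the function name and the toy numbers are ours) simply evaluates the posterior means, which follow from Result 1 applied to (15):

```python
def bayes_estimates(b0, a0, a1, a2, W, J, d1, d2):
    """Closed-form Bayes estimates of lambda1 and lambda2 under squared
    error loss, from the BG(b0 + W, a0 + J, a1 + D1, a2 + D2) posterior."""
    denom = (b0 + W) * (a1 + a2 + J)
    lam1_b = (a0 + J) * (a1 + d1) / denom
    lam2_b = (a0 + J) * (a2 + d2) / denom
    return lam1_b, lam2_b
```

For example, with b0 = 1, a0 = 2, a1 = a2 = 1, W = 4, J = 5, D1 = 3, and D2 = 2, the estimates are 28/35 and 21/35.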