Inverse Cognitive Radar – A Revealed Preferences Approach
Abstract
We consider an adversarial signal processing problem involving “us” versus an “enemy” radar equipped with a Bayesian tracker. By observing the emissions of the enemy’s radar, how can we detect if the radar is cognitive (i.e., a constrained utility maximizer)? Given knowledge of “our” state and the observed sequence of actions taken by the enemy’s radar, we consider three problems: (i) Are the enemy radar’s actions (waveform choice, beam scheduling) consistent with constrained utility maximization? If so, how can we estimate the cognitive radar’s utility function that is consistent with its actions? We formulate and solve the problem in terms of the spectra (eigenvalues) of the state and observation noise covariance matrices, and the algebraic Riccati equation. (ii) How can we construct a statistical test for detecting a cognitive radar (constrained utility maximization) when we observe the radar’s actions in noise or the radar observes our probe signal in noise? We propose a statistical detector with a tight Type I error bound. (iii) How can we optimally probe (interrogate) the enemy’s radar by choosing our state to minimize the Type II error of detecting if the radar is deploying an economically rational strategy, subject to a constraint on the Type I detection error? We present a stochastic optimization algorithm to optimize our probe signal. “Our” state can be viewed as a probe signal which causes the enemy’s radar to act; so choosing the optimal state sequence is an input design problem. The main analysis framework used in this paper is that of revealed preferences from microeconomics.
I Introduction
Cognitive radars [1] use the perception-action cycle of cognition to sense the environment, learn from it relevant information about the target and the background, then adapt the radar sensor to optimally satisfy the needs of their mission. A crucial element of a cognitive radar is optimal adaptivity: based on its tracked estimates, the radar adaptively optimizes the waveform, aperture, dwell time and revisit rate. In other words, a cognitive radar is a constrained utility maximizer.
This paper is motivated by the next logical step, namely, inverse cognitive radar. From the intercepted emissions of an enemy’s radar: (i) Is the enemy’s radar cognitive? That is, are the enemy radar’s actions consistent with optimizing a utility function (equivalently, is the radar’s behavior rational in an economics sense)? If so, how can we estimate the cognitive radar’s utility function that is consistent with its actions? (ii) How can we construct a statistical detection test for utility maximization when we observe the enemy radar’s actions in noise or the enemy radar observes our probe signal in noise? (iii) How can we optimally probe the enemy’s radar by choosing our state to minimize the Type II error of detecting if the enemy radar is deploying an economically rational strategy, subject to a constraint on the Type I detection error?
The central theme of this paper involves an adversarial signal processing/inverse reinforcement learning problem.
I-A Revealed Preferences and Afriat’s Theorem
Nonparametric detection of utility maximization behavior is the central theme in the area of revealed preferences in microeconomics, which dates back to Samuelson in 1938 [3].
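Definition 1 ([4, 5]).
A system is a utility maximizer if for every probe $\alpha_k \in \mathbb{R}^m_+$, the response $\beta_k \in \mathbb{R}^m$ satisfies
(1) $\beta_k \in \arg\max_{\{\beta \,:\, \alpha_k'\beta \le 1\}} U(\beta)$
where $U(\beta)$ is a monotone utility function.
In economics, $\alpha_k$ denotes the price vector and $\beta_k$ the consumption vector. Then $\alpha_k'\beta \le 1$ is a natural budget constraint.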
The key result in revealed preferences is the following remarkable theorem due to Afriat; see [6, 5, 4, 7, 8] for extensive expositions.
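Theorem 2 (Afriat’s Theorem [4]).
Given a data set
(2) $\mathcal{D} = \{(\alpha_k, \beta_k),\ k \in \{1, 2, \dots, N\}\}$
the following statements are equivalent:

1. The system is a utility maximizer and there exists a monotonically increasing, continuous, and concave utility function that satisfies (1).

2. For $u_t$ and $\lambda_t > 0$, the following set of inequalities (called Afriat’s inequalities) has a feasible solution:
(3) $u_s - u_t - \lambda_t \alpha_t'(\beta_s - \beta_t) \le 0 \quad \forall\, t, s \in \{1, 2, \dots, N\}$

3. The data set satisfies the Generalized Axiom of Revealed Preference (GARP), namely for any $k \le N$,
(5) $\alpha_t'\beta_t \ge \alpha_t'\beta_{t+1} \ \ \forall\, t \le k-1 \implies \alpha_k'\beta_k \le \alpha_k'\beta_1$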
Afriat’s theorem tests for economics-based rationality; its remarkable property is that it gives a necessary and sufficient condition for a system to be a utility maximizer based on the system’s input-output response.
The feasibility of the set of inequalities (3) can be checked using a linear programming solver; alternatively, GARP (5) can be checked using Warshall’s algorithm with $O(N^3)$ computations [10, 11].
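As an illustration of the GARP check, the following is a minimal sketch, assuming probe vectors `alphas[t]` and response vectors `betas[t]` with the linear budget of Definition 1. The function name `satisfies_garp`, the tolerance `tol`, and both toy data sets are hypothetical and not taken from the paper. The direct revealed-preference relation is closed transitively with Warshall's algorithm, which gives the cubic cost in the number of observations.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def satisfies_garp(alphas, betas, tol=1e-9):
    """Return True if the data set {(alpha_t, beta_t)} satisfies GARP."""
    n = len(alphas)
    # reach[t][s] is True when beta_s was affordable at probe alpha_t,
    # i.e. beta_t is directly revealed preferred to beta_s.
    reach = [[dot(alphas[t], betas[t]) >= dot(alphas[t], betas[s]) - tol
              for s in range(n)] for t in range(n)]
    # Warshall's algorithm: transitive closure of the relation, O(N^3).
    for k in range(n):
        for t in range(n):
            if reach[t][k]:
                for s in range(n):
                    if reach[k][s]:
                        reach[t][s] = True
    # Violation: beta_t is revealed preferred to beta_s, yet beta_t was
    # strictly cheaper than the chosen beta_s at probe alpha_s.
    for t in range(n):
        for s in range(n):
            if reach[t][s] and dot(alphas[s], betas[s]) > dot(alphas[s], betas[t]) + tol:
                return False
    return True


# Hypothetical toy data: responses that maximize log(b1) + log(b2)
# subject to alpha'beta <= 1, so beta_i = 1 / (2 * alpha_i).
rational_alphas = [(1.0, 2.0), (2.0, 1.0), (1.0, 1.0), (4.0, 0.5)]
rational_betas = [(1 / (2 * a1), 1 / (2 * a2)) for a1, a2 in rational_alphas]

# Hypothetical toy data in which each response was affordable, and
# strictly cheaper, at the other probe: no utility can rationalize it.
irrational_alphas = [(1.0, 2.0), (2.0, 1.0)]
irrational_betas = [(0.0, 0.5), (0.5, 0.0)]

print(satisfies_garp(rational_alphas, rational_betas))      # True
print(satisfies_garp(irrational_alphas, irrational_betas))  # False
```

The tolerance guards against floating-point ties on the budget line; with exact arithmetic it can be set to zero.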
A utility function consistent with the data can be constructed from any feasible solution $u_t$, $\lambda_t$ of Afriat’s inequalities (3) as
(4) $U(\beta) = \min_{t \in \{1, \dots, N\}} \{ u_t + \lambda_t \alpha_t'(\beta - \beta_t) \}$
The recovered utility using (4) is not unique; indeed any positive monotone increasing transformation of (4) also satisfies Afriat’s Theorem; that is, the utility function constructed is ordinal. This is the reason why the budget constraint $\alpha_k'\beta \le 1$ is without loss of generality; it can be scaled by an arbitrary positive constant and Theorem 2 still holds. In signal processing terminology, Afriat’s Theorem can be viewed as set-valued system identification of an argmax system; set-valued since (4) yields a set of utility functions that rationalize the finite dataset $\mathcal{D}$.
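The construction of a rationalizing utility can be sketched as follows. This is a hedged illustration, assuming the standard Afriat form in which the utility is the lower envelope of the affine functions `u[t] + lam[t] * alpha_t'(beta - beta_t)`. In practice the numbers `u[t]` and `lam[t]` would come from a linear programming solver; here, to stay self-contained, they are taken from a hypothetical toy radar that maximizes `log(b1) + log(b2)`, for which the multiplier equals 2 at every epoch. All function names are illustrative.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def afriat_inequalities_hold(alphas, betas, u, lam, tol=1e-9):
    """Check u_s - u_t - lam_t * alpha_t'(beta_s - beta_t) <= 0 for all t, s."""
    n = len(alphas)
    if any(l <= 0 for l in lam):
        return False
    for t in range(n):
        for s in range(n):
            diff = [bs - bt for bs, bt in zip(betas[s], betas[t])]
            if u[s] - u[t] - lam[t] * dot(alphas[t], diff) > tol:
                return False
    return True


def afriat_utility(beta, alphas, betas, u, lam):
    """Piecewise-linear concave utility built from a feasible (u, lam)."""
    return min(
        u[t] + lam[t] * dot(alphas[t], [b - bt for b, bt in zip(beta, betas[t])])
        for t in range(len(alphas))
    )


# Hypothetical toy data: beta_t maximizes log(b1) + log(b2) on alpha_t'beta <= 1.
alphas = [(1.0, 2.0), (2.0, 1.0), (1.0, 1.0), (4.0, 0.5)]
betas = [(1 / (2 * a1), 1 / (2 * a2)) for a1, a2 in alphas]
# For this toy utility the gradient at beta_t equals 2 * alpha_t, so
# u_t = U(beta_t) and lam_t = 2 satisfy the inequalities by concavity.
u = [math.log(b1) + math.log(b2) for b1, b2 in betas]
lam = [2.0] * len(alphas)

print(afriat_inequalities_hold(alphas, betas, u, lam))
print([round(afriat_utility(b, alphas, betas, u, lam) - ut, 12) for b, ut in zip(betas, u)])
```

The reconstructed utility agrees with `u[t]` at each observed response and is only one member of the set of utilities that rationalize the data.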
I-B Objectives
In this paper, our working assumption is that a cognitive radar satisfies economics-based rationality; that is, a cognitive radar is a constrained utility maximizer in the sense of (1) with possibly a nonlinear budget constraint. The main objectives of the paper involve answering:
1. Test for Utility Maximization – Spectral Revealed Preferences: The first question is: Does a radar satisfy economics-based rationality, i.e., is its action consistent with optimizing a utility function? By observing how the enemy’s radar switches ambiguity function and waveforms to track a target, or how the radar schedules its beam between targets, is there a utility function that rationalizes the radar’s behavior? Notice that a key requirement in Afriat’s theorem is a budget constraint. How can we formulate a useful budget constraint for a radar? A key idea in this paper is to formulate linear and nonlinear budget constraints for a radar in terms of the tracking error covariance where and are the spectra of the state and observation noise matrices (as will be justified in Section III) associated with a Kalman filter tracker. From a practical point of view, such spectral revealed preferences yield constructive estimates of the radar’s utility function, and so we can predict (in a Bayesian sense) its future actions.
2. Cognition Detection in Noise: If the radar’s response or probe signal is observed in noise, then violation of Afriat’s theorem could be due either to measurement noise or to the absence of utility maximization. We will construct a statistical detection test to decide if the radar is a utility maximizer. The hypothesis test yields a tight bound for the Type I errors.
3. Optimal Probing. Given the detector in the above objective, what choice of probe signal yields the smallest Type II error in detecting if the radar is a utility maximizer, subject to maintaining the Type I error within a specified bound? We construct a stochastic gradient algorithm that estimates our optimal probe sequence.
I-C Context and Literature
The above objectives are fundamentally different to the model-centric theme used in the signal processing literature where one postulates an objective function (typically convex) and then proposes optimization algorithms. In contrast, the revealed preference approach is data-centric: given a dataset, we wish to determine if it is consistent with utility maximization. Specifically, Sections III and IV below discuss how revealed preferences can be used as a systematic method to detect utility maximization in cognitive radars.
Regarding the literature, in the context of revealed preferences we already mentioned [7, 12, 8, 6, 4]. An extension to nonlinear budgets was developed in [13], which we will exploit in our spectral revealed preferences setup in Section III. A stochastic detector for utility maximization given noisy measurements of the probe or response is studied in [14, 15] and we will use these results in Section V. Our earlier work [16, 17] considers utility estimation in adversarial signal processing and social network applications. As mentioned above, revealed preferences are more general than inverse reinforcement learning [2].
Cognitive radars [18] use stochastic control and resource allocation to adapt their waveform, aperture, and service requests. In the last decade there have been several works in adaptive/cognitive radar and radar resource management; see [19, 20, 21] and references therein. Our main aim in this paper is to detect cognitive radars. Below we will use revealed preferences to detect radars that optimize their waveforms; [22] contains detailed analysis of waveform adaptation based on the seminal book of [23].
Finally, this paper builds on our recent work [24, 25] in Bayesian adversarial signal processing where the aim is to reconstruct the posterior distribution of the enemy’s tracker given its actions. While [24, 25] deal with inverse Bayesian estimation problems, the focus here is on detecting constrained utility maximization.
II Cognitive Radar Response Model
The setup involves two time scales. Let denote discrete time (fast time scale) and denote epoch (slow time scale). Our probe signal is , the radar’s response action is and our measurement of this action is .
The model of “us” interacting with the cognitive radar has the following dynamics, see Figure 2:
(6) 
Let us explain the notation in (6): denotes a generic conditional probability density function (or probability mass function), denotes distributed according to, and

is our Markovian state with transition kernel and prior where denotes the state space. So our dynamics are determined by the control probe signal which evolves on the slow time scale. Our probing of the enemy radar is performed via purposeful maneuvers.

Based on optimizing a utility function (which is unknown to us) of the predicted target statistic (e.g. covariance of the target’s estimate) in epoch , the enemy radar chooses an action . It is here that actual tracker structure determines the response.

is the radar’s noisy observation of our state , with observation likelihoods . Here denotes the observation space. So the observation at the radar depends on its action , which evolves on the slow time scale.

is the radar tracker’s belief (posterior) of our state where denotes the sequence . The tracking functionality in (6) is the classical Bayesian optimal filtering update formula [21]
(7)
Let denote the space of all such beliefs. When the state space is Euclidean space, then is a function space comprising the space of density functions; if is finite, then is the unit dimensional simplex of dimensional probability mass functions.

denotes our noisy measurement of the radar’s action
III Waveform Adaptation: Spectral Revealed Preferences to Test for Cognitive Radar
Waveform adaptation is perhaps one of the most important functionalities of a cognitive radar. A cognitive radar adapts its waveform by adapting its ambiguity function. Our aim is to detect such cognitive behavior of the enemy’s radar when it deploys a Bayesian filter as a tracker. Below we identify economic rationality of a radar controller that interacts with a physical level tracker. For concreteness, in this section we assume that the enemy’s cognitive radar uses a Kalman filter tracker. Also, since the probe and response signals evolve on a slow time scale (described below), we assume that both the radar and we (the observer) have perfect measurements of probe and response .
Our working assumption is that a cognitive radar satisfies economics-based rationality; that is, it adapts its waveform by maximizing a utility function in the sense of (1) with a possibly nonlinear budget constraint. A key requirement in Afriat’s Theorem 2 is the budget constraint. In economics, such a constraint is obvious since it specifies the total available resources of the decision maker. How can we formulate useful budget constraints for waveform adaptation? Our key idea here is to formulate linear and nonlinear budget constraints in terms of the Kalman filter error covariance where and are the spectra (eigenvalues) of the state and observation noise covariance matrices of the state space model.
III-A Waveform Adaptation by Cognitive Radar
Suppose a radar adapts its waveform while tracking a target (us) using a Kalman filter. Our probe input comprises purposeful maneuvers that modulate the spectrum (vector of eigenvalues) of the state noise covariance matrix. The radar responds with an optimized waveform which modulates the spectrum of the observation noise covariance matrix. By observing the radar’s signals, how can we test the radar for economic rationality?
Linear Gaussian Target Model and Radar Tracker
Linear Gaussian dynamics for a target’s kinematics [26] and linear Gaussian measurements at the radar are widely assumed as a useful approximation [27]. Accordingly, consider the following special case of model (6) with linear Gaussian dynamics and measurements:
(8) 
Here is “our” state with initial density , denotes the cognitive radar’s observations, , and , are mutually independent i.i.d. processes. When denotes respectively, the x,y,z position and velocity components of the target (so ) then
(9) 
where is the sampling interval. Recall indexes the fast time scale while indexes the slow time scale.
In (8) we explicitly indicate the dependence of the state noise covariance on our probe signal and the observation noise covariance on the radar’s response signal . These are justified as follows. When the radar controls its ambiguity function, in effect it controls the measurement noise covariance . Of course, this comes at a cost: reducing the observation noise covariance of a target results in increased visibility of the radar (and therefore higher threat) or increased covariance of other targets.
Our probing of the enemy radar is performed via purposeful maneuvers by modulating our state covariance matrix in (8) by . For example, in a classical linear Gaussian state space model used in target tracking [27], our probe parametrizes the state noise covariance which models acceleration maneuvers of our drone.
Based on observation sequence , the tracking functionality in the radar computes the posterior
where is the conditional mean state estimate and is the covariance. These are computed by the classical Kalman filter:
(10) 
Under the assumption that the model parameters in (8) satisfy is detectable and is stabilizable, the asymptotic predicted covariance as is the unique nonnegative definite solution of the algebraic Riccati equation (ARE):
(11) 
where and are the probe and response signals of the radar at epoch . Note is a symmetric matrix. Since is parametrized by , we write the solution of the ARE at epoch as .
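To make the dependence of the ARE solution on the two noise spectra concrete, here is a minimal sketch under a strong simplifying assumption that is not in the paper: the state matrix is a scalar multiple `a` of the identity, the observation matrix is the identity, and both noise covariances are diagonal. The matrix ARE then decouples into one scalar Riccati equation per mode, so each eigenvalue of the predicted covariance depends on one state noise eigenvalue `q` and one observation noise eigenvalue `r`. All function names are hypothetical; a general model would need a matrix ARE solver instead.

```python
import math


def riccati_fixed_point(a, q, r, iters=500):
    """Iterate the scalar Riccati recursion until it settles.

    sigma <- a^2 * (sigma - sigma^2 / (sigma + r)) + q
           = a^2 * sigma * r / (sigma + r) + q
    """
    sigma = q
    for _ in range(iters):
        sigma = a * a * sigma * r / (sigma + r) + q
    return sigma


def riccati_closed_form(a, q, r):
    """Positive root of sigma^2 + sigma * (r * (1 - a^2) - q) - q * r = 0."""
    b = r * (1.0 - a * a) - q
    return 0.5 * (-b + math.sqrt(b * b + 4.0 * q * r))


def predicted_covariance_spectrum(a, q_eigs, r_eigs):
    """Eigenvalues of the asymptotic predicted covariance, mode by mode."""
    return [riccati_closed_form(a, q, r) for q, r in zip(q_eigs, r_eigs)]


# Hypothetical spectra: state noise (set by our maneuvers) and
# observation noise (set by the radar's waveform choice).
q_eigs = [0.1, 0.2]
r_eigs = [1.0, 0.5]
spectrum = predicted_covariance_spectrum(1.0, q_eigs, r_eigs)
iterated = [riccati_fixed_point(1.0, q, r) for q, r in zip(q_eigs, r_eigs)]
print(spectrum)
print(iterated)
```

In this decoupled setting each covariance eigenvalue is increasing in both `q` and `r`, which matches the intuition that more maneuvering or a noisier waveform degrades the tracker.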
III-B Effect of Waveform Design on Observation Noise Covariance
To give a precise structure to the radar dynamics, this section summarizes how the observation noise covariance in (8) depends on the radar waveform. The details involve maximum likelihood estimation involving the radar ambiguity function and can be found in [22]. Below:

denotes the speed of light (in free space),

denotes the carrier frequency,

is an adjustable parameter in the waveform,

is the signal to noise ratio at the radar.

is the unit imaginary number.

is the complex envelope of the waveform.
We now describe three waveforms and their resulting observation noise covariance matrices ; see [22] for details.
(i) Triangular Pulse - Continuous Wave
(12) 
(ii) Gaussian Pulse - Continuous Wave
(13) 
(iii) Gaussian Pulse - Linear Frequency Modulation Chirp
(14) 
To summarize, by adapting its waveform parametrized by (vector of eigenvalues), the radar can change the noise covariance . Below we will use the response to construct revealed preference tests for cognition.
III-C Testing for Cognitive Radar: Spectral Revealed Preferences with Linear Budget
We now show that Afriat’s theorem (Theorem 2) can be used to determine if a radar is cognitive. The assumption here is that the utility function maximized by the radar is a monotone function (unknown to us) of the predicted covariance of the target. Our main task is to formulate and justify a linear budget constraint in Afriat’s theorem.
Specifically, suppose

Our probe that characterizes our maneuvers, is the vector of eigenvalues of the positive definite matrix

The radar response is the vector of eigenvalues of the positive definite matrix .
Then the cognitive radar chooses its waveform parameter at each slow time epoch to maximize a utility :
(15) 
where is a monotone increasing function of .
Then Afriat’s theorem (Theorem 2) can be used to detect utility maximization and construct a utility function that rationalizes the response of the radar. Recall that the 1 on the right hand side of the budget can be replaced by any positive constant.
It only remains to justify the linear budget constraint in (15). The th component of , denoted as , is the incentive for considering the th mode of the target; is proportional to the signal power. The th component of is the amount of resources (energy) devoted by the radar to this th mode; a higher (more resources) results in a smaller measurement noise covariance, resulting in higher accuracy of measurement by the radar. So measures the signal to noise ratio (SNR) and the budget constraint is a bound on the SNR. A rational radar maximizes a utility that is monotone increasing in the accuracy (inverse of noise power) . However, the radar has limited resources and can only expend sufficient resources to ensure that the precision (inverse covariance) of all modes is at most some prespecified precision at each epoch . We can then justify the linear budget constraint as follows:
Lemma 3.
The linear budget constraint implies that the solution of the ARE (11) satisfies for some symmetric positive definite matrix .
III-D Testing for Cognitive Radar: Spectral Revealed Preferences with Nonlinear Budget Constraint
We now formulate and justify a nonlinear budget constraint (nonlinear in ) which emerges naturally from the ARE (11), and use this together with an extension of Afriat’s theorem to test if a radar satisfies economic rationality. The interpretations of the probe and response are different (in some sense “opposite”) to those of the linear case:

The probe vector is the vector of eigenvalues of .

The radar response is the vector of eigenvalues of .

Define as the largest eigenvalue of where is the solution of the ARE (11).
With the above definitions, our aim is to test if the radar’s response satisfies economics-based rationality:
(16) 
Since there is no natural ordering of eigenvalues, our assumption is that
is a symmetric function
Economics-based Rationale for Utility and Nonlinear Budget Constraint (16)
The economics-based rationale for the utility (16) is as follows: The th component of , denoted as , is the price the radar pays for devoting resources to the th mode of the target. Since is inversely proportional to the signal power, a higher implies a more expensive mode to track, implying that the enemy radar needs to allocate more resources to the th mode. The radar’s response for the th mode is ; this reflects the cost incurred by the radar for estimating mode . A rational radar aims to minimize its total effort where cost decreases with since choosing a waveform that results in a larger observation noise variance requires less effort. Equivalently, the radar seeks to maximize a utility function where is increasing with .
We now discuss the nonlinear budget constraint in (16) together with . The radar seeks to minimize total effort subject to maintaining the inaccuracy of all modes (covariance ) to be smaller than some prespecified covariance . Clearly, a sufficient condition is that . But for revealed preferences involving nonlinear budgets, we need the following (see Theorem 5 below): The constraint in (16) needs to be active at . This is straightforwardly ensured by choosing as
(17) 
That is, is the largest eigenvalue of the unique solution of the ARE . The constraint (17) says that the enemy’s Bayesian tracker cannot perform worse in covariance than that of the worst case observation noise covariance , i.e., (positive definite ordering).
Remark. In the special case when the constraint is omitted, then is the solution of the algebraic Lyapunov equation
(18) 
The constraint (17) then says that the enemy’s Bayesian tracker cannot perform worse than the optimal predictor (which has infinite observation noise). Of course, when is specified as in (9), since all the eigenvalues of are 1, the solution of the algebraic Lyapunov equation is not finite.
We can now justify the nonlinear budget for a cognitive radar equipped with a Kalman filter tracker as follows:
Revealed Preference for Nonlinear Budget
Having formally justified the nonlinear budget constraint in (16), we now state the main revealed preference test [13] which generalizes Afriat’s theorem to nonlinear budgets. The result below provides an explicit test for a cognitive radar and constructs a set of utility functions that rationalizes the decisions of the cognitive radar.
Theorem 5 (Test for rationality with nonlinear budget [13]).
Let with an increasing, continuous function and for . Then the following conditions are equivalent:

There exists a monotone continuous utility function that rationalizes the data set . That is

The data set satisfies GARP:

For and the following set of inequalities has a feasible solution:
(19) 
With and defined in (19), an explicit monotone continuous utility function that rationalizes the data set is given by:
(20)
Remarks: (i) Clearly Afriat’s theorem (Theorem 2) is a special case of Theorem 5 where . But unlike Afriat’s theorem, the constructed utility function is not necessarily concave.
(ii) Just like Afriat’s theorem, (19) comprises linear inequalities in . So feasibility can be checked using an LP solver.
We now show that the nonlinear radar budget constraint in (16), (17) satisfies the properties of Theorem 5 with
(21) 
First, clearly is increasing in and is a continuous function of , and so is . Second, Theorem 5 requires the constraint to be active at . This follows since is increasing in and due to (17).
Summary: By choosing the probe signal as the spectrum of and the response signal as the spectrum of , we can use the nonlinear budget Theorem 5 to test a cognitive radar for utility maximization. We can then construct explicit utility functions (20) that rationalize the decisions of the radar in terms of waveform adaptation.
IV Beam Allocation: Revealed Preference Test
This section constructs a test for cognitivity of a radar that switches its beam adaptively between targets. We work at a higher level of abstraction than the previous section and consider multiple targets. Suppose a radar adaptively switches its beam between targets where these targets are controlled by us. As in (8), on the fast time scale indexed by , each target has linear Gaussian dynamics and the enemy radar obtains linear Gaussian measurements:
(22) 
Here , . We assume that both and are known to us and the enemy.
As in previous sections, indexes the slow time scale and indexes the fast time scale.
The enemy’s radar tracks our targets using Kalman filter trackers.
The fraction of time the radar allocates to each target in epoch is .
The price of each target at the beginning of epoch is its predicted precision, namely the trace
of the inverse predicted covariance at epoch using the Kalman predictor
Obviously, depends on the maneuver covariance of target . Unlike the previous section where the spectrum of the probe matrix was chosen as the probe vector, here we abstract the target’s covariance by the trace . Note also that the observation noise covariance depends on the enemy’s radar response , i.e., the fraction of time allocated to target .
We assume that each target is equipped with a radar detector and can estimate
Given the time series , , our aim is to detect if the enemy’s radar is cognitive. We assume that a cognitive radar optimizes its beam allocation as follows:
(23) 
where is the enemy radar’s utility function (unknown to us) and is a prespecified average precision of all targets.
The economics-based rationale for the budget constraint is natural: For targets that are cheaper (lower precision ), the radar has incentive to devote more time . However, given its resource constraints, the radar can achieve at most an average precision of over all targets.
Note that the setup (23) is directly amenable to Afriat’s Theorem 2. Thus (3) can be used to test if the radar satisfies utility maximization in its beam scheduling (23) and also estimate the set of utility functions (4). Furthermore as in Afriat’s theorem since the utility is ordinal, can be chosen as 1 without loss of generality (and therefore does not need to be known by us).
V Detecting Cognitive Radars in a Noisy Setting
Afriat’s theorem (Theorem 2) and its generalization to nonlinear budgets (Theorem 5) assume perfect observation of the probe and response. However, when the response (e.g. enemy’s radar waveform) is measured in noise by us, or the probe signal (e.g. our maneuver) is measured in noise by the enemy, violation of the inequalities in Afriat’s theorem could be due either to measurement noise or to the absence of utility maximization (economic rationality). Below we give two statistical detection tests for utility maximization and characterize the Type I and Type II errors.
V-A Detecting Cognitive Radar given Noisy Response
Suppose we observe the response of the enemy’s radar in additive noise as
(24) 
Here are dimensional random variables that are possibly correlated but functionally independent of . As an example, consider the setup of Section IV where a cognitive radar allocates its beam between multiple targets. Each target equipped with a radar detector obtains a noisy estimate of the fraction of time the enemy radar devotes to it.
Given the noisy data set
(25) 
from the enemy radar, how can we detect if the radar is cognitive? Let $H_0$ denote the null hypothesis that the data set in (25) satisfies utility maximization. Let $H_1$ denote the alternative hypothesis that the data set does not satisfy utility maximization. There are two possible sources of error:
Type I errors: Reject $H_0$ when $H_0$ is valid.
Type II errors: Accept $H_0$ when $H_0$ is invalid. (26)
Given , we propose the following statistical test to determine if the enemy radar is a utility maximizer (1):
(27) 
In the statistical test (27):
(i) is the “significance level” of the test.
(ii) The “test statistic” , with is the solution of the following constrained optimization problem:
(28) 
(iii) is the pdf of the random variable where
(29) 
The intuition behind (27), (28) is clear: if , then (28) is equivalent to Afriat’s theorem. Due to the presence of noise, it is unlikely that is feasible; so we seek the minimum perturbation that satisfies (28).
The constrained optimization problem (28) is nonconvex due to the bilinear constraints . However, since the objective function depends only on the scalar , a one dimensional line search algorithm can be used. In particular, for any fixed value of , (28) becomes a set of linear inequalities, and so feasibility is straightforwardly determined.
The numerical implementation of detector (27) for a given probe sequence is described in Algorithm 1.

Offline Step. For iterations :

Simulate noise sequence .

Compute using (29).
Compute the empirical distribution of from these samples.


Record the response from the enemy radar.
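The offline step of Algorithm 1 and the final threshold decision can be sketched as below. This is a hedged illustration, not the paper's implementation: the text of (29) is not reproduced in this section, so the code assumes that the noise statistic is the maximum over ordered pairs of distinct epochs of `alpha_t'(eps_t - eps_s)`, the form used in the noisy revealed preference literature cited as [14, 15]. It also assumes i.i.d. zero-mean Gaussian noise and a test statistic `phi_star` already obtained by solving (28). All names, the probe values, and the noise level are hypothetical.

```python
import bisect
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def simulate_noise_statistic(alphas, noise_std, rng):
    """Draw one sample of M = max over t != s of alpha_t'(eps_t - eps_s).

    ASSUMPTION: this form of M is assumed here, see the lead-in.
    """
    n, m = len(alphas), len(alphas[0])
    eps = [[rng.gauss(0.0, noise_std) for _ in range(m)] for _ in range(n)]
    return max(
        dot(alphas[t], [et - es for et, es in zip(eps[t], eps[s])])
        for t in range(n) for s in range(n) if t != s
    )


def empirical_ccdf(samples):
    """Return a function x -> fraction of samples strictly greater than x."""
    ordered = sorted(samples)
    n = len(ordered)

    def ccdf(x):
        return (n - bisect.bisect_right(ordered, x)) / n

    return ccdf


def accept_utility_maximization(phi_star, ccdf, significance):
    """Accept the null hypothesis when the tail probability exceeds the level."""
    return ccdf(phi_star) > significance


# Offline step with hypothetical probes and noise level.
rng = random.Random(0)
alphas = [(1.0, 2.0), (2.0, 1.0), (1.0, 1.0), (4.0, 0.5), (0.5, 3.0)]
samples = [simulate_noise_statistic(alphas, 0.1, rng) for _ in range(2000)]
ccdf = empirical_ccdf(samples)

# Online step: a small perturbation is explained by noise, a huge one is not.
print(accept_utility_maximization(0.0, ccdf, 0.05))
print(accept_utility_maximization(max(samples), ccdf, 0.05))
```

Because the empirical distribution depends only on the probe sequence and the noise model, it can be computed once and reused for every recorded response.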
The following theorem is our main result for characterizing the detector (27). It states that the probability of Type I error (false alarm) of the detector is bounded by the significance level of the test, and that the optimal solution of (28) gives the tightest false alarm bound.
Theorem 6.
Proof.
Suppose holds. By Theorem 2, is equivalent to (3) having a feasible solution. Let denote a feasible solution for (3). Then substituting , it is easily seen that is a feasible solution for the noisy inequalities (28). Since is feasible, clearly the minimizing solution of (28) satisfies . Therefore,
Similarly, let denote a feasible solution to the noisy inequalities (28). Then implies that (3) has a feasible solution, i.e.,
Therefore if (28) is feasible, is equivalent to .
Let denote the complementary cdf of .
From the statistical test (27), the event given is equivalent to the event given
and (28).
So .
Now if , then since is uniform
Suppose . Then clearly, , i.e., (32) holds.
∎
V-B Detecting Cognitive Radar given Noisy Probe
Here we consider the case where the radar observes our probe signal in additive noise as
(33) 
Here are dimensional i.i.d. random variables. Note that (33) is equivalent to the radar input being and us observing the radar input in noise as .
Given the noisy data set
(34) 
we propose the following statistical test for testing utility maximization (1) of the radar:
(35) 
In the statistical test (35):
(i) is the “significance level” of the test.
(ii) The test statistic , with is the solution of the following constrained optimization problem:
(36) 
(iii) is the pdf of the random variable where
(37) 
The numerical implementation of the detector (35) is similar to that of (30).
In complete analogy to Theorem 6 we have
V-C Lower Bound for False Alarm Probability
Given the significance level of the statistical test in (27), a Monte Carlo simulation is required to compute the threshold. We now present an analytical expression for a lower bound on the false alarm probability of the statistical test in (27) when the additive noise terms in (24) are standard normal random variables.
Theorem 8 ([17]).
The proof is in [17]. From the analytical expression (38), we can obtain an upper bound of the test statistic, denoted by . Hence, given a data set in (25), if the solution to the optimization problem (28) is such that , then the conclusion is that the data set does not satisfy utility maximization, for the desired false alarm probability.
Remark. We have discussed detecting a cognitive radar when either the radar’s response is observed in noise or our probe vector to the radar is observed in noise. A more general framework would be where both probe and response were observed in noise. We are unable to analyze the detector in this case.
VI Optimizing our Probe Signal to Minimize Type II Detection Error of Enemy’s Radar
This section deals with adaptively interrogating the enemy radar to detect if it is cognitive. Specifically, given batches of noisy measurements of the enemy’s radar response , (see (24)) how can we adaptively design batches of our probe signals , so as to minimize the Type II error (deciding that the radar is cognitive when it is not)?
Theorem 6 above guarantees that if we observe the radar response in noise, then the probability of Type I errors (deciding that the radar is not cognitive when it is) is less than the significance level of the decision test (27). In this section, the statistical test (27) is enhanced by adaptively optimizing the probe vectors to reduce the probability of Type II errors (deciding that the radar is cognitive when it is not).
The framework is shown in Figure 3.
The probe signals are adapted to estimate
(39) 
Here denotes the conditional probability that the statistical test (27) accepts , defined in (26), given that is false. In (39), the noise matrix where the random vectors are defined in (24), and is the significance level of (27). The set contains all the elements , with , where does not satisfy (4).
Since the probability density function defined in (29) is not known explicitly, (39) is a simulation-based stochastic optimization problem. To determine a local minimum value of , several types of stochastic optimization algorithms can be used [29]. Algorithm 2 uses the simultaneous perturbation stochastic approximation (SPSA) algorithm:

Choose initial probe

For iterations

Estimate cost in (39) using independent trials: In each trial compute noisy response for probe vector . Then evaluate
(40)
Here is the indicator function. is the empirical cdf of computed as in (30). is obtained from (28) using noisy observation sequence (24), where is a fixed realization of , and data set , described below (39).

Compute the gradient estimate
(41)
with gradient step size .

Update the probe vector with step size :
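The iteration in Algorithm 2 follows the usual SPSA pattern: perturb all coordinates of the probe simultaneously with random signs, evaluate the noisy cost twice, form a gradient estimate from the difference, and take a small step. The sketch below illustrates that pattern on a hypothetical stand-in cost (a noisy quadratic), since evaluating the actual Type II error estimate (40) requires solving (28) in each trial. The gain sequences use the decay exponents 0.602 and 0.101 that are common in the SPSA literature; the function names, gain constants, and toy cost are assumptions and not the paper's settings.

```python
import random


def spsa_minimize(noisy_cost, theta0, iters=2000, a=0.1, c=0.1, big_a=10.0, seed=0):
    """Minimize a cost observed in noise using two evaluations per iteration."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iters):
        step = a / (k + 1 + big_a) ** 0.602    # decreasing update step size
        width = c / (k + 1) ** 0.101           # decreasing perturbation size
        # Random +/-1 perturbation applied to every coordinate at once.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + width * d for t, d in zip(theta, delta)]
        minus = [t - width * d for t, d in zip(theta, delta)]
        diff = noisy_cost(plus, rng) - noisy_cost(minus, rng)
        # Simultaneous perturbation gradient estimate.
        grad = [diff / (2.0 * width * d) for d in delta]
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta


def toy_cost(theta, rng):
    """Hypothetical stand-in for the simulated Type II error estimate."""
    target = (1.0, -0.5)
    return sum((t - g) ** 2 for t, g in zip(theta, target)) + rng.gauss(0.0, 0.01)


estimate = spsa_minimize(toy_cost, [0.0, 0.0])
print(estimate)
```

Only two cost evaluations are needed per iteration regardless of the probe dimension, which is the main reason SPSA suits costs that are expensive to simulate. As with any stochastic gradient scheme on a nonconvex cost, it can only be expected to reach a local minimum.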
