I. Parameter Estimation
1 Statistical Modeling and Quality Criteria
Statistical Estimation
The goal is to determine the probability distribution of the random variables based on available samples. The stochastic model is a set of probability spaces, and the task of statistical estimation is to select the most appropriate candidate based on the observed outcomes of a random experiment.
Statistical Model
In a statistical model, data is assumed to be generated by some underlying probability distribution, and the goal is to estimate the parameters of this distribution.
Statistics
refers to the probability of an event according to the probability distribution selected by the parameter. The same convention holds for the corresponding expectation and variance.
1.1 Introductory Example: Estimating an Upper Bound
Given
Solution 1: Use
Solution 2: Intuition
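As a numerical illustration (assuming, in line with the uniform-distribution treatment in Section 2.2, that the observations are i.i.d. uniform on [0, θ] with unknown upper bound θ; the exact setup in the slides may differ), a minimal sketch comparing the two natural estimators, twice the sample mean and the sample maximum:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 5.0          # true upper bound (assumed for this illustration)
N = 100              # number of observations

x = rng.uniform(0.0, theta, size=N)

# Solution 1: moment-based estimator, E[x] = theta/2  =>  theta_hat = 2 * mean
theta_hat_mean = 2.0 * x.mean()

# Solution 2: intuitive estimator, the largest observed sample
theta_hat_max = x.max()

print(theta_hat_mean, theta_hat_max)
```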
1.2 Consistency and Unbiasedness
Consistent estimator:
Using the Chebyshev inequality, we derive the law of large numbers,
Using the law of large numbers, we get
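For i.i.d. observations x_1, …, x_N with mean θ and finite variance σ², the step from the Chebyshev inequality to the (weak) law of large numbers can be sketched as

$$
\Pr\!\left(\left|\frac{1}{N}\sum_{i=1}^{N} x_i - \theta\right| \ge \varepsilon\right)
\;\le\; \frac{\operatorname{Var}\!\left[\frac{1}{N}\sum_{i=1}^{N} x_i\right]}{\varepsilon^2}
\;=\; \frac{\sigma^2}{N\varepsilon^2}
\;\xrightarrow[N\to\infty]{}\; 0 ,
$$

so the sample mean is a consistent estimator of θ.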
Unbiased estimator:
is unbiased:
is asymptotically unbiased:
Proof. For
is unbiased:
1.3 Variance:
A further quality measure for an estimator is its variance.
1.4 Mean Squared Error (MSE):
An extension of the variance is the mean squared error (MSE), where
1.5 Bias/Variance Trade-Off
The MSE of an estimator
and can be decomposed into its bias and variance
Choose
to get optimal
Therefore, an unbiased estimator is not necessarily the optimal estimator, but for large
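For reference, the standard decomposition (with the bias defined as E[θ̂] − θ) reads

$$
\mathrm{MSE}(\hat\theta) \;=\; \mathrm{E}\big[(\hat\theta-\theta)^2\big]
\;=\; \operatorname{Var}[\hat\theta] + \big(\mathrm{E}[\hat\theta]-\theta\big)^2 ,
$$

so allowing a small bias can pay off whenever it reduces the variance by more than the squared bias it introduces; for large sample sizes both terms typically vanish for reasonable estimators.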
2 Maximum Likelihood Estimation
2.1 Maximum Likelihood Principle
The maximum likelihood principle suggests selecting a candidate probability measure such that the observed outcomes of the experiment become most probable. A maximum likelihood estimator
The likelihood function depends on the statistical model; assuming all observations are i.i.d., we obtain
Normally, we use the log-likelihood function,
In the slides,
is used rather than .
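Written out in generic notation (the density f(x; θ) and the parameter symbol θ are placeholders; the slides may use different symbols), the i.i.d. likelihood, log-likelihood, and ML estimator are

$$
L(\theta) = \prod_{i=1}^{N} f(x_i;\theta), \qquad
\log L(\theta) = \sum_{i=1}^{N} \log f(x_i;\theta), \qquad
\hat\theta_{\mathrm{ML}} = \arg\max_{\theta} \log L(\theta).
$$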
2.2 Parameter Estimation
Channel Estimation
Consider an AWGN channel
Given
The ML estimator is obviously identical to the least squares estimator; this changes drastically when the noise statistics are correlated or non-Gaussian distributed.
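To see why the two coincide under AWGN, a sketch (assuming observations y = s(θ) + n with white Gaussian noise n ~ N(0, σ²I); the symbols are placeholders for the channel model above):

$$
\hat\theta_{\mathrm{ML}}
= \arg\max_{\theta}\, \log f(y;\theta)
= \arg\max_{\theta}\left(-\frac{1}{2\sigma^2}\,\|y-s(\theta)\|_2^2 + \text{const}\right)
= \arg\min_{\theta}\, \|y-s(\theta)\|_2^2 ,
$$

which is exactly the least squares criterion. With correlated noise the quadratic form is weighted by the inverse noise covariance, and for non-Gaussian noise the equivalence breaks down entirely.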
Introductory Example: Estimating an Upper Bound
Suppose the distribution of the observations is uniform; the likelihood function of
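Assuming the observations are i.i.d. uniform on [0, θ] (a hedged reconstruction of this example), the likelihood works out as

$$
L(\theta) = \prod_{i=1}^{N} \frac{1}{\theta}\,\mathbb{1}\{0 \le x_i \le \theta\}
= \theta^{-N}\,\mathbb{1}\Big\{\theta \ge \max_i x_i\Big\},
$$

which is monotonically decreasing in θ on the admissible region, so the ML estimator is the sample maximum.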
Bernoulli Experiments
Given
and
The ML estimator is obtained by
In the following, we analyze the quality of
Since the estimator is unbiased, the MSE is equal to the variance of the estimator,
However, a biased estimator can have a smaller MSE and thus provide better estimates than an unbiased estimator.
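A minimal numerical sketch of this point for the Bernoulli case (assuming N i.i.d. {0,1} observations with success probability p; the parameter values are chosen only for the illustration): the ML estimator is the relative frequency, and a slightly biased, shrunk variant can achieve a smaller MSE.

```python
import numpy as np

rng = np.random.default_rng(1)
p, N, trials = 0.4, 20, 100_000          # assumed parameter, sample size, Monte Carlo runs

x = rng.binomial(1, p, size=(trials, N))
k = x.sum(axis=1)                        # number of successes per run

p_ml = k / N                             # unbiased ML estimator (relative frequency)
p_shrunk = (k + 1) / (N + 2)             # biased estimator (shrinks towards 1/2)

mse_ml = np.mean((p_ml - p) ** 2)
mse_shrunk = np.mean((p_shrunk - p) ** 2)
print(mse_ml, mse_shrunk)                # here the slightly biased estimator has the smaller MSE
```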
Alternative Solution:
2.3 Best Unbiased Estimator
ML estimators are not necessarily the best estimators. However, a wide class of estimators is defined by minimizing the MSE under an unbiasedness constraint.
We call an estimator
for any alternative unbiased estimator
Best unbiased estimators are also referred to as UMVU (uniformly minimum variance unbiased) estimators.
3. Fisher's Information Inequality
A universal lower bound for the variance of an estimator can be introduced if the following condition is fulfilled:
We define the score function as the slope of the log-likelihood function,
3.1 Cramér-Rao Lower Bound
With
which can be interpreted as the negative mean curvature of the log-likelihood function at
The variance of an estimator can be lower-bounded by the Cramér-Rao lower bound
If
Properties of the Fisher Information:
The Fisher information depends on the given observations and on the unknown parameter.
A large value corresponds to a strong curvature of the log-likelihood and to more information in the observations; a small value corresponds to a weak curvature and to little information in the observations.
The Fisher information is monotonically increasing with the number of independent observation statistics.
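Collected in standard notation (a sketch; the slides' symbols may differ), with log-likelihood log f(x; θ):

$$
s(x;\theta) = \frac{\partial}{\partial\theta}\log f(x;\theta), \qquad
I_F(\theta) = \mathrm{E}\big[s(x;\theta)^2\big]
= -\,\mathrm{E}\!\left[\frac{\partial^2}{\partial\theta^2}\log f(x;\theta)\right], \qquad
\operatorname{Var}[\hat\theta] \;\ge\; \frac{1}{I_F(\theta)}
$$

for any unbiased estimator, under the usual regularity conditions. For N i.i.d. observations the Fisher information grows linearly, I_F^{(N)}(θ) = N · I_F^{(1)}(θ).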
3.2 Exponential Models
An exponential model is a statistical model with
If
Mean Estimation Example
Consider the estimation of the unknown mean value
Since
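Assuming the observations are i.i.d. Gaussian, x_i ~ N(μ, σ²) with known variance σ² (a standard instance of this example), the numbers work out as

$$
\log f(x_i;\mu) = -\frac{(x_i-\mu)^2}{2\sigma^2} + \text{const}, \qquad
I_F(\mu) = \frac{N}{\sigma^2}, \qquad
\operatorname{Var}[\hat\mu] \;\ge\; \frac{\sigma^2}{N},
$$

and the sample mean attains this bound with equality, i.e. it is an efficient estimator of μ.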
3.3 Asymptotically Efficient Estimators
An estimator is asymptotically efficient if (convergence in distribution)
An ML estimator is asymptotically efficient if …
II. Examples
4. ML Principle for Direction of Arrival Estimation
4.1 Signal Model and ML Estimation
We consider the estimation of the Direction of Arrival (DoA)
Then the signal at the
where
Furthermore, we assume a Uniform Linear Array (ULA) with
In the case of a single observation and AWGN
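A minimal numerical sketch of ML DoA estimation for this setting (assumptions for the illustration: half-wavelength element spacing, a single snapshot, an unknown complex amplitude, and white Gaussian noise; since the steering vector has constant norm, the ML criterion reduces to a matched correlation):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 8                           # number of ULA antennas (assumed)
theta_true = np.deg2rad(20.0)   # true direction of arrival (assumed)

def steering(theta, M):
    # ULA steering vector, assuming half-wavelength element spacing
    return np.exp(-1j * np.pi * np.arange(M) * np.sin(theta))

s = 1.0 + 0.5j                  # unknown complex signal amplitude
noise = 0.1 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = s * steering(theta_true, M) + noise

# ML under AWGN (single snapshot, unknown amplitude): grid search maximizing |a(theta)^H y|^2
grid = np.deg2rad(np.linspace(-90.0, 90.0, 3601))
metric = np.array([np.abs(np.vdot(steering(t, M), y)) ** 2 for t in grid])
theta_hat = grid[np.argmax(metric)]
print(np.rad2deg(theta_hat))
```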
4.2 Cramér-Rao Bound for DoA Estimation
The likelihood function of the given estimation problem obviously belongs to the family of exponential distributions, where
III. Estimation of Random Variables
5. Bayesian Estimation
In the previously discussed ML principle, we do not have any statistical information about the parameter
The Bayesian estimation method is based on a specific statistical model for the unknown parameter; here, it is
where
Furthermore, instead of using the notation
Now, the MSE of the estimator
5.1 Conditional Mean Estimator/Bayes Estimator – Minimizing the MSE
Conditional mean estimator
Theorem
The conditional mean estimator (Bayes estimator)
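In generic notation (θ is now a random variable and y the observation; the symbols are placeholders for those in the slides), the theorem reads

$$
\hat\theta_{\mathrm{CM}}(y) = \mathrm{E}[\theta \mid y]
= \int \theta\, f_{\theta\mid y}(\theta\mid y)\, \mathrm{d}\theta
\quad\text{minimizes}\quad
\mathrm{E}\big[(\theta-\hat\theta(y))^2\big]
$$

over all estimators θ̂(y); this follows by minimizing the inner expectation E[(θ − t)² | y] over t separately for every fixed observation y.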
Alternative cost criterion
Although the mean MSE is the most popular cost criterion, other criteria have been proposed and applied. The mean modulus is defined as
The conditional median estimator minimizes it,
5.2 Binomial Experiment
The ML estimator is obtained as
Now we assume
Note that
We get
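As an illustration, assuming a uniform prior on the success probability p (the prior actually used in the slides may differ), observing k successes in N trials gives a Beta posterior and

$$
f(p \mid k) \propto p^{k}(1-p)^{N-k}, \qquad
\hat p_{\mathrm{CM}} = \mathrm{E}[p \mid k] = \frac{k+1}{N+2},
$$

which shrinks the ML estimate k/N towards 1/2 and remains well defined even for k = 0 or k = N.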
5.3 Mean Estimation Example
We consider the estimation of the unknown mean value
assuming
Conditional mean estimator (two steps needed):
Computing conditional PDF
Computing conditional mean
Discussion
Given a large number of observations, a small conditional variance, or a large prior variance, it is recommended to rely on the ML estimator.
Given a small prior variance or a large conditional variance, it is recommended to rely on the prior mean value of the parameter.
Minimum Mean Square Error
MSE is minimized at
which can be obtained once we have the information about the joint distribution
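A minimal numerical sketch of this example under the assumption θ ~ N(μ_θ, σ_θ²) and y_i = θ + n_i with i.i.d. noise n_i ~ N(0, σ_n²) (all values below are assumed): the conditional mean blends the prior mean and the sample mean according to the two variances.

```python
import numpy as np

rng = np.random.default_rng(3)

mu_prior, var_prior = 0.0, 1.0   # prior mean and variance of theta (assumed)
var_noise, N = 4.0, 10           # observation noise variance and sample size (assumed)

theta = rng.normal(mu_prior, np.sqrt(var_prior))
y = theta + rng.normal(0.0, np.sqrt(var_noise), size=N)

y_bar = y.mean()                                  # ML estimate (sample mean)
w = var_prior / (var_prior + var_noise / N)       # weight on the data
theta_cm = mu_prior + w * (y_bar - mu_prior)      # conditional mean estimate

print(y_bar, theta_cm)
# For large N or small noise variance, theta_cm approaches y_bar (rely on the ML estimate);
# for a small prior variance it stays close to the prior mean mu_prior.
```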
5.4 Jointly Gaussian Random Variables – Multivariate Case
Given random vectors and the covariance matrix from the joint distribution:
the multivariate conditional mean estimator is obtained as:
The respective MMSE is equal to the trace of the conditional covariance matrix
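A sketch of the multivariate formulas (standard Gaussian conditioning; the block structure of the mean and covariance is assumed as mu = [mu_x; mu_y], C = [[C_xx, C_xy], [C_yx, C_yy]], real-valued):

```python
import numpy as np

def cm_estimator_gaussian(mu_x, mu_y, C_xx, C_xy, C_yy, y):
    """Conditional mean estimator and conditional covariance for jointly Gaussian (x, y)."""
    K = C_xy @ np.linalg.inv(C_yy)           # gain matrix C_xy C_yy^{-1}
    x_hat = mu_x + K @ (y - mu_y)            # E[x | y]
    C_cond = C_xx - K @ C_xy.T               # conditional covariance (C_yx = C_xy^T for real vectors)
    return x_hat, C_cond, np.trace(C_cond)   # the trace of C_cond is the minimum MSE
```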
The estimator in (5.48) is indeed the solution of the integral in (5.34), where the scalar quantities are substituted by the respective vectors. The derivation of the estimator is, however, not as straightforward as in the scalar case due to the different rules for operations on vectors and matrices instead of scalars. The last paragraph of the corresponding slide covers this case; there the strong similarity to Eq. (5.48) becomes apparent.
Given jointly Gaussian random variables, the conditional mean estimator is a linear function of the observation, and the marginal distribution is also Gaussian, independently of the correlation coefficient. This does not hold for arbitrarily jointly distributed random variables!
Example case:
where
Applying (5.48), we obtain the CM estimator
5.5 Orthogonality Principle
Stochastic orthogonality is an inherent property of the conditional mean estimator; it describes the orthogonality between the CM estimation error and any statistic of the observations:
The CM estimation error is stochastically orthogonal to any statistic of the observations, where
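In formulas (a sketch in standard notation), for any function φ of the observations y,

$$
\mathrm{E}\big[\big(\theta - \mathrm{E}[\theta\mid y]\big)\,\varphi(y)\big]
= \mathrm{E}\Big[\mathrm{E}\big[\theta - \mathrm{E}[\theta\mid y]\,\big|\,y\big]\,\varphi(y)\Big]
= 0 ,
$$

by the law of iterated expectations, i.e. the CM estimation error is uncorrelated with every statistic of the observations.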
Mean Estimation Example
Given
and functionals
Recall that the MSE-optimal estimator is linear in
Applying the orthogonality principle,
we obtain
The missing parameter
Finally we get
Alternative solution:
We choose
and apply the orthogonality principle:
IV. Linear Estimation
6. Linear Estimation
We focus on linear models (linear regression)
6.1 Least Squares Estimation
The standard approach for this is based on convexity of the objective function,
Geometrical Perspective
2nd Geometrical Perspective
Using the orthogonal projector
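A minimal sketch of the least squares solution and its projection interpretation for a generic linear model y ≈ H x (the names H, x, y are placeholders for the notation used in the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
H = rng.standard_normal((20, 3))            # observation matrix (tall, full column rank)
y = rng.standard_normal(20)                 # observations

# LS solution of min_x ||y - H x||^2 via the normal equations H^T H x = H^T y
x_ls = np.linalg.solve(H.T @ H, H.T @ y)

# Geometrical perspective: H x_ls is the orthogonal projection of y onto range(H)
P = H @ np.linalg.solve(H.T @ H, H.T)       # orthogonal projector onto range(H)
residual = y - P @ y
print(np.allclose(H.T @ residual, 0.0))     # the residual is orthogonal to the columns of H
```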
Mean Estimation Example
In order to estimate the mean
and the LS optimization problem
Affine Linear Regression
Given a training set, a linear estimator is defined as
Another linear estimator is
SVD Perspective
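A sketch of the SVD perspective on the same problem (same placeholder names as above): the pseudoinverse built from the SVD yields the (minimum-norm) LS solution.

```python
import numpy as np

rng = np.random.default_rng(5)
H = rng.standard_normal((20, 3))
y = rng.standard_normal(20)

U, s, Vt = np.linalg.svd(H, full_matrices=False)   # thin SVD, H = U diag(s) V^T
x_svd = Vt.T @ ((U.T @ y) / s)                     # x = V diag(1/s) U^T y = H^+ y

print(np.allclose(x_svd, np.linalg.lstsq(H, y, rcond=None)[0]))
```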
6.2 Least Squares Estimation with Regularization (Ridge Regression)
In many real-world applications, LS problems may be ill-conditioned. In such cases, regularization techniques can provide an alternative handling of the problem.
Tikhonov/Ridge regression:
where
Furthermore,
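A minimal sketch of the regularized (Tikhonov/ridge) solution, with a hypothetical regularization parameter lam > 0:

```python
import numpy as np

rng = np.random.default_rng(6)
H = rng.standard_normal((20, 3))
y = rng.standard_normal(20)
lam = 0.1                                   # regularization parameter (assumed)

# Ridge solution of min_x ||y - H x||^2 + lam * ||x||^2
n = H.shape[1]
x_ridge = np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ y)
print(x_ridge)
```

The added term improves the conditioning of the normal equations at the price of a bias of the estimate towards zero.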
6.3 Linear Minimum Mean Square Error Estimation
Based on a linear model for the estimator, the LMMSE estimator solves the optimization problem
Given the joint mean values and covariance of the random variables
we obtain the LMMSE estimator as
! apply orthogonality principle
In the case of zero-mean random variables, the LMMSE estimator is
The minimum MSE is
Given that the random variables
and are jointly Gaussian distributed, the LMMSE estimator is obviously identical to the CME.
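A sketch of the LMMSE estimator built from first- and second-order statistics only (here estimated from samples of a toy joint distribution; all names and numbers are assumptions for the illustration); unlike the CME, it needs no knowledge of the full joint distribution.

```python
import numpy as np

rng = np.random.default_rng(7)

# Draw training pairs (x, y) from some joint distribution (here: a toy linear model)
x = rng.standard_normal((5000, 2))
y = x @ np.array([[1.0, 0.5, -0.3], [0.2, -1.0, 0.7]]) + 0.1 * rng.standard_normal((5000, 3))

mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
C_xy = (x - mu_x).T @ (y - mu_y) / len(x)      # cross-covariance of x and y
C_yy = (y - mu_y).T @ (y - mu_y) / len(y)      # covariance of the observation y

W = C_xy @ np.linalg.inv(C_yy)                 # LMMSE filter C_xy C_yy^{-1}
x_hat = mu_x + (y - mu_y) @ W.T                # affine LMMSE estimate of x from y
print(np.mean((x_hat - x) ** 2))
```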
V. Examples
7. Estimator of a Matrix Channel
We consider the estimation of a time-invariant non-dispersive MIMO channel
We compare three linear estimators:
MMSE: minimum mean square error estimator
ML: maximum likelihood estimator
MF: matched filter estimator
7.1 Channel Model
The task
is to find good estimators of the channel coefficients
where
The training signals
consist of
The estimation
of the channel coefficients
where
The model
for the training channel is
i.e.,
By stacking the column vectors of the matrices, we obtain
! here we use
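The stacking step typically relies on the vectorization identity (a sketch; the exact symbols for the channel matrix H and the training matrix S follow the slides):

$$
\operatorname{vec}(HS) = \big(S^{\mathrm T} \otimes \mathbf I\big)\operatorname{vec}(H),
$$

so that a matrix training model Y = HS + N turns into the linear vector model y = (S^T ⊗ I) h + n with y = vec(Y), h = vec(H), and n = vec(N).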
Further Assumptions
We further assume that the stacked vectors are Gaussian distributed
where
and the channel vector
Matrix
is used as an abbreviation for
Consequently, due to the linear channel model
with
In the following, we assume full knowledge of both covariance matrices.
7.2 Three Linear Estimators
1. MMSE (minimum mean square error estimation)
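As a hedged numerical sketch of the three estimators for a stacked training model y = S h + n with h ~ N(0, C_h) and n ~ N(0, C_n) (all symbol names, dimensions, and values below are assumptions mirroring the model above, not the slides' exact notation):

```python
import numpy as np

rng = np.random.default_rng(8)
n_h, n_y = 4, 8                                   # channel / observation dimensions (assumed)

S = rng.standard_normal((n_y, n_h))               # stacked training matrix (known)
C_h = np.eye(n_h)                                 # channel covariance (assumed known)
C_n = 0.1 * np.eye(n_y)                           # noise covariance (assumed known)

h = rng.multivariate_normal(np.zeros(n_h), C_h)
y = S @ h + rng.multivariate_normal(np.zeros(n_y), C_n)

# MMSE / LMMSE: C_h S^T (S C_h S^T + C_n)^{-1} y
h_mmse = C_h @ S.T @ np.linalg.solve(S @ C_h @ S.T + C_n, y)

# ML (Gaussian noise): weighted least squares (S^T C_n^{-1} S)^{-1} S^T C_n^{-1} y
Cn_inv = np.linalg.inv(C_n)
h_ml = np.linalg.solve(S.T @ Cn_inv @ S, S.T @ Cn_inv @ y)

# MF (matched filter): correlation with the training signal, S^T C_n^{-1} y (up to scaling)
h_mf = S.T @ Cn_inv @ y

print(h_mmse, h_ml, h_mf, sep="\n")
```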