I. Parameter Estimation
1 Statistical Modeling and Quality Criteria
- Statistical Estimation 
The goal is to determine the probability distribution of the random variables based on available samples. Put differently, the stochastic model is a set of probability spaces, and the task of statistical estimation is to select the most appropriate candidate based on the observed outcomes of a random experiment.
- Statistical Model 
In a statistical model, data is assumed to be generated by some underlying probability distribution, and the goal is to estimate the parameters of this distribution.
- Statistics 
1.1 Introductory Example: Estimating upper bound 
Given 
Solution 1: Use 
Solution 2: Intuition 
1.2 Consistency and Unbiasedness
Consistent estimator 
 
Using Chebyshev's inequality, we derive the law of large numbers,
Using the law of large numbers, we get
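For the sample mean, the two steps can be written out as follows. This is only a sketch under the assumption of i.i.d. observations x_1, ..., x_N with mean \mu and finite variance \sigma^2; these symbols are chosen here for illustration and may differ from the notation of the lecture.

```latex
% Sample mean and its first two moments (i.i.d. assumption)
\bar{x}_N = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\mathrm{E}[\bar{x}_N] = \mu, \qquad
\mathrm{Var}[\bar{x}_N] = \frac{\sigma^2}{N}
\\[4pt]
% Chebyshev's inequality applied to the sample mean gives the weak law of large numbers
P\bigl(|\bar{x}_N - \mu| \ge \varepsilon\bigr)
  \le \frac{\mathrm{Var}[\bar{x}_N]}{\varepsilon^2}
  = \frac{\sigma^2}{N\varepsilon^2}
  \xrightarrow{\,N \to \infty\,} 0
  \qquad \text{for every } \varepsilon > 0
```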
Unbiased estimator 
Proof. For 

1.3 Variance: 
A further quality measure for an estimator is its variance.
1.4 Mean Squared Error (MSE): 
An extension of the variance is the mean squared error (MSE), where 

1.5 Bias/Variance Trade-Off
MSE of an estimator 
and can be decomposed into its bias and variance
- Choose 
Therefore, an unbiased estimator is not necessarily the optimal estimator, but for large 
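The decomposition used in this section follows from adding and subtracting the mean of the estimator. The sketch below uses T for the estimator and \theta for the true parameter; both symbols are assumptions made here for illustration.

```latex
\mathrm{MSE}(T)
  = \mathrm{E}\bigl[(T - \theta)^2\bigr]
  = \mathrm{E}\bigl[\bigl(T - \mathrm{E}[T] + \mathrm{E}[T] - \theta\bigr)^2\bigr]
\\[4pt]
  = \underbrace{\mathrm{E}\bigl[(T - \mathrm{E}[T])^2\bigr]}_{\mathrm{Var}[T]}
  + \underbrace{\bigl(\mathrm{E}[T] - \theta\bigr)^2}_{\mathrm{Bias}(T)^2}
\\[4pt]
% The cross term vanishes because E[T - E[T]] = 0:
2\,\bigl(\mathrm{E}[T] - \theta\bigr)\,\mathrm{E}\bigl[T - \mathrm{E}[T]\bigr] = 0
```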
2 Maximum Likelihood Estimation
2.1 Maximum Likelihood Principle
The maximum likelihood principle suggests selecting the candidate probability measure under which the observed outcomes of the experiment become most probable. A maximum likelihood estimator 

The likelihood function depends on the statistical model. Assuming all observations are i.i.d., we obtain
Normally, we use the log-likelihood function,
- In the slides, 
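To make the principle concrete, here is a minimal sketch that maximizes a log-likelihood numerically. It assumes a Gaussian model with known variance, which is not taken from the notes; the function names are hypothetical. For this model the brute-force maximizer should agree with the sample mean up to the grid resolution.

```python
import math
import random


def gaussian_log_likelihood(mu, samples, sigma=1.0):
    """Log-likelihood of i.i.d. N(mu, sigma^2) samples (sum of log-densities)."""
    n = len(samples)
    squared_error = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - squared_error / (2 * sigma ** 2)


def grid_ml_estimate(samples, lo, hi, steps=2001):
    """Brute-force ML estimate: the grid point with the largest log-likelihood."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda mu: gaussian_log_likelihood(mu, samples))


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [rng.gauss(2.0, 1.0) for _ in range(200)]

mu_grid = grid_ml_estimate(samples, 0.0, 4.0)
mu_closed = sum(samples) / len(samples)  # closed-form maximizer for this model
print(mu_grid, mu_closed)
```

The grid search is only meant to illustrate the principle; whenever a closed-form maximizer exists, it is the better choice.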
2.2 Parameter Estimation
Channel Estimation
Consider an AWGN channel 
Given 
- The ML estimator is obviously identical to the least squares estimator, which changes drastically when the statistics 
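The equivalence of ML and least squares under white Gaussian noise can be illustrated with a scalar toy model. The model y_i = h * s_i + n_i with known training symbols s_i is an assumption made here for illustration and is not necessarily the model of the lecture; `ls_channel_estimate` is a hypothetical helper.

```python
import random


def ls_channel_estimate(s, y):
    """Least-squares estimate of a scalar gain h in y_i = h * s_i + n_i.

    Under i.i.d. Gaussian noise, minimizing sum (y_i - h * s_i)^2 is the same
    as maximizing the likelihood, so this is also the ML estimate.
    """
    return sum(si * yi for si, yi in zip(s, y)) / sum(si * si for si in s)


rng = random.Random(1)
h_true = 0.8
s = [rng.choice((-1.0, 1.0)) for _ in range(500)]     # known training symbols
y = [h_true * si + rng.gauss(0.0, 0.1) for si in s]   # noisy received samples

h_hat = ls_channel_estimate(s, y)
print(h_hat)
```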
Introductory Example: Estimating upper bound 
Suppose the observations are uniformly distributed. The likelihood function of 
Given 
and 
The ML-estimator is obtained by
In the following, we analyze the quality of 
Since the estimator is unbiased, the MSE is equal to the variance of the estimator,
However, a biased estimator can have a smaller MSE and thus provide better estimates than an unbiased estimator.
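The trade-off can be checked by simulation. The sketch below assumes observations uniform on [0, theta] and compares four candidate estimators; which of them correspond to the solutions discussed in the notes is an assumption, and all function names are hypothetical.

```python
import random


def ml_max(x):
    """ML estimate for the uniform model: the largest observation (biased)."""
    return max(x)


def unbiased_max(x):
    """Bias-corrected maximum, (N + 1) / N * max(x)."""
    n = len(x)
    return (n + 1) / n * max(x)


def scaled_max(x):
    """Slightly biased scaling (N + 2) / (N + 1) * max(x)."""
    n = len(x)
    return (n + 2) / (n + 1) * max(x)


def twice_mean(x):
    """Moment-based estimate: twice the sample mean (unbiased)."""
    return 2 * sum(x) / len(x)


def mc_mse(estimator, theta=1.0, n=10, trials=20000, seed=0):
    """Monte Carlo estimate of E[(estimator(x) - theta)^2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = [rng.uniform(0.0, theta) for _ in range(n)]
        total += (estimator(x) - theta) ** 2
    return total / trials


mse_ml = mc_mse(ml_max)
mse_unbiased = mc_mse(unbiased_max)
mse_scaled = mc_mse(scaled_max)
mse_mean = mc_mse(twice_mean)
print(mse_ml, mse_unbiased, mse_scaled, mse_mean)
```

For this model the exact values are 2 theta^2 / ((N + 1)(N + 2)) for the maximum, theta^2 / (N (N + 2)) for the bias-corrected maximum, and theta^2 / (N + 1)^2 for the (N + 2) / (N + 1) scaling. The last one is marginally smaller than the unbiased variant, but the gap is too small to resolve reliably with this number of trials.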
Alternative Solution:

2.3 Best Unbiased Estimator
ML estimators are not necessarily the best estimators. However, a wide class of estimators is defined by minimizing the MSE under an unbiasedness constraint.
We call an estimator 
for any alternative unbiased estimator 
Best unbiased estimators are also referred to as UMVU (Uniformly Minimum Variance Unbiased) estimators.
3. Fisher's Information Inequality
A universal lower bound for the variance of an estimator can be introduced if the following condition is fulfilled:
We define the score function as the slope of 
3.1 Cramer-Rao Lower Bound
With 
which can be interpreted as the negative mean curvature of the log-likelihood function at 
The variance of an estimator can be lower bounded by the Cramer-Rao lower bound
If 
Properties of the Fisher Information:
- A large value of 

3.2 Exponential Models
An exponential model is a statistical model with
If 
Mean Estimation Example
Consider the estimation of the unknown mean value 
Since 
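As a numerical sanity check of the Cramer-Rao bound, the sketch below assumes i.i.d. Gaussian observations with known variance sigma^2, for which the bound on the variance of an unbiased mean estimator is sigma^2 / N and the sample mean attains it. The parameter values and function names are hypothetical.

```python
import random
import statistics


def crlb_gaussian_mean(sigma, n):
    """Cramer-Rao bound sigma^2 / n for the mean of N(mu, sigma^2), sigma known."""
    return sigma ** 2 / n


def sample_mean_variance(mu=1.0, sigma=2.0, n=25, trials=20000, seed=0):
    """Empirical variance of the sample mean over many repeated experiments."""
    rng = random.Random(seed)
    estimates = [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(trials)
    ]
    return statistics.pvariance(estimates)


bound = crlb_gaussian_mean(sigma=2.0, n=25)
var_hat = sample_mean_variance()
print(bound, var_hat)  # the empirical variance should be close to the bound
```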
3.3 Asymptotically Efficient Estimators
An estimator is asymptotically efficient if (convergence in distribution)
- An ML estimator is asymptotically efficient if … 
II. Examples
4. ML Principle for Direction of Arrival Estimation
4.1 Signal Model and ML Estimation
We consider the estimation of the Direction of Arrival (DoA) 
Then the signal at the 
where 
Furthermore, we assume a Uniform Linear Array (ULA) with 
In the case of a single observation and AWGN 
! 
4.2 Cramer-Rao Bound for DOA Estimation
The likelihood function of the given estimation problem obviously belongs to the family of exponential distributions, where 
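The ML estimator of Section 4.1 can be sketched as a grid search. Several assumptions are made here that are not confirmed by the notes: a single source with unknown complex amplitude, half-wavelength element spacing, the sign convention of the steering vector, and a single snapshot in white Gaussian noise. Under these assumptions, maximizing the likelihood over the angle reduces to maximizing |a(theta)^H x|^2. All names are hypothetical.

```python
import cmath
import math
import random


def steering_vector(theta, m):
    """ULA response for angle theta (radians), half-wavelength spacing assumed."""
    return [cmath.exp(-1j * math.pi * k * math.sin(theta)) for k in range(m)]


def ml_doa_grid(x, grid_points=1801):
    """Grid search over [-pi/2, pi/2] for the angle maximizing |a(theta)^H x|^2."""
    m = len(x)
    best_theta, best_value = 0.0, -1.0
    for g in range(grid_points):
        theta = -math.pi / 2 + math.pi * g / (grid_points - 1)
        a = steering_vector(theta, m)
        value = abs(sum(ak.conjugate() * xk for ak, xk in zip(a, x))) ** 2
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta


rng = random.Random(0)
theta_true, m = 0.3, 8
x = [
    ak + complex(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1))  # single noisy snapshot
    for ak in steering_vector(theta_true, m)
]
theta_hat = ml_doa_grid(x)
print(theta_hat)
```

The grid resolution limits the accuracy; a finer grid or a local refinement around the best grid point would be the natural next step.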
III. Estimation of Random Variables
5. Bayesian Estimation
In the previous ML principle, we don't have any statistical information about the parameter 
The Bayesian estimation method is based on a specific statistical model for the unknown parameter; here, it is 
where 
Furthermore, instead of using notation 
Now, the MSE of estimator 
5.1 Conditional Mean Estimator/Bayes Estimator – Minimizing the MSE
Conditional mean estimator 
- Theorem 
The conditional mean estimator (Bayes estimator) 
- Alternative cost criterion 
Although the MSE is the most popular cost criterion, other criteria have been proposed and applied. The mean modulus is defined as
The conditional median estimator minimizes it,
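The difference between the two cost criteria can be seen on a small example. In the sketch below, five equally likely values stand in for a posterior distribution (an assumption for illustration only), and a grid search finds the constant that minimizes each cost.

```python
import statistics


def mean_cost(c, values, power):
    """Average of |v - c|^power over equally likely values v."""
    return sum(abs(v - c) ** power for v in values) / len(values)


values = [0.0, 1.0, 2.0, 3.0, 14.0]          # stand-in for posterior outcomes
grid = [k / 100 for k in range(0, 1401)]      # candidate estimates 0.00 .. 14.00

best_squared = min(grid, key=lambda c: mean_cost(c, values, 2))
best_modulus = min(grid, key=lambda c: mean_cost(c, values, 1))

print(best_squared, statistics.mean(values))     # squared error: the mean
print(best_modulus, statistics.median(values))   # absolute error: the median
```

The outlier at 14 pulls the mean far more than the median, which is why the mean modulus criterion is less sensitive to heavy tails.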
5.2 Binomial Experiment
The ML estimator is obtained as
Now we assume 
Note that
We get
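The effect of prior knowledge can be checked by simulation. The sketch below assumes a uniform prior on [0, 1] for the success probability, which may or may not be the prior used in the notes. For that prior, the conditional mean estimator is (k + 1) / (N + 2), where k is the number of successes in N trials. Function names are hypothetical.

```python
import random


def ml_estimate(k, n):
    """ML estimate of the success probability: relative frequency."""
    return k / n


def cm_estimate_uniform_prior(k, n):
    """Conditional mean estimate under an (assumed) uniform prior on [0, 1]."""
    return (k + 1) / (n + 2)


def bayes_mse(estimator, n=10, trials=20000, seed=0):
    """MSE averaged over both the prior and the binomial experiment."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        theta = rng.random()                              # draw from the prior
        k = sum(rng.random() < theta for _ in range(n))   # number of successes
        total += (estimator(k, n) - theta) ** 2
    return total / trials


mse_ml = bayes_mse(ml_estimate)
mse_cm = bayes_mse(cm_estimate_uniform_prior)
print(mse_ml, mse_cm)  # the conditional mean estimator should have the smaller MSE
```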

5.3 Mean Estimation Example
We consider the estimation of the unknown mean value 
assuming 
Conditional mean estimator (two steps needed):
- Computing conditional PDF 
- Computing conditional mean 
Discussion
- Given large 
- Given small variance 
Minimum Mean Square Error
MSE is minimized at 
which can be obtained once we have the information about the joint distribution 
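A minimal numerical sketch of this example follows. It assumes a zero-mean Gaussian prior with variance var_theta and i.i.d. Gaussian noise with variance var_noise; under these assumptions the conditional mean estimator is the sample mean scaled by var_theta / (var_theta + var_noise / N). The symbols, parameter values, and function names are chosen here for illustration.

```python
import random


def cm_gain(var_theta, var_noise, n):
    """Scaling applied to the sample mean by the conditional mean estimator."""
    return var_theta / (var_theta + var_noise / n)


def mmse(var_theta, var_noise, n):
    """Closed-form minimum MSE for the assumed Gaussian model."""
    return var_theta * (var_noise / n) / (var_theta + var_noise / n)


def simulate_mse(gain, var_theta=1.0, var_noise=4.0, n=4, trials=20000, seed=0):
    """Monte Carlo MSE of the estimator gain * sample_mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        theta = rng.gauss(0.0, var_theta ** 0.5)
        xbar = sum(theta + rng.gauss(0.0, var_noise ** 0.5) for _ in range(n)) / n
        total += (gain * xbar - theta) ** 2
    return total / trials


gain = cm_gain(1.0, 4.0, 4)
mse_cm = simulate_mse(gain)   # conditional mean estimator
mse_ml = simulate_mse(1.0)    # plain sample mean (ML estimator)
print(gain, mmse(1.0, 4.0, 4), mse_cm, mse_ml)
```

The gain tends to 1 for a large number of observations, so the estimator approaches the sample mean, and it tends to 0 when the prior variance is small, so the estimator then relies on the prior mean.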
5.4 Jointly Gaussian Random Variables – Multivariate Case
Given random vectors and the covariance matrix from the joint distribution:
the multivariate conditional mean estimator is obtained as:
The respective MMSE is equal to the trace of the conditional covariance matrix 
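For reference, the standard result for jointly Gaussian random vectors is written out below. The symbols (x for the quantity to be estimated, y for the observation, \mu and C for means and covariance matrices) are assumptions made here and may differ from the notation of the lecture.

```latex
% Joint distribution
\begin{bmatrix} x \\ y \end{bmatrix}
\sim \mathcal{N}\!\left(
  \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix},
  \begin{bmatrix} C_x & C_{xy} \\ C_{yx} & C_y \end{bmatrix}
\right)
\\[4pt]
% Conditional mean estimator
\hat{x} = \mathrm{E}[x \mid y] = \mu_x + C_{xy} C_y^{-1} (y - \mu_y)
\\[4pt]
% Conditional covariance and resulting MMSE
C_{x \mid y} = C_x - C_{xy} C_y^{-1} C_{yx},
\qquad
\mathrm{MMSE} = \operatorname{tr}\bigl(C_{x \mid y}\bigr)
```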
- Yes, the estimator in (5.48) is the solution of the integral in (5.34), where 
- If you refer to the last paragraph on slide 
- Slide 
- Given jointly Gaussian random variables 
Example case:
where
Applying (5.48), we obtain the CM estimator
5.5 Orthogonality Principle
Stochastic orthogonality is an inherent property of the conditional mean (CM) estimator. It describes the relation between the CM estimation error and any statistic of the observations:
the CM estimation error is stochastically orthogonal to any statistic of the observations,
where 
Mean Estimation Example
Given
and functionals 
Recall that the MSE optimal estimator is linear in 
Applying the orthogonality principle,
we obtain
The missing parameter 
Finally we get
- Alternative solution: 
We choose 
and apply orthogonality principle:
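The orthogonality property can be checked empirically. The sketch assumes the Gaussian mean model with a zero-mean Gaussian prior of variance 1 and noise variance 4 with N = 4 observations; these values and the function name are hypothetical. For the optimal gain, the empirical correlation between the estimation error and the sample mean should be close to zero, whereas it is clearly nonzero for the plain sample mean.

```python
import random


def error_observation_correlation(gain, var_theta=1.0, var_noise=4.0, n=4,
                                  trials=20000, seed=0):
    """Empirical E[(gain * xbar - theta) * xbar] for the assumed Gaussian model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        theta = rng.gauss(0.0, var_theta ** 0.5)
        xbar = sum(theta + rng.gauss(0.0, var_noise ** 0.5) for _ in range(n)) / n
        total += (gain * xbar - theta) * xbar
    return total / trials


optimal_gain = 1.0 / (1.0 + 4.0 / 4)   # var_theta / (var_theta + var_noise / n)
corr_cm = error_observation_correlation(optimal_gain)
corr_ml = error_observation_correlation(1.0)
print(corr_cm, corr_ml)
```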
IV. Linear Estimation
6. Linear Estimation
We focus on linear models (linear regression)
6.1 Least Squares Estimation
The standard approach for this is based on convexity of the objective function,
Geometrical Perspective

2nd Geometrical Perspective
Using orthogonal projector 
Mean Estimation Example
In order to estimate the mean 
and the LS optimization problem
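A short derivation of the LS solution for this example is sketched below. It assumes the model x = \mathbf{1}\theta + n with the all-ones vector \mathbf{1} of length N; the symbols are chosen here for illustration.

```latex
\hat{\theta}_{\mathrm{LS}}
  = \operatorname*{argmin}_{\theta} \, \lVert x - \mathbf{1}\theta \rVert_2^2
\\[4pt]
% Setting the derivative with respect to theta to zero
-2\,\mathbf{1}^{\mathsf T}\bigl(x - \mathbf{1}\theta\bigr) = 0
\;\;\Longrightarrow\;\;
\hat{\theta}_{\mathrm{LS}}
  = \bigl(\mathbf{1}^{\mathsf T}\mathbf{1}\bigr)^{-1}\mathbf{1}^{\mathsf T} x
  = \frac{1}{N}\sum_{i=1}^{N} x_i
```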
Affine Linear Regression
Given a training set, a linear estimator is defined as
Another linear estimator is
SVD Perspective
6.2 Least Squares Estimation with Regularization (Ridge Regression)
In many real-world applications, LS problems may be ill-conditioned. In such cases, regularization techniques can provide an alternative handling of the problem.
Tikhonov/Ridge regression:
where 
Furthermore,
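The effect of the regularization can be seen in a small example. The sketch below solves the regularized normal equations (H^T H + lam * I) theta = H^T y for a matrix H with two columns; the symbols H, y, lam and the helper name are assumptions made here, and lam = 0 gives the ordinary LS solution.

```python
def ridge_two_columns(H, y, lam):
    """Solve (H^T H + lam * I) theta = H^T y for a matrix H with two columns."""
    a11 = sum(h[0] * h[0] for h in H) + lam
    a12 = sum(h[0] * h[1] for h in H)
    a22 = sum(h[1] * h[1] for h in H) + lam
    b1 = sum(h[0] * yi for h, yi in zip(H, y))
    b2 = sum(h[1] * yi for h, yi in zip(H, y))
    det = a11 * a22 - a12 * a12          # nonzero for lam > 0 or full column rank
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


H = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
y = [2.0, 3.0, 5.0]                      # generated by theta = (2, 3) without noise

theta_ls = ridge_two_columns(H, y, lam=0.0)      # ordinary least squares
theta_ridge = ridge_two_columns(H, y, lam=1.0)   # shrunk towards zero
print(theta_ls, theta_ridge)
```

Increasing lam improves the conditioning of the system at the price of a bias towards zero.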
6.3 Linear Minimum Mean Square Error Estimation
Based on a linear model for the estimator, the LMMSE estimator solves the optimization problem
Given the joint mean values and covariance of the random variables 
we can get the LMMSE estimator by
! apply orthogonality principle
- In the case of zero-mean random variables, the LMMSE estimator is 
- Given that the random variables 
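The LMMSE estimator needs only first and second order moments, which the scalar sketch below illustrates. It assumes x uniform on [0, 1] (deliberately non-Gaussian) and y = x plus Gaussian noise of variance 0.25; the model and the function names are hypothetical. The affine estimator is x_hat = a * y + b with a = c_xy / c_y and b = mu_x - a * mu_y.

```python
import random


def lmmse_coefficients(mu_x, mu_y, c_xy, c_y):
    """Return (a, b) of the affine estimator x_hat = a * y + b for scalar x, y."""
    a = c_xy / c_y
    b = mu_x - a * mu_y
    return a, b


def simulate_mse(a, b, trials=20000, seed=0):
    """Monte Carlo MSE of x_hat = a * y + b for the assumed toy model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.random()                  # x ~ U(0, 1): mean 1/2, variance 1/12
        y = x + rng.gauss(0.0, 0.5)       # noisy observation, noise variance 0.25
        total += (a * y + b - x) ** 2
    return total / trials


a, b = lmmse_coefficients(mu_x=0.5, mu_y=0.5, c_xy=1 / 12, c_y=1 / 12 + 0.25)
mse_lmmse = simulate_mse(a, b)
mse_raw = simulate_mse(1.0, 0.0)          # using the observation y directly
print(a, b, mse_lmmse, mse_raw)
```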
V. Examples
7. Estimator of a Matrix Channel
We consider the estimation of a time-invariant non-dispersive MIMO channel 
We compare three linear estimators:
- MMSE, minimum mean square error estimator 
- ML, maximum likelihood estimator 
- MF, matched filter estimator 
7.1 Channel Model
The task is to find good estimators of the channel coefficients
where 
The training signals consist of 
The estimation of the channel coefficients 
where 
The model for the training channel is
i.e.,
By stacking the column vectors of the matrices, we obtain
! here we use
Further Assumptions
We further assume that the stacked vectors are Gaussian distributed
where
And the channel vector 
Matrix 
is used as an abbreviation for 
Consequently, due to the linear channel model 
with
- In the following, we assume full knowledge of the covariance matrices 
7.2 Three Linear Estimators
1. MMSE (minimum mean square error estimation)
