An Iterative Plug-in Algorithm for Optimal Bandwidth Selection in Kernel Intensity Estimation for Spatial Data
- A popular model for the locations of fibres or grains in composite materials
is the inhomogeneous Poisson process in dimension 3. Its local intensity function
may be estimated non-parametrically by local smoothing, e.g. by kernel
estimates. They crucially depend on the choice of bandwidths as tuning parameters
controlling the smoothness of the resulting function estimate. In this
thesis, we propose a fast algorithm for learning suitable global and local bandwidths
from the data. It is well-known, that intensity estimation is closely
related to probability density estimation. As a by-product of our study, we
show that the difference is asymptotically negligible regarding the choice of
good bandwidths, and, hence, we focus on density estimation.
There are quite a number of data-driven bandwidth selection methods for
kernel density estimates. cross-validation is a popular one and frequently proposed
to estimate the optimal bandwidth. However, if the sample size is very
large, it becomes computational expensive. In material science, in particular,
it is very common to have several thousand up to several million points.
Another type of bandwidth selection is a solve-the-equation plug-in approach
which involves replacing the unknown quantities in the asymptotically optimal
bandwidth formula by their estimates.
In this thesis, we develop such an iterative fast plug-in algorithm for estimating
the optimal global and local bandwidth for density and intensity estimation with a focus on 2- and 3-dimensional data. It is based on a detailed
asymptotics of the estimators of the intensity function and of its second
derivatives and integrals of second derivatives which appear in the formulae
for asymptotically optimal bandwidths. These asymptotics are utilised to determine
the exact number of iteration steps and some tuning parameters. For
both global and local case, fewer than 10 iterations suffice. Simulation studies
show that the estimated intensity by local bandwidth can better indicate
the variation of local intensity than that by global bandwidth. Finally, the
algorithm is applied to two real data sets from test bodies of fibre-reinforced
high-performance concrete, clearly showing some inhomogeneity of the fibre
intensity.