Alternating optimization is an iterative procedure for optimizing some function jointly over all parameters by alternating restricted optimization over individual parameter subsets.
More precisely, consider minimizing or maximizing \(f:\mathbb{R}^n \to \mathbb{R}\) over the set of feasible points \(x \in X \subseteq \mathbb{R}^n\). The underlying algorithm of alternating optimization is as follows:
Assign an initial value for \(x\).
Optimize \(f\) with respect to a subset of parameters \(\tilde{x}\) while holding the other parameters constant1. This can be done explicitly or implicitly (by numerical optimization, which is the implemented method in this package).
Replace the values in \(x\) by the optimal values for \(\tilde{x}\) found in step 2.
Repeat from step 2 with another parameter subset.
Stop when the process has converged or reached an iteration limit.
When the joint optimization is (numerically) difficult (or not feasible).
When there is a natural division of the parameters. That is the case e.g. for likelihood functions, where the parameter space naturally divides into parameter subsets corresponding to linear effects, variances and covariances with different influence on the likelihood value.
To improve optimization time in some cases, see (Hu and Hathaway 2002) for an example.
Compared to joint optimization, alternating optimization may be better in bypassing local optima, see (James C. Bezdek and Hathaway 2002).
Alternating optimization, under certain conditions on \(f\), can convergence to the global optimum. However, the set of possible solutions also contains saddle points of \(f\), see for example (James C. Bezdek et al. 1987).
(J. Bezdek and Hathaway 2003) shows that alternating optimization under reasonable assumptions is locally q-linearly convergent.
As an application, we consider minimizing the Himmelblau’s function in \(n = 2\) dimensions with parameter constraints \(-5 \leq x_1, x_2 \leq 5\):
<- function(x) (x[1]^2+x[2]-11)^2+(x[1]+x[2]^2-7)^2 himmelblau
The ao package requires that the function first is introduced via the ao::set_f()
function in the following way:
<- ao::set_f(f = himmelblau, npar = 2, lower = -5, upper = 5, check = TRUE)
f #> Configuration checked.
The output f
is an object of class ao_f
which can be passed to the implementation of the alternating optimization algorithm ao::ao()
in the following.
ao::set_f()
has the following arguments:
f
: A function to be optimized, returning a single numeric value. Its first argument should be a numeric vector of length npar
. Additional arguments can be specified via the ...
argument. Gradient or Hessian of f
can be specified via attributes gradient
and hessian
for the function value. They are used for optimization if specified.
...
: Additional arguments to be passed to f
.
npar
: The number of variables of f
.
lower
: Lower bounds on the variables, which can be a single numeric value (a joint bound for all parameters) or a numeric vector of length npar
(for individual bounds).
upper
: Upper bounds on the variables, analogue to lower
.
iterlim
: The maximum number of iterations for the numerical optimizer for each sub-problem. No limit per default.
check
: If TRUE
checks the configuration. This will take at most 20 seconds in most cases. Set to FALSE
if you are confident about the configuration to save computation time.
Next, we pass f
to ao::ao()
as follows to perform alternating optimization:
::ao(f = f, partition = list(1, 2), initial = 0, iterations = 10,
aotolerance = 1e-6, minimize = TRUE, progress = FALSE, plot = FALSE)
#> Optimum value: 1.940035e-12
#> Optimum at: 3.584428 -1.848126
#> Optimization time: 0.33 seconds
This does the following: It minimizes f
by alternating optimizing with respect to each parameter separately, where the parameters all are initialized at the value 0. The algorithm terminates after 10 iterations or prematurely if the euclidean distance between the current solution and the one from the last iteration is smaller than tolerance = 1e-6
.2
Let’s also review the arguments of ao::ao()
:
f
: An object of class ao_f
, i.e. the output of ao::set_f()
.
partition
: A list of vectors of parameter indices \(1,...,n\) of the function. For example, choosing partition = list(1, 2)
as in the example optimizes each parameter separately, while choosing partition = list(1:2)
leads to joint optimization. Parameter indices can be members of multiple subsets.
initial
: A vector of length npar
of initial parameter values. Per default, the algorithm is initialized at the origin.
iterations
: The number of iterations through all subsets.
tolerance
: A non-negative numeric value. The function terminates prematurely if the euclidean distance between the current solution and the one from the last iteration is smaller than tolerance
.
minimize
: If TRUE
(the default), minimization, if FALSE
, maximization.
progress
: If TRUE
, progress is printed.
plot
: If TRUE
, the parameter updates are plotted.
Note that alternating optimization is a generalization of joint optimization, where the only parameter subset would be the whole set of parameters.↩︎
We set progress = FALSE
and plot = FALSE
here because setting these parameters to TRUE
would heavily expand the output. However, these options for printing and plotting the progress are useful for analyzing the behaviour of the alternating optimization.↩︎