
Nonlinear conjugate gradient #554

Merged
adler-j merged 13 commits into master from issue-251__nonlinear_cg on Feb 6, 2017

Conversation

@adler-j (Member) commented Aug 31, 2016

Work in progress, but I feel that we need this sooner or later.

@adler-j (Member, Author) commented Sep 25, 2016

This obviously needs PR #587 to land before proceeding.

@adler-j force-pushed the issue-251__nonlinear_cg branch from 6e54df9 to b1cb674 on December 10, 2016
@adler-j (Member, Author) commented Dec 11, 2016

This is now ready for a review. The only remaining task is to move this out of `iterative` and into `scalar`.

@adler-j (Member, Author) commented Jan 3, 2017

Bump for review

@kohr-h (Member) left a comment

Some fixes needed; the method itself and the tests look good, though.

See Also
--------
conjugate_gradient : Optimized solver for symmetric matrices
conjugate_gradient_normal : Equivalent solver but for nonlinear case
Member

normal -> nonlinear

conjugate_gradient : Optimized solver for linear and symmetric case
conjugate_gradient_normal : Equivalent solver but for linear case
"""
# TODO: add a book reference
Member

Should be OK with a Wikipedia link?

Member Author

Agree

Member

As I wrote higher up, we have a book reference for FR: GNS2009

conjugate_gradient_normal : Equivalent solver but for linear case
"""
# TODO: add a book reference
# TODO: update doc
Member

Still TODO, see above.


The method is described in a
`Wikipedia article
<https://en.wikipedia.org/wiki/Nonlinear_conjugate_gradient_method>`_.
Member

Would be nice to have a problem statement.

Member Author

Will get that done, also short word on algorithm.

Member

If you want a book reference, there is one in GNS2009 (at least for what Wikipedia calls the Fletcher-Reeves update), which we already have in our list of references.

Operator in the inverse problem. If not linear, it must have
an implementation of `Operator.derivative`, which
in turn must implement `Operator.adjoint`, i.e.
the call ``op.derivative(x).adjoint`` must be valid.
Member

update

tol : float, optional
Tolerance that should be used for terminating the iteration.
beta_method : {'FR', 'PR', 'HS', 'DY'}
Method to calculate ``beta`` in the iterates. TODO
Member

Seems like a short math section would make sense to explain this.

Member Author

Yeah, I'll add that, but it goes up in Notes, not here, since it would clog this.

Member Author

On second thought, I won't. This can easily be googled, and writing it down here would require quite a few lines.

niter : int
Number of iterations per reset.
nreset : int, optional
Number of times the solver should be reset. Default: no reset.
Member

I find this API confusing. niter or max_iter were always the total number of iterations in some sense, now the total number is (nreset + 1) * niter. Total surprise to me, and not a positive one. (I know the docstring says it, but if you're familiar with an API and pattern matching tells you it's the same here, this tiny change in text goes through totally unnoticed.)
IMO, either change the algorithm to use niter_before_reset = niter // (nreset + 1) and make sure you really run niter in total by adding the remaining ones at the end (I would do that), or rename niter to niter_before_reset or niter_inner vs. niter_outer or something that makes clear it's not the total number (I wouldn't do that because it unnecessarily changes the API).

Member Author

I'll do your first suggestion.
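The suggestion above can be sketched with a hypothetical helper (not the actual ODL code): keep `niter` as the total iteration count, split it evenly over the `nreset + 1` runs, and add the remainder to the final run so exactly `niter` iterations are executed.

```python
# Hypothetical sketch of kohr-h's suggestion: treat `niter` as the total
# number of iterations and split it over the `nreset + 1` runs, putting
# leftover iterations into the final run so exactly `niter` are executed.
def split_iterations(niter, nreset):
    """Return a list with the iteration count for each of nreset + 1 runs."""
    runs = nreset + 1
    base = niter // runs
    counts = [base] * runs
    counts[-1] += niter - base * runs  # remainder goes to the last run
    return counts
```

For example, `split_iterations(10, 2)` gives `[3, 3, 4]`, which still sums to the full 10 iterations.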

elif not callable(line_search):
line_search = ConstantLineSearch(line_search)

for rest_nr in range(nreset + 1):
Member

Iteration variable not used

x.lincomb(1, x, a, dx) # x = x + a * dx

dx_old = dx
s = dx # for 'HS' and 'DY' beta methods
Member

Not only that, you also use them in the search direction update and the line search, no?

def solver(op, x, rhs):
norm2 = op.adjoint(op(x)).norm() / x.norm()
odl.solvers.landweber(op, x, rhs, niter=10, omega=0.5 / norm2)
odl.solvers.landweber(op, x, rhs, niter=50, omega=0.5 / norm2)
Member

reason?

Member Author

It started failing due to some other change.

@aringh (Member) left a comment

Looks great :-). The only critical comment I have is on how to handle the case when we set beta = 0. It can be handled as it is now, but then we are in some sense halfway into "more advanced" resetting options, and nreset becomes only a lower bound.

Right-hand side of the equation defining the inverse problem
niter : int
Number of iterations per reset.
nreset : int, optional
Member

If we are interested, there are more advanced ways to determine when to reset. The one I know about is by Powell from 1977, but maybe there are more recent ones as well: http://link.springer.com/article/10.1007/BF01593790

raise ValueError('unknown ``beta_method``')

# Reset beta if negative.
beta = max(0, beta)
Member

If you take beta = 0, you will effectively reset the conjugate gradient method, since s just becomes the steepest descent direction. I don't know if that actually matters, but we should think through how we want to handle it.

Member Author

I basically got this from Wikipedia; I'm far from an expert. Personally, I'd feel fine with this version for now and improving it later (if anyone wants to use it).

Member

I don't consider myself an expert in the area either ;-) and it is a perfectly fine solution for me (if you look at the reference that Wikipedia uses, it claims convergence of the PR version if this is used). I just wanted to bring to attention that it actually is a restart of the method.
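The restart effect being discussed can be seen in a small hypothetical scalar sketch (not the ODL code): once `beta` is clipped to zero, the update `s = dx + beta * s` reduces to the steepest descent direction `dx`.

```python
# Hypothetical scalar sketch: with dx the NEGATIVE gradient, clipping a
# negative beta to zero turns the CG direction update into a plain
# steepest-descent step, i.e. an implicit restart of the method.
def next_direction(dx, s_old, beta):
    beta = max(0, beta)        # negative beta is discarded
    return dx + beta * s_old   # beta == 0  ->  pure steepest descent
```

For example, `next_direction(1.0, 2.0, -0.5)` returns `1.0` (the restart case), while `next_direction(1.0, 2.0, 0.5)` returns `2.0`.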

@adler-j force-pushed the issue-251__nonlinear_cg branch 2 times, most recently from 442aafe to 2e96668 on January 16, 2017
@adler-j (Member, Author) commented Jan 16, 2017

Fixed the comments and moved the code into the smooth solvers; I guess it needs a re-review.

@aringh (Member) left a comment

Minor comments, except for a potential sign error.

If there is a sign error, it is a bit weird that the tests pass. I have double-checked with the reference on Wikipedia for the DY formula, which contains all four expressions for beta, and it looks like there should indeed be a minus sign for HS and DY.

line_search : float or `LineSearch`, optional
Strategy to choose the step length. If a float is given, uses it as a
fixed step length.
maxiter : int
Member

optional. In cg and cg_normal this is called niter; is there a reason for using a different name?

Member

I guess because the algorithm can terminate early and is not guaranteed to run this number of iterations. Should be cross-checked anyway and changed in the other methods if it applies there, too. I like the distinction.

Member Author

Agree with @kohr-h, that is why. Keeping this

raise TypeError('`x` {!r} is not in the domain of `f` {!r}'
''.format(x, f.domain))

if line_search is None:
Member

The default value is set to 1, but maybe you want to catch this anyway?

Member Author

No, you are correct; it should be an isinstance check, or perhaps a try/except around a float conversion.
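A sketch of what that check could look like (assumed names; `ConstantLineSearch` is the ODL wrapper visible in the diff, stood in for here by a closure):

```python
# Hypothetical sketch of the suggested check: accept either a callable line
# search or a plain number used as a constant step length, and fail early
# with a clear error otherwise.
def normalize_line_search(line_search):
    if callable(line_search):
        return line_search
    try:
        step = float(line_search)
    except (TypeError, ValueError):
        raise TypeError('`line_search` must be callable or convertible '
                        'to float, got {!r}'.format(line_search))
    # Stand-in for ConstantLineSearch(step): ignores its arguments and
    # always returns the fixed step length.
    return lambda *args, **kwargs: step
```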

elif beta_method == 'PR':
beta = dx.inner(dx - dx_old) / dx_old.inner(dx_old)
elif beta_method == 'HS':
beta = dx.inner(dx - dx_old) / s.inner(dx - dx_old)
Member

It looks like there is a minus sign missing

Member Author

Where? What source are you using to see that?

elif beta_method == 'HS':
beta = dx.inner(dx - dx_old) / s.inner(dx - dx_old)
elif beta_method == 'DY':
beta = dx.inner(dx) / s.inner(dx - dx_old)
Member

It looks like there is a minus sign missing

@kohr-h (Member) commented Jan 18, 2017

Minor comments, except for a potential sign error.

If there is a sign error, it is a bit weird that the tests pass. I have double-checked with the reference on Wikipedia for the DY formula, which contains all four expressions for beta, and it looks like there should indeed be a minus sign for HS and DY.

Could also be a sign error in the Wikipedia reference. Looking at the formulas it seems weird that the last two have this ominous minus sign although the rest looks kind of similar to the first two formulas. The simplest thing would be to flip the sign and see what happens.

@aringh (Member) commented Jan 18, 2017

The simplest thing would be to flip the sign and see what happens.

I did, and the tests still pass. Which is a bit troublesome.

The reason the minus signs appear on Wikipedia but not in the reference, which is non-obvious at first glance, is that the Wikipedia page works with delta x, which is -grad, while the reference works with grad directly. But someone should double-check that I haven't made other sign errors when I tried to verify it :-)
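To make the convention concrete, here is a sketch of all four beta formulas in the "dx = -grad" convention, with the minus signs for HS and DY that this thread concluded were missing (plain-Python dot products, hypothetical function names; not the ODL implementation):

```python
# Sketch of the beta updates for nonlinear CG in the convention where
# dx is the NEGATIVE gradient (as on the Wikipedia page). In this
# convention HS and DY carry a leading minus sign, which disappears if
# one works with the gradient directly -- the source of the sign error.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def beta_update(method, dx, dx_old, s):
    """beta for the next search direction s_new = dx + beta * s."""
    y = [a - b for a, b in zip(dx, dx_old)]  # dx - dx_old
    if method == 'FR':    # Fletcher-Reeves
        return dot(dx, dx) / dot(dx_old, dx_old)
    elif method == 'PR':  # Polak-Ribiere
        return dot(dx, y) / dot(dx_old, dx_old)
    elif method == 'HS':  # Hestenes-Stiefel (note the minus sign)
        return -dot(dx, y) / dot(s, y)
    elif method == 'DY':  # Dai-Yuan (note the minus sign)
        return -dot(dx, dx) / dot(s, y)
    raise ValueError('unknown method {!r}'.format(method))
```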

@adler-j (Member, Author) commented Jan 18, 2017

Holding on until the sign error is fixed, I'll have a look. Nice catch.

@adler-j changed the title from "ENH: initial commit for nonlinear conjugate gradient" to "Nonlinear conjugate gradient" on Jan 19, 2017
@adler-j (Member, Author) commented Jan 19, 2017

The sign error was indeed a sign error. Fixed it, and the number of iterations needed went down drastically.

@adler-j (Member, Author) commented Jan 19, 2017

Can I merge this?

@kohr-h (Member) left a comment

Minor stuff; after the fixes it is good for merge from my point of view.

@@ -1,4 +1,4 @@
# Copyright 2014-2016 The ODL development group
# Copyright 2014-2017 The ODL development group
Member

Oh right, we need to change that across the board (not here though).

# You should have received a copy of the GNU General Public License
# along with ODL. If not, see <http://www.gnu.org/licenses/>.

"""Nonlinear version of conjugate gradient."""
Member

the conjugate gradient method

callback=None):
"""Conjugate gradient for nonlinear problems.

Notes
Member

Any reason why this comes before Parameters?

Member Author

We do that in some cases, particularly when the description becomes too "thin" otherwise.

Member

Ok, but in the long run I find it preferable to get directly to the parameters without much ado since that's what users have to look at most frequently. The math explanation is useful once or a couple of times, but later it's just in the way to the parameter reference.
So I prefer the note section after Parameters in almost all cases.


:math:`\min f(x)`

for a differentiable function
Member

functional


The method is described in a
`Wikipedia article
<https://en.wikipedia.org/wiki/Nonlinear_conjugate_gradient_method>`_.
Member

Nice, that's exactly the amount of mathy documentation reasonable for such a docstring.

See Also
--------
bfgs_method : Quasi-newton solver for the same problem
conjugate_gradient : Optimized solver for linear and symmetric case
Member

"least-squares problem with linear and symmetric operator" maybe? Otherwise not so clear what "linear case" means, could also be "linear functional"

--------
bfgs_method : Quasi-newton solver for the same problem
conjugate_gradient : Optimized solver for linear and symmetric case
conjugate_gradient_normal : Equivalent solver but for linear case
Member

accordingly here

def conjugate_gradient_nonlinear(f, x, line_search=1.0, maxiter=1000, nreset=0,
tol=1e-16, beta_method='FR',
callback=None):
"""Conjugate gradient for nonlinear problems.
Member

method
also perhaps "general nonlinear problems" since we're also solving a nonlinear problem in the other case (the least squares problem)

used as starting point of the iteration, and its values are
updated in each iteration step.
line_search : float or `LineSearch`, optional
Strategy to choose the step length. If a float is given, uses it as a
Member

it is used


# Find optimal step along s
dir_derivative = -dx.inner(s)
if abs(dir_derivative) < tol:
Member

Strictly speaking it should be <= here, because "tolerance" is inclusive. It's also relevant if a user sets tol=0, because in some (toy) cases the update could actually be 0, and the method should terminate then.
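The inclusive check being suggested, in isolation (hypothetical helper name):

```python
# With `<=`, tol=0 still terminates when the directional derivative is
# exactly zero, a case that a strict `<` comparison would miss.
def should_terminate(dir_derivative, tol):
    return abs(dir_derivative) <= tol
```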

@adler-j (Member, Author) commented Jan 19, 2017

Ready for merge?

@kohr-h (Member) commented Jan 19, 2017

Ready for merge?

That's for @aringh to decide ;-)

@kohr-h (Member) commented Jan 19, 2017

Perhaps you can also squash the first two commits to get rid of that WIP commit.

@adler-j (Member, Author) commented Jan 19, 2017

Will do once this is accepted

@adler-j (Member, Author) commented Jan 23, 2017

Bump for @aringh's attention.

@aringh (Member) left a comment

Tiny comment. Looks good to me :-)

Number of times the solver should be reset. Default: no reset.
tol : float, optional
Tolerance that should be used for terminating the iteration.
beta_method : {'FR', 'PR', 'HS', 'DY'}
Member

optional. Mention default value?

Member

We do that only if it's not clear from the signature, i.e., when the parameter is in kwargs. Or if the default is None and we need to explain what happens in that case.

The convention for "limited-range" parameters is to name the default first. So I'd say it's fine as it is.

@adler-j (Member, Author) commented Jan 25, 2017

Well, @aringh had this innocent comment about a missing "optional", so I made a script to find issues like this, and needless to say I found something like 100+ errors.

Fixed them all, so this is going to need a further review.

@adler-j force-pushed the issue-251__nonlinear_cg branch 2 times, most recently from e56c7f1 to 7507c52 on January 25, 2017
@aringh (Member) commented Jan 27, 2017

I had a quick look, and it looks good to me :-)

@adler-j force-pushed the issue-251__nonlinear_cg branch from accbad0 to 463a1d3 on February 6, 2017
@adler-j (Member, Author) commented Feb 6, 2017

Merge after CI.

@adler-j merged commit bd9e4c5 into master on Feb 6, 2017
@adler-j deleted the issue-251__nonlinear_cg branch on February 6, 2017