This lecture: All local minima of a wide-enough shallow sigmoid network are global (https://ieeexplore.ieee.org/document/410380). Generalization to deep sigmoid networks (https://arxiv.org/abs/1704.08045).
Slides are the same as in the previous lecture, video.
Next lecture announcement: Spin-glass model (https://arxiv.org/abs/1412.0233, http://proceedings.mlr.press/v40/Choromanska15.pdf). Eliminating local minima (https://arxiv.org/abs/1901.00279).
If we have enough time: Gradient descent almost surely converges to local minima (https://arxiv.org/abs/1602.04915).