
Conversation

@Adhithya-Laxman

Description

This PR implements the Adagrad (Adaptive Gradient) optimizer using pure NumPy as part of the effort to add neural network optimizers to the repository.

This PR addresses part of issue #13662 - Add neural network optimizers module to enhance training capabilities

What does this PR do?

  • Implements Adagrad optimizer that adapts the learning rate for each parameter individually based on historical gradient information
  • Accumulates squared gradients and scales learning rate inversely with the square root of this accumulation
  • Particularly effective for sparse data and features with varying frequencies
  • Provides a clean, educational implementation without external deep learning frameworks

Implementation Details

  • Algorithm: Adagrad (Adaptive Gradient)
  • Update rule (a minimal sketch follows this list):
    accumulated_grad += gradient^2
    adjusted_lr = learning_rate / (sqrt(accumulated_grad) + epsilon)
    param = param - adjusted_lr * gradient
    
  • Key Features:
    • Parameter-specific adaptive learning rates
    • Accumulation of squared gradients over time
    • Epsilon term for numerical stability
  • Pure NumPy: No PyTorch, TensorFlow, or other frameworks required
  • Educational focus: Clear variable names, detailed docstrings, and comments
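
The update rule above, expressed as a minimal NumPy sketch. This is illustrative only; the class name Adagrad and the update signature are assumptions, and the actual code in neural_network/optimizers/adagrad.py may differ:

    import numpy as np


    class Adagrad:
        """Sketch of Adagrad: per-parameter adaptive learning rates."""

        def __init__(self, learning_rate: float = 0.01, epsilon: float = 1e-8) -> None:
            self.learning_rate = learning_rate
            self.epsilon = epsilon
            self.accumulated_grad: np.ndarray | None = None

        def update(self, params: np.ndarray, gradients: np.ndarray) -> np.ndarray:
            # Lazily create the squared-gradient accumulator with the parameter shape.
            if self.accumulated_grad is None:
                self.accumulated_grad = np.zeros_like(params)
            # Accumulate squared gradients over all updates seen so far.
            self.accumulated_grad += gradients**2
            # Scale the learning rate inversely with the root of the accumulation.
            adjusted_lr = self.learning_rate / (np.sqrt(self.accumulated_grad) + self.epsilon)
            return params - adjusted_lr * gradients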

Features

✅ Complete docstrings with parameter descriptions
✅ Type hints for all function parameters and return values
✅ Doctests for correctness validation (an illustrative example follows this list)
✅ Usage example demonstrating optimizer on quadratic function minimization
✅ PEP8 compliant code formatting
✅ Accumulated gradient tracking per parameter
✅ Numerical stability with epsilon parameter
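
For illustration, a doctest in the module's docstring could look roughly like the one below; it reuses the hypothetical Adagrad class from the sketch above, not necessarily the exact API in this PR:

    >>> import numpy as np
    >>> optimizer = Adagrad(learning_rate=0.1)  # hypothetical class from the sketch above
    >>> params = np.array([1.0, 2.0])
    >>> gradients = np.array([0.5, -0.5])
    >>> np.round(optimizer.update(params, gradients), 4)
    array([0.9, 2.1])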

Testing

All doctests pass:

python -m doctest neural_network/optimizers/adagrad.py -v

Linting passes:

ruff check neural_network/optimizers/adagrad.py

Example output demonstrates proper convergence behavior, with learning rates automatically adapting for each parameter.
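
For reference, the quadratic-minimization usage example could look roughly like this sketch, which again assumes the hypothetical Adagrad class shown earlier rather than the PR's exact code:

    import numpy as np

    # Minimize f(x) = sum((x - target) ** 2); its gradient is 2 * (x - target).
    target = np.array([3.0, -2.0])
    params = np.zeros(2)
    optimizer = Adagrad(learning_rate=0.5)  # hypothetical class from the sketch above

    for _ in range(200):
        gradients = 2 * (params - target)
        params = optimizer.update(params, gradients)

    print(params)  # converges toward [3.0, -2.0]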

References

  • Wikipedia: AdaGrad (in the article on stochastic gradient descent): https://en.wikipedia.org/wiki/Stochastic_gradient_descent#AdaGrad
  • Duchi, Hazan, and Singer (2011), "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization", JMLR 12

Relation to Issue #13662

This PR is part of the planned optimizer sequence outlined in #13662; the remaining optimizers in that sequence will follow in separate PRs (see Next Steps below).

Why Adagrad?

Adagrad is particularly useful for:

  • Training on sparse data (e.g., NLP tasks)
  • Handling features that appear with different frequencies
  • Automatic learning rate adaptation without manual tuning
  • Gradually damping updates to frequently updated parameters, since their accumulated squared gradients shrink the effective learning rate (see the sketch below)
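
The last point can be seen numerically with the update rule alone: a parameter that receives a gradient on every step accumulates a large squared-gradient sum, so its effective learning rate shrinks much faster than that of a rarely updated (sparse) parameter. A small sketch, using only the accumulation formula:

    import numpy as np

    learning_rate, epsilon = 0.1, 1e-8
    accumulated = np.zeros(2)

    for step in range(1, 101):
        # Parameter 0 gets a gradient on every step (frequent feature);
        # parameter 1 only on every tenth step (sparse feature).
        gradient = np.array([1.0, 1.0 if step % 10 == 0 else 0.0])
        accumulated += gradient**2

    effective_lr = learning_rate / (np.sqrt(accumulated) + epsilon)
    print(effective_lr)  # roughly [0.01, 0.0316]: the frequent parameter's rate has decayed far more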

Checklist

  • I have read CONTRIBUTING.md
  • This pull request is all my own work -- I have not plagiarized
  • I know that pull requests will not be merged if they fail the automated tests
  • This PR only changes one algorithm file
  • All new Python files are placed inside an existing directory
  • All filenames are in all lowercase characters with no spaces or dashes
  • All functions and variable names follow Python naming conventions
  • All function parameters and return values are annotated with Python type hints
  • All functions have doctests that pass the automated testing
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation

Next Steps

Additional optimizers (NAG, Adam, Muon) will be submitted in follow-up PRs to maintain focused, reviewable contributions as outlined in issue #13662.


Related: Part of #13662

- Implements Adagrad (Adaptive Gradient) using pure NumPy
- Adapts learning rate individually for each parameter
- Includes comprehensive docstrings and type hints
- Adds doctests for validation
- Provides usage example demonstrating convergence
- Follows PEP8 coding standards
- Part of issue TheAlgorithms#13662