Enhance MAIA2 with full mypy type annotations, docstrings, and code clarity improvements #11

Uspectacle · 2025-11-08T15:09:28Z

👋 Hello 😄
This is my first time adding full mypy typing to a library, so I’m excited to share this and would really appreciate your feedback.

This PR introduces comprehensive type annotations, detailed docstrings, and general code quality improvements across the MAIA2 codebase.

The main goals are:

✅ Type safety — explicit, accurate type hints throughout the code.
✅ Readability — improved clarity through standardized documentation.
✅ Maintainability — clearer interfaces, better IDE and static analysis support.

All changes are fully backward compatible and carefully verified for correctness and consistency.

While I used an LLM to assist with the initial documentation and typing, I have personally reviewed, corrected, and validated every annotation and edit to ensure top-quality results.

If direct typing updates are not preferred, I’ve also prepared an alternative approach with stubs and .pyi files on another branch:
🔗 https://github.com/Uspectacle/maia2/tree/stubs

I personally believe that typing the codebase directly provides stronger long-term benefits, but I’m happy to adapt to the project’s preferred approach.

In the meantime, I’ve published the stubs separately here for convenience:
🔗 https://github.com/Uspectacle/maia2-stubs

General Improvements (all files)

Added module-level, class, and function docstrings.
Imported typing and used annotations for all functions and classes.
Removed unused imports.
Renamed unused variables with _.
Added comments to improve readability.
Applied ruff formatting and fixed all pylint and mypy warnings.

Breaking potential:

Minimal; almost only docstring and type annotation changes.

dataset.py

Ignored gdown import (not typed).
Moved constants to top: DEFAULT_SAVE_ROOT, TEST_DATASET_URL, TRAIN_DATASET_URL.
Simplified existence checks: if not os.path.exists(save_root).
Renamed evolving variable data → filtered_data in load_example_test_dataset.

Breaking potential:

Renaming data → filtered_data.
Constant names changed; users relying on inline URLs must update.

inference.py

Imported packages individually instead of bulk import.
preprocessing(): legal_moves cast to torch.float32.
get_preds(): used enumerate for loops.
inference_batch():
- highest_prob_move now unpacks (key, value) from max().
- Renamed acc → accuracy.
inference_each(): renamed elo_self → elo_self_tensor and elo_oppo → elo_oppo_tensor.

Breaking potential:

Variable renames (elo_self → elo_self_tensor, acc → accuracy) could affect external code referencing old names.
Users depending on tuple unpacking from prepare() may not notice change; tuple return recommended in future.

main.py

Disabled pylint:too-many-lines.
Imported packages individually.
process_per_game(): checked clock_info is not None before threshold comparison.
game_filter(): renamed white_elo → white_elo_str, black_elo → black_elo_str; returns None when no result.
process_per_chunk(): simplified if len(ret_per_game) > 0.
MAIA2Dataset.__getitem__(): renamed board_input → board_input_tensor.
Cast tqdm in evaluation functions (evaluate_MAIA1_data, train_chunks).
Cast loss_maia to loss in train_chunks().

Breaking potential:

Renamed variables may break user code depending on previous names.
None returns in game_filter() may require additional handling.

model.py

Ignored gdown import.
Moved constants to top: DEFAULT_SAVE_ROOT, CONFIG_URL, MODEL_URLS.
from_pretrained():
- Renamed illegal Python variable type → model_type.
- Simplified os.path.exists checks.
- Renamed evolving variable model → maia2_model and model_module.

Breaking potential:

model renaming could break code using direct access to the old variable.

train.py

Renamed model → maia2_model.
Renamed N_params → n_params for PEP8 compliance.

Breaking potential:

Minor; variable renames may require user updates.

utils.py

Moved constants to top: ELO_INTERVAL, ELO_START, ELO_END, PIECE_TYPES.
Config default values provided (arbitrary choices).
parse_args(): added encoding="utf-8" for file reading.
get_side_info(): added assert moving_piece is not None to avoid NoneType errors.
Renamed variables to avoid shadowing outer scope (chunks → chunk_list, all_moves → all_moves_uci).

Breaking potential:

Renamed variables could break user code that relies on old names.

Other Notes

All type annotations checked with mypy.
All formatting issues resolved with ruff.
prepare() is marked for future improvement: ideally returning a Tuple instead of a List for proper type safety.
Minor runtime behavior changes: mostly variable renames and type casts.

This PR introduces extensive type annotations, docstrings, and code improvements throughout the MAIA2 codebase. All changes have been made with minimal impact on runtime behavior and backward compatibility, though some evolving-type variable renames and functional clarifications may affect code that relies on previous names.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Enhance MAIA2 with full mypy type annotations, docstrings, and code clarity improvements #11

Enhance MAIA2 with full mypy type annotations, docstrings, and code clarity improvements #11

Uh oh!

Uspectacle commented Nov 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Enhance MAIA2 with full mypy type annotations, docstrings, and code clarity improvements #11

Are you sure you want to change the base?

Enhance MAIA2 with full mypy type annotations, docstrings, and code clarity improvements #11

Uh oh!

Conversation

Uspectacle commented Nov 8, 2025

General Improvements (all files)

dataset.py

inference.py

main.py

model.py

train.py

utils.py

Other Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant