Enhance MAIA2 with full mypy type annotations, docstrings, and code clarity improvements #11
👋 Hello 😄
This is my first time adding full mypy typing to a library, so I’m excited to share this and would really appreciate your feedback.
This PR introduces comprehensive type annotations, detailed docstrings, and general code quality improvements across the MAIA2 codebase.
The main goals are:
All changes are fully backward compatible and carefully verified for correctness and consistency.
While I used an LLM to assist with the initial documentation and typing, I have personally reviewed, corrected, and validated every annotation and edit to ensure top-quality results.
If direct typing updates are not preferred, I’ve also prepared an alternative approach with stubs and `.pyi` files on another branch: 🔗 https://github.com/Uspectacle/maia2/tree/stubs
I personally believe that typing the codebase directly provides stronger long-term benefits, but I’m happy to adapt to the project’s preferred approach.
In the meantime, I’ve published the stubs separately here for convenience:
🔗 https://github.com/Uspectacle/maia2-stubs
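As a rough illustration of the difference between the two approaches, here is a minimal sketch. `bucket_elo` is a hypothetical helper invented for this example and is not a function from MAIA2. With inline typing the annotations live on the function itself, while a stub file declares the same signature separately and is read only by type checkers.

```python
from typing import get_type_hints


def bucket_elo(elo: int, interval: int = 100) -> int:
    """Hypothetical helper: round a rating down to the start of its bucket."""
    return (elo // interval) * interval


# A stub-based approach would leave the function above unannotated and ship
# a separate .pyi file containing only the signature, for example:
#
#     def bucket_elo(elo: int, interval: int = ...) -> int: ...
#
# Inline annotations are also available at runtime, which stubs are not:
hints = get_type_hints(bucket_elo)
print(hints)
```

Either form gives `mypy` the same information; the inline form simply keeps the signature and the implementation in one place.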
General Improvements (all files)
- `typing` and used annotations for all functions and classes.
- `_`.
- `ruff` formatting and fixed all `pylint` and `mypy` warnings.
Breaking potential:
dataset.py
- Ignored `gdown` import (not typed).
- `DEFAULT_SAVE_ROOT`, `TEST_DATASET_URL`, `TRAIN_DATASET_URL`.
- `if not os.path.exists(save_root)`.
- `data` → `filtered_data` in `load_example_test_dataset`.
Breaking potential:
- `data` → `filtered_data`.

inference.py
- Imported packages individually instead of bulk import.
- `preprocessing()`: `legal_moves` cast to `torch.float32`.
- `get_preds()`: used `enumerate` for loops.
- `inference_batch()`: `highest_prob_move` now unpacks `(key, value)` from `max()`.
- `acc` → `accuracy`.
- `inference_each()`: renamed `elo_self` → `elo_self_tensor` and `elo_oppo` → `elo_oppo_tensor`.
Breaking potential:
- Renames (`elo_self` → `elo_self_tensor`, `acc` → `accuracy`) could affect external code referencing old names.
- `prepare()` may not notice change; tuple return recommended in future.

main.py
- `pylint`: `too-many-lines`.
- `process_per_game()`: checked `clock_info is not None` before threshold comparison.
- `game_filter()`: renamed `white_elo` → `white_elo_str`, `black_elo` → `black_elo_str`; returns `None` when no result.
- `process_per_chunk()`: simplified `if len(ret_per_game) > 0`.
- `MAIA2Dataset.__getitem__()`: renamed `board_input` → `board_input_tensor`.
- `tqdm` in evaluation functions (`evaluate_MAIA1_data`, `train_chunks`).
- `loss_maia` to `loss` in `train_chunks()`.
Breaking potential:
- `None` returns in `game_filter()` may require additional handling.

model.py
- Ignored `gdown` import.
- Moved constants to top: `DEFAULT_SAVE_ROOT`, `CONFIG_URL`, `MODEL_URLS`.
- `from_pretrained()`: `type` → `model_type`.
- `os.path.exists` checks.
- `model` → `maia2_model` and `model_module`.
Breaking potential:
- `model` renaming could break code using direct access to the old variable.

train.py
- `model` → `maia2_model`.
- `N_params` → `n_params` for PEP8 compliance.
Breaking potential:
utils.py
- `ELO_INTERVAL`, `ELO_START`, `ELO_END`, `PIECE_TYPES`.
- `parse_args()`: added `encoding="utf-8"` for file reading.
- `get_side_info()`: added `assert moving_piece is not None` to avoid `NoneType` errors.
- Renames (`chunks` → `chunk_list`, `all_moves` → `all_moves_uci`).
Breaking potential:
Other Notes
- `mypy`.
- `ruff`.
- `prepare()` is marked for future improvement: ideally returning a `Tuple` instead of a `List` for proper type safety.
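To show why a `Tuple` return would type-check more precisely than a `List`, here is a minimal sketch. The two functions and the type aliases below are hypothetical stand-ins and do not reproduce the real `prepare()` or the values it returns. They only contrast how a type checker treats the two return styles.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical stand-ins for the kinds of objects a setup function might return.
MoveIndex = Dict[str, int]
EloBuckets = List[int]


def prepare_as_list() -> List[Union[MoveIndex, EloBuckets]]:
    """List return: every element shares one union type."""
    return [{"e2e4": 0, "d2d4": 1}, [1100, 1200, 1300]]


def prepare_as_tuple() -> Tuple[MoveIndex, EloBuckets]:
    """Tuple return: each position carries its own precise type."""
    return {"e2e4": 0, "d2d4": 1}, [1100, 1200, 1300]


# List version: a type checker sees both names as the union type, so callers
# need isinstance checks (or casts) before using them.
moves_l, elos_l = prepare_as_list()
assert isinstance(moves_l, dict) and isinstance(elos_l, list)

# Tuple version: unpacking yields precisely typed names, no narrowing needed.
moves_t, elos_t = prepare_as_tuple()
first_bucket = elos_t[0]
e4_index = moves_t["e2e4"]
print(first_bucket, e4_index)
```

Callers that only unpack the result work the same way with either return type; code that indexes into the result and mutates it, for example with `append`, would need updating after a switch to a tuple.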