
Conversation

@rogeryoungh

What does this PR do?

This PR adds the MiniMax-M2 model from MiniMaxAI to Hugging Face Transformers.

Relevant Links:

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@ArthurZucker @Cyrilvallez

xuebi added 6 commits October 31, 2025 14:17
Signed-off-by: xuebi <xuebi@minimaxi.com> (same sign-off on all 6 commits)
@github-actions

github-actions bot commented Nov 5, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, minimax_m2

Signed-off-by: xuebi <xuebi@minimaxi.com>

@molbap molbap left a comment


Very clean integration, no particular comments. Thank you! cc @Cyrilvallez for core review


## Overview

MiniMax-M2 redefines efficiency for agents. It's a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence. With just 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool use performance expected from today's leading models, but in a streamlined form factor that makes deployment and scaling easier than ever.

Changed a bit to be more factual

Suggested change
MiniMax-M2 redefines efficiency for agents. It's a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence. With just 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool use performance expected from today's leading models, but in a streamlined form factor that makes deployment and scaling easier than ever.
MiniMax-M2 is a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence. With just 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool use performance expected from today's leading models, but in a streamlined form factor that makes deployment and scaling easier than ever.
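For readers who want to try the model once this PR is merged, here is a minimal generation sketch using the standard Auto classes. The Hub id `MiniMaxAI/MiniMax-M2`, the dtype, and the device settings are assumptions for illustration, not something fixed by this PR.

```python
# Minimal sketch, assuming the checkpoint is published as MiniMaxAI/MiniMax-M2
# and that this PR's MiniMaxM2ForCausalLM is wired into the Auto classes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-M2"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("Write a Python function that reverses a string.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```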

Comment on lines +139 to +157
    keys_to_ignore_at_inference = ["past_key_values"]
    base_model_tp_plan = {
        "layers.*.self_attn.q_proj": "colwise",
        "layers.*.self_attn.k_proj": "colwise",
        "layers.*.self_attn.v_proj": "colwise",
        "layers.*.self_attn.o_proj": "rowwise",
        "layers.*.block_sparse_moe.gate": "colwise_rep",  # we need to replicate here to correctly route experts
        "layers.*.block_sparse_moe.experts.*.w1": "colwise",
        "layers.*.block_sparse_moe.experts.*.w2": "rowwise",
        "layers.*.block_sparse_moe.experts.*.w3": "colwise",
    }
    base_model_pp_plan = {
        "embed_tokens": (["input_ids"], ["inputs_embeds"]),
        "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
        "norm": (["hidden_states"], ["hidden_states"]),
    }
    attribute_map = {
        "num_experts": "num_local_experts",
    }

For further reviews: this part is identical to the MiniMax1 configuration, but inheriting that configuration would require deleting a bunch of keys (full_attn_beta_factor and so on), so it's OK to keep.
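Side note on the `attribute_map` entry above: in Transformers, a config-level `attribute_map` aliases attribute access, so `config.num_experts` resolves to `config.num_local_experts`. A stripped-down illustration of that aliasing (a toy class, not the actual `PretrainedConfig` implementation):

```python
# Simplified stand-in for how a config-level attribute_map alias behaves;
# the real logic lives in transformers.PretrainedConfig.
class ToyConfig:
    attribute_map = {"num_experts": "num_local_experts"}

    def __init__(self, num_local_experts: int = 8):
        self.num_local_experts = num_local_experts

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, i.e. for aliases.
        target = type(self).attribute_map.get(name)
        if target is not None:
            return getattr(self, target)
        raise AttributeError(name)


config = ToyConfig(num_local_experts=32)
print(config.num_experts)  # 32, read through the "num_experts" alias
```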

Comment on lines +238 to +239
class MiniMaxM2MLP(MixtralMLP):
pass

should be safe to remove

Suggested change
class MiniMaxM2MLP(MixtralMLP):
pass

Comment on lines +347 to +350

class MiniMaxM2DecoderLayer(MixtralDecoderLayer):
pass


safe to delete as well

Suggested change
class MiniMaxM2DecoderLayer(MixtralDecoderLayer):
pass
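To make the "safe to remove/delete" reasoning concrete: a subclass whose body is only `pass` adds no behavior of its own, so the shim exists purely for the name. A generic plain-Python illustration (whether the modular conversion tooling still emits a renamed copy is up to that tooling and not shown here):

```python
# A pass-through subclass behaves exactly like its parent; only the name
# differs. That is why an empty shim such as
# `class MiniMaxM2DecoderLayer(MixtralDecoderLayer): pass` carries no logic.
class Parent:
    def forward(self, x):
        return x * 2


class Child(Parent):  # analogous to the empty MiniMaxM2 shims above
    pass


assert Child().forward(3) == Parent().forward(3) == 6
```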
