Skip to content

Conversation

@tbg
Copy link
Member

@tbg tbg commented Nov 5, 2025

This introduces the Status struct and hooks it up to the TestClusterState test.
It doesn't yet attach any actual semantics to the new member of storeStatus
that is being added as part of doing so. This will happen in follow-up commits,
loosely replacing the integration points removed in #156933.

Touches #156776.

Epic: CRDB-55052

@tbg tbg requested review from sumeerbhola and wenyihu6 November 5, 2025 14:27
@tbg tbg requested review from a team as code owners November 5, 2025 14:27
@blathers-crl
Copy link

blathers-crl bot commented Nov 5, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@sumeerbhola reviewed 2 of 2 files at r8, 1 of 5 files at r9, 6 of 6 files at r10, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @wenyihu6)


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 48 at r10 (raw file):

	// receive leases.
	LeaseDispositionOK LeaseDisposition = iota
	// LeaseDispositionRefusing dennotes that the store should not be considered

nit: denotes


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 51 at r10 (raw file):

	// as a target for leases.
	LeaseDispositionRefusing
	// LeaseDispositionShedding dennotes that the store should actively be

ditto


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 60 at r10 (raw file):

	// ReplicaDispositionRefusing dennotes that the store can receive replicas.
	ReplicaDispositionOK ReplicaDisposition = iota
	// ReplicaDispositionRefusing dennotes that the store should not be considered as

ditto


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 63 at r10 (raw file):

	// a target for replica placement.
	ReplicaDispositionRefusing
	// ReplicaDispositionShedding dennotes that the store should actively be

ditto


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 100 at r10 (raw file):

// are necessarily used by it yet.
//
// INVARIANT: if Health != HealthOK, Disposition.{Lease,Replica} = Shedding.

I wonder whether we should have one enum type with Refusing, Shedding, Ok.

Copy link
Contributor

@wenyihu6 wenyihu6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: mod Sumeer’s comments. Maybe just due to unfamiliarity, I found this much easier to understand than the existing allocator’s StoreFilter.

Maybe this is too far into the future - do we have a preference for whether we want to preserve the best vs. good selector logic if mma starts doing upreplication https://github.com/cockroachdb/cockroach/blob/master/pkg/kv/kvserver/allocator/allocatorimpl/allocator.go#L1328-L1332?

Not needed right now, but in the future it might be useful to review how filtering is done

candidateStoreList, aliveStoreCount, throttled := storePool.GetStoreList(storepool.StoreFilterThrottled)
and https://github.com/cockroachdb/cockroach/blob/master/pkg/kv/kvserver/allocator/storepool/store_pool.go#L274 and write a comment on the mapping from the existing allocator to mma behaviour to make sure the behavior remains consistent.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 116 at r8 (raw file):

func (ss *Status) assert() error {
	if ss.Health < 0 || ss.Health > healthCount {

Why not >= for this sentinel?

Copy link
Contributor

@wenyihu6 wenyihu6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 62 at r10 (raw file):

	// ReplicaDispositionRefusing dennotes that the store should not be considered as
	// a target for replica placement.
	ReplicaDispositionRefusing

Super nit so feel free to ignore - I find reject to be clearer to me than refuse.

tbg added 3 commits November 6, 2025 14:46
It is included in the Stringer, but not otherwise hooked up
to anything.
This allows TestClusterState to update the store status.
It continues to be true that no semantics are attached to the store status at
this point.
@tbg tbg force-pushed the mma-storehealth branch from 8aa13b8 to 3e61da3 Compare November 6, 2025 13:47
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is too far into the future - do we have a preference for whether we want to preserve the best vs. good selector logic if mma starts doing upreplication https://github.com/cockroachdb/cockroach/blob/master/pkg/kv/kvserver/allocator/allocatorimpl/allocator.go#L1328-L1332?

Looking at what good and best mean in this context:

https://github.com/cockroachdb/cockroach/blob/4b36efa5390b11bb3df65e5bf54d44ea865dd17e/pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go#L1033-L1070

https://github.com/cockroachdb/cockroach/blob/4b36efa5390b11bb3df65e5bf54d44ea865dd17e/pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go#L1132-L1168

this seems like a higher level decision anyway. We will need something that picks from a set of candidates based on the Status, the constraint conformance, and the storeLoadSummary. In the worst case, we can introduce a score-based ranking, though I hope we're not going to end up with a balance and diversity score as they get me confused every time because they do very different things depending on the context and are hard to remember the semantics and purpose of.

Not needed right now, but in the future it might be useful to review how filtering is done

Yes, I was planning on looking into this soon. For now I am thinking (though this plan may hit some snag when I try to do it) to rely mostly on the store pool directly (vs its input signals). That is, whenever we want to update the MMA's view of stores' statuses, we look at the store pool storeStatuses:

https://github.com/cockroachdb/cockroach/blob/4b36efa5390b11bb3df65e5bf54d44ea865dd17e/pkg/kv/kvserver/allocator/storepool/store_pool.go#L1398

@tbg dismissed @sumeerbhola and @wenyihu6 from 6 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @sumeerbhola and @wenyihu6)


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 116 at r8 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

Why not >= for this sentinel?

Because silly bug, thanks for catching that. Fixed.


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 51 at r10 (raw file):

Previously, sumeerbhola wrote…

ditto

Done.


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 60 at r10 (raw file):

Previously, sumeerbhola wrote…

ditto

Done.


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 62 at r10 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

Super nit so feel free to ignore - I find reject to be clearer to me than refuse.

Done.


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 63 at r10 (raw file):

Previously, sumeerbhola wrote…

ditto

Done.


pkg/kv/kvserver/allocator/mmaprototype/store_status.go line 100 at r10 (raw file):

Previously, sumeerbhola wrote…

I wonder whether we should have one enum type with Refusing, Shedding, Ok.

Mhm, I think that could lead to situations in which it isn't clear which of the purposes we're comparing. I want as much type system support as possible. I'll leave as is for now.

@tbg
Copy link
Member Author

tbg commented Nov 6, 2025

bors r+
TFTR!

@craig
Copy link
Contributor

craig bot commented Nov 6, 2025

@craig craig bot merged commit fa99540 into cockroachdb:master Nov 6, 2025
23 of 25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants