-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add more metrics for reindex #137597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add more metrics for reindex #137597
Changes from 1 commit
f11a9c7
290fa48
6566a55
70ed46d
978f34c
84afac9
26abb9d
ec63e05
aa955f7
83282af
0a4e911
be783e6
03e2018
5e07f7c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,19 +15,66 @@ | |
| public class ReindexMetrics { | ||
|
|
||
| public static final String REINDEX_TIME_HISTOGRAM = "es.reindex.duration.histogram"; | ||
| // metrics for remote reindex should be a subset of the all metrics | ||
| public static final String REINDEX_TIME_HISTOGRAM_REMOTE = "es.reindex.duration.histogram.remote"; | ||
| public static final String REINDEX_SUCCESS_HISTOGRAM = "es.reindex.completion.success"; | ||
| public static final String REINDEX_SUCCESS_HISTOGRAM_REMOTE = "es.reindex.completion.success.remote"; | ||
| public static final String REINDEX_FAILURE_HISTOGRAM = "es.reindex.completion.failure"; | ||
| public static final String REINDEX_FAILURE_HISTOGRAM_REMOTE = "es.reindex.completion.failure.remote"; | ||
|
|
||
| private final LongHistogram reindexTimeSecsHistogram; | ||
| private final LongHistogram reindexTimeSecsHistogramRemote; | ||
| private final LongHistogram reindexSuccessHistogram; | ||
| private final LongHistogram reindexSuccessHistogramRemote; | ||
| private final LongHistogram reindexFailureHistogram; | ||
| private final LongHistogram reindexFailureHistogramRemote; | ||
samxbr marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| public ReindexMetrics(MeterRegistry meterRegistry) { | ||
| this(meterRegistry.registerLongHistogram(REINDEX_TIME_HISTOGRAM, "Time to reindex by search", "seconds")); | ||
| } | ||
| this.reindexTimeSecsHistogram = meterRegistry.registerLongHistogram(REINDEX_TIME_HISTOGRAM, "Time to reindex by search", "seconds"); | ||
| this.reindexTimeSecsHistogramRemote = meterRegistry.registerLongHistogram( | ||
| REINDEX_TIME_HISTOGRAM_REMOTE, | ||
| "Time to reindex by search from remote cluster", | ||
| "seconds" | ||
| ); | ||
|
|
||
| this.reindexSuccessHistogram = meterRegistry.registerLongHistogram( | ||
| REINDEX_SUCCESS_HISTOGRAM, | ||
| "Number of successful reindex", | ||
| "unit" | ||
| ); | ||
| this.reindexSuccessHistogramRemote = meterRegistry.registerLongHistogram( | ||
| REINDEX_SUCCESS_HISTOGRAM_REMOTE, | ||
| "Number of successful reindex from remote cluster", | ||
| "unit" | ||
| ); | ||
|
|
||
| private ReindexMetrics(LongHistogram reindexTimeSecsHistogram) { | ||
| this.reindexTimeSecsHistogram = reindexTimeSecsHistogram; | ||
| this.reindexFailureHistogram = meterRegistry.registerLongHistogram(REINDEX_FAILURE_HISTOGRAM, "Number of failed reindex", "unit"); | ||
| this.reindexFailureHistogramRemote = meterRegistry.registerLongHistogram( | ||
| REINDEX_FAILURE_HISTOGRAM_REMOTE, | ||
| "Number of failed reindex from remote cluster", | ||
| "unit" | ||
| ); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't this be a
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought of using Counter, my understanding is that Counter accumulates over time, which is good if we only want to track the the total amount at any given time. Whereas using histogram has the advantage of aggregating over arbitrary period of time.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, a counter accumulates over time. In my experience, you then use it by calculating a rate over some period of time e.g. 15mins. That's considered best practice when using Prometheus, for example. I don't know whether the same thing works in Kibana. It relies on it being cheap and easy to compute rates. We should maybe ask around for advice — either within the team or with some subject expert.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess the other option is to ditch this metric and just have a single histogram metric for the durations with the error attribute as well as the source attribute? That would mean combining the listeners, I haven't checked whether that's doable.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried to find some existing Counter metrics in Kibana, and found the emitted values are not accumulating, which makes me wonder if my understanding on Counter is actually correct. I have put the details in this thread, hopefully someone can help answer there.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks @samxbr. |
||
| } | ||
|
|
||
| public long recordTookTime(long tookTime) { | ||
| public long recordTookTime(long tookTime, boolean remote) { | ||
| reindexTimeSecsHistogram.record(tookTime); | ||
| if (remote) { | ||
| reindexTimeSecsHistogramRemote.record(tookTime); | ||
| } | ||
| return tookTime; | ||
| } | ||
|
|
||
| public void recordSuccess(boolean remote) { | ||
| reindexSuccessHistogram.record(1L); | ||
| if (remote) { | ||
| reindexSuccessHistogramRemote.record(1L); | ||
| } | ||
| } | ||
|
|
||
| public void recordFailure(boolean remote) { | ||
| reindexFailureHistogram.record(1L); | ||
| if (remote) { | ||
| reindexFailureHistogramRemote.record(1L); | ||
| } | ||
| } | ||
| } | ||
Uh oh!
There was an error while loading. Please reload this page.