-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Add exponential histogram percentile function #137553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @Evaluator(warnExceptions = ArithmeticException.class) | ||
| static void process(DoubleBlock.Builder resultBuilder, ExponentialHistogram value, double percentile) { | ||
| if (percentile < 0.0 || percentile > 100.0) { | ||
| throw new ArithmeticException("Percentile value must be in the range [0, 100], got: " + percentile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Is it a good or a bad practice to include the invalid value in the warning message? E.g. Asin doesn't include the wrong value in its warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the percentile is a constant with the integration, we should verify it and reject invalid values instead. But this should be okay now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC you mean that if someone writes a query like STATS PERCENTILE(my_histo, 200) the query shouldn't even get to the execution phase, but we should reject it with an error.
Am I understanding this correctly?
I unfortunately don't know what the correct callback / hook would be to perform this verification. I looked in the PERCENTILE aggregation and the POW function for examples, but they don't seem to be doing this kind of verification either.
If you could provide me with an example on how to do this, I can add it in a follow-up.
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
dnhatn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if a separate function is required, but the implementation looks good. Let's merge it and move to the next step. Thanks, Jonas
| @Evaluator(warnExceptions = ArithmeticException.class) | ||
| static void process(DoubleBlock.Builder resultBuilder, ExponentialHistogram value, double percentile) { | ||
| if (percentile < 0.0 || percentile > 100.0) { | ||
| throw new ArithmeticException("Percentile value must be in the range [0, 100], got: " + percentile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the percentile is a constant with the integration, we should verify it and reject invalid values instead. But this should be okay now.
.../elasticsearch/xpack/esql/expression/function/scalar/histogram/HistogramPercentileTests.java
Outdated
Show resolved
Hide resolved
Having this as a separate function helps on two fronts:
|
…-json
* upstream/main:
Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691
Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690
[LTR] Fix feature display order when using explain. (elastic#137671)
Remove extra RemoteClusterService instances in unit test (elastic#137647)
Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669)
Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233)
Update bundled JDK to 25.0.1 (elastic#137640)
resolve indices for prefixed _all expressions (elastic#137330)
ESQL: Add TopN support for exponential histograms (elastic#137313)
allows field caps to be cross project (elastic#137530)
ESQL: Add exponential histogram percentile function (elastic#137553)
Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636)
Tighten on when THROTTLE decision can be returned (elastic#136794)
Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655
Add a test for two little known conditional processor paths (elastic#137645)
Extract a common ORIGIN constant (elastic#137612)
Remove early phase failure in batched (elastic#136889)
Returning correct index mode from get data streams api (elastic#137646)
[ML] Manage AD results indices (elastic#136065)
…-json
* upstream/main:
Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691
Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690
[LTR] Fix feature display order when using explain. (elastic#137671)
Remove extra RemoteClusterService instances in unit test (elastic#137647)
Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669)
Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233)
Update bundled JDK to 25.0.1 (elastic#137640)
resolve indices for prefixed _all expressions (elastic#137330)
ESQL: Add TopN support for exponential histograms (elastic#137313)
allows field caps to be cross project (elastic#137530)
ESQL: Add exponential histogram percentile function (elastic#137553)
Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636)
Tighten on when THROTTLE decision can be returned (elastic#136794)
Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655
Add a test for two little known conditional processor paths (elastic#137645)
Extract a common ORIGIN constant (elastic#137612)
Remove early phase failure in batched (elastic#136889)
Returning correct index mode from get data streams api (elastic#137646)
[ML] Manage AD results indices (elastic#136065)
Part of #137549.
Adds a new
HistogramPercentilescalar function, which extracts a given percentile from a single exponential histogram value.This functions is not intended to be user-facing, but will only be used as a surrogate to implement the existing
PERCENTILEaggregation in combination with an upcoming histogram-merge aggregation. For this reason, this function is undocumented (except for javadoc) and not registered inEsqlFunctionRegistry.We can eventually expose this function if needed, but in that case we need to go through the design discussions first.
Because this function is not user-facing, this PR does not add CSV tests yet.
CSV tests will follow when the exponential histogram
PERCENTILEis implemented via the surrogate explained above.