ES|QL - vector similarity pushdown follow up #137564
Open: carlosdelest wants to merge 10 commits into elastic:main from carlosdelest:non-issue/esql-vector-similarity-pushdown-follow-up
+303
−170
10 commits
092d2a2 Implement single pass, similar to ResolveUnionTypes
b5b1176 Add test for replacing duplicates in multiple commands
516b4f9 Calculate name as part of the BlockLoaderFunctionConfig, so it can en…
9466e31 Add javadoc
207075a Implement canonicalize() and CanonicalizeVectorSimilarityFunctions
d154931 Add randomized testing
7e59a92 [CI] Auto commit changes from spotless
ba8bcb0 Merge branch 'main' into non-issue/esql-vector-similarity-pushdown-fo…
8ee21af (carlosdelest) Fix test - dimensions checking must not use a null vector, or it won'…
9c16f79 Merge remote-tracking branch 'carlosdelest/non-issue/esql-vector-simi…
27 changes: 27 additions & 0 deletions
...asticsearch/xpack/esql/optimizer/rules/logical/CanonicalizeVectorSimilarityFunctions.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the Elastic License | ||
| * 2.0; you may not use this file except in compliance with the Elastic License | ||
| * 2.0. | ||
| */ | ||
|
|
||
| package org.elasticsearch.xpack.esql.optimizer.rules.logical; | ||
|
|
||
| import org.elasticsearch.xpack.esql.core.expression.Expression; | ||
| import org.elasticsearch.xpack.esql.expression.function.vector.VectorSimilarityFunction; | ||
| import org.elasticsearch.xpack.esql.optimizer.LogicalOptimizerContext; | ||
|
|
||
| /** | ||
| * Ensures that vector similarity functions are in their canonical form, with literals to the right. | ||
| */ | ||
| public class CanonicalizeVectorSimilarityFunctions extends OptimizerRules.OptimizerExpressionRule<VectorSimilarityFunction> { | ||
|
|
||
| public CanonicalizeVectorSimilarityFunctions() { | ||
| super(OptimizerRules.TransformDirection.UP); | ||
| } | ||
|
|
||
| @Override | ||
| protected Expression rule(VectorSimilarityFunction vectorSimilarityFunction, LogicalOptimizerContext ctx) { | ||
| return vectorSimilarityFunction.canonical(); | ||
| } | ||
| } | ||
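A minimal sketch of what "canonical form, with literals to the right" means in practice. The types below (Expr, Field, Literal, Similarity) are hypothetical stand-ins, not the real ES|QL classes, and the sketch assumes the similarity function is symmetric in its arguments, so swapping operands does not change the result.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for expression types; not the real ES|QL classes.
interface Expr {}

record Field(String name) implements Expr {}

record Literal(List<Float> value) implements Expr {}

record Similarity(String function, Expr left, Expr right) implements Expr {
    // Canonical form: if only the left operand is a literal, swap so it ends up on the right.
    // Assumes the function is symmetric in its arguments.
    Similarity canonical() {
        if (left instanceof Literal && (right instanceof Literal) == false) {
            return new Similarity(function, right, left);
        }
        return this;
    }
}

class CanonicalFormSketch {
    public static void main(String[] args) {
        Expr field = new Field("embedding");
        Expr query = new Literal(List.of(0.1f, 0.2f, 0.3f));

        Similarity literalFirst = new Similarity("cosine", query, field);
        Similarity fieldFirst = new Similarity("cosine", field, query);

        // Both spellings collapse to the same canonical expression.
        System.out.println(literalFirst.canonical().equals(fieldFirst.canonical()));
    }
}
```

With both operand orders reduced to one shape, a later rule only has to test for "field on the left, literal on the right", which is the coupling between the two rules that the review thread discusses.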
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,6 +7,7 @@ | |
|
|
||
| package org.elasticsearch.xpack.esql.optimizer.rules.logical.local; | ||
|
|
||
| import org.elasticsearch.index.mapper.MappedFieldType; | ||
| import org.elasticsearch.xpack.esql.core.expression.Attribute; | ||
| import org.elasticsearch.xpack.esql.core.expression.AttributeSet; | ||
| import org.elasticsearch.xpack.esql.core.expression.Expression; | ||
|
|
@@ -18,13 +19,14 @@ | |
| import org.elasticsearch.xpack.esql.core.type.FunctionEsField; | ||
| import org.elasticsearch.xpack.esql.expression.function.vector.VectorSimilarityFunction; | ||
| import org.elasticsearch.xpack.esql.optimizer.LocalLogicalOptimizerContext; | ||
| import org.elasticsearch.xpack.esql.optimizer.rules.logical.OptimizerRules; | ||
| import org.elasticsearch.xpack.esql.plan.logical.Aggregate; | ||
| import org.elasticsearch.xpack.esql.plan.logical.EsRelation; | ||
| import org.elasticsearch.xpack.esql.plan.logical.Eval; | ||
| import org.elasticsearch.xpack.esql.plan.logical.Filter; | ||
| import org.elasticsearch.xpack.esql.plan.logical.LogicalPlan; | ||
| import org.elasticsearch.xpack.esql.plan.logical.local.EsqlProject; | ||
| import org.elasticsearch.xpack.esql.rule.ParameterizedRule; | ||
| import org.elasticsearch.xpack.esql.stats.SearchStats; | ||
|
|
||
| import java.util.ArrayList; | ||
| import java.util.HashMap; | ||
|
|
@@ -38,30 +40,39 @@ | |
| * the similarity function during value loading, when one side of the function is a literal. | ||
| * It also adds the new field function attribute to the EsRelation output, and adds a projection after it to remove it from the output. | ||
| */ | ||
| public class PushDownVectorSimilarityFunctions extends OptimizerRules.ParameterizedOptimizerRule< | ||
| LogicalPlan, | ||
| LocalLogicalOptimizerContext> { | ||
| public class PushDownVectorSimilarityFunctions extends ParameterizedRule<LogicalPlan, LogicalPlan, LocalLogicalOptimizerContext> { | ||
|
|
||
| public PushDownVectorSimilarityFunctions() { | ||
| super(OptimizerRules.TransformDirection.DOWN); | ||
| @Override | ||
| public LogicalPlan apply(LogicalPlan plan, LocalLogicalOptimizerContext context) { | ||
| Map<Attribute.IdIgnoringWrapper, Attribute> addedAttrs = new HashMap<>(); | ||
| return plan.transformUp(LogicalPlan.class, p -> doRule(p, context.searchStats(), addedAttrs)); | ||
| } | ||
|
|
||
| @Override | ||
| protected LogicalPlan rule(LogicalPlan plan, LocalLogicalOptimizerContext context) { | ||
| private LogicalPlan doRule(LogicalPlan plan, SearchStats searchStats, Map<Attribute.IdIgnoringWrapper, Attribute> addedAttrs) { | ||
| // Collect field attributes from previous runs | ||
| int originalAddedAttrsSize = addedAttrs.size(); | ||
| if (plan instanceof EsRelation rel) { | ||
| addedAttrs.clear(); | ||
| for (Attribute attr : rel.output()) { | ||
| if (attr instanceof FieldAttribute fa && fa.field() instanceof FunctionEsField) { | ||
| addedAttrs.put(fa.ignoreId(), fa); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if (plan instanceof Eval || plan instanceof Filter || plan instanceof Aggregate) { | ||
| Map<Attribute.IdIgnoringWrapper, Attribute> addedAttrs = new HashMap<>(); | ||
| LogicalPlan transformedPlan = plan.transformExpressionsOnly( | ||
| VectorSimilarityFunction.class, | ||
| similarityFunction -> replaceFieldsForFieldTransformations(similarityFunction, addedAttrs, context) | ||
| similarityFunction -> replaceFieldsForFieldTransformations(similarityFunction, addedAttrs, searchStats) | ||
| ); | ||
|
|
||
| if (addedAttrs.isEmpty()) { | ||
| // No fields were added, return the original plan | ||
| if (addedAttrs.size() == originalAddedAttrsSize) { | ||
| return plan; | ||
| } | ||
|
|
||
| List<Attribute> previousAttrs = transformedPlan.output(); | ||
| // Transforms EsRelation to extract the new attribute | ||
|
|
||
| // Transforms EsRelation to extract the new attributes | ||
| List<Attribute> addedAttrsList = addedAttrs.values().stream().toList(); | ||
| transformedPlan = transformedPlan.transformDown(EsRelation.class, esRelation -> { | ||
| AttributeSet updatedOutput = esRelation.outputSet().combine(AttributeSet.of(addedAttrsList)); | ||
|
|
@@ -83,59 +94,44 @@ protected LogicalPlan rule(LogicalPlan plan, LocalLogicalOptimizerContext contex | |
| private static Expression replaceFieldsForFieldTransformations( | ||
| VectorSimilarityFunction similarityFunction, | ||
| Map<Attribute.IdIgnoringWrapper, Attribute> addedAttrs, | ||
| LocalLogicalOptimizerContext context | ||
| SearchStats searchStats | ||
| ) { | ||
| // Only replace if exactly one side is a literal and the other a field attribute | ||
| if ((similarityFunction.left() instanceof Literal ^ similarityFunction.right() instanceof Literal) == false) { | ||
| return similarityFunction; | ||
| } | ||
| // Only replace if it consists of a literal and the other a field attribute. | ||
| // CanonicalizeVectorSimilarityFunctions ensures that if there is a literal, it will be on the right side. | ||
| if (similarityFunction.left() instanceof FieldAttribute fieldAttr && similarityFunction.right() instanceof Literal) { | ||
|
|
||
| Literal literal = (Literal) (similarityFunction.left() instanceof Literal ? similarityFunction.left() : similarityFunction.right()); | ||
| FieldAttribute fieldAttr = null; | ||
| if (similarityFunction.left() instanceof FieldAttribute fa) { | ||
| fieldAttr = fa; | ||
| } else if (similarityFunction.right() instanceof FieldAttribute fa) { | ||
| fieldAttr = fa; | ||
| } | ||
| // We can push down also for doc values, requires handling that case on the field mapper | ||
| if (fieldAttr == null || context.searchStats().isIndexed(fieldAttr.fieldName()) == false) { | ||
| return similarityFunction; | ||
| } | ||
| // We can push down also for doc values, requires handling that case on the field mapper | ||
| if (searchStats.isIndexed(fieldAttr.fieldName()) == false) { | ||
| return similarityFunction; | ||
| } | ||
|
|
||
| @SuppressWarnings("unchecked") | ||
| List<Number> vectorList = (List<Number>) literal.value(); | ||
| float[] vectorArray = new float[vectorList.size()]; | ||
| int arrayHashCode = 0; | ||
| for (int i = 0; i < vectorList.size(); i++) { | ||
| vectorArray[i] = vectorList.get(i).floatValue(); | ||
| arrayHashCode = 31 * arrayHashCode + Float.floatToIntBits(vectorArray[i]); | ||
| } | ||
| // Change the similarity function to a reference of a transformation on the field | ||
| MappedFieldType.BlockLoaderFunctionConfig blockLoaderFunctionConfig = similarityFunction.getBlockLoaderFunctionConfig(); | ||
| FunctionEsField functionEsField = new FunctionEsField( | ||
| fieldAttr.field(), | ||
| similarityFunction.dataType(), | ||
| blockLoaderFunctionConfig | ||
| ); | ||
| var name = rawTemporaryName(fieldAttr.name(), blockLoaderFunctionConfig.name()); | ||
|
Rely on |
||
| var newFunctionAttr = new FieldAttribute( | ||
| fieldAttr.source(), | ||
| fieldAttr.parentName(), | ||
| fieldAttr.qualifier(), | ||
| name, | ||
| functionEsField, | ||
| fieldAttr.nullable(), | ||
| new NameId(), | ||
| true | ||
| ); | ||
| Attribute.IdIgnoringWrapper key = newFunctionAttr.ignoreId(); | ||
| if (addedAttrs.containsKey(key)) { | ||
| return addedAttrs.get(key); | ||
| } | ||
|
|
||
| // Change the similarity function to a reference of a transformation on the field | ||
| FunctionEsField functionEsField = new FunctionEsField( | ||
| fieldAttr.field(), | ||
| similarityFunction.dataType(), | ||
| similarityFunction.getBlockLoaderFunctionConfig() | ||
| ); | ||
| var name = rawTemporaryName(fieldAttr.name(), similarityFunction.nodeName(), String.valueOf(arrayHashCode)); | ||
| // TODO: Check if exists before adding, retrieve the previous one | ||
| var newFunctionAttr = new FieldAttribute( | ||
| fieldAttr.source(), | ||
| fieldAttr.parentName(), | ||
| fieldAttr.qualifier(), | ||
| name, | ||
| functionEsField, | ||
| fieldAttr.nullable(), | ||
| new NameId(), | ||
| true | ||
| ); | ||
| Attribute.IdIgnoringWrapper key = newFunctionAttr.ignoreId(); | ||
| if (addedAttrs.containsKey(key)) { | ||
| ; | ||
| return addedAttrs.get(key); | ||
| addedAttrs.put(key, newFunctionAttr); | ||
| return newFunctionAttr; | ||
| } | ||
|
|
||
| addedAttrs.put(key, newFunctionAttr); | ||
| return newFunctionAttr; | ||
| return similarityFunction; | ||
| } | ||
| } | ||
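The deduplication idea in this rule can be sketched in isolation: one map is shared across every plan node visited in a single pass, keyed on everything except the attribute's id, so the same similarity function appearing in several commands resolves to one pushed-down attribute. The class below is a hypothetical model; Key, SyntheticAttr, attributeFor and the "$$field$config" name format are illustrative assumptions, not the real ES|QL types or the output of rawTemporaryName.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; none of these types are the real ES|QL classes.
class PushdownDedupSketch {
    // Identity of a pushed-down function: the field it reads and the function config name.
    // Plays the role of a key that ignores the attribute id.
    record Key(String fieldName, String configName) {}

    // A synthetic attribute; the id is unique per created instance, like a fresh NameId.
    record SyntheticAttr(int id, String name) {}

    private final Map<Key, SyntheticAttr> added = new HashMap<>();
    private int nextId = 0;

    // Return the attribute already created for this (field, config) pair, or create one.
    SyntheticAttr attributeFor(String fieldName, String configName) {
        return added.computeIfAbsent(
            new Key(fieldName, configName),
            k -> new SyntheticAttr(nextId++, "$$" + fieldName + "$" + configName) // illustrative name format
        );
    }

    int size() {
        return added.size();
    }

    public static void main(String[] args) {
        PushdownDedupSketch sketch = new PushdownDedupSketch();
        int sizeBefore = sketch.size();

        // The same function used in two commands maps to a single attribute.
        SyntheticAttr inEval = sketch.attributeFor("embedding", "cosine[0.1,0.2]");
        SyntheticAttr inFilter = sketch.attributeFor("embedding", "cosine[0.1,0.2]");
        // A different query vector yields a different attribute.
        SyntheticAttr other = sketch.attributeFor("embedding", "cosine[0.9,0.9]");

        System.out.println(inEval == inFilter);
        System.out.println(inEval.equals(other));
        // Comparing sizes tells the caller whether anything new was added.
        System.out.println(sketch.size() - sizeBefore);
    }
}
```

Comparing the map size before and after visiting a node mirrors the `addedAttrs.size() == originalAddedAttrsSize` check in the diff: when the map is shared, an emptiness test can no longer tell whether the current node added anything.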
I had assumed there was already a rule for this!
There's not. Given that we don't canonicalize other expressions, I'm thinking of removing this rule and going back to the previous code for checking field and literal, since this adds complexity and coupling between the two rules. WDYT?
We do have the LiteralsOnTheRight rule, but it only works for BinaryOperator. It seems VectorSimilarityFunction is not a BinaryOperator, and it might be hard to make it one. Alternatively, we swap left and right in the surrogate method for spatial functions. Then you don't need a new rule and the code is much simpler. See SpatialContains.surrogate() for an example of how to do it.
Aren't SurrogateExpressions used in the context of aggregations? Would it make sense to make the VectorSimilarityFunctions a SurrogateExpression? I don't see any practical reason for doing that other than simplifying the check that is done in order to push down the vector similarity functions. I don't think it pays off: we would be expecting a rule to act in order to simplify an expression that should itself be able to tell whether it is pushable or not.