EndpointSlices not cleaned up after replica service deletion (orphaned resources) #2973

@halvaborsch

Description

Which image of the operator are you using? 1.14.0
Where do you run it - cloud or metal? Vanilla K8s / bare metal
Are you running Postgres Operator in production? Yes
Type of issue? Bug report

Problem Description
We're experiencing a resource leak where EndpointSlices for -repl services are not being cleaned up after the parent Service is deleted. This has triggered quota alerts in our namespace.

Observed Behavior
When a PostgreSQL cluster is deleted, the operator properly deletes the replica service (*-repl)
However, the associated EndpointSlices remain in the namespace as orphaned resources
This only affects -repl services; the other objects (the -config service and the master endpoint) are cleaned up properly
The EndpointSlices have proper ownerReferences pointing to the Service:

  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Service
    name: endpoints-keys-test-repl
    uid: 4d904c2d-cfff-4ee1-bd50-3215dc16de82
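
A leftover slice can be inspected directly to confirm that the owner reference is still in place; the object name below is the one from our environment and will differ elsewhere:

kubectl get endpointslice endpoints-keys-test-repl-9p7pf -n service-keys \
  -o jsonpath='{.metadata.ownerReferences}'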

Reproduction Steps

Create a Spilo PostgreSQL cluster
Delete the cluster
Check for remaining EndpointSlices:

kubectl get endpointslices -n <namespace> | grep db-repl

Orphaned EndpointSlices will remain even though the parent Services have been deleted

Expected Behavior
EndpointSlices should be automatically cleaned up by Kubernetes garbage collector when the parent Service is deleted (via ownerReferences).
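
One way to sanity-check this path on a test cluster is to delete a replica Service manually (kubectl defaults to background cascading) and confirm that its slices disappear; the names below are from our environment, and the slices are matched via the standard kubernetes.io/service-name label:

kubectl delete service endpoints-keys-test-repl -n service-keys
kubectl get endpointslices -n service-keys -l kubernetes.io/service-name=endpoints-keys-test-repl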

Investigation Findings
The operator does not create an Endpoints object for -repl during cluster creation, only the Service.
However, at cluster deletion time it still attempts to clean up this non-existent object.
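
This can be checked against a running cluster; the expected result is NotFound, since no Endpoints object is ever created for -repl:

kubectl get endpoints endpoints-keys-test-repl -n service-keys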
From operator logs during deletion:

time="2025-10-30T11:26:09Z" level=debug msg="deleting replica endpoint" cluster-name=service-keys/endpoints-keys-test pkg=cluster worker=3
time="2025-10-30T11:26:09Z" level=info msg="replica endpoint \"service-keys/endpoints-keys-test-repl\" has been deleted" cluster-name=service-keys/endpoints-keys-test pkg=cluster worker=3
time="2025-10-30T11:26:09Z" level=debug msg="deleting replica service" cluster-name=service-keys/endpoints-keys-test pkg=cluster worker=3
time="2025-10-30T11:26:09Z" level=info msg="replica service \"service-keys/endpoints-keys-test-repl\" has been deleted" cluster-name=service-keys/endpoints-keys-test pkg=cluster worker=3

From Kubernetes API logs, the garbage collector attempts to patch the EndpointSlice but doesn't delete it:

Oct 30, 2025 @ 11:26:09.840
verb: patch   stage: ResponseComplete
requestURI: /apis/discovery.k8s.io/v1/namespaces/service-keys/endpointslices/endpoints-keys-test-repl-9p7pf
user: system:serviceaccount:kube-system:generic-garbage-collector

Hypothesis
It appears the operator may be deleting the Service with the orphan propagation policy instead of background/foreground. That would explain why the garbage collector only patches the EndpointSlices (stripping the ownerReference as part of orphaning) and never deletes them.
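
If that is the case, the difference should be reproducible outside the operator with kubectl's --cascade flag (Service name taken from our cluster):

# suspected current behavior: orphan cascade leaves the EndpointSlices behind
kubectl delete service endpoints-keys-test-repl -n service-keys --cascade=orphan

# expected behavior: background cascade (the default) lets the GC delete dependent EndpointSlices
kubectl delete service endpoints-keys-test-repl -n service-keys --cascade=background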

Impact
Resource quota exhaustion in namespaces with many demo/test databases
Manual cleanup required periodically (see the interim workaround sketch below)
Affects all -repl services across multiple clusters
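
As an interim workaround, a rough cleanup sketch we run by hand (the -repl- filter matches our naming convention and should only be used in namespaces whose clusters have already been deleted):

kubectl get endpointslices -n <namespace> -o name | grep -- '-repl-' | xargs -r kubectl delete -n <namespace>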
