
Commit 7469efa

FxKu and CyberDem0n authored
enhance docs on clone and restore (#1592)
* enhance docs on clone and restore
* add chapter about upgrading the operator
* add section for standby clusters
* Update docs/administrator.md

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
1 parent 1dd0cd9 commit 7469efa

2 files changed (+86, -21 lines)

docs/administrator.md

Lines changed: 80 additions & 16 deletions
@@ -3,6 +3,21 @@
 Learn how to configure and manage the Postgres Operator in your Kubernetes (K8s)
 environment.

+## Upgrading the operator
+
+The Postgres Operator is upgraded by changing the Docker image within the
+deployment. Before doing so, it is recommended to check the release notes
+for new configuration options or changed behavior you might want to reflect
+in the ConfigMap or config CRD. For example, a new feature might be introduced
+that is enabled or disabled by default, and you may want to flip it with the
+corresponding flag option.
+
+When using Helm, be aware that installing the new chart will not update the
+`Postgresql` and `OperatorConfiguration` CRDs. Make sure to update them
+beforehand with the provided manifests in the `crds` folder. Otherwise, you
+might face errors about new Postgres manifest or configuration options being
+unknown to the CRD schema validation.
+
 ## Minor and major version upgrade

 Minor version upgrades for PostgreSQL are handled via updating the Spilo Docker
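As a sketch of the upgrade step described in the added chapter above: the operator image is bumped in its Deployment. The deployment name, registry path and tag below are illustrative assumptions, not part of this commit.

```yaml
# Hypothetical excerpt of the operator Deployment; names and the tag are examples only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-operator
spec:
  template:
    spec:
      containers:
        - name: postgres-operator
          # bump the tag to upgrade; check the release notes for new or changed options first
          image: registry.opensource.zalan.do/acid/postgres-operator:v1.7.0
```

When using Helm, apply the updated CRD manifests from the `crds` folder before installing the new chart.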
@@ -157,20 +172,26 @@ from numerous escape characters in the latter log entry, view it in CLI with
 `PodTemplate` used by the operator is yet to be updated with the default values
 used internally in K8s.

-The operator also support lazy updates of the Spilo image. That means the pod
-template of a PG cluster's stateful set is updated immediately with the new
-image, but no rolling update follows. This feature saves you a switchover - and
-hence downtime - when you know pods are re-started later anyway, for instance
-due to the node rotation. To force a rolling update, disable this mode by
-setting the `enable_lazy_spilo_upgrade` to `false` in the operator configuration
-and restart the operator pod. With the standard eager rolling updates the
-operator checks during Sync all pods run images specified in their respective
-statefulsets. The operator triggers a rolling upgrade for PG clusters that
-violate this condition.
-
-Changes in $SPILO\_CONFIGURATION under path bootstrap.dcs are ignored when
-StatefulSets are being compared, if there are changes under this path, they are
-applied through rest api interface and following restart of patroni instance
+The StatefulSet is replaced if the following properties change:
+- annotations
+- volumeClaimTemplates
+- template volumes
+
+The StatefulSet is replaced and a rolling update is triggered if the following
+properties differ between the old and new state:
+- container name, ports, image, resources, env, envFrom, securityContext and volumeMounts
+- template labels, annotations, service account, securityContext, affinity, priority class and termination grace period
+
+Note that changes in the `SPILO_CONFIGURATION` env variable under the
+`bootstrap.dcs` path are ignored for the diff. They will be applied through
+Patroni's REST API interface, following a restart of all instances.
+
+The operator also supports lazy updates of the Spilo image. In this case the
+StatefulSet is only updated, but no rolling update follows. This feature saves
+you a switchover - and hence downtime - when you know pods are restarted later
+anyway, for instance due to node rotation. To force a rolling update, disable
+this mode by setting `enable_lazy_spilo_upgrade` to `false` in the operator
+configuration and restart the operator pod.
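As a minimal sketch of forcing eager rolling updates as described above, assuming the ConfigMap-based operator configuration (the ConfigMap name is an example, not taken from this commit):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator   # example name of the operator's configuration ConfigMap
data:
  # "false" disables lazy Spilo upgrades, so image changes trigger a rolling update
  enable_lazy_spilo_upgrade: "false"
```

Restart the operator pod afterwards so the new setting is picked up.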

 ## Delete protection via annotations

@@ -667,6 +688,12 @@ if it ends up in your specified WAL backup path:
 envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
 ```

+You can also check if Spilo is able to find any backups:
+
+```bash
+envdir "/run/etc/wal-e.d/env" wal-g backup-list
+```
+
 Depending on the cloud storage provider different [environment variables](https://github.com/zalando/spilo/blob/master/ENVIRONMENT.rst)
 have to be set for Spilo. Not all of them are generated automatically by the
 operator by changing its configuration. In this case you have to use an
@@ -734,8 +761,15 @@ WALE_S3_ENDPOINT='https+path://s3.eu-central-1.amazonaws.com:443'
 WALE_S3_PREFIX=$WAL_S3_BUCKET/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/{PGVERSION}
 ```

-If the prefix is not specified Spilo will generate it from `WAL_S3_BUCKET`.
-When the `AWS_REGION` is set `AWS_ENDPOINT` and `WALE_S3_ENDPOINT` are
+The operator sets the prefix to an empty string so that Spilo will generate it
+from the configured `WAL_S3_BUCKET`.
+
+:warning: When you overwrite the configuration by defining `WAL_S3_BUCKET` in
+the [pod_environment_configmap](#custom-pod-environment-variables) you have
+to set `WAL_BUCKET_SCOPE_PREFIX = ""`, too. Otherwise Spilo will not find
+the physical backups on restore (next chapter).
+
+When the `AWS_REGION` is set, `AWS_ENDPOINT` and `WALE_S3_ENDPOINT` are
 generated automatically. `WALG_S3_PREFIX` is identical to `WALE_S3_PREFIX`.
 `SCOPE` is the Postgres cluster name.
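To illustrate the :warning: above, here is a hedged sketch of a custom pod environment ConfigMap that overrides the bucket; the ConfigMap and bucket names are examples only:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-pod-config   # must match the configured pod_environment_configmap
data:
  WAL_S3_BUCKET: "my-custom-wal-bucket"   # hypothetical bucket name
  # required together with a custom WAL_S3_BUCKET, otherwise Spilo cannot
  # locate the physical backups on restore
  WAL_BUCKET_SCOPE_PREFIX: ""
```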
@@ -874,6 +908,36 @@ on one of the other running instances (preferably replicas if they do not lag
 behind). You can test restoring backups by [cloning](user.md#how-to-clone-an-existing-postgresql-cluster)
 clusters.

+If you need to provide a [custom clone environment](#custom-pod-environment-variables),
+copy existing variables about your setup (backup location, prefix, access
+keys etc.) and prepend the `CLONE_` prefix to get them copied to the correct
+directory within Spilo.
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: postgres-pod-config
+data:
+  AWS_REGION: "eu-west-1"
+  AWS_ACCESS_KEY_ID: "****"
+  AWS_SECRET_ACCESS_KEY: "****"
+  ...
+  CLONE_AWS_REGION: "eu-west-1"
+  CLONE_AWS_ACCESS_KEY_ID: "****"
+  CLONE_AWS_SECRET_ACCESS_KEY: "****"
+  ...
+```
+
+### Standby clusters
+
+The setup for [standby clusters](user.md#setting-up-a-standby-cluster) is very
+similar to cloning. At the moment, the operator only allows for streaming from
+the S3 WAL archive of the master specified in the manifest. Like with cloning,
+if you are using [additional environment variables](#custom-pod-environment-variables)
+to access your backup location, you have to copy those variables and prepend the
+`STANDBY_` prefix for Spilo to find the backups and WAL files to stream.
+
 ## Logical backups

 The operator can manage K8s cron jobs to run logical backups (SQL dumps) of
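By analogy with the `CLONE_` example in the hunk above, a standby cluster that needs the same credentials would duplicate them with the `STANDBY_` prefix; this sketch simply mirrors that pattern and is not part of the commit:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-pod-config
data:
  AWS_REGION: "eu-west-1"
  AWS_ACCESS_KEY_ID: "****"
  AWS_SECRET_ACCESS_KEY: "****"
  # duplicated with the STANDBY_ prefix so Spilo can reach the WAL archive
  # of the source cluster when streaming
  STANDBY_AWS_REGION: "eu-west-1"
  STANDBY_AWS_ACCESS_KEY_ID: "****"
  STANDBY_AWS_SECRET_ACCESS_KEY: "****"
```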

docs/user.md

Lines changed: 6 additions & 5 deletions
@@ -733,20 +733,21 @@ spec:
     uid: "efd12e58-5786-11e8-b5a7-06148230260c"
     cluster: "acid-batman"
     timestamp: "2017-12-19T12:40:33+01:00"
+    s3_wal_path: "s3://<bucketname>/spilo/<source_db_cluster>/<UID>/wal/<PGVERSION>"
 ```

 Here `cluster` is a name of a source cluster that is going to be cloned. A new
 cluster will be cloned from S3, using the latest backup before the `timestamp`.
 Note, that a time zone is required for `timestamp` in the format of +00:00 which
-is UTC. The `uid` field is also mandatory. The operator will use it to find a
-correct key inside an S3 bucket. You can find this field in the metadata of the
-source cluster:
+is UTC. You can specify the `s3_wal_path` of the source cluster or let the
+operator try to find it based on the configured `wal_[s3|gs]_bucket` and the
+specified `uid`. You can find the UID of the source cluster in its metadata:

 ```yaml
 apiVersion: acid.zalan.do/v1
 kind: postgresql
 metadata:
-  name: acid-test-cluster
+  name: acid-batman
   uid: efd12e58-5786-11e8-b5a7-06148230260c
 ```
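For context, the `clone` section changed above lives inside a regular `postgresql` manifest; the sketch below is an assumed minimal manifest (team, instance count, volume size and Postgres version are placeholders, not part of the diff):

```yaml
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: acid-batman-clone   # name of the new cluster, example only
spec:
  teamId: "acid"
  numberOfInstances: 2
  volume:
    size: 5Gi
  postgresql:
    version: "13"
  clone:
    uid: "efd12e58-5786-11e8-b5a7-06148230260c"
    cluster: "acid-batman"
    timestamp: "2017-12-19T12:40:33+01:00"
```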
@@ -799,7 +800,7 @@ no statefulset will be created.
 ```yaml
 spec:
   standby:
-    s3_wal_path: "s3 bucket path to the master"
+    s3_wal_path: "s3://<bucketname>/spilo/<source_db_cluster>/<UID>/wal/<PGVERSION>"
 ```

 At the moment, the operator only allows to stream from the WAL archive of the
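Similarly, a complete standby manifest might look like the sketch below; apart from the `standby` section taken from the diff, all fields are illustrative assumptions:

```yaml
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: acid-standby-cluster   # example name
spec:
  teamId: "acid"
  numberOfInstances: 1
  volume:
    size: 5Gi
  postgresql:
    version: "13"
  standby:
    s3_wal_path: "s3://<bucketname>/spilo/<source_db_cluster>/<UID>/wal/<PGVERSION>"
```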
