content/dynamodb-opensearch-zetl/integrations/index.en.md (3 additions, 1 deletion)
@@ -4,4 +4,6 @@ menuTitle: "Integrations"
date: 2024-02-23T00:00:00-00:00
weight: 30
---
- In this section, you will configure integrations between services. You'll first set up ML and Pipeline connectors in OpenSearch Service followed by a zero ETL connector to move data written to DynamoDB to OpenSearch. Once these integrations are set up, you'll be able to write records to DynamoDB as your source of truth and then automatically have that data available to query in other services.
+ In this section, you will configure integrations between services. First you will set up machine learning (ML) and Pipeline connectors in OpenSearch Service. Then you will set up a zero-ETL connector to move data stored in DynamoDB into OpenSearch for indexing. Once both of these integrations are set up, you'll be able to write records to DynamoDB as your source of truth and then automatically have that data available to query in the other services.
content/dynamodb-opensearch-zetl/integrations/os-connectors.en.md (33 additions, 24 deletions)
@@ -4,19 +4,21 @@ menuTitle: "Load DynamoDB Data"
date: 2024-02-23T00:00:00-00:00
weight: 20
---
- In this section you'll configure ML and Pipeline connectors in OpenSearch Service. These configurations are set up by a series of POST and PUT requests that are authenticated with AWS Signature Version 4 (sig-v4). Sig-v4 is the standard authentication mechanism used by AWS services. In most cases an SDK abstracts away sig-v4, but in this case we will be building the requests ourselves with curl.
+ In this section you'll configure OpenSearch so it will preprocess and enrich data as it is written to its indexes, by connecting to an externally hosted machine learning embeddings model. This is a simpler application design than having your application write the embeddings as an attribute on the item within DynamoDB. Instead, the data is kept as text in DynamoDB, and when it arrives in OpenSearch, OpenSearch connects out to Bedrock to generate and store the embeddings.
- Building a sig-v4 signed request requires a session token, access key, and secret access key. You'll first retrieve these from your Cloud9 instance metadata with the provided "credentials.sh" script, which exports the required values to environment variables. In the following steps, you'll also export other values to environment variables to allow for easy substitution into the listed commands.
+ More information on this design can be found at [ML and Pipeline connectors in OpenSearch Service](https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/index/).
- 1. Run the credentials.sh script to retrieve and export credentials. These credentials will be used to sign API requests to the OpenSearch cluster. Note the leading "." before "./credentials.sh"; this must be included to ensure that the exported credentials are available in the currently running shell.
-    ```bash
-    . ./credentials.sh
-    ```
- 1. Next, export an environment variable with the OpenSearch endpoint URL. This URL is listed in the CloudFormation Stack Outputs tab as "OSDomainEndpoint". This variable will be used in subsequent commands.
- 1. Execute the following curl command to create the OpenSearch ML model connector.
+ We will perform these configurations using a series of POST and PUT requests made to OpenSearch endpoints. The calls will be made using the IAM role that was previously mapped to the OpenSearch "all_access" role.
+ The calls are authenticated with AWS Signature Version 4 (sig-v4). Sig-v4 is the standard authentication mechanism used by AWS services. In most cases an SDK abstracts away the sig-v4 details, but in this case we will be building the requests ourselves with curl.
+ Building a sig-v4 signed request requires a session token, access key, and secret access key. These are available to your VS Code instance as metadata, and were retrieved by the "credentials.sh" script you ran during setup; it pulled the required values and exported them as environment variables for your use. In the following steps, you'll also export other values to environment variables to allow for easy substitution into the various commands. A sketch of what such a script typically does is shown below.
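The credentials.sh script itself is not part of this diff. As a rough sketch of what such a script might do, assuming an EC2-backed instance with IMDSv2 enabled and `jq` installed (both of these are assumptions), it could look something like this:

```bash
#!/bin/bash
# Hypothetical sketch of credentials.sh; the real script is not shown in this PR.
# Fetch an IMDSv2 session token from the EC2 instance metadata service.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Discover the name of the IAM role attached to the instance.
ROLE_NAME=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/iam/security-credentials/")

# Retrieve the temporary credentials for that role as JSON.
CREDS=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/iam/security-credentials/${ROLE_NAME}")

# Export the values. Because the script is sourced (". ./credentials.sh"),
# these exports persist in the calling shell.
export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r '.AccessKeyId')
export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r '.SecretAccessKey')
export AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r '.Token')
```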
+ If any of the following commands fail, try re-running the credentials.sh script in the :link[Environment Setup]{href="/setup/step1"} step.
+ As you run these steps, be very careful about typos. Also, remember the Copy icon in the corner.
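To make the request pattern concrete before the steps begin: modern curl (7.75 and later) can compute sig-v4 signatures itself. A minimal sketch, assuming the variables exported by credentials.sh and an `OPENSEARCH_ENDPOINT` variable holding the domain URL (the variable name is illustrative):

```bash
# Illustrative only: a sig-v4 signed request using curl's built-in signing.
# "es" is the signing service name for OpenSearch Service domains.
curl -s -XGET "${OPENSEARCH_ENDPOINT}/_cluster/health" \
  --aws-sigv4 "aws:amz:us-west-2:es" \
  --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
  -H "x-amz-security-token: ${AWS_SESSION_TOKEN}"
```

The commands in this page follow the same shape; only the HTTP method, path, and request body change.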
+ 1. Execute the following curl command to **create the OpenSearch ML model connector**. You can use ML connectors to connect OpenSearch Service to a model hosted on Bedrock or a model hosted on a third-party platform. Here we are connecting to the Titan embedding model hosted on Bedrock.
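The request body is collapsed in this diff. As a sketch of the general shape of such a call, modeled on OpenSearch's published Bedrock connector blueprints (the names, the Bedrock access role placeholder, and the exact body are illustrative, not the workshop's actual values):

```bash
# Hypothetical sketch; the workshop's actual connector body is collapsed here.
curl -s -XPOST "${OPENSEARCH_ENDPOINT}/_plugins/_ml/connectors/_create" \
  --aws-sigv4 "aws:amz:us-west-2:es" \
  --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
  -H "x-amz-security-token: ${AWS_SESSION_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Bedrock Titan embeddings connector",
    "description": "Connector to the Amazon Titan embedding model",
    "version": 1,
    "protocol": "aws_sigv4",
    "credential": { "roleArn": "{BEDROCK_ROLE_ARN}" },
    "parameters": { "region": "us-west-2", "service_name": "bedrock" },
    "actions": [
      {
        "action_type": "predict",
        "method": "POST",
        "url": "https://bedrock-runtime.us-west-2.amazonaws.com/model/amazon.titan-embed-text-v1/invoke",
        "headers": { "content-type": "application/json" },
        "request_body": "{ \"inputText\": \"${parameters.inputText}\" }"
      }
    ]
  }'
```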
@@ -53,11 +55,11 @@ Building a sig-v4 signed request requires a session token, access key, and secre
  ]
}'
```
- 1. Note the "connector_id" returned in the previous command. Export it to an environment variable for convenient substitution in future commands.
+ 1. Note the **"connector_id"** returned in the previous command. **Export it to an environment variable** for convenient substitution in future commands.
```bash
export CONNECTOR_ID='xxxxxxxxxxxxxx'
```
- 1. Run the next curl command to create the model group.
+ 1. Run the next curl command to **create the model group**.
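The bodies of these calls are collapsed in the diff. For orientation, the usual ML Commons sequence is: register a model group, then register and deploy a remote model that references the connector. A sketch with illustrative values (the `MODEL_GROUP_ID` variable and both bodies are assumptions, not the workshop's exact commands):

```bash
# Hypothetical sketch; illustrative bodies, not the workshop's exact commands.
# 1) Create a model group to hold the remote model.
curl -s -XPOST "${OPENSEARCH_ENDPOINT}/_plugins/_ml/model_groups/_register" \
  --aws-sigv4 "aws:amz:us-west-2:es" \
  --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
  -H "x-amz-security-token: ${AWS_SESSION_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{ "name": "bedrock_embedding_models", "description": "Remote Bedrock embedding models" }'

# 2) Register the remote model against the connector, deploying it in the same call.
curl -s -XPOST "${OPENSEARCH_ENDPOINT}/_plugins/_ml/models/_register?deploy=true" \
  --aws-sigv4 "aws:amz:us-west-2:es" \
  --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
  -H "x-amz-security-token: ${AWS_SESSION_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "titan-embed-text-v1",
    "function_name": "remote",
    "model_group_id": "'"${MODEL_GROUP_ID}"'",
    "connector_id": "'"${CONNECTOR_ID}"'"
  }'
```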
- With the model created, OpenSearch can now use Bedrock's Titan embedding model for processing text. An embeddings model is a type of machine learning model that transforms high-dimensional data (like text or images) into lower-dimensional vectors, known as embeddings. These vectors capture the semantic or contextual relationships between the data points in a more compact, dense representation.
+ With the model created, **OpenSearch can now use Bedrock's Titan embedding model** for processing text.
- The embeddings represent the semantic meaning of the input data, in this case product descriptions. Words with similar meanings are represented by vectors that are close to each other in the vector space. For example, the vectors for "sturdy" and "strong" would be closer to each other than to "warm".
+ **An embeddings model** is a type of machine learning model that transforms high-dimensional data (like text or images) into lower-dimensional vectors, known as embeddings. These vectors capture the semantic or contextual relationships between the data points in a more compact, dense representation.
- 1. Now we can test the model. If you recieve results back with a "200" status code, everything is working properly.
+ The embeddings represent the semantic meaning of the input data, in this case product descriptions. Words with similar meanings are represented by vectors that are close to each other in the vector space. For example, the vectors for "sturdy" and "strong" would be closer to each other than to "stringy".
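As a toy illustration (not from the workshop itself): similarity between embeddings is commonly scored with cosine similarity. With made-up 2-dimensional vectors $\mathbf{u} = (0.9, 0.1)$ for "sturdy", $\mathbf{v} = (0.85, 0.2)$ for "strong", and $\mathbf{w} = (0.1, 0.95)$ for "stringy":

$$
\cos(\mathbf{u},\mathbf{v}) = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert} = \frac{0.785}{0.906 \times 0.873} \approx 0.99,
\qquad
\cos(\mathbf{u},\mathbf{w}) = \frac{0.185}{0.906 \times 0.955} \approx 0.21
$$

so "sturdy" lands much closer to "strong" than to "stringy" in this toy space.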
+ 1. Now we can *test the model*. With the below command, we are sending some text to OpenSearch and asking it to return the vector embeddings using the configured "MODEL_ID". If you receive results back with a "200" status code, everything is working properly.
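The test command itself is collapsed in this diff; the ML Commons predict API has this general shape (the input text and the `-w` status-code flag are illustrative additions):

```bash
# Hypothetical sketch; the workshop's exact test command is collapsed in the diff.
# -w prints the HTTP status code after the response body, which is handy here.
curl -s -w "\nstatus: %{http_code}\n" \
  -XPOST "${OPENSEARCH_ENDPOINT}/_plugins/_ml/models/${MODEL_ID}/_predict" \
  --aws-sigv4 "aws:amz:us-west-2:es" \
  --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
  -H "x-amz-security-token: ${AWS_SESSION_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{ "parameters": { "inputText": "A sturdy pair of hiking boots" } }'
```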
@@ -130,7 +136,9 @@ Building a sig-v4 signed request requires a session token, access key, and secre
  }
}'
```
- 1. Next, we'll create the Details table mapping pipeline.
+ ::alert[_The output will include the vector embeddings as well, so look for the status code value to check the status._]
+ 1. Next, we'll create the **ProductDetails table mapping ingest pipeline**. An **ingest pipeline** is a sequence of processors that are applied to documents as they are ingested into an index. This pipeline uses the configured model to generate the embeddings. Once it is created, the embeddings will be created and indexed as new data arrives in OpenSearch from the DynamoDB "ProductDetails" table.
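The pipeline body is collapsed in this diff. A sketch of a typical `text_embedding` ingest pipeline of this kind (the pipeline name and source field are illustrative; the 'product_embedding' target field matches the alert shown after it):

```bash
# Hypothetical sketch; pipeline and field names are illustrative.
curl -s -XPUT "${OPENSEARCH_ENDPOINT}/_ingest/pipeline/product-details-pipeline" \
  --aws-sigv4 "aws:amz:us-west-2:es" \
  --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
  -H "x-amz-security-token: ${AWS_SESSION_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Generate embeddings for product descriptions during ingest",
    "processors": [
      {
        "text_embedding": {
          "model_id": "'"${MODEL_ID}"'",
          "field_map": { "product_description": "product_embedding" }
        }
      }
    ]
  }'
```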
@@ -158,7 +166,8 @@ Building a sig-v4 signed request requires a session token, access key, and secre
  ]
}'
```
- 1. Followed by the Reviews table mapping pipeline. We won't use this in this version of the lab, but in a real system you will want to keep your embeddings indexes separate for different queries.
+ ::alert[_Here, we have created a processor that takes the source text and creates an embedding, which will be stored under 'product_embedding'._]
+ 1. Followed by the **Reviews table mapping pipeline**. We won't use this in this version of the lab, but in a real system you will want to keep your embeddings indexes separate for different queries. Note the different endpoint pipeline path.
content/dynamodb-opensearch-zetl/integrations/zetl.en.md (80 additions, 96 deletions)
@@ -6,112 +6,96 @@ weight: 30
---
Amazon DynamoDB offers a zero-ETL integration with Amazon OpenSearch Service through the DynamoDB plugin for OpenSearch Ingestion. Amazon OpenSearch Ingestion offers a fully managed, no-code experience for ingesting data into Amazon OpenSearch Service.
- 1. Open [OpenSearch Service Ingestion Pipelines](https://us-west-2.console.aws.amazon.com/aos/home?region=us-west-2#opensearch/ingestion-pipelines)
- 1. Name your pipeline, and include the following for your pipeline configuration. The configuration contains multiple values that need to be updated. The needed values are provided in the CloudFormation Stack Outputs as "Region", "Role", "S3Bucket", "DdbTableArn", and "OSDomainEndpoint".
-    ```yaml
-    version: "2"
-    dynamodb-pipeline:
-      source:
-        dynamodb:
-          acknowledgments: true
-          tables:
-            # REQUIRED: Supply the DynamoDB table ARN
-            - table_arn: "{DDB_TABLE_ARN}"
-              stream:
-                start_position: "LATEST"
-              export:
-                # REQUIRED: Specify the name of an existing S3 bucket for DynamoDB to write export data files to
-                s3_bucket: "{S3BUCKET}"
-                # REQUIRED: Specify the region of the S3 bucket
-                s3_region: "{REGION}"
-                # Optionally set the name of a prefix that DynamoDB export data files are written to in the bucket.
-                s3_prefix: "pipeline"
-          aws:
-            # REQUIRED: Provide the role to assume that has the necessary permissions to DynamoDB, OpenSearch, and S3.
-            sts_role_arn: "{ROLE}"
-            # REQUIRED: Provide the region
-            region: "{REGION}"
-      sink:
-        - opensearch:
-            hosts:
-              # REQUIRED: Provide an AWS OpenSearch endpoint, including https://
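The diff viewer collapses the remainder of the configuration at this point. Purely for orientation (these are not the hidden lines from this PR), a sink section for this integration typically ends along these lines, using the metadata helpers from the AWS zero-ETL pipeline template:

```yaml
# Hypothetical completion of the sink section; illustrative, not the collapsed lines.
sink:
  - opensearch:
      hosts: ["{OS_DOMAIN_ENDPOINT}"]
      index: "product-details-index-en"
      # Use the DynamoDB primary key as the document ID so updates and
      # deletes from the stream replicate correctly.
      document_id: "${getMetadata(\"primary_key\")}"
      action: "${getMetadata(\"opensearch_action\")}"
      aws:
        sts_role_arn: "{ROLE}"
        region: "{REGION}"
```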
- 1. **Wait until the pipeline has finished creating**. This will take 5 minutes or more.
+ 10. **Wait until the pipeline has finished creating and its status is "Active"**. This will take 5 minutes or more.
- After the pipeline is created, it will take some additional time for the initial export from DynamoDB and import into OpenSearch Service. After you have waited several more minutes, you can check if items have replicated into OpenSearch by making a query in Dev Tools in the OpenSearch Dashboards.
+ After the pipeline is created, it will take some additional time for the initial export from DynamoDB and import into OpenSearch Service. After you have waited several more minutes, you can check if items have replicated into OpenSearch by making a query using the OpenSearch Dashboards feature called Dev Tools.
- To open Dev Tools, click on the menu in the top left of OpenSearch Dashboards, scroll down to the `Management` section, then click on `Dev Tools`. Enter the following query in the left pane, then click the "play" arrow.
+ - To open Dev Tools, click on the menu in the top left of OpenSearch Dashboards, scroll down to the `Management` section, then click on `Dev Tools`.
+
+   ![Dev Tools](/static/images/ddbos-dev-tools.png)
+
+ - Enter the following query in the left pane, then click the "play" arrow to execute it.
```text
GET /product-details-index-en/_search
```
- You may encounter a few types of results:
- - If you see a 404 error of type *index_not_found_exception*, then you need to wait until the pipeline is `Active`. Once it is, this exception will go away.
- - If your query does not have results, wait a few more minutes for the initial replication to finish and try again.
+ - The output will be the list of documents containing all the fields mentioned in the zero-ETL pipeline mapping.
+
+ You may encounter a few types of results:
+ - If you see a 404 error of type *index_not_found_exception*, then you need to wait until the pipeline is `Active`. Once it is, this exception will go away.
+ - If your query does not have results, wait a few more minutes for the initial replication to finish and try again.