Allow JSONLines files as input for CLI create-embeddings
Why these changes are being introduced:
Initially, the CLI command create-embeddings only supported reading input records
from the TIMDEX dataset via TDA. While this is likely the way we'll get input
records, supporting a JSONLines file as input is helpful for testing.
How this addresses that need:
* Adds a new --input-jsonl argument that reads a JSONLines file and uses
those rows as input for creating embeddings.
* Args --dataset-location and --run-id are required when --input-jsonl
is not set.
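
The behavior described in the two bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the helper names `read_jsonl` and `resolve_input_source` are hypothetical, and the real CLI may wire up its options and validation differently.

```python
import json
from pathlib import Path
from typing import Iterator, Optional


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONLines file (hypothetical helper)."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def resolve_input_source(
    input_jsonl: Optional[str],
    dataset_location: Optional[str],
    run_id: Optional[str],
) -> str:
    """Return "jsonl" or "dataset"; raise if the dataset args are incomplete.

    Hypothetical helper mirroring the rule that --dataset-location and
    --run-id are required only when --input-jsonl is not set.
    """
    if input_jsonl:
        return "jsonl"
    missing = [
        flag
        for flag, value in (
            ("--dataset-location", dataset_location),
            ("--run-id", run_id),
        )
        if not value
    ]
    if missing:
        raise ValueError(
            f"{', '.join(missing)} required when --input-jsonl is not set"
        )
    return "dataset"
```

With this shape, a test can pass only a JSONLines path and skip the dataset arguments entirely, while a call that omits all three fails early with a message naming the missing flags.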
Side effects of this change:
* None
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-137
{"timdex_record_id": "record:1", "run_id": "abc123", "run_record_offset": 0, "transformed_record": "{\"title\":\"Record 1\",\"description\":\"This is a record about coffee in the mountains.\"}"}
{"timdex_record_id": "record:2", "run_id": "abc123", "run_record_offset": 1, "transformed_record": "{\"title\":\"Record 2\",\"description\":\"Sometimes poetry is made accidentally by the fabrication of metadata.\"}"}
{"timdex_record_id": "record:3", "run_id": "abc123", "run_record_offset": 2, "transformed_record": "{\"title\":\"Record 3\",\"description\":\"This is an oddball record, meant to evoke the peculiar nature of mathematics.\"}"}
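
In the fixture rows above, `transformed_record` is itself a JSON-encoded string, so a consumer has to decode twice to reach the title and description. The snippet below shows this on the first row; how the CLI itself handles the field is not shown in this commit, so treat it as an illustration of the data shape only.

```python
import json

# First fixture row, copied verbatim (raw string keeps the \" escapes intact).
line = r'{"timdex_record_id": "record:1", "run_id": "abc123", "run_record_offset": 0, "transformed_record": "{\"title\":\"Record 1\",\"description\":\"This is a record about coffee in the mountains.\"}"}'

row = json.loads(line)  # outer decode: one JSONLines row
record = json.loads(row["transformed_record"])  # inner decode: the embedded record

print(row["timdex_record_id"], record["title"])
```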