feat: add max_file_size support to AzureBlob source (#1259)
* feat: add max_file_size support to AzureBlob source
Add optional max_file_size parameter to filter files by size in both
list() and get_value() APIs. Files exceeding the limit are treated as
non-existent. Closes #1251
* fix: correct type comparison for blob content_length
Azure blob content_length is u64, not Option<u64>, so compare directly
with max_size cast to u64 instead of unwrapping Option.
---------
Co-authored-by: prabhath004 <ppalakur@gmu.edu>
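The filtering behavior described in the commit message can be sketched as follows. This is a minimal in-memory model, not the actual AzureBlob source implementation: the `SizeFilteredSource` class and its blob dictionary are hypothetical, and the strict "greater than" comparison is an assumption based on the wording "files exceeding the limit".

```python
from typing import Dict, List, Optional


class SizeFilteredSource:
    """Hypothetical in-memory source illustrating max_file_size semantics."""

    def __init__(self, blobs: Dict[str, bytes], max_file_size: Optional[int] = None):
        self._blobs = blobs
        # None means no size limit is applied.
        self._max_file_size = max_file_size

    def _too_large(self, size: int) -> bool:
        # Assumption: a file exactly at the limit is still accepted.
        return self._max_file_size is not None and size > self._max_file_size

    def list(self) -> List[str]:
        # Oversized files are skipped, as if they did not exist.
        return [
            name
            for name, data in self._blobs.items()
            if not self._too_large(len(data))
        ]

    def get_value(self, name: str) -> Optional[bytes]:
        # Oversized files are reported the same way as missing files.
        data = self._blobs.get(name)
        if data is None or self._too_large(len(data)):
            return None
        return data


if __name__ == "__main__":
    source = SizeFilteredSource(
        {"notes.txt": b"hello", "backup.bin": b"x" * 11},
        max_file_size=10,
    )
    print(source.list())
    print(source.get_value("backup.bin"))
```

Applying the same check in both `list()` and `get_value()` keeps the two APIs consistent: a file that never appears in a listing also cannot be fetched by name.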
docs/docs/sources/azureblob.md
6 additions & 2 deletions
@@ -63,15 +63,19 @@ The spec takes the following fields:
 * `excluded_patterns` (`list[str]`, optional): a list of glob patterns to exclude files, e.g. `["*.tmp", "**/*.log"]`.
   Any file or directory matching these patterns will be excluded even if they match `included_patterns`.
   If not specified, no files will be excluded.
-* `sas_token` (`cocoindex.TransientAuthEntryReference[str]`, optional): a SAS token for authentication.
-* `account_access_key` (`cocoindex.TransientAuthEntryReference[str]`, optional): an account access key for authentication.
 
 :::info
 
 `included_patterns` and `excluded_patterns` are using Unix-style glob syntax. See [globset syntax](https://docs.rs/globset/latest/globset/index.html#syntax) for the details.
 
 :::
 
+* `max_file_size` (`int`, optional): if provided, files exceeding this size in bytes will be treated as non-existent and skipped during processing.
+  This is useful to avoid processing large files that are not relevant to your use case, such as videos or backups.
+  If not specified, no size limit is applied.
+* `sas_token` (`cocoindex.TransientAuthEntryReference[str]`, optional): a SAS token for authentication.
+* `account_access_key` (`cocoindex.TransientAuthEntryReference[str]`, optional): an account access key for authentication.
+
 ### Schema
 
 The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields: