Problem
CreateMultiple, UpdateMultiple, and UpsertMultiple each send all records in a single POST regardless of count. There is no client-side chunking.
- `_create_multiple` (`data/_odata.py:316-376`) — builds one `{"Targets": [...]}` payload with every record and POSTs it.
- `_update_multiple` (`data/_odata.py:656-697`) — same pattern.
- `_upsert_multiple` (`data/_odata.py:440-493`) — same pattern.
Dataverse enforces a server-side limit (typically 1,000 records per `*Multiple` call). Sending more can produce 400/413 errors or timeouts. Today, callers must chunk manually in their own scripts; the SDK should handle this internally.
Proposed changes
1. Client-side batching (correctness fix)
Split large record lists into 1,000-record chunks and send each as a separate POST. This is the minimum viable fix.
```python
# Pseudocode for _create_multiple with batching
BATCH_SIZE = 1000

def _create_multiple(self, entity_set, table_schema_name, records):
    all_ids = []
    for i in range(0, len(records), BATCH_SIZE):
        chunk = records[i:i + BATCH_SIZE]
        ids = self._create_multiple_batch(entity_set, table_schema_name, chunk)
        all_ids.extend(ids)
    return all_ids
```
Atomicity trade-off: Today a single POST is atomic (all-or-nothing). Splitting into batches means partial success is possible — batch 1 succeeds, batch 2 fails, leaving the caller with a partial import. This should be clearly documented. Callers who need atomicity should limit their input to <=1000 records.
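To make the partial-success trade-off concrete, here is a minimal sketch of how a batched dispatcher could report per-batch failures instead of aborting the whole import. `send_batch` and `create_multiple_batched` are hypothetical names, standing in for the real per-batch POST; the actual SDK surface may differ.

```python
# Sketch: batched dispatch that surfaces partial failures instead of
# aborting the whole import. `send_batch` stands in for the real
# per-batch POST (hypothetical helper, not the SDK's actual API).
BATCH_SIZE = 1000

def create_multiple_batched(records, send_batch, batch_size=BATCH_SIZE):
    """Send records in fixed-size chunks; collect ids and per-batch errors."""
    ids, errors = [], []
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        try:
            ids.extend(send_batch(chunk))
        except Exception as exc:
            # A failed batch must not mask earlier successes: record the
            # failed index range so the caller can retry just that slice.
            errors.append((start, start + len(chunk), exc))
    return ids, errors
```

Returning the failed index ranges lets callers retry only the failed slices, which matters once partial success is possible.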
2. Optional concurrent batch dispatch (performance, follow-on)
After batching exists, add an opt-in max_workers parameter to dispatch batches concurrently via concurrent.futures.ThreadPoolExecutor (stdlib, no new dependency).
```python
def create(self, table, data, *, max_workers=1):
    # max_workers=1 (default): sequential, identical to today
    # max_workers=4: four concurrent batch POSTs
```
Default must be 1 (sequential) to avoid any regression:
- No extra threads on slow machines
- No extra memory overhead
- No concurrent request spike hitting Dataverse rate limits
- Identical behavior to today unless user explicitly opts in
When `max_workers > 1`:
- Uses `ThreadPoolExecutor` (~8 MB stack per thread, bounded by `max_workers`)
- Respects 429 (rate limit) responses — backs off all workers
- Reuses connection pooling via the existing `_HttpClient` session support
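A minimal sketch of this dispatch shape, assuming a hypothetical `send_batch` callable and a `RateLimitError` wrapper for 429 responses (both invented here for illustration; the real backoff policy would live in `_HttpClient`):

```python
import time
from concurrent.futures import ThreadPoolExecutor

class RateLimitError(Exception):
    """Hypothetical wrapper for an HTTP 429 response."""
    def __init__(self, retry_after=None):
        self.retry_after = retry_after

def dispatch_batches(batches, send_batch, *, max_workers=1, max_retries=3):
    def send_with_backoff(batch):
        for attempt in range(max_retries):
            try:
                return send_batch(batch)
            except RateLimitError as exc:
                # Honor Retry-After when present, else exponential backoff.
                time.sleep(exc.retry_after or 2 ** attempt)
        return send_batch(batch)  # final attempt propagates any error

    if max_workers <= 1:
        # Sequential path: identical behavior to today.
        return [send_with_backoff(b) for b in batches]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves batch order in the returned results.
        return list(pool.map(send_with_backoff, batches))
```

`pool.map` keeps results in batch order, so callers see the same id ordering whether they opt into concurrency or not.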
3. Page pre-fetching in _get_multiple (separate enhancement)
`_get_multiple` (`data/_odata.py:821-826`) fetches pages sequentially in a `while next_link` loop; each page must complete before the next is requested.
Pre-fetching 1 page ahead while the caller processes the current page would overlap I/O with processing:
```python
def _get_multiple(self, ..., prefetch_pages=0):
    # prefetch_pages=0 (default): sequential, identical to today
    # prefetch_pages=1: fetch the next page while the caller processes the current one
```
Default must be 0 to avoid buffering extra pages in memory. A single pre-fetched page for a 5,000-record default page size is ~5-20MB depending on column count — acceptable when opted in, but shouldn't be forced.
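One way to sketch single-page pre-fetch is a generator that submits the next `next_link` request to a one-worker pool before yielding the current page. `fetch_page` and `iter_pages` are hypothetical stand-ins for the real page request loop:

```python
from concurrent.futures import ThreadPoolExecutor

def iter_pages(fetch_page, first_link, *, prefetch_pages=0):
    """fetch_page(link) -> (page, next_link or None); yields pages in order."""
    if prefetch_pages == 0:
        # Sequential path: identical behavior to today.
        link = first_link
        while link:
            page, link = fetch_page(link)
            yield page
        return
    # prefetch_pages=1: at most one extra page buffered at any time.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_page, first_link)
        while future is not None:
            page, next_link = future.result()
            # Kick off the next request before handing this page back,
            # so the fetch overlaps the caller's processing time.
            future = pool.submit(fetch_page, next_link) if next_link else None
            yield page
```

Capping the pool at one worker bounds the buffered memory to a single extra page, matching the `prefetch_pages=1` contract.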
4. Picklist cache warming (separate enhancement)
`_optionset_map` (`data/_odata.py:1219-1331`) makes 2 HTTP calls per string field on a cache miss. The cache works well for subsequent records, but the first record with N string fields triggers 2N sequential HTTP calls.
A warm_picklist_cache(table) method that fetches all picklist metadata for a table in a single request would eliminate the cold-start penalty for bulk operations.
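The caching shape could look like the sketch below. `fetch_field` and `fetch_table` are hypothetical stand-ins for the per-field and single-request metadata fetches; the real implementation would issue the actual Dataverse metadata query.

```python
class PicklistCache:
    """Illustrative cache: per-field fetches on cold miss vs one warm call."""

    def __init__(self, fetch_field, fetch_table):
        self._fetch_field = fetch_field  # stand-in: the 2-call per-field path
        self._fetch_table = fetch_table  # stand-in: one request for all fields
        self._cache = {}

    def lookup(self, table, field):
        key = (table, field)
        if key not in self._cache:
            # Cold miss: today's behavior, one fetch per unseen field.
            self._cache[key] = self._fetch_field(table, field)
        return self._cache[key]

    def warm(self, table):
        # Single request replaces N cold misses for the first record.
        for field, options in self._fetch_table(table).items():
            self._cache[(table, field)] = options
```

After `warm(table)`, the first record's N string fields all hit the cache, so the 2N cold-start calls collapse into one metadata request.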
APIs NOT proposed for parallelism
| API | Why not |
| --- | --- |
| Chunked file upload (`_upload.py:117-195`) | Protocol is sequential by design — uses a session token with `Content-Range` headers, and each chunk must return 206 before the next can be sent |
| Column creation (`_odata.py:1712-1762`) | Dataverse metadata locks on the same table can cause conflicts with concurrent POSTs |
| Column deletion (`_odata.py:1764-1831`) | Same metadata lock concern |
| Relationship creation (`_relationships.py`) | Same metadata lock concern |
| BulkDelete (`_odata.py:548-618`) | Already async server-side; splitting into concurrent jobs adds complexity with minimal benefit |
Context
Identified during end-to-end validation of a 21-table dataset import. The agent-generated script had to implement its own chunking (chunk_size=1000) because the SDK doesn't handle it. Client-side batching should be an SDK responsibility, not something every caller reinvents.