Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

# Conflicts:
#	vortex-array/public-api.lock
#	vortex-array/src/array/vtable/dyn_.rs
#	vortex-array/src/array/vtable/mod.rs
#	vortex-array/src/arrays/bool/array.rs
#	vortex-array/src/arrays/fixed_size_list/compute/slice.rs
#	vortex-layout/src/layouts/flat/reader.rs
#	vortex-layout/src/layouts/flat/writer.rs
#	vortex-layout/src/segments/cache.rs
The lazy buffer path was issuing separate request_ranges() calls per buffer within a segment, bypassing SharedSegmentSource deduplication and SegmentCache. This caused N I/O requests per segment instead of 1, with catastrophic impact on S3 (2x slower on some queries).

Key changes:
- materialize_selection: Selection::Range now uses request() + local slice instead of request_ranges(), benefiting from caching and deduplication when multiple buffers share the same segment
- Replace resolve_filter() with effective_selection() returning Cow<Selection>, avoiding a full LazyBufferHandle clone on every slice()/select_ranges() call when no filter is pending
- Add MAX_SPARSE_RANGES (32) limit in materialize() to prevent hundreds of tiny I/O requests from scattered masks
- Downgrade hot-path tracing from debug! to trace!

Signed-off-by: Claude <noreply@anthropic.com>
https://claude.ai/code/session_01LM4hVS2zQBoFA8X1YZmQ5Z
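
As a rough illustration of the first change, the sketch below contrasts per-buffer ranged reads with a single cached segment fetch that is sliced locally. CountingSource, CachedSource, read_per_buffer, and read_via_segment are hypothetical stand-ins invented for this example; they are not the Vortex SharedSegmentSource or SegmentCache APIs, and only the general shape of the idea is taken from the description above.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical backing store that counts how many I/O requests it serves.
struct CountingSource {
    segments: HashMap<u32, Vec<u8>>,
    io_requests: usize,
}

impl CountingSource {
    /// Fetch a whole segment; costs one I/O request.
    fn request(&mut self, id: u32) -> Vec<u8> {
        self.io_requests += 1;
        self.segments[&id].clone()
    }

    /// Fetch one byte range of a segment; costs one I/O request per call.
    fn request_range(&mut self, id: u32, range: Range<usize>) -> Vec<u8> {
        self.io_requests += 1;
        self.segments[&id][range].to_vec()
    }
}

/// Hypothetical cache layered over the source (an assumption, standing in
/// for whatever caching and deduplication the real request() path provides).
struct CachedSource {
    inner: CountingSource,
    cache: HashMap<u32, Vec<u8>>,
}

impl CachedSource {
    fn new(inner: CountingSource) -> Self {
        CachedSource { inner, cache: HashMap::new() }
    }

    /// Return the whole segment, hitting the source only on a cache miss.
    fn request(&mut self, id: u32) -> &[u8] {
        if !self.cache.contains_key(&id) {
            let bytes = self.inner.request(id);
            self.cache.insert(id, bytes);
        }
        &self.cache[&id]
    }

    /// Serve a byte range by slicing the cached segment locally.
    fn read_range(&mut self, id: u32, range: Range<usize>) -> Vec<u8> {
        self.request(id)[range].to_vec()
    }
}

/// Old shape: one ranged request per buffer, so N buffers cost N requests.
fn read_per_buffer(
    source: &mut CountingSource,
    id: u32,
    buffers: &[Range<usize>],
) -> Vec<Vec<u8>> {
    buffers
        .iter()
        .map(|r| source.request_range(id, r.clone()))
        .collect()
}

/// New shape: fetch the segment once, then slice each buffer out of it.
fn read_via_segment(
    source: &mut CachedSource,
    id: u32,
    buffers: &[Range<usize>],
) -> Vec<Vec<u8>> {
    buffers
        .iter()
        .map(|r| source.read_range(id, r.clone()))
        .collect()
}
```

With three buffers in one segment, read_per_buffer drives the source's io_requests counter to 3, while read_via_segment leaves the wrapped source at 1 and returns the same bytes. The trade-off in this sketch is that the whole segment is transferred even when only part of it is needed, which is usually the better deal when per-request latency dominates, as it does on object storage.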
BENCHMARK FAILED

Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.087x ➖
datafusion / vortex-file-compressed (1.087x ➖, 0↑ 4↓)
File Sizes: PolarSignals Profiling
File Size Changes (1 file changed, +0.0% overall, 1↑ 0↓)

Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (0.966x ➖, 1↑ 1↓)
datafusion / vortex-compact (0.953x ➖, 4↑ 1↓)
datafusion / parquet (0.929x ➖, 6↑ 1↓)
datafusion / arrow (0.917x ➖, 8↑ 0↓)
duckdb / vortex-file-compressed (0.998x ➖, 1↑ 1↓)
duckdb / vortex-compact (0.977x ➖, 2↑ 1↓)
duckdb / parquet (0.978x ➖, 1↑ 3↓)
duckdb / duckdb (0.923x ➖, 5↑ 0↓)
File Sizes: TPC-H SF=1 on NVME
No file size changes detected.

Benchmarks: FineWeb NVMe
Verdict: No clear signal (medium confidence)
datafusion / vortex-file-compressed (0.943x ➖, 3↑ 3↓)
datafusion / vortex-compact (0.905x ➖, 1↑ 0↓)
datafusion / parquet (1.006x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.042x ➖, 1↑ 6↓)
duckdb / vortex-compact (0.933x ➖, 1↑ 0↓)
duckdb / parquet (0.998x ➖, 0↑ 0↓)
File Sizes: FineWeb NVMe
No file size changes detected.

Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.986x ➖, 9↑ 3↓)
datafusion / vortex-compact (0.997x ➖, 0↑ 2↓)
datafusion / parquet (0.933x ➖, 21↑ 0↓)
duckdb / vortex-file-compressed (1.031x ➖, 2↑ 14↓)
duckdb / vortex-compact (1.018x ➖, 2↑ 5↓)
duckdb / parquet (0.999x ➖, 1↑ 0↓)
duckdb / duckdb (0.973x ➖, 7↑ 1↓)
File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.

Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (medium confidence)
datafusion / vortex-file-compressed (0.971x ➖, 2↑ 1↓)
datafusion / vortex-compact (1.027x ➖, 0↑ 1↓)
datafusion / parquet (0.902x ➖, 11↑ 0↓)
datafusion / arrow (1.026x ➖, 2↑ 3↓)
duckdb / vortex-file-compressed (1.123x ❌, 0↑ 11↓)
duckdb / vortex-compact (1.062x ➖, 0↑ 4↓)
duckdb / parquet (0.977x ➖, 1↑ 1↓)
duckdb / duckdb (1.005x ➖, 0↑ 0↓)
File Sizes: TPC-H SF=10 on NVME
No file size changes detected.

Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (0.745x ➖, 8↑ 0↓)
datafusion / vortex-compact (1.068x ➖, 0↑ 3↓)
datafusion / parquet (0.746x ➖, 6↑ 0↓)
duckdb / vortex-file-compressed (1.133x ➖, 0↑ 7↓)
duckdb / vortex-compact (1.098x ➖, 0↑ 6↓)
duckdb / parquet (0.885x ➖, 2↑ 0↓)

Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (0.756x ➖, 3↑ 3↓)
datafusion / vortex-compact (0.702x ➖, 3↑ 0↓)
datafusion / parquet (0.828x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.969x ➖, 2↑ 3↓)
duckdb / vortex-compact (0.838x ➖, 3↑ 3↓)
duckdb / parquet (0.961x ➖, 0↑ 0↓)

Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (0.984x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.993x ➖, 0↑ 0↓)
duckdb / parquet (0.943x ➖, 0↑ 0↓)
File Sizes: Statistical and Population Genetics
No file size changes detected.

Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.080x ➖, 1↑ 16↓)
datafusion / parquet (1.042x ➖, 0↑ 3↓)
duckdb / vortex-file-compressed (1.058x ➖, 6↑ 11↓)
duckdb / parquet (1.028x ➖, 0↑ 1↓)
duckdb / duckdb (1.002x ➖, 2↑ 2↓)
File Sizes: Clickbench on NVME
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)

Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.043x ➖, 1↑ 3↓)
datafusion / vortex-compact (1.044x ➖, 1↑ 4↓)
datafusion / parquet (0.988x ➖, 3↑ 5↓)
duckdb / vortex-file-compressed (1.175x ➖, 0↑ 5↓)
duckdb / vortex-compact (1.153x ➖, 0↑ 6↓)
duckdb / parquet (0.862x ➖, 1↑ 0↓)
Summary
Closes: #000
Testing