
Storage & Compression

Intelligent tiered storage with 91-95% compression. Your telemetry stays local in DuckDB. Only ~2KB summaries leave for AI analysis.

Zero-Knowledge Architecture: Raw telemetry never leaves your infrastructure. The AI only sees statistical summaries, not your actual logs, traces, or metrics.

How It Works

ReductrAI stores all telemetry locally in DuckDB, an embedded analytical database. Data automatically flows through three storage tiers based on age: it starts uncompressed in the HOT tier and is compressed as it moves into the WARM and COLD tiers.

Telemetry Ingestion -> HOT Tier   -> WARM Tier     -> COLD Tier
(Raw data)             (< 1 hour)    (1h - 7 days)    (> 7 days)
        |
        v
~2KB Summary -> Cloud AI -> Investigation Results

Storage Tiers

HOT Tier (< 1 hour)

Raw, uncompressed data for real-time anomaly detection. Fastest query performance for active incident investigation. Data is kept in native DuckDB format for sub-second queries.

WARM Tier (1 hour - 7 days)

Compressed storage with 91-95% reduction. The V2 Compression Engine applies dictionary encoding, delta compression, and columnar transformation. Data remains queryable, with slightly higher latency than the HOT tier.

COLD Tier (> 7 days)

Maximum compression for long-term retention. Data is preserved for compliance and historical analysis. Query latency is higher but storage cost is minimal. Retention period depends on your license tier.
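
The age boundaries above can be expressed as a small lookup. This is a minimal sketch, not the agent's actual code: the function name `tier_for_age` and the constants are hypothetical, and the thresholds are taken directly from the tier headings.

```python
from datetime import timedelta

HOT_MAX_AGE = timedelta(hours=1)   # HOT: younger than 1 hour
WARM_MAX_AGE = timedelta(days=7)   # WARM: 1 hour up to 7 days


def tier_for_age(age: timedelta) -> str:
    """Return the storage tier name for a record of the given age."""
    if age < HOT_MAX_AGE:
        return "HOT"
    if age <= WARM_MAX_AGE:  # COLD is "> 7 days", so exactly 7 days is WARM
        return "WARM"
    return "COLD"
```

For example, `tier_for_age(timedelta(days=3))` returns `"WARM"`.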

Retention by License

Data retention varies by license tier. After the retention period, data is automatically purged from local storage.

License Tier | Retention Period | Storage Limit
FREE         | 30 days          | 10 GB
PRO          | 90 days          | 50 GB
BUSINESS     | 180 days         | 200 GB
ENTERPRISE   | 365 days         | Unlimited
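
To illustrate how a retention check could work, here is a minimal sketch that uses the retention periods from the table. The names `RETENTION_DAYS` and `is_expired` are hypothetical, and the sketch assumes a record becomes eligible for purging once it is strictly older than the retention period; the agent's actual purge logic may differ.

```python
from datetime import datetime, timedelta, timezone

# Retention periods from the table above, in days.
RETENTION_DAYS = {"FREE": 30, "PRO": 90, "BUSINESS": 180, "ENTERPRISE": 365}


def is_expired(recorded_at: datetime, license_tier: str, now: datetime) -> bool:
    """True if the record is older than the retention period for the tier."""
    retention = timedelta(days=RETENTION_DAYS[license_tier])
    return now - recorded_at > retention


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
recorded = now - timedelta(days=45)
print(is_expired(recorded, "FREE", now))  # 45 days exceeds the 30-day limit
print(is_expired(recorded, "PRO", now))   # 45 days is within the 90-day limit
```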

V2 Compression Engine

The open-source compression engine achieves 91-95% storage reduction by combining several techniques. Because the source is public, you can audit every line of code to confirm that your data stays local.

Compression by Data Type

Data Type      | Technique                                                  | Compression
Spans / Traces | SpanPatternCompressor (delta encoding, dictionary)         | 94-95%
Logs           | ContextualDictionaryCompressor (template extraction)       | 91-92%
Metrics        | TimeSeriesAggregator (series grouping, delta timestamps)   | 91%
Events / JSON  | SemanticCompressor (columnar transform)                    | 91-93%
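
The template extraction listed for logs can be illustrated with a toy example. This is a sketch of the general idea only, not the ContextualDictionaryCompressor itself: the functions below are hypothetical, and the rule "every run of digits is a variable" is a deliberately simple assumption (real extractors use richer heuristics, and this version would mis-restore a line that already contains the literal `<*>`).

```python
import re

# Assumption for illustration: any run of digits is a variable part of the log line.
_VARIABLE = re.compile(r"\d+")
_PLACEHOLDER = "<*>"


def extract_template(line):
    """Split a log line into a template and the list of variable values."""
    variables = _VARIABLE.findall(line)
    template = _VARIABLE.sub(_PLACEHOLDER, line)
    return template, variables


def compress_logs(lines):
    """Store each distinct template once; each line becomes (template_id, variables)."""
    templates, index, records = [], {}, []
    for line in lines:
        template, variables = extract_template(line)
        if template not in index:
            index[template] = len(templates)
            templates.append(template)
        records.append((index[template], variables))
    return templates, records


def restore_log(templates, record):
    """Rebuild the original line by filling the variables back into the template."""
    template_id, variables = record
    parts = templates[template_id].split(_PLACEHOLDER)
    out = parts[0]
    for value, tail in zip(variables, parts[1:]):
        out += value + tail
    return out
```

Two lines such as `user 42 logged in from 10.0.0.7` and `user 7 logged in from 10.0.0.12` share one stored template; only their variable values differ per record.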

How Compression Works

  1. Dictionary Encoding - Repeated strings (service names, error messages) replaced with integer indices
  2. Delta Encoding - Timestamps stored as deltas from a base time, reducing bytes per value
  3. Columnar Transformation - Row-based data converted to column format for better compression ratios
  4. Template Extraction - Log patterns extracted and stored once, with variables referenced
  5. Gzip Final Pass - Standard compression applied to the transformed data
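
The steps above can be sketched end to end on a tiny record set. This is an illustrative toy under stated assumptions, not the V2 engine: the row schema (`service`, `ts`), the JSON container, and all function names are hypothetical, and the input is assumed to be non-empty. It covers dictionary encoding, delta encoding from a base time, the columnar layout, and the gzip pass.

```python
import gzip
import json


def dictionary_encode(values):
    """Replace repeated strings with integer indices into a lookup table."""
    table, index, codes = [], {}, []
    for value in values:
        if value not in index:
            index[value] = len(table)
            table.append(value)
        codes.append(index[value])
    return table, codes


def delta_encode(timestamps):
    """Store timestamps as offsets from a base time (here, the first value)."""
    base = timestamps[0]
    return base, [t - base for t in timestamps]


def compress(rows):
    """rows: non-empty list of {"service": str, "ts": int} dicts (hypothetical schema)."""
    # Columnar transformation: one list per field instead of one dict per row.
    services = [row["service"] for row in rows]
    timestamps = [row["ts"] for row in rows]
    table, codes = dictionary_encode(services)
    base, deltas = delta_encode(timestamps)
    payload = {"services": table, "codes": codes, "base": base, "deltas": deltas}
    # Final pass: standard gzip over the transformed data.
    return gzip.compress(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def decompress(blob):
    """Invert compress(): gunzip, undo the deltas, and rebuild the rows."""
    payload = json.loads(gzip.decompress(blob))
    table, base = payload["services"], payload["base"]
    return [
        {"service": table[code], "ts": base + delta}
        for code, delta in zip(payload["codes"], payload["deltas"])
    ]
```

A round trip such as `decompress(compress(rows)) == rows` confirms the transformation is lossless. The ratio this toy achieves depends entirely on the input and says nothing about the 91-95% figures quoted for the real engine.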

Open Source: The compression engine is part of the open-source agent. Security teams can audit the code at github.com/reductrai/agent

What Goes to the Cloud?

Only statistical summaries (~2KB per service) are sent to the cloud for AI analysis. Here's exactly what the AI sees:

    # Example summary sent to cloud (actual data stays local)
    {
      "service": "payment-service",
      "errorRate": 4.2,          # Percentage, not actual errors
      "latencyP99": 523,         # Milliseconds
      "requestsPerMin": 1250,    # Count only
      "anomalyType": "error_spike",
      "correlatedServices": ["stripe-api", "db-primary"]
    }
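
To show how such a summary can be derived without including any raw record, here is a minimal sketch. It is an assumption-laden illustration rather than the agent's code: the `summarize` function, the `(duration_ms, is_error)` input shape, and the nearest-rank p99 method are all hypothetical, and the anomaly fields from the example are omitted.

```python
import json


def summarize(service, requests, window_minutes):
    """Aggregate raw request records into summary statistics.

    requests: non-empty list of (duration_ms, is_error) tuples (hypothetical shape).
    """
    total = len(requests)
    errors = sum(1 for _, is_error in requests if is_error)
    durations = sorted(duration for duration, _ in requests)
    # Nearest-rank p99: ceil(0.99 * total) computed with integer arithmetic.
    rank = (99 * total + 99) // 100
    return {
        "service": service,
        "errorRate": round(100.0 * errors / total, 1),    # percentage
        "latencyP99": durations[rank - 1],                # milliseconds
        "requestsPerMin": round(total / window_minutes),  # count only
    }


requests = [(100 + i, i % 25 == 0) for i in range(1000)]
summary = summarize("payment-service", requests, window_minutes=5)
print(json.dumps(summary))
print(len(json.dumps(summary).encode("utf-8")), "bytes")
```

Only the aggregate values appear in the output; the individual durations and error flags never leave the function.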

What the Cloud NEVER Sees

Raw telemetry: your actual logs, traces, and metrics stay on your infrastructure.

Verify It Yourself: The agent is open source. Run tcpdump or Wireshark to inspect exactly what leaves your network. We prove it, not just claim it.

Local Queries with DuckDB

All your telemetry is stored locally in DuckDB and can be queried directly. The ReductrAI agent provides a query command:

    # Query your local telemetry
    reductrai query "SELECT service, COUNT(*) as errors
      FROM logs
      WHERE level = 'error' AND timestamp > now() - INTERVAL '1 hour'
      GROUP BY service
      ORDER BY errors DESC"

    # Output:
    +-----------------------+--------+
    | service               | errors |
    +-----------------------+--------+
    | payment-service       |    423 |
    | auth-service          |     89 |
    | notification-worker   |     12 |
    +-----------------------+--------+

Useful Queries

    # Storage usage by tier
    reductrai query "SELECT tier, SUM(size_bytes)/1e9 as gb
      FROM storage_stats
      GROUP BY tier"

    # Top error-producing services (last 24h)
    reductrai query "SELECT service, error_rate
      FROM service_metrics
      WHERE timestamp > now() - INTERVAL '24 hours'
      ORDER BY error_rate DESC
      LIMIT 10"

    # Trace latency percentiles
    reductrai query "SELECT service,
        PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY duration_ms) as p50,
        PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration_ms) as p99
      FROM spans
      GROUP BY service"
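
For readers unsure what the percentile query returns, the following sketch reproduces the standard SQL definition of PERCENTILE_CONT, which interpolates linearly between the two nearest ordered values. The Python function is a hypothetical illustration of that definition and is not part of ReductrAI or DuckDB.

```python
import math


def percentile_cont(values, fraction):
    """Linearly interpolated percentile, following the SQL PERCENTILE_CONT definition."""
    ordered = sorted(values)
    if not ordered:
        return None  # SQL returns NULL for an empty group
    position = fraction * (len(ordered) - 1)  # zero-based fractional row index
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


durations_ms = [40, 10, 30, 20]
print(percentile_cont(durations_ms, 0.50))  # halfway between 20 and 30
```

Because of the interpolation, the p50 and p99 values may not correspond to any single span's actual duration.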

Storage Location

By default, ReductrAI stores data in ~/.reductrai/. You can customize this with the REDUCTRAI_DATA_DIR environment variable.

    # Default location
    ~/.reductrai/
    +-- reductrai.db        # Main DuckDB database
    +-- reductrai.db.wal    # Write-ahead log
    +-- compressed/         # WARM/COLD tier data
    |   +-- 2024-01/
    |   +-- 2024-02/
    +-- config.yaml         # Local configuration

    # Custom location
    export REDUCTRAI_DATA_DIR=/mnt/fast-storage/reductrai
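
A script that needs to locate the data directory can follow the same rule. This is a minimal sketch: `resolve_data_dir` is a hypothetical helper, and treating an empty REDUCTRAI_DATA_DIR as unset is an assumption, not documented agent behavior.

```python
import os
from pathlib import Path


def resolve_data_dir(environ=os.environ) -> Path:
    """Return REDUCTRAI_DATA_DIR if set, otherwise the default ~/.reductrai."""
    override = environ.get("REDUCTRAI_DATA_DIR")
    if override:  # assumption: an empty value falls back to the default
        return Path(override).expanduser()
    return Path.home() / ".reductrai"


data_dir = resolve_data_dir()
print(data_dir / "reductrai.db")  # path to the main DuckDB database file
```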

Storage Recommendation: Use SSD storage for the data directory. The HOT tier benefits significantly from fast I/O for real-time anomaly detection.