Storage & Compression

Tiered storage with 91-95% compression. Your telemetry stays on your infrastructure. Only statistical summaries (~2KB per service) are sent to the cloud AI for analysis.

Zero-Knowledge Architecture: Raw telemetry never leaves your infrastructure. The AI only sees statistical summaries, not your actual logs, traces, or metrics.

How It Works

ReductrAI stores all telemetry locally in DuckDB, an embedded analytical database. Data moves automatically through three storage tiers based on its age: it stays uncompressed in the HOT tier and is compressed as it moves into the WARM and COLD tiers.

Telemetry Ingestion -> HOT Tier   -> WARM Tier     -> COLD Tier
(Raw data)             (< 1 hour)    (1h - 7 days)    (> 7 days)
        |
        v
~2KB Summary -> Cloud AI -> Investigation Results

Storage Tiers

HOT Tier (< 1 hour)

Raw, uncompressed data for real-time anomaly detection. Fastest query performance for active incident investigation. Data is kept in native DuckDB format for sub-second queries.

WARM Tier (1 hour - 7 days)

Compressed storage with 91-95% reduction. The V2 Compression Engine applies dictionary encoding, delta compression, and columnar transformation. Data remains queryable, with slightly higher latency than the HOT tier.

COLD Tier (> 7 days)

Maximum compression for long-term retention. Data is preserved for compliance and historical analysis. Query latency is higher but storage cost is minimal. Retention period depends on your license tier.
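The age thresholds above can be expressed as a small lookup. This is a minimal sketch, not the agent's code: the function name `tier_for_age` is hypothetical, and the handling of records that are exactly 1 hour or exactly 7 days old is an assumption, since the tier descriptions do not say which side of the boundary they fall on.

```python
from datetime import timedelta

HOT_MAX_AGE = timedelta(hours=1)   # HOT: younger than 1 hour
WARM_MAX_AGE = timedelta(days=7)   # WARM: from 1 hour up to 7 days


def tier_for_age(age: timedelta) -> str:
    """Return the storage tier for a record of the given age.

    Assumption: a record exactly 1 hour old is WARM, and a record
    exactly 7 days old is still WARM.
    """
    if age < timedelta(0):
        raise ValueError("age must not be negative")
    if age < HOT_MAX_AGE:
        return "HOT"
    if age <= WARM_MAX_AGE:
        return "WARM"
    return "COLD"
```

For example, `tier_for_age(timedelta(minutes=5))` returns `"HOT"` and `tier_for_age(timedelta(days=8))` returns `"COLD"`.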

Retention by License

Data retention varies by license tier. After the retention period, data is automatically purged from local storage.

License Tier	Retention Period	Archive Storage
FREE	30 days	Local only
PRO	90 days	Local only
BUSINESS	180 days	Local only
ENTERPRISE	Custom (unlimited)	S3, GCS, Azure Blob

Enterprise customers can configure custom archive storage (AWS S3, Google Cloud Storage, Azure Blob) with unlimited retention. Data is automatically tiered to your archive storage after the warm period.
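To see how the retention table translates into a purge decision, here is a hedged sketch. The retention values come from the table above; the function name `is_past_retention` and the choice to represent "Custom (unlimited)" as `None` are assumptions made for illustration.

```python
from typing import Optional

# Retention periods in days, copied from the table above.
# None stands for the ENTERPRISE "Custom (unlimited)" setting (an assumption
# about representation; a custom Enterprise limit could be any value).
RETENTION_DAYS: dict[str, Optional[int]] = {
    "FREE": 30,
    "PRO": 90,
    "BUSINESS": 180,
    "ENTERPRISE": None,
}


def is_past_retention(license_tier: str, age_days: float) -> bool:
    """Return True if data of this age is older than the tier's retention period."""
    limit = RETENTION_DAYS[license_tier.upper()]
    if limit is None:
        return False  # unlimited retention: never purged by age
    return age_days > limit
```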

V2 Compression Engine

The source-available compression engine achieves 91-95% storage reduction by combining several techniques. Because the code is source-available, you can audit it line by line to verify that your data stays local.

Compression by Data Type

Data Type	Technique	Compression
Spans / Traces	SpanPatternCompressor (delta encoding, dictionary)	94-95%
Logs	ContextualDictionaryCompressor (template extraction)	91-92%
Metrics	TimeSeriesAggregator (series grouping, delta timestamps)	91%
Events / JSON	SemanticCompressor (columnar transform)	91-93%
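The percentages in the table can be turned into a rough capacity estimate. The sketch below only does the arithmetic on the quoted ranges; the dictionary keys and the function name `compressed_size_range` are hypothetical, and real ratios depend on your data.

```python
# Reduction ranges in percent (low, high), copied from the table above.
REDUCTION_PERCENT = {
    "spans": (94, 95),
    "logs": (91, 92),
    "metrics": (91, 91),
    "events": (91, 93),
}


def compressed_size_range(data_type: str, raw_bytes: int) -> tuple[int, int]:
    """Return (best_case, worst_case) compressed size in bytes."""
    low, high = REDUCTION_PERCENT[data_type]
    # Integer arithmetic avoids floating-point rounding surprises.
    best = raw_bytes * (100 - high) // 100
    worst = raw_bytes * (100 - low) // 100
    return best, worst
```

At the quoted 91-92% reduction, 100 GB of raw logs works out to between 8 GB and 9 GB after compression.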

How Compression Works

Dictionary Encoding - Repeated strings (service names, error messages) replaced with integer indices
Delta Encoding - Timestamps stored as deltas from a base time, reducing bytes per value
Columnar Transformation - Row-based data converted to column format for better compression ratios
Template Extraction - Log patterns extracted and stored once, with variables referenced
Gzip Final Pass - Standard compression applied to the transformed data
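The following sketch shows how dictionary encoding, delta encoding, columnar transformation, and a final gzip pass fit together on a small batch of records. It illustrates the techniques only and is not the V2 Compression Engine's code: the row fields (`service`, `message`, `ts_ms`), the function names, and the JSON container format are all assumptions.

```python
import gzip
import json


def dictionary_encode(values):
    """Replace each string with an integer index into a table of unique values."""
    table, index_of, indices = [], {}, []
    for value in values:
        if value not in index_of:
            index_of[value] = len(table)
            table.append(value)
        indices.append(index_of[value])
    return table, indices


def delta_encode(timestamps):
    """Store one base time plus each timestamp's offset from that base."""
    base = timestamps[0] if timestamps else 0
    return base, [t - base for t in timestamps]


def compress_rows(rows):
    """Transform row-based records to encoded columns, then gzip the result."""
    # Columnar transformation: one list per field instead of one dict per row.
    service_table, service_idx = dictionary_encode([r["service"] for r in rows])
    message_table, message_idx = dictionary_encode([r["message"] for r in rows])
    base, offsets = delta_encode([r["ts_ms"] for r in rows])
    payload = {
        "services": service_table, "service_idx": service_idx,
        "messages": message_table, "message_idx": message_idx,
        "ts_base": base, "ts_offsets": offsets,
    }
    # Final pass: standard gzip over the transformed data.
    encoded = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return gzip.compress(encoded)


def decompress_rows(blob):
    """Invert compress_rows and rebuild the original row dictionaries."""
    p = json.loads(gzip.decompress(blob))
    return [
        {
            "service": p["services"][s],
            "message": p["messages"][m],
            "ts_ms": p["ts_base"] + offset,
        }
        for s, m, offset in zip(p["service_idx"], p["message_idx"], p["ts_offsets"])
    ]
```

The round trip is lossless: `decompress_rows(compress_rows(rows))` returns the original rows. How much smaller the output is depends on how repetitive the input is.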

Source-Available (SSPL): The compression engine is part of the source-available agent. Security teams can audit the code at github.com/reductrai/agent
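Template extraction, listed among the techniques above, can be sketched separately. This is a simplified illustration, not the ContextualDictionaryCompressor: the rule that numbers and hex values are the variable parts of a log line is an assumption, as are all function names.

```python
import re

# Hypothetical rule: decimal numbers and 0x-prefixed hex values are variables;
# everything else is part of the template.
VARIABLE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+(?:\.\d+)?)\b")


def extract_template(line):
    """Split a log line into its literal parts (the template) and its variables."""
    parts = tuple(VARIABLE.split(line))   # literal text between variables
    variables = VARIABLE.findall(line)    # the variable values, in order
    return parts, variables


def compress_logs(lines):
    """Store each distinct template once; each line keeps a template id plus variables."""
    templates, index_of, encoded = [], {}, []
    for line in lines:
        parts, variables = extract_template(line)
        if parts not in index_of:
            index_of[parts] = len(templates)
            templates.append(parts)
        encoded.append((index_of[parts], variables))
    return templates, encoded


def restore_logs(templates, encoded):
    """Rebuild the original lines by interleaving literal parts and variables."""
    lines = []
    for template_id, variables in encoded:
        parts = templates[template_id]
        pieces = [parts[0]]
        for variable, literal in zip(variables, parts[1:]):
            pieces.append(variable)
            pieces.append(literal)
        lines.append("".join(pieces))
    return lines
```

Two lines such as `Payment 1234 failed after 523 ms` and `Payment 9876 failed after 41 ms` share one stored template and differ only in their variable lists.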

What Goes to the Cloud?

Only statistical summaries (~2KB per service) are sent to the cloud for AI analysis. Here's exactly what the AI sees:

# Example summary sent to cloud (actual data stays local)
{
  "service": "payment-service",
  "errorRate": 4.2,           # Percentage, not actual errors
  "latencyP99": 523,          # Milliseconds
  "requestsPerMin": 1250,     # Count only
  "anomalyType": "error_spike",
  "correlatedServices": ["stripe-api", "db-primary"]
}
      

What the Cloud NEVER Sees

Actual log messages or error text
Request/response payloads
User IDs, emails, tokens, or PII
Database queries or results
Headers, cookies, or authentication data
Raw trace spans or metric samples
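To make the split between local data and the cloud summary concrete, here is a minimal sketch of how raw request records could be reduced to the summary fields shown earlier. It is an illustration under assumptions, not the agent's implementation: the input keys (`latency_ms`, `is_error`), the nearest-rank percentile method, and the function names are hypothetical.

```python
import json


def build_summary(service, requests, window_minutes, anomaly_type, correlated_services):
    """Aggregate raw request records into a summary that carries no raw fields.

    `requests` is a list of dicts with the hypothetical keys `latency_ms` and
    `is_error`; any other keys (messages, payloads, user IDs) are ignored.
    """
    if not requests:
        raise ValueError("at least one request is required")
    latencies = sorted(r["latency_ms"] for r in requests)
    errors = sum(1 for r in requests if r["is_error"])
    # Nearest-rank 99th percentile, computed with integer arithmetic.
    rank = max(1, (99 * len(latencies) + 99) // 100)
    return {
        "service": service,
        "errorRate": round(100 * errors / len(requests), 1),  # percentage
        "latencyP99": latencies[rank - 1],                    # milliseconds
        "requestsPerMin": round(len(requests) / window_minutes),
        "anomalyType": anomaly_type,
        "correlatedServices": list(correlated_services),
    }


def payload_size(summary):
    """Size in bytes of the JSON-encoded summary."""
    return len(json.dumps(summary).encode("utf-8"))
```

Because the summary is built only from counts and percentiles, fields such as log messages or user identifiers in the input never reach the output, and `payload_size` lets you check the result against the ~2KB figure.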

Verify It Yourself: The agent is source-available (SSPL). Run tcpdump or Wireshark to inspect exactly what leaves your network, so you can verify the claim yourself rather than take it on trust.

Local Queries with DuckDB

All your telemetry is stored locally in DuckDB and can be queried directly. The ReductrAI agent provides a query command:

# Query your local telemetry
reductrai query "SELECT service, COUNT(*) as errors
  FROM logs
  WHERE level = 'error'
  AND timestamp > now() - INTERVAL '1 hour'
  GROUP BY service
  ORDER BY errors DESC"

# Output:
+-----------------------+--------+
| service               | errors |
+-----------------------+--------+
| payment-service       | 423    |
| auth-service          | 89     |
| notification-worker   | 12     |
+-----------------------+--------+
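If you want to run the query above from a script with a configurable time window, a thin wrapper is one option. This sketch assumes only the `reductrai query "<SQL>"` form shown above; that the command prints results to stdout and exits non-zero on failure is an assumption, and the function names are hypothetical.

```python
import subprocess


def build_error_count_command(lookback_hours: int) -> list[str]:
    """Build the argv for the error-count query over the last N hours."""
    # Accept only positive integers so nothing unexpected is spliced into the SQL.
    if isinstance(lookback_hours, bool) or not isinstance(lookback_hours, int):
        raise ValueError("lookback_hours must be an integer")
    if lookback_hours < 1:
        raise ValueError("lookback_hours must be at least 1")
    sql = (
        "SELECT service, COUNT(*) AS errors "
        "FROM logs "
        "WHERE level = 'error' "
        f"AND timestamp > now() - INTERVAL '{lookback_hours} hours' "
        "GROUP BY service "
        "ORDER BY errors DESC"
    )
    return ["reductrai", "query", sql]


def run_error_count(lookback_hours: int = 1) -> str:
    """Run the query through the CLI and return its raw text output.

    Assumption: results are written to stdout and failures set a
    non-zero exit code.
    """
    result = subprocess.run(
        build_error_count_command(lookback_hours),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the command as an argument list avoids shell quoting problems with the single quotes inside the SQL.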
      

Useful Queries

# Storage usage by tier
reductrai query "SELECT tier, SUM(size_bytes)/1e9 as gb FROM storage_stats GROUP BY tier"

# Top error-producing services (last 24h)
reductrai query "SELECT service, error_rate FROM service_metrics
  WHERE timestamp > now() - INTERVAL '24 hours' ORDER BY error_rate DESC LIMIT 10"

# Trace latency percentiles
reductrai query "SELECT service,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY duration_ms) as p50,
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration_ms) as p99
  FROM spans GROUP BY service"
      

Storage Location

By default, ReductrAI stores data in ~/.reductrai/. You can customize this with the REDUCTRAI_DATA_DIR environment variable.

# Default location
~/.reductrai/
├── reductrai.db          # Main DuckDB database
├── reductrai.db.wal      # Write-ahead log
├── compressed/           # WARM/COLD tier data
│   ├── 2024-01/
│   └── 2024-02/
└── config.yaml           # Local configuration

# Custom location
export REDUCTRAI_DATA_DIR=/mnt/fast-storage/reductrai
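Scripts that need to locate the data directory can mirror the documented behavior. This is a sketch, not the agent's own lookup logic: the function names are hypothetical, and treating an empty `REDUCTRAI_DATA_DIR` as unset is an assumption.

```python
import os
from pathlib import Path


def resolve_data_dir(env=None) -> Path:
    """Return the data directory: REDUCTRAI_DATA_DIR if set, else ~/.reductrai."""
    env = os.environ if env is None else env
    custom = env.get("REDUCTRAI_DATA_DIR")
    if custom:  # assumption: an empty value falls back to the default
        return Path(custom).expanduser()
    return Path.home() / ".reductrai"


def database_path(env=None) -> Path:
    """Path to the main DuckDB database file inside the data directory."""
    return resolve_data_dir(env) / "reductrai.db"
```

Passing a dictionary as `env` makes the lookup easy to exercise without changing the real environment.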
      

Storage Recommendation: Use SSD storage for the data directory. The HOT tier benefits significantly from fast I/O for real-time anomaly detection.