Running xyOps in production with lots of servers and high job volumes? This guide provides best practices for scaling your deployment to handle enterprise workloads.
If you're new to xyOps deployment, start with Self-Hosting. This guide builds on those foundational concepts.

Hardware Sizing

Proper hardware provisioning is critical for production xyOps deployments at scale.

CPU Cores

xyOps is multi-process and highly concurrent. More cores improve performance across:
  • Job scheduler
  • Web server request handling
  • Storage I/O operations
  • Real-time log compression
Recommendation: Minimum 4 cores for small deployments, 8-16 cores for production fleets with hundreds of servers.

Memory (RAM)

Adequate RAM ensures smooth operation and reduces disk I/O:
  • Node.js heap space
  • In-process caches (storage, lists)
  • Storage engine caches (SQLite, Filesystem)
  • OS page cache for log files
Recommendation: 16-32 GB RAM for production installs. More RAM allows larger caches, which directly improves cache hit rates.

Storage

Use fast SSD/NVMe storage for production. HDDs cannot handle the IOPS required for parallel job logs and database operations.
  • Type: Prefer SSD or NVMe for local Filesystem/SQLite backends
  • IOPS: Ensure adequate IOPS for parallel job logs, snapshots, and uploads
  • Capacity: Plan for log archives, job history, and monitor time-series data

Network

  • Ensure good NIC throughput and low latency between conductors and workers
  • For external storage (S3, Redis, MinIO), place conductors in the same region/AZ
  • Use load balancers with proper health checks for multi-conductor setups

OS Limits

# Increase file descriptor limits
ulimit -n 65536

# For systemd services, add to the [Service] section of the unit file:
LimitNOFILE=65536
LimitNPROC=32768
Configure swap conservatively to avoid heap thrashing under memory pressure.
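
To confirm that a raised limit actually applies to a running process, the limit can be read back from inside that process. The following is a minimal sketch using Python's standard `resource` module (Unix only). The helper name `nofile_headroom` is illustrative and not part of xyOps; the 65536 target is taken from the example above.

```python
import resource

def nofile_headroom(target=65536):
    """Return (soft, hard, ok) for this process's open-file limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which always satisfies the target.
    ok = soft == resource.RLIM_INFINITY or soft >= target
    return soft, hard, ok

if __name__ == "__main__":
    soft, hard, ok = nofile_headroom()
    print(f"soft={soft} hard={hard} meets_target={ok}")
```

Run it from the same shell or service context that launches xyOps, since limits are inherited per process and an interactive shell may report different values than a systemd unit.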

Memory Configuration

Node.js Heap Size

xyOps honors the NODE_MAX_MEMORY environment variable to set Node’s old-space heap size.
1. Set environment variable

export NODE_MAX_MEMORY=8192
Or for Docker:
docker run -e NODE_MAX_MEMORY=8192 ...
2. Calculate appropriate value

On a 16 GB instance, allocate 8-12 GB to Node.js heap, leaving room for:
  • OS and system processes
  • Filesystem cache
  • External daemons (nginx, database)
3. Monitor and adjust

Monitor RSS vs heap usage over time. Adjust conservatively to avoid swapping.
Default: 4096 MB (4 GB) when NODE_MAX_MEMORY is not set.
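
The sizing guidance above (8-12 GB of heap on a 16 GB instance) works out to roughly 50-75% of total RAM. A small sketch of that arithmetic follows. The function name and the `heap_fraction` parameter are illustrative assumptions, not xyOps settings; only `NODE_MAX_MEMORY` comes from the guide.

```python
def suggest_node_max_memory(total_ram_gb, heap_fraction=0.5):
    """Return a NODE_MAX_MEMORY value in MB for a host with total_ram_gb of RAM.

    heap_fraction is the share of RAM given to the Node.js heap. The range
    0.5-0.75 mirrors the 8-12 GB on 16 GB example in this guide.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    return int(total_ram_gb * 1024 * heap_fraction)

if __name__ == "__main__":
    for fraction in (0.5, 0.75):
        mb = suggest_node_max_memory(16, fraction)
        print(f"export NODE_MAX_MEMORY={mb}")
```

Starting at the lower fraction and raising it only after observing RSS under load keeps more room for the OS page cache and any co-located daemons.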

Storage Engine Caching

xyOps uses pixl-server-storage with in-memory caches for JSON records.
  • Storage.SQLite.cache.maxBytes (number, default: 104857600): Maximum cache size in bytes (~100 MB)
  • Storage.SQLite.cache.maxItems (number, default: 100000): Maximum number of cached items
  • Storage.Filesystem.cache.maxBytes (number, default: 104857600): Filesystem cache size in bytes
Recommendation: For large production installs, increase cache sizes 5-10× if RAM permits:
"Storage": {
  "SQLite": {
    "cache": {
      "enabled": true,
      "maxBytes": 524288000,
      "maxItems": 500000
    }
  },
  "Filesystem": {
    "cache": {
      "enabled": true,
      "maxBytes": 524288000,
      "maxItems": 500000
    }
  }
}
Tune based on hit ratio and latency. Monitor cache effectiveness in storage logs.
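
The values in the example above are the documented defaults multiplied by 5. A short sketch that generates the same fragment for any scaling factor is shown below. It assumes the Filesystem cache shares the SQLite `maxItems` default of 100000, which the guide does not state explicitly; the function name is illustrative.

```python
import json

DEFAULT_MAX_BYTES = 104857600  # documented default, ~100 MB
DEFAULT_MAX_ITEMS = 100000     # documented for SQLite; assumed for Filesystem

def scaled_cache_config(factor, engines=("SQLite", "Filesystem")):
    """Build a Storage config fragment with caches scaled by an integer factor."""
    cache = {
        "enabled": True,
        "maxBytes": DEFAULT_MAX_BYTES * factor,
        "maxItems": DEFAULT_MAX_ITEMS * factor,
    }
    # Each engine gets its own copy so later edits to one do not affect the other.
    return {"Storage": {engine: {"cache": dict(cache)} for engine in engines}}

if __name__ == "__main__":
    print(json.dumps(scaled_cache_config(5), indent=2))
```

Remember that cache memory is allocated in addition to the Node.js heap budget, so account for both when sizing the instance.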

Multi-Conductor Architecture

Multi-conductor deployments require external shared storage so all conductors see the same state.
See Multi-Conductor with Nginx for detailed setup instructions.

Storage Backend Options

AWS S3 works but has higher latency. MinIO (self-hosted S3) performs better on-prem.
"Storage": {
  "engine": "S3",
  "S3": {
    "params": {
      "Bucket": "xyops-production"
    },
    "cache": {
      "enabled": true,
      "maxBytes": 524288000
    }
  }
}
Common pattern: fast key/value store for JSON documents, object store for binaries.
"Storage": {
  "engine": "Hybrid",
  "Hybrid": {
    "docEngine": "Redis",
    "binaryEngine": "S3"
  }
}
Ensure Redis persistence (RDB/AOF) is enabled for durability.
If using NFS for the Filesystem backend, ensure:
  • Low latency
  • Adequate throughput
  • Robust file locking semantics
Network file systems can introduce latency. Test thoroughly before production use.
SQLite works well for single-conductor deployments but cannot be shared across multiple conductors. Switch to a networked backend for multi-conductor setups.
Best Practice: Keep conductors in the same region/AZ as storage to minimize cross-zone latency.
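
The backend constraints above can be turned into a simple pre-deployment check. The sketch below is illustrative only: it assumes the `engine` value for the SQLite backend is the string "SQLite" (the guide shows "S3" and "Hybrid" but not this one), and `storage_warnings` is a hypothetical helper, not an xyOps API.

```python
SINGLE_HOST_ENGINES = {"SQLite"}  # assumed engine name; cannot be shared

def storage_warnings(storage_config, conductor_count):
    """Return human-readable warnings for a Storage config block."""
    warnings = []
    engine = storage_config.get("engine")
    if conductor_count > 1 and engine in SINGLE_HOST_ENGINES:
        warnings.append(
            f"{engine} cannot be shared across {conductor_count} conductors; "
            "switch to a networked backend"
        )
    if engine == "Hybrid":
        hybrid = storage_config.get("Hybrid", {})
        if hybrid.get("docEngine") == "Redis":
            warnings.append("confirm Redis persistence (RDB/AOF) is enabled")
    return warnings

if __name__ == "__main__":
    config = {"engine": "Hybrid",
              "Hybrid": {"docEngine": "Redis", "binaryEngine": "S3"}}
    for message in storage_warnings(config, conductor_count=2):
        print("WARNING:", message)
```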

Performance Tuning

Disable QuickMon at Scale

QuickMon sends per-second metrics from all satellites. At large scale, this adds ingestion load.
"satellite": {
  "config": {
    "quickmon_enabled": false,
    "monitoring_enabled": true
  }
}
Minute-level monitoring remains enabled via monitoring_enabled.

Disable Job Network Monitoring

For servers with tens of thousands of network connections, disable real-time network monitoring during jobs:
// In satellite config.json or global satellite.config
"disable_job_network_io": true
This reduces load on busy servers while jobs are running.

Job Throughput

Increase the global job rate limit prudently:
  • max_jobs_per_min (number, default: 100): Global e-brake to prevent runaway workflows from overwhelming the system
Align with per-category limits and workflow constraints. Monitor worker CPU/RAM when increasing.

Data Retention

Cap database history sizes to prevent unbounded growth:
"db_maint": {
  "jobs": { "max_rows": 1000000 },
  "alerts": { "max_rows": 100000 },
  "snapshots": { "max_rows": 100000 },
  "activity": { "max_rows": 100000 },
  "servers": { "max_rows": 10000 }
}
Adjust to fit your storage budget and compliance requirements.
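
To relate these row caps to a storage budget, multiply each cap by an average record size measured on your own installation. The sketch below shows the arithmetic only. The byte sizes in the usage section are placeholders chosen for illustration, not measured or documented xyOps figures.

```python
def estimate_bytes(max_rows, avg_row_bytes):
    """Sum max_rows[name] * avg_row_bytes[name] over all listed tables."""
    return sum(rows * avg_row_bytes[name] for name, rows in max_rows.items())

if __name__ == "__main__":
    # Row caps copied from the db_maint example above.
    max_rows = {"jobs": 1_000_000, "alerts": 100_000, "snapshots": 100_000,
                "activity": 100_000, "servers": 10_000}
    # PLACEHOLDER sizes: replace with averages measured on your own data.
    avg_row_bytes = {"jobs": 4096, "alerts": 2048, "snapshots": 65536,
                     "activity": 1024, "servers": 8192}
    total = estimate_bytes(max_rows, avg_row_bytes)
    print(f"Estimated upper bound: {total / 1024**3:.1f} GiB")
```

The result is an upper bound at the configured caps and excludes job logs, uploads, and monitoring time-series data, which need to be budgeted separately.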

Search Performance

  • search_file_threads (number, default: 1): Worker threads for file search operations
Increase carefully if file searches are frequent. File search is I/O bound, so test before raising the thread count.

Automated Backups

1. Configure nightly API export

Use the nightly API export for critical data. Schedule via cron and store off-host. See Daily Backups.
2. Enable SQLite backups

"Storage": {
  "SQLite": {
    "backups": {
      "enabled": true,
      "dir": "data/backups",
      "compress": true,
      "keep": 7
    }
  }
}
Note: Backups briefly lock the database during copy.
3. Store backups off-host

Copy backups to S3, network storage, or a backup service for disaster recovery.
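
As one possible way to automate the off-host copy, the sketch below picks the most recently modified file in the backup directory and copies it to a destination such as a mounted network share. It makes no assumption about how xyOps names backup files; the function name and the `/mnt/offsite/xyops` path are hypothetical.

```python
import shutil
from pathlib import Path

def copy_latest_backup(backup_dir, dest_dir):
    """Copy the newest file in backup_dir to dest_dir; return the new path or None."""
    backups = [p for p in Path(backup_dir).iterdir() if p.is_file()]
    if not backups:
        return None
    # Newest by modification time, independent of file naming.
    latest = max(backups, key=lambda p: p.stat().st_mtime)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 preserves timestamps, which helps with later retention pruning.
    return Path(shutil.copy2(latest, dest / latest.name))

if __name__ == "__main__":
    # "data/backups" matches the example config; the destination is hypothetical.
    copied = copy_latest_backup("data/backups", "/mnt/offsite/xyops")
    print(f"Copied: {copied}" if copied else "No backups found")
```

Scheduling this after the nightly backup window avoids copying a file that is still being written.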

Monitoring and Alerting

Critical Error Notifications

Configure system hooks to send alerts for crashes and failed upgrades:
"hooks": {
  "critical": {
    "email": "ops-oncall@yourcompany.com"
  }
}
Or create tickets:
"hooks": {
  "critical": {
    "ticket": {
      "type": "issue",
      "assignees": ["admin"]
    }
  }
}

Universal Alert Actions

Configure global alert actions that fire for all alerts:
"alert_universal_actions": [
  {
    "enabled": true,
    "hidden": true,
    "condition": "alert_new",
    "type": "snapshot"
  },
  {
    "enabled": true,
    "condition": "alert_new",
    "type": "email",
    "email": "oncall-pager@mycompany.com"
  }
]

Security Hardening

"WebServer": {
  "whitelist": ["10.0.0.0/8", "172.16.0.0/12"],
  "allow_hosts": ["xyops.yourcompany.com"]
}
Restrict inbound IPs using CIDR notation. Limit valid Host headers.
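
Before deploying a whitelist, it can help to check which client addresses the CIDR ranges actually cover. The following sketch uses Python's standard `ipaddress` module to test addresses against the two ranges from the example above. It only illustrates CIDR matching and does not reproduce how xyOps evaluates the setting.

```python
import ipaddress

WHITELIST = ["10.0.0.0/8", "172.16.0.0/12"]  # ranges from the example config

def is_whitelisted(ip, cidrs=WHITELIST):
    """Return True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)

if __name__ == "__main__":
    for ip in ("10.1.2.3", "172.31.255.255", "172.32.0.1", "8.8.8.8"):
        print(ip, "allowed" if is_whitelisted(ip) else "blocked")
```

Note that 172.16.0.0/12 ends at 172.31.255.255, so 172.32.0.1 is outside the range.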
"WebServer": {
  "https": true,
  "https_port": 5523,
  "https_cert_file": "conf/tls.crt",
  "https_key_file": "conf/tls.key",
  "https_force": true
}
Enable HTTPS and force HTTP→HTTPS redirects. Use https_header_detect if terminating TLS upstream.
"WebServer": {
  "max_upload_size": 536870912,
  "max_connections": 2048,
  "max_concurrent_requests": 256
}
Reduce upload limits and tune connection caps to match instance capacity.
"WebServer": {
  "uri_response_headers": {
    "(\\/|\\.html)$": {
      "Content-Security-Policy": "default-src 'self'...",
      "X-Frame-Options": "DENY",
      "Strict-Transport-Security": "max-age=31536000"
    }
  }
}
Enforce CSP, HSTS, and other security headers for HTML routes.
Rotate your secret_key every few months. See Secret Key Rotation for details.

Rate Limiting with Nginx

If using the multi-conductor Nginx setup, add rate limiting:
1. Create limits.conf

limit_req_zone $binary_remote_addr zone=req_per_ip:20m rate=100r/s;
limit_req_status 429;
2. Add volume bind to Docker

docker run -v ./limits.conf:/etc/nginx/conf.d/limits.conf:ro ...
This limits traffic to 100 requests/sec per IP, using a 20 MB shared-memory zone (roughly 160K-320K tracked IPs, depending on the per-IP state size on your platform). See ngx_http_limit_req_module for more options.
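
The number of client IPs a zone can track follows from the zone size and the per-IP state size. The nginx documentation for `limit_req_zone` states that a stored state occupies 64 bytes on 32-bit platforms and 128 bytes on 64-bit platforms. A rough upper-bound calculation, ignoring allocator overhead, might look like this (the function name is illustrative):

```python
def zone_capacity(zone_mb, state_bytes):
    """Upper bound on tracked states in a zone of zone_mb megabytes."""
    return zone_mb * 1024 * 1024 // state_bytes

if __name__ == "__main__":
    for state_bytes in (64, 128):
        count = zone_capacity(20, state_bytes)
        print(f"20m zone, {state_bytes}-byte states: up to {count} IPs")
```

When the zone fills up, nginx evicts the least recently used state, and if it still cannot allocate a new one, the request is rejected with the configured `limit_req_status`. Size the zone with headroom above your expected number of concurrent client IPs.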

Additional Tuning

Logging Verbosity

Disable verbose logs in production unless actively debugging:
"WebServer": {
  "log_requests": false
},
"Storage": {
  "log_event_types": {}
}

Timeouts

Configure request timeouts to mitigate slow-loris attacks:
"WebServer": {
  "timeout": 30,
  "request_timeout": 30,
  "keep_alive_timeout": 30,
  "socket_prelim_timeout": 5
}

References