The governance layer your AI infrastructure power stack has been missing.

Spark-XC sits above existing GPU, workload, facility, grid, and finance systems to validate, authorize, and prove AI power actions — across five validation paths, with a tamper-evident Power Event Record for every action. Mission Control executes. Spark-XC validates.


Everything your AI infrastructure needs to prove every power action

GPU Telemetry Validation
Pre- and post-action NVML/DCGM snapshots — power, clocks, and utilization — confirm what the hardware actually did, not just what was requested.
Workload & Scheduler Context
Correlates each power action with Slurm, Kubernetes, and Run:ai job context so every change is tied to the workload it served.
Facility Power Correlation
Cross-checks GPU-level actions against DCIM, BMS, PDU/UPS, and utility APIs so rack, facility, and grid effects all reconcile.
Policy & Approval Gates
Authority, rate, and scope gates confirm a power action is permitted, scoped, and rate-limited before it ever reaches hardware.
Tamper-Evident Evidence Chain
Every governed action is SHA-256 hash-chained into an append-only audit chain (ARIV) — a forensically complete, tamper-evident Power Event Record for every decision.
Sits Above, Doesn't Compete
Spark-XC governs on top of NVIDIA Mission Control, DCGM, schedulers, DCIM, and BMS. Mission Control executes; Spark-XC validates, authorizes, and proves.
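As a rough illustration of the pre/post snapshot check described under GPU Telemetry Validation, the Python sketch below compares two snapshots against a requested power cap. The `GpuSnapshot` type, the `validate_power_cap` function, and the 5 W tolerance are hypothetical and are not Spark-XC's API. In a live system the readings might come from NVML (for example, `nvmlDeviceGetPowerUsage`, which reports milliwatts) or from DCGM. Here they are hard-coded so the example runs anywhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSnapshot:
    """One point-in-time GPU reading (hypothetical field names)."""
    power_w: float        # board power draw, watts
    sm_clock_mhz: int     # SM clock, MHz
    utilization_pct: int  # GPU utilization, percent


def validate_power_cap(requested_cap_w: float, pre: GpuSnapshot,
                       post: GpuSnapshot, tolerance_w: float = 5.0) -> dict:
    """Check whether the post-action snapshot honors a requested power cap.

    Assumption: the action counts as confirmed when post-action draw is at
    or below the cap plus a small tolerance for measurement noise.
    """
    return {
        "requested_cap_w": requested_cap_w,
        "delta_power_w": round(post.power_w - pre.power_w, 1),
        "delta_sm_clock_mhz": post.sm_clock_mhz - pre.sm_clock_mhz,
        "delta_utilization_pct": post.utilization_pct - pre.utilization_pct,
        "confirmed": post.power_w <= requested_cap_w + tolerance_w,
    }


# Illustrative values only; real readings would come from NVML or DCGM.
pre = GpuSnapshot(power_w=690.0, sm_clock_mhz=1980, utilization_pct=98)
post = GpuSnapshot(power_w=548.5, sm_clock_mhz=1755, utilization_pct=97)
print(validate_power_cap(550.0, pre, post))
```

Recording the deltas alongside the pass/fail flag keeps the evidence of what the hardware actually did, not just whether the request was met.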

From power action to proof

Every AI power action runs through five validation paths and emits a single Power Event Record — the atomic unit of proof that it was approved, safe, auditable, and financially real.
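One way to picture a Power Event Record is as a single structured record that carries the outcome of all five paths. The dataclass below is a minimal sketch under assumed field names (`action_id`, `path_results`, and so on); Spark-XC's actual record schema is not described on this page. Canonical JSON (sorted keys, no extra whitespace) is used so the same record always serializes to the same bytes, which matters if it is later hashed.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical identifiers for the five validation paths.
PATHS = ("gpu_telemetry", "workload_context", "facility_correlation",
         "policy_gates", "evidence_chain")


@dataclass
class PowerEventRecord:
    """Hypothetical shape of a Power Event Record (PER)."""
    action_id: str
    requested_by: str
    target: str
    action: str
    decision: str  # assumed values: "authorized", "modified", "rejected"
    path_results: dict = field(default_factory=dict)

    def all_paths_reported(self) -> bool:
        # A complete record carries a result for every validation path.
        return all(p in self.path_results for p in PATHS)

    def to_canonical_json(self) -> str:
        # Sorted keys and compact separators give a stable byte string.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))


per = PowerEventRecord(
    action_id="act-0001", requested_by="slurm:job/48213", target="node07/gpu3",
    action="set_power_limit_w=550", decision="authorized",
    path_results={p: "pass" for p in PATHS},
)
print(per.all_paths_reported())
print(per.to_canonical_json())
```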

1 · Power Action Requested
A workload, scheduler, operator, or vendor stack (NVIDIA Mission Control, DCGM) initiates a power action that Spark-XC governs.
2 · GPU Telemetry Validation
Path 1 captures NVML/DCGM pre- and post-action snapshots (power, clocks, utilization) to confirm what the hardware actually did.
3 · Workload & Facility Correlation
Paths 2 and 3 tie the action to Slurm, Kubernetes, and Run:ai job context, then reconcile it against DCIM, BMS, PDU/UPS, and utility APIs.
4 · Policy & Approval Gates
Path 4 evaluates authority, rate, and scope gates. The action is authorized, modified, or rejected before it reaches hardware.
5 · Evidence Chain Commit
Path 5 hash-chains the action into the append-only, SHA-256 tamper-evident evidence chain (ARIV). HMAC signing is optional but always available.
6 · Power Event Record Emitted
A self-contained, independently replayable Power Event Record is committed, proving the action was approved, safe, auditable, and financially real.
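Path 5 describes an append-only SHA-256 hash chain with optional HMAC signing. The sketch below shows that general technique using only Python's standard `hashlib` and `hmac` modules: each entry's hash covers the previous entry's hash plus the canonical record, so editing any past record breaks verification. `EvidenceChain` is a hypothetical stand-in and makes no claim about how ARIV is actually implemented.

```python
import hashlib
import hmac
import json
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Hash the previous link together with a canonical encoding of the record.
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class EvidenceChain:
    """Append-only SHA-256 hash chain (hypothetical stand-in for ARIV)."""

    def __init__(self, hmac_key: Optional[bytes] = None):
        self.entries: list = []
        self.hmac_key = hmac_key  # None means HMAC signing is disabled

    def _sign(self, digest: str) -> str:
        return hmac.new(self.hmac_key, digest.encode(), hashlib.sha256).hexdigest()

    def append(self, record: dict) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"record": record, "prev_hash": prev, "hash": _digest(prev, record)}
        if self.hmac_key is not None:
            entry["hmac"] = self._sign(entry["hash"])
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every link; any edited record or broken link fails.
        prev = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev or entry["hash"] != _digest(prev, entry["record"]):
                return False
            if self.hmac_key is not None and not hmac.compare_digest(
                    entry.get("hmac", ""), self._sign(entry["hash"])):
                return False
            prev = entry["hash"]
        return True


chain = EvidenceChain(hmac_key=b"demo-key")
chain.append({"action_id": "act-0001", "decision": "authorized"})
chain.append({"action_id": "act-0002", "decision": "rejected"})
print(chain.verify())  # True
chain.entries[0]["record"]["decision"] = "rejected"  # tamper with history
print(chain.verify())  # False
```

Tampering with the first record changes its recomputed hash, so `verify()` fails even though the later entry was never touched.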
Validation Flow (diagram): Power Action (request) → Mission Control / DCGM / Scheduler (execute) → Spark-XC validation paths: 1 · GPU Telemetry Validation (NVML/DCGM), 2 · Workload / Scheduler Context, 3 · Facility Power Correlation, 4 · Policy / Approval Gates, 5 · Tamper-Evident Evidence Chain → Power Event Record (proven).
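Path 3 reconciles GPU-level actions against facility data. A minimal, hypothetical version of that check for a single rack compares the sum of GPU readings plus an assumed non-GPU overhead (CPUs, fans, networking) against a PDU reading, and accepts the result within a percentage tolerance. The function name, the overhead figure, and the 5% tolerance are illustrative assumptions; real DCIM, PDU, and utility integrations expose vendor-specific APIs that are not shown here.

```python
def reconcile_rack(gpu_power_w: list, pdu_reading_w: float,
                   non_gpu_overhead_w: float, tolerance_pct: float = 5.0) -> dict:
    """Hypothetical rack-level check: do GPU readings agree with the PDU?"""
    if pdu_reading_w <= 0:
        raise ValueError("PDU reading must be positive")
    # Expected rack draw = sum of per-GPU power + assumed fixed overhead.
    expected_w = sum(gpu_power_w) + non_gpu_overhead_w
    error_pct = abs(pdu_reading_w - expected_w) / pdu_reading_w * 100
    return {
        "expected_w": expected_w,
        "pdu_w": pdu_reading_w,
        "error_pct": round(error_pct, 2),
        "reconciled": error_pct <= tolerance_pct,
    }


# Illustrative values: four GPUs, 1.2 kW assumed overhead, one PDU reading.
print(reconcile_rack([548.5, 601.0, 590.0, 575.5], pdu_reading_w=3600.0,
                     non_gpu_overhead_w=1200.0))
```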

Spark-XC sits above — and stamps every action

Governance is layered above the execution path. As each action passes through the governance layer, Spark-XC stamps it with a Power Event Record. Because Spark-XC validates and proves rather than gating execution inline, the stack keeps running even if governance is offline.

Governance online: Spark-XC sits above execution and stamps each action with a Power Event Record without sitting in the control path.

Governance offline: execution still flows from the execution layer to hardware, but no Power Event Record (PER) is produced for those actions. Because Spark-XC proves rather than blocks, resilience is preserved, and the gap itself is recorded once governance returns.
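Once governance returns, recording the gap amounts to finding executed actions inside the offline window that have no Power Event Record. A simple sketch of that bookkeeping is a set difference over the window. `record_gap` and `GovernanceGap` are hypothetical names, and the source of the executed-action log (for example, scheduler or vendor-stack logs) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceGap:
    """Hypothetical record of a period when governance was offline."""
    start: float               # epoch seconds when governance went offline
    end: float                 # epoch seconds when governance returned
    unproven_action_ids: tuple  # executed actions with no PER


def record_gap(executed: dict, proven_ids: set,
               offline_start: float, offline_end: float) -> GovernanceGap:
    """executed maps action_id -> execution time (epoch seconds)."""
    missing = sorted(
        action_id for action_id, ts in executed.items()
        if offline_start <= ts <= offline_end and action_id not in proven_ids
    )
    return GovernanceGap(offline_start, offline_end, tuple(missing))


executed = {"act-0101": 1000.0, "act-0102": 1030.0,
            "act-0103": 1065.0, "act-0104": 1200.0}
proven = {"act-0101", "act-0104"}
gap = record_gap(executed, proven, offline_start=1010.0, offline_end=1100.0)
print(gap.unproven_action_ids)  # ('act-0102', 'act-0103')
```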

See every power action — read-only by design

The Spark-XC dashboard is a read-only window into your fleet — live power, temperature, and energy telemetry alongside the Power Event Record stream. It observes and proves; it never executes a control action. It reads straight from the ARIV evidence chain and pairs with the metrics you already run on Prometheus and Grafana.

Read-Only by Design
The dashboard surfaces telemetry and Power Event Records — it never issues a power action. Governance and execution stay separate from observation.
Grafana & Prometheus
Fleet power savings, per-GPU power and thermal readings, control-loop latency, and safety violations export to the Prometheus and Grafana dashboards you already operate.
Straight From the Evidence Chain
Every figure traces back to a Power Event Record on the tamper-evident ARIV chain — what you see on screen is the same evidence an auditor can replay.
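How the Prometheus export works is not specified here. For orientation, the sketch below renders per-GPU gauges in the Prometheus text exposition format by hand, so it needs no third-party packages; a production exporter would more likely use the official `prometheus_client` library. The metric name `spark_xc_gpu_power_watts` and its labels are hypothetical.

```python
def _escape(value: str) -> str:
    # The text format requires escaping backslash, double quote, and newline
    # inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: list) -> str:
    """Render one gauge; samples is a list of (labels_dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is deterministic.
        label_str = ",".join(
            f'{key}="{_escape(str(val))}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


print(render_gauge(
    "spark_xc_gpu_power_watts", "Per-GPU power draw in watts.",
    [({"node": "node07", "gpu": "3"}, 548.5),
     ({"node": "node07", "gpu": "4"}, 601.0)],
), end="")
```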

Ungoverned power actions vs. proven ones

Without a Governance Layer
  • Vendor stack executes — but no one validates the action
  • GPU telemetry never reconciled against facility and grid data
  • Workload and scheduler context lost after the fact
  • Logs are mutable, scattered, and often incomplete
  • No authority, rate, or scope gates — any action is honored
  • No way to prove an action was financially real
With Spark-XC
  • Sits above the vendor stack — every power action validated and authorized
  • Five validation paths span GPU, workload, facility, policy, and evidence
  • NVML/DCGM pre/post snapshots reconciled with DCIM, BMS, and utility APIs
  • SHA-256 hash-chained, tamper-evident evidence chain (optional HMAC signing)
  • Authority, rate, and scope gates — every action evaluated before hardware
  • A Power Event Record proves each action was approved, safe, auditable, and financially real

Ready to deploy in hours, not months

Spark-XC sits above your existing stack with minimal deployment friction. No kernel modifications. No driver replacements. No application changes. It governs on top of NVIDIA Mission Control, DCGM, schedulers, DCIM, and BMS rather than replacing them.

Prerequisites
NVIDIA GPU (Ampere, Ada, Hopper, Blackwell) or AMD Instinct (MI210, MI250, MI300)
Linux host (Ubuntu 20.04+, RHEL 8+)
NVIDIA driver 525+ or AMD ROCm 5.0+
Root access for hardware register operations
What You Get
All 5 validation paths active within minutes
Policy and approval gates configurable via JSON/YAML
Power Event Records emitted from the first action
Zero application code changes required
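Since the policy and approval gates are configurable via JSON/YAML, here is a minimal sketch of authority, scope, and rate gates driven by a JSON policy (JSON keeps it within Python's standard library; YAML would need a third-party parser). The policy schema, the `PolicyGates` class, and the choice to treat an out-of-range power limit as a "modified" action by clamping it are assumptions for illustration, not Spark-XC's documented behavior.

```python
import json
from collections import deque

# Hypothetical policy schema.
POLICY_JSON = """
{
  "authorized_requesters": ["slurm", "operator:oncall"],
  "scope": {"targets": ["node07/gpu3", "node07/gpu4"],
            "min_power_w": 300, "max_power_w": 600},
  "rate": {"max_actions": 2, "window_s": 60}
}
"""


class PolicyGates:
    """Hypothetical authority, scope, and rate gates for power-limit actions."""

    def __init__(self, policy: dict):
        self.policy = policy
        self.recent = deque()  # timestamps of actions that passed all gates

    def evaluate(self, requester: str, target: str, power_w: float, now: float):
        """Return (decision, failed_gate, effective_power_w)."""
        # Authority gate: only listed requesters may act.
        if requester not in self.policy["authorized_requesters"]:
            return "rejected", "authority", power_w
        # Scope gate: target must be in scope; limits are clamped to range.
        scope = self.policy["scope"]
        if target not in scope["targets"]:
            return "rejected", "scope", power_w
        clamped = min(max(power_w, scope["min_power_w"]), scope["max_power_w"])
        # Rate gate: sliding window over recently accepted actions.
        rate = self.policy["rate"]
        while self.recent and now - self.recent[0] >= rate["window_s"]:
            self.recent.popleft()
        if len(self.recent) >= rate["max_actions"]:
            return "rejected", "rate", power_w
        self.recent.append(now)
        decision = "modified" if clamped != power_w else "authorized"
        return decision, None, clamped


gates = PolicyGates(json.loads(POLICY_JSON))
print(gates.evaluate("slurm", "node07/gpu3", 550, now=0))      # ('authorized', None, 550)
print(gates.evaluate("slurm", "node07/gpu4", 700, now=10))     # ('modified', None, 600)
print(gates.evaluate("slurm", "node07/gpu3", 500, now=20))     # ('rejected', 'rate', 500)
print(gates.evaluate("vendor-x", "node07/gpu3", 500, now=90))  # ('rejected', 'authority', 500)
```

Ordering the gates as authority, then scope, then rate means a rejected request never consumes rate budget; other orderings are equally defensible.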

Ready to prove every power action in your AI infrastructure?

We're onboarding partners now. Request a replay and see a real Power Event Record — approved, safe, auditable, and financially real — for your environment.
