v1.0.4 · Patent Pending · Pre-Seed Open

Training runs fail.
Ours don't.

NoNans is a kernel-level C++ stabilization layer that intercepts numerical singularities during LLM training — before they corrupt your optimizer. Zero rollbacks. Zero lost GPU-hours. Drop in, not swap out.

See live demo · Request technical deck →
Trusted infrastructure for
H100 Clusters · PyTorch · JAX · Megatron-LM · DeepSpeed · FSDP · AWS · GCP · Azure

Watch it intercept in real time.

Same training loop. Same CUDA stack. The only difference is one line of import — and your run never dies again.

# H100 Cluster · Epoch 88 / 176
[INFRA] Scanning Tensor 0x7f83a4c2...
 
[CRITICAL] NaN Detected @ Layer 88
[CRITICAL] Gradient overflow → inf
[CRITICAL] Propagating through optimizer
[WARN] Momentum buffer corrupted
[WARN] Weight tensors: INVALID
 
[FATAL] Run terminated at step 88,402
[FATAL] Rolling back to checkpoint 71
[FATAL] 17 epochs · ~$6,200 WASTED
 
[INFRA] Restoring model state...
[INFRA] Reloading from disk (84s)...
[INFRA] Resuming from step 71,000...
[WARN] Estimated retraining: 4.2hrs
 
Compute waste this month: $38,400
Architecture Stack
API
Training Framework
PyTorch / JAX / Megatron — unchanged
passthrough
↓ single import
KERNEL
NoNans Intercept Layer
Singularity detection + mapping
0ns
↓ in-place
CUDA
CUDA Extension
∞ gradient → finite boundary map
in-kernel
↓ zero-copy
C++
Continuity Bridge
Momentum preserved · optimizer stable
0-alloc
GPU
H100 / A100 Hardware
Training continues — seamlessly
normal
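The mapping step in the stack above can be sketched in a few lines of plain Python. This is a conceptual illustration only, assuming a simple clamp-style boundary map: the actual NoNans layer does this in-place inside a CUDA kernel, and the `bound` parameter here is illustrative, not part of the real API.

```python
import math

def map_to_finite(grad, bound=1e4):
    """Conceptual sketch of the singularity -> finite boundary map.

    NaN gradients become 0.0; +/-inf is clamped to +/-bound; finite
    values pass through unchanged. Illustrative only -- the real layer
    runs in-kernel, in-place, on the GPU, and `bound` is hypothetical.
    """
    out = []
    for g in grad:
        if math.isnan(g):
            out.append(0.0)                      # drop undefined gradient
        elif math.isinf(g):
            out.append(math.copysign(bound, g))  # clamp overflow to boundary
        else:
            out.append(g)                        # finite values untouched
    return out
```

The same idea expressed against a real gradient tensor would be a single in-place pass, which is why no extra allocation or copy is needed.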
Integration
pip install nonans
import nonans.stabilize as ns
ns.wrap(model) # done
Performance

Numbers that make CFOs lean forward.

Measured on production H100 SXM5 clusters running 70B+ parameter training runs.

GPU ROI Recovery
15.4%
Of total compute spend recovered per month on average across deployments
Added Kernel Latency
0ns
In-place tensor mutation — zero overhead per training step, measured on H100
Weight Stability
99.9%
Post-singularity optimizer state integrity across 100B+ param runs
Rollbacks Required
0
Zero checkpoint rollbacks across all production deployments since v1.0

What is your cluster bleeding?

Adjust your compute profile. NoNans recovers 15.4% of total spend — the math is uncomfortable.

Monthly GPU Spend
$500,000
NaN Events / Week
6 events
Avg Rollback Depth (epochs)
14 epochs
Monthly compute wasted to NaN
Annual compute wasted
Annual recovery with NoNans
Payback period

Based on 15.4% recovery rate. Cloud credits from AWS, GCP, Azure offset infrastructure cost at deployment. Enterprise contracts available from $50K ARR.
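The calculator's arithmetic reduces to a few lines. A sketch under stated assumptions: the published 15.4% average recovery rate is applied to monthly GPU spend, and payback is measured against an enterprise contract starting at $50K ARR. The formulas are inferred from the copy on this page, not taken from product code.

```python
def nonans_roi(monthly_gpu_spend, recovery_rate=0.154, contract_arr=50_000):
    """Assumed formulas behind the ROI calculator (not product code).

    recovery_rate: published 15.4% average recovery across deployments.
    contract_arr:  enterprise contracts start at $50K ARR.
    """
    monthly_recovered = monthly_gpu_spend * recovery_rate
    annual_recovered = 12 * monthly_recovered
    # Months of recovered compute needed to cover the contract price.
    payback_months = contract_arr / monthly_recovered
    return monthly_recovered, annual_recovered, payback_months

# Example: the default profile shown above ($500K/month GPU spend).
monthly, annual, payback = nonans_roi(500_000)
```

At the default $500K monthly spend, this recovers roughly $77K per month and $924K per year, so a $50K ARR contract pays back in well under one month.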

Pricing

Usage-based. You only pay when
we save your run.

Aligned incentives from day one. Start free on your next training run, scale to enterprise contracts as your GPU spend grows.

Starter
Free / always
Up to $50K monthly GPU spend. Full kernel access.
  • Full NaN interception kernel
  • pip install, 3-line integration
  • PyTorch + JAX support
  • Up to 8x H100 GPUs
  • Community support
  • Compute audit logs
Start Free
Enterprise
Custom / annual
From $50K ARR. On-prem, private cloud, SOC 2 available.
  • Everything in Pro
  • On-premise deployment
  • Custom CUDA extensions
  • MNDA + IP indemnification
  • Dedicated infra engineer
  • Private cloud deployment
  • SOC 2 Type II (roadmap)
Talk to Engineering
01 / MARKETPLACE
AWS · GCP · Azure listings
Cloud marketplace billing means enterprise procurement handles the contract. Zero additional sales cycle for qualified buyers.
02 / CREDITS
$450K non-dilutive runway
AWS Activate ($100K), GCP Startups ($200K), Azure Startups ($150K) fund design-partner validation runs. No equity cost.
03 / GROWTH
Usage compounds with scale
As model sizes increase, NaN frequency grows nonlinearly. Our TAM expands with the market — automatically.

From the engineers running the clusters.

Design partners across AI labs, enterprise ML teams, and cloud-native training pipelines.

"We were losing 12–18 epochs a week to gradient explosions on our 70B run. NoNans turned a recurring $40K monthly write-off into zero. That's the kind of ROI that doesn't need a slide deck."
MR
ML Infrastructure Lead
Series B AI Lab · H100 Cluster
"Three lines to integrate. The dashboard shows recovered compute in real time. When our CFO saw the monthly recovery number, they asked why we hadn't been running this from the start."
AK
Head of Training Infrastructure
Enterprise AI · 500+ GPU Fleet
"The 0ns latency claim is real. We profiled every layer and saw nothing. It sits transparently under PyTorch and catches what gradient clipping misses. This is the layer that should have existed years ago."
DL
Principal Research Engineer
Frontier Model Lab · 100B+ params
Monetization Path

Infrastructure credits →
enterprise ARR.

NoNans enters the market through cloud provider startup programs — $450K of non-dilutive capital that funds design partner runs on AWS, GCP, and Azure infrastructure.

Each validated enterprise deployment becomes a cloud marketplace listing. Customers buy through their existing cloud contracts — no new procurement process, no legal friction, instant activation.

The monetization ladder: free tier captures ML engineers, usage-based Pro converts teams with $50K+ monthly GPU spend, enterprise licenses ($50K+ ARR) target AI labs and verticals — pharma, finance, national security — where training reliability is mission-critical.

Google Cloud for Startups
Primary GTM — TPU + A100 credits. GCP ML customer base is primary ICP.
$200K
Azure for Startups
H100 cluster access. Path to Microsoft enterprise channel.
$150K
AWS Activate
EC2 GPU + SageMaker. AWS Marketplace listing target Q3.
$100K

Why Google can't copy this
in a sprint.

01 —
Technical Moat
Kernel-level access is a high bar
Writing CUDA extensions that intercept gradient computation — with measured zero latency — requires expertise at the intersection of numerical methods, systems programming, and ML training dynamics. This is not a Python wrapper. It's original systems work.
02 —
IP Moat
Patent-pending Numerical Continuity Architecture
The mathematical strategy for mapping gradient singularities to finite boundary states — preserving optimizer momentum in-kernel — is a novel invention. Patent application filed 2026. No prior art covers this approach at the CUDA extension level.
03 —
Data Moat
Production telemetry compounds with each deployment
Every production run generates proprietary data on NaN event types, layer positions, gradient distributions, and model architectures. This corpus improves mapping precision continuously — and cannot be replicated without production access.
04 —
Timing Moat
NaN frequency scales nonlinearly with model size
As parameter counts grow from 70B to 700B, gradient singularity events don't increase linearly; they compound. The market for training stability infrastructure is growing faster than the GPU market itself. The problem is only getting worse, and we're already in place to catch it.
Pre-Seed · Open Now

If your clusters run large
runs, talk to us.

We're speaking with ML infrastructure leads, technical founders, and pre-seed investors who understand that compute waste is the largest controllable cost in frontier AI.

Contact Engineering · Investor Inquiry →

Patent Pending · v1.0.4 · MNDA available · ahlem.makhebi@nonans.com