Replicate GPU Cloud Pricing
from $0.440/hr/hrServerless platform for running and hosting machine learning models via API, billed per second of GPU compute with a large community model library.
Start on Replicate
Current Pricing — 6 configurations
Provider Details
FAQ
What GPUs does Replicate offer?
Nvidia H100 (80GB), Nvidia A100 (80GB, SXM), Nvidia A100 (40GB), Nvidia RTX 4090 (24GB), Nvidia L40S (48GB), Nvidia T4 (16GB)
Where are Replicate data centers located?
Replicate operates in: us-central.
How does Replicate bill for GPU usage?
Replicate supports per-second billing.
Is Replicate reliable for production workloads?
Replicate has a trust score of 4.3/5. Features include: serverless, model-hosting, api-access, auto-scaling.
Last data refresh: April 29, 2026. Verify on Replicate's site.
Related Providers
Fal.ai
Serverless inference platform optimized for generative media workloads (images, video, audio) with sub-second cold starts and real-time streaming.
Modal
Developer-first serverless GPU platform with a Python-native SDK, per-second billing, and automatic cold-start optimization for ML workloads.
RunPod
Consumer and data-center GPU cloud with spot and on-demand instances, a large template marketplace, and a serverless inference platform.
TensorDock
Budget-friendly GPU marketplace aggregating data-center hardware from multiple hosts, offering broad GPU variety at competitive hourly rates.
Genesis Cloud
European GPU cloud powered by renewable energy, offering NVIDIA instances at competitive rates with a focus on sustainability.
Vast.ai
Peer-to-peer GPU marketplace that aggregates idle hardware from independent hosts, offering some of the lowest per-hour rates available.