Dynamic Workload Scheduler is a resource management and job scheduling platform designed for AI Hypercomputer. It improves your access to AI/ML resources, helps you optimize your spend, and can improve the experience of workloads such as training and fine-tuning jobs by scheduling all the accelerators they need simultaneously. Dynamic Workload Scheduler supports TPUs and NVIDIA GPUs, and brings scheduling advancements from the Google ML fleet to Google Cloud customers. It is also integrated into many of your preferred Google Cloud AI/ML services, such as Compute Engine Managed Instance Groups, Google Kubernetes Engine, Vertex AI, and Batch, and more integrations are planned.
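To illustrate what "scheduling all the accelerators they need simultaneously" means, here is a minimal, hypothetical sketch of all-or-nothing (gang) allocation. It is not Dynamic Workload Scheduler's implementation or API: the `Pool` class and `try_allocate` function are invented for illustration. A job either receives every accelerator it requests at once, or it keeps waiting while holding none.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical pool of free accelerators, counted per accelerator type."""
    free: dict = field(default_factory=dict)


def try_allocate(pool: Pool, request: dict) -> bool:
    """All-or-nothing (gang) allocation: grant every requested accelerator
    at once, or grant nothing and leave the pool unchanged."""
    # Check the whole request before reserving anything.
    if any(pool.free.get(kind, 0) < count for kind, count in request.items()):
        return False  # the job keeps waiting and holds no accelerators meanwhile
    # Every type has enough capacity, so reserve the full request together.
    for kind, count in request.items():
        pool.free[kind] = pool.free.get(kind, 0) - count
    return True


pool = Pool(free={"gpu": 6})
print(try_allocate(pool, {"gpu": 8}))  # False: only 6 free, nothing reserved
print(pool.free)                       # {'gpu': 6}
pool.free["gpu"] += 2                  # capacity frees up elsewhere
print(try_allocate(pool, {"gpu": 8}))  # True: all 8 granted together
print(pool.free)                       # {'gpu': 0}
```

The key design choice is validating the entire request before reserving anything. A scheduler that grabbed accelerators one at a time could leave a partial set sitting idle while it waited for the rest.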
Dynamic Workload Scheduler is built on Google Borg technology, which is responsible for real-time scheduling of millions of jobs on the Google ML fleet, including one of the largest distributed LLM training jobs in the world (as of November 2023). With Flex Start and Calendar modes, Dynamic Workload Scheduler can provide you with more flexibility, improved access to GPUs and TPUs, better resource utilization, and lower costs. Customers and partners are already seeing the benefits of Dynamic Workload Scheduler.
Here is what Linum AI, a text-to-video generative AI company, had to say:
“The new Dynamic Workload Scheduler scheduling capabilities have been a game-changer in procuring sufficient GPU capacity for our training runs. We didn’t have to worry about wasting money on idle GPUs while refreshing the page hoping for sufficient compute resources to become available.” – Sahil Chopra, Co-Founder & CEO, Linum AI
sudoAI, a 3D generative AI company, trained its latest generative model using APIs enabled by Dynamic Workload Scheduler.
“We really like the convenience of finding capacity each time we need it without needing to worry about it. It enabled us to test new ideas, iterate, and also run longer training runs. We were able to fully train our latest 3D Gen AI model using the new Dynamic Workload Scheduler functionality and meet our internal deadlines to launch.” – Robin Han, Co-Founder and CEO, sudoAI