Introduction
Shoc Platform is a powerful, extensible system designed for running High Performance Computing (HPC) and Machine Learning (ML) workloads at scale. Whether you’re training large-scale AI models or running tightly coupled scientific simulations, Shoc provides the flexibility, performance, and control needed to orchestrate complex computational jobs efficiently.
What is Shoc Platform?
At its core, Shoc Platform is a job orchestration platform built for modern infrastructure. It simplifies submitting, monitoring, and managing distributed compute jobs across dynamic clusters, whether they run on bare metal, virtual machines, or in the cloud.
Shoc Platform is tailored for teams and individuals who need to:
- Run GPU-accelerated ML training and inference jobs
- Launch distributed MPI/HPC workloads
- Manage clusters and compute nodes programmatically
- Maintain security and reproducibility across environments
Core Features
ML and HPC Job Orchestration
- Define, submit, and monitor jobs using a unified CLI.
- Run containerized workloads with advanced resource scheduling.
- Monitor CPU and memory usage.
Cluster Management
- Register and manage compute clusters from multiple environments.
- Dynamically allocate jobs to available nodes based on resource availability.
- Track node-level metrics (capacity, usage, and health) with time-series monitoring.
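To make resource-based allocation concrete, here is a minimal first-fit sketch. It is an illustration only, not Shoc's actual scheduler: the `Node` and `Job` types, their fields, and the `allocate` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical view of a compute node's free resources.
    name: str
    free_cpus: int
    free_gpus: int
    healthy: bool = True


@dataclass
class Job:
    # Hypothetical resource request for a job.
    name: str
    cpus: int
    gpus: int = 0


def pick_node(job: Job, nodes: List[Node]) -> Optional[Node]:
    """Return the first healthy node with enough free CPUs and GPUs, or None."""
    for node in nodes:
        if node.healthy and node.free_cpus >= job.cpus and node.free_gpus >= job.gpus:
            return node
    return None


def allocate(job: Job, nodes: List[Node]) -> Optional[str]:
    """Reserve resources on the chosen node and return its name, or None if nothing fits."""
    node = pick_node(job, nodes)
    if node is None:
        return None
    node.free_cpus -= job.cpus
    node.free_gpus -= job.gpus
    return node.name
```

First-fit is the simplest policy that respects resource availability and node health. A production scheduler would typically also weigh priorities, queueing, and placement constraints.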
Secrets Management
- Store and inject secrets (e.g., API keys, DB passwords) securely into jobs.
- Inject secrets into jobs through environment variables or volume mounts.
- Protect secrets with encrypted storage and role-based access control (RBAC).
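From inside a job, a secret injected as an environment variable can be read like any other variable. The sketch below assumes the secret is exposed under a name you choose. The `read_secret` helper and the `DB_PASSWORD` name in the usage note are hypothetical and not part of any Shoc API.

```python
import os


def read_secret(name: str) -> str:
    """Return the secret exposed under environment variable `name`.

    Fails loudly if the variable is missing or empty, so a misconfigured
    job stops early instead of running without credentials.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"secret {name!r} was not injected into this job")
    return value
```

A job would then call, for example, `read_secret("DB_PASSWORD")` at startup. Avoid logging or printing the returned value.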
Image and Registry Support
- Pull images from public and private registries.
- Authenticate against container registries per user or organization.
Multi-User and Role-Based Access Control
- Namespace isolation for projects and teams.
- Fine-grained access controls for jobs, clusters, secrets, and registries.
Tooling and Interfaces
- CLI-first experience for power users and CI/CD integration.
- Web UI for job tracking, cluster status, and user management.
- REST API for integrating Shoc with your systems and automation scripts.