AI Security - AI Platform Team Lead
Cato Networks
Tel Aviv, TA, Israel
May 27, 2026
On-site
Full-time
Security Engineering
Management
Welcome to the future of cloud networking and security!
Cato Networks is the first company to converge enterprise networking and security into one centralized, global service delivered from the cloud. It is led by networking and security pioneer Shlomo Kramer (Check Point, Imperva), an early investor in Palo Alto Networks, Exabeam, Trusteer, and more. Cato’s unique technology inspired a brand-new product category, later named “SASE” by Gartner, a market expected to reach $28.5 billion by 2028.
This is your opportunity to get on the rocket ship and join a company that is building a cutting-edge enterprise network and secure cloud platform, and is on a fast track to becoming the worldwide market leader – don’t miss it!
Cato is building a real-time AI runtime platform for security algorithms running inline across our global cloud and physical PoPs.
We are looking for a hands-on AI Platform Team Lead to build and lead the team behind this platform: a high-throughput, low-latency engine that runs GPU-based models, from MMBERT-style models to LLMs, together with CPU-based heuristics and security logic.
This is a core infrastructure role for someone who wants to own the runtime layer of AI security at scale: performance, reliability, orchestration, GPU efficiency, and production-grade execution in the traffic path.
The team will also own the model lifecycle required to take AI security algorithms from research to large-scale production, working closely with research and algorithm teams.
Responsibilities
- Build and lead Cato’s AI Platform team: hiring, mentoring, architecture, technical direction, and execution.
- Own the AI security runtime platform for high-throughput, low-latency inline security decisions across Cato’s global cloud and PoPs.
- Design the orchestration layer for running GPU models, CPU heuristics, and security logic as one production engine.
- Own production readiness: observability, SLOs, autoscaling, reliability, rollout, rollback, and operational health.
- Own the model lifecycle platform: registry, versioning, deployment, monitoring, and safe production rollout.
- Work closely with research and algorithm teams to productionize AI security models and algorithms at scale.
- Define the long-term platform strategy for AI runtime and model serving at Cato.
Requirements
- 3+ years of leadership experience as a team lead, tech lead, or engineering manager.
- 3+ years of hands-on experience in AI inference, production ML infrastructure, model serving, or AI runtime platforms.
- Strong experience with production inference technologies such as Triton, vLLM, CUDA, Kubernetes, Docker, PyTorch, ONNX, TensorRT, or similar.
- 3+ years of experience with Go, or strong experience with a similar high-performance backend language such as C++, Rust, or Java.
- Experience with performance optimization, scalability, observability, and SLO-driven production ownership.
- Strong system design skills, especially around distributed systems, performance, reliability, and production infrastructure.
Advantages
- Experience with GPU optimization, GPU scheduling, GPU resource efficiency, quantization, runtime acceleration, or large-scale model serving.