
Principal AI Cloud Engineer

BMO Financial

Toronto, Canada

$103,200.00 - $192,000.00 Posted: 4 hours ago

Job Description

Application Deadline: 11/29/2025
Address: 100 King Street West

The Team
We accelerate BMO’s AI journey by building enterprise-grade, cloud-native AI solutions. Our team combines engineering excellence with cutting-edge AI to deliver scalable, secure, and responsible solutions that power business innovation across the bank. We enable and accelerate our partners on their AI journeys across the enterprise, helping teams across BMO unlock value at scale.

We support one another in times of need and take pride in our work. We are engineers, AI practitioners, platform builders, thought leaders, multipliers, and coders. Above all, we are a global team of diverse individuals who enjoy working together to create smart, secure, and scalable solutions that make an impact across the enterprise. Our ambition is bold: deploy our capital and resources to their highest and most profitable use through a digital-first operating model, powered by data and AI-driven decisions.

The Impact
As a Principal Cloud AI Engineer, you are a hands-on technical developer who designs, builds, and scales cloud-native AI solutions and products. You help set engineering standards, establish patterns, mentor senior engineers, and partner with multiple teams to deliver resilient, governed, and cost-efficient AI at enterprise scale. You’ll help shape and evolve our AI cloud strategy from model serving and LLMOps to security, observability, and compliance so teams across the bank can innovate safely and rapidly.

You will advance BMO’s Digital First strategy by:
Defining reference and production-grade solutions for AI/Gen AI on cloud (Azure preferred; multi-cloud aware).
Building reusable, secure, and observable components (APIs, SDKs, microservices, pipelines).
Operationalizing LLMs and RAG with strong controls and Responsible AI guardrails.

Driving platform roadmaps that enable faster delivery, lower risk, and measurable business outcomes.

What’s In It for You
Influence the technical direction of enterprise AI and the platform primitives others build on.
Ship high-impact systems used across many business lines and products.
Work across the full stack: cloud infra, data/feature pipelines, model serving, LLMOps, and DevSecOps.

Partner with a leadership team invested in your growth and thought leadership.

Responsibilities
Infrastructure & Platform Builder
Design, build, and operate cloud-native AI infrastructure for ML/Gen AI workloads:

Compute: GPU/CPU clusters, autoscaling, spot instance strategies
Networking: Azure VNet, Private Link, peering, multi-region HA/DR
Storage & Databases: high-performance data lakes (e.g., Azure Data Lake Storage), relational DBs, vector DBs (FAISS, Milvus, Pinecone, pgvector)
Security: IAM, Key Vault-backed secrets management, encryption, policy-as-code

Implement observability and reliability for AI infra:

Metrics (latency, throughput, GPU utilization, cost)
Logging/tracing (OpenTelemetry), SLOs/SLIs for infra services
Build CI/CD and GitOps pipelines for infrastructure-as-code (Terraform/Bicep) and AI platform components

Drive FinOps for AI infra: GPU rightsizing, caching, inference optimization, cost governance
Application & Service Enablement
Enable frontend and backend services for AI platforms:

Secure APIs, microservices, and event-driven architectures
Integration with custom model runtimes (TensorRT-LLM, vLLM, Triton/KServe)
Provide infrastructure support for RAG systems: embeddings, chunking, retrieval pipelines

Ensure scalable serving infrastructure for LLMs and ML models with caching and token optimization
Strategy & Architecture
Define and evolve AI infrastructure reference architecture for cloud (Azure preferred):

Container orchestration (Kubernetes), service mesh, ingress
Serverless/event-driven patterns for AI pipelines
Multi-region, HA/DR, compliance-ready designs

Establish standards and best practices for containerization, IaC, and secure networking for AI systems
Security, Risk & Governance
Implement defense-in-depth for AI infra:

IAM least privilege, private networking, KMS/Key Vault, SBOM, image signing

Ensure compliance and Responsible AI controls at infra level:

Data residency, encryption, lineage, audit readiness
Delivery & Operations
Lead infrastructure discovery and solution design with stakeholders
Operate platforms with SRE principles: error budgets, incident response, chaos testing

Mentor engineers; create reusable IaC modules, templates, and golden paths
Must-Have Qualifications
Bachelor’s/Master’s/PhD in CS, Engineering, or related field
7+ years building large-scale distributed cloud infrastructure
5+ years hands-on with Azure (preferred); AWS/GCP nice to have
Proven experience with AI/ML infra: GPU clusters, Kubernetes, CI/CD, observability
Strong in IaC (Terraform/Bicep), Kubernetes, networking, security
Expertise in cloud-native patterns: containers, service mesh, serverless
Familiarity with MLOps/LLMOps infra: model serving, feature stores, vector DBs
Programming in Python (infra automation) and one of Go/TypeScript for tooling
Understanding of frontend/backend integration for AI services
Nice-to-Have
GPU optimization (CUDA/NCCL, TensorRT-LLM)
Observability tools (Prometheus, Grafana, OpenTelemetry)
Event streaming (Kafka/Azure Event Hubs), real-time systems

Experience with AI platform products (Azure ML, MLflow, KServe, Hugging Face)
Tech Stack
Cloud & Infra: Azure (AKS, Functions, Event Hubs, Key Vault), Terraform/Bicep, GitHub Actions/Azure DevOps
AI Infra: Kubernetes, KServe/Triton, vLLM, TensorRT-LLM, Ray, Spark
Ops: Prometheus, Grafana, OpenTelemetry, Argo CD, OPA
Data: Feature stores (Feast), vector DBs (FAISS, Milvus, Pinecone), relational DBs

App Layer: APIs, microservices, frontend/backend integration for AI systems
Success Metrics
Reliability & Performance: SLOs met for infra services, GPU utilization optimized
Security & Compliance: Zero critical findings, auditable infra
Cost Efficiency: Reduced GPU/infra spend via FinOps strategies
Developer Velocity: Faster provisioning and deployment of AI infra

Technical Leadership: Influence on infra standards, mentorship, reusable patterns

Salary: $103,200.00 - $192,000.00

Pay Type: Salaried
About Us
At BMO we are driven by a shared Purpose: Boldly Grow the Good in business and life. It calls on us to create lasting, positive change for our customers, our communities and our people. By working together, innovating and pushing boundaries, we transform lives and businesses, and power economic growth around the world.

As a member of the BMO team you are valued, respected and heard, and you have more ways to grow and make an impact. We strive to help you make an impact from day one – for yourself and our customers. We’ll support you with the tools and resources you need to reach new milestones, as you help our customers reach theirs. From in-depth training and coaching, to manager support and network-building opportunities, we’ll help you gain valuable experience, and broaden your skillset.

To find out more, visit us at https://jobs.bmo.com/ca/en.
BMO is committed to an inclusive, equitable and accessible workplace. By learning from each other’s differences, we gain strength through our people and our perspectives. Accommodations are available on request for candidates taking part in all aspects of the selection process. To request accommodation, please contact your recruiter.

Note to Recruiters: BMO does not accept unsolicited resumes from any source other than directly from a candidate. Any unsolicited resumes sent to BMO, directly or indirectly, will be considered BMO property. BMO will not pay a fee for any placement resulting from the receipt of an unsolicited resume. A recruiting agency must first have a valid, written and fully executed agency agreement contract for service to submit resumes.