Senior Principal Engineer Site Reliability
Company: Dell
Location: Salt Lake City
Posted on: April 30, 2024
Job Description:
Senior Principal Engineer Site Reliability
Dell Technologies customers rely on our products and services to drive progress. So,
we take the service we provide extremely seriously. Service
Delivery is all about making sure our technical solutions help
clients fulfil their priorities, challenges and initiatives. As
trusted advisors, we build in-depth knowledge of what each client
wants to achieve. Then we make sure the services delivered by Dell
Technologies deliver on all our promises. We also work closely with
Sales and Global Services colleagues to develop strategic account
growth plans, and to identify and pursue sales opportunities.

Join us to do the best work of your career and make a profound social
impact as a Senior Principal Engineer - Site Reliability
Engineering on our Service Delivery Team in Austin, Texas.

What you'll achieve
The Senior Principal Engineer - Site Reliability
Engineering supporting Artificial Intelligence/Machine
Learning/High Performance Compute Solutions, Service Delivery will
be responsible for providing the primary management,
administration, support, and ongoing maintenance of customer
Platforms within a 24x7x365 datacenter environment. This is a
technical leadership role. The ideal candidate will play a crucial
role in managing and supporting complex solutions and platforms for
our prestigious Fortune 100 clients.

The role will be expected to
work in a positive and collaborative fashion with fellow team
members, senior engineering/architect staff, vendors, and
customers. The Senior Principal Engineer will assist with process
maturation, development, and technical standards creation, and will
drive operational excellence through consistent delivery and best
practices.

You will:
- Serve as the top technical expert in deploying, upgrading, and
troubleshooting Artificial Intelligence/Machine Learning/High
Performance Compute Solutions platforms
- Manage and maintain container platform (Kubernetes, OpenShift)
infrastructure, including installation, configuration, and upgrades,
and optimize system performance, capacity, and availability of the
environment
- Act in the capacity of an SRE / DevOps expert

Take the first step towards your dream career
Every Dell Technologies team member
brings something unique to the table. Here's what we are looking
for with this role:

Essential Requirements
- Hands-on experience working in an infrastructure managed
services environment, supporting complex engineered solutions in
production with Artificial Intelligence/Machine Learning/High
Performance Compute Systems and Platforms and Converged/
Hyper-Converged infrastructure, along with fluency in AI/ML
pipelines, Nvidia GPU optimization, InfiniBand networking, machine
learning operating systems such as cnvrg.io, and compute orchestration
platforms such as Run:ai
- Expert-level knowledge of cluster provisioning and resource
schedulers
- Programming experience with Python, Go, Ruby, Shell Scripts, and
PowerShell, along with hands-on experience with ELK, Prometheus,
Grafana, Ansible, Git, or similar technologies
- Expertise in Kubernetes, OpenShift, Docker, Container
Networking, and Cloud Native Platforms/Applications
- Strong Networking Fundamentals along with Converged Infra
(CI)/Hyper Converged Infra (HCI) Management Certification, along with
hands-on experience with Azure Kubernetes Service (AKS), Amazon
EKS, Google Kubernetes Engine (GKE), and Rancher

Desirable Requirements
- BE or MS in Computer Science or Computer Engineering; an
acceptable combination of equivalent industry experience will be
considered
- Certified Kubernetes / OpenShift Admin, NSX-T Certification

Who we are
We believe that each of us has the power to make an impact.
That's why we put our team members at the center of everything we
do. If you're looking for an opportunity to grow your career with
some of the best minds and most advanced tech in the industry,
we're looking for you.

Dell Technologies is a unique family of
businesses that helps individuals and organizations transform how
they work, live and play. Join us to build a future that works for
everyone because Progress Takes All of Us.

Application closing date: 03/22/2024

Dell Technologies is committed to the principle of equal
employment opportunity for all employees and to providing employees
with a work environment free of discrimination and harassment. Read
the full Equal Employment Opportunity Policy here.

Job ID: R241321

Dell's Flexible & Hybrid Work Culture
At Dell
Technologies, we believe our best work is done when flexibility is
offered. We know that freedom and flexibility are crucial to all our
employees no matter where they are located, and our flexible and
hybrid work style allows team members to have the freedom to
ideate, be innovative, and drive results their way. To learn more
about our work culture, please visit our locations page.
Keywords: Dell, Murray, Senior Principal Engineer Site Reliability, Professions, Salt Lake City, Utah