Job Brief
We are looking for a Site Reliability Engineer to join our team and develop software systems and automated solutions for our operations. Responsibilities include monitoring computer systems and building alerts for the operational issues those systems can experience. Ultimately, you will work with our Engineering, Compliance & Security, and IT teams to ensure our organization can continue to deliver its products and services.
Responsibilities
- Run the production environment and monitor high availability and system health
- Improve reliability, quality, and time-to-market for all software versions
- Build systems to manage applications and infrastructure
- Gather and analyze data from operating systems to troubleshoot and fine-tune performance
- Offer primary engineering and operational support for distributed software applications
- Work with development teams to test and improve services
- Measure and optimize system performance
- Contribute to platform management, capacity planning, design consulting, and establishing service level objectives (SLOs)
- Push for continuous improvement and anticipate customer needs
- Use automation to create sustainable services
Requirements
- Proven work experience as a Site Reliability Engineer or similar role
- Collaborate and communicate asynchronously
- Document all the things so you don’t need to learn the same thing twice
- Have an enthusiastic, go-for-it attitude
- Relevant training and/or certifications as a Site Reliability Engineer
- Experience with Kubernetes, Google Cloud Platform, Datadog, Pulumi, GitHub, and CI/CD
- Experience with Doppler and/or Mint CI/CD (https://rwx.com) is a plus
About HighTide
HighTide is a fully custom CRM for nonprofits, with marketing, content, AI, and ops functionality to allow users to track all their data and build strategy from a total understanding of trends.