Site Reliability Engineer

Oak National Academy

Posted over 30 days ago...

Expired

Shape the Future of Education as a Site Reliability Engineer at Oak's Dynamic Remote Team

Overview

Salary

£65,460

Location

UK

Nomad Friendly?

Yes

Expires

Expires at any time

Join Oak's mission to empower educators and students with free, high-quality online resources. Oak is a vibrant hub for educational innovation, dedicated to enhancing teaching and learning experiences through robust web applications. As part of the Product and Engineering team, you’ll be instrumental in elevating our platform's reliability and performance.

  • Ensure consistent high-standard monitoring across all key applications.
  • Collaborate with engineering teams to bolster application stability and security.
  • Drive the adoption of Site Reliability Engineering (SRE) principles throughout the company.
  • Lead initiatives on automation, improving platform usability, and enhancing performance and security.
  • Proficient understanding of SRE principles and a passion for automation.
  • Experience with complex systems, cloud infrastructures, and a willingness to learn.
  • Capability to work remotely with occasional UK meetings, within core working days.
  • Answer application questions thoroughly, as initial reviews are blind and CVs are not seen.
  • After closing, a panel will anonymously and randomly review responses.
  • Shortlisted candidates will undergo a multi-stage Zoom interview process with coding tests.

Please note that whilst we capture CVs as part of the application process, the initial sifting of applications occurs 'blind' and is based purely on the responses to the admin questions and sift questions. Please include sufficient detail, as we won't have visibility of your CV at this stage. Please also make sure that you answer each question independently of the others: they will be reviewed in random order, and at the initial sifting stage the hiring panel will not know who the responses are from or how they relate to other questions.

Please note that we are unable to offer sponsorship for this role.

Job Description

We are looking for a Site Reliability Engineer to join our Product and Engineering team. Oak’s websites provide free educational resources to teachers and pupils and we want to ensure we are providing them with the high level of service they deserve.

We have a range of web applications and some back-end processes, with monitoring configured by individual development squads based on their needs. Currently, squads tend to defer infrastructure and availability issues to either the platform team or management. In this role we are looking for someone who will bring our monitoring up to a high, consistent standard across all our key applications. You will also work closely with engineering teams to help them improve the stability and security of their applications, and to give engineers more of a sense of ownership of the operational concerns of their applications.

You will be someone who can share their passion for reliability engineering to lead on adoption of SRE principles by the rest of the department, and the wider organisation, to ensure we continue to maintain high levels of service availability.

As well as leading on SRE, you will be a key driver of the department's use of automation, working alongside other members of the platform team to help improve the usability, security and performance of our platform, infrastructure and tooling.

Candidates must have a good understanding of SRE principles and the value they bring to an organisation. While a good grounding in development practices, security fundamentals and infrastructure operation is key, specific technical skills are less important than a passion for automation, an ability to understand complex systems and a keenness to learn.

Site Reliability Engineer

Responsible to: Principal Engineer

Team: Product and Engineering

Term: Permanent

Location: Remote (with some occasional in-person UK meetings)

Hours: 36 hours per week if full-time; flexible arrangements will be considered. Our core working days are Tuesday, Wednesday and Thursday, to allow effective collaboration time with colleagues.

Line management responsibility: none

Budget responsibility: none

Key external relationships: none

Responsibilities

  • Lead the continuous improvement of the performance, reliability and security of our web applications (Node, JS/TS, React, Next.js, Retool) and serverless/PaaS infrastructure (Netlify/Vercel, GCP, Cloudflare).

  • Promote and nurture a culture of quality across the product and engineering department, enabling teams to use SLOs/SLAs to ensure they maintain a high quality of service delivery.

  • Take ownership of our monitoring, logging and reporting solutions to ensure they are easy to use and provide development teams with the information they need to understand service quality, resolve problems quickly, and gain meaningful insights into application behaviour.

  • Identify and implement ways to use automation to speed up development, secure systems, or improve the quality of the services we provide.

  • As a member of the Oak Team, contribute to the wider success of the organisation and support and role model our culture of inclusion, freedom, responsibility, and continuous improvement.

  • Work in cross-functional and product-oriented squads with colleagues from across the organisation, as required. Oak has a strong focus on collaboration and mentoring.

  • Deputise for the Principal Engineer and take on other general responsibilities as required.

Knowledge, skills, and experience

  • Considerable professional experience leading the continuous improvement of web service stability in a Site Reliability Engineer role (or similar), including recent experience working with serverless products and cloud infrastructure.

  • Extensive experience of designing and implementing monitoring and reporting solutions for complex cloud infrastructures.

  • Confident in understanding and maintaining web application code, and able to design and build small apps, preferably using JavaScript/TypeScript.

  • Experience working with cloud computing platforms and familiarity with Infrastructure as Code tools. We use Terraform but are happy for people to transfer skills from other relevant tools.

  • Comfortable promoting and leading a spirit of collaboration with a range of technical and non-technical stakeholders.

The successful candidate will have a desire to contribute in all areas to ensure Oak is successful. You will be comfortable working at pace with a range of digital systems (including proprietary ones as required), and you will continuously look for ways that the team can keep getting better. You will be excellent at working as part of a remote team, building relationships and managing your time effectively.

Next Steps

You’ll answer some questions that are related to your day-to-day job. After the job closes, your answers will go through our sift process: all answers will be anonymised, randomised and then reviewed by a panel of reviewers.

If you are shortlisted, we’ll invite you to the next steps, all carried out over Zoom. These involve one hour of questions with engineering colleagues, immediately followed by a simple one-hour coding test in JavaScript (a basic framework of code will already be in place), then a second one-hour interview stage with colleagues from the wider organisation. At the end of the application process, we will provide you with feedback.

We are aiming to start interviews in mid-June 2024.

We are receiving really good responses to our job adverts. This may lead us to close the role early, so if you are considering applying, please get your application in early to avoid missing out.

We are an equal opportunities employer

We are committed to a policy of Equal Employment Opportunity and are determined to ensure that no applicant or employee receives less favourable treatment on the grounds of gender, age, disability, religion, belief, sexual orientation, marital status, or race, or is disadvantaged by conditions or requirements which cannot be shown to be justifiable.