Intuit Staff Site Reliability Engineer in Mountain View, California

Description

Responsibilities

  • Applies full understanding of the business, the customer, and the solutions the business offers to design, develop, and implement operational capabilities, tools, and processes that enable highly available, scalable, and reliable customer experiences

  • Uses deep knowledge of operations engineering, connected services, and information technology, along with knowledge of industry best practices, to innovate and influence operational approaches and solutions

  • Works on significant assignments that are broad in scope and complexity, may cross several functional and organizational boundaries, and cover a wide range of issues

  • Exercises independent judgment in the selection of methods and techniques used to deliver operational solutions. Considers build, buy and partnering alternatives in the selection process

  • Creates formal internal and external networks outside of own area of expertise to leverage and adopt ideas, technologies, and best practices that help the organization move fast

  • Coaches and mentors other application operations engineers on methods and techniques

  • Coordinates technical dependencies with other teams

  • May be a technical lead for complex projects

  • Participates in the definition of project objectives

  • May influence organizational goals beyond a specific project

 

Qualifications

  • Incident management reports, including initial problem analysis, management status, resolution, and follow up defect reporting

  • Technical documentation on supported applications & operational tools

  • Application Deployment Plan and implementation

  • Configuration of monitoring agents at the software layer, and development of meaningful alerts and escalation procedures

  • Responses to monitoring alerts according to defined playbooks and procedures

  • Participation in Root Cause Analysis (RCA) processes

  • Implementation of business operations standards

  • Suggestions for process improvements and enhanced operational efficiencies

  • Implementation of monitoring agents

  • Management of application deployment processes

  • Management of RCA processes for a specific application

  • Implementation of improved operational processes

  • Real Time Application Dashboards showing overall health of the system

  • Code reviews of operational solutions

  • Facilitation of the creation of operational readiness documents

  • Review and development of performance and capacity plans (operational capacity and load requirements)

  • Specifications for onboarding new offerings, including troubleshooting, patch processes, cross-organizational incident management processes, security breach response plans, etc.

  • Implementation plans for application disaster recovery, migration, roll-back plans, expansion, routine deployments, and system upgrades

  • Metrics reporting on application performance, availability, reliability, etc.

  • Design reviews of operational approaches and solutions

  • Contributions to Operational Standards and Requirements

  • Risk analysis and root cause analysis

  • Technical feasibility and approach decisions

  • Public presentations at internal events

EOE AA M/F/Vet/Disability