SRE (Remote) 210031

U.S.A. Remote /
Technology & Operations – Cloud Operations /
Full time
Ellie Mae is the leading cloud-based platform provider for the mortgage finance industry. Ellie Mae’s technology solutions enable lenders to originate more loans, reduce origination costs, and reduce the time to close, all while ensuring the highest levels of compliance, quality and efficiency.

The SRE will assist with day-to-day activities supporting Platform SRE services related to incidents, and will build actionable alerts and automation to prevent incidents, detect performance bottlenecks, and identify maintenance activities.


    • Employ deep troubleshooting skills to improve the availability, performance, and security of Ellie Mae Services.
    • Code and automate applications on the cloud platform
    • Implement automated tests, automated deployments, and operational tools
    • Collaborate with Product and Support teams to plan and deploy product releases
    • Set strategic and operational goals for the team, and work with the team to deliver on them
    • Work with Cloud Platform and Operations leaders to develop narratives and the backlog grooming, epic planning, and overall sprint planning processes
    • Work with Engineering leadership to build shared services that meet the requirements and needs of the platform and application teams
    • Ensure services are designed for 24/7 availability, operational readiness, and rigor
    • Implement proactive monitoring, alerting, trend analysis, and self-healing systems
    • Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems
    • Contribute to product development and engineering as needed to ensure quality of service for highly available services
    • Identify, evaluate, and execute preventive measures to minimize or avoid impact on the customer experience, so that issues are caught proactively rather than escalated by customers
    • Resolve product/service defects, and carry out design, infrastructure, or operational changes
    • Partner with other SREs and lead by example: be a contributor more than a delegator


    • 5+ years of experience in systems/applications automation in 24x7 production service environments
    • BS in Computer Science, Computer Engineering, Math, or equivalent professional experience
    • Fluency with one or more current-generation scripting languages used by DevOps professionals (Python, Perl, PHP, Ruby), plus Java and/or .NET development
    • Excellent troubleshooter, utilizing a systematic problem-solving approach
    • Demonstrated experience in designing, analyzing, and diagnosing large-scale distributed systems, plus Windows Server and/or Linux systems internals (system libraries, file systems, client-server protocols)
    • Experience with elastic scalability, fault tolerance, and other cloud architecture patterns
    • Experience operating on AWS (both PaaS and IaaS offerings)
    • Experience in both Windows Server (2008 R2 and later) and Linux
    • Experience with Continuous Integration and Continuous Delivery concepts, including Infrastructure as Code using tools like Terraform, CloudFormation, and Chef/SaltStack
    • Experience with containerization technologies like Docker, and with PaaS services on AWS
    • NoSQL, Docker, and microservices experience
    • Proven strength in SaaS, with experience in massive-scale web operations

Ellie Mae is an equal opportunity and affirmative action employer. Women, minorities, people with disabilities, and veterans are encouraged to apply.

We do not accept resumes from headhunters, placement agencies, or other suppliers that have not signed a formal agreement with us.