E01 Site Reliability Engineer IV
EXPANSIA / Full Time / Remote
Start Date: Immediate
EXPANSIA is a service-disabled veteran-owned company that empowers organizations to be mission ready now with data, people, and ecosystems. As experts in continuous-delivery methods that drive digital adoption, we are dedicated to innovation, efficiency, and technology that benefit the warfighter. EXPANSIA specializes in integration, automation, and sustainment modernization through technology-enabled delivery models, digital engineering, and cloud-ready solutions.
OVERVIEW
Full-time/Permanent Employee
Location: Remote
As a Site Reliability Engineer IV, you will be responsible for ensuring the security, availability, and performance of complex DoD mission systems and applications. You will apply SRE principles—such as automation, observability, reliability engineering, and incident response—to design, operate, and continuously improve resilient systems. Leveraging your technical expertise, you will implement site reliability best practices, integrate cybersecurity controls in accordance with DoD 8140 requirements, and collaborate with cross-functional teams to deliver secure, scalable, and highly available solutions. Your work will help minimize downtime, prevent failures, and ensure compliance with mission-critical standards.
The proposed salary range for this position is $118,485-$144,000. Many factors can influence final salary, including, but not limited to, Federal Government contract labor categories and contract wage rates, relevant prior work experience, specific skills and competencies, geographic location, education, and certifications. Our employees value the flexibility EXPANSIA allows them to balance quality work and their personal lives. We offer competitive compensation, benefits, and learning and development opportunities. Our unique mix of benefits options is designed to support and protect employees and their families. Employment benefits include health and wellness programs, income protection, paid leave, and retirement and savings.
RESPONSIBILITIES
- Design, implement, and maintain systems with high availability, fault tolerance, and disaster recovery capabilities.
- Define and monitor Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets to balance reliability with innovation.
- Develop observability solutions (logging, monitoring, tracing, alerting) to proactively detect anomalies and mitigate risks.
- Lead incident response efforts, perform root cause analysis, and conduct blameless postmortems to drive continuous improvement.
- Automate system deployments, configuration management, and operational tasks to reduce manual intervention and human error.
- Build self-healing and auto-scaling solutions that adapt to mission demands while maintaining compliance with DoD cybersecurity requirements.
- Implement, validate, and maintain cybersecurity controls aligned with DoD 8140/8570, RMF, and NIST 800-53 standards.
- Perform vulnerability assessments, patch management, and system hardening to safeguard mission systems against evolving threats.
- Partner with software engineering, DevSecOps, and infrastructure teams to integrate reliability and cybersecurity into the development lifecycle.
- Support subcontractor and vendor evaluations, ensuring compliance with reliability, security, and DoD standards.
- Analyze system failure data, usage patterns, and mission performance metrics to identify trends and recommend improvements.
- Contribute to process optimization initiatives, quality improvements, and the adoption of new reliability and security technologies.
- Ensure all contractual deliverables are met or exceeded to the customer's satisfaction.
- Complete personal PDP and attend Staff Meeting and Storytime (with camera on).
- Build productive and positive professional relationships with clients within the program.
- Execute all contract requirements in accordance with contract-specific LCAT and requirements.
- Perform other related duties as assigned.
KEY QUALIFICATIONS
- Clearance: Secret Clearance
- Education and Years of Experience: Bachelor's degree (or equivalent) with 8-10 years of experience, or a Master's degree with 6-8 years of experience.
- Demonstrated experience in site reliability engineering, systems engineering, or DevSecOps in secure or defense environments.
- Strong knowledge of system observability, monitoring, and incident response practices.
- Familiarity with cloud environments (AWS, DoD IL environments) and container orchestration platforms (AWS ECS).
- Proficiency in automation tools (Ansible, Terraform, CI/CD pipelines) and scripting languages (Python, Bash, PowerShell).
- Understanding of RMF, NIST SP 800-53, DISA STIGs, and related DoD cybersecurity frameworks.
- Strong analytical and communication skills to work effectively across engineering, operations, and cybersecurity teams.
- Ability to work under general direction while independently determining and implementing solutions for complex reliability challenges.
- Effective problem-solving and communication skills to collaborate with cross-functional teams and stakeholders.
- Proficiency in reliability modeling, failure mode analysis, and predictive maintenance methodologies.
- Security+ certification or equivalent.
PREFERRED ADDITIONAL QUALIFICATIONS
- Experience supporting DoD programs, secure networks, or mission-critical systems.
- Hands-on expertise with chaos engineering, fault injection, and reliability testing.
- Familiarity with compliance automation tools (e.g., OpenSCAP, Evaluate-STIG, Windows Group Policy).
- Background in DevSecOps, CI/CD pipelines, microservices architectures, and zero-trust security models.
- Experience with performance testing, predictive maintenance, and capacity planning in secure environments.
- Certifications such as Red Hat Certified System Administrator (RHCSA), Linux Professional Institute Certification (LPIC), AWS Certified Cloud Practitioner, or AWS Certified Solutions Architect - Associate.
EXPANSIA is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, age, national origin, disability, status as a protected veteran, or any other protected characteristic.