ExpertGrid

Site Reliability Engineer

Pay rate: typically $25–60/hr · Remote (worldwide) · Coding · Contract / freelance
Typical hourly range for this type of role — the exact rate is confirmed by the hiring company.

Job Summary

In this role, you'll apply your expertise to help train next-generation AI systems. Your work will shape how models learn, reason, and perform through high-quality, real-world input. No prior experience in AI is required — your domain knowledge is what matters.

Key Responsibilities

  • Design, implement, and maintain scalable infrastructure on Linux and Kubernetes, with Prometheus for monitoring.
  • Monitor system health, analyze performance metrics, and proactively address bottlenecks or potential failures.
  • Automate operational processes to minimize manual intervention and increase system reliability.
  • Respond swiftly to incidents, conduct root cause analysis, and drive continuous improvements in incident response procedures.
  • Collaborate closely with development and operations teams to deliver seamless deployments and high system availability.
  • Create comprehensive documentation and clear runbooks for operational excellence and knowledge sharing.
  • Champion best practices in SRE, security, and compliance across the customer's ecosystem.

Required Skills and Qualifications

  • Expert-level hands-on experience with Linux system administration and troubleshooting.
  • Advanced proficiency with Kubernetes, including cluster deployment, operations, and management.
  • Deep knowledge of Prometheus for monitoring, metrics collection, and alerting.
  • Strong scripting abilities (Bash, Python, or similar) for automation and tooling.
  • Excellent written and verbal communication skills, with the ability to document and share knowledge effectively.
  • Proven track record in site reliability engineering or similar roles in high-availability environments.
  • Demonstrated commitment to proactive problem-solving and collaborative teamwork.

Preferred Qualifications

  • Experience with other cloud-native tools (e.g., Grafana, Helm, Istio, or similar).
  • Certifications in Kubernetes, Linux, or cloud platforms.
  • Background in high-growth or large-scale production environments.