Automation Test Engineer

Typical $25–60/hr · Remote (worldwide) · Coding · Contract / freelance

Pay rate · Typical $25–60/hr

Typical hourly range for this type of role — the exact rate is confirmed by the hiring company.

Overview

Diagnose and verify automated test failures in Docker environments.

About the hiring company

The hiring company is one of the world’s fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems.
The hiring company helps customers in two ways: working with the world’s leading AI labs to advance frontier model capabilities in thinking, reasoning, coding, agentic behavior, multimodality, multilinguality, STEM and frontier knowledge; and leveraging that work to build real-world AI systems that solve mission-critical priorities for companies.

About the Role

Testers verify that each task's automated test suite works correctly. You run the reference solution inside a Docker container and confirm it scores 1.0 against the verifier. If it doesn't pass, you diagnose the failure, determine whether the issue lies in the solution, the verifier, or the environment, and escalate accordingly. Testers are the last checkpoint before a task is marked ready to ship.
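The pass check and first-pass triage described above could be sketched roughly as follows. This is an illustration only, not the hiring company's tooling: the names `Suspect`, `is_passing`, and `first_guess` are hypothetical, the tolerance on the score is an assumption, and the mapping from exit code to likely cause is a starting heuristic rather than a rule. The exit-code meanings themselves are the ones documented by pytest (0 to 5) and by `docker run` (125 to 127).

```python
import math
from enum import Enum


class Suspect(Enum):
    """Hypothetical labels for where to look first after a run."""
    NONE = "no failure"
    SOLUTION_OR_VERIFIER = "solution or verifier"
    VERIFIER = "verifier"
    ENVIRONMENT = "environment"


def is_passing(score: float) -> bool:
    # Assumption: the verifier reports a float score and 1.0 means a full pass.
    return math.isclose(score, 1.0, rel_tol=0.0, abs_tol=1e-9)


def first_guess(exit_code: int) -> Suspect:
    """Map a pytest (or docker run) exit code to a first place to look."""
    if exit_code == 0:
        # pytest: all collected tests passed.
        return Suspect.NONE
    if exit_code == 1:
        # pytest: tests ran and some failed; read the assertion output to
        # decide whether the solution or the verifier is at fault.
        return Suspect.SOLUTION_OR_VERIFIER
    if exit_code in (4, 5):
        # pytest: command-line usage error (4) or no tests collected (5),
        # which often points at how the test suite is configured or invoked.
        return Suspect.VERIFIER
    # pytest: interrupted (2) or internal error (3); docker run: daemon error
    # (125), command not invokable (126), command not found (127).
    return Suspect.ENVIRONMENT
```

A guess of this kind only narrows the search; the escalation decision still depends on reading the actual test output and container logs.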

What You Will Do

Execute reference solutions inside Docker containers and capture the output
Run the pytest-based test suite against the solution output
Diagnose failures: distinguish between a broken solution, a misconfigured verifier, an environment issue, or a task design flaw
Report failures clearly to the Pod Lead or Annotator with enough detail to reproduce the issue
Track test results in the shared tracker and flag tasks that are consistently failing or require environment changes
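To give a feel for the day-to-day work, here is a minimal sketch of running a command in a container, capturing its output, and turning the result into a report someone else can reproduce. Everything project-specific is an assumption made for illustration: the image name, the task ID, the report layout, and the helper names `build_docker_command`, `run_and_capture`, and `format_report` are hypothetical. Only standard `docker run --rm` and Python standard-library calls are used.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class RunResult:
    command: List[str]
    exit_code: int
    stdout: str
    stderr: str


def build_docker_command(image: str, inner_command: List[str]) -> List[str]:
    # --rm removes the container once the command exits.
    return ["docker", "run", "--rm", image, *inner_command]


def run_and_capture(command: List[str], timeout_s: int = 600) -> RunResult:
    # Capture stdout and stderr as text so they can be quoted in a report.
    proc = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout_s
    )
    return RunResult(command, proc.returncode, proc.stdout, proc.stderr)


def format_report(task_id: str, result: RunResult, tail_lines: int = 20) -> str:
    # Include the exact command so the reader can rerun it as-is.
    tail = "\n".join(result.stdout.splitlines()[-tail_lines:])
    return (
        f"Task: {task_id}\n"
        f"Command: {shlex.join(result.command)}\n"
        f"Exit code: {result.exit_code}\n"
        f"Last {tail_lines} lines of stdout:\n{tail}\n"
    )
```

In use, something like `format_report("task-001", run_and_capture(build_docker_command("example/task-image:latest", ["pytest", "-q"])))` would produce the text to paste into the tracker, assuming Docker is installed and the placeholder image exists.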

Required

Bachelor's degree in Computer Science, Engineering, or a related technical field
5+ years in software QA, DevOps, backend development, or automated testing
Comfortable working in a Linux command-line environment
Hands-on experience with Docker (building images, running containers, reading logs)
Ability to read and understand Python code and basic pytest output
Systematic debugging mindset: ability to isolate whether a failure is in the code, the config, or the environment

Nice to Have

Familiarity with CI/CD pipelines and containerised test environments
Experience with domain-specific libraries (e.g., pandas, openpyxl, pdfplumber, librosa, astropy) depending on batch assignment
Understanding of how LLM evaluation pipelines work

Offer Details

Commitments Required

40 hours per week, with 4 hours of overlap with PST

Engagement type

Contractor assignment (no medical/paid leave)

Duration of contract

2 months (expected start date is next week)

Location

India, Pakistan, Nigeria, Kenya, Egypt, Ghana, Bangladesh, Turkey, Mexico

Evaluation Process (approximately 60 mins)

1 round of interviews (Round 1: 60-minute technical interview)

Fill in your name, country, and email to proceed to the next step.
