ExpertGrid

GenAI Engineer – Test Agent / CI Integration

Worldwide remote · Coding · Contract / freelance
Pay rate: typically $25–60/hr
This is the typical hourly range for this type of role; the exact rate is confirmed by the hiring company.

Overview

  • Build AI-driven test agents for validating LLM-powered applications.
  • Location: Gurugram, India (hybrid)
  • Experience: 4–8 years
  • Employment type: Full-time

Job Description

  • We are looking for a highly skilled GenAI Engineer – Test Agent / CI Integration to build and scale intelligent AI-powered testing systems for next-generation applications. The ideal candidate will work on automated test-agent frameworks, synthetic data generation, evaluation harnesses, and CI/CD-integrated AI testing pipelines.
  • This role requires strong expertise in Python backend development, LLM evaluation frameworks, retrieval-grounded testing systems, and modern DevOps practices.

Key Responsibilities

AI Test Agent Development

  • Design and develop autonomous AI-driven test agents for validating GenAI and LLM-powered applications
  • Build systems for:
      • Synthetic data generation
      • Test-case synthesis
      • Scenario generation
      • Adversarial and edge-case testing
  • Develop reusable evaluation harnesses for benchmarking model quality, accuracy, safety, and reliability

Context-Aware Test Generation

  • Integrate test agents with BLK’s knowledge/context graph for retrieval-grounded testing
  • Enable contextual test generation using RAG pipelines and graph-based retrieval systems
  • Ensure generated tests align with enterprise knowledge sources and real-world workflows

CI/CD & Automation

  • Integrate AI test agents into CI/CD pipelines as first-class pipeline jobs
  • Automate regression testing, evaluation runs, and quality scoring during deployments
  • Build scalable validation workflows for continuous model monitoring and release gating

Evaluation Frameworks & Quality Engineering

  • Work with LLM evaluation frameworks such as:
      • DeepEval
      • Ragas
      • Custom evaluation frameworks
  • Develop automated scoring mechanisms for:
      • Hallucination detection
      • Faithfulness
      • Relevance
      • Toxicity
      • Response quality
  • Integrate with pytest and existing QA ecosystems

Backend & Infrastructure

  • Build and maintain Python backend services powering evaluation workflows
  • Optimize distributed evaluation execution for scalability and performance
  • Collaborate with platform, MLOps, and DevOps teams for production deployment

Required Skills & Qualifications

Technical Skills

  • Strong proficiency in Python
  • Experience with:
      • pytest
      • CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, etc.)
      • REST APIs and backend development
  • Hands-on experience with:
      • LLM evaluation frameworks (DeepEval, Ragas, LangSmith, custom evaluators)
      • RAG systems and retrieval pipelines
      • Synthetic dataset generation
      • Prompt engineering and evaluation strategies

AI/ML & GenAI Expertise

  • Strong understanding of:
      • Large Language Models (LLMs)
      • Agentic systems
      • AI evaluation methodologies
      • Context grounding and knowledge retrieval
  • Familiarity with vector databases, embeddings, and knowledge graphs

DevOps & Automation

  • Experience integrating AI workflows into CI/CD environments
  • Understanding of automated quality gates and testing orchestration

Preferred Qualifications

  • Experience working with knowledge graphs or graph databases
  • Exposure to LangChain, LlamaIndex, or similar orchestration frameworks
  • Familiarity with Kubernetes, Docker, and cloud platforms (AWS/GCP/Azure)
  • Experience in enterprise-scale AI platform engineering

Important Note

  • This is a niche role, not a typical GenAI developer position. We are specifically looking for candidates with experience in:
      • AI validation/testing
      • Quality engineering (QE) automation
      • Python backend development
      • CI/CD integration
      • LLM evaluation frameworks
      • RAG and retrieval-grounded systems