Overview
Build AI-driven test agents for validating LLM-powered applications.
GenAI Engineer – Test Agent / CI Integration
Location: Gurugram, India (Hybrid)
Experience: 4–8 Years
Employment Type: Full-Time
Job Description
We are looking for a highly skilled GenAI Engineer – Test Agent / CI Integration to build and scale intelligent AI-powered testing systems for next-generation applications. The ideal candidate will work on automated test-agent frameworks, synthetic data generation, evaluation harnesses, and CI/CD-integrated AI testing pipelines.
This role requires strong expertise in Python backend development, LLM evaluation frameworks, retrieval-grounded testing systems, and modern DevOps practices.
Key Responsibilities
AI Test Agent Development
• Design and develop autonomous AI-driven test agents for validating GenAI and LLM-powered applications
• Build systems for:
  • Synthetic data generation
  • Test-case synthesis
  • Scenario generation
  • Adversarial and edge-case testing
• Develop reusable evaluation harnesses for benchmarking model quality, accuracy, safety, and reliability
Context-Aware Test Generation
• Integrate test agents with BLK’s knowledge/context graph for retrieval-grounded testing
• Enable contextual test generation using RAG pipelines and graph-based retrieval systems
• Ensure generated tests align with enterprise knowledge sources and real-world workflows
CI/CD & Automation
• Integrate AI test agents into CI/CD pipelines as first-class pipeline jobs
• Automate regression testing, evaluation runs, and quality scoring during deployments
• Build scalable validation workflows for continuous model monitoring and release gating
Evaluation Frameworks & Quality Engineering
• Work with LLM evaluation frameworks such as:
  • DeepEval
  • Ragas
  • Custom evaluation frameworks
• Develop automated scoring mechanisms for:
  • Hallucination detection
  • Faithfulness
  • Relevance
  • Toxicity
  • Response quality
• Integrate with pytest and existing QA ecosystems
Backend & Infrastructure
• Build and maintain Python backend services powering evaluation workflows
• Optimize distributed evaluation execution for scalability and performance
• Collaborate with platform, MLOps, and DevOps teams on production deployments