[Hiring] Reinforcement Learning Engineer @ Verita AI

digitado ⋅ March 3, 2026

Verita AI is building the “Gym” for LLM reasoning. We are moving beyond simple chat-based RLHF into complex, grounded RL environments where models must solve multi-step engineering and research problems to receive a reward.

The Mission

Design robust, un-hackable RL environments (Prompt + Judge + Tools) that challenge top-tier models (GPT-5.2, Claude Opus 4.6). Think SWE-Bench, but for AI/ML research.

What We’re Looking For

Technical Fluency: Deep PyTorch/JAX knowledge and the ability to debug distributed training.
Adversarial Thinking: You can spot “shortcuts” a model might use to trick a reward function.
Research Intuition: You can translate a theoretical paper into a practical coding challenge.

Technical Assessment (Initial Step)

We skip the LeetCode. Your first task is to design an RL environment for LLM training. Requirements:

Prompt: A challenging, unambiguous task for an AI researcher.
Judge: A script that outputs a score (Pass/Fail or Continuous) and leaves no room for reward hacking.
Difficulty: If an LLM solves it in one shot, it’s too easy.
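
To make the Judge requirement concrete, here is a minimal sketch of what a continuous-score judge could look like. Everything in it is an illustrative assumption rather than part of the actual assessment: the toy task (implement a numerically stable softmax), the names `reference_softmax`, `naive_softmax`, and `judge`, and the tolerances. It is also far too easy for a real submission. The point is only the shape: the judge recomputes the answer on hidden, seeded inputs instead of trusting anything the model reports about its own work.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def reference_softmax(x: Vector) -> Vector:
    """Hypothetical reference solution: subtract the max before exponentiating."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def naive_softmax(x: Vector) -> Vector:
    """A plausible shortcut submission that overflows on large inputs."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def judge(candidate: Callable[[Vector], Vector], n_cases: int = 50, seed: int = 0) -> float:
    """Return the fraction of hidden cases the candidate gets right (0.0 to 1.0)."""
    rng = random.Random(seed)  # seeded so the score is reproducible
    passed = 0
    for _ in range(n_cases):
        # Large magnitudes are chosen to expose implementations that overflow.
        x = [rng.uniform(-1000.0, 1000.0) for _ in range(rng.randint(1, 8))]
        expected = reference_softmax(x)
        try:
            got = candidate(list(x))  # pass a copy so the candidate cannot mutate x
            ok = len(got) == len(expected) and all(
                math.isclose(g, e, rel_tol=1e-9, abs_tol=1e-12)
                for g, e in zip(got, expected)
            )
        except Exception:
            ok = False  # crashes and malformed outputs count as failures
        passed += ok
    return passed / n_cases


if __name__ == "__main__":
    print("reference:", judge(reference_softmax))
    print("naive:", judge(naive_softmax))
```

In this sketch the reference scores 1.0 by construction, while the naive version loses every case where `math.exp` overflows or the sum underflows to zero. A real judge would also need to sandbox the candidate, since code running in the same process can tamper with the judge itself.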

Apply Here

Fill out our initial assessment form to get started: Link to Application Form

submitted by /u/MutedJeweler9205