[Hiring] Reinforcement Learning Engineer @ Verita AI
Verita AI is building the “Gym” for LLM reasoning. We are moving beyond simple chat-based RLHF into complex, grounded RL environments where models must solve multi-step engineering and research problems to receive a reward.
The Mission
Design robust, unhackable RL environments (Prompt + Judge + Tools) that challenge top-tier models (GPT-5.2, Claude Opus 4.6). Think SWE-Bench, but for AI/ML research.
What We’re Looking For
- Technical Fluency: Deep PyTorch/JAX knowledge and the ability to debug distributed training.
- Adversarial Thinking: You can spot “shortcuts” a model might use to trick a reward function.
- Research Intuition: You can translate a theoretical paper into a practical coding challenge.
Technical Assessment (Initial Step)
We skip the LeetCode. Your first task is to design an RL environment for LLM training. Requirements:
- Prompt: A challenging, unambiguous task for an AI researcher.
- Judge: A script that outputs a score (Pass/Fail or Continuous) and leaves no room for reward hacking.
- Difficulty: If an LLM solves it in one shot, it’s too easy.
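
To make the Judge requirement concrete, here is a minimal sketch of one possible shape for such a script. It is not Verita AI's format: the names `Verdict`, `judge`, `hidden_cases`, `threshold`, and `tol` are hypothetical, and it assumes the submission can be reduced to a Python callable that maps a list of numbers to a single number. The idea it illustrates is scoring only on held-out cases, so a solution that hardcodes a visible example earns little reward.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Verdict:
    score: float  # fraction of hidden cases solved, in [0, 1]
    passed: bool  # True only if score reaches the threshold


def judge(
    candidate: Callable[[Sequence[float]], float],
    hidden_cases: Sequence[tuple[Sequence[float], float]],
    threshold: float = 1.0,
    tol: float = 1e-6,
) -> Verdict:
    """Score a candidate on held-out cases that the model never sees."""
    if not hidden_cases:
        raise ValueError("judge needs at least one hidden case")

    solved = 0
    for inputs, expected in hidden_cases:
        try:
            # Pass a copy so the candidate cannot mutate the stored case.
            got = candidate(list(inputs))
        except Exception:
            # A crash counts as a failed case, not as a judge error.
            continue
        # Reject non-numeric output instead of trusting it.
        if isinstance(got, bool) or not isinstance(got, (int, float)):
            continue
        # NaN fails this comparison, so it is never counted as solved.
        if abs(got - expected) <= tol:
            solved += 1

    score = solved / len(hidden_cases)
    return Verdict(score=score, passed=score >= threshold)
```

A real judge for an AI/ML research task would need much more than this (sandboxed execution, time limits, checks on how the result was produced), but the same structure covers both scoring modes in the requirement: `score` is the continuous signal and `passed` is the Pass/Fail one.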
Apply Here
Fill out our initial assessment form to get started: Link to Application Form
submitted by /u/MutedJeweler9205