DinoDS isn’t “more scraped data.” It’s behavior engineering for LLMs.
I don’t think the interesting question anymore is “how much data did you scrape?” It’s: what exact model behavior did you engineer?

That’s how we’ve been thinking about DinoDS. Not as one giant text pile, but as narrower training slices for things like:

- retrieval judgment
- grounded answering
- fixed structured output
- action / connector behavior
- safety boundaries

The raw data matters, obviously. But the real value feels more and more like: task design, workflow realism, and how clearly the […]