DinoDS isn’t “more scraped data.” It’s behavior engineering for LLMs.

I don’t think the interesting question is “how much data did you scrape?” anymore.

It’s:
what exact model behavior did you engineer?

That’s how we’ve been thinking about DinoDS.

Not as one giant text pile, but as narrower training slices for things like:

  • retrieval judgment
  • grounded answering
  • fixed structured output
  • action / connector behavior
  • safety boundaries

The raw data matters, obviously.

But the real value feels more and more like:
task design, workflow realism, and how clearly the behavior is isolated.
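
A minimal sketch of what "isolated behavior" could look like at the data level, assuming each example is tagged with exactly one behavior slice. This is not the actual DinoDS schema: the slice names are adapted from the list above, and `Example`, `SLICES`, and `group_by_slice` are made-up names for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical slice names adapted from the list above (not a real DinoDS schema).
SLICES = {
    "retrieval_judgment",
    "grounded_answering",
    "structured_output",
    "action_connector",
    "safety_boundary",
}


@dataclass(frozen=True)
class Example:
    slice_name: str  # the single behavior this example is meant to train
    prompt: str
    target: str


def group_by_slice(examples):
    """Bucket examples by behavior slice, rejecting unknown slice names."""
    buckets = defaultdict(list)
    for ex in examples:
        if ex.slice_name not in SLICES:
            raise ValueError(f"unknown slice: {ex.slice_name}")
        buckets[ex.slice_name].append(ex)
    return dict(buckets)


# Illustrative, hand-written examples (not real dataset rows).
examples = [
    Example("retrieval_judgment", "What is 2 + 2?", "NO_RETRIEVAL"),
    Example("retrieval_judgment", "Who won yesterday's match?", "RETRIEVE"),
    Example("structured_output", "Extract the city: 'I live in Pune.'", '{"city": "Pune"}'),
]
buckets = group_by_slice(examples)
```

With one slice per example, each bucket can be trained on, evaluated, or ablated separately, which is roughly what isolating a behavior buys you.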

That’s the shift I’m most interested in right now.

Less scraping.
More behavior engineering.

Curious if others here are thinking about datasets the same way.

Check it out: www.dinodsai.com :))

submitted by /u/JayPatel24_