[D] Evaluating a hybrid actuarial/ML mortality model — how would you assess whether the NN is adding real value?

I’ve been experimenting with a hybrid setup where a traditional actuarial model provides a baseline mortality prediction, and a small neural network learns a residual correction on top of it. The idea is to test whether ML can add value after a strong domain model is already in place.

Setup:

– 10 random seeds

– 10‑fold CV per seed

– deterministic initialization

– isotonic calibration

– held‑out external validation file

– hybrid = weighted blend of actuarial + NN residual (weights learned per‑sample)
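The exact blend formula is not spelled out above, so here is a minimal sketch of one plausible reading: the NN's residual is added to the actuarial probability, and a per-sample weight controls how much of that correction is kept. The function name `blend_hybrid`, the probability-space addition, and the clipping are all assumptions for illustration, not the actual implementation.

```python
import numpy as np

def blend_hybrid(p_actuarial, nn_residual, weight):
    """Hypothetical blend (an assumption, not the author's confirmed formula).

    weight = 1.0 keeps the pure actuarial prediction;
    weight = 0.0 uses the fully NN-corrected prediction.
    """
    p_actuarial = np.asarray(p_actuarial, dtype=float)
    nn_residual = np.asarray(nn_residual, dtype=float)
    weight = np.asarray(weight, dtype=float)

    # Corrected probability, clipped so it stays a valid probability.
    p_corrected = np.clip(p_actuarial + nn_residual, 0.0, 1.0)

    # Per-sample convex combination of baseline and corrected prediction.
    return weight * p_actuarial + (1.0 - weight) * p_corrected

# Illustrative values only (not from the real data).
example = blend_hybrid([0.010, 0.020], [0.004, -0.002], [0.98, 0.98])
```

Under this reading, a weight close to 1 means the hybrid stays very close to the actuarial baseline and the NN only nudges it.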

Cross‑validated AUC lift (hybrid – actuarial), by seed:

seed  lift
0     0.0421
1     0.0421
2     0.0413
3     0.0415
4     0.0404
5     0.0430
6     0.0419
7     0.0421
8     0.0421
9     0.0406

Folds (out of 10) where hybrid > actuarial, by seed:

seed  folds
0     10
1     10
2     10
3     10
4     9
5     9
6     10
7     9
8     9
9     9

Overall averages:

Pure (actuarial‑only) AUC: 0.7001

Hybrid AUC: 0.7418

Net lift: 0.0417

Avg weight: 0.983
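As a cross-check, the per-seed numbers above can be summarized in a few lines. This sketch recomputes the mean lift, its spread across seeds, and a one-sided sign test on the fold wins. One caveat is stated as an assumption in the code: the 10 seeds re-split the same data, so the 100 folds are not independent and the sign-test p-value is optimistic.

```python
import math
from statistics import mean, stdev

# Values copied from the per-seed results above.
lift_by_seed = [0.0421, 0.0421, 0.0413, 0.0415, 0.0404,
                0.0430, 0.0419, 0.0421, 0.0421, 0.0406]
fold_wins_by_seed = [10, 10, 10, 10, 9, 9, 10, 9, 9, 9]

mean_lift = mean(lift_by_seed)   # matches the reported net lift of 0.0417
sd_lift = stdev(lift_by_seed)    # seed-to-seed spread (sample std. dev.)

total_wins = sum(fold_wins_by_seed)        # 95
total_folds = 10 * len(fold_wins_by_seed)  # 100

# One-sided sign test: P(wins >= total_wins) if each fold were a fair coin.
# ASSUMPTION: folds are treated as independent, which they are not across
# seeds (same data, different splits), so read this as a rough lower bound.
p_sign = sum(math.comb(total_folds, k)
             for k in range(total_wins, total_folds + 1)) / 2 ** total_folds

print(f"mean lift {mean_lift:.4f}, sd across seeds {sd_lift:.4f}")
print(f"hybrid wins {total_wins}/{total_folds} folds, sign-test p = {p_sign:.2e}")
```

The small spread across seeds mostly reflects split and initialization variance; it does not account for sampling uncertainty in the data itself.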

External validation (held‑out file):

Brier (Actuarial): 0.011871

Brier (Hybrid): 0.011638
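To put the Brier gap in context, the sketch below computes the relative reduction from the two reported scores and shows a paired bootstrap that could be run on the external file to get an interval for the difference. The function name `paired_bootstrap_brier_diff` and its arguments (`y`, `p_base`, `p_new`) are hypothetical; it assumes per-record outcomes and predictions from both models are available.

```python
import numpy as np

# Reported external-validation scores.
brier_actuarial = 0.011871
brier_hybrid = 0.011638

# Relative Brier reduction of the hybrid over the actuarial baseline (about 2%).
relative_reduction = (brier_actuarial - brier_hybrid) / brier_actuarial

def paired_bootstrap_brier_diff(y, p_base, p_new, n_boot=2000, seed=0):
    """Mean Brier improvement (base minus new) with a 95% percentile interval.

    Resamples records with replacement, keeping each record's outcome and
    both predictions together, so the comparison stays paired.
    """
    y = np.asarray(y, dtype=float)
    p_base = np.asarray(p_base, dtype=float)
    p_new = np.asarray(p_new, dtype=float)

    # Per-record squared-error difference; positive means the new model is better.
    per_sample = (p_base - y) ** 2 - (p_new - y) ** 2
    n = len(per_sample)

    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        diffs[b] = per_sample[idx].mean()

    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return per_sample.mean(), lo, hi
```

If the interval returned on the held-out file excludes zero, the Brier gain is unlikely to be resampling noise, although with a mortality rate this low the interval may be wide.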

The actuarial model is already strong, so the NN seems to be making small bias corrections rather than large structural changes. The lift is modest but consistent: roughly 0.04 AUC in every seed, with the hybrid ahead in at least 9 of 10 folds each time.

My question:

For those who have worked with hybrid domain‑model + NN systems, how do you evaluate whether the NN is providing meaningful value?

I’m especially interested in:

– interpreting small but consistent AUC/Brier gains

– tests you’d run to confirm the NN isn’t just overfitting noise

– any pitfalls you’ve seen when combining deterministic models with learned components

Happy to share more details if useful.

submitted by /u/richtnyc