The idea behind hypernetworks is that they enable Gelman-style partial pooling by explicitly modeling the data-generating process while leveraging the flexibility of neural network tooling (a rough sketch of what I mean is below). I’m curious to read more about your recommendations: their connection to the problems described here isn’t immediately obvious to me, but I’d be happy to dig deeper.
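To make the connection concrete, here is a minimal sketch of the idea, assuming PyTorch; it is my illustration rather than the implementation from the post, and `HyperNet` and all dimensions are hypothetical. A small hypernetwork maps a learned group embedding to the weights of a per-group regressor, so groups with little data borrow strength from the mapping fit across all groups, which is the partial-pooling effect.

```python
import torch
import torch.nn as nn

class HyperNet(nn.Module):
    """Hypothetical sketch: a hypernetwork that emits the weights of a
    small linear regressor, conditioned on a group identity."""

    def __init__(self, n_groups, embed_dim=8, target_in=1, target_out=1):
        super().__init__()
        self.embed = nn.Embedding(n_groups, embed_dim)
        # The target model is a linear map: weight matrix plus bias.
        n_weights = target_out * target_in + target_out
        self.body = nn.Sequential(
            nn.Linear(embed_dim, 32), nn.Tanh(),
            nn.Linear(32, n_weights),
        )
        self.target_in, self.target_out = target_in, target_out

    def forward(self, group_idx, x):
        # Generate this group's regressor weights from its embedding.
        params = self.body(self.embed(group_idx))          # (batch, n_weights)
        split = self.target_out * self.target_in
        w = params[:, :split].view(-1, self.target_out, self.target_in)
        b = params[:, split:]
        # Apply the generated per-group regressor to the input.
        return torch.bmm(w, x.unsqueeze(-1)).squeeze(-1) + b

# Usage: sparse groups are pulled toward behaviour the shared
# hypernetwork learned from all groups, i.e. partial pooling.
model = HyperNet(n_groups=10)
y_hat = model(torch.tensor([3, 7]), torch.randn(2, 1))
```

Training this by maximum likelihood is what makes the resulting per-group estimates fragile, which is where the next paragraph picks up.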
I agree that hypernetworks come with challenges, owing to the fragility of maximum-likelihood estimates. In the follow-up post, I dug into how explicit Bayesian sampling addresses these issues.