Fresh Hacker News | Learnings from 4 months of Image-Video VAE experiments

▲Learnings from 4 months of Image-Video VAE experiments (linum.ai)

48 points by schopra909 1 day ago | 6 comments

▲asaiacai 8 minutes ago

It's cool to see the iterative improvements to your model laid out, but for everything that worked, I imagine there were at least a million other things you tried that didn't work out. What's your process for trying these different techniques and architectures? Do you just wait for one experiment to finish and visually inspect the results every time? That seems hard, since these take a while to train. How do you shorten the feedback loop in this space?

▲greatgib 15 minutes ago

Very nice, well-written article!

The kind that I like so much on HN. It tickles your mind but is still clear enough for an advanced beginner.

▲schopra909 1 day ago

Hi HN, I’m one of the two authors of the post and the Linum v2 text-to-video model (https://news.ycombinator.com/item?id=46721488). We're releasing our Image-Video VAE (open weights) and a deep dive on how we built it. Happy to answer questions about the work!

▲selridge 20 minutes ago

No questions but I appreciate the write-up! Thank you for sharing.

▲DonThomasitos 41 minutes ago

Nice summary! I missed the mention of EQ-VAE when it comes to generation quality. Tiny trick, huge impact! Have you tried it?

▲schopra909 20 minutes ago

Hadn’t seen that before! Seems very in line with the broader points about regularization. In Table 4 they show faster convergence in 200 epochs when used alongside REPA. I’d be curious to see whether it ends up beating REPA by itself with the full 800 epochs of training, or whether something about this new latent space leads to plateauing (learns faster but caps out on expressivity). We’ve seen that phenomenon before in other situations (e.g., a U-Net learns faster than a DiT because of convolutions, but stops learning beyond a certain point).

▲lastdong 1 hour ago

This seems like a great model for experimenting with fine-tuning on original art, given it’s relatively small and has an open license. Is that a fair assessment?

Thanks for the great write-up and for making it available to us all.

▲schopra909 1 hour ago

Yep, Apache 2.0! So anyone's welcome to download and hack away.
