
Video Highlights: Ultimate Guide To Scaling ML Models – Megatron-LM | ZeRO | DeepSpeed | Mixed Precision

In this video presentation, Aleksa Gordić explains what it takes to scale ML models up to trillions of parameters. He covers the fundamental ideas behind recent large ML models such as Meta's OPT-175B, BigScience's BLOOM-176B, EleutherAI's GPT-NeoX-20B and GPT-J, OpenAI's GPT-3, Google's PaLM, and DeepMind's Chinchilla and Gopher models. The presentation walks through data parallelism, pipeline parallelism (e.g., GPipe and PipeDream), tensor parallelism (Megatron-LM), activation checkpointing, mixed precision training, ZeRO (the Zero Redundancy Optimizer) from Microsoft's DeepSpeed library, and more. Along the way, many top research papers are highlighted. The video presentation is sponsored by AssemblyAI.
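As a rough illustration of the simplest of these ideas, data parallelism, the sketch below simulates two workers that each compute gradients on their own shard of a batch and then average them (the "all-reduce" step) before every replica applies the identical update. This is a toy pure-Python example with a made-up one-parameter linear model, not how DeepSpeed or Megatron-LM actually implement it (real systems run NCCL collectives across GPUs):

```python
# Toy sketch of data parallelism: each worker computes gradients on its own
# shard of the batch, then the gradients are averaged (a simulated all-reduce)
# so every model replica applies the same update.

def local_gradient(weights, batch):
    # Hypothetical loss: mean squared error of a 1-D linear model y = w * x.
    w = weights[0]
    grads = [2 * (w * x - y) * x for x, y in batch]
    return [sum(grads) / len(grads)]

def all_reduce_mean(grad_shards):
    # Average the gradient contributed by every worker (simulated all-reduce).
    n = len(grad_shards)
    return [sum(g[i] for g in grad_shards) / n for i in range(len(grad_shards[0]))]

def data_parallel_step(weights, batches, lr=0.01):
    # In a real system each local_gradient call runs on a different GPU.
    grads = [local_gradient(weights, b) for b in batches]
    avg = all_reduce_mean(grads)
    return [w - lr * g for w, g in zip(weights, avg)]

# Two workers, each holding its own mini-batch of (x, y) pairs for y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = [0.0]
for _ in range(100):
    w = data_parallel_step(w, shards)
print(w)  # w[0] approaches 3.0
```

Because every replica sees the same averaged gradient, the replicas stay bit-identical, which is what lets data parallelism scale the batch without changing the model.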

Papers:

✅ Megatron-LM paper: https://arxiv.org/abs/1909.08053

✅ ZeRO (DeepSpeed) paper: https://arxiv.org/abs/1910.02054v3

✅ Mixed precision training paper: https://arxiv.org/abs/1710.03740

✅ GPipe (pipeline parallelism) paper: https://arxiv.org/abs/1811.06965
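The ZeRO paper's central argument can be reproduced with simple arithmetic: under mixed-precision Adam, each parameter costs 2 bytes (fp16 weight) plus 2 bytes (fp16 gradient) plus roughly K = 12 bytes of fp32 optimizer state (master weight, momentum, variance), and each ZeRO stage shards more of that across the N data-parallel GPUs. The sketch below is a back-of-the-envelope calculator based on that accounting; the function name and the 7.5B-parameter example are illustrative:

```python
# Back-of-the-envelope memory accounting in the spirit of the ZeRO paper:
# per parameter, mixed-precision Adam holds 2 bytes (fp16 weight),
# 2 bytes (fp16 gradient), and K = 12 bytes of fp32 optimizer state.
# ZeRO stages shard progressively more of this across N data-parallel GPUs.

def bytes_per_gpu(num_params, n_gpus, stage):
    p, g, k = 2, 2, 12           # bytes per parameter
    if stage == 0:               # plain data parallelism: everything replicated
        per_param = p + g + k
    elif stage == 1:             # ZeRO-1: shard optimizer states
        per_param = p + g + k / n_gpus
    elif stage == 2:             # ZeRO-2: also shard gradients
        per_param = p + (g + k) / n_gpus
    else:                        # ZeRO-3: shard the parameters as well
        per_param = (p + g + k) / n_gpus
    return num_params * per_param

params = 7.5e9                   # example model size; 64 data-parallel GPUs
for s in range(4):
    gib = bytes_per_gpu(params, 64, s) / 2**30
    print(f"ZeRO stage {s}: {gib:.1f} GiB per GPU")
```

The point of the exercise: replication costs 16 bytes per parameter regardless of GPU count, while ZeRO-3 drives the per-GPU cost toward 16/N, which is why optimizer-state sharding is what first makes 100B+ models fit.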

Articles:

✅ Collective ops: https://en.wikipedia.org/wiki/Collect…

✅ IEEE float16 format: https://en.wikipedia.org/wiki/Half-pr…

✅ Google Brain’s bfloat16 format: https://cloud.google.com/blog/product…
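The two 16-bit formats linked above trade precision against range differently: IEEE float16 has a 5-bit exponent and an 11-bit significand (10 stored), while bfloat16 keeps float32's 8-bit exponent and truncates the significand to 8 bits (7 stored). The small pure-Python sketch below makes the difference concrete; the helper names are illustrative, and bfloat16 is emulated by truncation where real hardware typically rounds to nearest:

```python
import struct

def to_bits32(x):
    # Reinterpret a float as its 32-bit IEEE-754 pattern.
    return struct.unpack('>I', struct.pack('>f', x))[0]

def from_bits32(b):
    return struct.unpack('>f', struct.pack('>I', b))[0]

def bfloat16_round(x):
    # bfloat16 is float32 with the low 16 significand bits dropped
    # (truncation here for simplicity; hardware usually rounds to nearest).
    return from_bits32(to_bits32(x) & 0xFFFF0000)

def float16_round(x):
    # IEEE half precision via struct's native 'e' format code.
    return struct.unpack('>e', struct.pack('>e', x))[0]

print(bfloat16_round(3.14159))   # coarse significand: 3.140625
print(float16_round(1e-8))       # below float16's subnormal range: 0.0
print(bfloat16_round(1e-8))      # survives thanks to the 8-bit exponent
```

The underflow of tiny values in float16 is exactly why mixed-precision training pairs fp16 with loss scaling, while bfloat16's float32-sized exponent range largely sidesteps the problem at the cost of a coarser significand.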

Sign up for the free insideBIGDATA newsletter.

Join us on Twitter: https://twitter.com/InsideBigData1

Join us on LinkedIn: https://www.linkedin.com/company/insidebigdata/

Join us on Facebook: https://www.facebook.com/insideBIGDATANOW
