Ongoing research training transformer models at scale

16,553 stars
4,023 forks
Python