kingoflolz/mesh-transformer-jax

https://github.com/kingoflolz/mesh-transformer-jax/#gpt-j-6b, on the site since May 05, 2023 09:06
A Haiku library using the xmap/pjit operators in JAX for model parallelism of transformers. The parallelism scheme is similar to the original Megatron-LM, which is efficient on TPUs due to the high-speed 2D mesh network. There is also an experimental model version which implements ZeRO-style sharding.
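
To make the Megatron-LM-style scheme concrete, here is a minimal sketch of a model-parallel feed-forward block. It is not code from this repository. It assumes the usual Megatron split (first weight matrix sharded by columns, second by rows, followed by one all-reduce), and it uses `jax.vmap` with a named axis to stand in for the device mesh so that it runs on a single device. The names `N_SHARDS`, `mlp_shard`, `mlp_sharded`, and the axis name `"mp"` are hypothetical.

```python
import jax
import jax.numpy as jnp

N_SHARDS = 4  # hypothetical number of model-parallel shards


def mlp_reference(x, w_in, w_out):
    """Unsharded feed-forward block, used only to check the sharded version."""
    return jax.nn.gelu(x @ w_in) @ w_out


def mlp_shard(x, w_in_shard, w_out_shard):
    """Work done by one shard; it sees only its slice of each weight matrix."""
    # Column-parallel matmul: GELU is elementwise, so no communication is needed.
    h = jax.nn.gelu(x @ w_in_shard)
    # Row-parallel matmul: each shard produces a partial sum of the output.
    partial = h @ w_out_shard
    # All-reduce over the model-parallel axis to combine the partial sums.
    return jax.lax.psum(partial, axis_name="mp")


def mlp_sharded(x, w_in, w_out):
    """Split the hidden dimension across shards and run them under a named axis."""
    # Shapes: w_in (d_model, d_ff) -> (N_SHARDS, d_model, d_ff / N_SHARDS)
    w_in_shards = jnp.stack(jnp.split(w_in, N_SHARDS, axis=1))
    # Shapes: w_out (d_ff, d_model) -> (N_SHARDS, d_ff / N_SHARDS, d_model)
    w_out_shards = jnp.stack(jnp.split(w_out, N_SHARDS, axis=0))
    # vmap with axis_name simulates the mesh axis; x is replicated to every shard.
    out = jax.vmap(mlp_shard, in_axes=(None, 0, 0), axis_name="mp")(
        x, w_in_shards, w_out_shards
    )
    # After psum every shard holds the same result, so take the first copy.
    return out[0]
```

On real hardware the same per-shard function would be mapped over physical devices rather than a simulated axis, and the single `psum` per block is the communication that a fast interconnect keeps cheap.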