[Unified Sequence Parallelism](https://huggingface.co/papers/2405.07719) combines Ring Attention and Ulysses Attention into a single approach for efficient long-sequence processing. It applies Ulysses's *all-to-all* communication first to redistribute heads and sequence tokens, then uses Ring Attention to process the redistributed data, and finally reverses the *all-to-all* to restore the original layout.
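The three steps can be checked numerically on a single machine. The NumPy sketch below is illustrative only and makes several assumptions. Python lists stand in for devices, and rank `(r, u)` is assumed to start with sequence chunk `r * U + u`. Attention is non-causal. The names `ulysses_all_to_all`, `ring_attention`, and `usp_attention` are hypothetical helpers, not the diffusers implementation, and real collectives and load balancing are not modeled.

```python
import numpy as np


def attention(q, k, v):
    """Reference non-causal softmax attention. q, k, v: (seq, heads, dim)."""
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("hqk,khd->qhd", weights, v)


def ulysses_all_to_all(shards):
    """Sequence-sharded -> head-sharded within one Ulysses group.

    shards[i] has shape (S_local, H, D). Output u holds all U * S_local
    tokens of the group for heads u * H/U .. (u + 1) * H/U.
    """
    U = len(shards)
    hs = shards[0].shape[1] // U
    return [np.concatenate([s[:, u * hs:(u + 1) * hs] for s in shards], axis=0)
            for u in range(U)]


def ulysses_all_to_all_reverse(shards):
    """Head-sharded -> sequence-sharded; exact inverse of the function above."""
    U = len(shards)
    s_local = shards[0].shape[0] // U
    return [np.concatenate([s[i * s_local:(i + 1) * s_local] for s in shards], axis=1)
            for i in range(U)]


def ring_attention(qs, ks, vs):
    """Each of R ranks keeps its query block while K/V blocks travel the ring.

    An online softmax (running max m, running denominator l) merges partial
    results, so no rank materializes the full score matrix.
    """
    R = len(qs)
    outs = []
    for r in range(R):
        q = qs[r]
        scale = 1.0 / np.sqrt(q.shape[-1])
        m = np.full((q.shape[1], q.shape[0]), -np.inf)        # (heads, S_q)
        l = np.zeros((q.shape[1], q.shape[0]))                # (heads, S_q)
        acc = np.zeros((q.shape[1], q.shape[0], q.shape[2]))  # (heads, S_q, D)
        for step in range(R):
            src = (r - step) % R  # K/V block received at this ring step
            s = np.einsum("qhd,khd->hqk", q, ks[src]) * scale
            m_new = np.maximum(m, s.max(axis=-1))
            corr = np.exp(m - m_new)  # rescale previously accumulated stats
            p = np.exp(s - m_new[..., None])
            l = l * corr + p.sum(axis=-1)
            acc = acc * corr[..., None] + np.einsum("hqk,khd->hqd", p, vs[src])
            m = m_new
        outs.append((acc / l[..., None]).transpose(1, 0, 2))  # (S_q, heads, D)
    return outs


def usp_attention(q, k, v, U, R):
    """Simulate Unified Sequence Parallelism on U * R logical ranks."""
    S = q.shape[0]
    assert S % (U * R) == 0 and q.shape[1] % U == 0
    c = S // (U * R)

    def shard(x):
        # Rank (r, u) starts with sequence chunk r * U + u and all heads.
        return [[x[(r * U + u) * c:(r * U + u + 1) * c] for u in range(U)]
                for r in range(R)]

    # 1) all-to-all inside each Ulysses group r: sequence shards -> head shards
    qa = [ulysses_all_to_all(g) for g in shard(q)]
    ka = [ulysses_all_to_all(g) for g in shard(k)]
    va = [ulysses_all_to_all(g) for g in shard(v)]
    # 2) ring attention across the R groups, independently per head shard u
    oa = [[None] * U for _ in range(R)]
    for u in range(U):
        outs = ring_attention([qa[r][u] for r in range(R)],
                              [ka[r][u] for r in range(R)],
                              [va[r][u] for r in range(R)])
        for r in range(R):
            oa[r][u] = outs[r]
    # 3) reverse all-to-all, then reassemble the sequence in chunk order
    o = [ulysses_all_to_all_reverse(oa[r]) for r in range(R)]
    return np.concatenate([o[r][u] for r in range(R) for u in range(U)], axis=0)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 4, 8)) for _ in range(3))
print(np.allclose(usp_attention(q, k, v, U=2, R=2), attention(q, k, v)))  # True
```

Because the ring step uses an online softmax, every query block sees every key/value block exactly once, so the simulated result matches ordinary attention up to floating-point error.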
We ran a benchmark with Ulysses, Ring, and Unified Attention with [this script](

| ring | 13076.492 | 3.82 | 56.02 |
| unified_balanced | 11068.705 | 4.52 | 33.85 |

From the above table, it's clear that Ulysses provides better throughput, but the number of devices it can use remains limited to the number of attention heads. Unified attention removes this limitation by combining Ulysses and Ring groups, so the total device count is no longer bounded by the head count.
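One way to see the head-count limitation is to enumerate the ways a device count can be split. The sketch below assumes, as in Unified Sequence Parallelism, that the world size equals a Ulysses degree times a ring degree, and that the Ulysses degree must evenly divide the number of attention heads (the usual requirement for the head-splitting all-to-all). `valid_degrees` is a hypothetical helper for illustration, not a diffusers function.

```python
def valid_degrees(world_size: int, num_heads: int) -> list[tuple[int, int]]:
    """Return (ulysses_degree, ring_degree) pairs that use all `world_size` devices.

    Assumption: the Ulysses degree must divide `num_heads`; the ring degree
    only has to make the product equal `world_size`.
    """
    return [
        (u, world_size // u)
        for u in range(1, world_size + 1)
        if world_size % u == 0 and num_heads % u == 0
    ]


# 8 devices but only 4 heads: pure Ulysses (8, 1) is ruled out,
# while a unified split such as (4, 2) still uses every device.
print(valid_degrees(8, 4))  # [(1, 8), (2, 4), (4, 2)]
```

The pure-Ring split `(1, world_size)` is always valid; the unified splits in between use Ulysses for as much of the device count as the head count allows.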
### parallel_config

Pass `parallel_config` during model initialization to enable context parallelism.