Model parallelism builds a pipeline of GPUs: each GPU holds a layer (or group of layers) of the model. It is used when the model is too big to fit in a single GPU's memory. Efficiency is hurt by bubbles in the pipeline when data passes through the GPUs at different speeds.
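A minimal sketch of the idea, assuming a toy two-stage setup where each "GPU" is just a Python function. Real systems (e.g. GPipe) overlap micro-batches across stages to shrink the bubble; here the stages are shown running sequentially per micro-batch.

```python
# Toy pipeline: stage0 lives on "GPU 0", stage1 on "GPU 1".
def stage0(x):
    return x * 2          # first layer of the model

def stage1(x):
    return x + 1          # second layer of the model

def pipeline(batch, micro_batch_size=2):
    # split the batch into micro-batches so both stages can stay busy
    micro = [batch[i:i + micro_batch_size]
             for i in range(0, len(batch), micro_batch_size)]
    out = []
    for mb in micro:
        acts = [stage0(x) for x in mb]    # forward on GPU 0
        acts = [stage1(a) for a in acts]  # activations handed off to GPU 1
        out.extend(acts)
    return out

print(pipeline([1, 2, 3, 4]))  # → [3, 5, 7, 9]
```

The bubble comes from the hand-off: while GPU 1 works on a micro-batch, GPU 0 is idle unless the next micro-batch has already been fed in.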
Data parallelism is used when the model can fit on a single GPU. The data is divided into mini-batches, one per GPU. Gradients are computed on each GPU and combined to adjust the shared weights.
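A minimal sketch, assuming two replicas of a one-parameter linear model y = w·x trained with squared error. Each replica computes a gradient on its own shard of the batch; the gradients are then averaged (the all-reduce step) before the single weight update.

```python
def grad(w, xs, ys):
    # d/dw of mean squared error 0.5*(w*x - y)^2 over the mini-batch
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def data_parallel_step(w, shards, lr=0.1):
    grads = [grad(w, xs, ys) for xs, ys in shards]  # one gradient per "GPU"
    g = sum(grads) / len(grads)                     # all-reduce: average them
    return w - lr * g                               # identical update everywhere

# batch generated from y = 2x, split across two replicas
shards = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
w = 0.0
for _ in range(100):
    w = data_parallel_step(w, shards)
print(round(w, 3))  # → 2.0
```

Because every replica applies the same averaged gradient, all copies of the weights stay in sync without ever exchanging the data itself.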
Tensor parallelism maps different parts of the model to multiple GPUs. It differs from model parallelism in that a portion of a layer, not the entire layer, is mapped to each GPU. The input is split and fed to the different GPUs along the mapping boundary.
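A minimal sketch of a row-parallel linear layer, assuming the weight matrix of one layer is split row-wise across two "GPUs" (here just NumPy slices). The input is split along the same boundary, each device computes a partial product, and the partials are summed (an all-reduce) to recover the full output.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6))   # batch of 4, feature size 6
W = rng.standard_normal((6, 8))   # full weight matrix of one layer

# row-wise shards: "GPU 0" holds rows 0..2 of W, "GPU 1" holds rows 3..5
W0, W1 = W[:3, :], W[3:, :]
x0, x1 = x[:, :3], x[:, 3:]       # input split along the same boundary
y = x0 @ W0 + x1 @ W1             # partial products summed (all-reduce)

print(np.allclose(y, x @ W))      # → True
```

A column-wise split works symmetrically: each GPU sees the whole input but produces a slice of the output, which is then concatenated instead of summed.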