Right now the distributed loss wrapper computes the loss over the global batch, so every rank ends up with the same global loss value. I'm not sure this is the right behavior. Would it be better to compute the loss over the local batch instead, so that each rank has its own loss? What do you think?
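For context, here is a minimal sketch of the two options I mean, assuming a PyTorch `torch.distributed` setup with a cross-entropy objective; the function names are just for illustration and not the wrapper's actual API:

```python
# Sketch only: contrasts a per-rank (local) loss with a global-batch loss,
# assuming torch.distributed is already initialized.
import torch
import torch.distributed as dist
import torch.nn.functional as F


def local_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Each rank averages only over its own local batch,
    # so different ranks generally report different loss values.
    return F.cross_entropy(logits, targets, reduction="mean")


def global_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Sum per-sample losses and sample counts across all ranks,
    # so every rank reports the same global-batch average.
    loss_sum = F.cross_entropy(logits, targets, reduction="sum")
    count = torch.tensor(
        targets.numel(), dtype=loss_sum.dtype, device=loss_sum.device
    )
    dist.all_reduce(loss_sum, op=dist.ReduceOp.SUM)
    dist.all_reduce(count, op=dist.ReduceOp.SUM)
    return loss_sum / count
```

With the global version the reported value is identical on every rank; with the local version each rank sees only its own shard's loss, which is what I'm asking whether we should prefer.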