Jointly estimating interactions and head, body pose of interactors from distant social scenes