We introduce a novel deep neural network architecture that links visual
regions to corresponding textual segments, including phrases and words. To