Learning an effective attention mechanism for multimodal data is important in
many vision-and-language tasks that require a synergic understanding of…