In this paper, we propose an end-to-end model that learns to interpret
natural language describing a scene to generate an abstract pictorial