Exploiting the spatial structure in scene images is a key research direction for scene recognition. Due to the large intra-class structur…
Existing state-of-the-art salient object detection networks rely on aggregating multi-level features of pre-trained convolutional neural networks (CN…
Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for…
Recent works using artificial neural networks based on distributed word representation greatly boost performance on various natural language processi…
Video person re-identification (re-ID) plays an important role in surveillance video analysis. However, the performance of video re-ID degenerates se…
Rotation-invariant face detection, i.e. detecting faces with arbitrary rotation-in-plane (RIP) angles, is widely required in unconstrained applicatio…
Visual Question Answering (VQA) is a challenging task for evaluating the ability of comprehensive understanding of the world. Existing benchmarks usu…
In crowdsourced preference aggregation, it is often assumed that all the annotators are subject to a common preference or social utility function whi…
Person re-identification (reID) benefits greatly from deep convolutional neural networks (CNNs) which learn robust feature embeddings. However, CNNs …
Recent years have seen remarkable progress in semantic segmentation. Yet, it remains a challenging task to apply segmentation techniques to video-bas…
Benefiting from its great success on many tasks, deep learning is increasingly used on low-computational-cost devices, e.g. smartphones, embedded devi…
Recent works using artificial neural networks based on distributed word representation greatly boost the performance of various natural language learni…
In the video captioning task, the best practice has been achieved by attention-based models which associate salient visual components with sentences in t…
In many scenarios of Person Re-identification (Re-ID), the gallery set consists of lots of surveillance videos and the query is just an image, thus R…
Multi-view face detection in open environments is a challenging task due to diverse variations of face appearances and shapes. Most multi-view face de…
Image-text retrieval of natural scenes has been a popular research topic. Since image and text are heterogeneous cross-modal data, one of the key cha…
Facial attribute editing aims to modify either single or multiple attributes on a face image. Since it is practically infeasible to collect images wi…
Scene recognition with RGB images has been extensively studied and has reached very remarkable recognition levels, thanks to convolutional neural net…
In object detection, keypoint-based approaches often suffer from a large number of incorrect object bounding boxes, arguably due to the lack of an additio…