VizWiz Dataset Browser: A Tool for Visualizing Machine Learning Datasets

Abstract

We present a visualization tool to exhaustively search and browse through a set of large-scale machine learning datasets. Built on the top of the VizWiz dataset, our dataset browser tool has the potential to support and enable a variety of qualitative and quantitative research, and open new directions for visualizing and researching with multimodal information. The tool is publicly available at https://vizwiz.org/browse.

1 Introduction

A major challenge of working with large-scale machine learning datasets is the difficulty of exploratory data analysis [6]. Researchers often want to immerse themselves in the data, but for datasets containing thousands of images and annotations, there is no straightforward way to do this. Visualization efforts involve writing very specific programs or scripts to generate plots, such as bar charts containing counts of different categories, or sunburst diagrams showing relative proportions of different annotations. However, these visualizations produce only aggregated results, thereby hiding interesting individual examples. Even for data cleaning and quality control purposes, manually going through each image and its annotations is tedious and prone to human error.

To overcome these challenges, we developed the VizWiz dataset browser. The VizWiz dataset originates from people who are blind, who used mobile phones to snap photos and record questions about them (\eg, “what type of beverage is in this bottle?” or “has the milk expired?”), and contains images paired with questions about the image [3]. Subsequent research has generated a variety of annotations on top of the VizWiz dataset. These include: ten crowdsourced answers to each visual question [5]; reasons explaining why the ten answers can differ, if they do [2]; captions describing the images to users with visual impairments; the multitude of skills needed by an AI system to automatically answer the visual question; quality issues present in the images (since they were captured by users who could not see the photo they were capturing); and whether text is present in the image. As more and more annotations were collected, we felt the need to view all these different kinds of rich data on a single platform, in order to get a holistic view of the information contained within these datasets.

2 Design and Implementation

The VizWiz Dataset Browser is a single-page web application built on the Linux - Apache - MariaDB - PHP (LAMP) stack. It supports searching over textual annotations and filtering by categorical annotations. The main purpose of the tool is to view images, and to search for those images using the metadata provided by the annotations. To scale effortlessly with an increasing variety of annotations, we decided to keep the search functionalities on the left side of the screen, in its own independently scrollable section. By not opting for a horizontal layout of the search and filter options, we can display more dynamic information above the fold. Similar design choices are employed by popular eCommerce websites, which display numerous filters on their search-results pages [1, 4, 7].

2.1 Visualization Section

Figure 1 shows a screenshot of the main information visualization area. The image and the textual annotations: (a) question, (b) ten answers, and (c) five captions, are displayed in their natural form, while the categorical annotations: (d) answer-difference reasons, (e) skills, and (f) quality issues, are displayed as one-dimensional heatmaps, based on how many crowdworkers (out of 5) selected a categorical label.
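The heatmap rendering described above can be sketched as follows. This is a minimal illustration, assuming a simple mapping from vote counts to intensities; the label names and data layout are hypothetical, not the dataset's exact vocabulary.

```python
def heatmap_intensities(votes, n_workers=5):
    """Map per-label crowdworker vote counts (out of n_workers)
    to 0..1 heatmap cell intensities."""
    return {label: count / n_workers for label, count in votes.items()}

# Hypothetical quality-issue votes for one image: 4 of 5 workers
# marked it blurry, 1 marked it dark, none marked it rotated.
votes = {"blurry": 4, "dark": 1, "rotated": 0}
print(heatmap_intensities(votes))  # {'blurry': 0.8, 'dark': 0.2, 'rotated': 0.0}
```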

2.2 Summary of Results

The top portion of the visualization section shows a summary of the search results. This includes the number of total images found for the current search and/or filter query, and the range of images shown on the current page. To support minimal page loading times, we decided to show a maximum of 50 images per page. Users can choose to view the thumbnails of all the images displayed on the current page (as shown in Figure 1) by clicking on ‘Expand Summary of Images’. Clicking on a thumbnail image within the ‘Summary of Images’ section will take the user to the details-section of the image.
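The page-range arithmetic behind the summary line (e.g. “Showing 201-237 of 237”) is straightforward; the sketch below assumes 1-based page numbers and the 50-images-per-page cap mentioned above.

```python
def page_bounds(total, page, per_page=50):
    """Return the (first, last) 1-based image indices shown on a page."""
    first = (page - 1) * per_page + 1
    last = min(page * per_page, total)
    return first, last

print(page_bounds(237, 5))  # (201, 237) -- the last, partial page
print(page_bounds(237, 1))  # (1, 50)
```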

Figure 1: The summary section shows an overview of the different images returned for the search or filter query. Clicking a thumbnail image lets the user view the details of that image in the visualization section (Section 2.1). This example was obtained by searching for the word “glass” in the question.

2.3 Searching for Images by Textual Annotations

The tool supports searching for words and phrases within the visual question, the ten answers, and the five crowdsourced captions. Full-text search is powered by the MariaDB relational database1. Additionally, users can look up an image by its specific filename. These search capabilities are shown in Figure 2.
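A full-text query of this kind might look like the sketch below. The table and column names are assumptions for illustration, not the tool's actual schema; the pure-Python function mirrors the same idea of ranking rows by how many query words they match.

```python
# Hypothetical MariaDB full-text query over the question field.
QUERY = """
SELECT image_id, question,
       MATCH(question) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score
FROM visual_questions
WHERE MATCH(question) AGAINST (%s IN NATURAL LANGUAGE MODE)
ORDER BY score DESC
LIMIT 50;
"""

def rank_by_match(rows, phrase):
    """Pure-Python stand-in: rank texts by matched-word count,
    dropping texts with no matches."""
    terms = set(phrase.lower().split())
    scored = [(sum(w in terms for w in text.lower().split()), text)
              for text in rows]
    return [text for score, text in sorted(scored, reverse=True) if score > 0]

questions = ["is this glass full", "what color is the cup"]
print(rank_by_match(questions, "glass"))  # ['is this glass full']
```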

Figure 2: Different ways to search for images using textual annotations. Users can search for words and phrases within the question, the ten answers, and the five captions.

2.4 Filtering Images by Categorical Annotations

Figure 3: Filtering for images using categorical annotations. The screenshot shows the labels for the answer-difference dataset [2].

The visualization tool can be used to filter images based on the different types of categorical annotations available: (a) answer-difference reasons, (b) skills, and (c) quality issues. This functionality proves to be useful when we want to explore relationships between the different datasets. For example, selecting DFF (Difficult Question) as an answer-difference reason, and ROT (image needs to be rotated) as an image-quality issue, we can view the specific cases where the visual questions are difficult to answer because the images need to be rotated. The filtering capabilities for the answer-difference reasons are shown in Figure 3.
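The combined filtering described above (e.g. DFF together with ROT) can be sketched as a conjunction over per-image vote counts. The data layout and the minimum-vote threshold here are illustrative assumptions.

```python
def filter_images(annotations, labels, min_votes=1):
    """Keep images where every required categorical label received
    at least min_votes of the 5 crowdworker votes."""
    return [img for img, votes in annotations.items()
            if all(votes.get(label, 0) >= min_votes for label in labels)]

# Hypothetical per-image vote counts for two categorical labels.
data = {
    "img_001": {"DFF": 3, "ROT": 2},  # difficult question AND needs rotation
    "img_002": {"DFF": 4},            # difficult question only
}
print(filter_images(data, ["DFF", "ROT"]))  # ['img_001']
```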

2.5 Ordering of Search Results

Figure 4: Various options for ordering the search results.

The search results can be ordered (sorted) using the options shown in Figure 4. When searching for textual annotations (words or phrases in the question, answers, or captions), the results are sorted in decreasing order of the number of matched words in the annotation. ‘Diversity of answers’ orders the results based on how different the ten answers are, computed as the Shannon entropy of the ten answers. For categorical annotations (answer-difference reasons, skills, quality issues, text-presence), the results are ranked based on how many crowdworkers (out of five) annotated the images with the chosen categorical labels.
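The ‘Diversity of answers’ score can be sketched as below. This is a minimal illustration of Shannon entropy over the ten answers; the answer normalization (lowercasing, stripping whitespace) is an assumption on our part.

```python
import math
from collections import Counter

def answer_entropy(answers):
    """Shannon entropy (in bits) of the answer distribution;
    higher entropy means more diverse answers."""
    counts = Counter(a.strip().lower() for a in answers)
    n = len(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Ten identical answers give entropy 0 (no diversity);
# ten distinct answers give the maximum, log2(10) ≈ 3.32 bits.
unanimous = ["yes"] * 10
all_different = [str(i) for i in range(10)]
```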

2.6 Toggling Display of Annotations

Figure 5: Options to hide or show different datasets.

Viewing all the different annotations at once can be overwhelming. Often, the user may want to selectively view certain annotations (\eg, for taking screenshots). For this purpose, the ‘View’ section, as shown in Figure 5, can be used to hide or show the different datasets as desired.

3 Conclusion

In summary, the VizWiz Dataset Browser is a useful tool for searching, filtering, and visualizing multiple large datasets. It is already being used to aid a variety of ongoing research efforts in the domains of computer vision, accessibility, and human-computer interaction. We are hopeful that future researchers who choose to work with the VizWiz dataset will find the tool useful for answering interesting research questions.

Acknowledgements

We thank the crowdworkers for providing the annotations. We thank Kenneth R. Fleischmann, Meredith Morris, Ed Cutrell, and Abigale Stangl for their valuable feedback about this tool and paper. This work is supported in part by funding from the National Science Foundation (IIS-1755593) and Microsoft.

Footnotes

  1. https://mariadb.com/kb/en/library/full-text-index-overview

References

  1. Amazon.com: Online Shopping. https://www.amazon.com [Online; accessed 11-Dec-2019]. Cited by: §2.
  2. N. Bhattacharya, Q. Li and D. Gurari (2019) Why does a visual question have different answers?. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4271–4280. Cited by: §1, Figure 3.
  3. J. P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R. C. Miller, R. Miller, A. Tatarowicz, B. White and S. White (2010) VizWiz: nearly real-time answers to visual questions. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology, pp. 333–342. Cited by: §1.
  4. eBay Inc. https://www.ebay.com [Online; accessed 11-Dec-2019]. Cited by: §2.
  5. D. Gurari, Q. Li, A. J. Stangl, A. Guo, C. Lin, K. Grauman, J. Luo and J. P. Bigham (2018) VizWiz Grand Challenge: Answering Visual Questions from Blind People. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3608–3617. Cited by: §1.
  6. J. W. Tukey (1977) Exploratory data analysis. Addison-Wesley Publishing Company. Cited by: §1.
  7. Walmart Inc. https://www.walmart.com [Online; accessed 11-Dec-2019]. Cited by: §2.