
Objective of Lazarillo Project

The aim is to build, annotate, and publish a dataset, an “ImageNet” for the blind, i.e., a dataset of images taken with a wearable camera (or a phone, if preferred) by visually impaired people, which can later support image captioning and question answering to help them be more independent in their daily lives. The aim of project Lazarillo is to develop a mobile application whose end users are the visually impaired.

Once enough data has been collected, the application will describe the surroundings to the blind person, in text or speech, at the moments when it is most needed, or most beautiful. We believe the people who can benefit most from AI are people with special needs, and that is our motivation to build datasets and models that can make cities more visible at the touch of a button on a wearable camera or phone. Pressing that button will translate the scene into a spoken description, giving the blind person context about the environment around him, or even answering questions he poses.

Lazarillo Application Use Cases

1. The blind user may choose to send for processing only a textual description of the situation captured by his wearable camera, or also the image itself, depending on how much help he wants to obtain from the application and on the degree of privacy or speed he requires for the answer.

2. Only elements relevant to the situation in which the user specifically requires help will be processed. In such situations, the user will share an image of his environment through the wearable camera and will then receive, as audio, the caption (a legend or subtitle) that would accompany the image of the real situation he is in; that is, a description of, e.g., the objects, people, and general context that surround him in his current environment. The application can also answer questions about a given image (a minimal sketch of such a pipeline follows).
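
For illustration only, the sketch below shows what such a processing pipeline could look like, assuming off-the-shelf Hugging Face transformers pipelines for captioning and visual question answering. The specific model names are placeholders, not project decisions, and text-to-speech is left as a stub.

    # Sketch of the Lazarillo flow: image in, spoken description (or answer) out.
    # Assumes the `transformers` and `Pillow` packages; model choices are illustrative only.
    from typing import Optional
    from transformers import pipeline
    from PIL import Image

    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
    vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

    def describe(image_path: str, question: Optional[str] = None) -> str:
        """Return a caption for the image, or an answer if the user posed a question."""
        image = Image.open(image_path)
        if question is None:
            # Use case 2: caption the scene around the user.
            return captioner(image)[0]["generated_text"]
        # Optional: answer a question the user asked about the scene.
        return vqa(image=image, question=question)[0]["answer"]

    def speak(text: str) -> None:
        # Placeholder: the real application would send `text` to a text-to-speech engine.
        print(text)

    # Example: speak(describe("supermarket.jpg", "What food section am I in?"))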

 

We seek:

AI Researchers and engineers: to collect, annotate, document, and publish datasets, and to develop models for image captioning and visual question answering (VQA) on the collected data. The dataset should be similar to MS-COCO (at least with captions per image, and possibly with answers to questions, whether or not posed by the blind person); a sketch of such an annotation record is given at the end of this section. If you are willing to contribute, join the ContinualAI.org Slack and ask Natalia Diaz to add you to the channel.


Visually impaired / blind volunteers. Do you know visually impaired people who would be happy to wear the Google Clips camera and save images every time they would find it useful to get a description of their environment? We would provide them with the camera if they are comfortable wearing it (otherwise, a mobile phone will suffice). Volunteers should be happy to share pictures of their lives in the situations where they feel they need help the most. To assist even more, they could also ask a question about the picture that could potentially be answered by an AI model (i.e., as if Siri were given images). The data will be made public for research purposes, so that the scientific community can build and improve upon such models.

  1. More concretely, we would train a model to transform images into the relevant information that the blind person would need from the images surrounding him. Therefore, at the time of saving, it would be great to annotate, if possible, some context for the blind person (what would he like to be told about the image? e.g., if he is in a supermarket, "In what food section am I?"; or in a new hotel, "In what direction should I walk to find the reception?"). The blind user may provide such context for each situation in which he takes a picture where he thinks he would benefit from a textual description of the environment. The idea is that this contextual information will help a question answering application improve its responses to questions about his surroundings in the future. This information should be answerable just by looking at the images from the wearable (or phone) camera. A sketch of the annotation record we have in mind follows.
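
As an illustration, one possible shape of an annotation record is sketched below, loosely modelled on the MS-COCO captions format and extended with the volunteer's optional question and answer. All field names are assumptions, not a fixed schema.

    # One possible annotation record for a single image in the Lazarillo dataset.
    # The structure mirrors MS-COCO captions, extended with an optional question
    # posed by the blind user at capture time; all field names are illustrative.
    example_record = {
        "image": {
            "id": 1042,
            "file_name": "img_000001042.jpg",
            "capture_device": "wearable",      # "wearable" or "phone"
        },
        "captions": [
            {"id": 1, "caption": "A supermarket aisle with shelves of canned food."},
            {"id": 2, "caption": "A person stands in front of the canned-goods section."},
        ],
        "question": {                          # optional, present only if the user asked one
            "text": "In what food section am I?",
            "answer": "The canned-goods section.",
            "answerable_from_image": True,     # must be answerable from the image alone
        },
    }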