Imagine your GPS could see. To answer the question "Do I turn there where that big tree is?", a camera is not enough; the GPS needs to connect what you say to the portion of reality that surrounds your car. AMORE builds machines that connect language to reality, and seeks an understanding of how people do it. The main challenges are: 1) identifying which entities ("that big tree") are being talked about, both on the visual and on the linguistic camps; 2) tracking the entities as they appear again, adding new information about them as needed; 3) learning these two abilities directly from examples. We face the machine with different tasks that require using language to talk about the world, and the machine progressively learns to represent both the entities and the language that we use to refer to them.