Technische Universität Wien


Multimodal Scene Perception

Our research on Multimodal Scene Perception deals with the acquisition and interpretation of scenes using different sensors (e.g., optical, infrared or depth cameras) or images captured with spatial/temporal displacement (stereo/video analysis). As part of this research, we develop methods for computing 3D scene models from range cameras and stereo cameras as input devices, with particular attention being paid to calibration issues and accuracy analysis. An important focus of research is interactive shot steering for sports broadcasts, where the user gets control of a virtual camera to generate personalized views that meet their own interests. In this context, we pay special attention to efficient video coding and transmission techniques, and to the related bandwidth considerations. For 3D video/TV applications, methods need to be developed that adapt the acquired and computed views to the target displays (e.g., mobile or stereoscopic). The structural representation of video content is a fundamental research problem that is still largely unexplored. Structural techniques such as graph pyramids and 2D combinatorial maps may be exploited to derive new data structures for structural representations in tracking.

Depth map computed from multimodal input devices. Temporal changes in a panoramic view.

Visual Attention and Human Behavior

Our research on Visual Attention and Human Behavior reflects recent trends towards emphasizing the computerized analysis, interpretation and support of humans and their needs. An important part of our research activities concentrates on the development of human-like perception techniques for future household robots. Multimodal sensory information is processed and abstracted in order to develop machine perception systems that seek to imitate human visual analysis; this work is carried out in interdisciplinary collaboration with, for example, psychologists. A further research focus deals with the video-based analysis, interpretation and visualization of human movements in sports.

The human visual system also plays a central role in our research on Content-Based Image Retrieval, which explores the so far little-researched area of quality-based and affect-based photo management. The various research topics are well supported and complemented by computer graphics research on predicting visual attention in virtual environments, for example, with the aim of developing optimized visualization techniques.

Interaction between humans and robots. Analysis of visual attention using eye-tracking.

Object Recognition

Object Recognition is a classical and very active field of computer vision, with applications in a broad range of automation and high-level scene interpretation tasks. In particular, object recognition plays a central role in our research on multimodal object perception and manipulation. The idea is that a system (e.g., a robot) should autonomously manipulate objects (e.g., by pushing or grasping) to generate those views that optimize the chances of success in subsequent learning or recognition tasks. A related research topic focuses on shape-based object analysis, with the goal of finding a computational representation of shape (a shape descriptor) that allows similarity queries to be answered very efficiently.

Perception, manipulation and recognition of objects.