On Computer Vision

My Research Highlights

Humans process visual input to understand the surrounding world, which is perceived as a three-dimensional (3D) space. Most cameras, however, capture only a two-dimensional (2D) image, so the input video signal is a projection of the real 3D space onto a 2D plane. In this projection, a great deal of information about scene depth is lost. While humans can easily understand the dynamic structure of a 2D image sequence, this remains a complex and computationally expensive task for computers. The lack of 3D information is a fundamental challenge in most computer vision applications. Analyzing a video along its consecutive frames (temporal analysis) is the key to understanding the dynamic structure of a scene in most computer vision applications.

Video Segmentation:

Image and video segmentation enables understanding of the static structure of still images and the dynamic structure of video sequences. Segmenting the contents of videos and images is at the core of many computer vision applications. For example, the performance of many compression techniques is strongly linked to the ability to segment video frames. In surveillance systems, segmenting the video content is crucial for reliable analysis of a given scene. Video segmentation is also a key to video indexing techniques that aim at fast and effective browsing, retrieval, and management.

My research focus is a coherent, evolving framework that handles the problems of spatial and temporal segmentation in images and video sequences. Such a framework addresses related computer vision applications such as object-based change detection, still-image segmentation, MPEG block segmentation, moving-object segmentation, and object tracking. These algorithms sit at the core of many advanced multimedia applications for understanding and recognizing the contents of images and video sequences. The spatial and temporal processing in this thesis is based on iterative adaptive approaches and on iterative analysis of the link between the spatial and temporal domains. Hence, the presented techniques make it possible to determine automatically whether there is sufficient and reliable spatio-temporal information to segment, extract, and track the moving objects.

The problem of image and video segmentation can be handled using graph data structures and their related algorithms, such as breadth-first search (BFS), depth-first search (DFS), graph contraction, minimum spanning tree (MST) construction, and finding the k shortest paths between two vertices, to obtain efficient implementations. These methods yield a set of constructive algorithms for high-level processing with linear, almost-linear, and polynomial time complexity. Linear and almost-linear algorithms have complexity proportional to the image size n; for some algorithms the complexity is instead proportional to the number of arcs E along the segmentation boundaries, so the complexity becomes O(E²) rather than O(n²), where E << n. I believe that iterative graph-based algorithms achieve maximum accuracy while allowing the number of iterations to be lowered in order to reduce the computational load.
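
As a minimal sketch of this graph-based view (my illustration, not an algorithm from this work), the code below builds a 4-connected grid graph over a grayscale image and merges regions Kruskal-style whenever the lightest connecting edge, weighted by absolute intensity difference, falls below a threshold; the names segment_mst and threshold are mine.

```python
import numpy as np

def segment_mst(image, threshold=10.0):
    """Kruskal-style region merging on a 4-connected pixel grid.

    A sketch only: edges are weighted by absolute intensity
    difference, and two regions are merged when the lightest
    edge between them falls below `threshold`.
    """
    h, w = image.shape
    idx = np.arange(h * w).reshape(h, w)
    flat = image.ravel().astype(float)

    # Build the 4-connected edge list as (weight, u, v) tuples.
    edges = []
    for u, v in ((idx[:, :-1], idx[:, 1:]),    # horizontal neighbours
                 (idx[:-1, :], idx[1:, :])):   # vertical neighbours
        weights = np.abs(flat[u.ravel()] - flat[v.ravel()])
        edges.extend(zip(weights, u.ravel(), v.ravel()))
    edges.sort()  # Kruskal: consider the lightest edges first

    # Union-find with path compression tracks region membership.
    parent = list(range(h * w))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for weight, u, v in edges:
        if weight > threshold:
            break  # remaining edges are heavier; stop merging
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two regions

    # Relabel every pixel by its region representative.
    return np.array([find(i) for i in range(h * w)]).reshape(h, w)
```

Sorting the edge list makes this sketch O(E log E); on a grid graph E is proportional to the image size n, which is the almost-linear regime described above.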


Object Segmentation:

Segmentation of moving objects aims at partitioning an image sequence into its physical moving objects and static background. The semantics of the moving-object definition stem from the way humans analyze a video sequence. In general, it is agreed that humans analyze a video in terms of the objects of interest and their motions, where an object refers to a meaningful spatial and temporal region of a sequence. Although the human visual system can easily distinguish between moving objects and background, robust video segmentation without any prior information is known to be one of the most challenging problems in the field of video processing. A deliberately simplistic illustration of this partitioning appears in the sketch below.
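
As a simplistic sketch of the partitioning (my example, not a method from this research), the code below maintains a running-average background model and labels pixels that deviate strongly from it as moving-object pixels; the names and parameter values are illustrative.

```python
import numpy as np

def update_and_segment(frame, background, alpha=0.05, threshold=25.0):
    """Toy running-average background subtraction.

    Pixels deviating from the background model by more than
    `threshold` gray levels are labelled moving-object pixels;
    `alpha` controls how fast the model absorbs slow changes.
    Both values are illustrative, not tuned.
    """
    frame = frame.astype(float)
    # Candidate moving-object pixels: large deviation from the model.
    moving_mask = np.abs(frame - background) > threshold
    # Blend the frame into the model so that slow illumination
    # changes are absorbed rather than detected as motion.
    background = (1.0 - alpha) * background + alpha * frame
    return moving_mask, background
```

In practice the model would be initialized with the first frame and the mask cleaned by connected-component analysis; the point here is only the split into moving objects and static background.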

Object segmentation has therefore attracted the attention of many researchers. Many applications in image processing, video compression, and pattern recognition rely on moving-object segmentation and can exploit the new functionalities it enables. For example, the ability to extract the moving objects in video surveillance systems may significantly reduce the number of false alarms caused by luminance changes. Designing video-indexing algorithms that provide fast and effective browsing, retrieval, and management of visual databases is a challenging problem. Current systems for video indexing and browsing are frame-based, where each frame is analyzed by its global features, such as its color histogram [2]. Events that take place inside the frame therefore fail to be represented by a frame-based method. In object-based indexing, by contrast, the object has to be identified. Low-level features such as color, shape, texture, and motion are available for the object's region and are attached to each object. Thus, an object-based indexing system can describe in-frame events and activities in much greater detail. In the MPEG-4 standard, a video sequence is considered to consist of independently moving objects, and the encoding can be based on segmented objects. The standard also allows easy access to the bitstreams of individual objects, manipulation of bitstreams, and multiple use of content information by scene composition, all of which are suitable for multimedia applications. Another important aspect is the immense popularity of the Internet and the Web, which clearly demonstrates that interactivity with content is a key factor in many multimedia applications. Huge efforts have been invested over the last decade to find a solution to the object segmentation problem.
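
The contrast between frame-based and object-based indexing can be made concrete with gray-level histograms. In the hypothetical sketch below, the frame descriptor pools all pixels, while the object descriptor is restricted to an object mask, so the feature is attached to the object rather than to the frame.

```python
import numpy as np

def gray_histogram(pixels, bins=16):
    """Normalized gray-level histogram of a 1-D pixel array."""
    hist, _ = np.histogram(pixels, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def frame_descriptor(frame):
    # Frame-based indexing: one global histogram per frame, so
    # events inside the frame are averaged away.
    return gray_histogram(frame.ravel())

def object_descriptor(frame, mask):
    # Object-based indexing: the histogram is computed only over
    # the object's region (a boolean mask), so low-level features
    # can be attached to each object individually.
    return gray_histogram(frame[mask])
```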


Change Detection:

The common goal of change detection algorithms is to extract the objects that appear in only one of two registered images. A typical application is surveillance, where a scene is sampled at different times. A significant illumination difference between the two images must be assumed; for example, one image may be captured during daylight while the other is captured at night with an infrared device. By analyzing the connectivity along gray levels, all the blobs that are candidates to be classified as ‘change’ are extracted from both images. The candidate blobs from both images are then analyzed: a blob from one image that has no matched blob in the other image is classified as a ‘change’. Such an algorithm was found to be reliable, fast, accurate, and robust even under significant changes in illumination. The worst-case time complexity of the algorithm is almost linear in the image size, making it suitable for real-time applications.
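
A hedged sketch of the blob-matching idea, substituting simple thresholding and connected-component labelling for the gray-level connectivity analysis described above; the thresholds and the overlap test are my simplifications.

```python
import numpy as np
from scipy import ndimage

def detect_changes(img_a, img_b, thresh_a, thresh_b, min_overlap=0.5):
    """Report the blobs of `img_a` with no matching blob in `img_b`.

    Candidate blobs are extracted from both registered images by
    thresholding and connected-component labelling; a blob counts
    as matched when enough of its area is covered by some blob of
    the other image.
    """
    blobs_a, n_a = ndimage.label(img_a > thresh_a)
    blobs_b, _ = ndimage.label(img_b > thresh_b)

    changes = []
    for label in range(1, n_a + 1):
        region = blobs_a == label
        covered = np.count_nonzero(blobs_b[region])
        # A blob with no counterpart in the other image is a 'change'.
        if covered / np.count_nonzero(region) < min_overlap:
            changes.append(label)
    return blobs_a, changes
```

For clarity this sketch rescans the image once per blob; the algorithm described above achieves almost-linear time.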


Object Tracking:

Object tracking is an important task in a variety of computer vision applications such as monitoring, perceptual user interfaces, and video compression. Typical tasks include vehicular navigation, robotic control, motion recognition, and video surveillance. These applications require reliable object tracking techniques that meet real-time constraints. For example, robots that test and inspect manufactured parts by monitoring them through video cameras require unsupervised video segmentation and object tracking. In video processing and compression, object tracking is a key component of efficient algorithms and of content-based indexing. In MPEG-4, the visual information is organized around the video object concept, which represents a time-varying visual entity with arbitrary shape. Tracking these video objects through the scene enables individual manipulation of their shapes and their combination with other entities to compose a scene.

Though object tracking is easily performed by humans, it presents a significant challenge for a robust automatic procedure. An object may be partially and temporarily occluded by a different object, or it may slowly vanish from the scene and reappear after several frames. Many tracking techniques utilize a shape model of the tracked object to overcome the errors produced by the use of estimated motion vectors. Unfortunately, an object may translate and rotate in three-dimensional space while its projection on the image plane undergoes projective transformations that cause substantial deformations of its two-dimensional shape. Furthermore, an object may change its real (original) shape in physical space, for example, when a human subject changes body position. In this situation it is necessary to decide whether the model that describes the object has to be updated, or whether the change in its shape should be treated as a transitory event. In other cases, the first frame of the sequence may contain two different objects that are very close to each other, so that the object extraction process considers them a single object. The "single" object may then split, after a few frames, into two objects. Depending on the adopted technique, the object's shape model will have to be reinitialized according to the newly tracked objects in successive frames.

Many attempts have been made over the last decade to produce efficient algorithmic solutions for tracking. The existing techniques in the literature can be roughly classified into three groups of trackers: region-based, contour-based, and model-based. Region-based trackers utilize the information of the entire object region. The information may be produced by motion detection, color segmentation, and texture analysis methods. Many of these methods track homogeneous regions of the object by their color, luminance, or texture; a merging process based on motion estimation is then used to obtain the complete object in the next frame. Contour-based trackers process only the boundary pixels of the object. They rely on the assumption that the main information about the shape of a 2D object lies on its contour. Furthermore, since 2D contours are closed curves, it is possible to represent them by a one-dimensional model and, as a result, to reduce the computational complexity of the algorithms, particularly in the computations required for shape comparisons. Model-based techniques utilize a parameterized object model as a priori knowledge of the object's shape and type. Thus, they can deal successfully with a large variety of (known) object motions between consecutive frames and also with occlusion of the tracked object.
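
As a toy instance of the region-based family (my example, not a method from the literature), the sketch below re-locates an object window in the next frame by exhaustively comparing gray-level histograms of shifted candidate windows; real region-based trackers combine color, texture, and motion cues.

```python
import numpy as np

def histogram(patch, bins=16):
    """Normalized gray-level histogram of an image patch."""
    h, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return h / max(h.sum(), 1)

def track_window(prev_frame, next_frame, box, search=8):
    """Region-based tracking by histogram matching.

    `box` is (row, col, height, width) of the object in
    `prev_frame`; the window is re-located in `next_frame` by
    scoring every shift within +/- `search` pixels (a toy search).
    """
    r, c, hgt, wid = box
    target = histogram(prev_frame[r:r + hgt, c:c + wid])
    best, best_score = (r, c), -1.0
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            if (rr < 0 or cc < 0
                    or rr + hgt > next_frame.shape[0]
                    or cc + wid > next_frame.shape[1]):
                continue
            cand = histogram(next_frame[rr:rr + hgt, cc:cc + wid])
            # Histogram intersection: higher means more similar.
            score = np.minimum(target, cand).sum()
            if score > best_score:
                best, best_score = (rr, cc), score
    return (best[0], best[1], hgt, wid)
```

Used frame by frame, track_window carries the object box forward; the shape model discussed above would be needed to handle occlusion, splits, and deformation, which this sketch ignores.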