PhD Thesis

The research for my PhD thesis [1] was carried out in the context of wearable video monitoring of patients with age-related dementia. The idea was to provide medical practitioners with a new tool for the early diagnosis of dementia in the elderly, such as Alzheimer's disease [2]. More precisely, Instrumental Activities of Daily Living (IADL) had to be indexed in videos recorded with a wearable device.

Such videos have specific characteristics, such as strong motion and sharp lighting changes. Furthermore, the recognition task is highly semantic. In this difficult context, the first step of the analysis was to define an equivalent to the notion of “shots” in edited videos. We therefore developed a method for partitioning continuous video streams into viewpoints according to the motion observed in the image plane [3]. For the recognition of IADLs, we developed a solution based on the formalism of Hidden Markov Models (HMM) [4]. We introduced a hierarchical HMM with two levels, modeling semantic activities and intermediate states [5]. A complex set of features (dynamic, static, low-level, mid-level) was proposed, and the most effective description spaces were identified experimentally [6].
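
As a rough illustration of the HMM formalism mentioned above, the following Python sketch decodes the most likely sequence of activities from a sequence of quantized observations with the Viterbi algorithm. It is a flat, single-level HMM rather than the hierarchical model of the thesis, and the state names, observation symbols and probabilities are made-up toy values, not figures from the experiments.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely state sequence of a discrete HMM (computed in log space).

    Assumes every probability used is strictly positive.
    """
    log = math.log
    # best[s]: log-probability of the best path ending in state s so far
    best = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this time step
            prev = max(states, key=lambda p: best[p] + log(trans_p[p][s]))
            new_best[s] = best[prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Backtrack from the best final state
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: two activities, one binary motion feature per segment.
STATES = ["cooking", "reading"]
START = {"cooking": 0.5, "reading": 0.5}
TRANS = {
    "cooking": {"cooking": 0.9, "reading": 0.1},
    "reading": {"cooking": 0.1, "reading": 0.9},
}
EMIT = {
    "cooking": {"high_motion": 0.8, "low_motion": 0.2},
    "reading": {"high_motion": 0.2, "low_motion": 0.8},
}

segments = ["high_motion", "high_motion", "low_motion", "low_motion", "low_motion"]
decoded = viterbi(segments, STATES, START, TRANS, EMIT)
print(decoded)
```

The high self-transition probabilities make the decoder favor long runs of the same activity, which is the kind of temporal smoothing an HMM adds over classifying each segment independently.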

For the mid-level features used in activity recognition, we focused on the semantic objects that the person manipulates in the camera view. We proposed a new concept for object and image description using local features (SURF) and the underlying semi-local connected graphs. We introduced a nested approach to graph construction, in which the same scene is described by layers of graphs with an increasing number of nodes. We build these graphs with Delaunay triangulation on SURF points, thus preserving the good properties of local features, i.e. invariance to affine transformations of the image plane: rotation, translation and zoom. We use the graph features in the Bag-of-Visual-Words framework, hence introducing the Graph Words [7]. This raises the problem of defining a distance or dissimilarity between graphs for clustering and recognition. We propose a dissimilarity measure based on the Context Dependent Kernel of H. Sahbi and show its relation to the classical entry-wise norm when comparing trivial graphs (single SURF points).
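
To make the graph construction more concrete, here is a minimal Python sketch that builds Delaunay edges over a set of 2D keypoints and then repeats this on nested subsets of increasing size. It is only an illustration under simplifying assumptions: the keypoints are plain `(x, y)` tuples assumed to be already ordered (for example by detector response), the triangulation uses a brute-force empty-circumcircle test that is only practical for small point sets, and the function names and `layer_sizes` parameter are hypothetical rather than taken from the thesis.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if a, b, c are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_edges(points):
    """Edges (i, j) with i < j of the Delaunay triangulation of `points`.

    Brute force: a triangle is kept if no other point lies strictly inside
    its circumcircle. Co-circular points are not handled specially.
    """
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        is_empty = all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
            for m, (px, py) in enumerate(points)
            if m not in (i, j, k)
        )
        if is_empty:
            edges.update({(i, j), (i, k), (j, k)})
    return edges


def nested_graphs(keypoints, layer_sizes):
    """One edge set per layer, built on the first n keypoints for each n."""
    return [delaunay_edges(keypoints[:n]) for n in layer_sizes]


# Hypothetical keypoints: a triangle, then one extra point inside it.
keypoints = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0), (1.0, 1.0)]
layers = nested_graphs(keypoints, layer_sizes=[3, 4])
for size, edges in zip([3, 4], layers):
    print(size, sorted(edges))
```

Each layer reuses the nodes of the previous one and adds more, so the node sets are nested; the edges are recomputed per layer. For realistic numbers of keypoints, a library routine such as `scipy.spatial.Delaunay` would be the practical choice.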

Related publications

[1] [pdf] S. Karaman, “Indexing of Activities in Wearable Videos: Application to Epidemiological Studies of Aged Dementia,” PhD Thesis, 2011.
[Bibtex]
@phdthesis{karaman2011phd,
title={Indexing of Activities in Wearable Videos: Application to Epidemiological Studies of Aged Dementia},
author={Karaman, Svebor},
year={2011},
school={Universit{\'e} Sciences et Technologies-Bordeaux I}
}
[2] [pdf] Y. Gaëstel, S. Karaman, R. Megret, O. Cherifa, T. Francoise, J. Benois-Pineau, and J. Dartigues, “Autonomy at home and early diagnosis in Alzheimer’s Disease: Utility of video indexing applied to clinical issues, the IMMED project,” in Alzheimer’s Association International Conference on Alzheimer’s Disease (AAICAD), Paris, France, 2011, p. S245.
[Bibtex]
@inproceedings{gaestel2011,
hal_id = {hal-00978228},
url = {http://hal.archives-ouvertes.fr/hal-00978228},
title = {Autonomy at home and early diagnosis in Alzheimer's Disease: Utility of video indexing applied to clinical issues, the IMMED project},
author = {Ga{\"e}stel, Yann and Karaman, Svebor and Megret, R{\'e}mi and Cherifa, Onifade-Fagbe and Francoise, Trophy and Benois-Pineau, Jenny and Dartigues, Jean-Fran{\c c}ois},
abstract = {With ageing of the population in the world, patients with Alzheimer's disease (AD) consequently increase. People suffering from this pathology show early modifications in their "activities of daily living". Those abilities modifications are part of the dementia diagnosis, but are often not reported by the patients or their families. Being able to capture these early signs of autonomy loss could be a way to diagnose earlier dementia and to prevent insecurity at home. We first developed a wearable camera (shoulder mounted) to capture people's activity at home in a non-invasive manner. We then developed a video-indexing methodology to help physicians explore their patients' home-recorded video. This video indexing system requires video and audio analyses to automatically identify and index activities of interest where insecurity or risks could be highlighted. Patients are recruited among the Bagatelle (Talence, France) Memory clinic department patients and are suffering from mild cognitive impairments or very mild AD. We met ten patients at home and we recorded one hour of daily activities for each. The data (video and questionnaires: Activities of Daily Living/Instrumental Activities of Daily Living) are now collected on an extended sample of people suffering from mild cognitive impairments and from very mild AD. We aimed at evaluating behavioral modifications and ability loss detection by comparing the subjects' self reported questionnaires and the video analyses. This project is a successful collaboration between various fields of research. Here, technology is developed to be helpful in everyday challenges that people suffering from dementia of the Alzheimer type are faced with. The automation of the video indexing could be a great step forward in video analysis if it could reduce the time needed to embrace the patient's lifestream, helping in early diagnosis of dementia and becoming a very useful tool to keep individuals safe at home.
In fact, many goals could be reached with such video analyses: an early diagnosis of dementia of the Alzheimer type, avoiding danger in home living and evaluating the progression of the disease or the effects of the various therapies (drug-therapy and others).},
language = {English},
affiliation = {Institut de Sant{\'e} Publique, d'Epid{\'e}miologie et de D{\'e}veloppement - ISPED , Laboratoire Bordelais de Recherche en Informatique - LaBRI , Laboratoire de l'int{\'e}gration, du mat{\'e}riau au syst{\`e}me - IMS , MSPB Bagatelle - MSPB , Epid{\'e}miologie et Biostatistique},
booktitle = {{Alzheimer's Association International Conference on Alzheimer's Disease (AAICAD)}},
pages = {S245},
address = {Paris, France},
editor = {Alzheimer's \& Dementia: The Journal of the Alzheimer's Association },
audience = {international},
note = {Poster presentation. Abstract published in Journal of Alzheimer's and Dementia, volume 7 (4), pp. S245, July 2011},
collaboration = {IMMED },
year = {2011},
month = {Jul}
}
[3] [pdf] [doi] S. Karaman, J. Benois-Pineau, R. Mégret, J. Pinquier, Y. Gaestel, and J. -F. Dartigues, “Activities of daily living indexing by hierarchical HMM for dementia diagnostics,” in 9th International Workshop on Content-Based Multimedia Indexing (CBMI), Madrid, Spain, 2011, pp. 79-84.
[Bibtex]
@INPROCEEDINGS{karamanCBMI2011,
author={Karaman, S. and Benois-Pineau, J. and Mégret, R. and Pinquier, J. and Gaestel, Y. and Dartigues, J.-F.},
booktitle={9th International Workshop on Content-Based Multimedia Indexing (CBMI)},
title={Activities of daily living indexing by hierarchical HMM for dementia diagnostics},
year={2011},
month={June},
address = {Madrid, Spain},
pages={79--84},
abstract={This paper presents a method for indexing human activities in videos captured from a wearable camera being worn by patients, for studies of progression of the dementia diseases. Our method aims to produce indexes to facilitate the navigation throughout the individual video recordings, which could help doctors search for early signs of the disease in the activities of daily living. The recorded videos have strong motion and sharp lighting changes, inducing noise for the analysis. The proposed approach is based on a two steps analysis. First, we propose a new approach to segment this type of video, based on apparent motion. Each segment is characterized by two original motion descriptors, as well as color, and audio descriptors. Second, a Hidden-Markov Model formulation is used to merge the multimodal audio and video features, and classify the test segments. Experiments show the good properties of the approach on real data.},
keywords={hidden Markov models;image colour analysis;image segmentation;indexing;medical diagnostic computing;medical disorders;video recording;audio descriptors;color descriptors;daily living indexing;dementia diagnostics;dementia diseases;hidden-Markov model formulation;hierarchical HMM;human activities indexing;multimodal audio features;original motion descriptors;recorded videos;test segments;two steps analysis;video features;video recordings;wearable camera;Accuracy;Cameras;Dynamics;Hidden Markov models;Histograms;Motion segmentation;Videos},
doi={10.1109/CBMI.2011.5972524},
note={Oral Presentation},
ISSN={1949-3983}
}
[4] [pdf] [doi] S. Karaman, J. Benois-Pineau, R. Mégret, V. Dovgalecs, J. -F. Dartigues, and Y. Gaëstel, “Human Daily Activities Indexing in Videos from Wearable Cameras for Monitoring of Patients with Dementia Diseases,” in 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 2010, pp. 4113-4116.
[Bibtex]
@INPROCEEDINGS{karamanICPR2010,
author={Karaman, S. and Benois-Pineau, J. and Mégret, R. and Dovgalecs, V. and Dartigues, J.-F. and Gaëstel, Y.},
booktitle={20th International Conference on Pattern Recognition (ICPR)},
title={Human Daily Activities Indexing in Videos from Wearable Cameras for Monitoring of Patients with Dementia Diseases},
year={2010},
month={Aug},
pages={4113--4116},
abstract={Our research focuses on analysing human activities according to a known behaviorist scenario, in case of noisy and high dimensional collected data. The data come from the monitoring of patients with dementia diseases by wearable cameras. We define a structural model of video recordings based on a Hidden Markov Model. New spatio-temporal features, color features and localization features are proposed as observations. First results in recognition of activities are promising.},
keywords={feature extraction;hidden Markov models;image colour analysis;image motion analysis;video cameras;video recording;video signal processing;activity recognition;behaviorist scenario;color features;dementia disease patients;hidden Markov model;human activity indexing;localization features;patient monitoring;spatiotemporal features;video recordings;wearable cameras;Biomedical monitoring;Cameras;Hidden Markov models;Histograms;Image color analysis;Motion segmentation;Videos;Bag of Features;HMM;Localization;Monitoring;Video Indexing},
doi={10.1109/ICPR.2010.999},
note={Oral Presentation},
ISSN={1051-4651},
address={Istanbul, Turkey}
}
[5] [pdf] [doi] S. Karaman, J. Benois-Pineau, V. Dovgalecs, R. Mégret, J. Pinquier, R. André-Obrecht, Y. Gaëstel, and J. Dartigues, “Hierarchical Hidden Markov Model in detecting activities of daily living in wearable videos for studies of dementia,” Multimedia Tools and Applications (MTAP), vol. 69, iss. 3, p. 1–29, 2012.
[Bibtex]
@article{karaman2012hierarchical,
title={Hierarchical Hidden Markov Model in detecting activities of daily living in wearable videos for studies of dementia},
author={Karaman, Svebor and Benois-Pineau, Jenny and Dovgalecs, Vladislavs and M{\'e}gret, R{\'e}mi and Pinquier, Julien and Andr{\'e}-Obrecht, R{\'e}gine and Ga{\"e}stel, Yann and Dartigues, Jean-Fran{\c{c}}ois},
journal={Multimedia Tools and Applications (MTAP)},
pages={1--29},
year={2012},
volume={69},
number={3},
doi={10.1007/s11042-012-1117-x},
publisher={Springer}
}
[6] [pdf] J. Pinquier, S. Karaman, L. Letoupin, P. Guyot, R. Megret, J. Benois-Pineau, Y. Gaestel, and J. -F. Dartigues, “Strategies for multiple feature fusion with Hierarchical HMM: Application to activity recognition from wearable audiovisual sensors,” in 21st International Conference on Pattern Recognition (ICPR), Tsukuba, Japan, 2012, pp. 3192-3195.
[Bibtex]
@INPROCEEDINGS{Pinquier2012,
author={Pinquier, J. and Karaman, S. and Letoupin, L. and Guyot, P. and Megret, R. and Benois-Pineau, J. and Gaestel, Y. and Dartigues, J.-F.},
booktitle={21st International Conference on Pattern Recognition (ICPR)},
title={Strategies for multiple feature fusion with Hierarchical HMM: Application to activity recognition from wearable audiovisual sensors},
year={2012},
month={Nov},
pages={3192--3195},
abstract={In this paper, we further develop the research on recognition of activities, in videos recorded with wearable cameras, with Hierarchical Hidden Markov Model classifiers. The visual scenes being of a strong complexity in terms of motion and visual content, good performances have been obtained using multiple visual and audio cues. The adequate fusion of features from physically different description spaces remains an open issue not only for this particular task, but in multiple problems of pattern recognition. A study of optimal fusion strategies in the HMM framework is proposed. We design and exploit early, intermediate and late fusions with emitting states in the H-HMM. The results obtained on a corpus recorded by healthy volunteers and patients in a longitudinal dementia study allow choosing optimal fusion strategies as a function of target activity.},
keywords={gesture recognition;hidden Markov models;image fusion;video signal processing;H-HMM;activity recognition;description spaces;early fusions;healthy volunteers;hierarchical HMM classifier;hierarchical hidden Markov model classifiers;intermediate fusions;late fusions;longitudinal dementia study;motion content;multiple feature fusion;optimal fusion strategies;pattern recognition;strong complexity;target activity;visual content;visual scenes;wearable audiovisual sensors;wearable cameras;Cameras;Hidden Markov models;Multimedia communication;Pattern recognition;Streaming media;Videos;Visualization},
url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=6460843},
note={Poster},
address = {Tsukuba, Japan},
ISSN={1051-4651}
}
[7] [pdf] [doi] S. Karaman, J. Benois-Pineau, R. Mégret, and A. Bugeau, “Multi-layer Local Graph Words for Object Recognition,” in Advances in Multimedia Modeling, K. Schoeffmann, B. Merialdo, A. Hauptmann, C. Ngo, Y. Andreopoulos, and C. Breiteneder, Eds., Klagenfurt, Austria: Springer Berlin Heidelberg, 2012, vol. 7131, pp. 29-39.
[Bibtex]
@incollection{karamanMMM2012,
isbn={978-3-642-27354-4},
booktitle={Advances in Multimedia Modeling},
volume={7131},
series={Lecture Notes in Computer Science},
editor={Schoeffmann, Klaus and Merialdo, Bernard and Hauptmann, Alexander G. and Ngo, Chong-Wah and Andreopoulos, Yiannis and Breiteneder, Christian},
doi={10.1007/978-3-642-27355-1_6},
title={Multi-layer Local Graph Words for Object Recognition},
url={http://dx.doi.org/10.1007/978-3-642-27355-1_6},
publisher={Springer Berlin Heidelberg},
keywords={Feature representation; Structural features; Bag-of-Visual-Words; Graph Words; Delaunay triangulation; Context Dependent Kernel},
author={Karaman, Svebor and Benois-Pineau, Jenny and Mégret, Rémi and Bugeau, Aurélie},
note={Oral Presentation},
address = {Klagenfurt, Austria},
pages={29--39},
year={2012}
}

About me

I am a French Computer Vision and Machine Learning researcher, currently a Research Manager at Dataminr. Previously, I spent three years as a PostDoc at the MICC (Media Integration and Communication Center) of the University of Florence in Italy, and five years as an Associate Research Scientist in the DVMM Lab at Columbia University.

Research themes

My research themes are image and video analysis, computer vision, and machine learning. I am particularly interested in semantic concept recognition in images and videos.

I did my Ph.D. at the LaBRI – University of Bordeaux, under the supervision of Jenny Benois-Pineau and Rémi Mégret. During my Ph.D. thesis, I worked on human activity recognition with Hidden Markov Models (HMM) in videos recorded from a wearable device within the IMMED project. I also developed an object recognition approach in the Bag-of-Visual-Words framework which integrates spatial information within semi-local features: the Graph-Words. I defended my Ph.D. entitled “Indexing of Activities in Wearable Videos: Application to Epidemiological Studies of Aged Dementia” in 2011.

While at the MICC, I was highly involved in the MNEMOSYNE project. In this project, multiple aspects of computer vision, such as person detection, person tracking, and re-identification, were used to passively profile the interests of museum visitors in order to provide personalized multimedia content delivery. I also worked on more general image and video classification problems.

At the DVMM Lab, I worked mostly on large-scale image indexing and retrieval problems, but I also published work on other topics such as social media understanding, grounding, scene graph generation, visual parsing, and GAN detection.

At Dataminr, I’m working on computer vision and multimodal-related problems.

Keywords

Computer Vision, Machine Learning, Image Analysis, Video Analysis, Video Indexing, Object Recognition, Person Detection, Re-Identification, Passive Profiling, Behavior Analysis, Action Recognition…