Technical progress in recent decades has made photo and video recording devices omnipresent. This change has a significant impact, among others, on police work. It is no longer unusual that a myriad of digital data accumulates after a criminal act, which must be reviewed by criminal investigators to collect evidence or solve the crime. This paper presents the VICTORIA Interactive 4D Scene Reconstruction and Analysis Framework (“ISRA-4D” 1.0), an approach for the visual consolidation of heterogeneous video and image data in a 3D reconstruction of the corresponding environment. First, by reconstructing the environment in which the materials were created, a shared spatial context for all available materials is established. Second, all footage is spatially and temporally registered within this 3D reconstruction. Third, a visualization of the resulting 4D reconstruction (3D scene + time) is provided, which can be analyzed interactively. Additional information on video and image content is also extracted and displayed, and can be analyzed with supporting visualizations. The presented approach facilitates the process of filtering, annotating, analyzing, and getting an overview of large amounts of multimedia material. The framework is evaluated using four case studies that demonstrate its broad applicability. Furthermore, the framework allows users to immerse themselves in the analysis by entering the scenario in virtual reality. This feature is qualitatively evaluated through interviews with criminal investigators, which highlight potential benefits such as improved spatial understanding and the opening of new fields of application.
Sensors 2020, 20(18), 5426; https://doi.org/10.3390/s20185426
The forensic investigation of a terrorist attack poses a significant challenge to the investigative authorities, as often several thousand hours of video footage must be viewed. Large-scale Video Analytic Platforms (VAP) assist law enforcement agencies (LEA) in identifying suspects and securing evidence. Current platforms focus primarily on the integration of different computer vision methods and thus are restricted to a single modality. We present a video analytic platform that integrates visual and audio analytic modules and fuses information from surveillance cameras and video uploads from eyewitnesses. Videos are analyzed according to their acoustic and visual content. Specifically, Audio Event Detection is applied to index the content according to attack-specific acoustic concepts. Audio similarity search is utilized to identify similar video sequences recorded from different perspectives. Visual object detection and tracking are used to index the content according to relevant concepts. Innovative user-interface concepts are introduced to harness the full potential of the heterogeneous results of the analytical modules, allowing investigators to more quickly follow up on leads and eyewitness reports.
Previous research has exposed the discrepancy between the subject of analysis (real world) and the actual data on which the analysis is performed (data world) as a critical weak spot in visual analysis pipelines. In this paper, we demonstrate how Virtual Reality (VR) can help to verify the correspondence of both worlds in the context of Information Visualization (InfoVis) and Visual Analytics (VA). Immersion allows the analyst to dive into the data world and compare it against familiar real-world scenarios. If the data world lacks crucial dimensions, then these are also missing in the created virtual environments, which may draw the analyst’s attention to inconsistencies between the database and the subject of analysis. When situating VR in a generic visualization pipeline, we can confirm that it is fundamentally on a par with other media, as well as identify possible benefits. To overcome the guarded stance toward VR in InfoVis and VA, we present a structured analysis of arguments, exhibiting the circumstances that make VR a viable medium for visualizations. As a further contribution, we discuss how VR can aid in minimizing the gap between the data world and the real world and present a use case that demonstrates two solution approaches. Finally, we report on initial expert feedback attesting the applicability of our approach in a real-world scenario for crime scene investigation.
We present an approach to unsupervised audio representation learning. Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness. By applying Latent Semantic Indexing (LSI) we embed corresponding textual information into a latent vector space from which we derive track relatedness for online triplet selection. This LSI topic modelling facilitates fine-grained selection of similar and dissimilar audio-track pairs to learn the audio representation using a Convolutional Recurrent Neural Network (CRNN). By this we directly project the semantic context of the unstructured text modality onto the learned representation space of the audio modality without deriving structured ground-truth annotations from it. We evaluate our approach on the Europeana Sounds collection and show how to improve search in digital audio libraries by harnessing the multilingual meta-data provided by numerous European digital libraries. We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection. The learned representations perform comparably to the baseline of handcrafted features, and exceed this baseline in similarity retrieval precision at higher cut-offs with only 15% of the baseline’s feature vector length.
Alexander Schindler, Sergiu Gordea, and Peter Knees. 2020. Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text. In The 35th ACM/SIGAPP Symposium on Applied Computing (SAC ’20), March 30-April 3, 2020, Brno, Czech Republic. ACM, New York, NY, USA, Article 4, 8 pages.
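The paper derives track-relatedness from meta-data via LSI and uses it for online triplet selection. As an illustrative sketch only (not the authors' implementation), the following shows the core idea on a toy term-document matrix: project documents into an LSI latent space via truncated SVD, then, for an anchor track, pick the most related document as the positive and the least related as the negative. All names and the toy data are assumptions for illustration.

```python
import numpy as np

def lsi_doc_vectors(term_doc, k):
    """Project documents (columns of a term-document count matrix)
    into a k-dimensional latent space via truncated SVD (LSI)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T  # one row per document

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def select_triplet(doc_vecs, anchor):
    """Pick the most LSI-related track as positive and the least
    related as negative for a given anchor (online triplet selection)."""
    others = [i for i in range(len(doc_vecs)) if i != anchor]
    sims = {i: cosine(doc_vecs[anchor], doc_vecs[i]) for i in others}
    return anchor, max(sims, key=sims.get), min(sims, key=sims.get)

# Toy term-document matrix: docs 0 and 1 share vocabulary, doc 2 does not.
term_doc = np.array([[2, 1, 0],   # "bell"
                     [1, 2, 0],   # "church"
                     [0, 0, 2],   # "dog"
                     [0, 0, 1]])  # "bark"
vecs = lsi_doc_vectors(term_doc, k=2)
anchor, pos, neg = select_triplet(vecs, anchor=0)
```

In the paper this relatedness signal steers which audio pairs feed the triplet loss of the CRNN; the sketch stops at the selection step.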
To this day, processing video footage has been a hassle for investigators in charge of preventing or solving criminal and terrorist cases. The only available technology to speed up their analysis was the… fast-forward button. Soon, video analytics technology brought by the VICTORIA project could substantially ease their jobs.
Article published in the Research*eu magazine special feature on ‘How tech is taking on terrorism’, December 2019.
In the case of a crime or terrorist attack, much video footage is nowadays available from surveillance cameras and from mobile cameras used by witnesses. While immediate results can be crucial for the prevention of further incidents, the investigation of such events is typically very costly due to the human resources and time that are needed to process the mass data for an investigation. In this paper, we present an approach that creates a 4D reconstruction from mass data, which is a spatio-temporal reconstruction computed from all available images and video footage. The resulting 4D reconstruction gives investigators an intuitive overview of all camera locations and their viewing directions. It provides investigators the ability to view the original video or image footage at any specific point in time. Combined with an innovative 4D interface, our resulting 4D reconstruction enables investigators to view a crime scene in a way that is similar to watching a video where one can freely navigate in space and time. Furthermore, our approach augments the scene with automatic detections and their trajectories and enriches the crime scene with annotations serving as clues.
In proceedings of the International Conference on Imaging for Crime Detection and Prevention (ICDP-19), London, UK, 16-18 Dec. 2019.
This paper presents a novel low-processing-time system for detecting criminal activities, based on real-time video analysis applied to Command and Control Citizen Security Centers. This system was applied to the detection and classification of criminal events in a real-time video surveillance subsystem in the Command and Control Citizen Security Center of the Colombian National Police. It was developed using a novel application of Deep Learning, specifically a Faster Region-Based Convolutional Neural Network (Faster R-CNN), for the detection of criminal activities treated as “objects” to be detected in real-time video. In order to maximize the system efficiency and reduce the processing time of each video frame, the pretrained CNN (Convolutional Neural Network) model AlexNet was used and fine-tuning was carried out with a dataset built for this project, formed by objects commonly used in criminal activities such as short firearms and bladed weapons. In addition, the system was trained for street theft detection. The system can generate alarms when detecting street theft, short firearms and bladed weapons, improving situational awareness and facilitating strategic decision making in the Command and Control Citizen Security Center of the Colombian National Police.
The publication was issued after the 2019 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), jointly organized with the 2nd International Workshop on Research & Innovation for Secure Societies (RISS 2019), 10-12 October 2019, Timisoara, Romania.
Video recordings have become a major resource for legal investigations after crimes and terrorist acts. However, currently no mature video investigation tools are available and trusted by LEAs. The project VICTORIA addresses this need and aims to deliver a video analysis platform that will accelerate video analysis tasks by a factor of 15 to 100. In this paper, we describe the concept and work in progress by AIT GmbH within the project, focusing on the development of a state-of-the-art tool for generic object detection and tracking in videos. We develop a detection, classification and tracking tool, based on deep convolutional and recurrent neural networks, trained on a large number of object classes, and optimized for the project context. Tracking is extended to the multi-class multi-target case. The generic object and motion analytics is integrated into a novel framework developed by AIT, denoted as Connected Vision, which provides a modular and service-oriented (scalable) approach, allowing computer vision tasks to be processed in a distributed manner.
Large amounts of data have become an essential requirement in the development of modern computer vision algorithms, e.g. the training of neural networks. Due to data protection laws, overflight permissions for UAVs or expensive equipment, data collection is often a costly and time-consuming task, especially if the ground truth is generated by manually annotating the collected data. By means of synthetic data generation, large amounts of image data and metadata can be extracted directly from a virtual scene, which in turn can be customized to meet the specific needs of the algorithm or the use-case. Furthermore, the use of virtual objects avoids problems that might arise due to data protection issues and does not require the use of expensive sensors. In this work we propose a framework for synthetic test data generation utilizing the Unreal Engine. The Unreal Engine provides a simulation environment that allows one to simulate complex situations in a virtual world, such as data acquisition with UAVs or autonomous driving. However, our process is agnostic to the computer vision task for which the data is generated and, thus, can be used to create generic datasets. We evaluate our framework by generating synthetic test data, with which a CNN for object detection as well as a V-SLAM algorithm are trained and evaluated. The evaluation shows that our generated synthetic data can be used as an alternative to real data.
This article has been published in proceedings of the International Conference on Content-Based Multimedia Indexing (CBMI2019). Dublin, Ireland, 4-6 Sept 2019.
This paper presents a novel approach to music representation learning. Triplet loss based networks have become popular for representation learning in various multimedia retrieval domains. Yet, one of the most crucial parts of this approach is the appropriate selection of triplets, which is indispensable, considering that the number of possible triplets grows cubically. We present an approach to harness multi-tag annotations for triplet selection, by using Latent Semantic Indexing to project the tags onto a high-dimensional space. From this we estimate tag-relatedness to select hard triplets. The approach is evaluated in a multi-task scenario for which we introduce four large multi-tag annotations for the Million Song Dataset for the music properties genres, styles, moods, and themes.
Added to IEEE Xplore: 21 October 2019
By Laurens Naudts, European University Institute (Centre for Media Pluralism and Media Freedom).
Chapter 4 of Legal and Ethical Aspects of Public Security, Cyber Security and Critical Infrastructure Security (pp 63-96).
Presentation of the same name given at the “Three Decades @ the Crossroads of IP, ICT and Law” celebratory conference, 30 Years ICRI/CIR/CiTiP, 4 October 2019, Leuven, Belgium.
Within a democratic society, and governed by the rule of law, law enforcement and intelligence agencies serve the maintenance of public tranquillity, law and order. They seek to prevent, detect and combat crime, and provide assistance and service functions to the public. Tasked with the protection against and prevention of threats to public and national security and to the fundamental interests of society, they are bound to protect and respect fundamental rights, and in particular those enshrined within the European Convention on Human Rights (hereinafter ECHR). Nevertheless, for the performance of their functions, and in order to safeguard their independence, effectiveness and impartiality, they have been granted a wide degree of discretion…
… The validity and evaluation of profiling can be approached from many different angles; this paper aims to contribute to the current discourse by evaluating one specific risk of profiling: the potential discriminatory nature of profiling practices.
The consortium has, for the first time, published an article in the digital magazine Open Access Government, introducing the project to those who are still not familiar with it and giving an overview of the work progress after more than 2 years of work.
By the end of the project in April 2020, the consortium will deliver an ethical and legally compliant Video Analysis Platform (VAP) prototype that will accelerate the video analysis tasks of Law Enforcement Agencies.
The VAP integrates a set of robust, accurate and advanced video analytics modules that have been developed based on the specific needs of legal investigations. These modules, accompanied by 4D crime scene reconstruction technology and advanced metadata querying mechanisms and integrated into the VICTORIA Video Analysis Platform, enable investigators to easily navigate through vast amounts of video material.
After several workshops between project technical partners and LEAs, the VAP is now ready to be deployed at the project LEAs’ premises (Spain, United Kingdom, Romania and France). The field trials will provide substantial feedback on the performance of each technical component and the usability of the latest version of the VAP in real case scenarios.
The full article is available at https://www.openaccessgovernment.org/video-analysis-criminal-and-terrorist-activities/73475/
Recovering a 3D scene from unordered photo collections is a long-studied topic in computer vision. Existing reconstruction pipelines, both incremental and global, have already achieved remarkable results. This paper addresses the problem of fusing multiple existing partial 3D reconstructions, in particular finding the overlapping regions and transformations (7 DOF) between partial reconstructions. Unlike the previous methods which have to take the entire epipolar geometry (EG) graph as the input and reconstruct the scene, we propose an approach that reuses the existing reconstructed 3D models as input and merges them by utilizing all the internal information to avoid repeated work. This approach is divided into two steps. The first is to find overlapping areas between partial reconstructions based on Fisher similarity lists. Then, based on those overlaps, pairwise rotation between partial reconstructions is estimated by solving an l1 approximation optimization problem. After global rotation estimation, translation and scale between each pair of partial reconstructions are computed simultaneously in a global manner. In order to find the optimal transformation path, the maximal spanning tree (MST) is constructed in the second stage. Our approach is evaluated on diverse challenging public datasets and compared to state-of-the-art Structure from Motion (SfM) methods. Experiments show that our merging approach achieves high computational efficiency while preserving similar reconstruction accuracy and robustness. In addition, our method has superior extensibility which can add partial 3D reconstructions gradually to extend an existing 3D scene.
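The paper estimates rotation via l1 optimization and solves translation and scale globally over an MST of partial reconstructions. As a much simpler illustration of the underlying 7-DOF problem (not the paper's method), the sketch below recovers scale, rotation and translation between two overlapping point sets with known correspondences, using the closed-form least-squares (Umeyama-style) solution; the point data and names are assumptions.

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares 7-DOF alignment (scale s, rotation R, translation t)
    such that dst_i ~= s * R @ src_i + t, via Umeyama's closed-form method."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)               # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    D = np.diag([1.0, 1.0, d])                     # guards against reflections
    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t

# Recover a known transform between two "overlapping partial reconstructions".
rng = np.random.default_rng(0)
src = rng.normal(size=(20, 3))                     # shared 3D points, frame A
theta = 0.3                                        # rotation about the z-axis
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
s_true, t_true = 2.0, np.array([1.0, -2.0, 0.5])
dst = s_true * src @ R_true.T + t_true             # same points, frame B
s_est, R_est, t_est = similarity_transform(src, dst)
```

In the paper's setting, the correspondences themselves come from the overlap detection stage; the sketch assumes they are given.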
Kraus, M.; Weiler, N.; Breitkreutz, T.; Keim, D. and Stein, M. (2019). Breaking the Curse of Visual Data Exploration: Improving Analyses by Building Bridges between Data World and Real World. In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, IVAPP 2019 – 25-27 Feb, 2019 - Prague, Czech Republic - Volume 3: IVAPP, ISBN 978-989-758-354-4, pages 19-27. DOI: 10.5220/0007257400190027
Visual data exploration is a useful means to extract relevant information from large sets of data. The visual analytics pipeline processes data recorded from the real world to extract knowledge from the gathered data. Subsequently, the resulting knowledge is associated with the real world and applied to it. However, the data considered for the analysis is usually only a small fraction of the actual real-world data and, above all, lacks context information. It can easily happen that crucial context information is disregarded, leading to false conclusions about the real world. Therefore, conclusions and reasoning based on the analysis of this data pertain to the world represented by the data, and may not be valid for the real world. The purpose of this paper is to raise awareness of this discrepancy between the data world and the real world, which has a high impact on the validity of analysis results in the real world. We propose two strategies which help to identify and remove specific differences between the data world and the real world. The usefulness and applicability of our strategies are demonstrated via several use cases.
Big data analytics allow law enforcement agencies to build profiles of individuals, and groups of individuals, in order to guide the decisions they need to make for the prediction, prevention, detection and combat of crime. This chapter analyses the role of non-discrimination law, and more specifically the legal discourse on non-discrimination grounds, as a lens to evaluate the use of (group) profiles for security purposes. The chapter will first explore the Law Enforcement Directive (Directive 2016/680) to ascertain the legal limits of profiling from a data protection perspective. Mainly aimed towards safeguarding the fundamental rights to privacy and data protection, data protection laws nonetheless remain sensitive towards the potential discriminatory nature of personal data processing. In the chapter’s second section, it will be ascertained, through an analysis of the European Court of Human Rights’ (ECtHR) case law, to what extent the use of profiles, as they are deployed to differentiate amongst individuals or groups of individuals, can be considered problematic from a non-discrimination perspective. The chapter therefore aims to identify the key criteria developed by the ECtHR for new differentiation grounds to engage the European Convention on Human Rights’ non-discrimination clause. The legal analysis will be juxtaposed to the risks big data analytics pose to the fundamental rights of equality and non-discrimination. It will be argued that in order to adequately respond to the new threats of technology, both a return to a procedural and instrumental conception of equality and non-discrimination, and a thorough insight into the data and tools used by criminal authorities, might be needed.
Available at SSRN: https://ssrn.com/abstract=3508020
This publication was issued for the 25th International Conference on MultiMedia Modeling, held in Thessaloniki, Greece on 8-11 January, 2019 and was written by Alexander Schindler et al. from AIT, Austria.
It has been published in the refereed proceedings of the conference under DOI 10.1007/978-3-030-05716-9 and can also be found at https://arxiv.org/abs/1811.11623.
The forensic investigation of a terrorist attack poses a huge challenge to the investigative authorities, as several thousand hours of video footage need to be reviewed. To assist law enforcement agencies (LEA) in identifying suspects and securing evidence, the platform presented in this paper has been developed. This platform integrates analytical modules on a scalable architecture. Videos are analyzed according to their acoustic and visual content. Specifically, Audio Event Detection is applied to index the content according to attack-specific acoustic concepts. Audio similarity search is applied to identify similar video sequences recorded from different perspectives. Visual object detection is applied to index the content according to relevant concepts. This index of visual and acoustic concepts makes it possible to quickly start an investigation, follow leads and investigate hints from eyewitnesses.
This publication was issued for the 25th International Conference on MultiMedia Modeling, held in Thessaloniki, Greece on 8-11 January, 2019 and was written by P. Guyot et al. from IRIT, Toulouse.
It has been published in the refereed proceedings of the conference under the DOI: 10.1007/978-3-030-05710-7_33.
Audio and video parts of an audiovisual document interact to produce an audiovisual, or multi-modal, perception. Yet, automatic analyses of these documents are usually based on separate audio and video annotations. Regarding the audiovisual content, these annotations can be incomplete, or not relevant. Besides, the expanding possibilities of creating audiovisual documents lead one to consider different kinds of content, including videos filmed in uncontrolled conditions (i.e., field recordings), or scenes filmed from different points of view (multi-view).
In this paper we propose an original procedure to produce manual annotations in different contexts, including multi-modal and multi-view documents. This procedure, based on both audio and video annotations, ensures consistency when considering audio or video alone, and additionally provides audiovisual information at a richer level.
Finally, different applications are made possible when considering such annotated data. In particular, we present an example application in a network of recordings in which our annotations allow multi-source retrieval using mono or multi-modal queries.
Naudts, Laurens, How Machine Learning Generates Unfair Inequalities and How Data Protection Instruments May Help in Mitigating Them (2018). In R. Leenes, R. van Brakel, S. Gutwirth & P. De Hert (eds.), Data Protection and Privacy: The Internet of Bodies (Computers, Privacy and Data Protection), 2019. Available at SSRN: https://ssrn.com/abstract=3468121
"How Machine Learning Generates Unfair Inequalities and How Data Protection Instruments May Help in Mitigating Them" has been written by Laurens Naudts, KU Leuven, CiTiP. This is a chapter of the Data Protection and Privacy book issued after the eleventh annual International Conference on Computers, Privacy, and Data Protection, CPDP 2018, held in Brussels in January 2018.
This volume offers conceptual analyses, highlights issues, proposes solutions, and discusses practices regarding privacy and data protection.
Publication provided for the European Intelligence and Security Informatics Conference (EISIC) 2018, held on 23-25 October 2018 in Sweden and written by D. Schreiber, M. Boyer, E. Broneder, A. Opitz, S. Veigl from AIT.
The EISIC conference proceedings have been published under DOI 10.1109/EISIC.2018.00024.
Video recordings have become a major resource for legal investigations after crimes and terrorist acts. However, currently no mature video investigation tools are available and trusted by LEAs. The project VICTORIA addresses this need and aims to deliver a video analysis platform that will accelerate video analysis tasks by a factor of 15 to 100 (depending on the use case). In this paper we describe the concept and work in progress by AIT GmbH within the project, namely the development of a state-of-the-art tool for generic object detection and tracking in videos. We develop a detection, classification and tracking tool, based on deep convolutional and recurrent neural networks, trained on a large number of object classes, and optimized for the project context. Tracking is extended to the multi-class multi-target case. This generic object and motion analytics is integrated with a novel framework developed by AIT, denoted as Connected Vision, which provides a modular and service-oriented (scalable) approach, allowing computer vision tasks to be processed in a distributed manner. We report encouraging intermediate results in terms of accuracy and performance.
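The paper's tracker is built on deep convolutional and recurrent networks; as a deliberately minimal illustration of what "multi-class multi-target" frame-to-frame association means (not the AIT implementation), the following sketch greedily matches each detection to the best unmatched track of the same class by IoU and otherwise opens a new track. The data structures and thresholds are assumptions for illustration.

```python
import itertools

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

_ids = itertools.count(1)  # global track-id generator

def update_tracks(tracks, detections, thresh=0.3):
    """Greedy frame-to-frame association: match each detection (box, cls)
    to the best unmatched same-class track by IoU, else start a new track."""
    unmatched = set(tracks)
    updated = {}
    for box, cls in detections:
        cands = [(iou(tracks[t][0], box), t) for t in unmatched if tracks[t][1] == cls]
        score, tid = max(cands, default=(0.0, None))
        if tid is None or score < thresh:
            tid = next(_ids)          # no plausible match: new track
        else:
            unmatched.discard(tid)    # claim the matched track
        updated[tid] = (box, cls)
    return updated

frame1 = [((0, 0, 10, 10), "person"), ((20, 20, 40, 40), "car")]
frame2 = [((1, 1, 11, 11), "person"), ((21, 21, 41, 41), "car")]
tracks = update_tracks({}, frame1)      # two new tracks
tracks = update_tracks(tracks, frame2)  # both re-identified by IoU
```

A learned tracker replaces the IoU score with appearance and motion cues, but the association structure stays the same.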
Publication issued for the "BDVA 2018" Big Data Visual and Immersive Analytics Symposium, 17-19/10/2018, Germany, written by Niklas Weiler et al. from the University of Konstanz, Germany.
Submitted for inclusion in IEEE Xplore.
During criminal investigations, every second saved can be valuable to catch a suspect or to prevent further damage. However, sometimes the amount of evidence that needs to be investigated is so large that it cannot be processed fast enough. Especially after incidents in public, law enforcement agencies receive a lot of video and image material from persons and surveillance cameras. Currently, all these videos are viewed manually and annotated by criminal investigators. The goal of our tool is to make this process faster by allowing the investigators to watch a combination of several videos at the same time, giving them a common spatial and temporal reference.
Having Yes, Using No? About the new legal regime for biometric data, E.J. Kindt, KU Leuven – CiTiP, Leuven, Belgium. Computer Law and Security Review, Volume 34, Issue 3, June 2018, Pages 523-538. DOI: 10.1016/j.clsr.2017.11.004
The rise of biometric data use in personal consumer objects and governmental (surveillance) applications is irreversible. This article analyses the latest attempt by the General Data Protection Regulation (EU) 2016/679 and the Directive (EU) 2016/680 to regulate biometric data use in the European Union…
Publication issued for the "Data for Policy" event, 06-07/09/2017, London. http://dx.doi.org/10.2139/ssrn.3043707, 18 August 2017, Laurens Naudts, KU Leuven, imec-CITIP (Centre for IT & IP Law)
Differentiation is often intrinsic to the functioning of algorithms. Within large data sets, ‘differentiating grounds’, such as correlations or patterns, are found, which in turn can be applied by decision-makers to distinguish between individuals or groups of individuals. As the use of algorithms becomes more widespread, the chance that algorithmic forms of differentiation result in unfair outcomes increases. Intuitively, certain (random) algorithmic classification acts, and the decisions that are based on them, seem to run counter to the fundamental notion of equality…