M. Testolina, V. Hosu, M. Jenadeleh, D. Lazzarotto, D. Saupe, and T. Ebrahimi, “JPEG AIC-3 Dataset: Towards Defining the High Quality to Nearly Visually Lossless Quality Range,” in
15th International Conference on Quality of Multimedia Experience (QoMEX), 2023, pp. 55–60. doi:
10.1109/QoMEX58391.2023.10178554.
Abstract
Visual data play a crucial role in modern society, and the rate at which images and videos are acquired, stored, and exchanged every day is rapidly increasing. Image compression is the key technology that enables storing and sharing of visual content in an efficient and cost-effective manner, by removing redundant and irrelevant information. On the other hand, image compression often introduces undesirable artifacts that reduce the perceived quality of the media. Subjective image quality assessment experiments allow for the collection of information on the visual quality of the media as perceived by human observers, thereby quantifying the impact of such distortions. Nevertheless, the most commonly used subjective image quality assessment methodologies were designed to evaluate compressed images with visible distortions, and therefore are not accurate and reliable when evaluating images of higher visual quality. In this paper, we present a dataset of compressed images with quality levels that range from high to nearly visually lossless, with associated quality scores in JND units. The images were subjectively evaluated by expert human observers, and the results were used to define the range from high to nearly visually lossless quality. The dataset is made publicly available to researchers, providing a valuable resource for the development of novel subjective quality assessment methodologies or compression methods that are more effective in this quality range.
M. Jenadeleh, J. Zagermann, H. Reiterer, U.-D. Reips, R. Hamzaoui, and D. Saupe, “Relaxed forced choice improves performance of visual quality assessment methods,” in
2023 15th International Conference on Quality of Multimedia Experience (QoMEX), 2023, pp. 37–42. doi:
10.1109/QoMEX58391.2023.10178467.
Abstract
In image quality assessment, a collective visual quality score for an image or video is obtained from the individual ratings of many subjects. One commonly used format for these experiments is the two-alternative forced choice method. Two stimuli with the same content but differing visual quality are presented sequentially or side-by-side. Subjects are asked to select the one of better quality, and when uncertain, they are required to guess. The relaxed alternative forced choice format aims to reduce the cognitive load and the noise in the responses due to the guessing by providing a third response option, namely, “not sure”. This work presents a large and comprehensive crowdsourcing experiment to compare these two response formats: the one with the “not sure” option and the one without it. To provide unambiguous ground truth for quality evaluation, subjects were shown pairs of images with differing numbers of dots and asked each time to choose the one with more dots. Our crowdsourcing study involved 254 participants and was conducted using a within-subject design. Each participant was asked to respond to 40 pair comparisons with and without the “not sure” response option and completed a questionnaire to evaluate their cognitive load for each testing condition. The experimental results show that the inclusion of the “not sure” response option in the forced choice method reduced mental load and led to models with better data fit and correspondence to ground truth. We also tested for the equivalence of the models and found that they were different. The dataset is available at http://database.mmsp-kn.de/cogvqa-database.html.
S. Su
et al., “Going the Extra Mile in Face Image Quality Assessment: A Novel Database and Model,”
IEEE Transactions on Multimedia, vol. 26, pp. 2671–2685, 2023, doi:
10.1109/TMM.2023.3301276.
Abstract
Computer vision models for image quality assessment (IQA) predict the subjective effect of generic image degradation, such as artefacts, blurs, bad exposure, or colors. The scarcity of face images in existing IQA datasets (below 10%) is limiting the precision of IQA required for accurately filtering low-quality face images or guiding CV models for face image processing, such as super-resolution, image enhancement, and generation. In this paper, we first introduce the largest annotated IQA database to date that contains 20,000 human faces (an order of magnitude larger than all existing rated datasets of faces), of diverse individuals, in highly varied circumstances, quality levels, and distortion types. Based on the database, we further propose a novel deep learning model, which re-purposes generative prior features for predicting subjective face quality. By exploiting rich statistics encoded in well-trained generative models, we obtain generative prior information of the images and use it as latent references to facilitate the blind IQA task. Experimental results demonstrate the superior prediction accuracy of the proposed model on the face IQA task.
O. Wiedemann, V. Hosu, S. Su, and D. Saupe, “KonX: cross-resolution image quality assessment,”
Quality and User Experience, vol. 8, no. 8, Art. no. 8, Aug. 2023, doi:
10.1007/s41233-023-00061-8.
Abstract
Scale-invariance is an open problem in many computer vision subfields. For example, object labels should remain constant across scales, yet model predictions diverge in many cases. This problem gets harder for tasks where the ground-truth labels change with the presentation scale. In image quality assessment (IQA), down-sampling attenuates impairments, e.g., blurs or compression artifacts, which can positively affect the impression evoked in subjective studies. To accurately predict perceptual image quality, cross-resolution IQA methods must therefore account for resolution-dependent discrepancies induced by model inadequacies as well as for the perceptual label shifts in the ground truth. We present the first study of its kind that disentangles and examines the two issues separately via KonX, a novel, carefully crafted cross-resolution IQA database. This paper contributes the following: 1. Through KonX, we provide empirical evidence of label shifts caused by changes in the presentation resolution. 2. We show that objective IQA methods have a scale bias, which reduces their predictive performance. 3. We propose a multi-scale and multi-column deep neural network architecture that improves performance over previous state-of-the-art IQA models for this task. We thus both raise and address a novel research problem in image quality assessment.
X. Zhao
et al., “CUDAS: Distortion-Aware Saliency Benchmark,”
IEEE Access, vol. 11, pp. 58025–58036, 2023, doi:
10.1109/ACCESS.2023.3283344.
Abstract
Visual saliency prediction remains an academic challenge due to the diversity and complexity of natural scenes as well as the scarcity of eye movement data on where people look in images. In many practical applications, digital images are inevitably subject to distortions, such as those caused by acquisition, editing, compression or transmission. A great deal of attention has been paid to predicting the saliency of distortion-free pristine images, but little attention has been given to understanding the impact of visual distortions on saliency prediction. In this paper, we first present the CUDAS database - a new distortion-aware saliency benchmark, where eye-tracking data was collected for 60 pristine images and their corresponding 540 distorted formats. We then conduct a statistical evaluation to reveal the behaviour of state-of-the-art saliency prediction models on distorted images and provide insights on building an effective model for distortion-aware saliency prediction. The new database is made publicly available to the research community.
G. Chen, H. Lin, O. Wiedemann, and D. Saupe, “Localization of Just Noticeable Difference for Image Compression,” in
2023 15th International Conference on Quality of Multimedia Experience (QoMEX), 2023, pp. 61–66. doi:
10.1109/QoMEX58391.2023.10178653.
Abstract
The just noticeable difference (JND) is the minimal difference between stimuli that can be detected by a person. The picture-wise just noticeable difference (PJND) for a given reference image and a compression algorithm represents the minimal level of compression that causes noticeable differences in the reconstruction. These differences can only be observed in some specific regions within the image, dubbed JND-critical regions. Identifying these regions can improve the development of image compression algorithms. Because visual perception varies among individuals, determining the PJND values and JND-critical regions for a target population of consumers requires subjective assessment experiments involving a sufficiently large number of observers. In this paper, we propose a novel framework for conducting such experiments using crowdsourcing. By applying this framework, we created a novel PJND dataset, KonJND++, consisting of 300 source images, compressed versions thereof under JPEG or BPG compression, and an average of 43 ratings of PJND and 129 self-reported locations of JND-critical regions for each source image. Our experiments demonstrate the effectiveness and reliability of our proposed framework, which can easily be adapted for collecting a large-scale dataset. The source code and dataset are available at https://github.com/angchen-dev/LocJND.
H. Lin
et al., “Large-Scale Crowdsourced Subjective Assessment of Picturewise Just Noticeable Difference,”
IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 9, Art. no. 9, 2022, doi:
10.1109/TCSVT.2022.3163860.
Abstract
The picturewise just noticeable difference (PJND) for a given image, compression scheme, and subject is the smallest distortion level that the subject can perceive when the image is compressed with this compression scheme. The PJND can be used to determine the compression level at which a given proportion of the population does not notice any distortion in the compressed image. To obtain accurate and diverse results, the PJND must be determined for a large number of subjects and images. This is particularly important when experimental PJND data are used to train deep learning models that can predict a probability distribution model of the PJND for a new image. To date, such subjective studies have been carried out in laboratory environments. However, the number of participants and images in all existing PJND studies is very small because of the challenges involved in setting up laboratory experiments. To address this limitation, we develop a framework to conduct PJND assessments via crowdsourcing. We use a new technique based on slider adjustment and a flicker test to determine the PJND. A pilot study demonstrated that our technique could decrease the study duration by 50% and double the perceptual sensitivity compared to the standard binary search approach that successively compares a test image side by side with its reference image. Our framework includes a robust and systematic scheme to ensure the reliability of the crowdsourced results. Using 1,008 source images and distorted versions obtained with JPEG and BPG compression, we apply our crowdsourcing framework to build the largest PJND dataset, KonJND-1k (Konstanz just noticeable difference 1k dataset). A total of 503 workers participated in the study, yielding 61,030 PJND samples that resulted in an average of 42 samples per source image. The KonJND-1k dataset is available at http://database.mmsp-kn.de/konjnd-1k-database.html
M. Zameshina
et al., “Fairness in generative modeling: do it unsupervised!,” in
Proceedings of the Genetic and Evolutionary Computation Conference Companion. ACM, Jul. 2022, pp. 320–323. doi:
10.1145/3520304.3528992.
Abstract
We design general-purpose algorithms for addressing fairness issues and mode collapse in generative modeling. More precisely, to design fair algorithms for as many sensitive variables as possible, including variables we might not be aware of, we assume no prior knowledge of sensitive variables: our algorithms use unsupervised fairness only, meaning no information related to the sensitive variables is used for our fairness-improving methods. All images of faces (even generated ones) have been removed to mitigate legal risks.
H. Lin, H. Men, Y. Yan, J. Ren, and D. Saupe, “Crowdsourced Quality Assessment of Enhanced Underwater Images - a Pilot Study,” in
Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX). IEEE, Sep. 2022, pp. 1–4. doi:
10.1109/QoMEX55416.2022.9900904.
Abstract
Underwater image enhancement (UIE) is essential for a high-quality underwater optical imaging system. While a number of UIE algorithms have been proposed in recent years, there has been little study of image quality assessment (IQA) for enhanced underwater images. In this paper, we conduct the first crowdsourced subjective IQA study on enhanced underwater images. We chose ten state-of-the-art UIE algorithms and applied them to yield enhanced images from an underwater image benchmark. Their latent quality scales were reconstructed from pair comparisons. We demonstrate that the existing IQA metrics are not suitable for assessing the perceived quality of enhanced underwater images. In addition, the overall performance of the ten UIE algorithms on the benchmark is ranked by the newly proposed simulated pair comparison of the methods.
J. Lou, H. Lin, D. Marshall, D. Saupe, and H. Liu, “TranSalNet: Towards perceptually relevant visual saliency prediction,”
Neurocomputing, vol. 494, pp. 455–467, 2022, doi:
10.1016/j.neucom.2022.04.080.
Abstract
Convolutional neural networks (CNNs) have significantly advanced computational modelling for saliency prediction. However, accurately simulating the mechanisms of visual attention in the human cortex remains an academic challenge. It is critical to integrate properties of human vision into the design of CNN architectures, leading to perceptually more relevant saliency prediction. Due to the inherent inductive biases of CNN architectures, there is a lack of sufficient long-range contextual encoding capacity. This hinders CNN-based saliency models from capturing properties that emulate viewing behaviour of humans. Transformers have shown great potential in encoding long-range information by leveraging the self-attention mechanism. In this paper, we propose a novel saliency model that integrates transformer components into CNNs to capture long-range contextual visual information. Experimental results show that the transformers provide added value to saliency prediction, enhancing its perceptual relevance. Our proposed saliency model using transformers has achieved superior results on public benchmarks and competitions for saliency prediction models. The source code of our proposed saliency model TranSalNet is available at: https://github.com/LJOVO/TranSalNet.
F. Götz-Hahn, V. Hosu, and D. Saupe, “Critical Analysis on the Reproducibility of Visual Quality Assessment Using Deep Features,”
PLoS ONE, vol. 17, no. 8, Art. no. 8, 2022, doi:
10.1371/journal.pone.0269715.
Abstract
Data used to train supervised machine learning models are commonly split into independent training, validation, and test sets. This paper illustrates that complex data leakage cases have occurred in the no-reference image and video quality assessment literature. Recently, papers in several journals reported performance results well above the best in the field. However, our analysis shows that information from the test set was inappropriately used in the training process in different ways and that the claimed performance results cannot be achieved. When correcting for the data leakage, the performances of the approaches drop even below the state-of-the-art by a large margin. Additionally, we investigate end-to-end variations to the discussed approaches, which do not improve upon the original.
F. Götz-Hahn, V. Hosu, H. Lin, and D. Saupe, “KonVid-150k: A Dataset for No-Reference Video Quality Assessment of Videos in-the-Wild,”
IEEE Access, vol. 9, pp. 72139–72160, 2021, doi:
10.1109/ACCESS.2021.3077642.
Abstract
Video quality assessment (VQA) methods focus on particular degradation types, usually artificially induced on a small set of reference videos. Hence, most traditional VQA methods under-perform in-the-wild. Deep learning approaches have had limited success due to the small size and diversity of existing VQA datasets, either artificial or authentically distorted. We introduce a new in-the-wild VQA dataset that is substantially larger and more diverse: KonVid-150k. It consists of a coarsely annotated set of 153,841 videos having five quality ratings each, and 1,596 videos with a minimum of 89 ratings each. Additionally, we propose new efficient VQA approaches (MLSP-VQA) relying on multi-level spatially pooled deep features (MLSP). They are exceptionally well suited for training at scale, compared to deep transfer learning approaches. Our best method, MLSP-VQA-FF, improves the Spearman rank-order correlation coefficient (SRCC) performance metric on the commonly used KoNViD-1k in-the-wild benchmark dataset to 0.82. It surpasses the best existing deep-learning model (0.80 SRCC) and hand-crafted feature-based method (0.78 SRCC). We further investigate how alternative approaches perform under different levels of label noise and dataset size, showing that MLSP-VQA-FF is the overall best method for videos in-the-wild. Finally, we show that the MLSP-VQA models trained on KonVid-150k set a new state of the art for cross-test performance on KoNViD-1k and LIVE-Qualcomm with 0.83 and 0.64 SRCC, respectively. For KoNViD-1k this inter-dataset testing outperforms intra-dataset experiments, showing excellent generalization.
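Several entries in this list report performance as the Spearman rank-order correlation coefficient (SRCC/SROCC). As a reminder of the metric, and not as a detail of any cited paper, the standard closed form for n items without rank ties, with d_i the difference between the predicted and ground-truth ranks of item i, is
\[
  \mathrm{SRCC} = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2}-1)}.
\]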
B. Roziere
et al., “Tarsier: Evolving Noise Injection in Super-Resolution GANs,” in
2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 7028–7035. doi:
10.1109/ICPR48806.2021.9413318.
Abstract
Super-resolution aims at increasing the resolution and level of detail within an image. The current state of the art in general single-image super-resolution is held by NESRGAN+, which injects a Gaussian noise after each residual layer at training time. In this paper, we harness evolutionary methods to improve NESRGAN+ by optimizing the noise injection at inference time. More precisely, we use Diagonal CMA to optimize the injected noise according to a novel criterion combining quality assessment and realism. Our results are validated by the PIRM perceptual score and a human study. Our method outperforms NESRGAN+ on several standard super-resolution datasets. More generally, our approach can be used to optimize any method based on noise injection.
H. Men, H. Lin, M. Jenadeleh, and D. Saupe, “Subjective Image Quality Assessment with Boosted Triplet Comparisons,”
IEEE Access, vol. 9, pp. 138939–138975, 2021, doi:
10.1109/ACCESS.2021.3118295.
Abstract
In subjective full-reference image quality assessment, a reference image is distorted at increasing distortion levels. The differences between perceptual image qualities of the reference image and its distorted versions are evaluated, often using degradation category ratings (DCR). However, the DCR has been criticized since differences between rating categories on this ordinal scale might not be perceptually equidistant, and observers may have different understandings of the categories. Pair comparisons (PC) of distorted images, followed by Thurstonian reconstruction of scale values, overcome these problems. In addition, PC is more sensitive than DCR, and it can provide scale values in fractional, just noticeable difference (JND) units that express a precise perceptual interpretation. Still, the comparison of images of nearly the same quality can be difficult. We introduce boosting techniques embedded in more general triplet comparisons (TC) that increase the sensitivity even more. Boosting amplifies the artefacts of distorted images, enlarges their visual representation by zooming, increases the visibility of the distortions by a flickering effect, or combines some of the above. Experimental results show the effectiveness of boosted TC for seven types of distortion (color diffusion, jitter, high sharpen, JPEG 2000 compression, lens blur, motion blur, multiplicative noise). For our study, we crowdsourced over 1.7 million responses to triplet questions. We give a detailed analysis of the data in terms of scale reconstructions, accuracy, detection rates, and sensitivity gain. Generally, boosting increases the discriminatory power and makes it possible to reduce the number of subjective ratings without sacrificing the accuracy of the resulting relative image quality values. Our technique paves the way to fine-grained image quality datasets, allowing for more distortion levels, yet with high-quality subjective annotations. We also provide the details for Thurstonian scale reconstruction from TC and our annotated dataset, KonFiG-IQA, containing 10 source images, processed using 7 distortion types at 12 or even 30 levels, uniformly spaced over a span of 3 JND units.
H. Lin, G. Chen, and F. W. Siebert, “Positional Encoding: Improving Class-Imbalanced Motorcycle Helmet use Classification,” in
2021 IEEE International Conference on Image Processing (ICIP), 2021, pp. 1194–1198. doi:
10.1109/ICIP42928.2021.9506178.
Abstract
Recent advances in the automated detection of motorcycle riders’ helmet use have enabled road safety actors to process large-scale video data efficiently and with high accuracy. To distinguish drivers from passengers in helmet use, the most straightforward way is to train a multi-class classifier, where each class corresponds to a specific combination of rider position and individual riders’ helmet use. However, such a strategy results in a long-tailed data distribution, with critically low class samples for a number of uncommon classes. In this paper, we propose a novel approach to address this limitation. Let n be the maximum number of riders a motorcycle can hold; we then encode the helmet use on a motorcycle as a vector with 2n bits, where the first n bits denote whether the encoded positions have riders, and the latter n bits denote whether the rider in the corresponding position wears a helmet. With this novel helmet use positional encoding, we propose a deep learning model that builds on an existing image classification architecture. The model simultaneously trains 2n binary classifiers, which allows more balanced samples for training. This method is simple to implement and requires no hyperparameter tuning. Experimental results demonstrate that our approach outperforms the state-of-the-art approaches by 1.9% accuracy.
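As an illustration only, here is a minimal Python sketch of the 2n-bit label vector described in this abstract, for a motorcycle holding at most n riders. The function name, the tuple-based input, and the choice n = 5 are assumptions made for this example, not details taken from the paper's code.

def encode_helmet_use(riders, n=5):
    """Build the 2n-bit positional encoding: bits [0, n) mark occupied rider
    positions, bits [n, 2n) mark helmet use at those positions.
    `riders` is a list of (position, wears_helmet) pairs with position in [0, n)."""
    bits = [0] * (2 * n)
    for position, wears_helmet in riders:
        bits[position] = 1              # seat at this position is occupied
        if wears_helmet:
            bits[n + position] = 1      # the rider at this position wears a helmet
    return bits

# Example: driver (position 0) with helmet, one passenger (position 1) without.
print(encode_helmet_use([(0, True), (1, False)]))
# -> [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]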
S. Su, V. Hosu, H. Lin, Y. Zhang, and D. Saupe, “KonIQ++: Boosting No-Reference Image Quality Assessment in the Wild by Jointly Predicting Image Quality and Defects,” in
32nd British Machine Vision Conference, 2021, pp. 1–12. [Online]. Available:
https://www.bmvc2021-virtualconference.com/assets/papers/0868.pdf
Abstract
Although image quality assessment (IQA) in-the-wild has been researched in computer vision, it is still challenging to precisely estimate perceptual image quality in the presence of real-world complex and composite distortions. In order to improve machine learning solutions for IQA, we consider side information denoting the presence of distortions besides the basic quality ratings in IQA datasets. Specifically, we extend one of the largest in-the-wild IQA databases, KonIQ-10k, to KonIQ++, by collecting distortion annotations for each image, aiming to improve quality prediction together with distortion identification. We further explore the interactions between image quality and distortion by proposing a novel IQA model, which jointly predicts image quality and distortion by recurrently refining task-specific features in a multi-stage fusion framework. Our dataset KonIQ++, along with the model, boosts IQA performance and generalization ability, demonstrating its potential for solving the challenging authentic IQA task. The proposed model can also accurately predict distinct image defects, suggesting its application in image processing tasks such as image colorization and deblurring.
B. Roziere
et al., “EvolGAN: Evolutionary Generative Adversarial Networks,” in
Computer Vision – ACCV 2020. Cham: Springer International Publishing, Nov. 2021, pp. 679–694. doi:
10.1007/978-3-030-69538-5_41.
Abstract
We propose to use a quality estimator and evolutionary methods to search the latent space of generative adversarial networks trained on small, difficult datasets, or both. The new method leads to the generation of significantly higher quality images while preserving the original generator’s diversity. Human raters preferred an image from the new version with frequency 83.7% for Cats, 74% for FashionGen, 70.4% for Horses, and 69.2% for Artworks - minor improvements for the already excellent GANs for faces. This approach applies to any quality scorer and GAN generator.
O. Wiedemann, V. Hosu, H. Lin, and D. Saupe, “Foveated Video Coding for Real-Time Streaming Applications,” in
2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), 2020, pp. 1–6. doi:
10.1109/QoMEX48832.2020.9123080.
Abstract
Video streaming under real-time constraints is an increasingly widespread application. Many recent video encoders are unsuitable for this scenario due to theoretical limitations or run time requirements. In this paper, we present a framework for the perceptual evaluation of foveated video coding schemes. Foveation describes the process of adapting a visual stimulus according to the acuity of the human eye. In contrast to traditional region-of-interest coding, where certain areas are statically encoded at a higher quality, we utilize feedback from an eye-tracker to spatially steer the bit allocation scheme in real-time. We evaluate the performance of an H.264 based foveated coding scheme in a lab environment by comparing the bitrates at the point of just noticeable distortion (JND). Furthermore, we identify perceptually optimal codec parameterizations. In our trials, we achieve an average bitrate savings of 63.24% at the JND in comparison to the unfoveated baseline.
H. Lin, M. Jenadeleh, G. Chen, U. Reips, R. Hamzaoui, and D. Saupe, “Subjective Assessment of Global Picture-Wise Just Noticeable Difference,” in
Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), 2020, pp. 1–6. doi:
10.1109/ICMEW46912.2020.9106058.
Abstract
The picture-wise just noticeable difference (PJND) for a given image and a compression scheme is a statistical quantity giving the smallest distortion that a subject can perceive when the image is compressed with the compression scheme. The PJND is determined with subjective assessment tests for a sample of subjects. We introduce and apply two methods of adjustment where the subject interactively selects the distortion level at the PJND using either a slider or keystrokes. We compare the results and times required to those of the adaptive binary search type approach, in which image pairs with distortions that bracket the PJND are displayed and the difference in distortion levels is reduced until the PJND is identified. For the three methods, two images are compared using the flicker test in which the displayed images alternate at a frequency of 8 Hz. Unlike previous work, our goal is a global one, determining the PJND not only for the original pristine image but also for a sequence of compressed versions. Results for the MCL-JCI dataset show that the PJND measurements based on adjustment are comparable with those of the traditional approach using binary search, yet significantly faster. Moreover, we conducted a crowdsourcing study with side-by-side comparisons and forced choice, which suggests that the flicker test is more sensitive than a side-by-side comparison.
T. Guha
et al., “ATQAM/MAST’20: Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends,” in
Proceedings of the 28th ACM International Conference on Multimedia. Seattle, WA, USA: Association for Computing Machinery, 2020, pp. 4758–4760. doi:
10.1145/3394171.3421895.
Abstract
The Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends (ATQAM/MAST) aims to bring together researchers and professionals working in fields ranging from computer vision, multimedia computing, multimodal signal processing to psychology and social sciences. It is divided into two tracks: ATQAM and MAST. ATQAM track: Visual quality assessment techniques can be divided into image and video technical quality assessment (IQA and VQA, or broadly TQA) and aesthetics quality assessment (AQA). While TQA is a long-standing field, having its roots in media compression, AQA is relatively young. Both have received increased attention with developments in deep learning. The topics have mostly been studied separately, even though they deal with similar aspects of the underlying subjective experience of media. The aim is to bring together individuals in the two fields of TQA and AQA for the sharing of ideas and discussions on current trends, developments, issues, and future directions. MAST track: The research area of media content analytics has been traditionally used to refer to applications involving inference of higher-level semantics from multimedia content. However, multimedia is typically created for human consumption, and we believe it is necessary to adopt a human-centered approach to this analysis, which would not only enable a better understanding of how viewers engage with content but also how they impact each other in the process.
B. Roziere
et al., “Evolutionary Super-Resolution,” in
Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion. Cancún, Mexico: Association for Computing Machinery, 2020, pp. 151–152. doi:
10.1145/3377929.3389959.
Abstract
Super-resolution increases the resolution of an image. Using evolutionary optimization, we optimize the noise injection of a super-resolution method for improving the results. More generally, our approach can be used to optimize any method based on noise injection.
H. Lin
et al., “SUR-FeatNet: Predicting the Satisfied User Ratio Curve for Image Compression with Deep Feature Learning,”
Quality and User Experience, vol. 5, no. 1, Art. no. 1, 2020, doi:
10.1007/s41233-020-00034-1.
Abstract
The satisfied user ratio (SUR) curve for a lossy image compression scheme, e.g., JPEG, characterizes the complementary cumulative distribution function of the just noticeable difference (JND), the smallest distortion level that can be perceived by a subject when a reference image is compared to a distorted one. A sequence of JNDs can be defined with a suitable successive choice of reference images. We propose the first deep learning approach to predict SUR curves. We show how to apply maximum likelihood estimation and the Anderson-Darling test to select a suitable parametric model for the distribution function. We then use deep feature learning to predict samples of the SUR curve and apply the method of least squares to fit the parametric model to the predicted samples. Our deep learning approach relies on a siamese convolutional neural network, transfer learning, and deep feature learning, using pairs consisting of a reference image and a compressed image for training. Experiments on the MCL-JCI dataset showed state-of-the-art performance. For example, the mean Bhattacharyya distances between the predicted and ground truth first, second, and third JND distributions were 0.0810, 0.0702, and 0.0522, respectively, and the corresponding average absolute differences of the peak signal-to-noise ratio at a median of the first JND distribution were 0.58, 0.69, and 0.58 dB. Further experiments on the JND-Pano dataset showed that the method transfers well to high resolution panoramic images viewed on head-mounted displays.
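The opening sentence of this abstract can be written compactly as follows; the notation F_JND is chosen here for illustration and is not taken from the paper:
\[
  \mathrm{SUR}(d) = \Pr(\mathrm{JND} > d) = 1 - F_{\mathrm{JND}}(d),
\]
where d is the distortion (compression) level and F_JND is the cumulative distribution function of the JND over the population of subjects.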
X. Zhao, H. Lin, P. Guo, D. Saupe, and H. Liu, “Deep Learning VS. Traditional Algorithms for Saliency Prediction of Distorted Images,” in
2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 156–160. doi:
10.1109/ICIP40778.2020.9191203.
Abstract
Saliency has been widely studied in relation to image quality assessment (IQA). The optimal use of saliency in IQA metrics, however, is nontrivial and largely depends on whether saliency can be accurately predicted for images containing various distortions. Although tremendous progress has been made in saliency modelling, very little is known about whether and to what extent state-of-the-art methods are beneficial for saliency prediction of distorted images. In this paper, we analyse the ability of deep learning versus traditional algorithms in predicting saliency, based on an IQA-aware saliency benchmark, the SIQ288 database. Building off the variations in model performance, we make recommendations for model selections for IQA applications.
M. Jenadeleh, M. Pedersen, and D. Saupe, “Blind Quality Assessment of Iris Images Acquired in Visible Light for Biometric Recognition,”
Sensors, vol. 20, no. 5, Art. no. 5, 2020, doi:
10.3390/s20051308.
Abstract
Image quality is a key issue affecting the performance of biometric systems. Ensuring the quality of iris images acquired in unconstrained imaging conditions in visible light poses many challenges to iris recognition systems. Poor-quality iris images increase the false rejection rate and decrease the performance of the systems by quality filtering. Methods that can accurately predict iris image quality can improve the efficiency of quality-control protocols in iris recognition systems. We propose a fast blind/no-reference metric for predicting iris image quality. The proposed metric is based on statistical features of the sign and the magnitude of local image intensities. The experiments, conducted with a reference iris recognition system and three datasets of iris images acquired in visible light, showed that the quality of iris images strongly affects the recognition performance and is highly correlated with the iris matching scores. Rejecting poor-quality iris images improved the performance of the iris recognition system. In addition, we analyzed the effect of iris image quality on the accuracy of the iris segmentation module in the iris recognition system.
V. Hosu, H. Lin, T. Sziranyi, and D. Saupe, “KonIQ-10k: An Ecologically Valid Database for Deep Learning of Blind Image Quality Assessment,”
IEEE Transactions on Image Processing, vol. 29, pp. 4041–4056, 2020, doi:
10.1109/TIP.2020.2967829.
Abstract
Deep learning methods for image quality assessment (IQA) are limited due to the small size of existing datasets. Extensive datasets require substantial resources both for generating publishable content and annotating it accurately. We present a systematic and scalable approach to creating KonIQ-10k, the largest IQA dataset to date, consisting of 10,073 quality scored images. It is the first in-the-wild database aiming for ecological validity, concerning the authenticity of distortions, the diversity of content, and quality-related indicators. Through the use of crowdsourcing, we obtained 1.2 million reliable quality ratings from 1,459 crowd workers, paving the way for more general IQA models. We propose a novel deep learning model (KonCept512) to show an excellent generalization beyond the test set (0.921 SROCC) to the current state-of-the-art database LIVE-in-the-Wild (0.825 SROCC). The model derives its core performance from the InceptionResNet architecture, being trained at a higher resolution than previous models (512 × 384). Correlation analysis shows that KonCept512 performs similarly to having 9 subjective scores for each test image.
O. Wiedemann and D. Saupe, “Gaze Data for Quality Assessment of Foveated Video,” in
ACM Symposium on Eye Tracking Research and Applications. Stuttgart, Germany: Association for Computing Machinery, 2020. doi:
10.1145/3379157.3391656.
Abstract
This paper presents current methodologies and challenges in the context of subjective quality assessment with a focus on adaptively encoded video streams.
V. Hosu
et al., “From Technical to Aesthetics Quality Assessment and Beyond: Challenges and Potential,” in
Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends. Seattle, WA, USA: Association for Computing Machinery, 2020, pp. 19–20. doi:
10.1145/3423268.3423589.
Abstract
Every day 1.8+ billion images are being uploaded to Facebook, Instagram, Flickr, Snapchat, and WhatsApp. The exponential growth of visual media has made quality assessment increasingly important for various applications, from image acquisition, synthesis, restoration, and enhancement, to image search and retrieval, storage, and recognition. There have been two related but different classes of visual quality assessment techniques: image quality assessment (IQA) and image aesthetics assessment (IAA). As perceptual assessment tasks, subjective IQA and IAA share some common underlying factors that affect user judgments. Moreover, they are similar in methodology (especially NR-IQA in-the-wild and IAA). However, the emphasis for each is different: IQA focuses on low-level defects, e.g. processing artefacts, noise, and blur, while IAA puts more emphasis on abstract and higher-level concepts that capture the subjective aesthetics experience, e.g. established photographic rules encompassing lighting, composition, and colors, and personalized factors such as personality, cultural background, age, and emotion. IQA has been studied extensively over the last decades. There are three main types of IQA methods: full-reference (FR), reduced-reference (RR), and no-reference (NR). Among these, NR-IQA is the most challenging as it does not depend on reference images or impose strict assumptions on the distortion types and levels. NR-IQA techniques can be further divided into those that predict the global image score and patch-based IQA, to name a few of the more recent approaches.
H. Men, V. Hosu, H. Lin, A. Bruhn, and D. Saupe, “Visual Quality Assessment for Interpolated Slow-Motion Videos Based on a Novel Database,” in
Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), 2020, pp. 1–6. doi:
10.1109/QoMEX48832.2020.9123096.
Abstract
Professional video editing tools can generate slow-motion video by interpolating frames from video recorded at a standard frame rate. Thereby the perceptual quality of such interpolated slow-motion videos strongly depends on the underlying interpolation techniques. We built a novel benchmark database that is specifically tailored for interpolated slow-motion videos (KoSMo-1k). It consists of 1,350 interpolated video sequences, from 30 different content sources, along with their subjective quality ratings from up to ten subjective comparisons per video pair. Moreover, we evaluated the performance of twelve existing full-reference (FR) image/video quality assessment (I/VQA) methods on the benchmark. In this way, we are able to show that specifically tailored quality assessment methods for interpolated slow-motion videos are needed, since the evaluated methods – despite their good performance on real-time video databases – do not give satisfying results when it comes to frame interpolation.
M. Lan Ha, V. Hosu, and V. Blanz, “Color Composition Similarity and Its Application in Fine-grained Similarity,” in
2020 IEEE Winter Conference on Applications of Computer Vision (WACV). Piscataway, NJ: IEEE, 2020, pp. 2548–2557. doi:
10.1109/WACV45572.2020.9093522.
Abstract
Assessing visual similarity in-the-wild, a core ability of the human visual system, is a challenging problem for computer vision methods because of its subjective nature and limited annotated datasets. We make a stride forward, showing that visual similarity can be better studied by isolating its components. We identify color composition similarity as an important aspect and study its interaction with category-level similarity. Color composition similarity considers the distribution of colors and their layout in images. We create predictive models accounting for the global similarity that is beyond pixel-based and patch-based, or histogram level information. Using an active learning approach, we build a large-scale color composition similarity dataset with subjective ratings via crowd-sourcing, the first of its kind. We train a Siamese network using the dataset to create a color similarity metric and descriptors which outperform existing color descriptors. We also provide a benchmark for global color descriptors for perceptual color similarity. Finally, we combine color similarity and category level features for fine-grained visual similarity. Our proposed model surpasses the state-of-the-art performance while using three orders of magnitude less training data. The results suggest that our proposal to study visual similarity by isolating its components, modeling and combining them is a promising paradigm for further development.
H. Lin, J. D. Deng, D. Albers, and F. W. Siebert, “Helmet Use Detection of Tracked Motorcycles Using CNN-Based Multi-Task Learning,”
IEEE Access, vol. 8, pp. 162073–162084, 2020, doi:
10.1109/ACCESS.2020.3021357.
Abstract
Automated detection of motorcycle helmet use through video surveillance can facilitate efficient education and enforcement campaigns that increase road safety. However, existing detection approaches have a number of shortcomings, such as the inability to track individual motorcycles through multiple frames, or to distinguish drivers from passengers in helmet use. Furthermore, datasets used to develop approaches are limited in terms of traffic environments and traffic density variations. In this paper, we propose a CNN-based multi-task learning (MTL) method for identifying and tracking individual motorcycles, and register rider-specific helmet use. We further release the HELMET dataset, which includes 91,000 annotated frames of 10,006 individual motorcycles from 12 observation sites in Myanmar. Along with the dataset, we introduce an evaluation metric for helmet use and rider detection accuracy, which can be used as a benchmark for evaluating future detection approaches. We show that the use of MTL for concurrent visual similarity learning and helmet use classification improves the efficiency of our approach compared to earlier studies, allowing a processing speed of more than 8 FPS on consumer hardware, and a weighted average F-measure of 67.3% for detecting the number of riders and helmet use of tracked motorcycles. Our work demonstrates the capability of deep learning as a highly accurate and resource efficient approach to collect critical road safety related data.
H. Men, V. Hosu, H. Lin, A. Bruhn, and D. Saupe, “Subjective annotation for a frame interpolation benchmark using artefact amplification,”
Quality and User Experience, vol. 5, no. 1, Art. no. 1, 2020, doi:
10.1007/s41233-020-00037-y.
Abstract
Current benchmarks for optical flow algorithms evaluate the estimation either directly by comparing the predicted flow fields with the ground truth or indirectly by using the predicted flow fields for frame interpolation and then comparing the interpolated frames with the actual frames. In the latter case, objective quality measures such as the mean squared error are typically employed. However, it is well known that for image quality assessment, the actual quality experienced by the user cannot be fully deduced from such simple measures. Hence, we conducted a subjective quality assessment crowdsourcing study for the interpolated frames provided by one of the optical flow benchmarks, the Middlebury benchmark. It contains interpolated frames from 155 methods applied to each of 8 contents. For this purpose, we collected forced-choice paired comparisons between interpolated images and corresponding ground truth. To increase the sensitivity of observers when judging minute differences in paired comparisons we introduced a new method to the field of full-reference quality assessment, called artefact amplification. From the crowdsourcing data (3720 comparisons of 20 votes each) we reconstructed absolute quality scale values according to Thurstone’s model. As a result, we obtained a re-ranking of the 155 participating algorithms w.r.t. the visual quality of the interpolated frames. This re-ranking not only shows the necessity of visual quality assessment as another evaluation metric for optical flow and frame interpolation benchmarks, the results also provide the ground truth for designing novel image quality assessment (IQA) methods dedicated to perceptual quality of interpolated images. As a first step, we proposed such a new full-reference method, called WAE-IQA, which weights the local differences between an interpolated image and its ground truth.
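For context, scale reconstruction “according to Thurstone’s model” from paired comparisons is commonly based on the Case V relation below; this is the standard textbook form, not a detail reported in the paper itself:
\[
  \Pr(\text{stimulus } i \text{ preferred over } j) = \Phi\!\left(\frac{s_i - s_j}{\sqrt{2}\,\sigma}\right),
\]
where s_i and s_j are the latent quality scale values, \sigma is the common discriminal dispersion (often normalized to 1), and \Phi is the standard normal cumulative distribution function.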
C. Fan
et al., “SUR-Net: Predicting the Satisfied User Ratio Curve for Image Compression with Deep Learning,” in
Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 2019, pp. 1–6. doi:
10.1109/QoMEX.2019.8743204.
Abstract
The Satisfied User Ratio (SUR) curve for a lossy image compression scheme, e.g., JPEG, characterizes the probability distribution of the Just Noticeable Difference (JND) level, the smallest distortion level that can be perceived by a subject. We propose the first deep learning approach to predict such SUR curves. Instead of the direct approach of regressing the SUR curve itself for a given reference image, our model is trained on pairs of images, original and compressed. Relying on a Siamese Convolutional Neural Network (CNN), feature pooling, a fully connected regression-head, and transfer learning, we achieved a good prediction performance. Experiments on the MCL-JCI dataset showed a mean Bhattacharyya distance between the predicted and the original JND distributions of only 0.072.
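The Bhattacharyya distance used above to compare predicted and ground-truth JND distributions has the following standard definition for discrete distributions p and q (a general formula, not specific to this paper):
\[
  D_B(p, q) = -\ln\!\left(\sum_{x} \sqrt{p(x)\,q(x)}\right).
\]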
V. Hosu, B. Goldlücke, and D. Saupe, “Effective Aesthetics Prediction with Multi-level Spatially Pooled Features,”
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9367–9375, 2019, doi:
10.1109/CVPR.2019.00960.
Abstract
We propose an effective deep learning approach to aesthetics quality assessment that relies on a new type of pre-trained features, and apply it to the AVA data set, the currently largest aesthetics database. While previous approaches miss some of the information in the original images, due to taking small crops, down-scaling or warping the originals during training, we propose the first method that efficiently supports full resolution images as an input, and can be trained on variable input sizes. This allows us to significantly improve upon the state of the art, increasing the Spearman rank-order correlation coefficient (SRCC) of ground-truth mean opinion scores (MOS) from the previously best reported value of 0.612 to 0.756. To achieve this performance, we extract multi-level spatially pooled (MLSP) features from all convolutional blocks of a pre-trained InceptionResNet-v2 network, and train a custom shallow Convolutional Neural Network (CNN) architecture on these new features.
H. Lin, V. Hosu, and D. Saupe, “KADID-10k: A Large-scale Artificially Distorted IQA Database,” in
Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 2019, pp. 1–3. doi:
10.1109/QoMEX.2019.8743252.
Abstract
Current artificially distorted image quality assessment (IQA) databases are small in size and limited in content. Larger IQA databases that are diverse in content could benefit the development of deep learning for IQA. We create two datasets, the Konstanz Artificially Distorted Image quality Database (KADID-10k) and the Konstanz Artificially Distorted Image quality Set (KADIS-700k). The former contains 81 pristine images, each degraded by 25 distortions in 5 levels. The latter has 140,000 pristine images, with 5 degraded versions each, where the distortions are chosen randomly. We conduct a subjective IQA crowdsourcing study on KADID-10k to yield 30 degradation category ratings (DCRs) per image. We believe that the annotated set KADID-10k, together with the unlabelled set KADIS-700k, can enable the full potential of deep learning based IQA methods by means of weakly-supervised learning.
H. Men, H. Lin, V. Hosu, D. Maurer, A. Bruhn, and D. Saupe, “Visual Quality Assessment for Motion Compensated Frame Interpolation,” in
Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 2019, pp. 1–6. doi:
10.1109/QoMEX.2019.8743221.
Abstract
Current benchmarks for optical flow algorithms evaluate the estimation quality by comparing their predicted flow field with the ground truth, and additionally may compare interpolated frames, based on these predictions, with the correct frames from the actual image sequences. For the latter comparisons, objective measures such as mean square errors are applied. However, for applications like image interpolation, the expected user's quality of experience cannot be fully deduced from such simple quality measures. Therefore, we conducted a subjective quality assessment study by crowdsourcing for the interpolated images provided in one of the optical flow benchmarks, the Middlebury benchmark. We used paired comparisons with forced choice and reconstructed absolute quality scale values according to Thurstone's model using the classical least squares method. The results give rise to a re-ranking of 141 participating algorithms w.r.t. visual quality of interpolated frames mostly based on optical flow estimation. Our re-ranking result shows the necessity of visual quality assessment as another evaluation metric for optical flow and frame interpolation benchmarks.
V. Hosu, H. Lin, and D. Saupe, “Expertise Screening in Crowdsourcing Image Quality,” in
Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 2018, pp. 276–281. doi:
10.1109/QoMEX.2018.8463427.
Abstract
We propose a screening approach to find reliable and effectively expert crowd workers in image quality assessment (IQA). Our method measures the users' ability to identify image degradations by using test questions, together with several relaxed reliability checks. We conduct multiple experiments, obtaining reproducible results with a high agreement between the expertise-screened crowd and the freelance experts of 0.95 Spearman rank order correlation (SROCC), with one restriction on the image type. Our contributions include a reliability screening method for uninformative users, a new type of test questions that rely on our proposed database of pristine and artificially distorted images, a group agreement extrapolation method and an analysis of the crowdsourcing experiments.
M. Jenadeleh, M. Pedersen, and D. Saupe, “Realtime Quality Assessment of Iris Biometrics Under Visible Light,” in
Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPRW), CVPR Workshops. IEEE, 2018, pp. 443–452. doi:
10.1109/CVPRW.2018.00085.
Abstract
Ensuring sufficient quality of iris images acquired by handheld imaging devices in visible light poses many challenges to iris recognition systems. Many distortions affect the input iris images, and the source and types of these distortions are unknown in uncontrolled environments. We propose a fast no-reference image quality assessment measure for predicting iris image quality to handle severely degraded iris images. The proposed differential sign-magnitude statistics index (DSMI) is based on statistical features of the local difference sign-magnitude transform, which are computed by comparing the local mean with the central pixel of the patch and considering the noticeable variations. The experiments, conducted with a reference iris recognition system and three visible light datasets, showed that the quality of iris images strongly affects the recognition performance. Using the proposed method as a quality filtering step improved the performance of the iris recognition system by rejecting poor quality iris samples.
D. Varga, D. Saupe, and T. Szirányi, “DeepRN: A Content Preserving Deep Architecture for Blind Image Quality Assessment,” in
Proceedings of the IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2018, pp. 1–6. doi:
10.1109/ICME.2018.8486528.
Abstract
This paper presents a blind image quality assessment (BIQA) method based on deep learning with convolutional neural networks (CNN). Our method is trained on full and arbitrarily sized images rather than small image patches or resized input images as usually done in CNNs for image classification and quality assessment. The resolution independence is achieved by pyramid pooling. This work is the first that applies a fine-tuned residual deep learning network (ResNet-101) to BIQA. The training is carried out on a new and very large, labeled dataset of 10,073 images (KonIQ-10k) that contains quality rating histograms besides the mean opinion scores (MOS). In contrast to previous methods we do not train to approximate the MOS directly, but rather use the distributions of scores. Experiments were carried out on three benchmark image quality databases. The results showed clear improvements of the accuracy of the estimated MOS values, compared to current state-of-the-art algorithms. We also report on the quality of the estimation of the score distributions.
H. Men, H. Lin, and D. Saupe, “Spatiotemporal Feature Combination Model for No-Reference Video Quality Assessment,” in
Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 2018, pp. 1–3. doi:
10.1109/QoMEX.2018.8463426.
Abstract
One of the main challenges in no-reference video quality assessment is temporal variation in a video. Methods were typically designed and tested on videos with artificial distortions, without considering spatial and temporal variations simultaneously. We propose a no-reference spatiotemporal feature combination model which extracts spatiotemporal information from a video, and tested it on a database with authentic distortions. Compared with other methods, our model gave satisfying performance for assessing the quality of natural videos.
V. Hosu
et al., “The Konstanz natural video database (KoNViD-1k),” in
Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 2017, pp. 1–6. doi:
10.1109/QoMEX.2017.7965673.
Abstract
Subjective video quality assessment (VQA) strongly depends on semantics, context, and the types of visual distortions. Currently, all existing VQA databases include only a small number of video sequences with artificial distortions. The development and evaluation of objective quality assessment methods would benefit from having larger datasets of real-world video sequences with corresponding subjective mean opinion scores (MOS), in particular for deep learning purposes. In addition, the training and validation of any VQA method intended to be 'general purpose' requires a large dataset of video sequences that are representative of the whole spectrum of available video content and all types of distortions. We report our work on KoNViD-1k, a subjectively annotated VQA database consisting of 1,200 public-domain video sequences, fairly sampled from a large public video dataset, YFCC100m. We present the challenges and choices we have made in creating such a database aimed at 'in the wild' authentic distortions, depicting a wide variety of content.
M. Spicker, F. Hahn, T. Lindemeier, D. Saupe, and O. Deussen, “Quantifying Visual Abstraction Quality for Stipple Drawings,” in
Proceedings of the Symposium on Non-Photorealistic Animation and Rendering (NPAR). Association for Computing Machinery, 2017, pp. 8:1–8:10. doi:
10.1145/3092919.3092923.
Abstract
We investigate how the perceived abstraction quality of stipple illustrations is related to the number of points used to create them. Since it is difficult to find objective functions that quantify the visual quality of such illustrations, we gathered comparative data through a crowdsourced user study and employed a paired comparison model to deduce absolute quality values. Based on this study, we show that it is possible to predict the perceived quality of stippled representations based on the properties of an input image. Our results are related to the Weber-Fechner law from psychophysics and indicate a logarithmic relation between the number of points and the perceived abstraction quality. We give guidance on the number of stipple points that is typically enough to represent an input image well.
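To make the reported logarithmic relation concrete, the short NumPy sketch below fits a Weber-Fechner-style model q = a + b·ln(n) to perceived-quality scores; the data points are invented placeholders, not values from the study:

# Fit perceived abstraction quality q against the log of the stipple count n.
import numpy as np

n_points = np.array([1000, 2000, 5000, 10000, 20000, 50000])   # stipple counts (hypothetical)
quality  = np.array([1.2, 1.9, 2.8, 3.4, 4.0, 4.9])            # perceived quality (hypothetical)

# Least-squares fit of q = a + b * ln(n); polyfit returns [slope, intercept].
b, a = np.polyfit(np.log(n_points), quality, deg=1)
rmse = np.sqrt(np.mean((a + b * np.log(n_points) - quality) ** 2))
print(f"q ≈ {a:.2f} + {b:.2f}·ln(n), RMSE = {rmse:.2f}")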
S. Egger-Lampl
et al., “Crowdsourcing Quality of Experience Experiments,” in
Evaluation in the Crowd. Crowdsourcing and Human-Centered Experiments: Dagstuhl Seminar 15481, Dagstuhl Castle, Germany, November 22–27, 2015, Revised Contributions (LNCS 10264), D. Archambault, H. Purchase, and T. Hossfeld, Eds. Springer International Publishing, 2017, pp. 154–190. doi:
10.1007/978-3-319-66435-4_7.
U. Gadiraju
et al., “Crowdsourcing Versus the Laboratory: Towards Human-centered Experiments Using the Crowd,” in
Evaluation in the Crowd. Crowdsourcing and Human-Centered Experiments: Dagstuhl Seminar 15481, Dagstuhl Castle, Germany, November 22–27, 2015, Revised Contributions (LNCS 10264), D. Archambault, H. Purchase, and T. Hossfeld, Eds. Springer International Publishing, 2017, pp. 6–26. doi:
10.1007/978-3-319-66435-4_2.
I. Zingman, D. Saupe, O. A. B. Penatti, and K. Lambers, “Detection of Fragmented Rectangular Enclosures in Very High Resolution Remote Sensing Images,”
IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, 2016, doi:
10.1109/TGRS.2016.2545919.
Abstract
We develop an approach for the detection of ruins of livestock enclosures (LEs) in alpine areas captured by high-resolution remotely sensed images. These structures are usually of approximately rectangular shape and appear in images as faint, fragmented contours against a complex background. We address this problem by introducing a rectangularity feature that quantifies the degree of alignment of an optimal subset of extracted linear segments with a contour of rectangular shape. The rectangularity feature has high values not only for perfectly regular enclosures but also for ruined ones with distorted angles, fragmented walls, or even a completely missing wall. Furthermore, it has a zero value for spurious structures with fewer than three sides of a perceivable rectangle. We show how the detection performance can be improved by learning a linear combination of the rectangularity and size features from just a few available representative examples and a large number of negatives. Our approach allowed the detection of previously unknown enclosures in the Silvretta Alps. A comparative performance analysis is provided. Among other features, our comparison includes state-of-the-art features generated by pretrained deep convolutional neural networks (CNNs). The deep CNN features, although learned from a very different type of images, provided the basic ability to capture the visual concept of the LEs. However, our handcrafted rectangularity-size features showed considerably higher performance.
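The following is a greatly simplified, hypothetical sketch of a rectangularity-style score, restricted to axis-aligned rectangles and measuring how much of a candidate rectangle's perimeter is covered by nearby, roughly parallel line segments; the published feature additionally optimizes over segment subsets and rectangle orientation, none of which is reproduced here:

# Simplified rectangularity score for an axis-aligned candidate rectangle.
import numpy as np

def interval_coverage(intervals, lo, hi):
    """Total length of [lo, hi] covered by the union of the given intervals."""
    clipped = sorted((max(a, lo), min(b, hi)) for a, b in intervals if b > lo and a < hi)
    covered, end = 0.0, lo
    for a, b in clipped:
        a = max(a, end)
        if b > a:
            covered += b - a
            end = b
    return covered

def rectangularity(segments, rect, tol=2.0):
    """segments: list of ((x1, y1), (x2, y2)); rect: (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    sides = [  # (is_horizontal, fixed coordinate, span along the side)
        (True,  ymin, (xmin, xmax)),   # top
        (True,  ymax, (xmin, xmax)),   # bottom
        (False, xmin, (ymin, ymax)),   # left
        (False, xmax, (ymin, ymax)),   # right
    ]
    coverages, covered_sides = [], 0
    for horizontal, fixed, (lo, hi) in sides:
        ivals = []
        for (x1, y1), (x2, y2) in segments:
            if horizontal and abs(y1 - fixed) < tol and abs(y2 - fixed) < tol:
                ivals.append((min(x1, x2), max(x1, x2)))
            if not horizontal and abs(x1 - fixed) < tol and abs(x2 - fixed) < tol:
                ivals.append((min(y1, y2), max(y1, y2)))
        frac = interval_coverage(ivals, lo, hi) / (hi - lo)
        coverages.append(frac)
        covered_sides += frac > 0.5
    # As described above: fewer than three perceivable sides gives a zero score.
    return 0.0 if covered_sides < 3 else float(np.mean(coverages))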
D. Saupe, F. Hahn, V. Hosu, I. Zingman, M. Rana, and S. Li, “Crowd Workers Proven Useful: A Comparative Study of Subjective Video Quality Assessment,” in
Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX). 2016, pp. 1–2. [Online]. Available:
https://www.uni-konstanz.de/mmsp/pubsys/publishedFiles/SaHaHo16.pdf
Abstract
We carried out crowdsourced video quality assessments using paired comparisons and converting the results to differential mean opinion scores (DMOS). A previous lab-based study had provided corresponding MOS values for absolute category ratings. Using a simple linear transformation to fit the crowdsourcing-based DMOS values to the lab-based MOS values, we compared the results in terms of correlation coefficients and visually checked the relationship on scatter plots. The comparison result is surprisingly good, with correlation coefficients of more than 0.96, although (1) the original video sequences had to be cropped and downscaled in the crowdsourcing-based experiments, (2) the control of the experimental setup in the crowdsourcing case was much weaker, and (3) it was widely believed that data from crowdsourcing workers are less reliable. Our result suggests that crowdsourcing workers can actually be used to collect reliable VQA data in some applications.
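A minimal sketch of the comparison described above, fitting a linear transformation from crowdsourced DMOS to lab MOS and reporting correlation coefficients, could look as follows; the score arrays are hypothetical placeholders, not the study's data:

# Linear mapping of crowd DMOS to lab MOS, plus PLCC/SROCC.
import numpy as np
from scipy.stats import pearsonr, spearmanr

dmos_crowd = np.array([0.2, 0.8, 1.5, 2.1, 2.9, 3.6])   # hypothetical crowd DMOS
mos_lab    = np.array([4.6, 4.1, 3.4, 2.8, 2.2, 1.5])   # hypothetical lab MOS

# Least-squares linear fit: MOS ≈ a * DMOS + b.
a, b = np.polyfit(dmos_crowd, mos_lab, deg=1)
mos_pred = a * dmos_crowd + b

plcc, _ = pearsonr(mos_pred, mos_lab)      # linear correlation after the fit
srocc, _ = spearmanr(dmos_crowd, mos_lab)  # rank correlation (fit-independent)
print(f"MOS ≈ {a:.2f}*DMOS + {b:.2f}, PLCC = {plcc:.3f}, SROCC = {srocc:.3f}")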
V. Hosu, F. Hahn, I. Zingman, and D. Saupe, “Reported Attention as a Promising Alternative to Gaze in IQA Tasks,” in
Proceedings of the 5th ISCA/DEGA Workshop on Perceptual Quality of Systems (PQS 2016). 2016, pp. 117–121. doi:
10.21437/PQS.2016-25.
Abstract
We study the use of crowdsourcing for self-reported attention in image quality assessment (IQA) tasks. We present the results from two crowdsourcing campaigns: one where participants indicated via mouse clicks the image locations that influenced their rating of quality, and another where participants chose locations they looked at in a free-viewing setting. The results are compared to in-lab eye tracking experiments. Our analysis shows a strong connection between the in-lab and self-reported IQA locations. This suggests that crowdsourced studies are an affordable and valid alternative to eye tracking for IQA tasks.
V. Hosu, F. Hahn, O. Wiedemann, S.-H. Jung, and D. Saupe, “Saliency-driven Image Coding Improves Overall Perceived JPEG Quality,” in
Proceedings of the Picture Coding Symposium (PCS). IEEE, 2016, pp. 1–5. doi:
10.1109/PCS.2016.7906397.
Abstract
Saliency-driven image coding is well worth pursuing. Previous studies on JPEG and JPEG 2000 have suggested that region-of-interest coding brings little overall benefit compared to the standard implementation. We show that our saliency-driven variable quantization JPEG coding method significantly improves perceived image quality. To validate our findings, we performed large crowdsourcing experiments involving several hundred contributors on 44 representative images. To quantify the level of improvement, we devised an approach to equate Likert-type opinions to bitrate differences. Our saliency-driven coding showed an 11% average bpp benefit over standard JPEG.
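As a hedged illustration of saliency-driven variable quantization, the sketch below scales the standard JPEG luminance quantization table per 8×8 block according to a saliency map, using finer quantization where saliency is high; it only demonstrates the idea and does not reproduce the paper's JPEG encoder, bitrate handling, or saliency model:

# Per-block DCT quantization controlled by a saliency map (illustration only).
import numpy as np
from scipy.fft import dctn, idctn

# Standard JPEG luminance quantization table (quality-50 baseline).
Q50 = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=np.float64)

def saliency_quantize(gray, saliency, strong=2.5, weak=0.5):
    """Quantize DCT blocks coarsely in non-salient areas and finely in salient ones.
    gray: 2D image array; saliency: same-shaped map with values in [0, 1]."""
    h, w = (d - d % 8 for d in gray.shape)
    out = np.zeros((h, w))
    for y in range(0, h, 8):
        for x in range(0, w, 8):
            block = gray[y:y+8, x:x+8].astype(np.float64) - 128.0
            s = saliency[y:y+8, x:x+8].mean()                     # 0 = background, 1 = salient
            table = Q50 * (strong - (strong - weak) * s)          # finer table where salient
            coeffs = np.round(dctn(block, norm="ortho") / table)  # quantize DCT coefficients
            out[y:y+8, x:x+8] = idctn(coeffs * table, norm="ortho") + 128.0
    return np.clip(out, 0, 255)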