H. Lin, M. Jenadeleh, G. Chen, U. Reips, R. Hamzaoui, and D. Saupe, “Subjective Assessment of Global Picture-Wise Just Noticeable Difference,” in Proceedings of the IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2020, pp. 1–6, doi: 10.1109/ICMEW46912.2020.9106058.
Abstract
The picture-wise just noticeable difference (PJND) for a given image and a compression scheme is a statistical quantity giving the smallest distortion that a subject can perceive when the image is compressed with the compression scheme. The PJND is determined with subjective assessment tests for a sample of subjects. We introduce and apply two methods of adjustment where the subject interactively selects the distortion level at the PJND using either a slider or keystrokes. We compare the results and times required to those of the adaptive binary search type approach, in which image pairs with distortions that bracket the PJND are displayed and the difference in distortion levels is reduced until the PJND is identified. For the three methods, two images are compared using the flicker test in which the displayed images alternate at a frequency of 8 Hz. Unlike previous work, our goal is a global one, determining the PJND not only for the original pristine image but also for a sequence of compressed versions. Results for the MCL-JCI dataset show that the PJND measurements based on adjustment are comparable with those of the traditional approach using binary search, yet significantly faster. Moreover, we conducted a crowdsourcing study with side-by-side comparisons and forced choice, which suggests that the flicker test is more sensitive than a side-by-side comparison.
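A minimal sketch of the bracketing binary-search idea described above, assuming integer distortion levels and a hypothetical flicker_visible(level) callback that returns the subject's response in the flicker test; interval halving stops once the bracketing levels are adjacent.

```python
def find_pjnd_binary_search(low, high, flicker_visible):
    """Locate the smallest distortion level whose flicker against the
    reference is perceivable, by halving the bracketing interval.

    low  : highest level already judged invisible (e.g. 0 = reference)
    high : lowest level already judged visible
    flicker_visible(level) : hypothetical callback; True if the subject
        perceives flicker between the reference and this distortion level.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if flicker_visible(mid):
            high = mid      # difference still perceivable: search lower levels
        else:
            low = mid       # not perceivable: the PJND lies above
    return high             # smallest level judged visible = PJND


if __name__ == "__main__":
    # Toy subject whose true PJND sits at level 37 of 100 distortion steps.
    true_pjnd = 37
    print(find_pjnd_binary_search(0, 100, lambda lvl: lvl >= true_pjnd))  # -> 37
```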
H. Men, V. Hosu, H. Lin, A. Bruhn, and D. Saupe, “Visual Quality Assessment for Interpolated Slow-Motion Videos Based on a Novel Database,” in Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), 2020, pp. 1–6, doi: 10.1109/QoMEX48832.2020.9123096.
Abstract
Professional video editing tools can generate slow-motion video by interpolating frames from video recorded at a standard frame rate. Thereby, the perceptual quality of such interpolated slow-motion videos strongly depends on the underlying interpolation techniques. We built a novel benchmark database that is specifically tailored for interpolated slow-motion videos (KoSMo-1k). It consists of 1,350 interpolated video sequences, from 30 different content sources, along with their subjective quality ratings from up to ten subjective comparisons per video pair. Moreover, we evaluated the performance of twelve existing full-reference (FR) image/video quality assessment (I/VQA) methods on the benchmark. In this way, we are able to show that specifically tailored quality assessment methods for interpolated slow-motion videos are needed, since the evaluated methods – despite their good performance on real-time video databases – do not give satisfying results when it comes to frame interpolation.
H. Lin, V. Hosu, and D. Saupe, “KADID-10k: A Large-scale Artificially Distorted IQA Database,” in Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), 2019, pp. 1–3, doi: 10.1109/QoMEX.2019.8743252.
Abstract
Current artificially distorted image quality assessment (IQA) databases are small in size and limited in content. Larger IQA databases that are diverse in content could benefit the development of deep learning for IQA. We create two datasets, the Konstanz Artificially Distorted Image quality Database (KADID-10k) and the Konstanz Artificially Distorted Image quality Set (KADIS-700k). The former contains 81 pristine images, each degraded by 25 distortions in 5 levels. The latter has 140,000 pristine images, with 5 degraded versions each, where the distortions are chosen randomly. We conduct a subjective IQA crowdsourcing study on KADID-10k to yield 30 degradation category ratings (DCRs) per image. We believe that the annotated set KADID-10k, together with the unlabelled set KADIS-700k, can enable the full potential of deep learning based IQA methods by means of weakly-supervised learning.
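As a toy illustration of leveled degradations in the spirit of KADID-10k, the sketch below applies one distortion type (Gaussian blur) at five increasing levels; the actual 25-distortion bank and its level parameters are defined by the database itself.

```python
from PIL import Image, ImageFilter

# Illustrative level parameters for a single distortion type (Gaussian blur);
# KADID-10k defines 25 distortion types with 5 levels each.
blur_radii = [0.5, 1.0, 2.0, 4.0, 8.0]

def degrade(path_in, prefix="degraded"):
    """Write five increasingly blurred versions of a pristine image."""
    img = Image.open(path_in).convert("RGB")
    for level, radius in enumerate(blur_radii, start=1):
        out = img.filter(ImageFilter.GaussianBlur(radius))
        out.save(f"{prefix}_blur_{level:02d}.png")

# degrade("pristine.png")   # -> degraded_blur_01.png ... degraded_blur_05.png
```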
H. Men, H. Lin, V. Hosu, D. Maurer, A. Bruhn, and D. Saupe, “Visual Quality Assessment for Motion Compensated Frame Interpolation,” in Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), 2019, pp. 1–6, doi: 10.1109/QoMEX.2019.8743221.
Abstract
Current benchmarks for optical flow algorithms evaluate the estimation quality by comparing their predicted flow field with the ground truth, and additionally may compare interpolated frames, based on these predictions, with the correct frames from the actual image sequences. For the latter comparisons, objective measures such as mean square errors are applied. However, for applications like image interpolation, the expected user's quality of experience cannot be fully deduced from such simple quality measures. Therefore, we conducted a subjective quality assessment study by crowdsourcing for the interpolated images provided in one of the optical flow benchmarks, the Middlebury benchmark. We used paired comparisons with forced choice and reconstructed absolute quality scale values according to Thurstone's model using the classical least squares method. The results give rise to a re-ranking of 141 participating algorithms w.r.t. visual quality of interpolated frames mostly based on optical flow estimation. Our re-ranking result shows the necessity of visual quality assessment as another evaluation metric for optical flow and frame interpolation benchmarks.
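The scale reconstruction step lends itself to a short worked example. Below is a sketch of Thurstone's Case V model with the classical least-squares solution, applied to an illustrative paired-comparison count matrix (the counts are made up, not the study's data).

```python
import numpy as np
from scipy.stats import norm

def thurstone_case_v(counts):
    """Estimate quality scale values from a paired-comparison count matrix.

    counts[i, j] = number of times stimulus i was preferred over stimulus j.
    Returns zero-mean scale values (Thurstone Case V; the classical
    least-squares solution takes the row mean of the z-score matrix).
    """
    counts = np.asarray(counts, dtype=float)
    trials = counts + counts.T
    with np.errstate(invalid="ignore", divide="ignore"):
        p = counts / trials                  # preference proportions
    np.fill_diagonal(p, 0.5)
    p = np.clip(p, 0.01, 0.99)               # avoid infinite z-scores
    z = norm.ppf(p)                          # inverse standard normal CDF
    return z.mean(axis=1)

if __name__ == "__main__":
    # Illustrative counts for three interpolation results, 20 votes per pair.
    counts = np.array([[0, 14, 18],
                       [6,  0, 13],
                       [2,  7,  0]])
    print(thurstone_case_v(counts))
```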
V. Hosu, B. Goldlücke, and D. Saupe, “Effective Aesthetics Prediction with Multi-level Spatially Pooled Features,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9367–9375, doi: 10.1109/CVPR.2019.00960.
Abstract
We propose an effective deep learning approach to aesthetics quality assessment that relies on a new type of pre-trained features, and apply it to the AVA data set, the currently largest aesthetics database. While previous approaches miss some of the information in the original images, due to taking small crops, down-scaling or warping the originals during training, we propose the first method that efficiently supports full resolution images as an input, and can be trained on variable input sizes. This allows us to significantly improve upon the state of the art, increasing the Spearman rank-order correlation coefficient (SRCC) of ground-truth mean opinion scores (MOS) from the existing best reported of 0.612 to 0.756. To achieve this performance, we extract multi-level spatially pooled (MLSP) features from all convolutional blocks of a pre-trained InceptionResNet-v2 network, and train a custom shallow Convolutional Neural Network (CNN) architecture on these new features.BibTeX
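The paper extracts MLSP features from all blocks of InceptionResNet-v2; the sketch below illustrates the same multi-level spatially pooled idea with a torchvision ResNet-50 as a stand-in, using forward hooks and global average pooling so that inputs of arbitrary resolution yield a fixed-length feature vector.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

# Stand-in backbone: ResNet-50 instead of the paper's InceptionResNet-v2.
model = resnet50(weights="IMAGENET1K_V1").eval()

features = []
def hook(_module, _inp, out):
    # Global average pooling collapses the spatial dims, so any input
    # resolution produces a fixed-length vector per block.
    features.append(F.adaptive_avg_pool2d(out, 1).flatten(1))

for block in (model.layer1, model.layer2, model.layer3, model.layer4):
    block.register_forward_hook(hook)

with torch.no_grad():
    image = torch.rand(1, 3, 768, 1024)   # full-resolution input, no cropping
    model(image)

mlsp = torch.cat(features, dim=1)         # 256 + 512 + 1024 + 2048 = 3840 dims
print(mlsp.shape)                         # torch.Size([1, 3840])
```

A shallow regression head trained on such vectors is what makes variable input sizes feasible, since the feature length no longer depends on the image resolution.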
C. Fan et al., “SUR-Net: Predicting the Satisfied User Ratio Curve for Image Compression with Deep Learning,” in Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), 2019, pp. 1–6, doi: 10.1109/QoMEX.2019.8743204.
Abstract
The Satisfied User Ratio (SUR) curve for a lossy image compression scheme, e.g., JPEG, characterizes the probability distribution of the Just Noticeable Difference (JND) level, the smallest distortion level that can be perceived by a subject. We propose the first deep learning approach to predict such SUR curves. Instead of the direct approach of regressing the SUR curve itself for a given reference image, our model is trained on pairs of images, original and compressed. Relying on a Siamese Convolutional Neural Network (CNN), feature pooling, a fully connected regression-head, and transfer learning, we achieved a good prediction performance. Experiments on the MCL-JCI dataset showed a mean Bhattacharyya distance between the predicted and the original JND distributions of only 0.072.
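The evaluation metric can be reproduced for discrete histograms as follows. This sketch uses the common definition BD = -ln Σ √(p·q); the toy JND histograms are illustrative, not the MCL-JCI data.

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions,
    BD = -ln( sum_i sqrt(p_i * q_i) ); inputs are normalized to sum to 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    bc = np.sqrt(p * q).sum()        # Bhattacharyya coefficient in [0, 1]
    return -np.log(bc)

# Toy JND histograms over 100 distortion levels (illustrative Gaussians).
levels = np.arange(1, 101)
ground_truth = np.exp(-0.5 * ((levels - 30) / 8.0) ** 2)
predicted    = np.exp(-0.5 * ((levels - 33) / 9.0) ** 2)
print(round(bhattacharyya_distance(ground_truth, predicted), 3))
```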
D. Varga, D. Saupe, and T. Szirányi, “DeepRN: A Content Preserving Deep Architecture for Blind Image Quality Assessment,” in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), 2018, pp. 1–6, doi: 10.1109/ICME.2018.8486528.
Abstract
This paper presents a blind image quality assessment (BIQA) method based on deep learning with convolutional neural networks (CNN). Our method is trained on full and arbitrarily sized images rather than small image patches or resized input images as usually done in CNNs for image classification and quality assessment. The resolution independence is achieved by pyramid pooling. This work is the first that applies a fine-tuned residual deep learning network (ResNet-101) to BIQA. The training is carried out on a new and very large, labeled dataset of 10,073 images (KonIQ-10k) that contains quality rating histograms besides the mean opinion scores (MOS). In contrast to previous methods, we do not train to approximate the MOS directly, but rather use the distributions of scores. Experiments were carried out on three benchmark image quality databases. The results showed clear improvements of the accuracy of the estimated MOS values, compared to current state-of-the-art algorithms. We also report on the quality of the estimation of the score distributions.
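The resolution independence mentioned above comes from pyramid pooling on top of the final convolutional feature map. A minimal PyTorch sketch of spatial pyramid pooling follows; the grid sizes are illustrative, not necessarily those used in DeepRN.

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(feature_map, grid_sizes=(1, 2, 4)):
    """Pool a CNN feature map of arbitrary spatial size into a fixed-length
    vector by average pooling over several grids and concatenating.

    feature_map: tensor of shape (batch, channels, H, W) with arbitrary H, W.
    Returns:     tensor of shape (batch, channels * sum(g*g for g in grid_sizes)).
    """
    pooled = [F.adaptive_avg_pool2d(feature_map, g).flatten(1) for g in grid_sizes]
    return torch.cat(pooled, dim=1)

# Two inputs of different resolution map to vectors of identical length,
# which is what allows training on full, arbitrarily sized images.
small = torch.rand(1, 2048, 12, 16)
large = torch.rand(1, 2048, 24, 40)
print(spatial_pyramid_pool(small).shape, spatial_pyramid_pool(large).shape)
# torch.Size([1, 43008]) torch.Size([1, 43008])
```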
H. Men, H. Lin, and D. Saupe, “Spatiotemporal Feature Combination Model for No-Reference Video Quality Assessment,” in Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), 2018, pp. 1–3, doi: 10.1109/QoMEX.2018.8463426.
Abstract
One of the main challenges in no-reference video quality assessment is temporal variation in a video. Methods have typically been designed and tested on videos with artificial distortions, without considering spatial and temporal variations simultaneously. We propose a no-reference spatiotemporal feature combination model which extracts spatiotemporal information from a video, and test it on a database with authentic distortions. Compared with other methods, our model gave satisfying performance for assessing the quality of natural videos.
M. Jenadeleh, M. Pedersen, and D. Saupe, “Realtime Quality Assessment of Iris Biometrics Under Visible Light,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 443–452, doi: 10.1109/CVPRW.2018.00085.
Abstract
Ensuring sufficient quality of iris images acquired by handheld imaging devices in visible light poses many challenges to iris recognition systems. Many distortions affect the input iris images, and the source and types of these distortions are unknown in uncontrolled environments. We propose a fast no-reference image quality assessment measure for predicting iris image quality to handle severely degraded iris images. The proposed differential sign-magnitude statistics index (DSMI) is based on statistical features of the local difference sign-magnitude transform, which are computed by comparing the local mean with the central pixel of the patch and considering the noticeable variations. The experiments, conducted with a reference iris recognition system and three visible light datasets, showed that the quality of iris images strongly affects the recognition performance. Using the proposed method as a quality filtering step improved the performance of the iris recognition system by rejecting poor quality iris samples.
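As a rough, simplified illustration of local difference sign-magnitude statistics (comparing each pixel with its local patch mean and summarizing the signs and magnitudes as histogram features), the sketch below is not the full DSMI index from the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sign_magnitude_features(gray, patch=7, bins=16):
    """Simplified local difference sign-magnitude statistics: compare each
    pixel with the mean of its patch, then summarize the sign map and the
    magnitude map with basic histogram features (not the full DSMI index)."""
    gray = gray.astype(float)
    local_mean = uniform_filter(gray, size=patch)
    diff = gray - local_mean
    sign_ratio = np.mean(diff > 0)                        # fraction of positive signs
    mag_hist, _ = np.histogram(np.abs(diff), bins=bins, density=True)
    return np.concatenate(([sign_ratio], mag_hist))

# Example on a synthetic texture (random values standing in for an iris crop).
rng = np.random.default_rng(0)
img = rng.normal(128, 20, size=(240, 320))
print(sign_magnitude_features(img).shape)   # (17,)
```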
V. Hosu, H. Lin, and D. Saupe, “Expertise Screening in Crowdsourcing Image Quality,” in Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), 2018, pp. 276–281, doi: 10.1109/QoMEX.2018.8463427.
Abstract
We propose a screening approach to find reliable and effectively expert crowd workers in image quality assessment (IQA). Our method measures the users' ability to identify image degradations by using test questions, together with several relaxed reliability checks. We conduct multiple experiments, obtaining reproducible results with a high agreement between the expertise-screened crowd and the freelance experts of 0.95 Spearman rank order correlation (SROCC), with one restriction on the image type. Our contributions include a reliability screening method for uninformative users, a new type of test questions that rely on our proposed database of pristine and artificially distorted images, a group agreement extrapolation method and an analysis of the crowdsourcing experiments.
U. Gadiraju et al., “Crowdsourcing Versus the Laboratory: Towards Human-centered Experiments Using the Crowd,” in Evaluation in the Crowd. Crowdsourcing and Human-Centered Experiments: Dagstuhl Seminar 15481, Dagstuhl Castle, Germany, November 22–27, 2015, Revised Contributions, D. Archambault, H. Purchase, and T. Hossfeld, Eds., LNCS 10264 (Information Systems and Applications, incl. Internet/Web, and HCI). Springer International Publishing, 2017, pp. 6–26.
S. Egger-Lampl et al., “Crowdsourcing Quality of Experience Experiments,” in Evaluation in the Crowd. Crowdsourcing and Human-Centered Experiments: Dagstuhl Seminar 15481, Dagstuhl Castle, Germany, November 22–27, 2015, Revised Contributions, D. Archambault, H. Purchase, and T. Hossfeld, Eds., LNCS 10264 (Information Systems and Applications, incl. Internet/Web, and HCI). Springer International Publishing, 2017, pp. 154–190.
V. Hosu et al., “The Konstanz natural video database (KoNViD-1k),” in Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), 2017, pp. 1–6, doi: 10.1109/QoMEX.2017.7965673.
Abstract
Subjective video quality assessment (VQA) strongly depends on semantics, context, and the types of visual distortions. Currently, all existing VQA databases include only a small number of video sequences with artificial distortions. The development and evaluation of objective quality assessment methods would benefit from having larger datasets of real-world video sequences with corresponding subjective mean opinion scores (MOS), in particular for deep learning purposes. In addition, the training and validation of any VQA method intended to be 'general purpose' requires a large dataset of video sequences that are representative of the whole spectrum of available video content and all types of distortions. We report our work on KoNViD-1k, a subjectively annotated VQA database consisting of 1,200 public-domain video sequences, fairly sampled from a large public video dataset, YFCC100m. We present the challenges and choices we have made in creating such a database aimed at 'in the wild' authentic distortions, depicting a wide variety of content.
M. Spicker, F. Hahn, T. Lindemeier, D. Saupe, and O. Deussen, “Quantifying Visual Abstraction Quality for Stipple Drawings,” in Proceedings of the Symposium on Non-Photorealistic Animation and Rendering (NPAR), 2017, pp. 8:1–8:10, doi: 10.1145/3092919.3092923.
Abstract
We investigate how the perceived abstraction quality of stipple illustrations is related to the number of points used to create them. Since it is difficult to find objective functions that quantify the visual quality of such illustrations, we gather comparative data by a crowdsourcing user study and employ a paired comparison model to deduce absolute quality values. Based on this study we show that it is possible to predict the perceived quality of stippled representations based on the properties of an input image. Our results are related to Weber-Fechner's law from psychophysics and indicate a logarithmic relation between numbers of points and perceived abstraction quality. We give guidance for the number of stipple points that is typically enough to represent an input image well.
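The reported logarithmic relation corresponds to a model of the form quality ≈ a + b·ln(n_points), which can be fitted by least squares; the data points below are hypothetical and only illustrate the fitting and the inversion for a target quality.

```python
import numpy as np

# Hypothetical (point count, quality) pairs; the logarithmic model follows
# the spirit of the Weber-Fechner law: quality ≈ a + b * ln(n_points).
n_points = np.array([1000, 2000, 4000, 8000, 16000, 32000, 64000])
quality  = np.array([0.21, 0.35, 0.47, 0.61, 0.72, 0.84, 0.95])

b, a = np.polyfit(np.log(n_points), quality, deg=1)   # slope, intercept
print(f"quality ≈ {a:.2f} + {b:.2f} * ln(n_points)")

# Invert the model to estimate how many stipple points are "typically enough"
# for a chosen target quality level.
target = 0.9
print(int(np.exp((target - a) / b)))
```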
I. Zingman, D. Saupe, O. A. B. Penatti, and K. Lambers, “Detection of Fragmented Rectangular Enclosures in Very High Resolution Remote Sensing Images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, 2016, doi: 10.1109/TGRS.2016.2545919.
Abstract
We develop an approach for the detection of ruins of livestock enclosures (LEs) in alpine areas captured by high-resolution remotely sensed images. These structures are usually of approximately rectangular shape and appear in images as faint fragmented contours in complex background. We address this problem by introducing a rectangularity feature that quantifies the degree of alignment of an optimal subset of extracted linear segments with a contour of rectangular shape. The rectangularity feature has high values not only for perfectly regular enclosures but also for ruined ones with distorted angles, fragmented walls, or even a completely missing wall. Furthermore, it has a zero value for spurious structures with less than three sides of a perceivable rectangle. We show how the detection performance can be improved by learning a linear combination of the rectangularity and size features from just a few available representative examples and a large number of negatives. Our approach allowed detection of enclosures in the Silvretta Alps that were previously unknown. A comparative performance analysis is provided. Among other features, our comparison includes the state-of-the-art features that were generated by pretrained deep convolutional neural networks (CNNs). The deep CNN features, although learned from a very different type of images, provided the basic ability to capture the visual concept of the LEs. However, our handcrafted rectangularity-size features showed considerably higher performance.
D. Saupe, F. Hahn, V. Hosu, I. Zingman, M. Rana, and S. Li, “Crowd Workers Proven Useful: A Comparative Study of Subjective Video Quality Assessment,” in Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), 2016, pp. 1–2. [Online]. Available: https://www.uni-konstanz.de/mmsp/pubsys/publishedFiles/SaHaHo16.pdf.
Abstract
We carried out crowdsourced video quality assessments using paired comparisons and converting the results to differential mean opinion scores (DMOS). A previous lab-based study had provided corresponding MOS values for absolute category ratings. Using a simple linear transformation to fit the crowdsourcing-based DMOS values to the lab-based MOS values, we compared the results in terms of correlation coefficients and visually checked the relationship on scatter plots. The comparison result is surprisingly good, with correlation coefficients of more than 0.96, although (1) the original video sequences had to be cropped and downscaled in the crowdsourcing-based experiments, (2) there was much less control of the experimental setup in the crowdsourcing case, and (3) it was widely believed that data from crowdsourcing workers are less reliable. Our result suggests crowdsourcing workers can actually be used to collect reliable VQA data in some applications.
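The comparison described above amounts to a least-squares linear mapping from crowdsourced DMOS to lab MOS followed by correlation coefficients. A sketch with illustrative scores (not the study's data):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Illustrative scores for a handful of test sequences.
dmos_crowd = np.array([1.2, 0.8, 2.5, 3.1, 1.9, 0.4, 2.8])   # crowdsourced DMOS
mos_lab    = np.array([4.1, 4.5, 2.9, 2.2, 3.4, 4.8, 2.5])   # lab-based MOS

# Simple linear transformation fitted by least squares: mos ≈ a * dmos + b.
a, b = np.polyfit(dmos_crowd, mos_lab, deg=1)
mos_pred = a * dmos_crowd + b

print(f"fit: mos ≈ {a:.2f} * dmos + {b:.2f}")
print("PLCC :", round(pearsonr(mos_pred, mos_lab)[0], 3))
print("SROCC:", round(spearmanr(mos_pred, mos_lab)[0], 3))
```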
V. Hosu, F. Hahn, O. Wiedemann, S.-H. Jung, and D. Saupe, “Saliency-driven Image Coding Improves Overall Perceived JPEG Quality,” in Proceedings of the Picture Coding Symposium (PCS), 2016, pp. 1–5, doi: 10.1109/PCS.2016.7906397.
Abstract
Saliency-driven image coding is well worth pursuing. Previous studies on JPEG and JPEG2000 have suggested that region-of-interest coding brings little overall benefit compared to the standard implementation. We show that our saliency-driven variable quantization JPEG coding method significantly improves perceived image quality. To validate our findings, we performed large crowdsourcing experiments involving several hundred contributors, on 44 representative images. To quantify the level of improvement, we devised an approach to equate Likert-type opinions to bitrate differences. Our saliency-driven coding showed 11% bpp average benefit over the standard JPEG.
V. Hosu, F. Hahn, I. Zingman, and D. Saupe, “Reported Attention as a Promising Alternative to Gaze in IQA Tasks,” in Proceedings of the 5th ISCA/DEGA Workshop on Perceptual Quality of Systems (PQS 2016), 2016, pp. 117–121, doi: 10.21437/PQS.2016-25.
Abstract
We study the use of crowdsourcing for self-reported attention in image quality assessment (IQA) tasks. We present the results from two crowdsourcing campaigns: one where participants indicated via mouse clicks the image locations that influenced their rating of quality, and another where participants chose locations they looked at in a free-viewing setting. The results are compared to in-lab eye tracking experiments. Our analysis shows a strong connection between the in-lab and self-reported IQA locations. This suggests that crowdsourced studies are an affordable and valid alternative to eye tracking for IQA tasks.