Publications
2025
- [arXiv] Unsupervised Segmentation by Diffusing, Walking and Cutting. Daniela Ivanova, Marco Aversa, Paul Henderson, and John Williamson. arXiv preprint, 2025.
We propose an unsupervised image segmentation method using features from pre-trained text-to-image diffusion models. Inspired by classic spectral clustering approaches, we construct adjacency matrices from self-attention layers between image patches and recursively partition them using Normalized Cuts (NCuts). A key insight is that self-attention probability distributions, which capture semantic relations between patches, can be interpreted as a transition matrix for random walks across the image. We leverage this by first applying Random Walk Normalized Cuts directly to these self-attention activations to partition the image, minimizing transition probabilities between clusters while maximizing coherence within clusters. Applied recursively, this yields a hierarchical segmentation that reflects the rich semantics in the pre-trained attention layers, without any additional training. Next, we explore other ways to build the NCuts adjacency matrix from features, and how the random walk interpretation of self-attention can be used to capture long-range relationships. Finally, we propose an approach to automatically determine the NCut cost criterion, avoiding the need to tune it manually. We quantitatively analyse the effect of incorporating different features, of a constant versus dynamic NCut threshold, and of incorporating multi-node paths when constructing the NCuts adjacency matrix. We show that our approach surpasses all existing methods for zero-shot unsupervised segmentation, achieving state-of-the-art results on COCO-Stuff-27 and Cityscapes.
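The recursive partitioning described above can be sketched in NumPy. This is a minimal illustration under stated assumptions and not the authors' implementation. The attention map is assumed to be a row-stochastic matrix `P` over patches. It is symmetrized before the classic Shi-Malik relaxation is applied, and each split is taken at zero of the second eigenvector. The function names and the fixed `max_cost` stopping threshold are hypothetical.

```python
import numpy as np


def spectral_bipartition(P: np.ndarray) -> np.ndarray:
    """Split nodes into two groups with the Shi-Malik normalized-cut relaxation."""
    W = 0.5 * (P + P.T)  # assumption: symmetrize the (directed) attention map
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # D^-1/2 W D^-1/2 is similar to the random-walk matrix D^-1 W (same eigenvalues).
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    # Second-largest eigenvector, mapped back to the generalized problem (D - W) y = lam D y.
    y = d_inv_sqrt * eigvecs[:, -2]
    return y > 0.0  # split at zero; the overall sign of y is arbitrary


def ncut_cost(P: np.ndarray, mask: np.ndarray) -> float:
    """NCut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    W = 0.5 * (P + P.T)
    cut = W[np.ix_(mask, ~mask)].sum()
    return float(cut / W[mask].sum() + cut / W[~mask].sum())


def recursive_ncut(P: np.ndarray, max_cost: float = 0.3) -> list:
    """Recursively bipartition until a proposed split costs more than max_cost."""
    segments = []

    def split(idx: np.ndarray) -> None:
        if len(idx) < 2:
            segments.append(idx)
            return
        sub = P[np.ix_(idx, idx)]
        mask = spectral_bipartition(sub)
        if mask.all() or not mask.any() or ncut_cost(sub, mask) > max_cost:
            segments.append(idx)  # cut too expensive: keep as one segment
            return
        split(idx[mask])
        split(idx[~mask])

    split(np.arange(P.shape[0]))
    return segments
```

Consider a toy 6-node `P` with two strongly connected groups of three. On it, `recursive_ncut` returns the two groups as separate segments. Multi-node paths could be explored by passing `np.linalg.matrix_power(P, k)` instead of `P`. The fixed `max_cost` only stands in for the automatically determined NCut threshold described in the abstract.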
- [WACV] ARTeFACT: Benchmarking Segmentation Models on Diverse Analogue Media Damage. Daniela Ivanova, Marco Aversa, Paul Henderson, and John Williamson. IEEE/CVF Winter Conference on Applications of Computer Vision, 2025.
Accurately detecting and classifying damage in analogue media such as paintings, photographs, textiles, mosaics, and frescoes is essential for cultural heritage preservation. While machine learning models excel at correcting degradation if the damage operator is known a priori, we show that they fail to robustly predict where the damage is, even after supervised training; thus, reliable damage detection remains a challenge. Motivated by this, we introduce ARTeFACT, a dataset for damage detection in diverse types of analogue media, with over 11,000 annotations covering 15 kinds of damage across various subjects, media, and historical provenance. Furthermore, we contribute human-verified text prompts describing the semantic contents of the images, and derive additional textual descriptions of the annotated damage. We evaluate CNN, Transformer, and diffusion-based segmentation models, as well as foundation vision models, in zero-shot, supervised, unsupervised and text-guided settings, revealing their limitations in generalising across media types. By publicly sharing our dataset, we provide a benchmark for the analogue damage detection task, with the aim of advancing automated analogue media restoration and preservation.
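Reporting scores per media type is one way to expose the generalisation gap mentioned above. The abstract does not name the metric. The sketch below therefore assumes intersection over union (IoU) on binary damage masks, averaged per media type. `iou_by_media` and the media labels are hypothetical.

```python
from collections import defaultdict

import numpy as np


def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union of two binary damage masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(np.logical_and(pred, target).sum() / union)


def iou_by_media(samples) -> dict:
    """samples: iterable of (media_type, pred_mask, target_mask); returns mean IoU per type."""
    scores = defaultdict(list)
    for media, pred, target in samples:
        scores[media].append(iou(pred, target))
    return {media: float(np.mean(vals)) for media, vals in scores.items()}
```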
2024
- [NeurIPS] Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models. Athanasios Tragakis*, Marco Aversa*, Chaitanya Kaul, Roderick Murray-Smith, and Daniele Faccio. Advances in Neural Information Processing Systems, 2024.
In this work, we introduce Pixelsmith, a zero-shot text-to-image generative framework to sample images at higher resolutions with a single GPU. We are the first to show that it is possible to scale the output of a pre-trained diffusion model by a factor of 1000, opening the way to gigapixel image generation at no additional cost. Our cascading method uses the image generated at the lowest resolution as a baseline to sample at higher resolutions. For guidance, we introduce the Slider, a tunable mechanism that fuses the overall structure of the first-generated image with enhanced fine details. At each inference step, we denoise patches rather than the entire latent space, minimizing memory demands so that a single GPU can handle the process regardless of the image’s resolution. Our experimental results show that Pixelsmith not only achieves higher quality and diversity than existing techniques, but also reduces sampling time and artifacts.
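Denoising patch by patch keeps the memory of each denoiser call bounded. The sketch below shows one generic way to do this: overlapping windows are denoised separately and their outputs are averaged. It is an illustrative assumption rather than Pixelsmith's exact scheme. `denoise_fn` is a hypothetical stand-in for a single denoiser evaluation on one latent patch.

```python
from typing import Callable, List

import numpy as np


def patch_starts(size: int, patch: int, stride: int) -> List[int]:
    """Start offsets whose windows of length `patch` cover [0, size)."""
    starts = list(range(0, max(size - patch, 0) + 1, stride))
    if starts[-1] + patch < size:
        starts.append(size - patch)  # make the last window reach the border
    return starts


def denoise_patchwise(
    latent: np.ndarray,
    denoise_fn: Callable[[np.ndarray], np.ndarray],
    patch: int = 64,
    stride: int = 48,
) -> np.ndarray:
    """Apply a (hypothetical) per-patch denoiser and average overlapping outputs.

    Each denoise_fn call sees at most a (C, patch, patch) window, so the memory
    of a single call does not grow with the size of the full latent.
    """
    assert 0 < stride <= patch, "a stride larger than the patch would leave gaps"
    _, h, w = latent.shape
    out = np.zeros_like(latent)
    weight = np.zeros((1, h, w), dtype=latent.dtype)
    for y in patch_starts(h, patch, stride):
        for x in patch_starts(w, patch, stride):
            window = latent[:, y:y + patch, x:x + patch]
            out[:, y:y + patch, x:x + patch] += denoise_fn(window)
            weight[:, y:y + patch, x:x + patch] += 1.0
    return out / weight
```

Only the per-call window is bounded here. The accumulation buffers `out` and `weight` still scale with the full latent.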
2023
- [NeurIPS Spotlight] DiffInfinite: Large Mask-Image Synthesis via Parallel Random Patch Diffusion in Histopathology. Marco Aversa, Gabriel Nobis, Miriam Hägele, Kai Standvoss, Mihaela Chirica, Roderick Murray-Smith, and 5 more authors. Advances in Neural Information Processing Systems, 2023.
We present DiffInfinite, a hierarchical diffusion model that generates arbitrarily large histological images while preserving long-range structural correlations. Our approach first generates synthetic segmentation masks, which are subsequently used as conditions for the high-fidelity generative diffusion process. The proposed sampling method can be scaled up to any desired image size while requiring only small patches for fast training. Moreover, it can be parallelized more efficiently than previous large-content generation methods while avoiding tiling artefacts. The training leverages classifier-free guidance to augment a small, sparsely annotated dataset with unlabelled data. Our method alleviates unique challenges in histopathological imaging practice: large-scale information, costly manual annotation, and protective data handling. The biological plausibility of DiffInfinite data is validated through a survey of ten experienced pathologists and a downstream segmentation task. Furthermore, the model scores strongly on anti-copying metrics, which is beneficial for the protection of patient data.
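Classifier-free guidance combines a conditional and an unconditional noise prediction. The sketch shows the standard combination. It also shows one plausible way for unlabelled images to feed the unconditional branch during training. The abstract does not give DiffInfinite's exact parametrization, so the function names, the null-mask convention and `p_uncond` are assumptions.

```python
import numpy as np


def cfg_noise(eps_uncond: np.ndarray, eps_cond: np.ndarray, guidance_scale: float) -> np.ndarray:
    """Move from the unconditional prediction toward (and past) the conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def training_condition(mask, p_uncond: float, null_mask: np.ndarray, rng: np.random.Generator):
    """Pick the conditioning mask for one training sample (hypothetical convention).

    Unlabelled images (mask is None) always train the unconditional branch;
    labelled ones are dropped to the null condition with probability p_uncond.
    """
    if mask is None or rng.random() < p_uncond:
        return null_mask
    return mask
```

A scale of 0 recovers the unconditional prediction and a scale of 1 the conditional one. Larger values push samples more strongly toward the conditioning mask.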
- [TMLR] Data Models for Dataset Drift Controls in Machine Learning With Optical Images. Luis Oala*, Marco Aversa*, Gabriel Nobis, Kurt Willis, Yoan Neuenschwander, Michèle Buck, and 5 more authors. Transactions on Machine Learning Research, 2023.
Camera images are ubiquitous in machine learning research. They also play a central role in the delivery of important public services spanning medicine and environmental surveying. However, the application of machine learning models in these domains has been limited by robustness concerns. A primary failure mode is a drop in performance caused by differences between the training and deployment data. While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of machine learning’s primary object of interest: the data. This limits our ability to study and understand the relationship between data generation and downstream machine learning model performance in a physically accurate manner. In this study, we demonstrate how to overcome this limitation by pairing traditional machine learning with physical optics to obtain explicit and differentiable data models. We show how such data models can be constructed for image data and used to control downstream machine learning model performance related to dataset drift. The findings are distilled into three applications. First, drift synthesis enables the controlled generation of physically faithful drift test cases to power model selection and targeted generalization. Second, the gradient connection between the machine learning task model and the data model allows advanced, precise tolerancing of task model sensitivity to changes in the data generation. These drift forensics can be used to precisely specify the acceptable data environments in which a task model may be run. Third, drift optimization opens up the possibility of creating drifts that help the task model learn better and faster, effectively optimizing the data generating process itself to support the downstream machine vision task.
This is an interesting upgrade to existing imaging pipelines, which have traditionally been optimized for human viewers rather than for machine learning models. Alongside the data model code, we publicly release two datasets that we collected as part of this work. In total, the two datasets, Raw-Microscopy and Raw-Drone, comprise 1,488 scientifically calibrated reference raw sensor measurements, 8,928 raw intensity variations, and 17,856 images processed through twelve data models with different configurations. A guide to accessing the open code and datasets is available at https://github.com/aiaudit-org/raw2logit.
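The gradient connection between the data model and the task model can be illustrated with a toy example. A hypothetical two-parameter "ISP" (exposure gain, then gamma encoding) feeds a fixed logistic-regression task model. The chain rule then gives the sensitivity of the task loss to the gain. This is a NumPy sketch with a hand-derived gradient and is not taken from the raw2logit code.

```python
import numpy as np


def data_model(raw: np.ndarray, gain: float, gamma: float) -> np.ndarray:
    """Toy differentiable 'ISP' (hypothetical): exposure gain followed by gamma encoding."""
    return (gain * raw) ** (1.0 / gamma)


def loss_and_gain_grad(raw, labels, w, b, gain, gamma):
    """Binary cross-entropy of a fixed linear task model and its gradient w.r.t. gain."""
    img = data_model(raw, gain, gamma)  # (N, D) processed images
    logits = img @ w + b
    # Numerically stable BCE with logits: log(1 + e^z) - y z
    loss = np.mean(np.logaddexp(0.0, logits) - labels * logits)
    p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid(z), so dloss/dz = (p - y) / N
    dimg_dgain = img / (gamma * gain)  # d/dgain of (gain * raw)^(1/gamma)
    grad_gain = np.mean((p - labels) * (dimg_dgain @ w))
    return float(loss), float(grad_gain)
```

Comparing `grad_gain` against a central finite difference is a quick sanity check. Sweeping `gain` over a grid and recording task accuracy would produce simple drift test cases in the spirit of drift synthesis.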
2022
- [OBPDC] Data-centric AI workflow based on compressed raw images. Marco Aversa, Ziad Malik, Phillip Geier, Fabien Droz, Andres Upegui, Roderick Murray-Smith, and 2 more authors. Proceedings of OBPDC 2022, the 8th International Workshop on On-Board Payload Data Compression, 2022.
To extract the full potential of the high volume of image data coming from Earth observation, image compression is needed for transfer and storage, and artificial intelligence (AI) is needed for analysis. The promise of AI is to perform complex operations with low programming effort, naturally shifting the focus of machine learning system development from the code, i.e. the implementation of the neural network, to the training process, and in particular to the acquisition, selection and preparation of training data. Lossy compression (like many other image processing methods), however, was developed primarily to compress already processed images for visual inspection, without regard for damage to invisible image properties that play an important role in machine learning, such as higher-order statistics, correlations and bias. The Jetraw image format, in contrast, was designed to compress raw image data, preserving its statistics and embedding the camera calibration profile and noise model. These features facilitate the generation of accurate raw synthetic data. They allow “Jetraw functions” to take a Jetraw image as an argument and return another Jetraw image, complete with its newly computed calibration profile and noise model. Several of these functions can be chained to build complex operations while always maintaining metrologically correct data, i.e. values that have independent errors, are unbiased and have a well-defined noise model. Jetraw images and functions may be used in end-to-end models to generate synthetic data with statistics matching those of genuine raw images, and play an important role in data-centric AI methodologies. Here we show how these features are used for a machine learning task: the segmentation of cars in urban, suburban and rural environments.
Starting from a drone and airship image dataset in the Jetraw format (with calibrated sensor and optics), we use an end-to-end model to emulate realistic satellite raw images with on-demand parameters. First, we study the effect of various satellite parameters on the task’s performance as well as on the compressed image size. These parameters are satellite mirror size, focal length, pixel size and pattern, exposure time and atmospheric haze. Then, we discuss characterising and improving the performance and tolerances of the neural network through the use of on-the-fly generation of data that accurately reflects the statistics of the target system.
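Functions that return an image together with an updated noise model can be sketched with explicit variance propagation. The `CalibratedImage` class and the functions below are hypothetical illustrations, not the Jetraw API. They assume a Poisson-Gaussian noise model and independent pixel errors, matching the "metrologically correct" requirement stated above.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class CalibratedImage:
    """Hypothetical container (not the Jetraw API): values plus per-pixel noise variance."""
    values: np.ndarray
    variance: np.ndarray


def from_raw(electrons: np.ndarray, read_noise_var: float) -> CalibratedImage:
    """Poisson-Gaussian model: variance = signal (shot noise) + read-noise variance."""
    electrons = np.asarray(electrons, dtype=float)
    return CalibratedImage(electrons, electrons + read_noise_var)


def scale(img: CalibratedImage, gain: float) -> CalibratedImage:
    """Linear gain: Var[g X] = g^2 Var[X]."""
    return CalibratedImage(gain * img.values, gain ** 2 * img.variance)


def bin2x2(img: CalibratedImage) -> CalibratedImage:
    """Sum 2x2 blocks; with independent pixel errors the variances add."""
    h = (img.values.shape[0] // 2) * 2
    w = (img.values.shape[1] // 2) * 2

    def block_sum(a: np.ndarray) -> np.ndarray:
        return a[:h, :w].reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))

    return CalibratedImage(block_sum(img.values), block_sum(img.variance))
```

A chain such as `bin2x2(scale(from_raw(raw, read_noise_var=4.0), gain=0.5))` keeps the variance consistent with the values at every step. That consistency is what lets downstream statistics be trusted.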
2021
- [Phys. Rev. Lett.] Direct observation of fractal-dimensional percolation in the 3D cluster dynamics of a ferroelectric supercrystal. Ludovica Falsi, Marco Aversa, Fabrizio Di Mei, Davide Pierangeli, Feifei Xin, Aharon J. Agranat, and 1 more author. Physical Review Letters, 2021.
We perform percolation analysis of crossed-polarizer transmission images in a biased nanodisordered bulk KTN:Li perovskite. Two distinct percolative transitions are identified at two electric field thresholds. The low-field transition involves a directional fractal chain of dimension D = 1.65, while the high-field transition has a dimension D > 2. Direct cluster imaging in the volume is achieved using high-resolution orthographic 3D projections based on giant refraction. Percolation is attributed to a full-3D domain reorientation that mediates the transition from a ferroelectric supercrystal state to a disordered domain mosaic.
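Fractal dimensions such as D = 1.65 are commonly estimated by box counting. One counts the boxes of side s that touch the cluster and fits the slope of log N(s) against log(1/s). The 2D sketch below is a generic estimator on a binary mask and is an assumption, not necessarily the analysis used in the paper. A 3D version would add a third block axis, and only such a version can report D > 2.

```python
import numpy as np


def box_counting_dimension(mask: np.ndarray, sizes=(1, 2, 4, 8, 16)) -> float:
    """Estimate the box-counting dimension of a 2D binary mask."""
    mask = np.asarray(mask, dtype=bool)
    counts = []
    for s in sizes:
        h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
        blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(blocks.any(axis=(1, 3)).sum())  # boxes of side s touching the set
    # N(s) ~ s^(-D): the slope of log N versus log(1/s) estimates D
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes, dtype=float)), np.log(counts), 1)
    return float(slope)
```

The estimator returns 2 for a filled square and 1 for a straight line. Both are useful checks before applying it to cluster images.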
Workshop Papers
2024
- [ECCV] State-of-the-Art Fails in the Art of Damage Detection. Daniela Ivanova, Marco Aversa, Paul Henderson, and John Williamson. ECCV 2024 Vision for Art Workshop (VISART), 2024.
Accurately detecting and classifying damage in analogue media such as paintings, photographs, textiles, mosaics, and frescoes is essential for cultural heritage preservation. While machine learning models excel at correcting global degradation if the damage operator is known a priori, we show that they fail to predict where the damage is, even after supervised training; thus, reliable damage detection remains a challenge. We introduce DamBench, a dataset for damage detection in diverse analogue media, with over 11,000 annotations covering 15 damage types across various subjects and media. We evaluate CNN, Transformer, and text-guided diffusion segmentation models, revealing their limitations in generalising across media types.
- [ICML] Generative Fractional Diffusion Models. Gabriel Nobis, Maximilian Springenberg, Marco Aversa, Michael Detzel, Stefano Ermon, Shinichi Nakajima, and 5 more authors. ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling, 2024.
We generalize the continuous-time framework for score-based generative models from an underlying Brownian motion (BM) to an approximation of fractional Brownian motion (FBM). We derive a continuous reparameterization trick and the reverse-time model by representing FBM as a stochastic integral over a family of Ornstein-Uhlenbeck processes, which defines generative fractional diffusion models (GFDM) whose driving noise converges to a non-Markovian process of infinite quadratic variation. The Hurst index H ∈ (0, 1) of FBM enables control of the roughness of the distribution-transforming path. To the best of our knowledge, this is the first attempt to build a generative model upon a stochastic process with infinite quadratic variation.
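The role of the Hurst index can be made concrete with the exact FBM covariance, Cov[B_H(t), B_H(s)] = 0.5 (t^{2H} + s^{2H} - |t - s|^{2H}). The paper instead approximates FBM by a family of Ornstein-Uhlenbeck processes. The Cholesky sampler below is a different reference method: it is exact but costs O(n^3). The function names are hypothetical.

```python
import numpy as np


def fbm_covariance(t: np.ndarray, hurst: float) -> np.ndarray:
    """Cov[B_H(t_i), B_H(t_j)] = 0.5 (t_i^{2H} + t_j^{2H} - |t_i - t_j|^{2H})."""
    t = np.asarray(t, dtype=float)
    ti, tj = t[:, None], t[None, :]
    return 0.5 * (ti ** (2 * hurst) + tj ** (2 * hurst) - np.abs(ti - tj) ** (2 * hurst))


def sample_fbm(t: np.ndarray, hurst: float, n_paths: int, rng: np.random.Generator) -> np.ndarray:
    """Exact FBM sample paths on a grid of distinct times t > 0 via a Cholesky factor."""
    L = np.linalg.cholesky(fbm_covariance(t, hurst))  # C = L L^T
    return rng.standard_normal((n_paths, len(t))) @ L.T  # each row is (L z)^T
```

With H = 0.5 the covariance reduces to min(t, s), i.e. standard Brownian motion. H < 0.5 gives rougher paths and H > 0.5 smoother ones.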
2023
- [NeurIPS] Generative Fractional Diffusion Models. Gabriel Nobis, Maximilian Springenberg, Marco Aversa, Michael Detzel, Stefano Ermon, Shinichi Nakajima, and 5 more authors. NeurIPS 2023 Workshop on Diffusion Models, 2023.
- [ICML] Data Models for Dataset Drift Controls in Machine Learning With Optical Images. Luis Oala, Marco Aversa, Gabriel Nobis, Kurt Willis, Yoan Neuenschwander, Michèle Buck, and 5 more authors. International Conference on Machine Learning, Differentiable Almost Everything Workshop, 2023.
- [ICML] Data Models for Dataset Drift Controls in Machine Learning With Optical Images. Luis Oala, Marco Aversa, Gabriel Nobis, Kurt Willis, Yoan Neuenschwander, Michèle Buck, and 5 more authors. International Conference on Machine Learning, Spurious Correlations, Invariance, and Stability Workshop, 2023.
2022
- [NeurIPS Contributed Talk] Physical Data Models in Machine Learning Imaging Pipelines. Marco Aversa, Luis Oala, Christoph Clausen, Roderick Murray-Smith, and Bruno Sanguinetti. Advances in Neural Information Processing Systems, Machine Learning and the Physical Sciences Workshop, 2022.
Light propagates from the object through the optics to the sensor to create an image. Once the raw data is collected, it is processed through a complex image signal processing (ISP) pipeline to produce an image compatible with human perception. However, this processing is rarely considered in machine learning modelling because available benchmark datasets are generally not in raw format. This study shows how to embed the forward acquisition process into the machine learning model. We consider the optical system and the ISP separately. Following the acquisition process, we start from a drone and airship image dataset to emulate realistic satellite raw images with on-demand parameters. The end-to-end process is built to resemble the optics and sensor of the satellite setup; its parameters are satellite mirror size, focal length, pixel size and pattern, exposure time and atmospheric haze. After raw data collection, the ISP plays a crucial role in neural network robustness. We jointly optimize a parameterized differentiable image processing pipeline with a neural network model. This can speed up and stabilize classifier training, with margins of up to 20% in validation accuracy.
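A forward acquisition model of this kind can be sketched as a chain of simple physical steps. Everything below is an illustrative assumption rather than the paper's emulator:

- A Gaussian blur stands in for the PSF set by mirror size and focal length.
- Block averaging models a larger pixel pitch.
- Haze follows the common scattering model I = J t + A (1 - t).
- Exposure enters through Poisson shot noise, followed by Gaussian read noise.

The pixel pattern (e.g. a Bayer mosaic) is omitted. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel with radius of about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with reflect padding (stand-in for the optical PSF)."""
    k = gaussian_kernel(sigma)
    pad = np.pad(img, len(k) // 2, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def emulate_sensor(radiance, psf_sigma, bin_factor, transmission, airlight,
                   exposure, read_noise, rng):
    """Hypothetical chain: optics blur, coarser pixels, haze, shot noise, read noise."""
    img = blur(radiance, psf_sigma)
    f = bin_factor
    h, w = (img.shape[0] // f) * f, (img.shape[1] // f) * f
    img = img[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))  # larger pixel pitch
    img = transmission * img + (1.0 - transmission) * airlight  # haze: I = J t + A (1 - t)
    electrons = rng.poisson(np.clip(img * exposure, 0.0, None))  # photon shot noise
    return electrons + rng.normal(0.0, read_noise, electrons.shape)  # read noise
```

Every step except the noise sampling is deterministic and differentiable. Those steps could therefore be rewritten in an autodiff framework so that their parameters are optimized jointly with a network; the abstract applies this joint optimization to the ISP stage.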