
The Sequential Learning Group
A Diffusion Model for Regular Time Series Generation from Irregular Data with Completion and Masking
Research Paper | Gal Fadlon, Idan Arbiv, Nimrod Berman, Omri Azencot
A novel two-step framework for generating realistic time series from irregular data using Time Series Transformers and vision-based diffusion models, achieving 70% improvement in discriminative score and 85% reduction in computational cost.
Abstract
Generating realistic time series data is critical for applications in healthcare, finance, and science. However, irregular sampling and missing values present significant challenges. While prior methods address these irregularities, they often yield suboptimal results and incur high computational costs. Recent advances in regular time series generation, such as the diffusion-based ImagenTime model, demonstrate strong, fast, and scalable generative capabilities by transforming time series into image representations, making them a promising solution. However, extending ImagenTime to irregular sequences using simple masking introduces 'unnatural' neighborhoods, where missing values replaced by zeros disrupt the learning process. To overcome this, we propose a novel two-step framework: first, a Time Series Transformer completes irregular sequences, creating natural neighborhoods; second, a vision-based diffusion model with masking minimizes dependence on the completed values. This approach leverages the strengths of both completion and masking, enabling robust and efficient generation of realistic time series. Our method achieves state-of-the-art performance, with a relative improvement of 70% in discriminative score and 85% in computational cost.
Key Contributions
- We introduce a novel generative model for irregularly-sampled time series, leveraging vision-based diffusion approaches to efficiently and effectively handle sequences ranging from short to long lengths.
- In contrast to existing methods that assume completed information is drawn from the data distribution, we treat it as a weak conditioning signal and directly optimize on the observed signal using a masking strategy.
- Our approach achieves state-of-the-art performance across multiple generative tasks, delivering an average improvement of 70% in discriminative benchmarks while reducing computational requirements by 85% relative to competing methods.
The Challenge of Irregular Time Series
Time series data is essential in fields such as healthcare, finance, and science, supporting critical tasks like forecasting trends, detecting anomalies, and analyzing patterns. Beyond direct analysis, generating synthetic time series has become increasingly valuable for creating realistic proxies of private data, testing systems under new scenarios, exploring 'what-if' questions, and balancing datasets for training machine learning models. The ability to generate realistic sequences enables deeper insights and robust applications across diverse domains. In practice, however, time series data is often irregular, with missing values and unevenly spaced measurements. These irregularities arise from limitations in data collection processes, such as sensor failures, inconsistent sampling, or interruptions in monitoring systems.
Limitations of Existing Approaches
The synthesis of regular time series from irregular ones is a fundamental challenge, yet existing approaches remain scarce, with notable examples being GT-GAN and KoVAE. Unfortunately, these methods suffer from several limitations. First, they rely on generative adversarial networks (GANs) and variational autoencoders (VAEs), which have recently been surpassed in performance by diffusion-based tools. Second, both GT-GAN and KoVAE use a computationally demanding preprocessing step based on neural controlled differential equations (NCDEs), rendering them impractical for long time series. For instance, KoVAE requires approximately 6.5 times more training time than our approach. Third, these methods inherently assume that the data completed by the NCDE accurately reflects the true underlying distribution, which can introduce catastrophic errors when this assumption fails.
Our Two-Step Framework
To address these shortcomings, we base our approach on a recent diffusion model for time series, ImagenTime. This method maps time series data to images, enabling the use of powerful vision-based diffusion architectures. Leveraging a vision-based diffusion generator offers a significant advantage: regular time series can be generated from irregular ones using a simple masking mechanism. However, while this masking approach achieves strong results, we identify a significant limitation: missing values in the time series are mapped to zeros in the image, producing 'unnatural' neighborhoods that mix valid and invalid information. To address this issue, we propose a two-step generation process. In the first step, we complete the irregular series using our adaptation of an efficient Time Series Transformer (TST), significantly reducing computational overhead and enabling the generation of long time series. In the second step, we apply the masking approach described above.
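As a deliberately simplified illustration of the series-to-image idea, the sketch below folds a univariate series into a row-major grid and inverts the folding losslessly. The function names are hypothetical, and ImagenTime's actual transform (e.g. a delay-embedding variant) may differ in its details; treat this as a conceptual sketch only.

```python
def series_to_image(x, rows):
    """Fold a series of length rows*cols into a rows x cols grid.

    Simplified stand-in for the series-to-image transform: the real
    mapping used by ImagenTime may differ, but the key property is the
    same: the transform is invertible, so no information is lost.
    """
    if len(x) % rows != 0:
        raise ValueError("series length must be divisible by rows")
    cols = len(x) // rows
    return [x[r * cols:(r + 1) * cols] for r in range(rows)]

def image_to_series(img):
    """Invert the folding, recovering the original series exactly."""
    return [v for row in img for v in row]
```

Because the mapping is invertible, a vision diffusion model trained on the grids generates samples that map straight back to time series.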
Method Architecture

Figure: In the first step (top), we train a TST-based autoencoder, which we use during the second step (middle), where a vision diffusion model is trained with masking over non-active pixels. Inference (bottom) is done similarly to ImagenTime.
The Problem of Unnatural Image Neighborhoods
Unfortunately, the straightforward approach has a fundamental limitation: although non-active pixels are ignored during loss computation, they are still processed by the network. In practice, missing values are replaced with zeros, resulting in 'unnatural' pixel neighborhoods. Specifically, while zeros may occasionally occur in non-zero segments of a time series, their repeated presence is highly unlikely, leading to inconsistencies. In other words, masking is not applied at the architecture level, potentially hindering the effective learning of neural components. This can pose challenges for diffusion backbones, such as U-Nets with convolutional blocks, where the convolution kernels are not inherently masked and may inadvertently propagate errors from these artificial neighborhoods.
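The loss-level masking described above can be sketched as follows; this is a minimal illustration with hypothetical names, not the paper's implementation. Note that the missing pixels are excluded from the loss, yet the zero-filled input was still processed by the network that produced `pred`, which is exactly the limitation discussed here.

```python
def masked_mse(pred, target, mask):
    """Mean squared error over active pixels only.

    pred, target, and mask are flat lists of pixel values; mask[i] is 1
    for an observed pixel and 0 for a missing one. Missing pixels do not
    contribute to the loss, but nothing here masks them at the
    architecture level: the network upstream still saw the zeros.
    """
    terms = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(terms) / len(terms)
```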
Combining Completion and Masking
To create more natural pixel neighborhoods while remaining agnostic to the underlying architecture, we draw inspiration from the two-step process utilized in GT-GAN and KoVAE. Our approach adopts a two-step training scheme. First, we complete the missing values in the irregularly-sampled time series using TST, producing a regularly-sampled sequence. Next, we transform the completed time series into an image and apply denoising as in ImagenTime, with a key distinction: we apply the mask to the completed pixels during the loss computation. This novel combination of completion and masking addresses the two primary challenges of processing irregular sequences. On one hand, it creates natural neighborhoods, enabling convolutional kernels to learn effectively from values that closely align with the true data distribution. On the other hand, it ensures that the completed values are not fully relied upon by excluding them from the loss computation via the mask, striking a balance between utilizing and mitigating incomplete information.
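To make step 1 concrete, the sketch below uses linear interpolation as a stand-in for the completion model. The paper's actual completion is performed by a TST, so this is illustrative only; the point is that completed entries form natural neighborhoods, while the subsequent diffusion loss is masked to the originally observed positions.

```python
def linear_complete(x, observed):
    """Fill missing entries by linear interpolation between observed
    neighbors; edge gaps are held at the nearest observed value.

    Stand-in for step 1 only: the paper completes the series with a
    Time Series Transformer (TST). Step 2 then masks these completed
    positions out of the diffusion loss, so they act as weak
    conditioning rather than training targets.
    """
    known = [i for i, o in enumerate(observed) if o]
    out = list(x)
    for i in range(known[0]):               # leading gap: hold first observed value
        out[i] = x[known[0]]
    for i in range(known[-1] + 1, len(x)):  # trailing gap: hold last observed value
        out[i] = x[known[-1]]
    for a, b in zip(known, known[1:]):      # interior gaps: interpolate
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = (1 - t) * x[a] + t * x[b]
    return out
```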
State-of-the-Art Performance
We conduct a comprehensive evaluation of our approach on standard irregular time series benchmarks, comparing it against state-of-the-art methods. Our model consistently demonstrates superior generative performance, effectively bridging the gap between the regular and irregular settings. Furthermore, we extend the evaluation to medium-, long-, and ultra-long-sequence generation, assessing performance across 12 datasets and 12 tasks. The results highlight the robustness and efficiency of our method: it achieves consistent improvements over existing approaches, with an average gain of 70% on discriminative benchmarks and an 85% reduction in computational requirements relative to competing methods.
Computational Efficiency
Our approach offers significant computational advantages over existing methods. Unlike GT-GAN and KoVAE, which rely on computationally demanding NCDE preprocessing, our TST-based completion is far more efficient: KoVAE requires approximately 6.5 times more training time than our approach, as shown in our training time analysis. The two-step framework enables effective modeling of long time series while making minimal assumptions about the pre-completed data, yielding significantly better generation performance at reduced computational overhead. This efficiency is particularly important in practical applications where computational resources are limited.
Results & Comparison
Method Overview
Our two-step framework addresses the challenge of irregular time series generation by combining Time Series Transformer completion with vision-based diffusion models.
Step 1: TST Completion
Complete irregular sequences
Create natural neighborhoods
Efficient preprocessing
Step 2: Vision Diffusion
Mask-based denoising
Minimize dependence on completed values
Robust generation
Training Time Comparison
| Length | Model | ETTh1 | ETTh2 | ETTm1 | ETTm2 | Weather | Electricity | Energy | Sine | Mujoco |
|---|---|---|---|---|---|---|---|---|---|---|
| 24 | GT-GAN | 7.44 | 7.44 | 7.44 | 7.44 | 7.44 | 7.44 | 7.44 | 7.44 | 2.17 |
| 24 | KoVAE | 6.49 | 6.49 | 6.49 | 6.49 | 6.49 | 6.49 | 6.49 | 6.49 | 1.15 |
| 24 | Ours | 1.28 | 1.28 | 1.28 | 1.28 | 1.28 | 1.28 | 1.28 | 1.28 | 0.60 |
| 96 | KoVAE | 19.70 | 19.70 | 19.70 | 19.70 | 19.70 | 19.70 | 19.70 | 19.70 | - |
| 96 | Ours | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | - |
| 768 | KoVAE | 31.53 | 31.53 | 31.53 | 31.53 | 31.53 | 31.53 | 31.53 | 31.53 | - |
| 768 | Ours | 5.38 | 5.38 | 5.38 | 5.38 | 5.38 | 5.38 | 5.38 | 5.38 | - |
Table 1: Training time (in hours) for sequence lengths (24, 96, and 768), averaged over 30%, 50%, and 70% missing rates. Our method demonstrates significantly faster training times compared to existing approaches.
Discriminative Time Analysis

Figure 4: Discriminative time analysis showing how our method maintains consistent performance across different time periods compared to baseline methods.
Quantitative Results
Discriminative score (lower is better), sequence length 24:
| Model | ETTh1 | ETTh2 | ETTm1 | ETTm2 | Weather | Electricity | Energy | Sine | Stock |
|---|---|---|---|---|---|---|---|---|---|
| TimeGAN-Δt | 0.499 | 0.499 | 0.499 | 0.499 | 0.497 | 0.499 | 0.474 | 0.497 | 0.479 |
| GT-GAN | 0.471 | 0.369 | 0.412 | 0.366 | 0.481 | 0.427 | 0.325 | 0.338 | 0.249 |
| KoVAE | 0.197 | 0.081 | 0.050 | 0.067 | 0.332 | 0.498 | 0.323 | 0.043 | 0.118 |
| Ours | 0.037 | 0.009 | 0.012 | 0.011 | 0.057 | 0.384 | 0.080 | 0.010 | 0.008 |
Table 2: Averaged results over 30%, 50%, 70% missing rates for sequence length 24. Lower values are better. Our method consistently achieves state-of-the-art performance across all evaluation metrics and datasets.
Qualitative Evaluation

Figure 1: 2D t-SNE embeddings and probability density functions comparing real data vs synthetic data from our method and KoVAE. Our approach generates more realistic data distributions that closely match the original data patterns.
Ablation Studies
Completion Strategy Ablation
Imputation Methods Explained
- GN → NaN: Gaussian noise completion, which fills missing values with Gaussian noise
- 0 → NaN: zero-filling, which replaces missing values with zeros
- LI: linear interpolation, which estimates missing values by interpolating linearly between observed ones
- PI: polynomial interpolation, which uses polynomial fitting for missing-value estimation
- SI: stochastic imputation, which samples from a Gaussian distribution fitted to the non-missing values
- NCDE: Neural Controlled Differential Equations, an advanced learning-based imputation method
- CSDI: Conditional Score-based Diffusion Imputation, a diffusion-based imputation method
- GRU-D: GRU with Decay, a recurrent neural network with a decay mechanism for missing values
- Ours (TST): Time Series Transformer, our proposed lightweight and efficient completion method
This ablation study compares different imputation strategies for handling missing values. Simple methods (GN, zero-filling) create unnatural neighborhoods, while advanced methods (NCDE, CSDI) are computationally expensive. Our TST approach achieves the best balance of performance and efficiency.
| Model | Energy (Disc.) | Stock (Disc.) | Energy (Pred.) | Stock (Pred.) |
|---|---|---|---|---|
| GN → NaN | 0.457 | 0.102 | 0.058 | 0.014 |
| 0 → NaN | 0.269 | 0.158 | 0.051 | 0.014 |
| LI | 0.251 | 0.013 | 0.049 | 0.019 |
| PI | 0.201 | 0.012 | 0.053 | 0.016 |
| NCDE | 0.102 | 0.013 | 0.058 | 0.013 |
| CSDI | 0.088 | 0.012 | 0.048 | 0.013 |
| SI | 0.069 | 0.010 | 0.047 | 0.013 |
| GRU-D | 0.158 | 0.014 | 0.055 | 0.015 |
| Ours (TST) | 0.065 | 0.007 | 0.047 | 0.012 |
Completion Strategy Ablation: Comparison of different imputation methods with 50% drop-rate on Energy and Stock datasets. Our TST-based completion strategy achieves the best performance across both discriminative and predictive metrics.
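As one concrete example from the ablation, the SI baseline can be sketched as below, under our reading of "samples from a Gaussian distribution fitted to non-missing values"; the function name and details are hypothetical and the benchmark's exact variant may differ.

```python
import random

def stochastic_impute(x, observed, seed=0):
    """SI baseline sketch: replace each missing entry with a draw from a
    Gaussian whose mean and variance are fitted to the observed values.

    Illustrative only; the ablation's exact SI implementation may differ.
    """
    vals = [v for v, o in zip(x, observed) if o]
    mu = sum(vals) / len(vals)
    var = sum((v - mu) ** 2 for v in vals) / len(vals)
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [v if o else rng.gauss(mu, var ** 0.5)
            for v, o in zip(x, observed)]
```

Unlike zero-filling, such draws stay on the scale of the observed signal, which is why SI scores reasonably well in the table above despite its simplicity.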
Method Ablation
Method Variants Explained
- KoVAE + TST: the KoVAE baseline with Time Series Transformer completion as preprocessing
- TimeAutoDiff + TST: the TimeAutoDiff baseline with Time Series Transformer completion as preprocessing
- TransFusion + TST: the TransFusion baseline with Time Series Transformer completion as preprocessing
- Ours (Mask Only): our method using only the masking strategy, without TST completion
- Ours (Without Mask): our method using TST completion but without the masking strategy
- Ours (Full): our complete method with both TST completion and the masking strategy
This ablation study demonstrates the contribution of each component in our framework. The results show that both TST completion and masking strategy are essential for optimal performance.
| Model | Energy (30%) | Stock (30%) | Energy (50%) | Stock (50%) | Energy (70%) | Stock (70%) |
|---|---|---|---|---|---|---|
| KoVAE + TST | 0.399 | 0.109 | 0.407 | 0.064 | 0.408 | 0.037 |
| TimeAutoDiff + TST | 0.293 | 0.100 | 0.329 | 0.101 | 0.468 | 0.375 |
| TransFusion + TST | 0.201 | 0.050 | 0.279 | 0.058 | 0.423 | 0.065 |
| Ours (Mask Only) | 0.157 | 0.087 | 0.269 | 0.168 | 0.372 | 0.237 |
| Ours (Without Mask) | 0.158 | 0.025 | 0.307 | 0.045 | 0.444 | 0.013 |
| Ours | 0.048 | 0.007 | 0.065 | 0.007 | 0.128 | 0.007 |
Method Ablation: Discriminative scores comparing different method components for sequence length 24 with 30%, 50%, and 70% drop-rates on Energy and Stock datasets. Our full method consistently outperforms all ablation variants.
Noise Robustness
| Noise Level | Model | Weather (Disc.) | Weather (Pred.) | ETTh1 (Disc.) | ETTh1 (Pred.) | Stock (Disc.) | Stock (Pred.) | Energy (Disc.) | Energy (Pred.) |
|---|---|---|---|---|---|---|---|---|---|
| 0.1 | KoVAE | 0.426 | 0.056 | 0.225 | 0.073 | 0.235 | 0.016 | 0.434 | 0.067 |
| 0.1 | Ours | 0.061 | 0.052 | 0.024 | 0.034 | 0.007 | 0.012 | 0.065 | 0.047 |
| 0.15 | KoVAE | 0.488 | 0.092 | 0.377 | 0.077 | 0.341 | 0.092 | 0.493 | 0.093 |
| 0.15 | Ours | 0.416 | 0.029 | 0.407 | 0.059 | 0.282 | 0.023 | 0.467 | 0.053 |
| 0.2 | KoVAE | 0.491 | 0.096 | 0.440 | 0.084 | 0.352 | 0.121 | 0.496 | 0.123 |
| 0.2 | Ours | 0.485 | 0.035 | 0.456 | 0.062 | 0.340 | 0.027 | 0.457 | 0.057 |
Table 3: Discriminative and predictive scores for 50% missing rate on the Weather, ETTh1, Stock, and Energy datasets with injected noise levels (0.1, 0.15, and 0.2). Our method demonstrates superior robustness across different noise levels.
Cite Us
BibTeX Citation
@inproceedings{fadlon2025diffusionmodelregulartimeseries,
  title={A Diffusion Model for Regular Time Series Generation from Irregular Data with Completion and Masking},
  author={Gal Fadlon and Idan Arbiv and Nimrod Berman and Omri Azencot},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025},
  url={https://neurips.cc/virtual/2025/poster/118491}
}