
The Sequential Learning Group
A Diffusion Model for Regular Time Series Generation from Irregular Data with Completion and Masking
Research Paper | Gal Fadlon, Idan Arbiv, Nimrod Berman, Omri Azencot
A novel two-step framework for generating realistic time series from irregular data using Time Series Transformers and vision-based diffusion models, achieving 70% improvement in discriminative score and 85% reduction in computational cost.
Abstract
Generating realistic time series data is critical for applications in healthcare, finance, and science. However, irregular sampling and missing values present significant challenges. While prior methods address these irregularities, they often yield suboptimal results and incur high computational costs. Recent advances in regular time series generation, such as the diffusion-based ImagenTime model, demonstrate strong, fast, and scalable generative capabilities by transforming time series into image representations, making them a promising solution. However, extending ImagenTime to irregular sequences using simple masking introduces 'unnatural' neighborhoods, where missing values replaced by zeros disrupt the learning process. To overcome this, we propose a novel two-step framework: first, a Time Series Transformer completes irregular sequences, creating natural neighborhoods; second, a vision-based diffusion model with masking minimizes dependence on the completed values. This approach leverages the strengths of both completion and masking, enabling robust and efficient generation of realistic time series. Our method achieves state-of-the-art performance, with a relative improvement of 70% in discriminative score and 85% in computational cost.
Key Contributions
- We introduce a novel generative model for irregularly-sampled time series, leveraging vision-based diffusion approaches to efficiently and effectively handle sequences ranging from short to long lengths.
- In contrast to existing methods that assume completed information is drawn from the data distribution, we treat it as a weak conditioning signal and directly optimize on the observed signal using a masking strategy.
- Our approach achieves state-of-the-art performance across multiple generative tasks, delivering an average improvement of 70% in discriminative benchmarks while reducing computational requirements by 85% relative to competing methods.
The Challenge of Irregular Time Series
Time series data is essential in fields such as healthcare, finance, and science, supporting critical tasks like forecasting trends, detecting anomalies, and analyzing patterns. Beyond direct analysis, generating synthetic time series has become increasingly valuable for creating realistic proxies of private data, testing systems under new scenarios, exploring 'what-if' questions, and balancing datasets for training machine learning models. The ability to generate realistic sequences enables deeper insights and robust applications across diverse domains. In practice, however, time series data is often irregular, with missing values and unevenly spaced measurements. These irregularities arise from limitations in data collection processes, such as sensor failures, inconsistent sampling, or interruptions in monitoring systems.
Limitations of Existing Approaches
The synthesis of regular time series from irregular ones is a fundamental challenge, yet existing approaches remain scarce, with notable examples being GT-GAN and KoVAE. Unfortunately, these methods suffer from several limitations. First, they rely on generative adversarial networks (GANs) and variational autoencoders (VAEs), which have recently been surpassed in performance by diffusion-based tools. Second, both GT-GAN and KoVAE use a computationally demanding preprocessing step based on neural controlled differential equations (NCDEs), rendering them impractical for long time series. For instance, KoVAE requires approximately 6.5 times more training time than our approach. Third, these methods inherently assume that the data completed by the NCDE accurately reflects the true underlying distribution, which can introduce catastrophic errors when this assumption fails.
Our Two-Step Framework
To address these shortcomings, we base our approach on a recent diffusion model for time series, ImagenTime. This method maps time series data to images, enabling the use of powerful vision-based diffusion architectures. Leveraging a vision-based diffusion generator offers a significant advantage: regular time series can be generated from irregular ones using a simple masking mechanism. However, while this masking approach achieves strong results, we identify a significant limitation: missing values in the time series are mapped to zeros in the image, producing 'unnatural' neighborhoods that mix valid and invalid information. To address this issue, we propose a two-step generation process. In the first step, we complete the irregular series using our adaptation of an efficient Time Series Transformer (TST), significantly reducing computational overhead and enabling the generation of long time series. In the second step, we apply the masking approach described above.
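As a deliberately simplified illustration of the series-to-image idea, the sketch below folds a univariate series into a row-major grid and inverts the folding losslessly. The function names are hypothetical, and ImagenTime's actual transform (e.g. a delay-embedding variant) may differ in its details; treat this as a conceptual sketch only.

```python
def series_to_image(x, rows):
    """Fold a series of length rows*cols into a rows x cols grid.

    Simplified stand-in for the series-to-image transform: the real
    mapping used by ImagenTime may differ, but the key property is the
    same: the transform is invertible, so no information is lost.
    """
    if len(x) % rows != 0:
        raise ValueError("series length must be divisible by rows")
    cols = len(x) // rows
    return [x[r * cols:(r + 1) * cols] for r in range(rows)]

def image_to_series(img):
    """Invert the folding, recovering the original series exactly."""
    return [v for row in img for v in row]
```

Because the mapping is invertible, a vision diffusion model trained on the grids generates samples that map straight back to time series.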
Method Architecture

Figure: In the first step (top), we train a TST-based autoencoder, which we use during the second step (middle), where a vision diffusion model is trained with masking over non-active pixels. Inference (bottom) is done similarly to ImagenTime.
The Problem of Unnatural Image Neighborhoods
Unfortunately, the straightforward approach has a fundamental limitation: although non-active pixels are ignored during loss computation, they are still processed by the network. In practice, missing values are replaced with zeros, resulting in 'unnatural' pixel neighborhoods. Specifically, while zeros may occasionally occur in non-zero segments of a time series, their repeated presence is highly unlikely, leading to inconsistencies. In other words, masking is not applied at the architecture level, potentially hindering the effective learning of neural components. This can pose challenges for diffusion backbones, such as U-Nets with convolutional blocks, where the convolution kernels are not inherently masked and may inadvertently propagate errors from these artificial neighborhoods.
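The loss-level masking described above can be sketched as follows; this is a minimal illustration with hypothetical names, not the paper's implementation. Note that the missing pixels are excluded from the loss, yet the zero-filled input was still processed by the network that produced `pred`, which is exactly the limitation discussed here.

```python
def masked_mse(pred, target, mask):
    """Mean squared error over active pixels only.

    pred, target, and mask are flat lists of pixel values; mask[i] is 1
    for an observed pixel and 0 for a missing one. Missing pixels do not
    contribute to the loss, but nothing here masks them at the
    architecture level: the network upstream still saw the zeros.
    """
    terms = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(terms) / len(terms)
```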
Combining Completion and Masking
To create more natural pixel neighborhoods while remaining agnostic to the underlying architecture, we draw inspiration from the two-step process utilized in GT-GAN and KoVAE. Our approach adopts a two-step training scheme. First, we complete the missing values in the irregularly-sampled time series using TST, producing a regularly-sampled sequence. Next, we transform the completed time series into an image and apply denoising as in ImagenTime, with a key distinction: we apply the mask to the completed pixels during the loss computation. This novel combination of completion and masking addresses the two primary challenges of processing irregular sequences. On one hand, it creates natural neighborhoods, enabling convolutional kernels to learn effectively from values that closely align with the true data distribution. On the other hand, it ensures that the completed values are not fully relied upon by excluding them from the loss computation via the mask, striking a balance between utilizing and mitigating incomplete information.
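To make step 1 concrete, the sketch below uses linear interpolation as a stand-in for the completion model. The paper's actual completion is performed by a TST, so this is illustrative only; the point is that completed entries form natural neighborhoods, while the subsequent diffusion loss is masked to the originally observed positions.

```python
def linear_complete(x, observed):
    """Fill missing entries by linear interpolation between observed
    neighbors; edge gaps are held at the nearest observed value.

    Stand-in for step 1 only: the paper completes the series with a
    Time Series Transformer (TST). Step 2 then masks these completed
    positions out of the diffusion loss, so they act as weak
    conditioning rather than training targets.
    """
    known = [i for i, o in enumerate(observed) if o]
    out = list(x)
    for i in range(known[0]):               # leading gap: hold first observed value
        out[i] = x[known[0]]
    for i in range(known[-1] + 1, len(x)):  # trailing gap: hold last observed value
        out[i] = x[known[-1]]
    for a, b in zip(known, known[1:]):      # interior gaps: interpolate
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = (1 - t) * x[a] + t * x[b]
    return out
```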
State-of-the-Art Performance
We conduct a comprehensive evaluation of our approach on standard irregular time series benchmarks, comparing it against state-of-the-art methods. Our model consistently demonstrates superior generative performance, effectively bridging the gap between the regular and irregular settings. Furthermore, we extend the evaluation to medium-, long-, and ultra-long-sequence generation, assessing performance across 12 datasets and 12 tasks. The results highlight the robustness and efficiency of our method: it achieves consistent improvements over existing approaches, with an average gain of 70% on discriminative benchmarks and an 85% reduction in computational requirements relative to competing methods.
Computational Efficiency
Our approach offers significant computational advantages over existing methods. Unlike GT-GAN and KoVAE, which rely on computationally demanding NCDE preprocessing, our TST-based completion is far more efficient: KoVAE requires approximately 6.5 times more training time than our approach, as shown in our training time analysis. The two-step framework enables effective modeling of long time series while making minimal assumptions about the pre-completed data, yielding significantly better generation performance at reduced computational overhead. This efficiency is particularly important in practical applications where computational resources are limited.
Results & Comparison
Method Overview
Our two-step framework addresses the challenge of irregular time series generation by combining Time Series Transformer completion with vision-based diffusion models.
Step 1: TST Completion
Complete irregular sequences
Create natural neighborhoods
Efficient preprocessing
Step 2: Vision Diffusion
Mask-based denoising
Minimize dependence on completed values
Robust generation
Training Time Comparison
| Length | Model | ETTh1 | ETTh2 | ETTm1 | ETTm2 | Weather | Electricity | Energy | Sine | Mujoco |
|---|---|---|---|---|---|---|---|---|---|---|
| 24 | GT-GAN | 7.44 | 7.44 | 7.44 | 7.44 | 7.44 | 7.44 | 7.44 | 7.44 | 2.17 |
| 24 | KoVAE | 6.49 | 6.49 | 6.49 | 6.49 | 6.49 | 6.49 | 6.49 | 6.49 | 1.15 |
| 24 | Ours | 1.28 | 1.28 | 1.28 | 1.28 | 1.28 | 1.28 | 1.28 | 1.28 | 0.60 |
| 96 | KoVAE | 19.70 | 19.70 | 19.70 | 19.70 | 19.70 | 19.70 | 19.70 | 19.70 | - |
| 96 | Ours | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | - |
| 768 | KoVAE | 31.53 | 31.53 | 31.53 | 31.53 | 31.53 | 31.53 | 31.53 | 31.53 | - |
| 768 | Ours | 5.38 | 5.38 | 5.38 | 5.38 | 5.38 | 5.38 | 5.38 | 5.38 | - |
Table 1: Training time (in hours) for sequence lengths (24, 96, and 768), averaged over 30%, 50%, and 70% missing rates. Our method demonstrates significantly faster training times compared to existing approaches.
Discriminative Time Analysis

Figure 4: Discriminative time analysis showing how our method maintains consistent performance across different time periods compared to baseline methods.
Quantitative Results
Discriminative score (lower is better), sequence length 24:
| Model | ETTh1 | ETTh2 | ETTm1 | ETTm2 | Weather | Electricity | Energy | Sine | Stock |
|---|---|---|---|---|---|---|---|---|---|
| TimeGAN-Δt | 0.499 | 0.499 | 0.499 | 0.499 | 0.497 | 0.499 | 0.474 | 0.497 | 0.479 |
| GT-GAN | 0.471 | 0.369 | 0.412 | 0.366 | 0.481 | 0.427 | 0.325 | 0.338 | 0.249 |
| KoVAE | 0.197 | 0.081 | 0.050 | 0.067 | 0.332 | 0.498 | 0.323 | 0.043 | 0.118 |
| Ours | 0.037 | 0.009 | 0.012 | 0.011 | 0.057 | 0.384 | 0.080 | 0.010 | 0.008 |
Table 2: Averaged results over 30%, 50%, 70% missing rates for sequence length 24. Lower values are better. Our method consistently achieves state-of-the-art performance across all evaluation metrics and datasets.
Qualitative Evaluation

Figure 1: 2D t-SNE embeddings and probability density functions comparing real data vs synthetic data from our method and KoVAE. Our approach generates more realistic data distributions that closely match the original data patterns.
Ablation Studies
Completion Strategy Ablation
Imputation Methods Explained
- GN → NaN: Gaussian noise completion, which fills missing values with Gaussian noise
- 0 → NaN: zero-filling, which replaces missing values with zeros
- LI: linear interpolation, which estimates missing values by interpolating linearly between observed ones
- PI: polynomial interpolation, which uses polynomial fitting for missing-value estimation
- SI: stochastic imputation, which samples from a Gaussian distribution fitted to the non-missing values
- NCDE: Neural Controlled Differential Equations, an advanced learning-based imputation method
- CSDI: Conditional Score-based Diffusion Imputation, a diffusion-based imputation method
- GRU-D: GRU with Decay, a recurrent neural network with a decay mechanism for missing values
- Ours (TST): Time Series Transformer, our proposed lightweight and efficient completion method
This ablation study compares different imputation strategies for handling missing values. Simple methods (GN, zero-filling) create unnatural neighborhoods, while advanced methods (NCDE, CSDI) are computationally expensive. Our TST approach achieves the best balance of performance and efficiency.
| Model | Energy (Disc.) | Stock (Disc.) | Energy (Pred.) | Stock (Pred.) |
|---|---|---|---|---|
| GN → NaN | 0.457 | 0.102 | 0.058 | 0.014 |
| 0 → NaN | 0.269 | 0.158 | 0.051 | 0.014 |
| LI | 0.251 | 0.013 | 0.049 | 0.019 |
| PI | 0.201 | 0.012 | 0.053 | 0.016 |
| NCDE | 0.102 | 0.013 | 0.058 | 0.013 |
| CSDI | 0.088 | 0.012 | 0.048 | 0.013 |
| SI | 0.069 | 0.010 | 0.047 | 0.013 |
| GRU-D | 0.158 | 0.014 | 0.055 | 0.015 |
| Ours (TST) | 0.065 | 0.007 | 0.047 | 0.012 |
Completion Strategy Ablation: Comparison of different imputation methods with 50% drop-rate on Energy and Stock datasets. Our TST-based completion strategy achieves the best performance across both discriminative and predictive metrics.
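As one concrete example from the ablation, the SI baseline can be sketched as below, under our reading of "samples from a Gaussian distribution fitted to non-missing values"; the function name and details are hypothetical and the benchmark's exact variant may differ.

```python
import random

def stochastic_impute(x, observed, seed=0):
    """SI baseline sketch: replace each missing entry with a draw from a
    Gaussian whose mean and variance are fitted to the observed values.

    Illustrative only; the ablation's exact SI implementation may differ.
    """
    vals = [v for v, o in zip(x, observed) if o]
    mu = sum(vals) / len(vals)
    var = sum((v - mu) ** 2 for v in vals) / len(vals)
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [v if o else rng.gauss(mu, var ** 0.5)
            for v, o in zip(x, observed)]
```

Unlike zero-filling, such draws stay on the scale of the observed signal, which is why SI scores reasonably well in the table above despite its simplicity.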
Method Ablation
Method Variants Explained
- KoVAE + TST: the KoVAE baseline with Time Series Transformer completion as preprocessing
- TimeAutoDiff + TST: the TimeAutoDiff baseline with Time Series Transformer completion as preprocessing
- TransFusion + TST: the TransFusion baseline with Time Series Transformer completion as preprocessing
- Ours (Mask Only): our method using only the masking strategy, without TST completion
- Ours (Without Mask): our method using TST completion but without the masking strategy
- Ours (Full): our complete method with both TST completion and the masking strategy
This ablation study demonstrates the contribution of each component in our framework. The results show that both TST completion and masking strategy are essential for optimal performance.
| Model | Energy (30%) | Stock (30%) | Energy (50%) | Stock (50%) | Energy (70%) | Stock (70%) |
|---|---|---|---|---|---|---|
| KoVAE + TST | 0.399 | 0.109 | 0.407 | 0.064 | 0.408 | 0.037 |
| TimeAutoDiff + TST | 0.293 | 0.100 | 0.329 | 0.101 | 0.468 | 0.375 |
| TransFusion + TST | 0.201 | 0.050 | 0.279 | 0.058 | 0.423 | 0.065 |
| Ours (Mask Only) | 0.157 | 0.087 | 0.269 | 0.168 | 0.372 | 0.237 |
| Ours (Without Mask) | 0.158 | 0.025 | 0.307 | 0.045 | 0.444 | 0.013 |
| Ours | 0.048 | 0.007 | 0.065 | 0.007 | 0.128 | 0.007 |
Method Ablation: Discriminative scores comparing different method components for sequence length 24 with 30%, 50%, and 70% drop-rates on Energy and Stock datasets. Our full method consistently outperforms all ablation variants.
Noise Robustness
| Noise Level | Model | Weather (Disc.) | Weather (Pred.) | ETTh1 (Disc.) | ETTh1 (Pred.) | Stock (Disc.) | Stock (Pred.) | Energy (Disc.) | Energy (Pred.) |
|---|---|---|---|---|---|---|---|---|---|
| 0.1 | KoVAE | 0.426 | 0.056 | 0.225 | 0.073 | 0.235 | 0.016 | 0.434 | 0.067 |
| 0.1 | Ours | 0.061 | 0.052 | 0.024 | 0.034 | 0.007 | 0.012 | 0.065 | 0.047 |
| 0.15 | KoVAE | 0.488 | 0.092 | 0.377 | 0.077 | 0.341 | 0.092 | 0.493 | 0.093 |
| 0.15 | Ours | 0.416 | 0.029 | 0.407 | 0.059 | 0.282 | 0.023 | 0.467 | 0.053 |
| 0.2 | KoVAE | 0.491 | 0.096 | 0.440 | 0.084 | 0.352 | 0.121 | 0.496 | 0.123 |
| 0.2 | Ours | 0.485 | 0.035 | 0.456 | 0.062 | 0.340 | 0.027 | 0.457 | 0.057 |
Table 3: Discriminative and predictive scores for 50% missing rate on the Weather, ETTh1, Stock, and Energy datasets with injected noise levels (0.1, 0.15, and 0.2). Our method demonstrates superior robustness across different noise levels.
Cite Us
BibTeX Citation
@inproceedings{fadlon2025diffusionmodelregulartimeseries,
  title={A Diffusion Model for Regular Time Series Generation from Irregular Data with Completion and Masking},
  author={Gal Fadlon and Idan Arbiv and Nimrod Berman and Omri Azencot},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025},
  url={https://neurips.cc/virtual/2025/poster/118491}
}