Proof-of-concept research

OpenSCS Foundation

Open foundation models for severe convective storm science.

OpenSCS is an open scientific AI initiative for modeling severe convective storms, including hail, thunderstorm winds, and tornadoes. The project combines atmospheric reanalysis, radar-derived products, forecast data, and storm reports to build reusable datasets, model weights, hindcast pipelines, benchmarks, and evaluation tools.

Early research prototype. Not intended for operational warnings.

Three-panel hail hazard climatology comparison.
2000–2015 training period
2016–2025 test period
3 hazards: hail, wind, tornado

The strongest current evidence is spatial: the models recover broad severe-weather corridors and high-risk regions seen in observed/report-based climatology.

Current stage: Proof of concept
Training period: 2000–2015
Test period: 2016–2025
Hazards: Hail, thunderstorm wind, tornado
Intended use: Research, benchmarking, education, climate-risk analysis
Important limitation: Not intended for operational warnings

Current stage: proof of concept. The models shown below are early research outputs evaluated on a held-out test period. They are intended for scientific research and benchmarking, not for issuing warnings.

System overview

Open severe-weather AI pipeline

OpenSCS connects atmospheric data, radar products, storm reports, model training, hindcast generation, and evaluation into a reproducible severe-weather AI workflow.

🌎

Inputs

  • ERA5 atmospheric fields
  • ECMWF forecasts and ensembles
  • MRMS radar products
  • NOAA Storm Events reports
🧠

Models

  • Peril-specific prototype models
  • Balanced severe-weather sampling
  • U-Net-style spatial models
  • Foundation-model research direction
📊

Outputs

  • Hindcast climatologies
  • Spatial evaluation metrics
  • Reliability diagnostics
  • Open benchmarks and dashboards
Motivation

Why OpenSCS matters

Severe-weather AI needs reproducible workflows, shared benchmarks, transparent evaluation, and clear documentation.

The research gap

Severe convective storms cause major societal and economic impacts, but severe-weather AI research is often fragmented across separate datasets, labeling methods, model code, evaluation scripts, and visualization tools. This makes it hard to compare models, reproduce results, and build shared benchmarks.

The OpenSCS goal

OpenSCS aims to make severe-weather AI more reproducible by developing open workflows for data preparation, label generation, model training, hindcast generation, evaluation, and documentation. The long-term goal is to support open scientific benchmarking for severe convective storm hazards.

  • Reproducible data and label generation
  • Open model training and hindcast workflows
  • Shared evaluation metrics and benchmark plots
  • Transparent model cards, data cards, and limitations
Prototype results

Prototype Severe-Weather Hazard Models

We evaluated prototype models for hail, thunderstorm wind, and tornado. The strongest early evidence is spatial: the models recover broad severe-weather corridors and high-risk regions seen in observed/report-based climatology. These results are preliminary and are intended to guide continued model development, calibration, and open benchmarking.

Hail Hazard Model

Selected prototype checkpoint
Magnitude model
Hail hazard model climatology comparison.

The hail model captures the main severe-hail corridor and reproduces many of the same high-risk regions seen in observed/report-based climatology. After filtering out sub-severe hail below 0.5 inch, the selected model shows strong spatial agreement, strong regional ranking skill, and low event-cell error compared with the other evaluated hail checkpoints.

  • Spatial agreement: 0.84 (map pattern match)
  • Spatial ranking agreement: 0.91 (risk ordering)
  • Top-10% hotspot capture: 0.71 (high-risk regions found)
  • Neighborhood agreement: 0.85 (nearby spatial match)
  • Event-cell MAE / RMSE: 0.47 / 0.57 inch (lower is better)

Figure summary: Observed hail climatology, model hindcast climatology, and model output with observed hotspot overlay. The model captures the broad severe-hail corridor and many of the strongest hail-risk regions.

What this shows
  • The observed and model climatologies show similar broad hail-risk geography.
  • The model captures many of the strongest severe-hail regions.
  • The 0.5 inch filter focuses the evaluation on meaningful hail signal.
  • Remaining work includes calibration, false-alarm reduction, and higher-resolution event structure.
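
The 0.5 inch filter above amounts to a simple threshold on report magnitude; a minimal sketch, where the `hail_inch` field name is a hypothetical placeholder for however the reports store hail size:

```python
def filter_severe_hail(reports, min_inch=0.5):
    """Keep only reports at or above the severe-hail size threshold.

    `reports` is a list of dicts; `hail_inch` is a hypothetical field
    name for the reported hail size in inches.
    """
    return [r for r in reports if r["hail_inch"] >= min_inch]

# Drops the sub-severe 0.25 inch report, keeps the other two.
reports = [{"hail_inch": 0.25}, {"hail_inch": 0.75}, {"hail_inch": 1.5}]
severe = filter_severe_hail(reports)
```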

Thunderstorm Wind Hazard Model

Selected prototype checkpoint
Magnitude model
Thunderstorm wind hazard model climatology comparison.

The thunderstorm-wind model learns coherent regional wind-risk structure and captures many of the same high-risk areas found in observed/report-based climatology. The strongest result is spatial: the model recovers broad geographic risk patterns and high-risk wind corridors across the test period.

  • Spatial agreement: 0.77 (map pattern match)
  • Spatial ranking agreement: 0.70 (risk ordering)
  • Top-10% hotspot capture: 0.70 (high-risk regions found)
  • Neighborhood agreement: 0.89 (nearby spatial match)
  • Event-cell MAE / RMSE: 16.00 / 23.16 m/s (lower is better)

Figure summary: Observed thunderstorm-wind climatology, model hindcast climatology, and model output with observed hotspot overlay. The model shows strong neighborhood agreement for high-risk wind regions.

What this shows
  • The model captures broad regional wind-risk structure.
  • Neighborhood agreement is strong, meaning predicted high-risk areas are close to observed/report-based high-risk areas.
  • Error metrics are useful, but the clearest current signal is spatial.
  • Remaining work includes intensity calibration and false-alarm reduction.

Tornado Probability Model

Selected prototype checkpoint
Rare-event probability model
Tornado probability model climatology comparison.

Tornado prediction is the most challenging task because tornadoes are rare, localized events. Instead of treating this as a simple accuracy problem, we evaluate it as a rare-event probability task. The selected tornado model learns a meaningful broad spatial signal and shows the strongest probability skill among the evaluated tornado checkpoints, while calibration and false-alarm reduction remain active areas of development.

  • Spatial agreement: 0.69 (broad probability signal)
  • Spatial ranking agreement: 0.46 (rare-event ordering)
  • AUPRC: 0.287 (rare-event skill)
  • Brier score: 2.6e-5 (probability error)
  • Balanced Brier: 0.072 (rare-event weighted probability error)

Figure summary: Observed tornado climatology, model probability climatology, and model output with observed hotspot overlay. Tornadoes are rare and highly localized, so this prototype emphasizes broad probability signal and rare-event ranking skill.

What this shows
  • Tornadoes are much rarer and more localized than hail or wind reports.
  • Probability metrics are more meaningful than raw accuracy or MAE/RMSE.
  • The model learns a broad tornado-risk signal.
  • Calibration and false-alarm reduction remain major next steps.
Metric guide

How to read the metrics

These explanations are written for non-technical readers. Higher is better for agreement, ranking, hotspot capture, and AUPRC. Lower is better for error and Brier scores.

Spatial agreement

Measures whether the model’s geographic risk map matches the observed/report-based map. A high value means the model is learning the right broad severe-weather regions.
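
The page does not pin down the exact formula; a common choice for this kind of map pattern match is the Pearson pattern correlation between the two flattened climatology maps, sketched here under that assumption:

```python
from math import sqrt

def pattern_correlation(observed, modeled):
    """Pearson correlation between two flattened climatology maps.

    `observed` and `modeled` are equal-length lists of grid-cell values.
    A value near 1 means the model reproduces the broad spatial pattern.
    """
    n = len(observed)
    mo = sum(observed) / n
    mm = sum(modeled) / n
    cov = sum((o - mo) * (m - mm) for o, m in zip(observed, modeled))
    vo = sum((o - mo) ** 2 for o in observed)
    vm = sum((m - mm) ** 2 for m in modeled)
    return cov / sqrt(vo * vm)

# A model map that tracks the observed corridor scores near 1.
obs = [0.1, 0.2, 0.9, 0.8, 0.2, 0.1]
mod = [0.2, 0.3, 0.8, 0.7, 0.3, 0.2]
```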

Spatial ranking agreement

Measures whether the model ranks places correctly from lower risk to higher risk. This helps when exact magnitudes are imperfect but the model still identifies the right high-risk areas.
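
A standard way to measure this kind of risk ordering is the Spearman rank correlation (Pearson correlation of the ranks); a self-contained sketch, assuming that is the metric behind the number:

```python
def _pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def ranks(values):
    """Rank values from 1 (lowest) to n (highest); ties get average ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r

def spearman(observed, modeled):
    """Spearman rank correlation: correlate the ranks, not the values."""
    return _pearson(ranks(observed), ranks(modeled))
```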

Top-10% hotspot capture

Looks at the strongest observed risk regions and asks how many of those hotspots are also captured by the model.
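
One plausible reading of this metric, assumed in the sketch below, is to take the top 10% of observed cells and ask what fraction also land in the model's top 10%:

```python
def hotspot_capture(observed, modeled, frac=0.10):
    """Fraction of the observed top-`frac` cells that also fall in the
    model's top-`frac` cells (one plausible reading of the metric)."""
    k = max(1, int(len(observed) * frac))
    top_obs = set(sorted(range(len(observed)), key=lambda i: observed[i])[-k:])
    top_mod = set(sorted(range(len(modeled)), key=lambda i: modeled[i])[-k:])
    return len(top_obs & top_mod) / k
```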

Neighborhood agreement

A spatially forgiving metric. It gives credit when the model places risk close to the observed region, even if it is not on the exact same grid cell.
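
A minimal sketch of one such spatially forgiving check: credit an observed hotspot cell when the model flags a hotspot within a small window of it. The one-cell radius and Chebyshev-distance choice are illustrative assumptions, not the project's documented settings:

```python
def neighborhood_capture(obs_hot, mod_hot, radius=1):
    """Fraction of observed hotspot cells with a modeled hotspot within
    `radius` grid cells (Chebyshev distance).

    `obs_hot` and `mod_hot` are sets of (row, col) cells flagged high-risk.
    """
    if not obs_hot:
        return 1.0
    hits = 0
    for (r, c) in obs_hot:
        if any(abs(r - rm) <= radius and abs(c - cm) <= radius
               for (rm, cm) in mod_hot):
            hits += 1
    return hits / len(obs_hot)
```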

MAE and RMSE

For hail and wind, MAE is the average size or speed error where event activity exists. RMSE is similar, but penalizes larger errors more strongly.
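
"Where event activity exists" can be read as masking to observed-event cells before averaging; a sketch under that assumption:

```python
from math import sqrt

def event_cell_errors(observed, modeled):
    """MAE and RMSE over grid cells where events were observed (value > 0)."""
    pairs = [(o, m) for o, m in zip(observed, modeled) if o > 0]
    mae = sum(abs(o - m) for o, m in pairs) / len(pairs)
    rmse = sqrt(sum((o - m) ** 2 for o, m in pairs) / len(pairs))
    return mae, rmse

# The zero-value cell is excluded; errors come from the two event cells.
mae, rmse = event_cell_errors([0.0, 1.0, 2.0], [0.5, 1.5, 1.5])
```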

AUPRC

Area Under the Precision–Recall Curve. This is useful for rare events like tornadoes because normal accuracy can be misleading when most grid cells are non-events.
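
AUPRC can be computed by sweeping a threshold down the sorted scores and accumulating precision over each step in recall; a from-scratch sketch (step-wise integration, no tie handling):

```python
def auprc(labels, scores):
    """Area under the precision-recall curve, step-wise integration.

    `labels` are 0/1 outcomes per grid cell; `scores` are predicted
    probabilities. Cells are swept from highest score to lowest.
    """
    pairs = sorted(zip(scores, labels), reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    area = 0.0
    prev_recall = 0.0
    for _, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area
```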

Brier score

A probability error score. Lower is better. It measures whether predicted probabilities match what actually happened.
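
Concretely, the Brier score is the mean squared difference between the predicted probability and the 0/1 outcome:

```python
def brier(labels, probs):
    """Mean squared difference between predicted probability and outcome.

    `labels` are 0/1 outcomes; `probs` are predicted probabilities.
    0 is a perfect score; lower is better.
    """
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)
```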

Balanced Brier score

A Brier score adjusted to give rare tornado events more weight instead of being dominated by the huge number of non-event grid cells.
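
"Balanced Brier" is not a single standard metric; one plausible construction, assumed here, averages the squared error over event and non-event cells separately and then takes the mean of the two, so a handful of tornado cells is not swamped by millions of non-events:

```python
def balanced_brier(labels, probs):
    """One plausible 'balanced' Brier score (an assumed construction):
    average squared error over event cells and non-event cells
    separately, then take the mean of the two class averages."""
    pos = [(p - 1.0) ** 2 for y, p in zip(labels, probs) if y == 1]
    neg = [p ** 2 for y, p in zip(labels, probs) if y == 0]
    return 0.5 * (sum(pos) / len(pos) + sum(neg) / len(neg))
```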

For hail and thunderstorm wind, MAE and RMSE help summarize magnitude error. For tornado, AUPRC and Brier score are more appropriate because tornadoes are rare probability events.
Candidate evaluation

Evaluation Summary Across Candidate Models

We evaluated multiple prototype checkpoints for each severe-weather peril. Across the evaluated candidates, the strongest and most consistent evidence is spatial. Hail and thunderstorm wind models recover broad risk corridors and high-risk regions, while tornado remains a rare-event probability task where calibration and false-alarm reduction are ongoing development priorities.

Cross-model spatial pattern agreement chart.

Cross-Model Spatial Pattern Agreement

Compares how well candidate models reproduce broad observed/report-based spatial patterns.

Cross-model spatial ranking agreement chart.

Cross-Model Spatial Ranking Agreement

Checks whether candidate models rank regions from lower to higher risk in a similar way to the observed/reference maps.

Cross-model neighborhood agreement chart.

Cross-Model Neighborhood Agreement

Gives spatial credit when predicted high-risk areas are close to observed/report-based high-risk areas.

Tornado is evaluated differently from hail and thunderstorm wind because tornadoes are rare and highly localized. For tornado, AUPRC and Brier score are more informative than raw hotspot overlap alone.
Technical approach

Technical approach

OpenSCS is designed as a reproducible severe-weather AI workflow, from raw atmospheric and storm-report data through training, hindcast generation, and evaluation.

🛰️

Data integration

ERA5, ECMWF, MRMS, and NOAA Storm Events are aligned into geospatial tensors for severe-weather modeling.

🏷️

Label generation

Hail, tornado, and thunderstorm-wind labels are generated with metadata for uncertainty, event thresholds, and reporting bias.

⚙️

Model training

Prototype spatial models are trained using balanced sampling and distributed GPU workflows to improve learning from rare severe-weather examples.

📈

Evaluation

Hindcasts, climatology maps, spatial metrics, calibration diagnostics, and benchmark dashboards are used for model comparison.

Technical details

OpenSCS work includes ERA5 atmospheric variables, MRMS radar-derived products, NOAA Storm Events labels, hindcast generation, spatial climatology evaluation, and reliability diagnostics. Deeper implementation notes will be documented in model cards, data cards, and reproducibility guides.
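
The balanced severe-weather sampling mentioned above can be illustrated with a simple oversampling sketch; the function name and 50/50 event mix are illustrative assumptions, not the project's documented configuration:

```python
import random

def balanced_batch(samples, labels, batch_size, pos_frac=0.5, seed=0):
    """Draw a training batch in which severe-event samples make up
    roughly `pos_frac` of the batch, regardless of their base rate.

    `samples` is any sequence of training examples; `labels` marks
    severe-event samples with 1. Rare events are drawn with replacement.
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    n_pos = int(batch_size * pos_frac)
    batch = [rng.choice(pos) for _ in range(n_pos)]
    batch += [rng.choice(neg) for _ in range(batch_size - n_pos)]
    rng.shuffle(batch)
    return batch

# Severe events are only 2% of the pool but half of every batch.
samples = list(range(100))
labels = [1, 1] + [0] * 98
batch = balanced_batch(samples, labels, batch_size=10)
```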

Next phase

Roadmap

The current models are proof-of-concept prototypes. The next phase will focus on forecast-data integration, foundation-model training, open benchmark releases, calibration, uncertainty communication, and reproducible evaluation dashboards.

Immediate focus

Move from prototype maps to reproducible open benchmarks.

The next phase keeps the work research-first: stronger data integration, better calibration, clearer documentation, and practical evaluation tools that other teams can inspect and reuse.

01 Data expansion

ECMWF forecast and ensemble integration

Add forecast and ensemble data streams for future foundation-model experiments.

02 Modeling

Multi-source model training

Train broader severe-weather models across multiple data sources and hazards.

03 Benchmarking

Public benchmark release

Publish clear splits, metrics, plots, and scripts for reproducible comparisons.

04 Transparency

Model cards and data cards

Document intended use, caveats, uncertainty, reporting bias, and limitations.

05 Tools

Open-source dashboards

Build practical tools for maps, metrics, calibration, and model comparison.

06 Validation

Calibration and validation

Improve false-alarm reduction, probability calibration, and validation across years, regions, and event types.

Responsible use

Research tool, not an operational warning system.

OpenSCS is intended for scientific research, benchmarking, education, and climate-risk analysis. It should not replace official forecasts, watches, warnings, or emergency guidance.

  • Not for issuing public warnings
  • Not a replacement for national meteorological agencies
  • Intended for research and benchmarking
  • Requires calibration and uncertainty communication
  • Outputs should be interpreted with expert context

Public releases will include model cards, data cards, uncertainty diagnostics, calibration results, known limitations, and responsible-use guidance.

Collaboration

Collaborate on OpenSCS

OpenSCS is being developed as an open-science effort. We welcome collaboration around datasets, benchmarks, evaluation methods, model cards, documentation, and responsible deployment practices.

  • Datasets and labels
  • Model benchmarking
  • Evaluation metrics
  • Radar and forecast data
  • Responsible-use docs
  • Open dashboards