# Perovskite Atlas — Public Corpus Snapshot

Static export of the **public curated** experiment subset served by the Perovskite Atlas portal.
Generated at `2026-06-02T11:41:04+00:00` · schema version `1` · **39557** rows.

## Files

| File | Description |
|------|-------------|
| `atlas-public-snapshot.csv` | UTF-8 CSV with header row (units noted in column names where applicable) |
| `atlas-public-snapshot.json` | JSON array of objects (same fields as CSV) |
| `atlas-public-snapshot.meta.json` | Row count, generation time, schema version, license summary |
| `atlas-public-snapshot-README.md` | This data dictionary |
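
As a quick integrity check after download, the CSV can be parsed with the standard library and its row count compared against the meta file. The sketch below is a minimal example, not part of the portal tooling. The meta key name `row_count` is an assumption (this README only states that the meta file contains a row count), so adjust it to the key actually present in `atlas-public-snapshot.meta.json`.

```python
import csv
import json
from typing import Iterable


def parse_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse CSV lines (header row first) into dicts keyed by column name."""
    return list(csv.DictReader(lines))


def row_count_matches(rows: list[dict[str, str]], meta: dict, key: str = "row_count") -> bool:
    """Compare the parsed row count with the count recorded in the meta file.

    `key` is a hypothetical field name, not confirmed by this README.
    """
    if key not in meta:
        raise KeyError(f"meta file has no {key!r} field; check its actual keys")
    return len(rows) == int(meta[key])


def check_snapshot(csv_path: str, meta_path: str) -> bool:
    """Load both files from disk and run the row-count check."""
    # newline="" lets the csv module handle quoted fields containing line breaks.
    with open(csv_path, newline="", encoding="utf-8") as fh:
        rows = parse_rows(fh)
    with open(meta_path, encoding="utf-8") as fh:
        meta = json.load(fh)
    return row_count_matches(rows, meta)
```

For example, `check_snapshot("atlas-public-snapshot.csv", "atlas-public-snapshot.meta.json")` should return `True` for an intact download, provided the key-name assumption holds.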

## Scope & exclusions

- **Included:** `source_type = public_curated` and `visibility = public` (same filter as `/portal/search`).
- **Excluded:** `user_private` / `own` rows, unpublished work, local-only SQLite corpus, and any non-public Supabase fields.
- Rows with `metric_qc_flag = needs_review` are **included** but flagged. The flag typically indicates a PCE stored as a 0–1 fraction rather than as a percentage. These rows remain in the snapshot for provenance; portal rankings may exclude them.
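
Because flagged rows ship alongside clean ones, analyses that rank or average efficiencies may want to separate them first. A minimal sketch, assuming a null flag appears as an empty string in the CSV and as `null` (`None`) in the JSON; the function name `split_by_qc` is illustrative only:

```python
def split_by_qc(rows):
    """Return (clean, flagged) lists based on the `metric_qc_flag` column."""
    clean, flagged = [], []
    for row in rows:
        # Assumption: null flags arrive as None (JSON) or "" (CSV).
        flag = (row.get("metric_qc_flag") or "").strip()
        (flagged if flag == "needs_review" else clean).append(row)
    return clean, flagged
```

Flagged values are deliberately not rescaled here: the fraction form is described only as typical, so multiplying by 100 remains a case-by-case decision.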

## Column dictionary

| Column | Unit / type | Meaning |
|--------|-------------|---------|
| `id` | UUID | Stable experiment identifier in Supabase |
| `title` | text | Short experiment title |
| `doi` | text | Digital Object Identifier (may be a bare DOI or a full URL) |
| `year` | integer | Publication or report year |
| `journal` | text | Journal or venue name |
| `architecture` | text | Device architecture (e.g. n-i-p, p-i-n, tandem) |
| `target_bandgap_ev` | eV | Target absorber bandgap from experiment metadata |
| `composition_formula` | text | Perovskite composition formula |
| `htl` | text | Hole transport layer description |
| `etl` | text | Electron transport layer description |
| `additives` | text (semicolon-separated in CSV) | Additive list from curation |
| `process_summary` | text | Short process description |
| `provenance_public_ref` | URL/text | Public provenance link (DOI URL, NOMAD entry, etc.) |
| `confidence` | text | Curator confidence label |
| `best_pce_percent` | % | Best power conversion efficiency across device results |
| `best_voc_volts` | V | Best open-circuit voltage |
| `best_jsc_ma_cm2` | mA/cm² | Best short-circuit current density |
| `best_ff_percent` | % | Best fill factor |
| `best_bandgap_ev` | eV | Best measured bandgap from metrics |
| `best_voc_deficit_ev` | eV | `target_bandgap_ev − best_voc_volts` (voltage taken as eV per elementary charge), computed where both values are valid |
| `metric_qc_flag` | text | `needs_review` when `best_pce_percent` falls in the interval (0, 1] (possible fraction-vs-percent unit issue); otherwise null |
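
The derived columns can be recomputed from the raw ones as a sanity check. The following is a sketch under stated assumptions, not the portal's actual export code: "valid" is interpreted here as "present and a finite number", blank CSV cells are treated as null, and the helper names are hypothetical.

```python
import math


def to_float(value):
    """Convert a CSV cell or JSON value to float; return None if blank or invalid."""
    if value is None or (isinstance(value, str) and not value.strip()):
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def voc_deficit_ev(row):
    """target_bandgap_ev minus best_voc_volts, or None if either is missing."""
    bandgap = to_float(row.get("target_bandgap_ev"))
    voc = to_float(row.get("best_voc_volts"))
    if bandgap is None or voc is None:
        return None
    return bandgap - voc


def qc_flag(row):
    """Return "needs_review" when best_pce_percent lies in (0, 1], else None."""
    pce = to_float(row.get("best_pce_percent"))
    if pce is not None and 0 < pce <= 1:
        return "needs_review"
    return None


def split_additives(cell):
    """Split the semicolon-separated CSV `additives` cell into a clean list."""
    return [part.strip() for part in (cell or "").split(";") if part.strip()]
```

Comparing `voc_deficit_ev(row)` with the stored `best_voc_deficit_ev` (within a small tolerance) is one way to spot rows where the two bandgap or voltage fields disagree.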

## Provenance & upstream licenses

Data are aggregated from published literature and public databases, including:

- **NOMAD** — https://nomad-lab.eu (CC BY 4.0)
- **Literature DOIs** — via Crossref/OpenAlex metadata (cite original papers)
- **NREL** efficiency benchmarks — https://www.nrel.gov (public domain; attribute NREL)
- **Materials Project** — https://materialsproject.org (CC BY 4.0 where used)

Honor each upstream license and cite the **original sources** (DOI/NOMAD) for any row you use in publications.

## License (snapshot bundle)

The **Perovskite Atlas application code** is MIT-licensed. This **dataset snapshot** is a composite of third-party
curated facts; the curated export layer is distributed under **CC BY 4.0**, consistent with NOMAD and
Materials Project terms. The snapshot does not grant rights beyond what upstream sources provide.

## How to cite

> Perovskite Atlas Public Corpus Snapshot (schema v1), generated 2026-06-02.
> https://perovskite-atlas.pages.dev/portal/download — 39557 public curated experiments.

Also cite the primary sources (DOI or NOMAD entry) for specific experiments you reference.
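
Since the `doi` column may hold either a bare DOI or a URL, collecting primary sources for a set of rows needs a small normalization step. A minimal sketch, assuming the common `https://doi.org/`, `dx.doi.org`, and `doi:` prefixes; the function names are hypothetical, and rows without a usable DOI fall back to `provenance_public_ref`:

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value):
    """Strip common URL/scheme prefixes; return the bare DOI or None."""
    if not value:
        return None
    doi = value.strip()
    lowered = doi.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Registered DOIs begin with the "10." directory indicator.
    return doi if doi.startswith("10.") else None


def citation_targets(rows):
    """Collect one citable reference per row, de-duplicated and sorted."""
    targets = set()
    for row in rows:
        doi = normalize_doi(row.get("doi"))
        fallback = (row.get("provenance_public_ref") or "").strip()
        if doi:
            targets.add("https://doi.org/" + doi)
        elif fallback:
            targets.add(fallback)
    return sorted(targets)
```

Calling `citation_targets(rows_used)` on the subset of rows behind a figure or table yields a list of references to cite alongside the snapshot itself.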
