Commit graph

2 commits

Author SHA1 Message Date
Aric Camarata
ada08e7ec4 data: expand dataset from 5.9k to 91k records via 6 new SQM sources
Add 6 new data collection pipelines and their processed outputs:

Sources added:
- TESS/Stars4All photometer network: 37 months (Jun 2017-Aug 2020),
  ~40k raw events from 100+ European stations via Zenodo archives
- Globe at Night citizen science: 26k twilight observations (2006-2024),
  filtered from 308k total observations to those taken at a solar
  depression of 6-22 deg
- GaN-MN continuous monitoring: 45 months (Jan 2022-Sep 2025),
  ~12.5k twilight events from 88 stations across 20+ countries
- Galicia SQM network: 14 stations, 1-min resolution, 7.5k events
- Madrid/Majadahonda SQM: multi-year continuous monitoring, 3.1k events
- washetdonker.nl Netherlands: 7 stations, 3.3k morning events
- Academic papers: Jordan (Abed 2015), Fayum Egypt, India photometer
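The Globe at Night filter keeps only observations whose solar depression falls between 6 and 22 deg. The sketch below shows that window check. It makes some assumptions. The record type and its `sun_altitude_deg` field are hypothetical and stand in for whatever precomputed Sun altitude the real pipeline uses. The bounds are treated as inclusive, which the message does not say.

```python
from dataclasses import dataclass

# Hypothetical record shape: the real Globe at Night export columns are not
# shown in this log, so `sun_altitude_deg` is an assumed, precomputed field.
@dataclass(frozen=True)
class Observation:
    obs_id: str
    sun_altitude_deg: float  # negative when the Sun is below the horizon

MIN_DEPRESSION_DEG = 6.0   # lower bound stated in the commit message
MAX_DEPRESSION_DEG = 22.0  # upper bound stated in the commit message

def solar_depression(obs: Observation) -> float:
    """Depression is the Sun's angle below the horizon, as a positive number."""
    return -obs.sun_altitude_deg

def in_twilight_window(obs: Observation) -> bool:
    """True when the depression lies in the 6-22 deg window (assumed inclusive)."""
    return MIN_DEPRESSION_DEG <= solar_depression(obs) <= MAX_DEPRESSION_DEG

def filter_twilight(observations):
    """Keep only observations inside the twilight window."""
    return [o for o in observations if in_twilight_window(o)]

if __name__ == "__main__":
    sample = [
        Observation("a", -3.0),   # civil twilight, too bright: dropped
        Observation("b", -12.5),  # kept
        Observation("c", -18.0),  # kept
        Observation("d", -30.0),  # full night: dropped
    ]
    print([o.obs_id for o in filter_twilight(sample)])  # ['b', 'c']
```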

Pipeline changes:
- ingest.py: add all new files to APPROVED_RAW_CSVS allowlist,
  fix filter to use allowlist instead of hardcoded exclusions
- .gitignore: exclude bulk raw data directories (BSRN, TESS, GaN-MN,
  washetdonker, Globe at Night downloads)
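Switching from hardcoded exclusions to an allowlist changes the default. A new file is skipped until someone approves it, instead of being ingested until someone excludes it. The sketch below assumes the allowlist is matched on file names. The two entries shown are hypothetical placeholders, not the real contents of APPROVED_RAW_CSVS in ingest.py.

```python
from pathlib import Path

# Hypothetical entries: the real APPROVED_RAW_CSVS contents in ingest.py are
# not shown in this log.
APPROVED_RAW_CSVS = frozenset({
    "tess_stars4all_twilight.csv",
    "gan_mn_twilight.csv",
})

def approved_raw_csvs(raw_dir: Path) -> list[Path]:
    """Return only the allowlisted CSVs found in raw_dir, sorted by path.

    Any CSV not named in the allowlist is skipped by default. That includes
    a processed output that lands in the raw directory and would otherwise
    be re-ingested.
    """
    return sorted(
        p for p in raw_dir.glob("*.csv")
        if p.name in APPROVED_RAW_CSVS
    )
```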

Final dataset: 56,668 Fajr + 34,763 Isha = 91,431 total records
Previous: 5,871 Fajr + 46 Isha = 5,917 total records
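As a quick consistency check, the totals above add up. The difference between them gives the number of records this commit adds on net. The counts are copied from the message. The subtraction is the only derived figure.

```python
# Record counts copied from the commit message.
final = {"Fajr": 56_668, "Isha": 34_763}
previous = {"Fajr": 5_871, "Isha": 46}

final_total = sum(final.values())
previous_total = sum(previous.values())

print(final_total)                   # 91431
print(previous_total)                # 5917
print(final_total - previous_total)  # 85514 records added on net
```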
2026-03-22 16:39:29 -04:00
Aric Camarata
c1eeef53c4 Expand dataset to 5,871 Fajr / 46 Isha across 114 locations
Major additions:
- Extract all 1,621 Basthoni 2022 SQM records (46 Indonesian sites,
  appendices (Lampiran) 2-5) via precomputed_angles.py
- Add 9 new raw sighting CSVs: Abdel-Hadi Malaysia, BRIN multistation,
  Kassim Bahali (2017+2019), Khalifa Saudi, Moonsighting.com,
  Shaukat 2015 Blackburn UK, Walisongo Sulawesi
- Curate aggregate D0 database (115 entries) in research/

Pipeline improvements:
- Use the Open Topo Data srtm30m dataset as the primary elevation API,
  with a fallback
- APPROVED_RAW_CSVS allowlist prevents circular data ingestion
- Pre-computed angle merge path (bypasses back-calculation for SQM data)
- BAD_NOTE_MARKERS quality filter for excluded sources
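A primary elevation API with a fallback can be expressed as an ordered chain of fetchers. The first one that returns a value wins. The sketch below queries the public Open Topo Data `srtm30m` endpoint (`/v1/srtm30m?locations=lat,lon`), which returns a JSON `results` list with an `elevation` field. That field is null where the dataset has no coverage. The log does not name the fallback service, so any second fetcher passed in is hypothetical. The function names are illustrative, not taken from the repo.

```python
import json
import urllib.request
from typing import Callable, Optional, Sequence

# A fetcher takes (lat, lon) and returns metres, or None if it has no data.
ElevationFetcher = Callable[[float, float], Optional[float]]

def open_topo_data_srtm30m(lat: float, lon: float) -> Optional[float]:
    """Query the public Open Topo Data srtm30m dataset for a single point."""
    url = f"https://api.opentopodata.org/v1/srtm30m?locations={lat},{lon}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("status") != "OK":
        return None
    # "elevation" is null for points outside SRTM coverage.
    return payload["results"][0]["elevation"]

def elevation_with_fallback(
    lat: float, lon: float, fetchers: Sequence[ElevationFetcher]
) -> Optional[float]:
    """Try each fetcher in order and return the first non-None elevation."""
    for fetch in fetchers:
        try:
            elevation = fetch(lat, lon)
        except Exception:  # network/HTTP error, malformed JSON, missing keys
            continue
        if elevation is not None:
            return elevation
    return None
```

A call would look like `elevation_with_fallback(lat, lon, [open_topo_data_srtm30m, backup_fetcher])`, where `backup_fetcher` stands for whatever secondary source the pipeline actually uses.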
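The pre-computed angle path and the note-marker filter both act on individual rows. Flagged rows are dropped. Rows that already carry an angle (as the SQM records do) skip back-calculation. The sketch below rests on several assumptions: the field names `note` and `depression_deg`, the marker strings, and the `back_calculate` callback are all hypothetical stand-ins for the real pipeline's schema and code.

```python
from typing import Callable, Iterable

# Hypothetical marker strings; the real BAD_NOTE_MARKERS list is not shown
# in this log.
BAD_NOTE_MARKERS = ("excluded", "duplicate", "unreliable")

def has_bad_note(row: dict) -> bool:
    """Case-insensitive substring check of the row's free-text note."""
    note = (row.get("note") or "").lower()
    return any(marker in note for marker in BAD_NOTE_MARKERS)

def merge_rows(
    rows: Iterable[dict], back_calculate: Callable[[dict], float]
) -> list[dict]:
    """Drop flagged rows, then make sure each survivor has a depression angle.

    Rows that already carry `depression_deg` keep it unchanged. Only rows
    without one go through `back_calculate`. Input dicts are not mutated.
    """
    merged = []
    for row in rows:
        if has_bad_note(row):
            continue
        if row.get("depression_deg") is None:
            row = {**row, "depression_deg": back_calculate(row)}
        merged.append(row)
    return merged
```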

Collection tools:
- BRIN multistation SQM processors
- PDF/HTML table extractor for academic papers
- Source tracking database (collection_manifest.json)
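A JSON source-tracking file like collection_manifest.json can be kept as a mapping from source id to a small record that is created or updated in place. The log does not show the manifest's real schema. The keyed layout, the function name, and the field names in the sketch below (`status`, `records`) are therefore hypothetical.

```python
import json
from pathlib import Path

def record_source(manifest_path: Path, source_id: str, **fields) -> dict:
    """Create or update one source entry in a JSON manifest keyed by source id.

    Existing fields on the entry are kept unless overwritten by `fields`.
    The file is rewritten with sorted keys so diffs stay stable.
    """
    manifest = {}
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    entry = manifest.setdefault(source_id, {})
    entry.update(fields)
    manifest_path.write_text(
        json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return entry
```

For example, `record_source(path, "basthoni_2022", status="ingested", records=1621)` would mark that source as processed while keeping any fields recorded earlier.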

Documentation:
- Rewrite .wiki/Data.md and .wiki/Research.md from scratch
- Expand Data-Sources.md with a full breakdown of the Basthoni appendices (Lampiran)
- Add 14 researcher outreach drafts
- Update .gitignore to exclude bulk/experimental files
2026-02-28 10:51:01 -05:00