pray-calc-ml/.gitignore
Aric Camarata c1eeef53c4 Expand dataset to 5,871 Fajr / 46 Isha across 114 locations
Major additions:
- Extract all 1,621 Basthoni 2022 SQM records (46 Indonesian sites,
  Lampiran 2-5) via precomputed_angles.py
- Add 9 new raw sighting CSVs: Abdel-Hadi Malaysia, BRIN multistation,
  Kassim Bahali (2017+2019), Khalifa Saudi, Moonsighting.com,
  Shaukat 2015 Blackburn UK, Walisongo Sulawesi
- Curate aggregate D0 database (115 entries) in research/

Pipeline improvements:
- Open-Topo-Data SRTM30m primary elevation API with fallback
- APPROVED_RAW_CSVS allowlist prevents circular data ingestion
- Pre-computed angle merge path (bypasses back-calculation for SQM data)
- BAD_NOTE_MARKERS quality filter for excluded sources
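
A minimal sketch of how the allowlist and note-marker filter listed above might work together. The commit does not show the actual contents of `APPROVED_RAW_CSVS` or `BAD_NOTE_MARKERS`, so the file names, marker strings, record fields (`source_csv`, `notes`), and helper functions below are all hypothetical placeholders rather than the pipeline's real code.

```python
from pathlib import Path

# Hypothetical values: the real constants are not shown in this commit.
APPROVED_RAW_CSVS = frozenset({"basthoni_2022.csv", "khalifa_saudi.csv"})
BAD_NOTE_MARKERS = ("excluded", "unverified")


def is_approved(csv_path):
    """Return True only for allowlisted file names.

    Because generated output is never on the allowlist, a file the
    pipeline wrote into the raw directory cannot be re-read as input.
    """
    return Path(csv_path).name in APPROVED_RAW_CSVS


def has_bad_note(note):
    """Return True if the free-text note contains any bad marker (case-insensitive)."""
    lowered = (note or "").lower()
    return any(marker in lowered for marker in BAD_NOTE_MARKERS)


def filter_records(records):
    """Keep records from approved CSVs whose notes carry no bad marker."""
    return [
        r for r in records
        if is_approved(r["source_csv"]) and not has_bad_note(r.get("notes"))
    ]
```

Matching on the file name rather than the full path keeps the check independent of where the raw directory lives, at the cost of assuming that approved file names are unique.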

Collection tools:
- BRIN multistation SQM processors
- PDF/HTML table extractor for academic papers
- Source tracking database (collection_manifest.json)

Documentation:
- Rewrite .wiki/Data.md and .wiki/Research.md from scratch
- Expand Data-Sources.md with full Basthoni Lampiran breakdown
- Add 14 researcher outreach drafts
- Update .gitignore to exclude bulk/experimental files
2026-02-28 10:51:01 -05:00


__pycache__/
*.py[cod]
*.egg-info/
.eggs/
dist/
build/
.venv/
venv/
env/
.env
*.log
.DS_Store
.ipynb_checkpoints/
.jupyter/
.claude/
# Raw scraped/downloaded files
data/raw/*.pdf
# Generated notebook outputs
data/processed/*.png
data/processed/*.svg
# Bulk data directories (too large for git, not curated)
data/cache/
data/raw/crawled/
data/raw/excluded/
data/raw/brin_multistation_raw/
# Collection session logs and scratch
data/raw/collection_log*.txt
data/raw/sources_crawled.md
# Research scratch files (superseded by aggregate_d0_values.csv)
research/aggregate_d0_database.csv
research/aggregate_analysis.md
research/candidate_papers.json
research/mine_*.py
# Experimental collection scripts (not part of core pipeline)
src/autonomous_collect.py
src/compute_aggregate_times.py
src/collect/aladhan.py
src/collect/autonomous_collector.py
src/collect/aggregate_to_records.py
src/collect/bulk_generator.py
src/collect/bulk_runner.py
src/collect/cities.py
src/collect/collect_agent.py
src/collect/harvest.py
src/collect/jakim.py
src/collect/morocco.py
src/collect/muis_singapore.py
src/collect/openalex_harvester.py
src/collect/source_tracker.py
src/collect/waktusolat.py
src/collect/web_harvester.py