Commit graph

3 commits

Author SHA1 Message Date
Aric Camarata
1c8187cfc4 data: deduplicate dataset — 35 Fajr + 1 Isha duplicates removed
Identified three sources of cross-source duplication and fixed each:

1. Kassim Bahali 2018 Pekan Pahang (9 records)
   Same 9 June-July 2017 DSLR observations existed in both
   verified_sightings.py (Table 2 entries) and the raw CSV
   kassim_bahali_2017_malaysia.csv. Removed from verified_sightings;
   raw CSV is the canonical source with richer cloud/conditions notes.

2. BRIN Mount Timau SQM dataset (22 records)
   timau_sqm_fajr.csv contained two SQM threshold readings per night:
   target=18.0° (75 records, primary) and target=16.51° (22 records,
   derived from the 75-night mean). Removed target=16.51 rows.
   Each night now has exactly one Fajr time.

3. Khalifa 2018 Hail Fajr (4 records)
   Original batch had times producing implausible angles: 2015-01-15
   gave 12.6° and 2015-06-21 gave 19.3° (paper reports 14.014°±0.317°).
   Removed the four bad-time records. Batch 16a replacements (computed
   from the paper mean D0) remain and give consistent 13.9-14.1° angles.

Pipeline: add automatic deduplication guard. After combining all sources,
any (prayer, date, lat rounded to 3dp, lng rounded to 3dp) duplicate is
logged and dropped (keep first). This prevents future cross-source overlaps
from silently inflating the dataset or training on the same observation twice.

Dataset: fajr_angles.csv 4535 records, isha_angles.csv 120 records
Zero duplicates confirmed.
2026-02-26 05:13:28 -05:00
Aric Camarata
0f01783516 Expand dataset to 4,149 Fajr / 58 Isha records across 46 locations
New records from research expansion:
- Tanjung Aru, Sabah Malaysia (Niri & Zainuddin): 4 Isha Shafaq Abyad records
- Teluk Kemang, Malaysia (Abdel-Hadi & Hassan 2022): 4 Fajr + 4 Isha SQM records
- Bosscha Observatory, Java 1310m (Herdiwijaya 2020): 4 Fajr records
- Yogyakarta, Java (Herdiwijaya 2014-2016, 136 nights): 4 Fajr records
- Kupang, NTT 10°S (Herdiwijaya 2020): 4 Fajr + 4 Isha records
- Matrouh, Egypt (Hassan et al.): 4 Fajr + 3 Isha records (1 filtered)
- Kharga Oasis, Egypt (Hassan et al. 2020): 4 Fajr records
- Hurghada, Egypt (Hassan et al. 2020): 4 Fajr records
- Marsa-Alam, Egypt (Hassan et al. 2020): 4 Fajr records
- 15th of May City, Egypt (Taha et al. 2025): 4 Fajr records
- Riyadh, Saudi Arabia (Taha et al. 2025): 4 Fajr records
- Mauritania 18°N (Taha et al. 2025): 4 Fajr records — first West Africa data

New modules:
- src/geocode.py: Nominatim geocoding with disk cache
- src/ingest.py: CSV ingestion and data standardization pipeline
- src/pipeline.py: integrated raw CSV loading via ingest module
2026-02-25 19:59:06 -05:00
Aric Camarata
6e0f4a679c Rebuild as Python data science project
Replaces the original JS calibration library with a pure Python pipeline
for collecting and back-calculating solar depression angles from human-verified
Fajr and Isha prayer sightings.

What this does:
- src/pipeline.py: master pipeline; fetches iCal + manual records, back-calculates
  angles via PyEphem, applies quality filters, exports two clean CSVs
- src/collect/openfajr.py: parses the OpenFajr Birmingham iCal feed (~4,018 records)
- src/collect/verified_sightings.py: manually compiled records from peer-reviewed
  studies (Egypt, Saudi Arabia, Malaysia, Indonesia, UK, USA, Canada, and more)
- src/angle_calc.py: PyEphem back-calculation with atmospheric refraction
- src/elevation.py: Open-Elevation API batch lookup

Datasets generated:
- data/processed/fajr_angles.csv: 4,105 confirmed Fajr records, 35 locations,
  latitude range -37.8 to 53.7 degrees, date range 1985-2026
- data/processed/isha_angles.csv: 43 confirmed Isha records, 20+ locations

Also includes:
- notebooks/01_exploratory_analysis.ipynb: latitude, TOY, elevation pattern analysis
- research/: academic paper summaries (not training data)
- data/raw/sources.md: full citation table for all data sources
2026-02-25 19:32:47 -05:00