Mirror of https://github.com/acamarata/pray-calc-ml.git (synced 2026-06-30 19:04:26 +00:00)
Identified three sources of cross-source duplication and fixed each:

1. Kassim Bahali 2018 Pekan Pahang (9 records)
   The same 9 June-July 2017 DSLR observations existed in both verified_sightings.py (Table 2 entries) and the raw CSV kassim_bahali_2017_malaysia.csv. Removed them from verified_sightings; the raw CSV is the canonical source, with richer cloud/conditions notes.

2. BRIN Mount Timau SQM dataset (22 records)
   timau_sqm_fajr.csv contained two SQM threshold readings per night: target=18.0° (75 records, primary) and target=16.51° (22 records, derived from the 75-night mean). Removed the target=16.51 rows. Each night now has exactly one Fajr time.

3. Khalifa 2018 Hail Fajr (4 records)
   The original batch had times producing implausible angles: 2015-01-15 gave 12.6° and 2015-06-21 gave 19.3° (the paper reports 14.014°±0.317°). Removed the four bad-time records. The Batch 16a replacements (computed from the paper mean D0) remain and give consistent 13.9-14.1° angles.

Pipeline: add automatic deduplication guard. After combining all sources, any (prayer, date, lat rounded to 3dp, lng rounded to 3dp) duplicate is logged and dropped (keep first). This prevents future cross-source overlaps from silently inflating the dataset or training on the same observation twice.

Dataset: fajr_angles.csv 4535 records, isha_angles.csv 120 records. Zero duplicates confirmed.

| Name |
|---|
| .. |
| processed |
| raw |