Identified three sources of cross-source duplication and fixed each:
1. Kassim Bahali 2018 Pekan Pahang (9 records)
The same nine June-July 2017 DSLR observations existed in both
verified_sightings.py (Table 2 entries) and the raw CSV
kassim_bahali_2017_malaysia.csv. Removed them from verified_sightings.py;
the raw CSV is the canonical source, with richer cloud/conditions notes.
2. BRIN Mount Timau SQM dataset (22 records)
timau_sqm_fajr.csv contained up to two SQM threshold readings per night:
target=18.0° (75 records, primary) and target=16.51° (22 records,
derived from the 75-night mean). Removed the target=16.51° rows, so
each night now has exactly one Fajr time.
3. Khalifa 2018 Hail Fajr (4 records)
The original batch had times that produced implausible angles: 2015-01-15
gave 12.6° and 2015-06-21 gave 19.3° (the paper reports 14.014° ± 0.317°).
Removed the four bad-time records. The Batch 16a replacements (computed
from the paper's mean D0) remain and give consistent 13.9-14.1° angles.
Pipeline: add automatic deduplication guard. After all sources are combined,
any record that repeats an earlier one on the key (prayer, date, lat rounded
to 3 dp, lng rounded to 3 dp) is logged and dropped; the first occurrence is
kept. This prevents future cross-source overlaps from silently inflating the
dataset or causing training on the same observation twice.
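A minimal sketch of the guard described above, not the pipeline's actual code. It assumes each record is a dict with `prayer`, `date`, `lat` and `lng` fields; those field names, the logger name, and the function names `dedup_key` and `drop_duplicates` are all hypothetical.

```python
import logging
from typing import Iterable

# Hypothetical logger name; the real pipeline may log elsewhere.
logger = logging.getLogger("dedup_guard")


def dedup_key(record: dict) -> tuple:
    """Build the (prayer, date, lat@3dp, lng@3dp) key for one record.

    Field names are assumptions made for this sketch.
    """
    return (
        record["prayer"],
        record["date"],
        round(float(record["lat"]), 3),
        round(float(record["lng"]), 3),
    )


def drop_duplicates(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Keep the first record for each key; log and drop later repeats.

    Returns (kept, dropped) so the caller can report how many were removed.
    """
    seen: set[tuple] = set()
    kept: list[dict] = []
    dropped: list[dict] = []
    for record in records:
        key = dedup_key(record)
        if key in seen:
            logger.warning("duplicate dropped: %s", key)
            dropped.append(record)
            continue
        seen.add(key)
        kept.append(record)
    return kept, dropped
```

Because the first occurrence wins, the order in which sources are combined decides which copy survives. Loading the canonical source first keeps its richer notes.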
Dataset: fajr_angles.csv 4535 records, isha_angles.csv 120 records
Zero duplicates confirmed.