pray-calc-ml

aric/pray-calc-ml

mirror of https://github.com/acamarata/pray-calc-ml.git synced 2026-07-01 11:24:26 +00:00

Commit graph

Author	SHA1	Message	Date
Aric Camarata

1c8187cfc4

data: deduplicate dataset — 35 Fajr + 1 Isha duplicates removed

Identified three sources of cross-source duplication and fixed each:

1. Kassim Bahali 2018 Pekan Pahang (9 records)
   Same 9 June-July 2017 DSLR observations existed in both
   verified_sightings.py (Table 2 entries) and the raw CSV
   kassim_bahali_2017_malaysia.csv. Removed from verified_sightings;
   raw CSV is the canonical source with richer cloud/conditions notes.

2. BRIN Mount Timau SQM dataset (22 records)
   timau_sqm_fajr.csv contained two SQM threshold readings per night:
   target=18.0° (75 records, primary) and target=16.51° (22 records,
   derived from the 75-night mean). Removed target=16.51 rows.
   Each night now has exactly one Fajr time.

3. Khalifa 2018 Hail Fajr (4 records)
   Original batch had times producing implausible angles: 2015-01-15
   gave 12.6° and 2015-06-21 gave 19.3° (paper reports 14.014°±0.317°).
   Removed the four bad-time records. Batch 16a replacements (computed
   from the paper mean D0) remain and give consistent 13.9-14.1° angles.

Pipeline: add automatic deduplication guard. After combining all sources,
any (prayer, date, lat rounded to 3dp, lng rounded to 3dp) duplicate is
logged and dropped (keep first). This prevents future cross-source overlaps
from silently inflating the dataset or training on the same observation twice.
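
The guard described above could look roughly like the following. This is a minimal sketch, not the repository's actual pipeline code: the function names `dedup_key` and `drop_duplicates`, the logger name, and the record field names (`prayer`, `date`, `lat`, `lng`) are assumptions made for illustration. It only shows the technique the message names: key each record on (prayer, date, lat rounded to 3dp, lng rounded to 3dp), log any repeat, and keep the first occurrence.

```python
import logging

# Hypothetical logger name; the real pipeline may log differently.
logger = logging.getLogger("dedup_guard")


def dedup_key(record):
    """Build the duplicate key: prayer, date, and coordinates rounded to 3 dp."""
    return (
        record["prayer"],
        record["date"],
        round(record["lat"], 3),
        round(record["lng"], 3),
    )


def drop_duplicates(records):
    """Return records with later duplicates removed (keep first), logging each drop."""
    seen = set()
    kept = []
    for record in records:
        key = dedup_key(record)
        if key in seen:
            # A repeat of an earlier (prayer, date, lat, lng) key: log and skip it.
            logger.warning("dropping duplicate record: %s", key)
            continue
        seen.add(key)
        kept.append(record)
    return kept
```

Because the first occurrence wins, the order in which sources are combined decides which copy survives, so the canonical source would need to be loaded first under this sketch.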

Dataset: fajr_angles.csv 4535 records, isha_angles.csv 120 records
Zero duplicates confirmed.

2026-02-26 05:13:28 -05:00

Aric Camarata

cc8d3c33d1

Expand dataset to 4,396 Fajr / 70 Isha records across 80 locations

Added sources and sites:
- Mount Timau NTT (CC0 BRIN SQM dataset): 97 individual Fajr nights
  at two target angles (16.51° and 18.0°); pristine 21.86 mpsas site,
  1,600m; data.brin.go.id hdl:20.500.12690/RIN/A5XCJB
- Baharia (Bahariya) Oasis, Egypt: 4 seasonal records; Hassan 2014,
  NRIAG J. 3:23-26; naked-eye multi-site 1984-1987, mean 14.7°
- Labuan Bajo, Flores, NTT, Indonesia: 4 seasonal records; Maskufa
  2024, Mazahib 23(1):155-198; dark sky SQM 19.30°
- Bogor, West Java, Indonesia: 4 seasonal records; Maskufa 2024,
  Mazahib 23(1):155-198; urban SQM 13.58°
- Pekan, Pahang, Malaysia: 9 individual DSLR observations Jun-Jul 2017;
  Kassim Bahali 2018, Sains Malaysiana 47(11):2877-2885; Do range
  -15.45° to -18.06°
- Kuala Terengganu, Malaysia: 1 record; Kassim Bahali 2018 Fig 4,
  Do=-16°, time inferred via PyEphem
- Additional batch 3 aggregate sites: Tubruq Libya (3 subsets),
  Fayum Egypt, Biak Papua, Manado North Sulawesi, Lombok NTB,
  Makkah, Madinah, Karachi, Ankara, Marrakech, Kano, Johannesburg,
  Dhaka, Alexandria

Source correction: removed incorrect Setyanto 2021 Al-Hilal
attribution from Labuan Bajo and Bogor (that paper covers zodiacal
light, not Fajr, at different Indonesian sites).

2026-02-25 20:44:37 -05:00

2 commits