Commit graph

6 commits

Author SHA1 Message Date
Aric Camarata
1c8187cfc4 data: deduplicate dataset — 35 Fajr + 1 Isha duplicates removed
Identified three causes of cross-source duplication and fixed each:

1. Kassim Bahali 2018 Pekan Pahang (9 records)
   Same 9 June-July 2017 DSLR observations existed in both
   verified_sightings.py (Table 2 entries) and the raw CSV
   kassim_bahali_2017_malaysia.csv. Removed from verified_sightings;
   raw CSV is the canonical source with richer cloud/conditions notes.

2. BRIN Mount Timau SQM dataset (22 records)
   timau_sqm_fajr.csv contained two SQM threshold readings per night:
   target=18.0° (75 records, primary) and target=16.51° (22 records,
   derived from the 75-night mean). Removed target=16.51 rows.
   Each night now has exactly one Fajr time.

3. Khalifa 2018 Hail Fajr (4 records)
   Original batch had times producing implausible angles: 2015-01-15
   gave 12.6° and 2015-06-21 gave 19.3° (paper reports 14.014°±0.317°).
   Removed the four bad-time records. Batch 16a replacements (computed
   from the paper mean D0) remain and give consistent 13.9-14.1° angles.

Pipeline: add automatic deduplication guard. After all sources are combined,
any record that repeats an earlier (prayer, date, latitude, longitude) key,
with latitude and longitude each rounded to 3 decimal places, is logged and
dropped (first occurrence kept). This prevents future cross-source overlaps
from silently inflating the dataset or causing the same observation to be
trained on twice.
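
A minimal sketch of such a guard, for illustration only. It assumes records
are plain dicts; the field names "prayer", "date", "lat", "lng" and "source",
the logger name, and the function names are hypothetical and are not taken
from src/pipeline.py.

```python
import logging

logger = logging.getLogger("pipeline.dedup")  # hypothetical logger name


def dedup_key(record):
    """Build the (prayer, date, lat, lng) key with coordinates rounded to 3 dp."""
    return (
        record["prayer"],
        record["date"],
        round(float(record["lat"]), 3),
        round(float(record["lng"]), 3),
    )


def drop_duplicates(records):
    """Return records in their original order, keeping the first of each key.

    Every later record with an already-seen key is logged and skipped.
    """
    first_source = {}  # key -> source label of the record that was kept
    kept = []
    for rec in records:
        key = dedup_key(rec)
        if key in first_source:
            logger.warning(
                "duplicate %s from %s dropped (first seen in %s)",
                key, rec.get("source"), first_source[key],
            )
            continue
        first_source[key] = rec.get("source")
        kept.append(rec)
    return kept
```

Because "keep first" depends on input order, the order in which sources are
combined decides which copy survives. Rounding also means two coordinates
that differ only in the fourth decimal place can still land on different
keys when they straddle a rounding boundary.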

Dataset: fajr_angles.csv 4535 records, isha_angles.csv 120 records
Zero duplicates confirmed.
2026-02-26 05:13:28 -05:00
Aric Camarata
877f481c9d data: add batches 15-17 (48 records) — Isha surpasses 100-record target
Batch 15a — Al-faruq 2013 UPI thesis, Bosscha Observatory West Java
  4 records (2 Fajr + 2 Isha), wet/dry season aggregate, photoelectric photometer
  D0: Fajr ~15-16°, Isha ~14-15°

Batch 15b — Niri et al. 2012 MEJSR, Tanjung Aru Kota Kinabalu Sabah
  4 Isha records, D0=18.0° Shafaq al-Abyad, SQM + naked-eye, Jun 2009 campaign
  Seasonal representative dates (2009 equinoxes/solstices)

Batch 16a — Khalifa, Hassan & Taha 2018 NRIAG, Hail Saudi Arabia
  4 Fajr records, D0=14.014°±0.317°, SQM + photoelectric, 32 nights 2014-2015
  First Saudi Arabia site in dataset

Batch 16b — Herdiwijaya 2016 ICOPIA, Yogyakarta area Indonesia
  4 Fajr records, D0=17°, SQM, 136 nights 2014-2016

Batch 17 — Faid et al. 2024 Scientific Reports, 8 sites Malaysia + Australia
  32 Isha records, SQM, 5-year campaign 2017-2022
  D0 by class: urban 11.50°, rural 15.67°, pristine 17.49°
  Sites: Putrajaya, Tanjung Balau, Pantai Batu Buruk, Coonabarabran AU,
         Pantai Mek Mas, Balai Cerap Unisza, Simpang Mengayau, Tengku Zaharah
  First Australia Isha site

Dataset: fajr_angles.csv 4570 records, isha_angles.csv 121 records (target: 100+)
2026-02-26 04:58:10 -05:00
Aric Camarata
1d48dc5b2e Expand dataset to 4,527 Fajr / 82 Isha records across 112 locations (Batches 8-10)
Batch 8: Saksono & Fulazzaky 2020 (NRIAG J Astron Geophys 9:238-244)
- Depok, West Java, Indonesia (6.383°S, 106.83°E): 8 aggregate Fajr records
- SQM, 26 nights Jun-Jul 2015, D0=14.0° ± 0.6°, suburban LP

Batch 9: Rashed et al. 2022 (IJMET 13(10):8-24)
- Fayum (Wadi al-Hitan), Egypt (29.283°N, 30.050°E): 6 Fajr records
- SQM-LU-DL + naked eye, Dec 2018-2019, D0=14.7°, remote desert

Batch 10: Abdel-Hadi & Hassan 2022 (IJAA 12(1):7-29)
- Per-date D0 values from Shariff 2008 SQM-LE data (M.Sc. Univ. Malaya)
- 8 Fajr records: Merang, Kuala Lipis, Port Klang (3 new sites)
- 12 Isha records: Teluk Kemang, Kuala Lumpur, Kuala Lipis, Port Klang
- Malaysia, May 2007 - April 2008, UTC+8
2026-02-25 21:37:07 -05:00
Aric Camarata
cc8d3c33d1 Expand dataset to 4,396 Fajr / 70 Isha records across 80 locations
Added sources and sites:
- Mount Timau NTT (CC0 BRIN SQM dataset): 97 individual Fajr records
  at two target angles (16.51° and 18.0°); pristine 21.86 mpsas site,
  1,600 m; data.brin.go.id hdl:20.500.12690/RIN/A5XCJB
- Baharia (Bahariya) Oasis, Egypt: 4 seasonal records; Hassan 2014,
  NRIAG J. 3:23-26; naked-eye multi-site 1984-1987, mean 14.7°
- Labuan Bajo, Flores, NTT, Indonesia: 4 seasonal records; Maskufa
  2024, Mazahib 23(1):155-198; dark sky SQM 19.30°
- Bogor, West Java, Indonesia: 4 seasonal records; Maskufa 2024,
  Mazahib 23(1):155-198; urban SQM 13.58°
- Pekan, Pahang, Malaysia: 9 individual DSLR observations Jun-Jul 2017;
  Kassim Bahali 2018, Sains Malaysiana 47(11):2877-2885; D0 range
  -15.45° to -18.06°
- Kuala Terengganu, Malaysia: 1 record; Kassim Bahali 2018 Fig 4,
  D0=-16°, time inferred via PyEphem
- Additional batch 3 aggregate sites: Tubruq Libya (3 subsets),
  Fayum Egypt, Biak Papua, Manado North Sulawesi, Lombok NTB,
  Makkah, Madinah, Karachi, Ankara, Marrakech, Kano, Johannesburg,
  Dhaka, Alexandria

Source correction: removed incorrect Setyanto 2021 Al-Hilal
attribution from Labuan Bajo and Bogor (that paper covers zodiacal
light, not Fajr, at different Indonesian sites)
2026-02-25 20:44:37 -05:00
Aric Camarata
0f01783516 Expand dataset to 4,149 Fajr / 58 Isha records across 46 locations
New records from research expansion:
- Tanjung Aru, Sabah Malaysia (Niri & Zainuddin): 4 Isha Shafaq Abyad records
- Teluk Kemang, Malaysia (Abdel-Hadi & Hassan 2022): 4 Fajr + 4 Isha SQM records
- Bosscha Observatory, Java 1310m (Herdiwijaya 2020): 4 Fajr records
- Yogyakarta, Java (Herdiwijaya 2014-2016, 136 nights): 4 Fajr records
- Kupang, NTT 10°S (Herdiwijaya 2020): 4 Fajr + 4 Isha records
- Matrouh, Egypt (Hassan et al.): 4 Fajr + 3 Isha records (1 filtered)
- Kharga Oasis, Egypt (Hassan et al. 2020): 4 Fajr records
- Hurghada, Egypt (Hassan et al. 2020): 4 Fajr records
- Marsa-Alam, Egypt (Hassan et al. 2020): 4 Fajr records
- 15th of May City, Egypt (Taha et al. 2025): 4 Fajr records
- Riyadh, Saudi Arabia (Taha et al. 2025): 4 Fajr records
- Mauritania 18°N (Taha et al. 2025): 4 Fajr records — first West Africa data

New modules:
- src/geocode.py: Nominatim geocoding with disk cache
- src/ingest.py: CSV ingestion and data standardization pipeline
- src/pipeline.py: integrated raw CSV loading via ingest module
2026-02-25 19:59:06 -05:00
Aric Camarata
6e0f4a679c Rebuild as Python data science project
Replaces the original JS calibration library with a pure Python pipeline
for collecting and back-calculating solar depression angles from human-verified
Fajr and Isha prayer sightings.

What this does:
- src/pipeline.py: master pipeline; fetches iCal + manual records, back-calculates
  angles via PyEphem, applies quality filters, exports two clean CSVs
- src/collect/openfajr.py: parses the OpenFajr Birmingham iCal feed (~4,018 records)
- src/collect/verified_sightings.py: manually compiled records from peer-reviewed
  studies (Egypt, Saudi Arabia, Malaysia, Indonesia, UK, USA, Canada, and more)
- src/angle_calc.py: PyEphem back-calculation with atmospheric refraction
- src/elevation.py: Open-Elevation API batch lookup
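
Illustration of the back-calculation step: given a sighting time and site
coordinates, compute how far the Sun was below the horizon. This is a
standard-library sketch that swaps in the low-precision solar position
formulas from the Astronomical Almanac; it is not the PyEphem code in
src/angle_calc.py, and it ignores atmospheric refraction and site elevation.
The function name is hypothetical.

```python
import math
from datetime import datetime, timezone

J2000 = datetime(2000, 1, 1, 12, tzinfo=timezone.utc)


def solar_depression_deg(when_utc, lat_deg, lng_deg):
    """Approximate geometric depression of the Sun's centre, in degrees.

    Positive values mean the Sun is below the horizon. `when_utc` must be a
    timezone-aware datetime; longitude is positive east.
    """
    # Days (with fraction) since the J2000.0 epoch.
    n = (when_utc - J2000).total_seconds() / 86400.0

    # Low-precision solar coordinates.
    mean_lon = (280.460 + 0.9856474 * n) % 360.0
    mean_anom = math.radians((357.528 + 0.9856003 * n) % 360.0)
    ecl_lon = math.radians(
        mean_lon + 1.915 * math.sin(mean_anom) + 0.020 * math.sin(2 * mean_anom)
    )
    obliquity = math.radians(23.439 - 0.0000004 * n)

    # Ecliptic -> equatorial coordinates.
    right_asc = math.atan2(math.cos(obliquity) * math.sin(ecl_lon), math.cos(ecl_lon))
    decl = math.asin(math.sin(obliquity) * math.sin(ecl_lon))

    # Greenwich mean sidereal time (hours) -> local hour angle (radians).
    gmst_hours = (18.697374558 + 24.06570982441908 * n) % 24.0
    hour_angle = math.radians(gmst_hours * 15.0 + lng_deg) - right_asc

    # Altitude from the standard spherical-astronomy relation.
    lat = math.radians(lat_deg)
    altitude = math.asin(
        math.sin(lat) * math.sin(decl)
        + math.cos(lat) * math.cos(decl) * math.cos(hour_angle)
    )
    return -math.degrees(altitude)
```

Applying the function to the reported time of a Fajr or Isha sighting gives
the depression angle that the sighting implies. A refraction-corrected
result, as the commit describes for the PyEphem version, will differ
slightly from this geometric value.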

Datasets generated:
- data/processed/fajr_angles.csv: 4,105 confirmed Fajr records, 35 locations,
  latitude range -37.8 to 53.7 degrees, date range 1985-2026
- data/processed/isha_angles.csv: 43 confirmed Isha records, 20+ locations

Also includes:
- notebooks/01_exploratory_analysis.ipynb: latitude, time-of-year (TOY), and elevation pattern analysis
- research/: academic paper summaries (not training data)
- data/raw/sources.md: full citation table for all data sources
2026-02-25 19:32:47 -05:00