Issues fixed:
- Add upper-bound angle filter: Fajr/Isha angles capped at 22 deg (previously
unbounded; the maximum was 49 deg, caused by light-pollution artifacts)
- Remove washetdonker from approved list: its threshold method yields angles
near 7 deg (close to civil twilight), not the 12-18 deg expected for Fajr
- Remove openfajr_94992898.csv: duplicated the iCal feed data with slightly
different coordinates, bypassing dedup (4,007 duplicate records)
- Filter out future dates: OpenFajr publishes predictions for the full year,
and predicted times are not observations
- Filter out polar stations (|lat| > 70): no meaningful Fajr/Isha
- Filter out Null Island (lat=0, lng=0): GPS default / missing coordinates
- Move the precomputed-angles merge before dedup: merged rows were bypassing
dedup entirely
- Make BAD_NOTE_MARKERS matching case-insensitive: catches mixed-case variants
- Add missing tess_jun2017.csv to approved list
- Clean up duplicate comment blocks in ingest.py
Dataset after fixes: 48,668 Fajr + 34,529 Isha = 83,197 total
Angle range now: 7.0-22.0 deg (Fajr), 10.0-22.0 deg (Isha)
Latitude range now: -62.6 to 69.7 (was -90 to 90)
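The row-level filters listed above could be sketched roughly as follows. This is a minimal illustration, not the code in ingest.py: the function name `keep_record`, the dict-based record layout, and the two marker strings are assumptions; only the 22 deg cap, the |lat| > 70 cutoff, the Null Island rule, the future-date rule, and the case-insensitive marker match come from this commit.

```python
from datetime import date

MAX_ANGLE_DEG = 22.0     # upper bound on Fajr/Isha angles (from this commit)
MAX_ABS_LAT_DEG = 70.0   # polar cutoff (from this commit)
# Hypothetical marker strings; the real BAD_NOTE_MARKERS list is not shown here.
BAD_NOTE_MARKERS = ("estimated", "calculated")


def keep_record(rec, today):
    """Return True if a record (a dict, assumed layout) passes every filter."""
    if rec["angle"] > MAX_ANGLE_DEG:
        return False  # light-pollution artifact or bad time
    if rec["date"] > today:
        return False  # published prediction, not an observation
    if abs(rec["lat"]) > MAX_ABS_LAT_DEG:
        return False  # polar station
    if rec["lat"] == 0 and rec["lng"] == 0:
        return False  # Null Island: missing coordinates
    note = (rec.get("notes") or "").lower()  # lower-case once for a case-insensitive match
    if any(marker in note for marker in BAD_NOTE_MARKERS):
        return False
    return True
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the filter reproducible when the pipeline is re-run later.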
Add 6 new data collection pipelines and their processed outputs:
Sources added:
- TESS/Stars4All photometer network: 37 months (Jun 2017-Aug 2020),
~40k raw events from 100+ European stations via Zenodo archives
- Globe at Night citizen science: 26k twilight observations (2006-2024),
filtered from 308k total observations for solar depression 6-22 deg
- GaN-MN continuous monitoring: 45 months (Jan 2022-Sep 2025),
~12.5k twilight events from 88 stations across 20+ countries
- Galicia SQM network: 14 stations, 1-min resolution, 7.5k events
- Madrid/Majadahonda SQM: multi-year continuous monitoring, 3.1k events
- washetdonker.nl Netherlands: 7 stations, 3.3k morning events
- Academic papers: Jordan (Abed 2015), Fayum Egypt, India photometer
Pipeline changes:
- ingest.py: add all new files to APPROVED_RAW_CSVS allowlist,
fix filter to use allowlist instead of hardcoded exclusions
- .gitignore: exclude bulk raw data directories (BSRN, TESS, GaN-MN,
washetdonker, Globe at Night downloads)
Final dataset: 56,668 Fajr + 34,763 Isha = 91,431 total records
Previous: 5,871 Fajr + 46 Isha = 5,917 total records
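The switch from hardcoded exclusions to an allowlist in ingest.py might look like the sketch below. The helper name `select_approved` is hypothetical, and the set shown holds only two file names that appear elsewhere in this log; the real APPROVED_RAW_CSVS is longer.

```python
from pathlib import Path

# Illustrative subset only; the real allowlist in ingest.py contains more files.
APPROVED_RAW_CSVS = {
    "tess_jun2017.csv",
    "kassim_bahali_2017_malaysia.csv",
}


def select_approved(paths, approved=APPROVED_RAW_CSVS):
    """Keep only paths whose file name is on the allowlist (sorted for stable output)."""
    return sorted(str(p) for p in paths if Path(p).name in approved)
```

With an allowlist, a new file dropped into the raw data directory is ignored until it is reviewed and added, whereas an exclusion list ingests it by default.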
- Regenerate fajr_angles.csv with current collection state
- Update wiki docs to reflect current dataset stats
- Add missing requirements and minor pipeline fixes
Identified three sources of cross-source duplication and fixed each:
1. Kassim Bahali 2018 Pekan Pahang (9 records)
Same 9 June-July 2017 DSLR observations existed in both
verified_sightings.py (Table 2 entries) and the raw CSV
kassim_bahali_2017_malaysia.csv. Removed from verified_sightings;
raw CSV is the canonical source with richer cloud/conditions notes.
2. BRIN Mount Timau SQM dataset (22 records)
timau_sqm_fajr.csv contained two SQM threshold readings per night:
target=18.0° (75 records, primary) and target=16.51° (22 records,
derived from the 75-night mean). Removed target=16.51 rows.
Each night now has exactly one Fajr time.
3. Khalifa 2018 Hail Fajr (4 records)
Original batch had times producing implausible angles: 2015-01-15
gave 12.6° and 2015-06-21 gave 19.3° (paper reports 14.014°±0.317°).
Removed the four bad-time records. Batch 16a replacements (computed
from the paper mean D0) remain and give consistent 13.9-14.1° angles.
Pipeline: add automatic deduplication guard. After combining all sources,
any (prayer, date, lat rounded to 3dp, lng rounded to 3dp) duplicate is
logged and dropped (keep first). This prevents future cross-source overlaps
from silently inflating the dataset or training on the same observation twice.
Dataset: fajr_angles.csv 4,535 records, isha_angles.csv 120 records
Zero duplicates confirmed.
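The deduplication guard described above can be sketched as below. This is a plain-Python illustration under assumed names (`dedup_records`, dict records with `prayer`, `date`, `lat`, `lng` keys); the key definition and keep-first behavior follow the description in this commit.

```python
import logging

logger = logging.getLogger("dedup")


def dedup_records(records):
    """Drop later records that share (prayer, date, lat@3dp, lng@3dp) with an earlier one."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec["prayer"], rec["date"], round(rec["lat"], 3), round(rec["lng"], 3))
        if key in seen:
            logger.warning("dropping duplicate record %s", key)
            continue  # keep first: the earliest source in the combined list wins
        seen.add(key)
        kept.append(rec)
    return kept
```

Because the first occurrence wins, the order in which sources are combined decides which copy survives, so the canonical source should be loaded first.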
Herdiwijaya 2016 + 2020 (J.Phys.Conf.Ser.) — Amfoang/Kupang, East Nusa Tenggara
New site: 9.667°S, 124.0°E, 1300m high-elevation dark site.
D0=18.0° (pristine sky; study of 83 moonless nights, 2011-2018).
4,532 Fajr / 82 Isha records across 113 locations.
Added 38 per-date individual DSLR observations from Kassim Bahali et al. (2019)
JATMA 7(2):37-48 across 10 new sites:
- Sabang, Aceh, Indonesia (5.876°N, 95.340°E) — 11 nights Dec 2017
- Yaring, Pattani, Thailand (6.934°N, 101.319°E) — 2 nights Jan 2018
- Surabaya, East Java, Indonesia — 3 nights Feb 2018
- Sumenep, Madura, Indonesia — 3 nights Feb 2018
- Ternate, North Maluku, Indonesia — 3 nights Mar 2018
- South Sulawesi (Gowa area), Indonesia — 6 nights Mar 2018
- Mersing, Johor, Malaysia — 3 nights Jun 2018
- Kuala Rompin, Pahang, Malaysia — 2 nights Jul 2018
- Nenasi Pekan, Pahang, Malaysia — 2 nights Aug 2018
- Kota Tinggi, Johor, Malaysia — 3 nights Sep 2018
Depression angles computed via PyEphem from actual dawn times + coordinates.
Mean D0 range: 17.07° (Pattani) to 19.61° (Kota Tinggi dry season).
Includes first Thailand data point (Pattani) and expanded Indonesian coverage.
+10 new unique locations (88 → 98)
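Back-calculating a depression angle from a dawn time and coordinates can be illustrated as follows. The commit uses PyEphem; this sketch instead swaps in the low-precision solar position formulas from the Astronomical Almanac so that it runs on the standard library alone. The function name is hypothetical, atmospheric refraction and site elevation are ignored, and the dawn time in the usage line is made up for illustration.

```python
import math
from datetime import datetime, timezone


def solar_depression_deg(when_utc, lat_deg, lon_deg):
    """Geometric solar depression in degrees (positive when the Sun is below the horizon).

    when_utc must be a timezone-aware datetime; lon_deg is positive east.
    """
    # Days since J2000.0 (JD 2451545.0); the Unix epoch is JD 2440587.5.
    n = when_utc.timestamp() / 86400.0 + 2440587.5 - 2451545.0
    mean_lon = math.radians((280.460 + 0.9856474 * n) % 360.0)
    mean_anom = math.radians((357.528 + 0.9856003 * n) % 360.0)
    ecl_lon = mean_lon + math.radians(
        1.915 * math.sin(mean_anom) + 0.020 * math.sin(2.0 * mean_anom)
    )
    obliquity = math.radians(23.439 - 0.0000004 * n)
    right_asc = math.atan2(math.cos(obliquity) * math.sin(ecl_lon), math.cos(ecl_lon))
    declination = math.asin(math.sin(obliquity) * math.sin(ecl_lon))
    # Greenwich mean sidereal time in hours, then local hour angle of the Sun.
    gmst_hours = (18.697374558 + 24.06570982441908 * n) % 24.0
    hour_angle = math.radians(gmst_hours * 15.0 + lon_deg) - right_asc
    lat = math.radians(lat_deg)
    sin_alt = (math.sin(lat) * math.sin(declination)
               + math.cos(lat) * math.cos(declination) * math.cos(hour_angle))
    sin_alt = max(-1.0, min(1.0, sin_alt))  # guard against floating-point overshoot
    return -math.degrees(math.asin(sin_alt))


# Usage with the Sabang coordinates from this commit and a hypothetical dawn time:
angle = solar_depression_deg(
    datetime(2017, 12, 14, 22, 25, tzinfo=timezone.utc), 5.876, 95.340
)
```

A production calculation would apply the refraction and elevation handling that the PyEphem-based module provides, so values from this sketch should be treated as approximate.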
Added 32 new Fajr records from 8 new sites:
- 6 Indonesian cities from Saksono ISRN/UHAMKA 'Premature Dawn' series
(Padang, Batusangkar, Cirebon, Balikpapan, Bitung, Manokwari) — urban LP,
D0=-13.4°, 4 seasonal records each
- Tayu Beach, Pati, Central Java — Noor & Hamdani 2018 QIJIS, photoelectric+SQM,
D0=-17.0°, 4 individual nights Aug-Sep 2016
- Cimahi, West Java — Herdiwijaya 2020, SQM, D0=-18.5°, 4 seasonal records
+8 new unique locations (80 → 88)
New records from research expansion:
- Tanjung Aru, Sabah Malaysia (Niri & Zainuddin): 4 Isha Shafaq Abyad records
- Teluk Kemang, Malaysia (Abdel-Hadi & Hassan 2022): 4 Fajr + 4 Isha SQM records
- Bosscha Observatory, Java 1310m (Herdiwijaya 2020): 4 Fajr records
- Yogyakarta, Java (Herdiwijaya 2014-2016, 136 nights): 4 Fajr records
- Kupang, NTT 10°S (Herdiwijaya 2020): 4 Fajr + 4 Isha records
- Matrouh, Egypt (Hassan et al.): 4 Fajr + 3 Isha records (1 filtered)
- Kharga Oasis, Egypt (Hassan et al. 2020): 4 Fajr records
- Hurghada, Egypt (Hassan et al. 2020): 4 Fajr records
- Marsa-Alam, Egypt (Hassan et al. 2020): 4 Fajr records
- 15th of May City, Egypt (Taha et al. 2025): 4 Fajr records
- Riyadh, Saudi Arabia (Taha et al. 2025): 4 Fajr records
- Mauritania 18°N (Taha et al. 2025): 4 Fajr records — first West Africa data
New modules:
- src/geocode.py: Nominatim geocoding with disk cache
- src/ingest.py: CSV ingestion and data standardization pipeline
- src/pipeline.py: integrated raw CSV loading via ingest module
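The disk cache in src/geocode.py could work along the lines of the sketch below. The function name, the JSON cache format, and the injected `lookup` callable are assumptions; in the real module the lookup would be a Nominatim request, which is left out here so the example runs offline.

```python
import json
from pathlib import Path


def cached_geocode(place, lookup, cache_path):
    """Return (lat, lng) for place, reading a JSON cache file before calling lookup.

    lookup is any callable mapping a place name to (lat, lng); it is only
    called on a cache miss, and the result is written back to disk.
    """
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    key = place.strip().lower()  # normalize so "Sabang" and " sabang " share an entry
    if key not in cache:
        cache[key] = list(lookup(place))
        path.write_text(json.dumps(cache, indent=2))
    lat, lng = cache[key]
    return lat, lng
```

Caching on disk means repeated pipeline runs do not re-query the geocoding service for places that were already resolved.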
Replaces the original JS calibration library with a pure Python pipeline
for collecting and back-calculating solar depression angles from human-verified
Fajr and Isha prayer sightings.
What this does:
- src/pipeline.py: master pipeline; fetches iCal + manual records, back-calculates
angles via PyEphem, applies quality filters, exports two clean CSVs
- src/collect/openfajr.py: parses the OpenFajr Birmingham iCal feed (~4,018 records)
- src/collect/verified_sightings.py: manually compiled records from peer-reviewed
studies (Egypt, Saudi Arabia, Malaysia, Indonesia, UK, USA, Canada, and more)
- src/angle_calc.py: PyEphem back-calculation with atmospheric refraction
- src/elevation.py: Open-Elevation API batch lookup
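A batch lookup against Open-Elevation, as in src/elevation.py, might build and parse its requests roughly like this. The helper names and the batch size of 100 are assumptions, and the endpoint shown is the public Open-Elevation lookup URL, which may differ from what the module actually targets. The HTTP call itself is omitted so the sketch runs offline.

```python
import json

# Public Open-Elevation lookup endpoint (assumed to be the one the module uses).
LOOKUP_URL = "https://api.open-elevation.com/api/v1/lookup"


def build_payloads(points, batch_size=100):
    """Yield one JSON request body per batch of (lat, lng) points."""
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        yield json.dumps(
            {"locations": [{"latitude": lat, "longitude": lng} for lat, lng in batch]}
        )


def parse_elevations(response_text):
    """Map (lat, lng) to elevation in metres from one lookup response body."""
    results = json.loads(response_text)["results"]
    return {(r["latitude"], r["longitude"]): r["elevation"] for r in results}
```

Each body from `build_payloads` would be sent as a POST with a JSON content type to `LOOKUP_URL`, and each response passed to `parse_elevations`.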
Datasets generated:
- data/processed/fajr_angles.csv: 4,105 confirmed Fajr records, 35 locations,
latitude range -37.8 to 53.7 degrees, date range 1985-2026
- data/processed/isha_angles.csv: 43 confirmed Isha records, 20+ locations
Also includes:
- notebooks/01_exploratory_analysis.ipynb: latitude, time-of-year (TOY), and elevation pattern analysis
- research/: academic paper summaries (not training data)
- data/raw/sources.md: full citation table for all data sources