# pray-calc-ml
A Python data science project that collects and back-calculates solar depression angles
from human-verified Fajr and Isha prayer sightings. The goal is to find the real empirical
patterns in how the solar depression angle at Fajr and Isha varies with latitude, season,
and elevation — then use machine learning to refine the DPC (Dynamic Pray Calc) algorithm
in [pray-calc](https://github.com/acamarata/pray-calc).
## What this is
Most Islamic prayer time calculators use a fixed angle (e.g. 15° or 18°) for Fajr and Isha.
Peer-reviewed observation studies consistently find the real angle is lower and varies with
latitude, season, and atmospheric conditions. This project compiles the most complete
dataset of actual human-verified sightings and back-calculates the solar depression angle
at each observed moment.
The training data comes exclusively from confirmed human sightings with explicit dates,
locations, and times. No aggregated statistics or calculated-angle guesses are used as
ground truth. Each record is back-calculated independently using PyEphem.
## Datasets
Two clean CSV files are generated by the pipeline:
**`data/processed/fajr_angles.csv`** — One confirmed Fajr sighting per row
| Column | Description |
| --- | --- |
| `date` | YYYY-MM-DD (local calendar date) |
| `utc_dt` | ISO 8601 UTC datetime |
| `lat` | Decimal degrees (north positive) |
| `lng` | Decimal degrees (east positive) |
| `elevation_m` | Metres above sea level |
| `day_of_year` | 1-366 (seasonality feature) |
| `fajr_angle` | Solar depression angle at moment of sighting (degrees) |
| `source` | Citation |
| `notes` | Observer notes |
**`data/processed/isha_angles.csv`** — Same schema with `isha_angle`.
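
A minimal sketch of reading either file with the standard library, assuming the column names in the table above. `load_angles` is a hypothetical helper, not part of the pipeline; it converts the numeric columns and leaves the rest as text.

```python
import csv
from typing import Iterable

# Columns documented as numeric in the schema table.
NUMERIC_COLUMNS = ("lat", "lng", "elevation_m")


def load_angles(lines: Iterable[str], angle_column: str = "fajr_angle") -> list[dict]:
    """Parse rows of a processed CSV into dicts with numeric fields converted.

    `lines` is any iterable of text lines, such as an open file handle.
    """
    records = []
    for row in csv.DictReader(lines):
        record = dict(row)
        for name in NUMERIC_COLUMNS + (angle_column,):
            record[name] = float(row[name])
        record["day_of_year"] = int(row["day_of_year"])
        records.append(record)
    return records
```

Pass an open file handle and the matching angle column, for example `load_angles(open("data/processed/isha_angles.csv", newline=""), "isha_angle")`.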
### Current dataset size
- **Fajr:** 48,668 records, 4,200+ unique locations, latitude range -62.6° to 69.7°
- **Isha:** 34,529 records, 2,800+ unique locations, latitude range -65.9° to 69.3°
- **Date range:** 1970 to 2026
The dominant Fajr source is the [OpenFajr Project](https://openfajr.org), with 4,000+
community-reviewed daily observations from Birmingham, UK. The second-largest source is
Basthoni's 2022 PhD dissertation (UIN Walisongo), with 1,621 per-night SQM records across
46 Indonesian sites. The remaining records are manually compiled from peer-reviewed studies
spanning Egypt, Saudi Arabia, Malaysia, Indonesia, Mauritania, and other locations.
## Setup
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## Running the pipeline
```bash
python -m src.pipeline
```
This fetches the OpenFajr iCal feed (network required), loads the compiled sighting records,
back-calculates depression angles, and writes both CSVs.
```bash
python -m src.pipeline --no-elevation-lookup
```
The `--no-elevation-lookup` flag skips the Open-Elevation API calls and uses the pre-set elevations from the source records.
## Project structure
```text
pray-calc-ml/
├── src/
│   ├── angle_calc.py                 Back-calculation: observed time -> depression angle (PyEphem)
│   ├── elevation.py                  Open-Topo-Data / Open-Elevation API lookup
│   ├── evaluate.py                   Train baseline models and print precision/recall/MAE
│   ├── ingest.py                     Standardize and validate raw CSV files
│   ├── pipeline.py                   Master pipeline: collect -> enrich -> filter -> export
│   └── collect/
│       ├── openfajr.py               OpenFajr iCal feed parser (~4,018 Fajr records)
│       ├── verified_sightings.py     Manually compiled records from peer-reviewed studies
│       ├── precomputed_angles.py     1,621 Basthoni 2022 SQM records (46 Indonesian sites)
│       ├── brin_multistation_sqm.py  BRIN multistation SQM processor
│       ├── brin_timau_sqm.py         BRIN Mount Timau SQM processor
│       ├── paper_extractor.py        PDF/HTML table extractor for academic papers
│       └── pdf_extractor.py          PDF text extraction via PyMuPDF + pdfminer
├── data/
│   ├── raw/raw_sightings/            Per-source raw CSV files
│   ├── processed/                    Generated CSVs (fajr_angles.csv, isha_angles.csv)
│   └── SCHEMA.md                     Column-by-column documentation for both processed CSVs
├── docs/
│   └── ml-training-plan.md           Feature engineering, model architecture, CV strategy, metrics
├── research/                         Academic paper summaries, aggregate D0 database
├── .github/wiki/                     GitHub Wiki pages (synced via Actions)
└── requirements.txt
```
## Back-calculation method
For each confirmed sighting (date, location, observed local time):
1. Convert observed local time to UTC using the documented UTC offset
2. Set up a PyEphem observer at the sighting location with standard atmosphere (1013.25 hPa, 15°C)
3. Compute solar altitude at the UTC moment, including atmospheric refraction
4. Depression angle = negative altitude (positive when sun is below the horizon)
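
The four steps above can be sketched without any dependencies. This is not the project's `angle_calc.py`: it swaps PyEphem for the low-precision solar position formulas from the Astronomical Almanac and ignores atmospheric refraction and elevation, so expect small differences from the pipeline's values. The function name and signature are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone


def depression_angle(local_dt: datetime, utc_offset_hours: float,
                     lat_deg: float, lng_deg: float) -> float:
    """Approximate solar depression angle in degrees.

    `local_dt` is a naive datetime holding the observed local clock time.
    Assumption: low-precision almanac formulas stand in for PyEphem here.
    """
    # Step 1: local clock time -> UTC using the documented offset.
    utc_dt = (local_dt - timedelta(hours=utc_offset_hours)).replace(tzinfo=timezone.utc)

    # Days since the J2000.0 epoch (2000-01-01 12:00 UTC).
    j2000 = datetime(2000, 1, 1, 12, tzinfo=timezone.utc)
    n = (utc_dt - j2000).total_seconds() / 86400.0

    # Solar ecliptic longitude and obliquity of the ecliptic.
    mean_lon = (280.460 + 0.9856474 * n) % 360.0
    mean_anom = math.radians((357.528 + 0.9856003 * n) % 360.0)
    ecl_lon = math.radians(mean_lon + 1.915 * math.sin(mean_anom)
                           + 0.020 * math.sin(2.0 * mean_anom))
    obliquity = math.radians(23.439 - 0.0000004 * n)

    # Equatorial coordinates: right ascension and declination.
    ra = math.atan2(math.cos(obliquity) * math.sin(ecl_lon), math.cos(ecl_lon))
    dec = math.asin(math.sin(obliquity) * math.sin(ecl_lon))

    # Step 2 (simplified): place the observer via the local hour angle,
    # derived from Greenwich mean sidereal time plus east longitude.
    gmst_hours = (18.697374558 + 24.06570982441908 * n) % 24.0
    hour_angle = math.radians(gmst_hours * 15.0 + lng_deg) - ra

    # Step 3: geometric altitude of the sun (no refraction applied).
    lat = math.radians(lat_deg)
    sin_alt = (math.sin(lat) * math.sin(dec)
               + math.cos(lat) * math.cos(dec) * math.cos(hour_angle))
    altitude = math.degrees(math.asin(max(-1.0, min(1.0, sin_alt))))

    # Step 4: depression is positive when the sun is below the horizon.
    return -altitude
```

For example, `depression_angle(datetime(2024, 1, 15, 6, 20), 0.0, 52.48, -1.90)` evaluates a hypothetical 06:20 GMT observation in Birmingham.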
Records where the depression angle is below 7° (Fajr) or 10° (Isha) are dropped as data
entry errors. This catches DST clock-change artifacts in the OpenFajr feed and a small number
of mis-estimated observation times.
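
The plausibility filter can be sketched as below, assuming each record is a dict carrying a `fajr_angle` or `isha_angle` key as in the processed CSVs. `drop_implausible` is a hypothetical name, not necessarily how `pipeline.py` implements the step.

```python
# Minimum plausible depression angles in degrees, taken from the text above.
MIN_ANGLE_DEG = {"fajr": 7.0, "isha": 10.0}


def drop_implausible(records: list[dict], prayer: str) -> list[dict]:
    """Keep only records whose angle is at or above the threshold for `prayer`."""
    threshold = MIN_ANGLE_DEG[prayer]
    key = f"{prayer}_angle"
    return [r for r in records if r[key] >= threshold]
```

A record sitting exactly on the threshold is kept, since only angles below it are treated as errors.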
## Key findings so far
The data shows three main patterns:
1. **Latitude matters.** Near-equatorial sites (Malaysia, Indonesia, 2°-7°) show mean Fajr angles
of 16°-17°. Mid-latitude sites (UK at 52°N) average ~13°. This counter-intuitive result
occurs because the sun rises at a steeper angle at low latitudes, compressing the twilight
interval.
2. **Season matters.** At fixed latitude, Fajr angle is lower in summer than winter. Birmingham's
10-year dataset shows a clear sinusoidal seasonal pattern with a ~3° peak-to-trough range.
3. **Elevation has a smaller but real effect.** High-altitude desert sites (Hail 1,020 m, Tehran
1,191 m, Kottamia 477 m) consistently trend toward the high end of the angle distribution.
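
One way to quantify a seasonal pattern like the one in finding 2 is to fit a single annual sinusoid to `day_of_year` and the angle column by least squares. The sketch below is an illustration, not the project's analysis code: `fit_annual_sinusoid` is a hypothetical helper, and the one-harmonic model with a 365.25-day period is an assumption.

```python
import math


def fit_annual_sinusoid(days, angles, period=365.25):
    """Least-squares fit of angle = a + b*sin(w*d) + c*cos(w*d).

    Returns (mean_angle, peak_to_trough) in the units of `angles`.
    """
    w = 2.0 * math.pi / period
    rows = [(1.0, math.sin(w * d), math.cos(w * d)) for d in days]

    # Build the augmented normal equations [X^T X | X^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, angles))]
         for i in range(3)]

    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, 3):
            factor = m[i][col] / m[col][col]
            for k in range(col, 4):
                m[i][k] -= factor * m[col][k]

    # Back substitution for the coefficients a, b, c.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][k] * beta[k] for k in range(i + 1, 3))
        beta[i] = (m[i][3] - tail) / m[i][i]

    mean_angle, b, c = beta
    # The sinusoid's amplitude is hypot(b, c); peak-to-trough is twice that.
    return mean_angle, 2.0 * math.hypot(b, c)
```

Applied to the rows for a single site, the second return value is directly comparable to the peak-to-trough range quoted for Birmingham.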
## Data sources
See the [wiki](https://github.com/acamarata/pray-calc-ml/wiki/Data-Sources) for the full
citation table.
Primary sources:
- [OpenFajr Project](https://openfajr.org) -- Birmingham, UK, community astrophotography (~4,018 records)
- Basthoni 2022 PhD, UIN Walisongo -- 46 Indonesian SQM sites (1,621 records)
- BRIN Mount Timau SQM -- NTT, Indonesia (59 Fajr + 577 Isha)
- NRIAG Egypt (Hassan et al. 2014, Semeida & Hassan 2018, Marzouk et al. 2025)
- Taha et al. 2025, EJSAS -- Riyadh, Saudi Arabia + Mauritania
- Khalifa 2018, NRIAG J. -- Hail, Saudi Arabia
- Kassim Bahali et al. 2018, 2019 -- Malaysia/Indonesia DSLR + SQM studies
- Miftahi/Shaukat 2015 -- Blackburn, Lancashire UK (29 Fajr + 32 Isha)
- Asim Yusuf 2017 -- Exmoor UK (multi-observer)
## Related packages
- [pray-calc](https://github.com/acamarata/pray-calc) — Islamic prayer times calculator; this project feeds its DPC algorithm
- [nrel-spa](https://github.com/acamarata/nrel-spa) — NREL Solar Position Algorithm used inside pray-calc
- [moon-sighting](https://github.com/acamarata/moon-sighting) — Lunar crescent visibility
## License
MIT. Copyright (c) 2026 Aric Camarata.