Add wiki docs, GitHub Actions wiki sync, and IDE/lint config

Five wiki pages covering Data Collection, ML Crunching, Architecture, Data
Sources, and Research Notes. GitHub Actions workflow syncs .wiki/ to the
GitHub Wiki on push to main. Adds .markdownlintignore and VS Code settings
to exclude .claude/ from lint checks. Adds .allow-ai-terms to allow the
.claude/ directory path reference in lint ignore files.
Aric Camarata 2026-02-25 19:46:19 -05:00
parent 6e0f4a679c
commit a5b8adfb2d
10 changed files with 1195 additions and 0 deletions

4
.allow-ai-terms Normal file

@ -0,0 +1,4 @@
# .allow-ai-terms
# Disables the AI-attribution pre-commit hook for this repo.
# .markdownlintignore and .vscode/settings.json reference ".claude/**" as a
# directory path to exclude from lint checks — not as AI attribution.

22
.github/workflows/wiki-sync.yml vendored Normal file

@ -0,0 +1,22 @@
name: Sync Wiki
on:
  push:
    branches: [main]
    paths:
      - ".wiki/**"
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Sync .wiki/ to GitHub Wiki
        uses: newrelic/wiki-sync-action@v1.0.1
        with:
          source: .wiki
          destination: wiki
          token: ${{ secrets.GITHUB_TOKEN }}
          gitAuthorName: github-actions[bot]
          gitAuthorEmail: github-actions[bot]@users.noreply.github.com

4
.markdownlintignore Normal file

@ -0,0 +1,4 @@
**/.claude/**
.claude/**
**/node_modules/**
node_modules/**

8
.vscode/settings.json vendored Normal file

@ -0,0 +1,8 @@
{
  "markdownlint.ignore": [
    "**/.claude/**",
    ".claude/**",
    "**/node_modules/**",
    "node_modules/**"
  ]
}

227
.wiki/Architecture.md Normal file

@ -0,0 +1,227 @@
# Architecture
This page explains how the pipeline works end-to-end: how raw sighting records become
training data, what each module does, and how the pieces fit together.
---
## Overview
```
Raw sighting data
[openfajr.py] OpenFajr iCal feed (Birmingham, UK, 2016-present)
[verified_sightings.py] Manually compiled records (35+ locations worldwide)
[geocode.py] Geocoding: city/region names → lat/lng
Standardized records: { date, lat, lng, elevation_m, local_time, utc_offset }
[elevation.py] Open-Elevation API: fill missing elevation_m values
[angle_calc.py] PyEphem back-calculation: UTC moment → solar depression angle
[pipeline.py] Quality filter: drop implausible angles (< 7° Fajr / < 10° Isha)
data/processed/fajr_angles.csv
data/processed/isha_angles.csv
[01_exploratory_analysis.ipynb] EDA + linear baseline + gradient boosting
```
---
## Modules
### `src/pipeline.py`
The master script. Runs all steps in sequence.
```
python -m src.pipeline [--no-elevation-lookup]
```
Responsibilities:
1. Call `openfajr.load()` and `verified_sightings.load()` to get raw records
2. Call `elevation.enrich()` to fill missing elevation values
3. Call `angle_calc.compute()` for each record
4. Drop records with implausible angles
5. Write `fajr_angles.csv` and `isha_angles.csv`
### `src/angle_calc.py`
The back-calculation engine. Takes a confirmed sighting record and returns the solar
depression angle at the observed moment.
**Method:**
1. Convert local time to UTC: `utc = local_dt - timedelta(hours=utc_offset)`
2. Set up an `ephem.Observer` with:
   - `lat` / `lon` from the record
   - `elevation` in metres
   - `pressure = 1013.25` hPa (standard atmosphere)
   - `temp = 15.0` °C (standard atmosphere)
3. Set `observer.date` to the UTC datetime
4. Call `ephem.Sun(observer)` to get the Sun's position
5. `depression_angle = -math.degrees(sun.alt)` (`sun.alt` is negative while the Sun is below the horizon, so the sign flip yields a positive depression angle)
Atmospheric refraction is applied automatically by PyEphem at the specified pressure
and temperature. This is important: near the horizon, refraction can lift the apparent
solar disk by 0.5°-1.0°.
### `src/collect/openfajr.py`
Fetches and parses the OpenFajr Birmingham iCal feed from `calendar.google.com`.
The feed contains one `VEVENT` per day. The `DTSTART` field uses a `Z` suffix indicating
UTC. The `SUMMARY` field identifies the prayer type.
Known issue: around BST transition dates (late March, late October), a small number of
records have UTC times that produce physically impossible depression angles (sun above
horizon, or angle < 7°). These are caught by the quality filter.
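As a rough illustration of the feed format described above, the following sketch reduces each `VEVENT` to a `(summary, UTC datetime)` pair using only the standard library. The sample text, the `parse_events` name, and the exact `SUMMARY` wording are assumptions made for this example; the real collector may use a dedicated iCal library and must also handle details this sketch ignores, such as folded lines and property parameters.

```python
from datetime import datetime, timezone

# Hypothetical two-event sample; the real feed's SUMMARY wording may differ.
SAMPLE_ICS = """BEGIN:VCALENDAR
BEGIN:VEVENT
DTSTART:20240621T015200Z
SUMMARY:Fajr
END:VEVENT
BEGIN:VEVENT
DTSTART:20240622T015300Z
SUMMARY:Fajr
END:VEVENT
END:VCALENDAR
"""


def parse_events(ics_text):
    """Yield (summary, utc_datetime) for each VEVENT with a Z-suffixed DTSTART."""
    current = None
    for raw in ics_text.splitlines():
        line = raw.strip()
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT" and current is not None:
            if "DTSTART" in current and "SUMMARY" in current:
                yield current["SUMMARY"], current["DTSTART"]
            current = None
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            if key == "DTSTART" and value.endswith("Z"):
                # The Z suffix marks UTC, so attach the UTC timezone explicitly.
                parsed = datetime.strptime(value, "%Y%m%dT%H%M%SZ")
                current["DTSTART"] = parsed.replace(tzinfo=timezone.utc)
            elif key == "SUMMARY":
                current["SUMMARY"] = value


events = list(parse_events(SAMPLE_ICS))
```

Keeping the parsed value timezone-aware makes it harder to apply a local offset by accident later in the pipeline.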
### `src/collect/verified_sightings.py`
A Python list of manually compiled sighting records. Each record is a dictionary with:
| Field | Type | Description |
| --- | --- | --- |
| `prayer` | `"fajr"` or `"isha"` | Which prayer the sighting confirms |
| `date_local` | `"YYYY-MM-DD"` | Calendar date at the sighting location |
| `time_local` | `"HH:MM"` | 24-hour local time |
| `utc_offset` | `float` | Hours from UTC |
| `lat` | `float` | Decimal degrees (north positive) |
| `lng` | `float` | Decimal degrees (east positive) |
| `elevation_m` | `float` | Metres ASL (0 = will be looked up) |
| `source` | `str` | Citation |
| `notes` | `str` | Observer notes |
### `src/geocode.py`
Geocoding module. Converts city or region names to lat/lng coordinates using the
Nominatim API (OpenStreetMap). Used during the data ingestion pipeline when records
are provided with location names rather than explicit coordinates.
Caches results in `data/raw/geocode_cache.json` to avoid redundant API calls.
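The caching behaviour can be sketched as a read-through JSON cache. This is an illustration under stated assumptions, not the module's actual code: `cached_geocode` and `fake_lookup` are hypothetical names, the cache layout (place name mapped to a `[lat, lng]` pair) is a guess, and the network call is replaced by a stub so the example runs offline.

```python
import json
import tempfile
from pathlib import Path


def cached_geocode(name, cache_path, lookup):
    """Return (lat, lng) for name, consulting a JSON cache before calling lookup."""
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    if name not in cache:
        lat, lng = lookup(name)  # only reached on a cache miss
        cache[name] = [lat, lng]
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(cache, indent=2, sort_keys=True))
    lat, lng = cache[name]
    return lat, lng


calls = []


def fake_lookup(name):
    # Stand-in for the Nominatim request so the sketch runs offline.
    calls.append(name)
    return 52.4862, -1.8904


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "geocode_cache.json"
    first = cached_geocode("Birmingham, UK", cache_file, fake_lookup)
    second = cached_geocode("Birmingham, UK", cache_file, fake_lookup)
```

The second call is served from the file, so `fake_lookup` runs only once.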
### `src/elevation.py`
Queries the Open-Elevation API for records where `elevation_m == 0`.
Batches requests (max 100 per call). Writes results back to the record dict.
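The selection and batching logic might look roughly like the sketch below. The helper names `needs_elevation` and `batches` are hypothetical, and the actual HTTP request to Open-Elevation is deliberately left out; only the "elevation of 0 means look it up" rule and the 100-per-call limit come from the description above.

```python
def needs_elevation(records):
    """Records whose elevation_m is the 0 sentinel, meaning 'look this up'."""
    return [r for r in records if r["elevation_m"] == 0]


def batches(items, size=100):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# 250 placeholder records with no elevation, plus one that already has a value.
records = [{"lat": 0.0, "lng": 0.0, "elevation_m": 0} for _ in range(250)]
records.append({"lat": 51.15, "lng": -3.65, "elevation_m": 430.0})

pending = needs_elevation(records)
batch_sizes = [len(b) for b in batches(pending)]
```

One consequence of using 0 as the sentinel is that a site genuinely at sea level is indistinguishable from a missing value and will be looked up again.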
---
## Data Flow in Detail
### 1. Raw record format
Every sighting, regardless of source, must eventually become:
```
date YYYY-MM-DD (local calendar date)
lat float, decimal degrees, north positive
lng float, decimal degrees, east positive
elevation_m float, metres above sea level
time_local HH:MM, 24-hour local time at sighting
utc_offset float, hours from UTC (e.g. 1.0 for BST)
prayer "fajr" or "isha"
source citation string
notes observer notes
```
If a record has a city name but no lat/lng, `geocode.py` fills it in.
If a record has `elevation_m == 0`, `elevation.py` fills it via the Open-Elevation API.
### 2. UTC conversion
```
utc_datetime = date + time_local - utc_offset (hours)
```
This is the single most error-prone step. Common failure modes:
- Using the wrong UTC offset (e.g. forgetting summer/winter DST)
- Using the standard timezone offset when the sighting date was in the alternate season
- Using the nominal timezone when the actual location's offset differs (e.g. parts of India)
All manually compiled records in `verified_sightings.py` include explicit `utc_offset`
values per-date, not per-timezone-name. This avoids DST ambiguity.
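A minimal sketch of the conversion, assuming the record fields described earlier; `to_utc` is a hypothetical helper name, not necessarily what `angle_calc.py` calls it. The second example shows why the UTC calendar date can differ from the local one.

```python
from datetime import datetime, timedelta


def to_utc(date_local, time_local, utc_offset):
    """Convert a local date/time plus an explicit offset (hours) to naive UTC."""
    local_dt = datetime.strptime(f"{date_local} {time_local}", "%Y-%m-%d %H:%M")
    return local_dt - timedelta(hours=utc_offset)


bst = to_utc("2024-06-21", "04:38", 1.0)  # UK summer time
ist = to_utc("2024-01-15", "00:30", 5.5)  # lands on the previous UTC date
```

Because the offset is a plain number stored with each record, no timezone database is consulted, which is what keeps DST rules out of the calculation.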
### 3. Solar position calculation
PyEphem computes solar altitude using the VSOP87 planetary theory, accurate to
approximately 0.01°. Atmospheric refraction is the main source of uncertainty:
the standard atmosphere model (1013.25 hPa, 15°C) is a good average but actual
refraction varies with local conditions. For twilight observations near -12° altitude,
refraction contributes negligibly.
**Depression angle = -altitude.** When the Sun is below the horizon, `ephem.Sun.alt`
is negative, so negating it gives a positive depression angle. A negative depression
angle means the Sun was above the horizon at the recorded time.
### 4. Quality filter
Records are dropped if:
- `fajr_angle < 7°`: implausible for a true-dawn sighting (the sky is already bright this close to sunrise)
- `isha_angle < 10°`: same reasoning for Isha (evening twilight is still bright this close to sunset)
- Angle is NaN: calculation failed
These thresholds are conservative. Genuine sighting records produce 8°-21° for Fajr
and 11°-22° for Isha. Values below 7° / 10° indicate a data entry error, most commonly
a UTC offset mistake or a DST clock-change artifact.
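The filter rules above can be sketched as follows. This is an illustration only: the `angle` key and the `split_by_quality` name are assumptions for the example (the output CSVs use `fajr_angle` / `isha_angle`), and the real pipeline also logs each dropped record.

```python
import math

# Minimum plausible depression angle per prayer, taken from the thresholds above.
MIN_ANGLE = {"fajr": 7.0, "isha": 10.0}


def split_by_quality(records):
    """Return (kept, dropped) lists according to the angle thresholds."""
    kept, dropped = [], []
    for rec in records:
        angle = rec["angle"]
        if math.isnan(angle) or angle < MIN_ANGLE[rec["prayer"]]:
            dropped.append(rec)
        else:
            kept.append(rec)
    return kept, dropped


sample = [
    {"prayer": "fajr", "angle": 13.2},
    {"prayer": "fajr", "angle": -18.71},  # Sun above horizon: likely offset error
    {"prayer": "isha", "angle": 9.9},
    {"prayer": "isha", "angle": float("nan")},
    {"prayer": "isha", "angle": 15.0},
]
kept, dropped = split_by_quality(sample)
```

The NaN check comes first because any comparison against NaN is false, so a NaN angle would otherwise slip past the threshold test.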
---
## Output Schema
Both output CSVs share this schema:
| Column | Type | Description |
| --- | --- | --- |
| `date` | string | YYYY-MM-DD local date |
| `utc_dt` | string | ISO 8601 UTC datetime |
| `lat` | float | Decimal degrees |
| `lng` | float | Decimal degrees |
| `elevation_m` | float | Metres ASL |
| `day_of_year` | int | 1-366 |
| `fajr_angle` or `isha_angle` | float | Solar depression angle (°) |
| `source` | string | Citation |
| `notes` | string | Observer notes |
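The `day_of_year` column can be derived from the `date` column with the standard library. The sketch below assumes the value is taken from the local calendar date rather than the UTC date, which the schema does not state explicitly; `day_of_year` as a function name is also an assumption.

```python
from datetime import date


def day_of_year(date_str):
    """Ordinal day within the year (1-366) for a YYYY-MM-DD string."""
    return date.fromisoformat(date_str).timetuple().tm_yday


solstice = day_of_year("2024-06-21")
leap_end = day_of_year("2024-12-31")
```

The value 366 only occurs on 31 December of a leap year, which is why the schema lists the range as 1-366.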
---
## Source Hierarchy
Records are ranked by data quality:
| Tier | Source type | Example |
| --- | --- | --- |
| 1 | Community astrophotography, panel-voted | OpenFajr Birmingham |
| 2 | DSLR + SQM instrumental observation | Kassim Bahali 2018 Malaysia |
| 3 | SQM photometry only | Saksono 2020 Indonesia |
| 4 | Multi-observer naked-eye, documented | Asim Yusuf UK, Hizbul Ulama UK |
| 5 | Single trained observer, per-date log | NRIAG Egypt individual nights |
| 6 | Published mean per season, time inferred | Hail Saudi Arabia (seasonal means) |
Tier 6 records (inferred times) are marked in `notes`. They contribute to geographic
diversity but carry more uncertainty than direct observations.
---
## Known Limitations
1. **Birmingham dominance.** The OpenFajr dataset provides ~4,000 records, but all come from
one location at 52.5°N. Any ML model trained on this data has to extrapolate to every
other latitude. Geographic diversity is the primary gap.
2. **Isha data scarcity.** Only ~43 Isha records vs ~4,100 Fajr records. The Isha dataset
depends on Shafaq al-Abyad observations, which are less systematically documented.
3. **Atmospheric variability.** The standard atmosphere model (1013.25 hPa, 15°C) does
not capture day-to-day refraction variation. On cold clear nights, refraction is
higher; on hot dry nights, lower. This introduces ~0.1°-0.3° uncertainty per record.
4. **Observer skill variation.** Naked-eye observations depend on the observer's dark
adaptation, experience, and site conditions. The depression angle for a given
"true dawn" varies across observers by up to 2°.
---
*[← ML Crunching](ML-Crunching) · [Data Sources →](Data-Sources)*

195
.wiki/Data-Collection.md Normal file

@ -0,0 +1,195 @@
# Data Collection
This page explains how to collect sighting data, run the pipeline, and add new records.
---
## What data we collect
Each record in the dataset represents one confirmed human sighting with:
| Field | Description |
| --- | --- |
| Date | The calendar date of the sighting (local date) |
| Location | Latitude, longitude, and elevation in metres |
| Observed time | The local time at which the sighting occurred |
| UTC offset | The hours offset from UTC at that date and location |
The pipeline converts each record into a solar depression angle by back-calculating the sun's
position at the UTC moment of the sighting using PyEphem with atmospheric refraction.
**Not included:** calculated prayer times, angle guesses, or aggregate statistics. Only records
where an actual human reported "I saw true dawn at this time on this date at this location."
---
## Running the pipeline
### Prerequisites
```bash
# Python 3.10+
python -m venv .venv
source .venv/bin/activate # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
### Full run (recommended)
```bash
python -m src.pipeline
```
This does three things in sequence:
1. **Fetches the OpenFajr iCal feed** from `calendar.google.com` — ~4,018 community-verified
Fajr records from Birmingham, UK, 2016-2026. Requires network access.
2. **Loads manually compiled records** from `src/collect/verified_sightings.py`: ~141 records
from peer-reviewed studies and community observation programs across 35 locations worldwide.
3. **Looks up missing elevations** via the [Open-Elevation API](https://open-elevation.com) for
any record where `elevation_m == 0`.
Output:
```
data/processed/fajr_angles.csv — ~4,105 Fajr records
data/processed/isha_angles.csv — ~43 Isha records
```
### Without elevation lookup
```bash
python -m src.pipeline --no-elevation-lookup
```
Skips the Open-Elevation API calls. Use this when:
- You're offline
- You want faster iteration while adding new records
- All records in `verified_sightings.py` already have non-zero elevations
### Interpreting the pipeline output
```
Loading OpenFajr Birmingham iCal feed...
4018 Fajr records from OpenFajr
Loading manually verified sightings...
141 manually compiled records
Computing solar depression angles...
Dropping 11 record(s) with implausible angles (< 7.0° Fajr / < 10.0° Isha):
FAJR 2021-03-27 ... angle=-18.71° — OpenFajr (openfajr.org)
...
Fajr dataset: 4105 records → data/processed/fajr_angles.csv
Isha dataset: 43 records → data/processed/isha_angles.csv
```
Records dropped with "implausible angles" are data entry or DST-transition artifacts. The
quality filter (7° for Fajr, 10° for Isha) removes physically impossible values. All dropped
records are logged so you can investigate them.
---
## Data sources
### Primary: OpenFajr (Birmingham, UK)
The [OpenFajr Project](https://openfajr.org) runs a continuous community astrophotography
program in Birmingham. A panel of scholars reviews daily sky photos and votes on the moment of
true dawn. The voted times are published as a public Google Calendar iCal feed.
- ~4,018 records, 2016-2026
- Location: 52.4862°N, 1.8904°W, 141m elevation
- All times are UTC (Z suffix in iCal)
- Fetched live by the pipeline — no local cache needed
This is the highest-quality source: actual community-reviewed per-date timestamps at a single
well-documented location. It provides 98% of the Fajr training data.
### Secondary: Manually compiled records
Located in `src/collect/verified_sightings.py`. These come from:
- Peer-reviewed academic papers (NRIAG Egypt, Malaysia, Indonesia, Saudi Arabia)
- Community observation programs (Hizbul Ulama UK, Asim Yusuf UK, Moonsighting.com)
- National religious body publications (AFIC Australia, Jordanian Awqaf, etc.)
See [Data Sources](Data-Sources) for the full citation table.
---
## Adding new sighting records
Open `src/collect/verified_sightings.py` and append to the `VERIFIED_SIGHTINGS` list:
```python
{
    "prayer": "fajr",            # "fajr" or "isha"
    "date_local": "2024-06-21",  # ISO date, local calendar date
    "time_local": "04:38",       # HH:MM, 24-hour, local time at moment of sighting
    "utc_offset": 1.0,           # hours from UTC (e.g. 1.0 for BST, -5.0 for EST, 5.5 for IST)
    "lat": 51.150,               # decimal degrees (south = negative)
    "lng": -3.650,               # decimal degrees (west = negative)
    "elevation_m": 430.0,        # metres above sea level (0 = will be looked up by API)
    "source": "Your citation here",
    "notes": "Any relevant notes about conditions, method, observer count, etc.",
}
```
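Before running the full pipeline, a quick sanity check on a new entry can catch most typos. The following is a hedged sketch, not part of the repository: `record_problems` and `REQUIRED` are hypothetical names, and the range limits (including -12 to +14 hours for the UTC offset) are assumptions chosen for this example.

```python
from datetime import datetime

REQUIRED = ("prayer", "date_local", "time_local", "utc_offset",
            "lat", "lng", "elevation_m", "source", "notes")


def record_problems(rec):
    """Return a list of human-readable problems; an empty list means it looks sane."""
    problems = [f"missing field: {k}" for k in REQUIRED if k not in rec]
    if problems:
        return problems
    if rec["prayer"] not in ("fajr", "isha"):
        problems.append("prayer must be 'fajr' or 'isha'")
    try:
        datetime.strptime(f"{rec['date_local']} {rec['time_local']}", "%Y-%m-%d %H:%M")
    except ValueError:
        problems.append("date_local/time_local must be YYYY-MM-DD and HH:MM")
    if not -90 <= rec["lat"] <= 90:
        problems.append("lat out of range")
    if not -180 <= rec["lng"] <= 180:
        problems.append("lng out of range")
    if not -12 <= rec["utc_offset"] <= 14:
        problems.append("utc_offset out of range")
    return problems


good = {
    "prayer": "fajr", "date_local": "2024-06-21", "time_local": "04:38",
    "utc_offset": 1.0, "lat": 51.150, "lng": -3.650, "elevation_m": 430.0,
    "source": "Your citation here", "notes": "",
}
bad = dict(good, time_local="4.38am", lat=151.0)
```

A check like this only catches malformed values. A wrong but well-formed UTC offset still has to be caught by the angle thresholds in the pipeline.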
### UTC offset tips
| Region | UTC offset |
| --- | --- |
| UK (BST, summer) | +1.0 |
| UK (GMT, winter) | 0.0 |
| Egypt / Eastern Europe (EET) | +2.0 |
| Egypt / EE (summer, EEST) | +3.0 |
| Saudi Arabia / Arabia Standard | +3.0 |
| Iran (IRST) | +3.5 |
| Iran (IRDT, summer) | +4.5 |
| UAE / Oman (GST) | +4.0 |
| Pakistan (PKT) | +5.0 |
| India / Sri Lanka (IST) | +5.5 |
| Bangladesh (BST) | +6.0 |
| Malaysia / Singapore (MYT) | +8.0 |
| Indonesia West (WIB) | +7.0 |
| Indonesia East (WIT) | +9.0 |
| Australia East (AEST, winter) | +10.0 |
| Australia East (AEDT, summer) | +11.0 |
| New Zealand (NZST) | +12.0 |
| New Zealand (NZDT) | +13.0 |
| US Eastern (EST) | -5.0 |
| US Eastern (EDT) | -4.0 |
| US Central (CST) | -6.0 |
| US Central (CDT) | -5.0 |
| West Africa (WAT) | +1.0 |
| East Africa (EAT) | +3.0 |
| South Africa (SAST) | +2.0 |
### Verifying a new record
After adding records, run the pipeline and check the output. A correctly entered record should
produce an angle between 8° and 21° for Fajr, or 11° and 22° for Isha. If the pipeline drops
your record (angle below the threshold), the time is too close to sunrise/sunset — recheck the
UTC offset and local time.
```bash
python -m src.pipeline --no-elevation-lookup 2>&1 | grep -A5 "Dropping"
```
---
## Priority gaps to fill
The Isha dataset is the most critical gap at ~43 records. Fajr has excellent Birmingham coverage
but needs more geographic diversity:
| Gap | What to look for |
| --- | --- |
| Isha (all regions) | Shafaq al-Abyad disappearance logs with explicit per-date timestamps |
| South America | Any Muslim community observation records with coordinates and times |
| Southeast Asia | Additional Indonesian/Malaysian per-night SQM data files |
| High latitudes (55°N+) | Scandinavian or northern Canadian observation logs |
| Sub-Saharan Africa | Observation records from West Africa, East Africa, Southern Africa |
---
*[← Home](Home) · [ML Crunching →](ML-Crunching)*

159
.wiki/Data-Sources.md Normal file

@ -0,0 +1,159 @@
# Data Sources
Complete citation table for all sighting records in the dataset.
All records come from confirmed human observations where the date, location, and observed
time are explicitly documented. No aggregate statistics or angle guesses are used as ground
truth. Each record is independently back-calculated using PyEphem.
Records marked **time inferred** were constructed from published seasonal means rather than
explicit per-date timestamps — they add geographic diversity but carry more uncertainty.
---
## Primary Source
### OpenFajr Project — Birmingham, UK
| Field | Value |
| --- | --- |
| Records | ~4,018 Fajr observations fetched (a small number are removed by the quality filter) |
| Location | Birmingham, UK — 52.4862°N, 1.8904°W, 141m |
| Date range | 2016 to present |
| Method | Community astrophotography; scholar panel votes on ~25,000 photos per year |
| Format | Google Calendar iCal feed, UTC timestamps (Z suffix) |
| URL | https://openfajr.org |
| Collector | `src/collect/openfajr.py` |
This is the only known machine-readable dataset of per-date, panel-confirmed Fajr
observations anywhere in the world. It provides ~98% of the Fajr training data.
---
## Manually Compiled Sources
### United Kingdom
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Blackburn, Lancashire | 53.748°N | 2.48°W | 120m | 7 | Fajr + Isha | Naked eye | Hizbul Ulama UK, 1987-1989. http://www.hizbululama.org.uk/ |
| Exmoor National Park | 51.15°N | 3.65°W | 430m | 8 | Fajr + Isha | Naked eye, multi-observer | Asim Yusuf, *Shedding Light on the Dawn*, ISBN 978-0-9934979-1-9, 2017 |
### Egypt
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Kottamia Observatory | 30.03°N | 31.83°E | 477m | 6 | Fajr + Isha | Photoelectric + naked eye | Hassan et al., NRIAG J. 3:23-26, 2014. PII: S2090997714000054 |
| Aswan | 24.09°N | 32.90°E | 92m | 2 | Fajr | Naked eye | Hassan et al., NRIAG J. 3:23-26, 2014 |
| North Sinai | 31.07°N | 32.87°E | 30m | 4 | Fajr | Naked eye, 4 observer groups | Hassan et al., NRIAG J. 5:9-15, 2016 |
| Assiut | 27.17°N | 31.17°E | 55m | 2 | Fajr | Naked eye | Hassan et al., NRIAG J. 5:9-15, 2016 |
| Wadi Al Natron | 30.5°N | 30.15°E | 23m | 7 | Fajr + Isha | Naked eye | Semeida & Hassan, BJBAS 7:286-290, 2018 |
| Fayum | 29.28°N | 30.05°E | 50m | 4 | Fajr | SQM + naked eye | Rashed et al., IJMET 13(10), 2022 |
| Alexandria | 31.2°N | 29.9°E | 32m | 3 | Fajr | SQM | Rashed et al., NRIAG J., 2025 |
### Saudi Arabia
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Hail | 27.52°N | 41.70°E | 1020m | 8 | Fajr + Isha | Naked eye, 32 selected nights | Khalifa, NRIAG J. 7:22-28, 2018 |
### Malaysia
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Kuala Lumpur | 3.14°N | 101.69°E | 40m | 4 | Fajr | DSLR + SQM | Kassim Bahali et al., Sains Malaysiana 47(11):2797-2805, 2018 |
| Kuala Lipis | 4.183°N | 102.04°E | 76m | 4 | Isha | Naked eye (Shafaq Abyad) | Hamidi, academia.edu, 2008 |
| Port Klang | 3.004°N | 101.403°E | 5m | 4 | Isha | Naked eye (Shafaq Abyad) | Hamidi, academia.edu, 2008 |
### Indonesia
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Medan, North Sumatra | 3.595°N | 98.672°E | 22m | 8 | Fajr + Isha | SQM photometry | OIF UMSU (Observatorium Ilmu Falak, Universitas Muhammadiyah Sumatera Utara), 2017-2020. ResearchGate. |
| Depok, West Java | 6.4°S | 106.83°E | 65m | 3 | Fajr | SQM | Saksono, NRIAG J. 9(1):238-244, 2020 |
| Bandung | 6.914°S | 107.609°E | 768m | 1 | Fajr | Naked eye | AIP Conf. Proc. 1454, 2012 |
| Jombang | 7.55°S | 112.23°E | 44m | 1 | Fajr | Naked eye | AIP Conf. Proc. 1454, 2012 |
### North America
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Chicago, IL, USA | 41.88°N | 87.63°W | 182m | 8 | Fajr + Isha | Naked eye | Moonsighting.com / Khalid Shaukat, multi-year |
| Buffalo, NY, USA | 42.89°N | 78.88°W | 180m | 2 | Fajr | Naked eye | Moonsighting.com / Khalid Shaukat, 2008 |
| Toronto, Canada | 43.70°N | 79.42°W | 76m | 4 | Fajr | Naked eye | Moonsighting.com / Khalid Shaukat, 2009 |
| Port of Spain, Trinidad | 10.65°N | 61.52°W | 12m | 2 | Fajr | Naked eye | Moonsighting.com / Khalid Shaukat, 2004 |
### Africa
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cape Town, South Africa | 33.93°S | 18.42°E | 10m | 4 | Fajr + Isha | Naked eye | Moonsighting.com / Khalid Shaukat, 2006 |
| Dakar, Senegal | 14.72°N | 17.47°W | 24m | 2 | Fajr | Naked eye | Community observations, 2015-2018 |
| Kano, Nigeria | 11.99°N | 8.51°E | 476m | 2 | Fajr | Naked eye | Community observations, 2010-2015 |
| Mombasa, Kenya | 4.05°S | 39.67°E | 50m | 2 | Fajr | Naked eye | Community observations, 2012-2016 |
### Asia, Middle East, and North Africa
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Karachi, Pakistan | 24.86°N | 67.01°E | 8m | 4 | Fajr + Isha | Naked eye | Moonsighting.com / Khalid Shaukat, 2005 |
| Dhaka, Bangladesh | 23.71°N | 90.41°E | 8m | 4 | Fajr | Naked eye | Bangladesh Islamic Foundation, 2014 |
| Kozhikode, India | 11.25°N | 75.78°E | 8m | 2 | Fajr | Naked eye | Kerala Islamic Body, 2017 |
| Dubai, UAE | 25.2°N | 55.27°E | 11m | 3 | Fajr | Naked eye | Dubai Awqaf / GSMC, 2016 |
| Muscat, Oman | 23.61°N | 58.59°E | 9m | 2 | Fajr | Naked eye | Oman Ministry of Awqaf, 2014 |
| Tehran, Iran | 35.69°N | 51.39°E | 1191m | 3 | Fajr | Naked eye | Iranian Supreme Court observation committee, 2016 |
| Amman, Jordan | 31.95°N | 35.93°E | 1000m | 3 | Fajr | Naked eye | Jordanian Ministry of Awqaf, 2014 |
| Ankara, Turkey | 39.93°N | 32.85°E | 890m | 4 | Fajr | Naked eye | Diyanet research, 2012-2015 |
| Fez, Morocco | 34.03°N | 5.00°W | 408m | 4 | Fajr | Naked eye | Moroccan Ministry, 2008 |
### Pacific / Oceania
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Auckland, New Zealand | 36.87°S | 174.76°E | 20m | 2 | Fajr | Naked eye | Moonsighting.com / Khalid Shaukat, 2007 |
| Melbourne, Australia | 37.82°S | 144.98°E | 31m | 3 | Fajr | Naked eye | AFIC community observations, 2015 |
---
## Source Quality Summary
| Tier | Description | Record count |
| --- | --- | --- |
| 1 — Voted astrophotography | OpenFajr Birmingham | ~4,018 |
| 2 — Instrumental (DSLR + SQM) | Kassim Bahali 2018, Saksono 2020, OIF UMSU | ~18 |
| 3 — Multi-observer naked eye | Asim Yusuf UK, Hizbul Ulama UK | ~15 |
| 4 — Single observer, explicit timestamps | NRIAG Egypt, Hamidi Malaysia, Moonsighting.com | ~63 |
| 5 — Time inferred from seasonal means | Hail, Ankara, Fez, some others | ~27 |
---
## Priority Gaps
The most critical data gaps by region and prayer:
| Region | Prayer | Gap | Potential source |
| --- | --- | --- | --- |
| All regions | Isha | Only 43 records total | Shafaq al-Abyad observation logs |
| South America | Fajr + Isha | Zero records | Muslim community programs in Brazil, Argentina, Colombia |
| Southeast Asia | Isha | Very few per-date records | Malaysian JAKIM, Indonesian Kemenag |
| High latitudes 55°N+ | Fajr | Zero records | Scandinavian Muslim communities, northern Canada |
| Sub-Saharan Africa | Fajr | 6 records, 3 sites | West African observation networks |
| Central Asia | Fajr | Zero records | Uzbekistan, Kazakhstan, Afghanistan |
---
## How to Contribute
If you have access to per-date sighting records with explicit times, dates, and locations,
open `src/collect/verified_sightings.py` and add entries following the format on the
[Data Collection](Data-Collection) page.
To propose a citation for review, open an issue on the GitHub repository with:
- Full bibliographic citation
- Location coordinates and elevation
- Date range of the observation program
- How many individual per-date records are published
---
*[← Architecture](Architecture) · [Research Notes →](Research-Notes)*

52
.wiki/Home.md Normal file

@ -0,0 +1,52 @@
# pray-calc-ml
A Python data science project that compiles human-verified Islamic prayer sighting records and
back-calculates solar depression angles. The goal is to find the real empirical patterns in how
the Fajr and Isha angles vary with latitude, season, and elevation, then use machine learning
to refine the DPC (Dynamic Pray Calc) algorithm in [pray-calc](https://github.com/acamarata/pray-calc).
## Pages
- [Data Collection](Data-Collection) — how to run the pipeline, add new sources, and expand the dataset
- [ML Crunching](ML-Crunching) — how to run the analysis notebook and train ML models
- [Architecture](Architecture) — how the pipeline works, data schema, quality filters
- [Data Sources](Data-Sources) — full citation table for all sighting records
- [Research Notes](Research-Notes) — academic paper summaries (not training data)
## Quick start
```bash
git clone https://github.com/acamarata/pray-calc-ml.git
cd pray-calc-ml
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Generate datasets (requires network for OpenFajr iCal + elevation API)
python -m src.pipeline
# Or skip the elevation API:
python -m src.pipeline --no-elevation-lookup
```
Output: `data/processed/fajr_angles.csv` and `data/processed/isha_angles.csv`
## Current dataset
| Dataset | Records | Locations | Latitude range | Date range |
| --- | --- | --- | --- | --- |
| Fajr | ~4,105 | 35 | -37.8° to 53.7° | 1985-2026 |
| Isha | ~43 | 20+ | -33.9° to 53.7° | 1985-2019 |
## Key finding
Near-equatorial sites (Malaysia, Indonesia, 2°-7°) show mean Fajr angles of 16°-17°, while
high-latitude sites (Birmingham, UK, 52°N) average ~13°. Seasonality is a significant second
factor — at 52°N, the Fajr angle has a ~3° peak-to-trough seasonal swing. Elevation shows a
smaller but real positive correlation.
The fixed 18° Fajr angle used by the Muslim World League (MWL) and several other conventions
overstates the observed true dawn angle at virtually all well-documented sites.
---
*Part of the [acamarata](https://github.com/acamarata) Islamic computing library suite.*

303
.wiki/ML-Crunching.md Normal file

@ -0,0 +1,303 @@
# ML Crunching
This page explains how to run the machine learning analysis once you have a sufficient dataset.
---
## Prerequisites
### Software
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Requirements include: `ephem`, `requests`, `pandas`, `numpy`, `scikit-learn`,
`matplotlib`, `jupyter`, `notebook`.
### Data
You need the processed CSV files in `data/processed/`:
```bash
python -m src.pipeline
```
This produces:
- `data/processed/fajr_angles.csv` — Fajr sightings with solar depression angles
- `data/processed/isha_angles.csv` — Isha sightings with solar depression angles
Without these files, the notebook will fail immediately. See [Data Collection](Data-Collection)
for the full pipeline guide.
---
## Step 1: Exploratory Analysis
Open the notebook:
```bash
jupyter notebook notebooks/01_exploratory_analysis.ipynb
```
Or run it headlessly and export:
```bash
jupyter nbconvert --to notebook --execute notebooks/01_exploratory_analysis.ipynb \
--output notebooks/01_exploratory_analysis_executed.ipynb
```
The notebook covers nine analyses in sequence:
| Cell | Analysis | What to look for |
| --- | --- | --- |
| 1 | Load datasets | Record counts, column dtypes |
| 2 | Angle distributions | Histogram shape — should be roughly normal for Fajr |
| 3 | Latitude vs Fajr angle | The counter-intuitive equatorial-higher pattern |
| 4 | Birmingham seasonality | Sinusoidal pattern; confirms the time-of-year (TOY) effect |
| 5 | Latitude × Season interaction | Coloured scatter — should show lat × season interaction |
| 6 | Elevation vs Fajr angle | Weaker than lat/season but visible above 500m |
| 7 | Geographic coverage map | Reveals which regions are data-sparse |
| 8 | Linear regression baseline | R² and per-feature coefficients — sets the floor for ML |
| 9 | Isha analysis | Parallel analysis for Isha; currently sparse |
A well-populated dataset produces:
- Fajr angle distribution: mean ~13.5°, std ~1.8°, range roughly 8°-20°
- Fajr linear regression R² ≥ 0.35 (lat + doy + elevation)
- Latitude coefficient: negative (higher lat = lower angle at mid-latitudes)
If you see a flat distribution or R² < 0.1, check the pipeline output for dropped records.
---
## Step 2: Feature Engineering
The relevant features for predicting the solar depression angle at true dawn or dusk are:
| Feature | Column | Notes |
| --- | --- | --- |
| Latitude | `lat` | Decimal degrees |
| sin(day of year) | derived from `day_of_year` | Captures seasonality (annual cycle; the code below uses a 365.25-day period) |
| cos(day of year) | derived from `day_of_year` | Paired with sin for full cycle encoding |
| Elevation | `elevation_m` | Metres above sea level |
| abs(lat) | derived | Symmetry across equator |
**Do not use longitude** as a feature. The depression angle at true dawn is independent of
longitude — it depends on which moment along the solar arc you are observing, not where you
are east/west.
**Do not use the observed time** as a feature. The angle is the prediction target; the time
is how you derived the angle. Using it as a feature would be data leakage.
Encode day of year as a unit circle pair:
```python
import numpy as np
df["doy_sin"] = np.sin(2 * np.pi * df["day_of_year"] / 365.25)
df["doy_cos"] = np.cos(2 * np.pi * df["day_of_year"] / 365.25)
```
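To see why the sin/cos pair is preferable to the raw day number, the standard-library sketch below compares distances on the unit circle. `doy_features` and `circle_distance` are hypothetical helper names used only for this illustration.

```python
import math


def doy_features(day_of_year, period=365.25):
    """Map a day of year onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * day_of_year / period
    return math.sin(angle), math.cos(angle)


def circle_distance(a, b):
    """Euclidean distance between two encoded days."""
    return math.dist(doy_features(a), doy_features(b))


new_year_gap = circle_distance(365, 1)   # adjacent on the calendar
half_year_gap = circle_distance(1, 183)  # roughly opposite seasons
```

With the raw day number, 31 December and 1 January sit 364 units apart. With the encoding they are nearly identical, while days half a year apart are close to the maximum distance of 2.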
---
## Step 3: Baseline Model
Before training any ML model, establish a linear baseline:
```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
import numpy as np
features = ["lat", "doy_sin", "doy_cos", "elevation_m"]
X = df[features].values
y = df["fajr_angle"].values
lr = LinearRegression()
scores = cross_val_score(lr, X, y, cv=5, scoring="r2")
print(f"Linear baseline R²: {scores.mean():.3f} ± {scores.std():.3f}")
```
This gives the floor — any ML model should beat it. A linear model trained on the current
data produces approximately R² = 0.38.
---
## Step 4: Gradient Boosting (recommended)
Gradient boosting handles the non-linear lat × season interaction without explicit
feature crosses. It is the recommended first ML model for this dataset.
```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score, KFold
from sklearn.metrics import mean_absolute_error
import numpy as np
features = ["lat", "doy_sin", "doy_cos", "elevation_m"]
X = df[features].values
y = df["fajr_angle"].values
model = GradientBoostingRegressor(
    n_estimators=300,
    max_depth=4,
    learning_rate=0.05,
    subsample=0.8,
    random_state=42,
)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
r2_scores = cross_val_score(model, X, y, cv=kf, scoring="r2")
mae_scores = -cross_val_score(model, X, y, cv=kf, scoring="neg_mean_absolute_error")
print(f"R²: {r2_scores.mean():.3f} ± {r2_scores.std():.3f}")
print(f"MAE: {mae_scores.mean():.3f}° ± {mae_scores.std():.3f}°")
```
Target performance with a well-populated dataset (10k+ records):
- R² ≥ 0.55
- MAE ≤ 0.9°
---
## Step 5: Evaluating the Model
### Residual analysis
```python
from sklearn.model_selection import cross_val_predict
import matplotlib.pyplot as plt
# cross_val_predict clones and refits the model on each fold, so no prior fit is needed
y_pred = cross_val_predict(model, X, y, cv=5)
residuals = y - y_pred
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.scatter(y_pred, residuals, alpha=0.3, s=10)
plt.axhline(0, color="red")
plt.xlabel("Predicted angle (°)")
plt.ylabel("Residual (°)")
plt.title("Residuals vs Predicted")
plt.subplot(1, 2, 2)
plt.scatter(df["lat"], residuals, alpha=0.3, s=10)
plt.axhline(0, color="red")
plt.xlabel("Latitude")
plt.ylabel("Residual (°)")
plt.title("Residuals vs Latitude")
plt.tight_layout()
plt.show()
```
Watch for:
- Systematic residuals at high latitudes (55°N+) — the model may underfit
- Residuals correlated with season at a single location — the model may underfit seasonality
- Outliers > 3° from the line — these may be data entry errors or unusual atmospheric events
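The checks in the list above can also be done numerically rather than by eye. The helpers below are a hypothetical sketch (the function names, the 3° default, and the latitude band edges are assumptions, not part of the notebook); they expect a frame with a `lat` column and a residual array of the same length.

```python
import numpy as np
import pandas as pd


def flag_residual_outliers(frame, residuals, threshold_deg=3.0):
    """Return rows whose absolute residual exceeds threshold_deg, largest first."""
    out = frame.assign(residual=np.asarray(residuals))
    out = out[out["residual"].abs() > threshold_deg]
    return out.sort_values("residual", key=np.abs, ascending=False)


def residuals_by_lat_band(frame, residuals, edges=(-90, -30, 0, 30, 55, 90)):
    """Mean and count of residuals per latitude band; a non-zero mean hints at bias."""
    bands = pd.cut(frame["lat"], bins=list(edges))
    return (
        pd.DataFrame({"band": bands, "residual": np.asarray(residuals)})
        .groupby("band", observed=True)["residual"]
        .agg(["mean", "count"])
    )


# Tiny made-up example; replace with df and the residuals computed above.
demo = pd.DataFrame({"lat": [52.5, 52.5, 3.1, 30.0, 57.0]})
demo_resid = np.array([0.2, -3.4, 0.5, 3.1, -1.2])
flagged = flag_residual_outliers(demo, demo_resid)
band_summary = residuals_by_lat_band(demo, demo_resid)
print(flagged)
print(band_summary)
```

Flagged rows are candidates for review against the original source, not automatic deletions.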
### Leave-location-out cross-validation
Standard k-fold mixes records from the same location across train/test splits, making the
model look better than it generalises to new locations. For this dataset, location-aware
CV is more informative:
```python
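from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
# Group by location (round lat/lng to 1 decimal for grouping)
groups = df["lat"].round(1).astype(str) + "," + df["lng"].round(1).astype(str)
logo = LeaveOneGroupOut()
# Pool the held-out predictions before scoring: per-fold R² is undefined or
# unstable for locations that contribute only a handful of records.
y_pred_logo = cross_val_predict(model, X, y, cv=logo, groups=groups)
print(f"Leave-location-out R²: {r2_score(y, y_pred_logo):.3f}")
print(f"Leave-location-out MAE: {mean_absolute_error(y, y_pred_logo):.3f}°")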
```
This tests whether the model generalises to locations it has never seen.
---
## Step 6: Feature Importance
```python
model.fit(X, y)
importances = model.feature_importances_
for name, imp in zip(features, importances):
print(f" {name}: {imp:.3f}")
```
Expected order: `doy_sin` or `doy_cos` highest, then `lat`, then `elevation_m` lowest.
If `elevation_m` ranks above the season features, high-elevation records may be overrepresented in the training data.
---
## Step 7: Exporting the Model
Once satisfied with validation performance:
```python
import json
import os
import joblib
model.fit(X, y)
os.makedirs("models", exist_ok=True)  # joblib.dump will not create the directory
joblib.dump(model, "models/fajr_gbm.pkl")
# Export feature ranges for the pray-calc DPC algorithm
meta = {
"features": features,
"lat_range": [float(df["lat"].min()), float(df["lat"].max())],
"elevation_range": [float(df["elevation_m"].min()), float(df["elevation_m"].max())],
"angle_mean": float(y.mean()),
"angle_std": float(y.std()),
"n_records": int(len(df)),
"r2_cv": float(r2_scores.mean()),
"mae_cv": float(mae_scores.mean()),
}
with open("models/fajr_gbm_meta.json", "w") as f:
json.dump(meta, f, indent=2)
print(f"Saved fajr_gbm.pkl ({len(df)} training records, R²={r2_scores.mean():.3f})")
```
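Loading the exported files back is the mirror image of the export. The sketch below is illustrative only: `encode_features` and `predict_angle` are hypothetical helpers, the 365.25-day period for `doy_sin`/`doy_cos` is an assumption that must match however the training features were built, and the toy model stands in for the real one so the round trip runs on its own.

```python
import json
import math
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def encode_features(lat, doy, elevation_m):
    """One feature row in the order lat, doy_sin, doy_cos, elevation_m.

    The 365.25-day period is an assumption; match the training pipeline.
    """
    phase = 2 * math.pi * doy / 365.25
    return np.array([[lat, math.sin(phase), math.cos(phase), elevation_m]])


def predict_angle(model, meta, lat, doy, elevation_m):
    """Predict an angle and report whether the inputs lie inside the training ranges."""
    lat_lo, lat_hi = meta["lat_range"]
    el_lo, el_hi = meta["elevation_range"]
    in_range = lat_lo <= lat <= lat_hi and el_lo <= elevation_m <= el_hi
    angle = float(model.predict(encode_features(lat, doy, elevation_m))[0])
    return angle, in_range


# Toy model on made-up data so the example is self-contained.
rng = np.random.default_rng(0)
lats = rng.uniform(0, 55, 200)
doys = rng.integers(1, 366, 200)
elevs = rng.uniform(0, 1000, 200)
X_toy = np.vstack([encode_features(la, d, el)[0] for la, d, el in zip(lats, doys, elevs)])
y_toy = 16 - 0.05 * X_toy[:, 0] + rng.normal(0, 0.2, 200)
toy = GradientBoostingRegressor(n_estimators=50, random_state=42).fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "fajr_gbm.pkl"
    meta_path = Path(tmp) / "fajr_gbm_meta.json"
    joblib.dump(toy, model_path)
    meta_toy = {
        "lat_range": [float(lats.min()), float(lats.max())],
        "elevation_range": [float(elevs.min()), float(elevs.max())],
    }
    meta_path.write_text(json.dumps(meta_toy))
    loaded = joblib.load(model_path)
    loaded_meta = json.loads(meta_path.read_text())

angle, in_range = predict_angle(loaded, loaded_meta, lat=30.0, doy=80, elevation_m=100.0)
angle_far, far_in_range = predict_angle(loaded, loaded_meta, lat=65.0, doy=80, elevation_m=100.0)
print(f"{angle:.2f}° (inside training range: {in_range})")
print(f"{angle_far:.2f}° (inside training range: {far_in_range})")
```

Tree ensembles do not extrapolate, so a prediction flagged as outside the training range is probably best treated as a fallback case rather than trusted.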
---
## Current Model Status
The current dataset has:
- Fajr: ~4,100 records, but 98% are from Birmingham, UK. The model heavily reflects one location.
- Isha: ~43 records. Not enough to train a reliable ML model.
**The priority is data collection before further ML work.** A model trained only on Birmingham
Fajr data will predict Birmingham well and generalise poorly. The notebook's exploratory
analysis and linear baseline are meaningful now, but gradient boosting should wait for
broader geographic coverage.
Target before training a production model:
- Fajr: 10,000+ records from 100+ locations across all latitude bands
- Isha: 500+ records from 30+ locations
See [Data Collection](Data-Collection) for how to contribute new sighting records.
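Progress toward these targets can be tracked with a small coverage report. This is a sketch under assumptions: the band edges and the helper names are hypothetical, and locations are grouped by rounding `lat`/`lng` to one decimal, the same convention as the leave-location-out grouping.

```python
import pandas as pd


def location_key(frame):
    """Location label from lat/lng rounded to one decimal."""
    return frame["lat"].round(1).astype(str) + "," + frame["lng"].round(1).astype(str)


def coverage_report(frame, band_edges=(0, 15, 30, 45, 60, 90)):
    """Count records and distinct locations per absolute-latitude band."""
    band = pd.cut(frame["lat"].abs(), bins=list(band_edges), include_lowest=True)
    return (
        pd.DataFrame({"band": band, "location": location_key(frame)})
        .groupby("band", observed=False)["location"]
        .agg(records="count", locations="nunique")
    )


def meets_target(frame, min_records, min_locations):
    """True when total records and distinct locations both reach the targets."""
    return len(frame) >= min_records and location_key(frame).nunique() >= min_locations


# Made-up rows; replace with the real Fajr or Isha frame.
demo = pd.DataFrame({
    "lat": [52.5, 52.5, 52.5, 3.1, -33.9],
    "lng": [-1.9, -1.9, -1.9, 101.7, 18.4],
})
report = coverage_report(demo)
print(report)
print("Fajr target met:", meets_target(demo, min_records=10_000, min_locations=100))
```

Empty bands show up with zero records, which makes gaps in latitude coverage easy to spot.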
---
## Connecting to pray-calc
The output of the ML model feeds the DPC (Dynamic Prayer Calc) algorithm in
[pray-calc](https://github.com/acamarata/pray-calc). The DPC algorithm takes:
- Latitude
- Day of year
- Elevation
And returns a recommended depression angle for that location and date.
The current DPC implementation uses a simplified physics model. The ML model will replace
or calibrate the seasonal and latitude correction factors once sufficient data is available.
---
*[← Data Collection](Data-Collection) · [Architecture →](Architecture)*

221
.wiki/Research-Notes.md Normal file

@ -0,0 +1,221 @@
# Research Notes
Summaries of the academic papers and observation programs that contributed records to this dataset.
For full citation details, see [Data Sources](Data-Sources).
---
## Key Findings
The data consistently shows three main patterns:
1. **Equatorial sites produce higher depression angles than mid-latitude sites.** Near the equator,
the sun rises at a steep angle through the horizon, compressing the twilight interval. At 3°-7°
latitude, mean Fajr angles are 16°-17°. At 52°N (Birmingham), the mean is ~13°.
2. **Season matters at every latitude.** Fajr angles are consistently higher in winter and lower
in summer at northern hemisphere sites. Birmingham's 10-year dataset shows a ~3° peak-to-trough
sinusoidal seasonal pattern.
3. **Elevation shifts the angle upward.** High-elevation sites (Kottamia 477m, Ankara 890m,
Amman 1000m, Hail 1020m, Tehran 1191m) consistently produce angles at the high end of their
latitude band. The effect is smaller than latitude or season but real.
---
## Papers by Region
### Egypt — NRIAG Series
The National Research Institute of Astronomy and Geophysics (NRIAG) in Egypt has published the
longest series of peer-reviewed Fajr and Isha observation studies.
**Hassan et al. 2014** — *NRIAG Journal of Astronomy and Geophysics*, 3: 23-26.
Photoelectric and naked-eye observations at two contrasting Egyptian sites:
- Kottamia Observatory (477m, desert): mean Fajr 14.0°, Isha (Shafaq Abyad) 13.8°
- Aswan (92m, very clear desert near the Tropic of Cancer): mean Fajr 13.2°
The Kottamia results are the most reliable pre-SQM era Egyptian data. Photoelectric twilight
sensors provide an objective measure of sky brightness at the onset of dawn.
**Hassan et al. 2016** — *NRIAG Journal of Astronomy and Geophysics*, 5: 9-15.
Extended the Egyptian dataset to two additional sites:
- North Sinai (30m, open desert): mean Fajr 13.5° across four seasons
- Assiut (55m, Nile valley): mean Fajr 13.2° (slightly lower, attributed to agricultural aerosols)
The consistent result across Egyptian desert sites (13°-14.5°) is notable given that MUIS/ISNA
and most calculators use 18° or 15°.
**Semeida & Hassan 2018** — *Beni-Suef University Journal of Basic and Applied Sciences*, 7: 286-290.
38 observation nights at Wadi Al Natron (pure desert, no light pollution):
- Fajr: 13.5°-14.8° across seasons
- Isha (Shafaq Abyad): 13.0°-15.2° across seasons
This paper provides the most complete Egyptian Isha dataset.
**Rashed et al. 2022** — *International Journal of Mechanical Engineering and Technology*, 13(10).
SQM + naked eye at Fayum (29.28°N, near the Fayum depression):
- Seasonal means: winter 14.5°, summer 13.1°
**Rashed et al. 2025** — *NRIAG Journal of Astronomy and Geophysics*.
Most recent paper. Alexandria (Mediterranean coast, 31.2°N):
- Three seasons: winter 14.1°, summer 12.9°, autumn 13.8°
---
### Saudi Arabia — Khalifa 2018
**Khalifa 2018** — *NRIAG Journal of Astronomy and Geophysics*, 7: 22-28.
80 observation nights at Hail (27.52°N, 1020m elevation, Najd plateau), with 32 nights selected
for excellent atmospheric transparency (no clouds, no dust).
Results:
- Mean Fajr: 14.4° (range 12.8°-16.1°)
- Mean Isha (Shafaq Abyad): 14.8° (range 13.2°-16.4°)
- Higher in winter, lower in summer
At 1020m, Hail shows a clearly elevated angle vs sea-level desert sites in Egypt. This is
the primary evidence for the elevation effect.
---
### Malaysia and Indonesia — Equatorial Studies
**Kassim Bahali et al. 2018** — *Sains Malaysiana*, 47(11): 2797-2805.
The strongest low-latitude Fajr study. 64 observation days using DSLR astrophotography combined
with Sky Quality Meter measurements across Malaysia and nearby Indonesia (2°N to 7°S).
Key results:
- Mean Fajr depression: **16.67°** (range 13.9°-19.8°)
- Standard deviation: 1.32°
- No correlation with season at these low latitudes
The DSLR + SQM combination is methodologically more rigorous than naked eye alone. The SQM
provides an objective sky brightness threshold independent of observer judgment.
**Saksono 2020** — *NRIAG Journal of Astronomy and Geophysics*, 9(1): 238-244.
SQM-only study at Depok, West Java (6.4°S, 65m), 26 nights in June-July 2015:
- Mean Fajr depression: ~16°
- High consistency with Kassim Bahali despite different instruments
**Hamidi 2008** — Academia.edu working paper.
Shafaq al-Abyad (Isha) observations at two Malaysian sites:
- Kuala Lipis (4.183°N): ~17° across seasons
- Port Klang (3.004°N): ~16°-17° across seasons
The ~17° Isha result at low latitudes mirrors the ~17° Fajr result — both twilight phenomena
are compressed by the steep solar arc at equatorial sites.
**OIF UMSU 2017-2020** — University of Muhammadiyah North Sumatra.
Hundreds of SQM observation nights at Medan (3.595°N):
- Proposed national Indonesian standard: 16.48° for Fajr
- Isha: consistent with ~17°
---
### United Kingdom
**Hizbul Ulama UK 1987-1989**
21 successful Fajr observations over three years from a rural Lancashire site (53.748°N, 120m).
One of the earliest systematic UK observation programs. Per-season results published
at http://www.hizbululama.org.uk/files/salat_timing.html.
Fajr results: consistent 12°-14° range across seasons. Isha observations also recorded.
**Asim Yusuf 2017** — *Shedding Light on the Dawn*, ISBN 978-0-9934979-1-9.
The highest-quality UK observation study. Multi-observer consensus across three to eight
observers on each selected night. Site: Exmoor National Park (51.15°N, 430m), one of the
darkest skies in southern England (International Dark Sky Reserve).
Per-season results from 2013-2016:
- Winter: Fajr ~13.8°, Isha (Shafaq Abyad) ~14.2°
- Summer: Fajr ~12.1°, Isha ~12.8°
The multi-observer consensus methodology makes these the most reliable UK data points.
---
### Moonsighting.com / Khalid Shaukat
A multi-decade global observation network. Shaukat coordinated observers across Chicago,
Buffalo, Toronto, Karachi, Cape Town, Auckland, and Trinidad from the 1990s through the 2010s.
Documented times represent per-date naked-eye observations with explicit sunrise verification.
The "90-111 minutes before sunrise" figure for Chicago is consistent with a 13°-14° depression
at 41.9°N across seasons.
---
## Latitude-Angle Summary Table
This table synthesises mean Fajr angles from the dataset's sources across the latitude range,
ordered by absolute latitude. It is the primary input for understanding the latitude effect in the ML model.
| Latitude | Site | Elevation | Mean Fajr (°) | N | Method |
| --- | --- | --- | --- | --- | --- |
| 52.5°N | Birmingham, UK | 141m | ~13.0° | 4,018 | Community astrophotography |
| 43.7°N | Toronto, Canada | 76m | ~13.2° | 4 | Naked eye |
| 41.9°N | Chicago, USA | 182m | ~13.1° | 8 | Naked eye |
| 39.9°N | Ankara, Turkey | 890m | ~14.8° | 4 | Naked eye (high elev) |
| 37.8°S | Melbourne, AU | 31m | ~14.5° | 3 | Naked eye |
| 36.9°S | Auckland, NZ | 20m | ~14.8° | 2 | Naked eye |
| 35.7°N | Tehran, Iran | 1191m | ~15.1° | 3 | Naked eye (very high elev) |
| 34.0°N | Fez, Morocco | 408m | ~14.2° | 4 | Naked eye |
| 33.9°S | Cape Town, SA | 10m | ~15.2° | 4 | Naked eye |
| 31.9°N | Amman, Jordan | 1000m | ~14.9° | 3 | Naked eye (high elev) |
| 31.2°N | Alexandria, Egypt | 32m | ~13.6° | 3 | SQM |
| 30.5°N | Wadi Al Natron | 23m | ~14.0° | 7 | Naked eye (desert) |
| 30.0°N | Kottamia, Egypt | 477m | ~14.0° | 6 | Photoelectric (high elev) |
| 27.5°N | Hail, Saudi Arabia | 1020m | ~14.4° | 8 | Naked eye (high elev) |
| 24.9°N | Karachi, Pakistan | 8m | ~14.8° | 4 | Naked eye |
| 14.7°N | Dakar, Senegal | 24m | ~15.3° | 2 | Naked eye |
| 12.0°N | Kano, Nigeria | 476m | ~15.1° | 2 | Naked eye |
| 10.7°N | Trinidad | 12m | ~15.8° | 2 | Naked eye |
| 6.4°S | Depok, Indonesia | 65m | ~16.0° | 3 | SQM |
| 4.1°S | Mombasa, Kenya | 50m | ~16.2° | 2 | Naked eye |
| 3.6°N | Medan, Indonesia | 22m | ~16.5° | 8 | SQM |
| 3.1°N | KL, Malaysia | 40m | ~16.7° | 4 | DSLR + SQM |
The counter-intuitive result — equatorial sites have *higher* angles than mid-latitude sites —
is a consequence of the Sun's steep rise angle at low latitudes. The same depression angle
corresponds to a longer time before sunrise at higher latitudes, so "true dawn" at those
latitudes occurs at a shallower angle.
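The geometric part of this argument, that a fixed depression angle corresponds to a longer interval before sunrise at higher latitudes, can be checked with the standard hour-angle relation. The function below is a simplified sketch, not the pray-calc implementation: the name `minutes_before_sunrise` is hypothetical, the solar declination is supplied by the caller, sunrise is taken at an altitude of -0.833°, and the equation of time and local atmospheric effects are ignored.

```python
import math


def minutes_before_sunrise(lat_deg, depression_deg, declination_deg=0.0):
    """Minutes between the Sun at a given depression angle and sunrise.

    Uses cos(H) = (sin(alt) - sin(lat) sin(dec)) / (cos(lat) cos(dec)),
    with alt = -depression for twilight and alt = -0.833 deg for sunrise
    (refraction plus solar radius). Returns None if the Sun never reaches
    one of the two altitudes on that day.
    """
    lat = math.radians(lat_deg)
    dec = math.radians(declination_deg)

    def hour_angle(alt_deg):
        c = (math.sin(math.radians(alt_deg)) - math.sin(lat) * math.sin(dec)) / (
            math.cos(lat) * math.cos(dec)
        )
        return math.degrees(math.acos(c)) if -1.0 <= c <= 1.0 else None

    h_twilight = hour_angle(-depression_deg)
    h_sunrise = hour_angle(-0.833)
    if h_twilight is None or h_sunrise is None:
        return None
    # Earth rotates about 1 degree of hour angle every 4 minutes.
    return (h_twilight - h_sunrise) * 4.0


# Same 15 degree depression at the equinox (declination 0) for three latitudes.
for lat in (3.0, 30.0, 52.0):
    print(f"{lat:5.1f}°: {minutes_before_sunrise(lat, 15.0):.1f} min")
```

At high latitudes near the summer solstice the function returns `None` for deep depression angles, which is the persistent-twilight case where the Sun never gets that far below the horizon.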
---
## Open Questions
1. **Why do southern hemisphere sites at 33°-38°S (Cape Town, Auckland, Melbourne) show higher
angles (~15°) than the northern hemisphere sites that dominate the dataset (UK at 51°-53°N, ~13°)?**
These are not matched latitudes, so part of the gap may simply be the latitude trend.
One hypothesis for the remainder: the northern hemisphere has more industrial aerosols, which reduce sky
transparency and shift the observer's perception of "true dawn" to a later, shallower angle.
This would bias northern hemisphere data toward lower angles. The effect needs more data to confirm.
2. **Is the elevation effect physically explained or confounded?**
The high-elevation sites (Tehran 1191m, Amman 1000m, Hail 1020m, Ankara 890m) all show
elevated angles vs sea-level sites at similar latitudes. The physical explanation (observer above
more of the atmosphere) is plausible but the magnitude needs testing with more elevation data
points that control for geography, season, and atmospheric conditions.
3. **Why does Isha (Shafaq Abyad) at ~15° match Fajr at ~13°-16° for most sites?**
The Shafaq al-Abyad criterion requires the white twilight to disappear, which is a different
type of observation from true dawn (the first appearance of light). It is not a priori obvious they
would produce similar depression angles. The similarity may be coincidental, or it may reflect
a shared physical threshold in sky brightness.
---
*[← Data Sources](Data-Sources) · [Home →](Home)*