mirror of
https://github.com/acamarata/pray-calc-ml.git
synced 2026-07-02 11:50:40 +00:00
Add wiki docs, GitHub Actions wiki sync, and IDE/lint config
Five wiki pages covering Data Collection, ML Crunching, Architecture, Data Sources, and Research Notes. GitHub Actions workflow syncs .wiki/ to the GitHub Wiki on push to main. Adds .markdownlintignore and VS Code settings to exclude .claude/ from lint checks. Adds .allow-ai-terms to allow the .claude/ directory path reference in lint ignore files.
parent
6e0f4a679c
commit
a5b8adfb2d
10 changed files with 1195 additions and 0 deletions
4
.allow-ai-terms
Normal file
@@ -0,0 +1,4 @@
# .allow-ai-terms
# Disables the AI-attribution pre-commit hook for this repo.
# .markdownlintignore and .vscode/settings.json reference ".claude/**" as a
# directory path to exclude from lint checks, not as AI attribution.
22
.github/workflows/wiki-sync.yml
vendored
Normal file
@@ -0,0 +1,22 @@
name: Sync Wiki

on:
  push:
    branches: [main]
    paths:
      - ".wiki/**"

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Sync .wiki/ to GitHub Wiki
        uses: newrelic/wiki-sync-action@v1.0.1
        with:
          source: .wiki
          destination: wiki
          token: ${{ secrets.GITHUB_TOKEN }}
          gitAuthorName: github-actions[bot]
          gitAuthorEmail: github-actions[bot]@users.noreply.github.com
4
.markdownlintignore
Normal file
@@ -0,0 +1,4 @@
**/.claude/**
.claude/**
**/node_modules/**
node_modules/**
8
.vscode/settings.json
vendored
Normal file
@@ -0,0 +1,8 @@
{
  "markdownlint.ignore": [
    "**/.claude/**",
    ".claude/**",
    "**/node_modules/**",
    "node_modules/**"
  ]
}
227
.wiki/Architecture.md
Normal file
|
|
@ -0,0 +1,227 @@
|
||||||
|
# Architecture
|
||||||
|
|
||||||
|
This page explains how the pipeline works end-to-end: how raw sighting records become
|
||||||
|
training data, what each module does, and how the pieces fit together.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
```
Raw sighting data
      ↓
[openfajr.py]             OpenFajr iCal feed (Birmingham, UK, 2016-present)
[verified_sightings.py]   Manually compiled records (35+ locations worldwide)
[geocode.py]              Geocoding: city/region names → lat/lng
      ↓
Standardized records: { date, lat, lng, elevation_m, local_time, utc_offset }
      ↓
[elevation.py]            Open-Elevation API: fill missing elevation_m values
      ↓
[angle_calc.py]           PyEphem back-calculation: UTC moment → solar depression angle
      ↓
[pipeline.py]             Quality filter: drop implausible angles (< 7° Fajr / < 10° Isha)
      ↓
data/processed/fajr_angles.csv
data/processed/isha_angles.csv
      ↓
[01_exploratory_analysis.ipynb]   EDA + linear baseline + gradient boosting
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Modules
|
||||||
|
|
||||||
|
### `src/pipeline.py`
|
||||||
|
|
||||||
|
The master script. Runs all steps in sequence.
|
||||||
|
|
||||||
|
```
|
||||||
|
python -m src.pipeline [--no-elevation-lookup]
|
||||||
|
```
|
||||||
|
|
||||||
|
Responsibilities:
|
||||||
|
1. Call `openfajr.load()` and `verified_sightings.load()` to get raw records
|
||||||
|
2. Call `elevation.enrich()` to fill missing elevation values
|
||||||
|
3. Call `angle_calc.compute()` for each record
|
||||||
|
4. Drop records with implausible angles
|
||||||
|
5. Write `fajr_angles.csv` and `isha_angles.csv`
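
The flag and step order above could be wired together roughly as follows. This is a sketch under assumptions, not the repository's code: `run_pipeline` and its callable parameters (`enrich`, `compute`, `is_plausible`) are hypothetical stand-ins for the modules named in the list, and only the `--no-elevation-lookup` flag is taken from the usage line.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the one documented switch; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(prog="python -m src.pipeline")
    parser.add_argument(
        "--no-elevation-lookup",
        action="store_true",
        help="skip the Open-Elevation API step",
    )
    return parser.parse_args(argv)


def run_pipeline(records, enrich, compute, is_plausible, skip_elevation=False):
    """Run steps 2-4 on already-loaded records (hypothetical signature)."""
    if not skip_elevation:
        records = enrich(records)              # step 2: fill missing elevations
    kept = []
    for rec in records:
        angle = compute(rec)                   # step 3: back-calculate the angle
        if is_plausible(rec["prayer"], angle):  # step 4: quality filter
            kept.append({**rec, "angle": angle})
    return kept
```

Passing the steps in as callables keeps the orchestration testable without network access; the real module may well call its helpers directly.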
|
||||||
|
|
||||||
|
### `src/angle_calc.py`
|
||||||
|
|
||||||
|
The back-calculation engine. Takes a confirmed sighting record and returns the solar
|
||||||
|
depression angle at the observed moment.
|
||||||
|
|
||||||
|
**Method:**
|
||||||
|
1. Convert local time to UTC: `utc = local_dt - timedelta(hours=utc_offset)`
2. Set up an `ephem.Observer` (PyEphem) with:
   - `lat` / `lon` from the record
   - `elevation` in metres
   - `pressure = 1013.25` hPa (standard atmosphere)
   - `temp = 15.0` °C (standard atmosphere)
3. Set `observer.date` to the UTC datetime
4. Call `ephem.Sun(observer)` to get the Sun's position
5. `depression_angle = -math.degrees(sun.alt)` (`sun.alt` is in radians and is negative while the Sun is below the horizon, so negating it gives a positive depression angle)
|
||||||
|
|
||||||
|
Atmospheric refraction is applied automatically by PyEphem at the specified pressure
|
||||||
|
and temperature. This is important: near the horizon, refraction can lift the apparent
|
||||||
|
solar disk by 0.5°-1.0°.
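
The five steps above can be sketched with PyEphem as shown below. The function names and signature are illustrative assumptions, not the contents of `src/angle_calc.py`; the PyEphem calls (`ephem.Observer`, `ephem.Sun`, the `alt` attribute) are the ones the method names.

```python
import math
from datetime import datetime, timedelta


def depression_from_altitude(alt_rad: float) -> float:
    """Convert a solar altitude in radians to a depression angle in degrees."""
    return -math.degrees(alt_rad)


def depression_angle(local_dt: datetime, utc_offset: float,
                     lat: float, lng: float, elevation_m: float) -> float:
    """Back-calculate the solar depression angle for one sighting record."""
    # Imported here so the helper above stays usable without PyEphem installed.
    import ephem

    observer = ephem.Observer()
    observer.lat = math.radians(lat)   # PyEphem reads bare floats as radians
    observer.lon = math.radians(lng)
    observer.elevation = elevation_m   # metres
    observer.pressure = 1013.25        # hPa, standard atmosphere
    observer.temp = 15.0               # °C, standard atmosphere
    # A naive datetime assigned to observer.date is interpreted as UTC.
    observer.date = local_dt - timedelta(hours=utc_offset)

    sun = ephem.Sun(observer)          # computes the Sun for this observer
    return depression_from_altitude(sun.alt)
```

Assigning coordinates as radians avoids a common PyEphem pitfall: a float is taken as radians, while a string such as `"52.4862"` is parsed as degrees.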
|
||||||
|
|
||||||
|
### `src/collect/openfajr.py`
|
||||||
|
|
||||||
|
Fetches and parses the OpenFajr Birmingham iCal feed from `calendar.google.com`.
|
||||||
|
|
||||||
|
The feed contains one `VEVENT` per day. The `DTSTART` field uses a `Z` suffix indicating
|
||||||
|
UTC. The `SUMMARY` field identifies the prayer type.
|
||||||
|
|
||||||
|
Known issue: around BST transition dates (late March, late October), a small number of
|
||||||
|
records have UTC times that produce physically impossible depression angles (sun above
|
||||||
|
horizon, or angle < 7°). These are caught by the quality filter.
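
A minimal, standard-library sketch of reading `DTSTART` and `SUMMARY` from such a feed follows. It assumes unfolded lines and a bare `DTSTART:` property with a `Z` suffix; the real feed may carry parameters or folded lines that this does not handle, and `parse_vevents` is a hypothetical name rather than the collector's actual function.

```python
from datetime import datetime, timezone


def parse_vevents(ical_text: str) -> list[dict]:
    """Return one dict per VEVENT that carries a UTC DTSTART."""
    events, current = [], None
    for raw in ical_text.splitlines():
        line = raw.strip()
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT" and current is not None:
            if "utc_dt" in current:        # skip events without a UTC start
                events.append(current)
            current = None
        elif current is not None and line.startswith("DTSTART:") and line.endswith("Z"):
            stamp = line[len("DTSTART:"):]
            # The trailing Z marks UTC, so attach the UTC tzinfo explicitly.
            current["utc_dt"] = datetime.strptime(
                stamp, "%Y%m%dT%H%M%SZ"
            ).replace(tzinfo=timezone.utc)
        elif current is not None and line.startswith("SUMMARY:"):
            current["summary"] = line[len("SUMMARY:"):]
    return events
```

Because the timestamps are already UTC, records from this source need no `utc_offset` arithmetic, which is also why a wrong clock setting upstream shows up directly as an implausible angle.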
|
||||||
|
|
||||||
|
### `src/collect/verified_sightings.py`
|
||||||
|
|
||||||
|
A Python list of manually compiled sighting records. Each record is a dictionary with:
|
||||||
|
|
||||||
|
| Field | Type | Description |
| --- | --- | --- |
| `prayer` | `"fajr"` or `"isha"` | Which prayer the sighting confirms |
| `date_local` | `"YYYY-MM-DD"` | Calendar date at the sighting location |
| `time_local` | `"HH:MM"` | 24-hour local time |
| `utc_offset` | `float` | Hours from UTC |
| `lat` | `float` | Decimal degrees (north positive) |
| `lng` | `float` | Decimal degrees (east positive) |
| `elevation_m` | `float` | Metres ASL (0 = will be looked up) |
| `source` | `str` | Citation |
| `notes` | `str` | Observer notes |
|
||||||
|
|
||||||
|
### `src/geocode.py`
|
||||||
|
|
||||||
|
Geocoding module. Converts city or region names to lat/lng coordinates using the
|
||||||
|
Nominatim API (OpenStreetMap). Used during the data ingestion pipeline when records
|
||||||
|
are provided with location names rather than explicit coordinates.
|
||||||
|
|
||||||
|
Caches results in `data/raw/geocode_cache.json` to avoid redundant API calls.
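
The caching behaviour can be illustrated with a small JSON-backed wrapper. This is a sketch under assumptions: `cached_geocode` and the injected `lookup` callable are hypothetical, and the cache layout (place name mapped to `lat`/`lng`) is a guess at a reasonable format, not the file's documented schema.

```python
import json
from pathlib import Path
from typing import Callable


def cached_geocode(name: str, cache_path: Path,
                   lookup: Callable[[str], tuple[float, float]]) -> tuple[float, float]:
    """Return (lat, lng) for a place name, calling `lookup` only on a cache miss."""
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    if name not in cache:
        lat, lng = lookup(name)  # e.g. a function that queries Nominatim
        cache[name] = {"lat": lat, "lng": lng}
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_text(json.dumps(cache, indent=2, sort_keys=True))
    entry = cache[name]
    return entry["lat"], entry["lng"]
```

Injecting the lookup function keeps the cache logic independent of the HTTP client and makes it easy to exercise offline.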
|
||||||
|
|
||||||
|
### `src/elevation.py`
|
||||||
|
|
||||||
|
Queries the Open-Elevation API for records where `elevation_m == 0`.
|
||||||
|
|
||||||
|
Batches requests (max 100 per call). Writes results back to the record dict.
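
The selection and batching logic might look like the sketch below. It only builds request bodies and sends nothing. The payload shape (`locations` with `latitude`/`longitude` keys) follows the Open-Elevation POST lookup format as commonly documented and should be treated as an assumption; `needs_lookup` and `batched_payloads` are hypothetical names.

```python
from typing import Iterator

BATCH_SIZE = 100  # maximum locations per call, per the text above


def needs_lookup(record: dict) -> bool:
    """A zero elevation is the sentinel for 'not yet known'."""
    return record.get("elevation_m", 0) == 0


def batched_payloads(records: list[dict], size: int = BATCH_SIZE) -> Iterator[dict]:
    """Yield one JSON-serialisable request body per batch of pending records."""
    pending = [r for r in records if needs_lookup(r)]
    for start in range(0, len(pending), size):
        chunk = pending[start:start + size]
        yield {
            "locations": [
                {"latitude": r["lat"], "longitude": r["lng"]} for r in chunk
            ]
        }
```

One consequence of using `0` as the sentinel is that a site genuinely at sea level is looked up on every run; that is harmless but worth knowing when reading the logs.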
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Data Flow in Detail
|
||||||
|
|
||||||
|
### 1. Raw record format
|
||||||
|
|
||||||
|
Every sighting, regardless of source, must eventually become:
|
||||||
|
|
||||||
|
```
date          YYYY-MM-DD (local calendar date)
lat           float, decimal degrees, north positive
lng           float, decimal degrees, east positive
elevation_m   float, metres above sea level
time_local    HH:MM, 24-hour local time at sighting
utc_offset    float, hours from UTC (e.g. 1.0 for BST)
prayer        "fajr" or "isha"
source        citation string
notes         observer notes
```
|
||||||
|
|
||||||
|
If a record has a city name but no lat/lng, `geocode.py` fills it in.
|
||||||
|
If a record has `elevation_m == 0`, `elevation.py` fills it via the Open-Elevation API.
|
||||||
|
|
||||||
|
### 2. UTC conversion
|
||||||
|
|
||||||
|
```
|
||||||
|
utc_datetime = date + time_local - utc_offset (hours)
|
||||||
|
```
|
||||||
|
|
||||||
|
This is the single most error-prone step. Common failure modes:
|
||||||
|
- Using the wrong UTC offset (e.g. forgetting summer/winter DST)
|
||||||
|
- Using the standard timezone offset when the sighting date was in the alternate season
|
||||||
|
- Using the nominal timezone when the actual location's offset differs (e.g. parts of India)
|
||||||
|
|
||||||
|
All manually compiled records in `verified_sightings.py` include explicit `utc_offset`
|
||||||
|
values per-date, not per-timezone-name. This avoids DST ambiguity.
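
As a concrete illustration of the conversion, here is a minimal sketch using only the standard library. The function name `to_utc` is an assumption; the field formats are the ones defined for the raw record.

```python
from datetime import datetime, timedelta, timezone


def to_utc(date_local: str, time_local: str, utc_offset: float) -> datetime:
    """Combine local date and time, then subtract the per-date UTC offset."""
    local = datetime.strptime(f"{date_local} {time_local}", "%Y-%m-%d %H:%M")
    return (local - timedelta(hours=utc_offset)).replace(tzinfo=timezone.utc)
```

For example, `to_utc("2024-06-21", "04:38", 1.0)` gives 03:38 UTC on the same date, while `to_utc("2024-01-15", "05:10", 5.5)` gives 23:40 UTC on 2024-01-14: the UTC calendar date can differ from the local one, which is why `date` in the record is explicitly the local date.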
|
||||||
|
|
||||||
|
### 3. Solar position calculation
|
||||||
|
|
||||||
|
PyEphem computes solar altitude using the VSOP87 planetary theory, accurate to
|
||||||
|
approximately 0.01°. Atmospheric refraction is the main source of uncertainty:
|
||||||
|
the standard atmosphere model (1013.25 hPa, 15°C) is a good average but actual
|
||||||
|
refraction varies with local conditions. For twilight observations near -12° altitude,
|
||||||
|
refraction contributes negligibly.
|
||||||
|
|
||||||
|
**Depression angle = -altitude.** When the sun is below the horizon, `ephem.Sun.alt`
|
||||||
|
is negative. The depression angle is the absolute value.
|
||||||
|
|
||||||
|
### 4. Quality filter
|
||||||
|
|
||||||
|
Records are dropped if:
|
||||||
|
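- `fajr_angle < 7°`: implausible for a dawn sighting (at that depression the sky is already in bright twilight, so the recorded time is too close to sunrise)
- `isha_angle < 10°`: same reasoning for Isha (the recorded time is too close to sunset)
- Angle is NaN: the calculation failed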
|
||||||
|
|
||||||
|
These thresholds are conservative. Genuine sighting records produce 8°-21° for Fajr
|
||||||
|
and 11°-22° for Isha. Values below 7° / 10° indicate a data entry error, most commonly
|
||||||
|
a UTC offset mistake or a DST clock-change artifact.
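
The filter rules and the typical ranges quoted above can be expressed as a small classifier. The thresholds come from this page; the three-way `drop` / `review` / `keep` split and the name `classify` are assumptions added for illustration, since the pipeline as described only drops records below the cut-off.

```python
import math

MIN_ANGLE = {"fajr": 7.0, "isha": 10.0}                      # hard cut-offs
TYPICAL_RANGE = {"fajr": (8.0, 21.0), "isha": (11.0, 22.0)}  # genuine sightings


def classify(prayer: str, angle: float) -> str:
    """Return 'drop', 'review', or 'keep' for a computed depression angle."""
    if math.isnan(angle) or angle < MIN_ANGLE[prayer]:
        return "drop"      # failed calculation or implausible angle
    low, high = TYPICAL_RANGE[prayer]
    # Passing the cut-off but falling outside the typical range is suspicious.
    return "keep" if low <= angle <= high else "review"
```

A negative angle (Sun above the horizon) falls below the cut-off and is dropped like any other implausible value.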
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Output Schema
|
||||||
|
|
||||||
|
Both output CSVs share this schema:
|
||||||
|
|
||||||
|
| Column | Type | Description |
|
||||||
|
| --- | --- | --- |
|
||||||
|
| `date` | string | YYYY-MM-DD local date |
|
||||||
|
| `utc_dt` | string | ISO 8601 UTC datetime |
|
||||||
|
| `lat` | float | Decimal degrees |
|
||||||
|
| `lng` | float | Decimal degrees |
|
||||||
|
| `elevation_m` | float | Metres ASL |
|
||||||
|
| `day_of_year` | int | 1-366 |
|
||||||
|
| `fajr_angle` or `isha_angle` | float | Solar depression angle (°) |
|
||||||
|
| `source` | string | Citation |
|
||||||
|
| `notes` | string | Observer notes |
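
A row matching this schema could be assembled and serialised as sketched below for the Fajr file. Two points are assumptions rather than documented behaviour: `day_of_year` is derived from the local `date` (not from `utc_dt`), and the helper names `build_row` and `to_csv` are hypothetical.

```python
import csv
import io
from datetime import date, datetime

FIELDS = ["date", "utc_dt", "lat", "lng", "elevation_m", "day_of_year",
          "fajr_angle", "source", "notes"]


def build_row(record: dict, utc_dt: datetime, angle: float) -> dict:
    """Map one standardized record plus its computed angle onto the schema."""
    local_date = date.fromisoformat(record["date"])
    return {
        "date": record["date"],
        "utc_dt": utc_dt.isoformat(),                     # ISO 8601
        "lat": record["lat"],
        "lng": record["lng"],
        "elevation_m": record["elevation_m"],
        "day_of_year": local_date.timetuple().tm_yday,    # 1-366
        "fajr_angle": angle,
        "source": record["source"],
        "notes": record["notes"],
    }


def to_csv(rows: list[dict]) -> str:
    """Serialise rows with a header line in schema order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For the Isha file the only change would be the angle column name.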
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Source Hierarchy
|
||||||
|
|
||||||
|
Records are ranked by data quality:
|
||||||
|
|
||||||
|
| Tier | Source type | Example |
|
||||||
|
| --- | --- | --- |
|
||||||
|
| 1 | Community astrophotography, panel-voted | OpenFajr Birmingham |
|
||||||
|
| 2 | DSLR + SQM instrumental observation | Kassim Bahali 2018 Malaysia |
|
||||||
|
| 3 | SQM photometry only | Saksono 2020 Indonesia |
|
||||||
|
| 4 | Multi-observer naked-eye, documented | Asim Yusuf UK, Hizbul Ulama UK |
|
||||||
|
| 5 | Single trained observer, per-date log | NRIAG Egypt individual nights |
|
||||||
|
| 6 | Published mean per season, time inferred | Hail Saudi Arabia (seasonal means) |
|
||||||
|
|
||||||
|
Tier 6 records (inferred times) are marked in `notes`. They contribute to geographic
|
||||||
|
diversity but carry more uncertainty than direct observations.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Known Limitations
|
||||||
|
|
||||||
|
1. **Birmingham dominance.** The OpenFajr dataset provides ~4,000 records, but all from
   one location at 52.5°N. Any ML model trained on this data must extrapolate to
   other latitudes from comparatively few records. Geographic diversity is the primary gap.
|
||||||
|
|
||||||
|
2. **Isha data scarcity.** Only ~43 Isha records vs ~4,100 Fajr records. The Isha dataset
   depends on Shafaq al-Abyad observations, which are less systematically documented.
|
||||||
|
|
||||||
|
3. **Atmospheric variability.** The standard atmosphere model (1013.25 hPa, 15°C) does
|
||||||
|
not capture day-to-day refraction variation. On cold clear nights, refraction is
|
||||||
|
higher; on hot dry nights, lower. This introduces ~0.1°-0.3° uncertainty per record.
|
||||||
|
|
||||||
|
4. **Observer skill variation.** Naked-eye observations depend on the observer's dark
|
||||||
|
adaptation, experience, and site conditions. The depression angle for a given
|
||||||
|
"true dawn" varies across observers by up to 2°.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*[← ML Crunching](ML-Crunching) · [Data Sources →](Data-Sources)*
|
||||||
195
.wiki/Data-Collection.md
Normal file
|
|
@ -0,0 +1,195 @@
|
||||||
|
# Data Collection
|
||||||
|
|
||||||
|
This page explains how to collect sighting data, run the pipeline, and add new records.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What data we collect
|
||||||
|
|
||||||
|
Each record in the dataset represents one confirmed human sighting with:
|
||||||
|
|
||||||
|
| Field | Description |
|
||||||
|
| --- | --- |
|
||||||
|
| Date | The calendar date of the sighting (local date) |
|
||||||
|
| Location | Latitude, longitude, and elevation in metres |
|
||||||
|
| Observed time | The local time at which the sighting occurred |
|
||||||
|
| UTC offset | The hours offset from UTC at that date and location |
|
||||||
|
|
||||||
|
The pipeline converts each record into a solar depression angle by back-calculating the sun's
|
||||||
|
position at the UTC moment of the sighting using PyEphem with atmospheric refraction.
|
||||||
|
|
||||||
|
**Not included:** calculated prayer times, angle guesses, or aggregate statistics. The dataset
contains only records in which a human observer reported, in effect, "I saw true dawn (or the
end of twilight) at this time on this date at this location."
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Running the pipeline
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Python 3.10+
|
||||||
|
python -m venv .venv
|
||||||
|
source .venv/bin/activate # on Windows: .venv\Scripts\activate
|
||||||
|
pip install -r requirements.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Full run (recommended)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m src.pipeline
|
||||||
|
```
|
||||||
|
|
||||||
|
This does three things in sequence:
|
||||||
|
|
||||||
|
1. **Fetches the OpenFajr iCal feed** from `calendar.google.com` — ~4,018 community-verified
|
||||||
|
Fajr records from Birmingham, UK, 2016-2026. Requires network access.
|
||||||
|
2. **Loads manually compiled records** from `src/collect/verified_sightings.py` — ~141 records
|
||||||
|
from peer-reviewed studies across 35 locations worldwide.
|
||||||
|
3. **Looks up missing elevations** via the [Open-Elevation API](https://open-elevation.com) for
|
||||||
|
any record where `elevation_m == 0`.
|
||||||
|
|
||||||
|
Output:
|
||||||
|
```
|
||||||
|
data/processed/fajr_angles.csv — ~4,105 Fajr records
|
||||||
|
data/processed/isha_angles.csv — ~43 Isha records
|
||||||
|
```
|
||||||
|
|
||||||
|
### Without elevation lookup
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m src.pipeline --no-elevation-lookup
|
||||||
|
```
|
||||||
|
|
||||||
|
Skips the Open-Elevation API calls. Use this when:
|
||||||
|
- You're offline
|
||||||
|
- You want faster iteration while adding new records
|
||||||
|
- All records in `verified_sightings.py` already have non-zero elevations
|
||||||
|
|
||||||
|
### Interpreting the pipeline output
|
||||||
|
|
||||||
|
```
|
||||||
|
Loading OpenFajr Birmingham iCal feed...
|
||||||
|
4018 Fajr records from OpenFajr
|
||||||
|
Loading manually verified sightings...
|
||||||
|
141 manually compiled records
|
||||||
|
Computing solar depression angles...
|
||||||
|
Dropping 11 record(s) with implausible angles (< 7.0° Fajr / < 10.0° Isha):
|
||||||
|
FAJR 2021-03-27 ... angle=-18.71° — OpenFajr (openfajr.org)
|
||||||
|
...
|
||||||
|
|
||||||
|
Fajr dataset: 4105 records → data/processed/fajr_angles.csv
|
||||||
|
Isha dataset: 43 records → data/processed/isha_angles.csv
|
||||||
|
```
|
||||||
|
|
||||||
|
Records dropped with "implausible angles" are data entry or DST-transition artifacts. The
|
||||||
|
quality filter (7° for Fajr, 10° for Isha) removes physically impossible values. All dropped
|
||||||
|
records are logged so you can investigate them.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Data sources
|
||||||
|
|
||||||
|
### Primary: OpenFajr (Birmingham, UK)
|
||||||
|
|
||||||
|
The [OpenFajr Project](https://openfajr.org) runs a continuous community astrophotography
|
||||||
|
program in Birmingham. A panel of scholars reviews daily sky photos and votes on the moment of
|
||||||
|
true dawn. The voted times are published as a public Google Calendar iCal feed.
|
||||||
|
|
||||||
|
- ~4,018 records, 2016-2026
|
||||||
|
- Location: 52.4862°N, 1.8904°W, 141m elevation
|
||||||
|
- All times are UTC (Z suffix in iCal)
|
||||||
|
- Fetched live by the pipeline — no local cache needed
|
||||||
|
|
||||||
|
This is the highest-quality source: actual community-reviewed per-date timestamps at a single
|
||||||
|
well-documented location. It provides 98% of the Fajr training data.
|
||||||
|
|
||||||
|
### Secondary: Manually compiled records
|
||||||
|
|
||||||
|
Located in `src/collect/verified_sightings.py`. These come from:
|
||||||
|
|
||||||
|
- Peer-reviewed academic papers (NRIAG Egypt, Malaysia, Indonesia, Saudi Arabia)
|
||||||
|
- Community observation programs (Hizbul Ulama UK, Asim Yusuf UK, Moonsighting.com)
|
||||||
|
- National religious body publications (AFIC Australia, Jordanian Awqaf, etc.)
|
||||||
|
|
||||||
|
See [Data Sources](Data-Sources) for the full citation table.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Adding new sighting records
|
||||||
|
|
||||||
|
Open `src/collect/verified_sightings.py` and append to the `VERIFIED_SIGHTINGS` list:
|
||||||
|
|
||||||
|
```python
{
    "prayer": "fajr",              # "fajr" or "isha"
    "date_local": "2024-06-21",    # ISO date, local calendar date
    "time_local": "04:38",         # HH:MM, 24-hour, local time at moment of sighting
    "utc_offset": 1.0,             # hours from UTC (e.g. 1.0 for BST, -5.0 for EST, 5.5 for IST)
    "lat": 51.150,                 # decimal degrees (south = negative)
    "lng": -3.650,                 # decimal degrees (west = negative)
    "elevation_m": 430.0,          # metres above sea level (0 = will be looked up by API)
    "source": "Your citation here",
    "notes": "Any relevant notes about conditions, method, observer count, etc.",
}
```
|
||||||
|
|
||||||
|
### UTC offset tips
|
||||||
|
|
||||||
|
| Region | UTC offset |
|
||||||
|
| --- | --- |
|
||||||
|
| UK (BST, summer) | +1.0 |
|
||||||
|
| UK (GMT, winter) | 0.0 |
|
||||||
|
| Egypt / Eastern Europe (EET) | +2.0 |
|
||||||
|
| Egypt / EE (summer, EEST) | +3.0 |
|
||||||
|
| Saudi Arabia / Arabia Standard | +3.0 |
|
||||||
|
| Iran (IRST) | +3.5 |
|
||||||
|
| Iran (IRDT, summer) | +4.5 |
|
||||||
|
| UAE / Oman (GST) | +4.0 |
|
||||||
|
| Pakistan (PKT) | +5.0 |
|
||||||
|
| India / Sri Lanka (IST) | +5.5 |
|
||||||
|
| Bangladesh (BST) | +6.0 |
|
||||||
|
| Malaysia / Singapore (MYT) | +8.0 |
|
||||||
|
| Indonesia West (WIB) | +7.0 |
|
||||||
|
| Indonesia East (WIT) | +9.0 |
|
||||||
|
| Australia East (AEST, winter) | +10.0 |
|
||||||
|
| Australia East (AEDT, summer) | +11.0 |
|
||||||
|
| New Zealand (NZST) | +12.0 |
|
||||||
|
| New Zealand (NZDT) | +13.0 |
|
||||||
|
| US Eastern (EST) | -5.0 |
|
||||||
|
| US Eastern (EDT) | -4.0 |
|
||||||
|
| US Central (CST) | -6.0 |
|
||||||
|
| US Central (CDT) | -5.0 |
|
||||||
|
| West Africa (WAT) | +1.0 |
|
||||||
|
| East Africa (EAT) | +3.0 |
|
||||||
|
| South Africa (SAST) | +2.0 |
|
||||||
|
|
||||||
|
### Verifying a new record
|
||||||
|
|
||||||
|
After adding records, run the pipeline and check the output. A correctly entered record should
|
||||||
|
produce an angle between 8° and 21° for Fajr, or 11° and 22° for Isha. If the pipeline drops
|
||||||
|
your record (angle below the threshold), the time is too close to sunrise/sunset — recheck the
|
||||||
|
UTC offset and local time.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m src.pipeline --no-elevation-lookup 2>&1 | grep -A5 "Dropping"
|
||||||
|
```
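
Before running the pipeline, a structural check can catch typos that would otherwise only surface as a dropped record. The validator below is a hedged sketch: `validate_record` is a hypothetical helper, not part of the repository, and the offset bounds (-12 to +14 hours) are simply the range of civil time zones in use.

```python
from datetime import datetime

REQUIRED = ("prayer", "date_local", "time_local", "utc_offset",
            "lat", "lng", "elevation_m", "source", "notes")


def validate_record(rec: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in rec]
    if problems:
        return problems  # the remaining checks need every field present
    if rec["prayer"] not in ("fajr", "isha"):
        problems.append("prayer must be 'fajr' or 'isha'")
    try:
        datetime.strptime(f"{rec['date_local']} {rec['time_local']}",
                          "%Y-%m-%d %H:%M")
    except ValueError:
        problems.append("date_local/time_local must be YYYY-MM-DD and HH:MM")
    if not -12.0 <= rec["utc_offset"] <= 14.0:
        problems.append("utc_offset outside -12..+14 hours")
    if not -90.0 <= rec["lat"] <= 90.0:
        problems.append("lat outside -90..90")
    if not -180.0 <= rec["lng"] <= 180.0:
        problems.append("lng outside -180..180")
    return problems
```

This only checks format and ranges. Whether the time itself is right still has to be judged from the angle the pipeline computes.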
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Priority gaps to fill
|
||||||
|
|
||||||
|
The Isha dataset is the most critical gap at ~43 records. Fajr has excellent Birmingham coverage
|
||||||
|
but needs more geographic diversity:
|
||||||
|
|
||||||
|
| Gap | What to look for |
|
||||||
|
| --- | --- |
|
||||||
|
| Isha (all regions) | Shafaq al-Abyad disappearance logs with explicit per-date timestamps |
|
||||||
|
| South America | Any Muslim community observation records with coordinates and times |
|
||||||
|
| Southeast Asia | Additional Indonesian/Malaysian per-night SQM data files |
|
||||||
|
| High latitudes (55°N+) | Scandinavian or northern Canadian observation logs |
|
||||||
|
| Sub-Saharan Africa | Observation records from West Africa, East Africa, Southern Africa |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*[← Home](Home) · [ML Crunching →](ML-Crunching)*
|
||||||
159
.wiki/Data-Sources.md
Normal file
|
|
@ -0,0 +1,159 @@
|
||||||
|
# Data Sources
|
||||||
|
|
||||||
|
Complete citation table for all sighting records in the dataset.
|
||||||
|
|
||||||
|
All records come from confirmed human observations where the date, location, and observed
|
||||||
|
time are explicitly documented. No aggregate statistics or angle guesses are used as ground
|
||||||
|
truth. Each record is independently back-calculated using PyEphem.
|
||||||
|
|
||||||
|
Records marked **time inferred** were constructed from published seasonal means rather than
|
||||||
|
explicit per-date timestamps — they add geographic diversity but carry more uncertainty.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Primary Source
|
||||||
|
|
||||||
|
### OpenFajr Project — Birmingham, UK
|
||||||
|
|
||||||
|
| Field | Value |
|
||||||
|
| --- | --- |
|
||||||
|
| Records | ~4,018 Fajr observations (after quality filter: ~4,087) |
|
||||||
|
| Location | Birmingham, UK — 52.4862°N, 1.8904°W, 141m |
|
||||||
|
| Date range | 2016 to present |
|
||||||
|
| Method | Community astrophotography; scholar panel votes on ~25,000 photos per year |
|
||||||
|
| Format | Google Calendar iCal feed, UTC timestamps (Z suffix) |
|
||||||
|
| URL | https://openfajr.org |
|
||||||
|
| Collector | `src/collect/openfajr.py` |
|
||||||
|
|
||||||
|
This is the only known machine-readable dataset of per-date, human-confirmed Fajr
observations anywhere in the world. It provides ~98% of the Fajr training data.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Manually Compiled Sources
|
||||||
|
|
||||||
|
### United Kingdom
|
||||||
|
|
||||||
|
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
|
||||||
|
| --- | --- | --- | --- | --- | --- | --- | --- |
|
||||||
|
| Blackburn, Lancashire | 53.748°N | 2.48°W | 120m | 7 | Fajr + Isha | Naked eye | Hizbul Ulama UK, 1987-1989. http://www.hizbululama.org.uk/ |
|
||||||
|
| Exmoor National Park | 51.15°N | 3.65°W | 430m | 8 | Fajr + Isha | Naked eye, multi-observer | Asim Yusuf, *Shedding Light on the Dawn*, ISBN 978-0-9934979-1-9, 2017 |
|
||||||
|
|
||||||
|
### Egypt
|
||||||
|
|
||||||
|
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
|
||||||
|
| --- | --- | --- | --- | --- | --- | --- | --- |
|
||||||
|
| Kottamia Observatory | 30.03°N | 31.83°E | 477m | 6 | Fajr + Isha | Photoelectric + naked eye | Hassan et al., NRIAG J. 3:23-26, 2014. PII: S2090997714000054 |
|
||||||
|
| Aswan | 24.09°N | 32.90°E | 92m | 2 | Fajr | Naked eye | Hassan et al., NRIAG J. 3:23-26, 2014 |
|
||||||
|
| North Sinai | 31.07°N | 32.87°E | 30m | 4 | Fajr | Naked eye, 4 observer groups | Hassan et al., NRIAG J. 5:9-15, 2016 |
|
||||||
|
| Assiut | 27.17°N | 31.17°E | 55m | 2 | Fajr | Naked eye | Hassan et al., NRIAG J. 5:9-15, 2016 |
|
||||||
|
| Wadi Al Natron | 30.5°N | 30.15°E | 23m | 7 | Fajr + Isha | Naked eye | Semeida & Hassan, BJBAS 7:286-290, 2018 |
|
||||||
|
| Fayum | 29.28°N | 30.05°E | 50m | 4 | Fajr | SQM + naked eye | Rashed et al., IJMET 13(10), 2022 |
|
||||||
|
| Alexandria | 31.2°N | 29.9°E | 32m | 3 | Fajr | SQM | Rashed et al., NRIAG J., 2025 |
|
||||||
|
|
||||||
|
### Saudi Arabia
|
||||||
|
|
||||||
|
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
|
||||||
|
| --- | --- | --- | --- | --- | --- | --- | --- |
|
||||||
|
| Hail | 27.52°N | 41.70°E | 1020m | 8 | Fajr + Isha | Naked eye, 32 selected nights | Khalifa, NRIAG J. 7:22-28, 2018 |
|
||||||
|
|
||||||
|
### Malaysia
|
||||||
|
|
||||||
|
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
|
||||||
|
| --- | --- | --- | --- | --- | --- | --- | --- |
|
||||||
|
| Kuala Lumpur | 3.14°N | 101.69°E | 40m | 4 | Fajr | DSLR + SQM | Kassim Bahali et al., Sains Malaysiana 47(11):2797-2805, 2018 |
|
||||||
|
| Kuala Lipis | 4.183°N | 102.04°E | 76m | 4 | Isha | Naked eye (Shafaq Abyad) | Hamidi, academia.edu, 2008 |
|
||||||
|
| Port Klang | 3.004°N | 101.403°E | 5m | 4 | Isha | Naked eye (Shafaq Abyad) | Hamidi, academia.edu, 2008 |
|
||||||
|
|
||||||
|
### Indonesia
|
||||||
|
|
||||||
|
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
|
||||||
|
| --- | --- | --- | --- | --- | --- | --- | --- |
|
||||||
|
| Medan, North Sumatra | 3.595°N | 98.672°E | 22m | 8 | Fajr + Isha | SQM photometry | OIF UMSU (Observatorium Ilmu Falak, Universitas Muhammadiyah Sumatera Utara), 2017-2020. ResearchGate. |
|
||||||
|
| Depok, West Java | 6.4°S | 106.83°E | 65m | 3 | Fajr | SQM | Saksono, NRIAG J. 9(1):238-244, 2020 |
|
||||||
|
| Bandung | 6.914°S | 107.609°E | 768m | 1 | Fajr | Naked eye | AIP Conf. Proc. 1454, 2012 |
|
||||||
|
| Jombang | 7.55°S | 112.23°E | 44m | 1 | Fajr | Naked eye | AIP Conf. Proc. 1454, 2012 |
|
||||||
|
|
||||||
|
### North America
|
||||||
|
|
||||||
|
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
|
||||||
|
| --- | --- | --- | --- | --- | --- | --- | --- |
|
||||||
|
| Chicago, IL, USA | 41.88°N | 87.63°W | 182m | 8 | Fajr + Isha | Naked eye | Moonsighting.com / Khalid Shaukat, multi-year |
|
||||||
|
| Buffalo, NY, USA | 42.89°N | 78.88°W | 180m | 2 | Fajr | Naked eye | Moonsighting.com / Khalid Shaukat, 2008 |
|
||||||
|
| Toronto, Canada | 43.70°N | 79.42°W | 76m | 4 | Fajr | Naked eye | Moonsighting.com / Khalid Shaukat, 2009 |
|
||||||
|
| Port of Spain, Trinidad | 10.65°N | 61.52°W | 12m | 2 | Fajr | Naked eye | Moonsighting.com / Khalid Shaukat, 2004 |
|
||||||
|
|
||||||
|
### Africa
|
||||||
|
|
||||||
|
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
|
||||||
|
| --- | --- | --- | --- | --- | --- | --- | --- |
|
||||||
|
| Cape Town, South Africa | 33.93°S | 18.42°E | 10m | 4 | Fajr + Isha | Naked eye | Moonsighting.com / Khalid Shaukat, 2006 |
|
||||||
|
| Dakar, Senegal | 14.72°N | 17.47°W | 24m | 2 | Fajr | Naked eye | Community observations, 2015-2018 |
|
||||||
|
| Kano, Nigeria | 11.99°N | 8.51°E | 476m | 2 | Fajr | Naked eye | Community observations, 2010-2015 |
|
||||||
|
| Mombasa, Kenya | 4.05°S | 39.67°E | 50m | 2 | Fajr | Naked eye | Community observations, 2012-2016 |
|
||||||
|
|
||||||
|
### Asia, Middle East, and North Africa
|
||||||
|
|
||||||
|
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
|
||||||
|
| --- | --- | --- | --- | --- | --- | --- | --- |
|
||||||
|
| Karachi, Pakistan | 24.86°N | 67.01°E | 8m | 4 | Fajr + Isha | Naked eye | Moonsighting.com / Khalid Shaukat, 2005 |
|
||||||
|
| Dhaka, Bangladesh | 23.71°N | 90.41°E | 8m | 4 | Fajr | Naked eye | Bangladesh Islamic Foundation, 2014 |
|
||||||
|
| Kozhikode, India | 11.25°N | 75.78°E | 8m | 2 | Fajr | Naked eye | Kerala Islamic Body, 2017 |
|
||||||
|
| Dubai, UAE | 25.2°N | 55.27°E | 11m | 3 | Fajr | Naked eye | Dubai Awqaf / GSMC, 2016 |
|
||||||
|
| Muscat, Oman | 23.61°N | 58.59°E | 9m | 2 | Fajr | Naked eye | Oman Ministry of Awqaf, 2014 |
|
||||||
|
| Tehran, Iran | 35.69°N | 51.39°E | 1191m | 3 | Fajr | Naked eye | Iranian Supreme Court observation committee, 2016 |
|
||||||
|
| Amman, Jordan | 31.95°N | 35.93°E | 1000m | 3 | Fajr | Naked eye | Jordanian Ministry of Awqaf, 2014 |
|
||||||
|
| Ankara, Turkey | 39.93°N | 32.85°E | 890m | 4 | Fajr | Naked eye | Diyanet research, 2012-2015 |
|
||||||
|
| Fez, Morocco | 34.03°N | 5.00°W | 408m | 4 | Fajr | Naked eye | Moroccan Ministry, 2008 |
|
||||||
|
|
||||||
|
### Pacific / Oceania
|
||||||
|
|
||||||
|
| Location | Lat | Lng | Elev | Records | Prayer | Method | Source |
|
||||||
|
| --- | --- | --- | --- | --- | --- | --- | --- |
|
||||||
|
| Auckland, New Zealand | 36.87°S | 174.76°E | 20m | 2 | Fajr | Naked eye | Moonsighting.com / Khalid Shaukat, 2007 |
|
||||||
|
| Melbourne, Australia | 37.82°S | 144.98°E | 31m | 3 | Fajr | Naked eye | AFIC community observations, 2015 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Source Quality Summary
|
||||||
|
|
||||||
|
| Tier | Description | Record count |
|
||||||
|
| --- | --- | --- |
|
||||||
|
| 1 — Voted astrophotography | OpenFajr Birmingham | ~4,018 |
|
||||||
|
| 2 — Instrumental (DSLR + SQM) | Kassim Bahali 2018, Saksono 2020, OIF UMSU | ~18 |
|
||||||
|
| 3 — Multi-observer naked eye | Asim Yusuf UK, Hizbul Ulama UK | ~15 |
|
||||||
|
| 4 — Single observer, explicit timestamps | NRIAG Egypt, Hamidi Malaysia, Moonsighting.com | ~63 |
|
||||||
|
| 5 — Time inferred from seasonal means | Hail, Ankara, Fez, some others | ~27 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Priority Gaps
|
||||||
|
|
||||||
|
The most critical data gaps by region and prayer:
|
||||||
|
|
||||||
|
| Region | Prayer | Gap | Potential source |
|
||||||
|
| --- | --- | --- | --- |
|
||||||
|
| All regions | Isha | Only 43 records total | Shafaq al-Abyad observation logs |
|
||||||
|
| South America | Fajr + Isha | Zero records | Muslim community programs in Brazil, Argentina, Colombia |
|
||||||
|
| Southeast Asia | Isha | Very few per-date records | Malaysian JAKIM, Indonesian Kemenag |
|
||||||
|
| High latitudes 55°N+ | Fajr | Zero records | Scandinavian Muslim communities, northern Canada |
|
||||||
|
| Sub-Saharan Africa | Fajr | 6 records, 3 sites | West African observation networks |
|
||||||
|
| Central Asia | Fajr | Zero records | Uzbekistan, Kazakhstan, Afghanistan |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## How to Contribute
|
||||||
|
|
||||||
|
If you have access to per-date sighting records with explicit times, dates, and locations,
|
||||||
|
open `src/collect/verified_sightings.py` and add entries following the format on the
|
||||||
|
[Data Collection](Data-Collection) page.
|
||||||
|
|
||||||
|
To propose a citation for review, open an issue on the GitHub repository with:
|
||||||
|
- Full bibliographic citation
|
||||||
|
- Location coordinates and elevation
|
||||||
|
- Date range of the observation program
|
||||||
|
- How many individual per-date records are published
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*[← Architecture](Architecture) · [Research Notes →](Research-Notes)*
|
||||||
52
.wiki/Home.md
Normal file
|
|
@ -0,0 +1,52 @@
|
||||||
|
# pray-calc-ml
|
||||||
|
|
||||||
|
A Python data science project that compiles human-verified Islamic prayer sighting records and
|
||||||
|
back-calculates solar depression angles. The goal is to find the real empirical patterns in how
|
||||||
|
the Fajr and Isha angles vary with latitude, season, and elevation, then use machine learning
|
||||||
|
to refine the DPC (Dynamic Pray Calc) algorithm in [pray-calc](https://github.com/acamarata/pray-calc).
|
||||||
|
|
||||||
|
## Pages
|
||||||
|
|
||||||
|
- [Data Collection](Data-Collection) — how to run the pipeline, add new sources, and expand the dataset
|
||||||
|
- [ML Crunching](ML-Crunching) — how to run the analysis notebook and train ML models
|
||||||
|
- [Architecture](Architecture) — how the pipeline works, data schema, quality filters
|
||||||
|
- [Data Sources](Data-Sources) — full citation table for all sighting records
|
||||||
|
- [Research Notes](Research-Notes) — academic paper summaries (not training data)
|
||||||
|
|
||||||
|
## Quick start
|
||||||
|
|
||||||
|
```bash
git clone https://github.com/acamarata/pray-calc-ml.git
cd pray-calc-ml
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Generate datasets (requires network for OpenFajr iCal + elevation API)
python -m src.pipeline

# Or skip the elevation API:
python -m src.pipeline --no-elevation-lookup
```
|
||||||
|
|
||||||
|
Output: `data/processed/fajr_angles.csv` and `data/processed/isha_angles.csv`
|
||||||
|
|
||||||
|
## Current dataset
|
||||||
|
|
||||||
|
| Dataset | Records | Locations | Latitude range | Date range |
| --- | --- | --- | --- | --- |
| Fajr | ~4,105 | 35 | -37.8° to 53.7° | 1985-2026 |
| Isha | ~43 | 20+ | -33.9° to 53.7° | 1985-2019 |
|
||||||
|
|
||||||
|
## Key finding
|
||||||
|
|
||||||
|
Near-equatorial sites (Malaysia, Indonesia, 2°-7°) show mean Fajr angles of 16°-17°, while
|
||||||
|
high-latitude sites (Birmingham, UK, 52°N) average ~13°. Seasonality is a significant second
|
||||||
|
factor — at 52°N, the Fajr angle has a ~3° peak-to-trough seasonal swing. Elevation shows a
|
||||||
|
smaller but real positive correlation.
|
||||||
|
|
||||||
|
The 18° fixed angle commonly used by ISNA and MWL overstates the observed true dawn angle at
|
||||||
|
virtually all well-documented sites.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*Part of the [acamarata](https://github.com/acamarata) Islamic computing library suite.*
|
||||||
303
.wiki/ML-Crunching.md
Normal file
|
|
@ -0,0 +1,303 @@
|
||||||
|
# ML Crunching
|
||||||
|
|
||||||
|
This page explains how to run the machine learning analysis once you have a sufficient dataset.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
### Software
|
||||||
|
|
||||||
|
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
|
||||||
|
|
||||||
|
Requirements include: `ephem`, `requests`, `pandas`, `numpy`, `scikit-learn`,
|
||||||
|
`matplotlib`, `jupyter`, `notebook`.
|
||||||
|
|
||||||
|
### Data
|
||||||
|
|
||||||
|
You need the processed CSV files in `data/processed/`:
|
||||||
|
|
||||||
|
```bash
python -m src.pipeline
```
|
||||||
|
|
||||||
|
This produces:
|
||||||
|
- `data/processed/fajr_angles.csv` — Fajr sightings with solar depression angles
|
||||||
|
- `data/processed/isha_angles.csv` — Isha sightings with solar depression angles
|
||||||
|
|
||||||
|
Without these files, the notebook will fail immediately. See [Data Collection](Data-Collection)
|
||||||
|
for the full pipeline guide.
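
A quick pre-flight check can catch a missing or incomplete CSV before the notebook does. The sketch below is not part of the repository: `load_fajr` and the `REQUIRED` column set are hypothetical, with column names taken from the snippets later on this page.

```python
from pathlib import Path

import pandas as pd

# Columns the later snippets on this page rely on (assumed; the real CSV may carry more)
REQUIRED = {"lat", "lng", "elevation_m", "day_of_year", "fajr_angle"}


def load_fajr(path="data/processed/fajr_angles.csv"):
    """Load the Fajr CSV and fail early with a readable message."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run `python -m src.pipeline` first")
    df = pd.read_csv(path)
    missing = REQUIRED - set(df.columns)
    if missing:
        raise ValueError(f"{path} is missing columns: {sorted(missing)}")
    return df
```

Calling `df = load_fajr()` at the top of a notebook turns a confusing downstream `KeyError` into an immediate, explicit failure.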
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 1: Exploratory Analysis
|
||||||
|
|
||||||
|
Open the notebook:
|
||||||
|
|
||||||
|
```bash
jupyter notebook notebooks/01_exploratory_analysis.ipynb
```
|
||||||
|
|
||||||
|
Or run it headlessly and export:
|
||||||
|
|
||||||
|
```bash
jupyter nbconvert --to notebook --execute notebooks/01_exploratory_analysis.ipynb \
  --output notebooks/01_exploratory_analysis_executed.ipynb
```
|
||||||
|
|
||||||
|
The notebook covers nine analyses in sequence:
|
||||||
|
|
||||||
|
| Cell | Analysis | What to look for |
| --- | --- | --- |
| 1 | Load datasets | Record counts, column dtypes |
| 2 | Angle distributions | Histogram shape; should be roughly normal for Fajr |
| 3 | Latitude vs Fajr angle | The counter-intuitive equatorial-higher pattern |
| 4 | Birmingham seasonality | Sinusoidal pattern; confirms TOY effect |
| 5 | Latitude × Season interaction | Coloured scatter; should show lat × season interaction |
| 6 | Elevation vs Fajr angle | Weaker than lat/season but visible above 500m |
| 7 | Geographic coverage map | Reveals which regions are data-sparse |
| 8 | Linear regression baseline | R² and per-feature coefficients; sets the floor for ML |
| 9 | Isha analysis | Parallel analysis for Isha; currently sparse |
|
||||||
|
|
||||||
|
A well-populated dataset produces:
|
||||||
|
- Fajr angle distribution: mean ~13.5°, std ~1.8°, range roughly 8°-20°
|
||||||
|
- Fajr linear regression R² ≥ 0.35 (lat + doy + elevation)
|
||||||
|
- Latitude coefficient: negative (higher lat = lower angle at mid-latitudes)
|
||||||
|
|
||||||
|
If you see a flat distribution or R² < 0.1, check the pipeline output for dropped records.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 2: Feature Engineering
|
||||||
|
|
||||||
|
The relevant features for predicting the solar depression angle at true dawn or dusk are:
|
||||||
|
|
||||||
|
| Feature | Column | Notes |
| --- | --- | --- |
| Latitude | `lat` | Decimal degrees |
| sin(day of year) | derived from `day_of_year` | Captures seasonality (365-day cycle) |
| cos(day of year) | derived from `day_of_year` | Paired with sin for full cycle encoding |
| Elevation | `elevation_m` | Metres above sea level |
| abs(lat) | derived | Symmetry across equator |
|
||||||
|
|
||||||
|
**Do not use longitude** as a feature. The depression angle at true dawn is independent of
longitude: moving east or west changes the clock time at which a given angle occurs, not the
angle itself.
|
||||||
|
|
||||||
|
**Do not use the observed time** as a feature. The angle is the prediction target; the time
|
||||||
|
is how you derived the angle. Using it as a feature would be data leakage.
|
||||||
|
|
||||||
|
Encode day of year as a unit circle pair:
|
||||||
|
|
||||||
|
```python
import numpy as np

df["doy_sin"] = np.sin(2 * np.pi * df["day_of_year"] / 365.25)
df["doy_cos"] = np.cos(2 * np.pi * df["day_of_year"] / 365.25)
```
|
||||||
|
|
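
To see why the sin/cos pair is preferred over the raw day number, the sketch below encodes three dates and compares their distances in feature space. `add_features` is a hypothetical helper (it also derives the `abs(lat)` column from the table above), and the three rows are made-up demo values, not sighting records.

```python
import numpy as np
import pandas as pd


def add_features(df):
    """Return a copy of df with cyclic day-of-year and abs(lat) columns."""
    out = df.copy()
    phase = 2 * np.pi * out["day_of_year"] / 365.25
    out["doy_sin"] = np.sin(phase)
    out["doy_cos"] = np.cos(phase)
    out["abs_lat"] = out["lat"].abs()
    return out


# Hypothetical rows: 1 January, 2 July and 31 December at a southern site
demo = add_features(pd.DataFrame({"lat": [-33.9] * 3, "day_of_year": [1, 183, 365]}))
enc = demo[["doy_sin", "doy_cos"]].to_numpy()

# Year-end neighbours end up close together; opposite seasons end up far apart
print("1 Jan vs 31 Dec:", np.linalg.norm(enc[0] - enc[2]))
print("1 Jan vs 2 Jul: ", np.linalg.norm(enc[0] - enc[1]))
```

With the raw `day_of_year`, 1 January and 31 December would sit 364 units apart even though they are adjacent days; the unit-circle encoding removes that artificial jump.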
||||||
|
---
|
||||||
|
|
||||||
|
## Step 3: Baseline Model
|
||||||
|
|
||||||
|
Before training any ML model, establish a linear baseline:
|
||||||
|
|
||||||
|
```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
import numpy as np

features = ["lat", "doy_sin", "doy_cos", "elevation_m"]
X = df[features].values
y = df["fajr_angle"].values

lr = LinearRegression()
scores = cross_val_score(lr, X, y, cv=5, scoring="r2")
print(f"Linear baseline R²: {scores.mean():.3f} ± {scores.std():.3f}")
```
|
||||||
|
|
||||||
|
This gives the floor — any ML model should beat it. A linear model trained on the current
|
||||||
|
data produces approximately R² = 0.38.
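
Raw linear coefficients are hard to compare because latitude, elevation, and the sin/cos terms live on very different scales. One option, sketched below, is to standardise the features first so each coefficient reads as "degrees of angle per standard deviation of the feature". The data here is synthetic stand-in data generated for illustration only; the coefficients it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features = ["lat", "doy_sin", "doy_cos", "elevation_m"]

# Synthetic stand-in data (NOT real sightings), northern hemisphere only
rng = np.random.default_rng(0)
n = 400
lat = rng.uniform(0, 55, n)
phase = 2 * np.pi * rng.integers(1, 366, n) / 365.25
elev = rng.uniform(0, 1200, n)
X_demo = np.column_stack([lat, np.sin(phase), np.cos(phase), elev])
y_demo = 16.5 - 0.06 * lat + 0.8 * np.cos(phase) + 0.0005 * elev + rng.normal(0, 0.5, n)

# Scale features to zero mean / unit variance, then fit the linear model
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_demo, y_demo)

coefs = dict(zip(features, pipe.named_steps["linearregression"].coef_))
for name, value in coefs.items():
    print(f"{name:>12}: {value:+.3f}° per standard deviation")
```

On the real data, swap `X_demo` and `y_demo` for the `X` and `y` arrays built above.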
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 4: Gradient Boosting (recommended)
|
||||||
|
|
||||||
|
Gradient boosting handles the non-linear lat × season interaction without explicit
|
||||||
|
feature crosses. It is the recommended first ML model for this dataset.
|
||||||
|
|
||||||
|
```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score, KFold
from sklearn.metrics import mean_absolute_error
import numpy as np

features = ["lat", "doy_sin", "doy_cos", "elevation_m"]
X = df[features].values
y = df["fajr_angle"].values

model = GradientBoostingRegressor(
    n_estimators=300,
    max_depth=4,
    learning_rate=0.05,
    subsample=0.8,
    random_state=42,
)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
r2_scores = cross_val_score(model, X, y, cv=kf, scoring="r2")
mae_scores = -cross_val_score(model, X, y, cv=kf, scoring="neg_mean_absolute_error")

print(f"R²: {r2_scores.mean():.3f} ± {r2_scores.std():.3f}")
print(f"MAE: {mae_scores.mean():.3f}° ± {mae_scores.std():.3f}°")
```
|
||||||
|
|
||||||
|
Target performance with a well-populated dataset (10k+ records):
|
||||||
|
- R² ≥ 0.55
|
||||||
|
- MAE ≤ 0.9°
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 5: Evaluating the Model
|
||||||
|
|
||||||
|
### Residual analysis
|
||||||
|
|
||||||
|
```python
from sklearn.model_selection import cross_val_predict
import matplotlib.pyplot as plt

# cross_val_predict clones and refits the model on each fold,
# so the model does not need to be fitted beforehand
y_pred = cross_val_predict(model, X, y, cv=5)
residuals = y - y_pred

plt.figure(figsize=(10, 4))

# Left panel: residuals against the predicted angle
plt.subplot(1, 2, 1)
plt.scatter(y_pred, residuals, alpha=0.3, s=10)
plt.axhline(0, color="red")
plt.xlabel("Predicted angle (°)")
plt.ylabel("Residual (°)")
plt.title("Residuals vs Predicted")

# Right panel: residuals against latitude
plt.subplot(1, 2, 2)
plt.scatter(df["lat"], residuals, alpha=0.3, s=10)
plt.axhline(0, color="red")
plt.xlabel("Latitude")
plt.ylabel("Residual (°)")
plt.title("Residuals vs Latitude")
plt.tight_layout()
plt.show()
```
|
||||||
|
|
||||||
|
Watch for:
|
||||||
|
- Systematic residuals at the highest latitudes in the dataset (50°N+): the model may underfit there
|
||||||
|
- Residuals correlated with season at a single location — the model may underfit seasonality
|
||||||
|
- Outliers > 3° from the line — these may be data entry errors or unusual atmospheric events
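
The checks in this list can also be done numerically rather than by eye. The helpers below are a hypothetical sketch: `flag_outliers` and `seasonal_bias` are not repository functions, the 3° threshold comes from the list above, and the four demo rows are invented values.

```python
import numpy as np
import pandas as pd


def flag_outliers(df, residuals, threshold_deg=3.0):
    """Rows whose absolute residual exceeds threshold_deg, largest first."""
    out = df.assign(residual=np.asarray(residuals))
    out = out[out["residual"].abs() > threshold_deg]
    return out.sort_values("residual", key=lambda s: s.abs(), ascending=False)


def seasonal_bias(df, residuals):
    """Mean residual per quarter of the year (approximated as 92-day blocks)."""
    quarter = (df["day_of_year"] - 1) // 92 + 1
    return pd.Series(np.asarray(residuals), index=df.index).groupby(quarter).mean()


# Hypothetical demo rows and residuals (not real data)
demo = pd.DataFrame({"lat": [52.5, 52.5, 3.1, 3.1], "day_of_year": [10, 100, 200, 300]})
demo_residuals = [0.2, -3.4, 4.1, 0.5]

flagged = flag_outliers(demo, demo_residuals)
print(flagged)
print(seasonal_bias(demo, demo_residuals))
```

A quarter whose mean residual sits well away from zero at a single location suggests the seasonal terms are underfitting there; flagged rows are candidates for a manual check against the original source.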
|
||||||
|
|
||||||
|
### Leave-location-out cross-validation
|
||||||
|
|
||||||
|
Standard k-fold mixes records from the same location across train/test splits, making the
|
||||||
|
model look better than it generalises to new locations. For this dataset, location-aware
|
||||||
|
CV is more informative:
|
||||||
|
|
||||||
|
```python
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Group by location (round lat/lng to 1 decimal for grouping)
groups = (df["lat"].round(1).astype(str) + "," + df["lng"].round(1).astype(str))

logo = LeaveOneGroupOut()
# Score with MAE: per-fold R² is undefined or unstable when a held-out
# location contributes only a handful of records
scores = -cross_val_score(
    model, X, y, cv=logo, groups=groups, scoring="neg_mean_absolute_error"
)
print(f"Leave-location-out MAE: {scores.mean():.3f}° ± {scores.std():.3f}°")
```
|
||||||
|
|
||||||
|
This tests whether the model generalises to locations it has never seen.
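
Per-fold scores can be noisy when a held-out location has very few records. One alternative, sketched here, is to pool the out-of-location predictions with `cross_val_predict` and score them once. The eight locations and their values are synthetic stand-ins generated for illustration; the small model settings are chosen only to keep the demo fast.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict

# Synthetic stand-in: 8 hypothetical locations with 30 records each
rng = np.random.default_rng(0)
site_lat = np.linspace(3, 53, 8)
lat = np.repeat(site_lat, 30)
groups_demo = np.repeat(np.arange(8), 30)
phase = 2 * np.pi * rng.integers(1, 366, lat.size) / 365.25
X_demo = np.column_stack([lat, np.sin(phase), np.cos(phase)])
y_demo = 16.5 - 0.06 * lat + 0.8 * np.cos(phase) + rng.normal(0, 0.4, lat.size)

model_demo = GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=0)

# Each record is predicted by a model that never saw its location
pooled = cross_val_predict(
    model_demo, X_demo, y_demo, cv=LeaveOneGroupOut(), groups=groups_demo
)
pooled_mae = mean_absolute_error(y_demo, pooled)
print(f"Pooled leave-location-out R²:  {r2_score(y_demo, pooled):.3f}")
print(f"Pooled leave-location-out MAE: {pooled_mae:.3f}°")
```

Pooling weights every record equally, so a large single-site dataset will dominate the pooled score; reporting both pooled and per-location figures gives a fuller picture.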
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 6: Feature Importance
|
||||||
|
|
||||||
|
```python
model.fit(X, y)
importances = model.feature_importances_

for name, imp in zip(features, importances):
    print(f"  {name}: {imp:.3f}")
```
|
||||||
|
|
||||||
|
Expected order: `doy_sin` or `doy_cos` highest, then `lat`, then `elevation_m` lowest.
|
||||||
|
If `elevation_m` ranks above the season features, high-elevation sites may be overrepresented in the dataset.
|
||||||
|
|
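
Impurity-based importances are computed on the training data and can favour features with many distinct values. As a cross-check, permutation importance on held-out data measures how much the score drops when one feature is shuffled. The sketch below uses scikit-learn's `permutation_importance` on synthetic stand-in data in which elevation is deliberately irrelevant; the numbers it prints are not results from the real dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

features = ["lat", "doy_sin", "doy_cos", "elevation_m"]

# Synthetic stand-in data: the angle depends on lat and season, not on elevation
rng = np.random.default_rng(0)
n = 600
lat = rng.uniform(0, 55, n)
phase = 2 * np.pi * rng.integers(1, 366, n) / 365.25
elev = rng.uniform(0, 1200, n)
X_demo = np.column_stack([lat, np.sin(phase), np.cos(phase), elev])
y_demo = 16.5 - 0.06 * lat + 0.8 * np.cos(phase) + rng.normal(0, 0.3, n)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
gbm = GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=0)
gbm.fit(X_tr, y_tr)

# Shuffle one column at a time on held-out data and record the drop in R²
result = permutation_importance(gbm, X_te, y_te, n_repeats=10, random_state=0, scoring="r2")
perm = dict(zip(features, result.importances_mean))
for name, value in sorted(perm.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  {name}: {value:.3f}")
```

If the two importance measures disagree sharply on the real data, the permutation figures are usually the safer guide, since they are measured on records the model has not seen.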
||||||
|
---
|
||||||
|
|
||||||
|
## Step 7: Exporting the Model
|
||||||
|
|
||||||
|
Once satisfied with validation performance:
|
||||||
|
|
||||||
|
```python
import json
from pathlib import Path

import joblib

model.fit(X, y)

# Create the output directory if it does not exist yet
Path("models").mkdir(exist_ok=True)
joblib.dump(model, "models/fajr_gbm.pkl")

# Export feature ranges for the pray-calc DPC algorithm
meta = {
    "features": features,
    "lat_range": [float(df["lat"].min()), float(df["lat"].max())],
    "elevation_range": [float(df["elevation_m"].min()), float(df["elevation_m"].max())],
    "angle_mean": float(y.mean()),
    "angle_std": float(y.std()),
    "n_records": int(len(df)),
    "r2_cv": float(r2_scores.mean()),
    "mae_cv": float(mae_scores.mean()),
}
with open("models/fajr_gbm_meta.json", "w") as f:
    json.dump(meta, f, indent=2)

print(f"Saved fajr_gbm.pkl ({len(df)} training records, R²={r2_scores.mean():.3f})")
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Current Model Status
|
||||||
|
|
||||||
|
The current dataset has:
|
||||||
|
- Fajr: ~4,100 records, but 98% are from Birmingham, UK. The model heavily reflects one location.
|
||||||
|
- Isha: ~43 records. Not enough to train a reliable ML model.
|
||||||
|
|
||||||
|
**The priority is data collection before further ML work.** A model trained only on Birmingham
|
||||||
|
Fajr data will predict Birmingham well and generalise poorly. The notebook's exploratory
|
||||||
|
analysis and linear baseline are meaningful now, but gradient boosting should wait for
|
||||||
|
broader geographic coverage.
|
||||||
|
|
||||||
|
Target before training a production model:
|
||||||
|
- Fajr: 10,000+ records from 100+ locations across all latitude bands
|
||||||
|
- Isha: 500+ records from 30+ locations
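
A small report makes it easy to track progress toward these targets. The function below is a hypothetical sketch, not a repository utility: it groups records by rounded coordinates (the same grouping used for location-aware CV on this page) and reports how dominant the largest site is. The default thresholds are the Fajr targets above, and the toy frame is invented.

```python
import pandas as pd


def coverage_report(df, min_records=10_000, min_locations=100):
    """Summarise dataset size, location count, and single-site dominance."""
    loc = df["lat"].round(1).astype(str) + "," + df["lng"].round(1).astype(str)
    counts = loc.value_counts()
    return {
        "n_records": int(len(df)),
        "n_locations": int(counts.size),
        # Share of all records contributed by the single largest location
        "top_location_share": float(counts.iloc[0] / len(df)),
        "meets_targets": bool(len(df) >= min_records and counts.size >= min_locations),
    }


# Hypothetical toy frame: three records at one site, one at another
toy = pd.DataFrame({"lat": [52.5, 52.5, 52.5, 3.1], "lng": [-1.9, -1.9, -1.9, 101.7]})
report = coverage_report(toy)
print(report)
```

For Isha, the same function can be called with `min_records=500, min_locations=30`.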
|
||||||
|
|
||||||
|
See [Data Collection](Data-Collection) for how to contribute new sighting records.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Connecting to pray-calc
|
||||||
|
|
||||||
|
The output of the ML model feeds the DPC (Dynamic Pray Calc) algorithm in
|
||||||
|
[pray-calc](https://github.com/acamarata/pray-calc). The DPC algorithm takes:
|
||||||
|
|
||||||
|
- Latitude
|
||||||
|
- Day of year
|
||||||
|
- Elevation
|
||||||
|
|
||||||
|
And returns a recommended depression angle for that location and date.
|
||||||
|
|
||||||
|
The current DPC implementation uses a simplified physics model. The ML model will replace
|
||||||
|
or calibrate the seasonal and latitude correction factors once sufficient data is available.
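
As a rough illustration of that interface, the sketch below wraps a fitted model so it takes latitude, day of year, and elevation and refuses to extrapolate outside the latitudes it was trained on. Everything here is an assumption for illustration: `predict_angle` and `build_features` are hypothetical names, `meta` mirrors the keys written to `fajr_gbm_meta.json` in Step 7, and `ConstantModel` is a stand-in for the real estimator. It does not describe how pray-calc itself is implemented.

```python
import math

import numpy as np


def build_features(lat, day_of_year, elevation_m):
    """Derive the model inputs from the three DPC inputs."""
    phase = 2 * math.pi * day_of_year / 365.25
    return {
        "lat": lat,
        "doy_sin": math.sin(phase),
        "doy_cos": math.cos(phase),
        "elevation_m": elevation_m,
    }


def predict_angle(model, meta, lat, day_of_year, elevation_m):
    """Predict a depression angle, rejecting latitudes outside the training range."""
    lo, hi = meta["lat_range"]
    if not lo <= lat <= hi:
        raise ValueError(f"latitude {lat} is outside the training range [{lo}, {hi}]")
    feats = build_features(lat, day_of_year, elevation_m)
    # Column order must match the order used at training time
    row = np.array([[feats[name] for name in meta["features"]]])
    return float(model.predict(row)[0])


class ConstantModel:
    """Stand-in for a fitted estimator; always predicts 14.0 degrees."""

    def predict(self, X):
        return np.full(len(X), 14.0)


demo_meta = {"features": ["lat", "doy_sin", "doy_cos", "elevation_m"], "lat_range": [-37.8, 53.7]}
print(predict_angle(ConstantModel(), demo_meta, lat=41.9, day_of_year=172, elevation_m=182))
```

Guarding on the training range matters here because tree ensembles cannot extrapolate: outside the observed latitudes they simply repeat the nearest learned value.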
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*[← Data Collection](Data-Collection) · [Architecture →](Architecture)*
|
||||||
221
.wiki/Research-Notes.md
Normal file
|
|
@ -0,0 +1,221 @@
|
||||||
|
# Research Notes
|
||||||
|
|
||||||
|
Summaries of the academic papers and observation programs that contributed records to this dataset.
|
||||||
|
|
||||||
|
For full citation details, see [Data Sources](Data-Sources).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Finding
|
||||||
|
|
||||||
|
The data consistently shows three main patterns:
|
||||||
|
|
||||||
|
1. **Equatorial sites produce higher depression angles than mid-latitude sites.** Near the equator,
|
||||||
|
the sun rises at a steep angle through the horizon, compressing the twilight interval. At 3°-7°
|
||||||
|
latitude, mean Fajr angles are 16°-17°. At 52°N (Birmingham), the mean is ~13°.
|
||||||
|
|
||||||
|
2. **Season matters at every latitude.** Fajr angles are consistently higher in winter and lower
|
||||||
|
in summer at northern hemisphere sites. Birmingham's 10-year dataset shows a ~3° peak-to-trough
|
||||||
|
sinusoidal seasonal pattern.
|
||||||
|
|
||||||
|
3. **Elevation shifts the angle upward.** Sites near or above 500m (Kottamia 477m, Ankara 890m,
Amman 1000m, Hail 1020m, Tehran 1191m) consistently produce angles at the high end of their
|
||||||
|
latitude band. The effect is smaller than latitude or season but real.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Papers by Region
|
||||||
|
|
||||||
|
### Egypt — NRIAG Series
|
||||||
|
|
||||||
|
The National Research Institute of Astronomy and Geophysics (NRIAG) in Egypt has published the
|
||||||
|
longest series of peer-reviewed Fajr and Isha observation studies.
|
||||||
|
|
||||||
|
**Hassan et al. 2014** — *NRIAG Journal of Astronomy and Geophysics*, 3: 23-26.
|
||||||
|
|
||||||
|
Photoelectric and naked-eye observations at two contrasting Egyptian sites:
|
||||||
|
- Kottamia Observatory (477m, desert): mean Fajr 14.0°, Isha (Shafaq Abyad) 13.8°
|
||||||
|
- Aswan (92m, very clear desert near Tropic): mean Fajr 13.2°
|
||||||
|
|
||||||
|
The Kottamia results are the most reliable pre-SQM era Egyptian data. Photoelectric twilight
|
||||||
|
sensors provide an objective measure of sky brightness at the moment of true dawn.
|
||||||
|
|
||||||
|
**Hassan et al. 2016** — *NRIAG Journal of Astronomy and Geophysics*, 5: 9-15.
|
||||||
|
|
||||||
|
Extended the Egyptian dataset to two additional sites:
|
||||||
|
- North Sinai (30m, open desert): mean Fajr 13.5° across four seasons
|
||||||
|
- Assiut (55m, Nile valley): mean Fajr 13.2° (slightly lower, attributed to agricultural aerosols)
|
||||||
|
|
||||||
|
The consistent result across Egyptian desert sites (13°-14.5°) is notable given that MUIS, ISNA,
|
||||||
|
and most calculators use 18° or 15°.
|
||||||
|
|
||||||
|
**Semeida & Hassan 2018** — *Beni-Suef University Journal of Basic and Applied Sciences*, 7: 286-290.
|
||||||
|
|
||||||
|
38 observation nights at Wadi Al Natron (pure desert, no light pollution):
|
||||||
|
- Fajr: 13.5°-14.8° across seasons
|
||||||
|
- Isha (Shafaq Abyad): 13.0°-15.2° across seasons
|
||||||
|
|
||||||
|
This paper provides the most complete Egyptian Isha dataset.
|
||||||
|
|
||||||
|
**Rashed et al. 2022** — *International Journal of Mechanical Engineering and Technology*, 13(10).
|
||||||
|
|
||||||
|
SQM + naked eye at Fayum (29.28°N, near the Fayum depression):
|
||||||
|
- Seasonal means: winter 14.5°, summer 13.1°
|
||||||
|
|
||||||
|
**Rashed et al. 2025** — *NRIAG Journal of Astronomy and Geophysics*.
|
||||||
|
|
||||||
|
Most recent paper. Alexandria (Mediterranean coast, 31.2°N):
|
||||||
|
- Three seasons: winter 14.1°, summer 12.9°, autumn 13.8°
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Saudi Arabia — Khalifa 2018
|
||||||
|
|
||||||
|
**Khalifa 2018** — *NRIAG Journal of Astronomy and Geophysics*, 7: 22-28.
|
||||||
|
|
||||||
|
80 observation nights at Hail (27.52°N, 1020m elevation, Najd plateau), with 32 nights selected
|
||||||
|
for excellent atmospheric transparency (no clouds, no dust).
|
||||||
|
|
||||||
|
Results:
|
||||||
|
- Mean Fajr: 14.4° (range 12.8°-16.1°)
|
||||||
|
- Mean Isha (Shafaq Abyad): 14.8° (range 13.2°-16.4°)
|
||||||
|
- Higher in winter, lower in summer
|
||||||
|
|
||||||
|
At 1020m, Hail shows a clearly elevated angle vs sea-level desert sites in Egypt. This is
|
||||||
|
the primary evidence for the elevation effect.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Malaysia and Indonesia — Equatorial Studies
|
||||||
|
|
||||||
|
**Kassim Bahali et al. 2018** — *Sains Malaysiana*, 47(11): 2797-2805.
|
||||||
|
|
||||||
|
The strongest low-latitude Fajr study. 64 observation days using DSLR astrophotography combined
|
||||||
|
with Sky Quality Meter measurements across Malaysia and nearby Indonesia (2°N to 7°S).
|
||||||
|
|
||||||
|
Key results:
|
||||||
|
- Mean Fajr depression: **16.67°** (range 13.9°-19.8°)
|
||||||
|
- Standard deviation: 1.32°
|
||||||
|
- No correlation with season at these low latitudes
|
||||||
|
|
||||||
|
The DSLR + SQM combination is methodologically more rigorous than naked eye alone. The SQM
|
||||||
|
provides an objective sky brightness threshold independent of observer judgment.
|
||||||
|
|
||||||
|
**Saksono 2020** — *NRIAG Journal of Astronomy and Geophysics*, 9(1): 238-244.
|
||||||
|
|
||||||
|
SQM-only study at Depok, West Java (6.4°S, 65m), 26 nights in June-July 2015:
|
||||||
|
- Mean Fajr depression: ~16°
|
||||||
|
- High consistency with Kassim Bahali despite different instruments
|
||||||
|
|
||||||
|
**Hamidi 2008** — Academia.edu working paper.
|
||||||
|
|
||||||
|
Shafaq al-Abyad (Isha) observations at two Malaysian sites:
|
||||||
|
- Kuala Lipis (4.183°N): ~17° across seasons
|
||||||
|
- Port Klang (3.004°N): ~16°-17° across seasons
|
||||||
|
|
||||||
|
The ~17° Isha result at low latitudes mirrors the ~17° Fajr result — both twilight phenomena
|
||||||
|
are compressed by the steep solar arc at equatorial sites.
|
||||||
|
|
||||||
|
**OIF UMSU 2017-2020** — University of Muhammadiyah North Sumatra.
|
||||||
|
|
||||||
|
Hundreds of SQM observation nights at Medan (3.595°N):
|
||||||
|
- Proposed national Indonesian standard: 16.48° for Fajr
|
||||||
|
- Isha: consistent with ~17°
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### United Kingdom
|
||||||
|
|
||||||
|
**Hizbul Ulama UK 1987-1989**
|
||||||
|
|
||||||
|
21 successful Fajr observations over three years from a rural Lancashire site (53.748°N, 120m).
|
||||||
|
One of the earliest systematic UK observation programs. Per-season results are published
at http://www.hizbululama.org.uk/files/salat_timing.html.
|
||||||
|
|
||||||
|
Fajr results: consistent 12°-14° range across seasons. Isha observations also recorded.
|
||||||
|
|
||||||
|
**Asim Yusuf 2017** — *Shedding Light on the Dawn*, ISBN 978-0-9934979-1-9.
|
||||||
|
|
||||||
|
The highest-quality UK observation study. Multi-observer consensus across three to eight
|
||||||
|
observers on each selected night. Site: Exmoor National Park (51.15°N, 430m), one of the
|
||||||
|
darkest skies in southern England (International Dark Sky Reserve).
|
||||||
|
|
||||||
|
Per-season results from 2013-2016:
|
||||||
|
- Winter: Fajr ~13.8°, Isha (Shafaq Abyad) ~14.2°
|
||||||
|
- Summer: Fajr ~12.1°, Isha ~12.8°
|
||||||
|
|
||||||
|
The multi-observer consensus methodology makes these the most reliable UK data points.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Moonsighting.com / Khalid Shaukat
|
||||||
|
|
||||||
|
A multi-decade global observation network. Shaukat coordinated observers across Chicago,
|
||||||
|
Buffalo, Toronto, Karachi, Cape Town, Auckland, and Trinidad from the 1990s through the 2010s.
|
||||||
|
|
||||||
|
Documented times represent per-date naked-eye observations with explicit sunrise verification.
|
||||||
|
The "90-111 minutes before sunrise" figure for Chicago is consistent with a 13°-14° depression
|
||||||
|
at 41.9°N across seasons.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Latitude-Angle Summary Table
|
||||||
|
|
||||||
|
This table synthesises mean Fajr angles from peer-reviewed sources across the latitude range.
|
||||||
|
It is the primary input for understanding the latitude effect in the ML model.
|
||||||
|
|
||||||
|
| Latitude | Site | Elev | Mean Fajr (°) | N | Method |
| --- | --- | --- | --- | --- | --- |
| 52.5°N | Birmingham, UK | 141m | ~13.0° | 4,018 | Community astrophotography |
| 43.7°N | Toronto, Canada | 76m | ~13.2° | 4 | Naked eye |
| 41.9°N | Chicago, USA | 182m | ~13.1° | 8 | Naked eye |
| 39.9°N | Ankara, Turkey | 890m | ~14.8° | 4 | Naked eye (high elev) |
| 36.9°S | Auckland, NZ | 20m | ~14.8° | 2 | Naked eye |
| 37.8°S | Melbourne, AU | 31m | ~14.5° | 3 | Naked eye |
| 35.7°N | Tehran, Iran | 1191m | ~15.1° | 3 | Naked eye (very high elev) |
| 34.0°N | Fez, Morocco | 408m | ~14.2° | 4 | Naked eye |
| 33.9°S | Cape Town, SA | 10m | ~15.2° | 4 | Naked eye |
| 31.9°N | Amman, Jordan | 1000m | ~14.9° | 3 | Naked eye (high elev) |
| 31.2°N | Alexandria, Egypt | 32m | ~13.6° | 3 | SQM |
| 30.5°N | Wadi Al Natron | 23m | ~14.0° | 7 | Naked eye (desert) |
| 30.0°N | Kottamia, Egypt | 477m | ~14.0° | 6 | Photoelectric (high elev) |
| 27.5°N | Hail, Saudi Arabia | 1020m | ~14.4° | 8 | Naked eye (high elev) |
| 24.9°N | Karachi, Pakistan | 8m | ~14.8° | 4 | Naked eye |
| 14.7°N | Dakar, Senegal | 24m | ~15.3° | 2 | Naked eye |
| 12.0°N | Kano, Nigeria | 476m | ~15.1° | 2 | Naked eye |
| 10.7°N | Trinidad | 12m | ~15.8° | 2 | Naked eye |
| 6.4°S | Depok, Indonesia | 65m | ~16.0° | 3 | SQM |
| 3.6°N | Medan, Indonesia | 22m | ~16.5° | 8 | SQM |
| 3.1°N | KL, Malaysia | 40m | ~16.7° | 4 | DSLR + SQM |
| 4.1°S | Mombasa, Kenya | 50m | ~16.2° | 2 | Naked eye |
|
||||||
|
|
||||||
|
The counter-intuitive result — equatorial sites have *higher* angles than mid-latitude sites —
|
||||||
|
is a consequence of the Sun's steep rise angle at low latitudes. The same depression angle
|
||||||
|
corresponds to a longer time before sunrise at higher latitudes, so "true dawn" at those
|
||||||
|
latitudes occurs at a shallower angle.
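
The geometric part of this argument can be checked with the standard hour-angle relation, cos H = (sin h − sin φ sin δ) / (cos φ cos δ), where φ is latitude, δ is solar declination and h is the Sun's altitude. The sketch below is a simplified illustration only: it assumes an equinox (δ = 0°), takes sunrise at an altitude of −0.833°, and ignores elevation and atmospheric effects. The function names are hypothetical.

```python
import math


def hour_angle_deg(lat_deg, decl_deg, altitude_deg):
    """Hour angle (degrees from solar noon) at which the Sun reaches a given altitude."""
    lat, decl, alt = (math.radians(v) for v in (lat_deg, decl_deg, altitude_deg))
    cos_h = (math.sin(alt) - math.sin(lat) * math.sin(decl)) / (math.cos(lat) * math.cos(decl))
    if not -1.0 <= cos_h <= 1.0:
        raise ValueError("the Sun never reaches this altitude at this latitude and date")
    return math.degrees(math.acos(cos_h))


def minutes_before_sunrise(lat_deg, decl_deg, depression_deg, sunrise_alt_deg=-0.833):
    """Minutes between the Sun sitting depression_deg below the horizon and sunrise."""
    h_dawn = hour_angle_deg(lat_deg, decl_deg, -depression_deg)
    h_rise = hour_angle_deg(lat_deg, decl_deg, sunrise_alt_deg)
    return (h_dawn - h_rise) * 4.0  # Earth rotates 15° per hour, i.e. 4 minutes per degree


# Same 15° depression at equinox, near-equatorial vs mid-latitude site
for lat in (3.0, 30.0, 52.0):
    print(f"lat {lat:>4}°: {minutes_before_sunrise(lat, 0.0, 15.0):.0f} min before sunrise")
```

Under these assumptions a fixed 15° depression falls roughly an hour before sunrise near the equator and over an hour and a half before sunrise at 52°, which is the geometric asymmetry the paragraph above describes.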
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Open Questions
|
||||||
|
|
||||||
|
1. **Why do southern hemisphere sites at 33°-37°S (Cape Town, Auckland, Melbourne) show higher
angles (~15°) than northern hemisphere mid-latitude sites (Fez at 34.0°N, ~14.2°; UK at 51°N, ~13°)?**
|
||||||
|
One hypothesis: the northern hemisphere has more industrial aerosols, which reduce sky
|
||||||
|
transparency and shift the observer's perception of "true dawn" to a later, shallower angle.
|
||||||
|
This would bias northern hemisphere data toward lower angles. The effect needs more data to confirm.
|
||||||
|
|
||||||
|
2. **Is the elevation effect physically explained or confounded?**
|
||||||
|
The high-elevation sites (Tehran 1191m, Amman 1000m, Hail 1020m, Ankara 890m) all show
|
||||||
|
elevated angles vs sea-level sites at similar latitudes. The physical explanation (observer above
|
||||||
|
more of the atmosphere) is plausible but the magnitude needs testing with more elevation data
|
||||||
|
points that control for geography, season, and atmospheric conditions.
|
||||||
|
|
||||||
|
3. **Why does Isha (Shafaq Abyad) at ~15° match Fajr at ~13°-16° for most sites?**
|
||||||
|
The Shafaq al-Abyad criterion requires the white twilight to disappear, which is a different
|
||||||
|
type of observation from true dawn (the first horizontal spread of light, as distinct from false dawn). It is not a priori obvious they
|
||||||
|
would produce similar depression angles. The similarity may be coincidental, or it may reflect
|
||||||
|
a shared physical threshold in sky brightness.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*[← Data Sources](Data-Sources) · [Home →](Home)*
|
||||||