Data Collection
This page explains how to collect sighting data, run the pipeline, and add new records.
What data we collect
Each record in the dataset represents one confirmed human sighting with:
| Field | Description |
|---|---|
| Date | The calendar date of the sighting (local date) |
| Location | Latitude, longitude, and elevation in metres |
| Observed time | The local time at which the sighting occurred |
| UTC offset | The hours offset from UTC at that date and location |
The pipeline converts each record into a solar depression angle by back-calculating the sun's position at the UTC moment of the sighting using PyEphem with atmospheric refraction.
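Not included: calculated prayer times, angle guesses, or aggregate statistics. The dataset contains only records in which an actual human observer reported, in effect, "I saw true dawn at this time on this date at this location" (or the equivalent end-of-twilight observation for Isha).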
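The back-calculation from a sighting's UTC moment to a depression angle can be illustrated without PyEphem. The sketch below is a simplified stand-in, not the pipeline's code: it swaps in a standard low-precision solar position approximation and ignores atmospheric refraction and observer elevation, so its angles will differ somewhat from the pipeline's refraction-corrected values. The function name `depression_angle_deg` is hypothetical, and it expects a timezone-aware UTC `datetime`.

```python
import math
from datetime import datetime, timezone


def depression_angle_deg(when_utc: datetime, lat: float, lng: float) -> float:
    """Approximate solar depression angle in degrees (positive = sun below horizon).

    Hypothetical helper: low-precision solar formulas, no refraction, no elevation.
    """
    if when_utc.tzinfo is None:
        raise ValueError("when_utc must be timezone-aware (UTC)")

    # Days since the J2000.0 epoch (2000-01-01 12:00 UT), via the Julian date.
    n = when_utc.timestamp() / 86400.0 + 2440587.5 - 2451545.0

    # Sun's mean longitude and mean anomaly (degrees, reduced to 0-360).
    mean_lon = (280.460 + 0.9856474 * n) % 360.0
    mean_anom = math.radians((357.528 + 0.9856003 * n) % 360.0)

    # Ecliptic longitude and obliquity of the ecliptic (radians).
    ecl_lon = math.radians(
        mean_lon + 1.915 * math.sin(mean_anom) + 0.020 * math.sin(2 * mean_anom)
    )
    obliquity = math.radians(23.439 - 0.0000004 * n)

    # Ecliptic -> equatorial coordinates: right ascension and declination.
    ra = math.atan2(math.cos(obliquity) * math.sin(ecl_lon), math.cos(ecl_lon))
    dec = math.asin(math.sin(obliquity) * math.sin(ecl_lon))

    # Greenwich mean sidereal time (hours), then the sun's local hour angle.
    gmst_hours = (18.697374558 + 24.06570982441908 * n) % 24.0
    hour_angle = math.radians(gmst_hours * 15.0 + lng) - ra

    # Geometric altitude from the standard spherical-astronomy relation.
    phi = math.radians(lat)
    sin_alt = (math.sin(phi) * math.sin(dec)
               + math.cos(phi) * math.cos(dec) * math.cos(hour_angle))
    altitude = math.degrees(math.asin(max(-1.0, min(1.0, sin_alt))))
    return -altitude  # depression is the negative of altitude
```

For example, `depression_angle_deg(datetime(2024, 12, 21, 0, 0, tzinfo=timezone.utc), 52.4862, -1.8904)` gives the sun's depression at midnight UTC in Birmingham; a positive result means the sun is below the horizon.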
Running the pipeline
Prerequisites
```bash
# Requires Python 3.10+
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
Full run (recommended)
```bash
python -m src.pipeline
```
This does three things in sequence:
- Fetches the OpenFajr iCal feed from `calendar.google.com`: ~4,018 community-verified Fajr records from Birmingham, UK, 2016-2026. Requires network access.
- Loads manually compiled records from `src/collect/verified_sightings.py`: ~141 records from published studies and observation programs across 35 locations worldwide.
- Looks up missing elevations via the Open-Elevation API for any record where `elevation_m == 0`.
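The elevation step can be sketched as follows. This is an illustration, not the pipeline's implementation: it assumes Open-Elevation's public `GET /api/v1/lookup?locations=lat,lng|lat,lng` endpoint, which returns JSON of the form `{"results": [{"latitude": ..., "longitude": ..., "elevation": ...}]}` with one result per requested location, in request order. All helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed public Open-Elevation lookup endpoint.
OPEN_ELEVATION_URL = "https://api.open-elevation.com/api/v1/lookup"


def records_missing_elevation(records: list[dict]) -> list[dict]:
    """Select records whose elevation is 0 (or absent) and so needs a lookup."""
    return [r for r in records if r.get("elevation_m", 0) == 0]


def build_lookup_url(records: list[dict]) -> str:
    """Build one batched lookup URL: 'lat,lng' pairs joined with '|'."""
    locations = "|".join(f"{r['lat']},{r['lng']}" for r in records)
    return f"{OPEN_ELEVATION_URL}?{urlencode({'locations': locations})}"


def apply_elevations(records: list[dict], response_body: str) -> None:
    """Copy elevations from a lookup response back onto the records, in order."""
    results = json.loads(response_body)["results"]
    if len(results) != len(records):
        raise ValueError(f"expected {len(records)} results, got {len(results)}")
    for record, result in zip(records, results):
        record["elevation_m"] = float(result["elevation"])


def fill_missing_elevations(records: list[dict], timeout: float = 30.0) -> None:
    """Fetch and apply elevations for every record that needs one (network call)."""
    missing = records_missing_elevation(records)
    if not missing:
        return
    with urllib.request.urlopen(build_lookup_url(missing), timeout=timeout) as resp:
        apply_elevations(missing, resp.read().decode("utf-8"))
```

Because `records_missing_elevation` returns the same dict objects, filling them in place also updates the original list. Batching all locations into one request keeps the number of API calls low.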
Output:
- `data/processed/fajr_angles.csv`: ~4,105 Fajr records
- `data/processed/isha_angles.csv`: ~43 Isha records
Without elevation lookup
```bash
python -m src.pipeline --no-elevation-lookup
```
Skips the Open-Elevation API calls. Use this when:
- You're offline
- You want faster iteration while adding new records
- All records in `verified_sightings.py` already have non-zero elevations
Interpreting the pipeline output
```text
Loading OpenFajr Birmingham iCal feed...
  4018 Fajr records from OpenFajr
Loading manually verified sightings...
  141 manually compiled records
Computing solar depression angles...
Dropping 11 record(s) with implausible angles (< 7.0° Fajr / < 10.0° Isha):
  FAJR 2021-03-27 ... angle=-18.71° — OpenFajr (openfajr.org)
  ...
Fajr dataset: 4105 records → data/processed/fajr_angles.csv
Isha dataset: 43 records → data/processed/isha_angles.csv
```
Records dropped for implausible angles are typically data-entry errors or DST-transition artifacts. The quality filter drops any record whose computed angle is below 7° (Fajr) or 10° (Isha), since such values are physically implausible for true dawn or nightfall. All dropped records are logged so you can investigate them.
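A quality filter of this kind amounts to a per-prayer minimum-angle check. The sketch below is illustrative only: it assumes each computed record is a dict with hypothetical `prayer`, `date_local`, `angle` and `source` keys, and its log lines only approximate the pipeline output shown above.

```python
# Minimum plausible depression angles (degrees), per the thresholds above.
MIN_ANGLE_DEG = {"fajr": 7.0, "isha": 10.0}


def apply_quality_filter(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (kept, dropped) and log every dropped record."""
    kept, dropped = [], []
    for record in records:
        threshold = MIN_ANGLE_DEG[record["prayer"]]
        (kept if record["angle"] >= threshold else dropped).append(record)

    if dropped:
        print(
            f"Dropping {len(dropped)} record(s) with implausible angles "
            f"(< {MIN_ANGLE_DEG['fajr']}° Fajr / < {MIN_ANGLE_DEG['isha']}° Isha):"
        )
        for r in dropped:
            print(f"  {r['prayer'].upper()} {r['date_local']} "
                  f"angle={r['angle']:.2f}° ({r['source']})")
    return kept, dropped
```

Returning the dropped records alongside the kept ones, rather than discarding them silently, is what makes the "investigate them" step possible.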
Data sources
Primary: OpenFajr (Birmingham, UK)
The OpenFajr Project runs a continuous community astrophotography program in Birmingham. A panel of scholars reviews daily sky photos and votes on the moment of true dawn. The voted times are published as a public Google Calendar iCal feed.
- ~4,018 records, 2016-2026
- Location: 52.4862°N, 1.8904°W, 141m elevation
- All times are UTC (Z suffix in iCal)
- Fetched live by the pipeline — no local cache needed
This is the highest-quality source: actual community-reviewed per-date timestamps at a single well-documented location. It provides 98% of the Fajr training data.
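Because every OpenFajr timestamp carries the iCal `Z` suffix, it can be parsed directly as UTC with no offset lookup. The sketch below assumes that each Fajr time appears as an event's `DTSTART` value in the RFC 5545 form `YYYYMMDDTHHMMSSZ`; the real feed layout may differ, the code does no line unfolding, and the function names are hypothetical. A dedicated iCalendar library would be more robust in practice.

```python
from datetime import datetime, timezone


def parse_ical_utc(value: str) -> datetime:
    """Parse an iCal UTC date-time such as '20210327T053500Z' into an aware datetime."""
    if not value.endswith("Z"):
        raise ValueError(f"expected a UTC (Z-suffixed) timestamp, got {value!r}")
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def fajr_times_from_ical(ics_text: str) -> list[datetime]:
    """Collect every DTSTART value in the feed as a UTC datetime (assumed layout)."""
    times = []
    for line in ics_text.splitlines():
        name, _, value = line.strip().partition(":")
        # Property parameters (e.g. 'DTSTART;VALUE=...') are separated by ';'.
        if name.split(";", 1)[0].upper() == "DTSTART":
            times.append(parse_ical_utc(value))
    return times
```

Rejecting non-`Z` values outright, instead of guessing a zone, keeps a change in the feed's format from silently shifting every timestamp.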
Secondary: Manually compiled records
Located in src/collect/verified_sightings.py. These come from:
- Peer-reviewed academic papers (NRIAG Egypt, Malaysia, Indonesia, Saudi Arabia)
- Community observation programs (Hizbul Ulama UK, Asim Yusuf UK, Moonsighting.com)
- National religious body publications (AFIC Australia, Jordanian Awqaf, etc.)
See Data Sources for the full citation table.
Adding new sighting records
Open src/collect/verified_sightings.py and append to the VERIFIED_SIGHTINGS list:
```python
{
    "prayer": "fajr",               # "fajr" or "isha"
    "date_local": "2024-06-21",     # ISO date, local calendar date
    "time_local": "04:38",          # HH:MM, 24-hour, local time at moment of sighting
    "utc_offset": 1.0,              # hours from UTC (e.g. 1.0 for BST, -5.0 for EST, 5.5 for IST)
    "lat": 51.150,                  # decimal degrees (south = negative)
    "lng": -3.650,                  # decimal degrees (west = negative)
    "elevation_m": 430.0,           # metres above sea level (0 = will be looked up by API)
    "source": "Your citation here",
    "notes": "Any relevant notes about conditions, method, observer count, etc.",
},
```
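Before running the pipeline, a quick structural check can catch common entry mistakes, such as typing `5.3` instead of `5.5` for IST or writing `4:38` instead of `04:38`. The validator below is a hypothetical helper, not part of the repository; it checks only the fields shown in the template above (treating `notes` as optional), and the quarter-hour rule for `utc_offset` is an assumption based on present-day offsets all being multiples of 15 minutes.

```python
import re
from datetime import date

# Fields from the template above that this sketch treats as required.
REQUIRED_KEYS = {"prayer", "date_local", "time_local", "utc_offset",
                 "lat", "lng", "elevation_m", "source"}


def validate_sighting(record: dict) -> list[str]:
    """Return a list of problems found in one sighting dict (empty if none)."""
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems = []
    if record["prayer"] not in ("fajr", "isha"):
        problems.append(f"prayer must be 'fajr' or 'isha', got {record['prayer']!r}")
    try:
        date.fromisoformat(record["date_local"])
    except (TypeError, ValueError):
        problems.append(f"date_local is not an ISO date: {record['date_local']!r}")
    if not re.fullmatch(r"([01]\d|2[0-3]):[0-5]\d", str(record["time_local"])):
        problems.append(f"time_local must be 24-hour HH:MM: {record['time_local']!r}")

    offset = record["utc_offset"]
    if not isinstance(offset, (int, float)) or not -12.0 <= offset <= 14.0:
        problems.append(f"utc_offset must be a number between -12 and +14: {offset!r}")
    elif abs(offset * 4 - round(offset * 4)) > 1e-9:
        # Offsets are whole quarter-hours: 5.5 for IST, not 5.3 for "5:30".
        problems.append(f"utc_offset is not a whole number of quarter-hours: {offset}")

    if not -90.0 <= record["lat"] <= 90.0:
        problems.append(f"lat out of range: {record['lat']}")
    if not -180.0 <= record["lng"] <= 180.0:
        problems.append(f"lng out of range: {record['lng']}")
    return problems
```

Returning a list of messages instead of raising on the first error lets you fix every problem in a record in one pass.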
UTC offset tips
| Region | UTC offset |
|---|---|
| UK (BST, summer) | +1.0 |
| UK (GMT, winter) | 0.0 |
| Egypt / Eastern Europe (EET) | +2.0 |
| Egypt / Eastern Europe (EEST, summer) | +3.0 |
| Saudi Arabia / Arabia Standard | +3.0 |
| Iran (IRST) | +3.5 |
| Iran (IRDT, summer) | +4.5 |
| UAE / Oman (GST) | +4.0 |
| Pakistan (PKT) | +5.0 |
| India / Sri Lanka (IST) | +5.5 |
| Bangladesh (BST) | +6.0 |
| Malaysia / Singapore (MYT) | +8.0 |
| Indonesia West (WIB) | +7.0 |
| Indonesia East (WIT) | +9.0 |
| Australia East (AEST, winter) | +10.0 |
| Australia East (AEDT, summer) | +11.0 |
| New Zealand (NZST) | +12.0 |
| New Zealand (NZDT) | +13.0 |
| US Eastern (EST) | -5.0 |
| US Eastern (EDT) | -4.0 |
| US Central (CST) | -6.0 |
| US Central (CDT) | -5.0 |
| West Africa (WAT) | +1.0 |
| East Africa (EAT) | +3.0 |
| South Africa (SAST) | +2.0 |
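The offset is subtracted from the local time to get the UTC moment, so positive (east-of-Greenwich) offsets move the time earlier and can roll the date back a day. A minimal sketch, assuming the record format above; `to_utc` is a hypothetical helper name. A fixed offset is only correct if it matches the daylight-saving rule in force on that date, which is why the table lists summer and winter offsets separately.

```python
from datetime import datetime, timedelta, timezone


def to_utc(date_local: str, time_local: str, utc_offset: float) -> datetime:
    """Convert a record's local date/time and fixed UTC offset to an aware UTC datetime."""
    local = datetime.strptime(f"{date_local} {time_local}", "%Y-%m-%d %H:%M")
    # UTC = local time - offset (e.g. 05:00 IST at +5.5 is 23:30 UTC the previous day).
    return (local - timedelta(hours=utc_offset)).replace(tzinfo=timezone.utc)
```

For example, `to_utc("2024-01-15", "05:00", 5.5)` returns 23:30 UTC on 14 January, and `to_utc("2024-01-15", "21:00", -5.0)` returns 02:00 UTC on 16 January.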
Verifying a new record
After adding records, run the pipeline and check the output. A correctly entered record should produce an angle between 8° and 21° for Fajr, or between 11° and 22° for Isha. If the pipeline drops your record (angle below the 7° Fajr or 10° Isha threshold), the computed moment is too close to sunrise (Fajr) or sunset (Isha); recheck the UTC offset and local time.
```bash
python -m src.pipeline --no-elevation-lookup 2>&1 | grep -A5 "Dropping"
```
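The two checks described above (the hard drop threshold and the wider expected range) can be combined into a quick per-record diagnosis. This is a hypothetical helper, not part of the pipeline; it assumes you already have the computed angle for the record, and it uses the thresholds and ranges quoted on this page.

```python
# Hard drop thresholds and expected ranges (degrees), as quoted on this page.
DROP_BELOW_DEG = {"fajr": 7.0, "isha": 10.0}
EXPECTED_RANGE_DEG = {"fajr": (8.0, 21.0), "isha": (11.0, 22.0)}


def diagnose_angle(prayer: str, angle: float) -> str:
    """Classify a computed depression angle for a newly added record."""
    if angle < DROP_BELOW_DEG[prayer]:
        return ("dropped: too close to sunrise/sunset; "
                "recheck utc_offset and time_local")
    low, high = EXPECTED_RANGE_DEG[prayer]
    if low <= angle <= high:
        return "ok"
    return "kept, but outside the expected range; double-check the record"
```

The middle case matters: a record can survive the filter yet still sit outside the expected window, which usually points to an entry error rather than a genuine observation.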
Priority gaps to fill
The Isha dataset is the most critical gap at ~43 records. Fajr has excellent Birmingham coverage but needs more geographic diversity:
| Gap | What to look for |
|---|---|
| Isha (all regions) | Shafaq al-Abyad disappearance logs with explicit per-date timestamps |
| South America | Any Muslim community observation records with coordinates and times |
| Southeast Asia | Additional Indonesian/Malaysian per-night SQM data files |
| High latitudes (55°N+) | Scandinavian or northern Canadian observation logs |
| Sub-Saharan Africa | Observation records from West Africa, East Africa, Southern Africa |