mirror of
https://github.com/acamarata/pray-calc-ml.git
synced 2026-07-01 03:14:27 +00:00
Five wiki pages covering Data Collection, ML Crunching, Architecture, Data Sources, and Research Notes. GitHub Actions workflow syncs .wiki/ to the GitHub Wiki on push to main. Adds .markdownlintignore and VS Code settings to exclude .claude/ from lint checks. Adds .allow-ai-terms to allow the .claude/ directory path reference in lint ignore files.
195 lines
6.5 KiB
Markdown
195 lines
6.5 KiB
Markdown
# Data Collection
|
|
|
|
This page explains how to collect sighting data, run the pipeline, and add new records.
|
|
|
|
---
|
|
|
|
## What data we collect
|
|
|
|
Each record in the dataset represents one confirmed human sighting with:
|
|
|
|
| Field | Description |
|
|
| --- | --- |
|
|
| Date | The calendar date of the sighting (local date) |
|
|
| Location | Latitude, longitude, and elevation in metres |
|
|
| Observed time | The local time at which the sighting occurred |
|
|
| UTC offset | The hours offset from UTC at that date and location |
|
|
|
|
The pipeline converts each record into a solar depression angle by back-calculating the sun's
|
|
position at the UTC moment of the sighting using PyEphem with atmospheric refraction.
|
|
|
|
**Not included:** calculated prayer times, angle guesses, or aggregate statistics. Only records
|
|
where an actual human reported "I saw true dawn at this time on this date at this location."
|
|
|
|
---
|
|
|
|
## Running the pipeline
|
|
|
|
### Prerequisites
|
|
|
|
```bash
|
|
# Python 3.10+
|
|
python -m venv .venv
|
|
source .venv/bin/activate # on Windows: .venv\Scripts\activate
|
|
pip install -r requirements.txt
|
|
```
|
|
|
|
### Full run (recommended)
|
|
|
|
```bash
|
|
python -m src.pipeline
|
|
```
|
|
|
|
This does three things in sequence:
|
|
|
|
1. **Fetches the OpenFajr iCal feed** from `calendar.google.com` — ~4,018 community-verified
|
|
Fajr records from Birmingham, UK, 2016-2026. Requires network access.
|
|
2. **Loads manually compiled records** from `src/collect/verified_sightings.py` — ~141 records
|
|
from peer-reviewed studies across 35 locations worldwide.
|
|
3. **Looks up missing elevations** via the [Open-Elevation API](https://open-elevation.com) for
|
|
any record where `elevation_m == 0`.
|
|
|
|
Output:
|
|
```
|
|
data/processed/fajr_angles.csv — ~4,105 Fajr records
|
|
data/processed/isha_angles.csv — ~43 Isha records
|
|
```
|
|
|
|
### Without elevation lookup
|
|
|
|
```bash
|
|
python -m src.pipeline --no-elevation-lookup
|
|
```
|
|
|
|
Skips the Open-Elevation API calls. Use this when:
|
|
- You're offline
|
|
- You want faster iteration while adding new records
|
|
- All records in `verified_sightings.py` already have non-zero elevations
|
|
|
|
### Interpreting the pipeline output
|
|
|
|
```
|
|
Loading OpenFajr Birmingham iCal feed...
|
|
4018 Fajr records from OpenFajr
|
|
Loading manually verified sightings...
|
|
141 manually compiled records
|
|
Computing solar depression angles...
|
|
Dropping 11 record(s) with implausible angles (< 7.0° Fajr / < 10.0° Isha):
|
|
FAJR 2021-03-27 ... angle=-18.71° — OpenFajr (openfajr.org)
|
|
...
|
|
|
|
Fajr dataset: 4105 records → data/processed/fajr_angles.csv
|
|
Isha dataset: 43 records → data/processed/isha_angles.csv
|
|
```
|
|
|
|
Records dropped with "implausible angles" are data entry or DST-transition artifacts. The
|
|
quality filter (7° for Fajr, 10° for Isha) removes physically impossible values. All dropped
|
|
records are logged so you can investigate them.
|
|
|
|
---
|
|
|
|
## Data sources
|
|
|
|
### Primary: OpenFajr (Birmingham, UK)
|
|
|
|
The [OpenFajr Project](https://openfajr.org) runs a continuous community astrophotography
|
|
program in Birmingham. A panel of scholars reviews daily sky photos and votes on the moment of
|
|
true dawn. The voted times are published as a public Google Calendar iCal feed.
|
|
|
|
- ~4,018 records, 2016-2026
|
|
- Location: 52.4862°N, 1.8904°W, 141m elevation
|
|
- All times are UTC (Z suffix in iCal)
|
|
- Fetched live by the pipeline — no local cache needed
|
|
|
|
This is the highest-quality source: actual community-reviewed per-date timestamps at a single
|
|
well-documented location. It provides 98% of the Fajr training data.
|
|
|
|
### Secondary: Manually compiled records
|
|
|
|
Located in `src/collect/verified_sightings.py`. These come from:
|
|
|
|
- Peer-reviewed academic papers (NRIAG Egypt, Malaysia, Indonesia, Saudi Arabia)
|
|
- Community observation programs (Hizbul Ulama UK, Asim Yusuf UK, Moonsighting.com)
|
|
- National religious body publications (AFIC Australia, Jordanian Awqaf, etc.)
|
|
|
|
See [Data Sources](Data-Sources) for the full citation table.
|
|
|
|
---
|
|
|
|
## Adding new sighting records
|
|
|
|
Open `src/collect/verified_sightings.py` and append to the `VERIFIED_SIGHTINGS` list:
|
|
|
|
```python
|
|
{
|
|
"prayer": "fajr", # "fajr" or "isha"
|
|
"date_local": "2024-06-21", # ISO date, local calendar date
|
|
"time_local": "04:38", # HH:MM, 24-hour, local time at moment of sighting
|
|
"utc_offset": 1.0, # hours from UTC (e.g. 1.0 for BST, -5.0 for EST, 5.5 for IST)
|
|
"lat": 51.150, # decimal degrees (south = negative)
|
|
"lng": -3.650, # decimal degrees (west = negative)
|
|
"elevation_m": 430.0, # metres above sea level (0 = will be looked up by API)
|
|
"source": "Your citation here",
|
|
"notes": "Any relevant notes about conditions, method, observer count, etc.",
|
|
}
|
|
```
|
|
|
|
### UTC offset tips
|
|
|
|
| Region | UTC offset |
|
|
| --- | --- |
|
|
| UK (BST, summer) | +1.0 |
|
|
| UK (GMT, winter) | 0.0 |
|
|
| Egypt / Eastern Europe (EET) | +2.0 |
|
|
| Egypt / EE (summer, EEST) | +3.0 |
|
|
| Saudi Arabia / Arabia Standard | +3.0 |
|
|
| Iran (IRST) | +3.5 |
|
|
| Iran (IRDT, summer) | +4.5 |
|
|
| UAE / Oman (GST) | +4.0 |
|
|
| Pakistan (PKT) | +5.0 |
|
|
| India / Sri Lanka (IST) | +5.5 |
|
|
| Bangladesh (BST) | +6.0 |
|
|
| Malaysia / Singapore (MYT) | +8.0 |
|
|
| Indonesia West (WIB) | +7.0 |
|
|
| Indonesia East (WIT) | +9.0 |
|
|
| Australia East (AEST, winter) | +10.0 |
|
|
| Australia East (AEDT, summer) | +11.0 |
|
|
| New Zealand (NZST) | +12.0 |
|
|
| New Zealand (NZDT) | +13.0 |
|
|
| US Eastern (EST) | -5.0 |
|
|
| US Eastern (EDT) | -4.0 |
|
|
| US Central (CST) | -6.0 |
|
|
| US Central (CDT) | -5.0 |
|
|
| West Africa (WAT) | +1.0 |
|
|
| East Africa (EAT) | +3.0 |
|
|
| South Africa (SAST) | +2.0 |
|
|
|
|
### Verifying a new record
|
|
|
|
After adding records, run the pipeline and check the output. A correctly entered record should
|
|
produce an angle between 8° and 21° for Fajr, or 11° and 22° for Isha. If the pipeline drops
|
|
your record (angle below the threshold), the time is too close to sunrise/sunset — recheck the
|
|
UTC offset and local time.
|
|
|
|
```bash
|
|
python -m src.pipeline --no-elevation-lookup 2>&1 | grep -A5 "Dropping"
|
|
```
|
|
|
|
---
|
|
|
|
## Priority gaps to fill
|
|
|
|
The Isha dataset is the most critical gap at ~43 records. Fajr has excellent Birmingham coverage
|
|
but needs more geographic diversity:
|
|
|
|
| Gap | What to look for |
|
|
| --- | --- |
|
|
| Isha (all regions) | Shafaq al-Abyad disappearance logs with explicit per-date timestamps |
|
|
| South America | Any Muslim community observation records with coordinates and times |
|
|
| Southeast Asia | Additional Indonesian/Malaysian per-night SQM data files |
|
|
| High latitudes (55°N+) | Scandinavian or northern Canadian observation logs |
|
|
| Sub-Saharan Africa | Observation records from West Africa, East Africa, Southern Africa |
|
|
|
|
---
|
|
|
|
*[← Home](Home) · [ML Crunching →](ML-Crunching)*
|