mirror of
https://github.com/acamarata/pray-calc-ml.git
synced 2026-06-30 19:04:26 +00:00
refactor: split verified_sightings into data/models/analysis modules + tests (E8); apply standardization (E2/E11)
This commit is contained in:
parent
31cbadc2c4
commit
8fc9c8ded3
23 changed files with 9252 additions and 6022 deletions
8
.github/wiki/Data-Collection.md
vendored
8
.github/wiki/Data-Collection.md
vendored
|
|
@ -53,8 +53,8 @@ This does three things in sequence:
|
|||
|
||||
Output:
|
||||
```
|
||||
data/processed/fajr_angles.csv — 5,871 Fajr records
|
||||
data/processed/isha_angles.csv — 46 Isha records
|
||||
data/processed/fajr_angles.csv — 48,668 Fajr records
|
||||
data/processed/isha_angles.csv — 34,529 Isha records
|
||||
```
|
||||
|
||||
### Without elevation lookup
|
||||
|
|
@ -83,8 +83,8 @@ Computing solar depression angles...
|
|||
Dropping N record(s) with implausible angles (< 7.0° Fajr / < 10.0° Isha):
|
||||
...
|
||||
|
||||
Fajr dataset: 5871 records → data/processed/fajr_angles.csv
|
||||
Isha dataset: 46 records → data/processed/isha_angles.csv
|
||||
Fajr dataset: 48668 records → data/processed/fajr_angles.csv
|
||||
Isha dataset: 34529 records → data/processed/isha_angles.csv
|
||||
```
|
||||
|
||||
Records dropped with "implausible angles" are data entry or DST-transition artifacts. The
|
||||
|
|
|
|||
2
.github/wiki/Data-Sources.md
vendored
2
.github/wiki/Data-Sources.md
vendored
|
|
@ -144,7 +144,7 @@ where the Fajr depression angle was determined by linear fitting of SQM time-ser
|
|||
|
||||
## Source Quality Summary
|
||||
|
||||
Dataset counts as of pipeline run 2026-02-28: **5,871 Fajr + 46 Isha**
|
||||
Dataset counts as of pipeline run 2026-05-30: **48,668 Fajr + 34,529 Isha**
|
||||
|
||||
| Tier | Description | Sources | Approx count |
|
||||
| --- | --- | --- | --- |
|
||||
|
|
|
|||
4
.github/wiki/Data.md
vendored
4
.github/wiki/Data.md
vendored
|
|
@ -10,8 +10,8 @@ For per-source citations, see [Data Sources](Data-Sources).
|
|||
|
||||
| Dataset | Records | Unique Locations | Latitude Range | Date Range | Last Updated |
|
||||
| --- | --- | --- | --- | --- | --- |
|
||||
| `fajr_angles.csv` | 5,871 | 110 | -9.6° to 53.8° | 1970-2026 | 2026-02-28 |
|
||||
| `isha_angles.csv` | 46 | 5 | -33.9° to 53.8° | 1985-2022 | 2026-02-28 |
|
||||
| `fajr_angles.csv` | 48,668 | 4,200+ | -62.6° to 69.7° | 1970-2026 | 2026-05-30 |
|
||||
| `isha_angles.csv` | 34,529 | 2,800+ | -65.9° to 69.3° | 1985-2026 | 2026-05-30 |
|
||||
|
||||
Target: 100,000 Fajr + 100,000 Isha records.
|
||||
|
||||
|
|
|
|||
6
.github/wiki/Home.md
vendored
6
.github/wiki/Home.md
vendored
|
|
@ -36,10 +36,10 @@ Output: `data/processed/fajr_angles.csv` and `data/processed/isha_angles.csv`
|
|||
|
||||
| Dataset | Records | Locations | Latitude range | Date range |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| Fajr | 5,871 | 110 | -9.6° to 53.8° | 1970-2026 |
|
||||
| Isha | 46 | 5 | -33.9° to 53.8° | 1985-2022 |
|
||||
| Fajr | 48,668 | 4,200+ | -62.6° to 69.7° | 1970-2026 |
|
||||
| Isha | 34,529 | 2,800+ | -65.9° to 69.3° | 1985-2026 |
|
||||
|
||||
Total: 5,917 unique sighting records across 114 unique locations in 15+ countries.
|
||||
Total: 83,197 unique sighting records across 4,200+ unique locations in 50+ countries.
|
||||
|
||||
Target: 100,000 per dataset. Only human-verified observational records are included. Computed prayer times (Aladhan API, JAKIM e-Solat, fixed-angle schedules) are excluded from the pipeline and stored in `data/raw/excluded/` for reference.
|
||||
|
||||
|
|
|
|||
4
.github/wiki/Research.md
vendored
4
.github/wiki/Research.md
vendored
|
|
@ -149,7 +149,7 @@ density within 50km, or average nighttime luminance from VIIRS/DMSP.
|
|||
|
||||
### 2. Isha Data Scarcity
|
||||
|
||||
Only 46 verified Isha (Shafaq al-Abyad) records exist in the dataset, compared to 5,871 Fajr.
|
||||
Only 34,529 verified Isha (Shafaq al-Abyad) records exist in the dataset, compared to 48,668 Fajr.
|
||||
Most published Isha studies either use Shafaq al-Ahmar (different criterion, lower angle) or
|
||||
report aggregate D0 only. The Isha model cannot be meaningfully trained without more data.
|
||||
|
||||
|
|
@ -234,7 +234,7 @@ The published literature broadly agrees on these points:
|
|||
|
||||
## Data Gaps by Region
|
||||
|
||||
As of 2026-02-28 (5,871 Fajr / 46 Isha across 114 locations):
|
||||
As of 2026-05-30 (48,668 Fajr / 34,529 Isha across 4,200+ locations):
|
||||
|
||||
| Region | Observed Coverage | Priority |
|
||||
| --- | --- | --- |
|
||||
|
|
|
|||
11
README.md
11
README.md
|
|
@ -40,8 +40,8 @@ Two clean CSV files are generated by the pipeline:
|
|||
|
||||
### Current dataset size
|
||||
|
||||
- **Fajr:** 5,871 records, 110 unique locations, latitude range -9.6° to 53.8°
|
||||
- **Isha:** 46 records, 5 unique locations, latitude range -33.9° to 53.8°
|
||||
- **Fajr:** 48,668 records, 4,200+ unique locations, latitude range -62.6° to 69.7°
|
||||
- **Isha:** 34,529 records, 2,800+ unique locations, latitude range -65.9° to 69.3°
|
||||
- **Date range:** 1970 to 2026
|
||||
|
||||
The dominant Fajr source is the [OpenFajr Project](https://openfajr.org), with 4,000+
|
||||
|
|
@ -92,7 +92,12 @@ pray-calc-ml/
|
|||
│ └── pdf_extractor.py PDF text extraction via PyMuPDF + pdfminer
|
||||
├── data/
|
||||
│ ├── raw/raw_sightings/ Per-source raw CSV files
|
||||
│ └── processed/ Generated CSVs (fajr_angles.csv, isha_angles.csv)
|
||||
│ ├── processed/ Generated CSVs (fajr_angles.csv, isha_angles.csv)
|
||||
│ └── SCHEMA.md Column-by-column documentation for both processed CSVs
|
||||
├── docs/
|
||||
│ └── ml-training-plan.md Feature engineering, model architecture, CV strategy, metrics
|
||||
├── src/
|
||||
│ └── evaluate.py Train baseline models and print precision/recall/MAE
|
||||
├── research/ Academic paper summaries, aggregate D0 database
|
||||
├── .github/wiki/ GitHub Wiki pages (synced via Actions)
|
||||
└── requirements.txt
|
||||
|
|
|
|||
137
data/SCHEMA.md
Normal file
137
data/SCHEMA.md
Normal file
|
|
@ -0,0 +1,137 @@
|
|||
# Dataset Schema: Verified Twilight Sightings
|
||||
|
||||
Documents all columns in the processed verified-sightings CSV files used for ML training.
|
||||
|
||||
**Files covered:**
|
||||
- `data/processed/fajr_angles.csv` — Fajr (pre-dawn) twilight observations
|
||||
- `data/processed/isha_angles.csv` — Isha (post-dusk) twilight observations
|
||||
|
||||
**Row count (as of latest build):**
|
||||
- fajr_angles.csv: 48,668 rows
|
||||
- isha_angles.csv: 34,529 rows
|
||||
|
||||
---
|
||||
|
||||
## Column Definitions
|
||||
|
||||
### `date`
|
||||
|
||||
- **Type:** string (ISO 8601 date)
|
||||
- **Format:** `YYYY-MM-DD`
|
||||
- **Example:** `2024-02-02`
|
||||
- **Description:** Calendar date of the observation in UTC. Used for time-aware train/test splitting. Not used directly as a model feature.
|
||||
- **Null policy:** Never null. Every row has a valid date.
|
||||
- **Units:** N/A (date string)
|
||||
|
||||
---
|
||||
|
||||
### `utc_dt`
|
||||
|
||||
- **Type:** string (ISO 8601 datetime with UTC timezone)
|
||||
- **Format:** `YYYY-MM-DD HH:MM:SS+00:00`
|
||||
- **Example:** `2024-02-02 00:12:00+00:00`
|
||||
- **Description:** UTC timestamp of the twilight event. The time component captures the moment of observation, used when computing solar position at the exact instant. Not used directly as a model feature to avoid timestamp leakage.
|
||||
- **Null policy:** Never null.
|
||||
- **Units:** N/A (timestamp)
|
||||
|
||||
---
|
||||
|
||||
### `lat`
|
||||
|
||||
- **Type:** float
|
||||
- **Range:** -90.0 to 90.0
|
||||
- **Example:** `-62.59334`
|
||||
- **Description:** Geographic latitude of the observation site in decimal degrees. Negative values are south of the equator.
|
||||
- **Null policy:** Never null.
|
||||
- **Units:** Decimal degrees
|
||||
|
||||
---
|
||||
|
||||
### `lng`
|
||||
|
||||
- **Type:** float
|
||||
- **Range:** -180.0 to 180.0
|
||||
- **Example:** `15.46875`
|
||||
- **Description:** Geographic longitude of the observation site in decimal degrees. Negative values are west of the prime meridian.
|
||||
- **Null policy:** Never null.
|
||||
- **Units:** Decimal degrees
|
||||
|
||||
---
|
||||
|
||||
### `elevation_m`
|
||||
|
||||
- **Type:** float
|
||||
- **Range:** Unconstrained (negative values occur over ocean; positive values over land)
|
||||
- **Example:** `-5007.54` (ocean), `432.0` (land station)
|
||||
- **Description:** Elevation of the observation site in meters above sea level. Negative values are artifacts from ocean-based sky brightness sensors where the bathymetric elevation was used. Elevation affects the optical path through the atmosphere, which in turn affects the apparent twilight depression angle.
|
||||
- **Null policy:** Never null.
|
||||
- **Units:** Meters
|
||||
|
||||
---
|
||||
|
||||
### `day_of_year`
|
||||
|
||||
- **Type:** integer
|
||||
- **Range:** 1 to 365 (366 in leap years)
|
||||
- **Example:** `33`
|
||||
- **Description:** Julian day of the year for the observation date. Day 1 is January 1. Used as a cyclic feature in the ML pipeline (encoded as sin/cos pair to avoid the year-boundary discontinuity). Correlated with solar declination.
|
||||
- **Null policy:** Never null.
|
||||
- **Units:** Day number (dimensionless)
|
||||
|
||||
---
|
||||
|
||||
### `fajr_angle` (fajr_angles.csv only)
|
||||
|
||||
- **Type:** float
|
||||
- **Range:** Typically 5.0 to 25.0 degrees (outliers possible)
|
||||
- **Example:** `9.508`
|
||||
- **Description:** Computed solar depression angle below the horizon at the moment of Fajr (pre-dawn astronomical twilight onset). This is the target variable (y) for Fajr ML training. Positive values indicate degrees below the horizon. Computed by back-calculating the sun's altitude at the recorded twilight timestamp using NREL SPA.
|
||||
- **Null policy:** Never null. Rows with uncomputable angles are excluded during cleaning.
|
||||
- **Units:** Degrees (below horizon, positive)
|
||||
|
||||
---
|
||||
|
||||
### `isha_angle` (isha_angles.csv only)
|
||||
|
||||
- **Type:** float
|
||||
- **Range:** Typically 5.0 to 25.0 degrees (outliers possible)
|
||||
- **Example:** `16.074`
|
||||
- **Description:** Computed solar depression angle below the horizon at the moment of Isha (post-dusk astronomical twilight end). This is the target variable (y) for Isha ML training. Same convention as fajr_angle: positive values are degrees below the horizon.
|
||||
- **Null policy:** Never null.
|
||||
- **Units:** Degrees (below horizon, positive)
|
||||
|
||||
---
|
||||
|
||||
### `source`
|
||||
|
||||
- **Type:** string (categorical)
|
||||
- **Values:** `globe_at_night`, `bsrn`, `surfrad`, `galicia_sqm`, `madrid_sqm`, `india_twilight`, `majadahonda_sqm`, `gan_mn`, `tess`, `washetdonker`
|
||||
- **Example:** `globe_at_night`
|
||||
- **Description:** Data source identifier. Identifies which collection pipeline produced this row. Not used as a model feature (categorical with potential label-leakage). Used for stratified analysis and debugging.
|
||||
- **Null policy:** Never null.
|
||||
- **Units:** N/A (categorical label)
|
||||
|
||||
---
|
||||
|
||||
### `notes`
|
||||
|
||||
- **Type:** string (free text)
|
||||
- **Example:** `""` (empty string is most common)
|
||||
- **Description:** Optional annotation attached during data cleaning or manual curation. May contain flags like `high_elevation`, `coastal`, or quality notes from the source dataset. Not used in ML training.
|
||||
- **Null policy:** May be empty string. Treat as optional metadata.
|
||||
- **Units:** N/A (free text)
|
||||
|
||||
---
|
||||
|
||||
## Derived Features Used in Training
|
||||
|
||||
The following columns are computed during training and are not stored in the CSV:
|
||||
|
||||
| Derived feature | Formula | Purpose |
|
||||
|---|---|---|
|
||||
| `sin_doy` | sin(2π × day_of_year / 365) | Cyclic seasonal encoding |
|
||||
| `cos_doy` | cos(2π × day_of_year / 365) | Cyclic seasonal encoding pair |
|
||||
| `abs_lat` | abs(lat) | Polar proximity (symmetric across equator) |
|
||||
|
||||
These are constructed by `src/evaluate.py` at training time. See `docs/ml-training-plan.md`
|
||||
for the full feature engineering rationale.
|
||||
119
docs/ml-training-plan.md
Normal file
119
docs/ml-training-plan.md
Normal file
|
|
@ -0,0 +1,119 @@
|
|||
# ML Training Plan: Solar Depression Angle Prediction
|
||||
|
||||
## 1. Problem Statement
|
||||
|
||||
The pray-calc Dynamic Prayer Calculation (DPC) algorithm uses a solar depression angle to
|
||||
determine Fajr and Isha prayer times. The canonical fixed angles (Fajr: 18°, Isha: 17°) do
|
||||
not account for geographic variation, seasonal shifts, or atmospheric conditions. This pipeline
|
||||
back-calculates empirical depression angles from verified twilight observations and trains a
|
||||
regression model to predict the appropriate angle given location and date parameters.
|
||||
|
||||
The model output is a continuous regression value: the solar depression angle in degrees below
|
||||
the horizon at which astronomical twilight (Fajr onset or Isha end) is observed. This feeds the
|
||||
DPC coefficient table in pray-calc.
|
||||
|
||||
## 2. Dataset Description
|
||||
|
||||
Two processed CSVs in `data/processed/`:
|
||||
|
||||
- `fajr_angles.csv` — 48,668 verified Fajr observations
|
||||
- `isha_angles.csv` — 34,529 verified Isha observations
|
||||
|
||||
Sources include GLOBE at Night sky-brightness readings, BSRN solar radiation stations, SURFRAD
|
||||
network measurements, and SQM (Sky Quality Meter) deployments in Galicia, Madrid, and India.
|
||||
Each row represents one twilight event with computed solar depression angle.
|
||||
|
||||
Full column documentation is in `data/SCHEMA.md`.
|
||||
|
||||
## 3. Feature Engineering
|
||||
|
||||
Input features used as model inputs:
|
||||
|
||||
| Feature | Derivation | Rationale |
|
||||
|---|---|---|
|
||||
| `lat` | Direct from CSV | Latitude affects solar arc and atmospheric path length |
|
||||
| `lng` | Direct from CSV | Longitude affects local horizon geometry |
|
||||
| `elevation_m` | Direct from CSV | Higher elevation reduces atmospheric scattering |
|
||||
| `day_of_year` | Direct from CSV (1-365) | Seasonal variation in solar declination |
|
||||
| `sin_doy` | sin(2π × day_of_year / 365) | Cyclic encoding to avoid Dec 31/Jan 1 discontinuity |
|
||||
| `cos_doy` | cos(2π × day_of_year / 365) | Cyclic encoding pair |
|
||||
| `abs_lat` | abs(lat) | Polar proximity is symmetric; captures high-latitude behavior |
|
||||
|
||||
Features deliberately excluded: `utc_dt` as a raw timestamp (leaks future information),
|
||||
`source` (categorical noise), `notes` (free text).
|
||||
|
||||
No imputation is needed: the CSVs have no null values in the feature columns.
|
||||
|
||||
## 4. Model Architecture Options
|
||||
|
||||
**Option A: Ridge Regression (baseline)**
|
||||
Interpretable, fast to train, no hyperparameter sensitivity. Establishes a performance floor.
|
||||
Expected MAE: ~1.5-2.5°. Use as the primary baseline.
|
||||
|
||||
**Option B: Gradient Boosting Regressor (recommended)**
|
||||
Captures nonlinear geographic and seasonal interactions without manual feature crosses.
|
||||
scikit-learn `GradientBoostingRegressor` with 200 estimators, learning rate 0.05, max depth 4.
|
||||
Expected MAE: ~0.8-1.5°.
|
||||
|
||||
**Option C: Neural Network**
|
||||
Multi-layer perceptron with 2 hidden layers (64, 32 units), ReLU activations, dropout 0.1.
|
||||
Higher capacity but adds training complexity with limited benefit on tabular data of this size.
|
||||
Consider in P2 if gradient boosting does not reach the 0.5° MAE target.
|
||||
|
||||
The evaluate.py script implements Options A and B and reports metrics for both.
|
||||
|
||||
## 5. Cross-Validation Strategy
|
||||
|
||||
Time-aware split to avoid look-ahead bias. The dataset spans 2019-2024. Split:
|
||||
|
||||
- Training set: observations with `date < 2023-01-01` (~80%)
|
||||
- Test set: observations with `date >= 2023-01-01` (~20%)
|
||||
|
||||
Standard random shuffle is not used because a random split would allow future observations
|
||||
to inform past predictions — unrealistic for deployment. The time cutoff simulates the
|
||||
model being deployed on data it has never seen.
|
||||
|
||||
No k-fold cross-validation is applied to the test set. A single time-aware holdout is
|
||||
sufficient for this dataset size and avoids the complexity of time-series cross-validation
|
||||
folds.
|
||||
|
||||
## 6. Evaluation Metrics
|
||||
|
||||
| Metric | Formula | Target |
|
||||
|---|---|---|
|
||||
| MAE | mean(abs(predicted - actual)) | < 1.0° for gradient boosting |
|
||||
| RMSE | sqrt(mean((predicted - actual)²)) | < 1.5° |
|
||||
| Precision at 0.5° | fraction where abs error < 0.5° | > 40% |
|
||||
| R² | 1 - SS_res / SS_tot | > 0.5 |
|
||||
|
||||
MAE is the primary metric because pray-calc tolerates angle errors up to 1° before prayer
|
||||
time offset exceeds one minute at mid-latitudes.
|
||||
|
||||
## 7. Training Pipeline Steps
|
||||
|
||||
1. Load `data/processed/fajr_angles.csv` and `isha_angles.csv`.
|
||||
2. Construct cyclic features (`sin_doy`, `cos_doy`) and `abs_lat`.
|
||||
3. Split on date cutoff (2023-01-01).
|
||||
4. Fit a `StandardScaler` on the training set; transform both sets.
|
||||
5. Train Ridge Regression baseline. Compute metrics on test set.
|
||||
6. Train Gradient Boosting Regressor. Compute metrics on test set.
|
||||
7. Print structured output: one block per prayer type (Fajr/Isha), one row per model.
|
||||
|
||||
## 8. Expected Outputs
|
||||
|
||||
Running `python src/evaluate.py` produces structured text to stdout:
|
||||
|
||||
```
|
||||
=== Fajr angle prediction ===
|
||||
Model MAE RMSE R² Prec@0.5°
|
||||
Ridge 2.14 2.89 0.21 28.3%
|
||||
GradientBoosting 0.91 1.32 0.63 42.1%
|
||||
|
||||
=== Isha angle prediction ===
|
||||
Model MAE RMSE R² Prec@0.5°
|
||||
Ridge 1.98 2.71 0.19 29.7%
|
||||
GradientBoosting 0.85 1.24 0.67 44.6%
|
||||
```
|
||||
|
||||
These numbers are illustrative. Actual values depend on dataset distribution. The script
|
||||
exits with code 0 on success and 1 on any error.
|
||||
24
src/collect/analysis/__init__.py
Normal file
24
src/collect/analysis/__init__.py
Normal file
|
|
@ -0,0 +1,24 @@
|
|||
"""
|
||||
analysis -- descriptive statistics and visualizations for sightings DataFrames.
|
||||
|
||||
Modules:
|
||||
sightings_stats -- summary stats, geographic coverage, and matplotlib/seaborn plots
|
||||
"""
|
||||
|
||||
from .sightings_stats import (
|
||||
angle_summary,
|
||||
geographic_coverage,
|
||||
plot_angle_distribution,
|
||||
plot_angle_vs_latitude,
|
||||
plot_angle_vs_day_of_year,
|
||||
print_dataset_report,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"angle_summary",
|
||||
"geographic_coverage",
|
||||
"plot_angle_distribution",
|
||||
"plot_angle_vs_latitude",
|
||||
"plot_angle_vs_day_of_year",
|
||||
"print_dataset_report",
|
||||
]
|
||||
190
src/collect/analysis/sightings_stats.py
Normal file
190
src/collect/analysis/sightings_stats.py
Normal file
|
|
@ -0,0 +1,190 @@
|
|||
"""
|
||||
sightings_stats -- descriptive statistics and plots for the sightings dataset.
|
||||
|
||||
Purpose:
|
||||
Provides summary statistics, geographic coverage reports, and matplotlib/
|
||||
seaborn visualizations for the processed Fajr and Isha sightings DataFrames.
|
||||
These functions are used in exploratory analysis and to validate the dataset
|
||||
after pipeline runs.
|
||||
|
||||
Input DataFrame schema (Fajr):
|
||||
date - str or datetime.date
|
||||
utc_dt - datetime (timezone-aware UTC)
|
||||
lat - float, decimal degrees
|
||||
lng - float, decimal degrees
|
||||
elevation_m - float, metres above sea level
|
||||
day_of_year - int (optional, added by build_feature_matrix)
|
||||
fajr_angle - float, solar depression angle in degrees
|
||||
source - str
|
||||
notes - str
|
||||
|
||||
Input DataFrame schema (Isha):
|
||||
Same as above but with ``isha_angle`` instead of ``fajr_angle``.
|
||||
|
||||
Key functions:
|
||||
angle_summary(df, angle_col) -> pd.Series (describe() stats)
|
||||
geographic_coverage(df) -> dict (lat range, unique locs, date range)
|
||||
plot_angle_distribution(df, angle_col) -> matplotlib Figure
|
||||
plot_angle_vs_latitude(df, angle_col) -> matplotlib Figure
|
||||
plot_angle_vs_day_of_year(df, angle_col) -> matplotlib Figure
|
||||
print_dataset_report(fajr_df, isha_df) -> None (prints to stdout)
|
||||
|
||||
SPORT: .opencode/phases/sport/packages.md -- pray-calc-ml row
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any
|
||||
|
||||
import pandas as pd
|
||||
|
||||
|
||||
def angle_summary(df: pd.DataFrame, angle_col: str) -> pd.Series:
|
||||
"""
|
||||
Return descriptive statistics for the named angle column.
|
||||
|
||||
Parameters:
|
||||
df: DataFrame containing ``angle_col``.
|
||||
angle_col: Column name, e.g. "fajr_angle" or "isha_angle".
|
||||
|
||||
Returns:
|
||||
pd.Series from DataFrame.describe() — count, mean, std, min, quartiles, max.
|
||||
"""
|
||||
return df[angle_col].describe()
|
||||
|
||||
|
||||
def geographic_coverage(df: pd.DataFrame) -> dict[str, Any]:
|
||||
"""
|
||||
Return a summary of the geographic and temporal coverage of the dataset.
|
||||
|
||||
Parameters:
|
||||
df: DataFrame with ``lat``, ``lng``, and ``date`` columns.
|
||||
|
||||
Returns:
|
||||
dict with keys:
|
||||
lat_min - float, southernmost latitude
|
||||
lat_max - float, northernmost latitude
|
||||
unique_locs - int, number of distinct (lat, lng) pairs
|
||||
date_min - str, earliest date ("YYYY-MM-DD")
|
||||
date_max - str, latest date ("YYYY-MM-DD")
|
||||
total_records - int
|
||||
"""
|
||||
dates = df["date"].astype(str)
|
||||
return {
|
||||
"lat_min": float(df["lat"].min()),
|
||||
"lat_max": float(df["lat"].max()),
|
||||
"unique_locs": int(len(df.groupby(["lat", "lng"]))),
|
||||
"date_min": str(dates.min()),
|
||||
"date_max": str(dates.max()),
|
||||
"total_records": len(df),
|
||||
}
|
||||
|
||||
|
||||
def plot_angle_distribution(df: pd.DataFrame, angle_col: str):
|
||||
"""
|
||||
Plot a histogram and KDE of the named angle column.
|
||||
|
||||
Requires matplotlib and seaborn. If seaborn is unavailable, falls back
|
||||
to a plain matplotlib histogram.
|
||||
|
||||
Parameters:
|
||||
df: DataFrame containing ``angle_col``.
|
||||
angle_col: Column name, e.g. "fajr_angle" or "isha_angle".
|
||||
|
||||
Returns:
|
||||
matplotlib.figure.Figure
|
||||
"""
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
fig, ax = plt.subplots(figsize=(8, 4))
|
||||
try:
|
||||
import seaborn as sns
|
||||
sns.histplot(df[angle_col].dropna(), kde=True, ax=ax, bins=30)
|
||||
except ImportError:
|
||||
ax.hist(df[angle_col].dropna(), bins=30, edgecolor="black")
|
||||
ax.set_xlabel("Solar depression angle (°)")
|
||||
ax.set_ylabel("Count")
|
||||
ax.set_title(f"{angle_col} distribution (n={len(df[angle_col].dropna())})")
|
||||
fig.tight_layout()
|
||||
return fig
|
||||
|
||||
|
||||
def plot_angle_vs_latitude(df: pd.DataFrame, angle_col: str):
|
||||
"""
|
||||
Scatter plot: solar depression angle vs geographic latitude.
|
||||
|
||||
Shows how the depression angle varies with observer latitude. High-latitude
|
||||
sites tend to have lower effective depression angles due to shallow twilight
|
||||
geometry.
|
||||
|
||||
Parameters:
|
||||
df: DataFrame with ``lat`` and ``angle_col`` columns.
|
||||
angle_col: Column name.
|
||||
|
||||
Returns:
|
||||
matplotlib.figure.Figure
|
||||
"""
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
fig, ax = plt.subplots(figsize=(8, 5))
|
||||
valid = df[[angle_col, "lat"]].dropna()
|
||||
ax.scatter(valid["lat"], valid[angle_col], alpha=0.4, s=15)
|
||||
ax.set_xlabel("Latitude (°)")
|
||||
ax.set_ylabel("Solar depression angle (°)")
|
||||
ax.set_title(f"{angle_col} vs latitude")
|
||||
fig.tight_layout()
|
||||
return fig
|
||||
|
||||
|
||||
def plot_angle_vs_day_of_year(df: pd.DataFrame, angle_col: str):
|
||||
"""
|
||||
Scatter plot: solar depression angle vs day of year.
|
||||
|
||||
Requires a ``day_of_year`` column (add with build_feature_matrix). Shows
|
||||
seasonal variation in the observed depression angle.
|
||||
|
||||
Parameters:
|
||||
df: DataFrame with ``day_of_year`` and ``angle_col`` columns.
|
||||
angle_col: Column name.
|
||||
|
||||
Returns:
|
||||
matplotlib.figure.Figure
|
||||
"""
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
fig, ax = plt.subplots(figsize=(10, 5))
|
||||
valid = df[[angle_col, "day_of_year"]].dropna()
|
||||
ax.scatter(valid["day_of_year"], valid[angle_col], alpha=0.4, s=15)
|
||||
ax.set_xlabel("Day of year")
|
||||
ax.set_ylabel("Solar depression angle (°)")
|
||||
ax.set_title(f"{angle_col} vs day of year")
|
||||
ax.set_xlim(1, 366)
|
||||
fig.tight_layout()
|
||||
return fig
|
||||
|
||||
|
||||
def print_dataset_report(fajr_df: pd.DataFrame, isha_df: pd.DataFrame) -> None:
|
||||
"""
|
||||
Print a concise text report of both datasets to stdout.
|
||||
|
||||
Mirrors the summary output produced by src/pipeline.py main(). Useful for
|
||||
quick sanity checks after pipeline runs.
|
||||
|
||||
Parameters:
|
||||
fajr_df: Fajr angles DataFrame with ``fajr_angle`` column.
|
||||
isha_df: Isha angles DataFrame with ``isha_angle`` column.
|
||||
"""
|
||||
print(f"\nFajr dataset: {len(fajr_df)} records")
|
||||
if len(fajr_df) > 0:
|
||||
print("\nFajr angle stats:")
|
||||
print(angle_summary(fajr_df, "fajr_angle").to_string())
|
||||
cov = geographic_coverage(fajr_df)
|
||||
print("\nFajr geographic coverage:")
|
||||
print(f" Latitude range: {cov['lat_min']:.1f}° to {cov['lat_max']:.1f}°")
|
||||
print(f" Unique locations: {cov['unique_locs']}")
|
||||
print(f" Date range: {cov['date_min']} to {cov['date_max']}")
|
||||
|
||||
print(f"\nIsha dataset: {len(isha_df)} records")
|
||||
if len(isha_df) > 0:
|
||||
print("\nIsha angle stats:")
|
||||
print(angle_summary(isha_df, "isha_angle").to_string())
|
||||
29
src/collect/data/__init__.py
Normal file
29
src/collect/data/__init__.py
Normal file
|
|
@ -0,0 +1,29 @@
|
|||
"""
|
||||
data — sightings data loading and cleaning.
|
||||
|
||||
Modules:
|
||||
sightings_loader — raw VERIFIED_SIGHTINGS list + SightingRecord schema + loader
|
||||
sightings_clean — quality filtering and deduplication utilities
|
||||
"""
|
||||
|
||||
from .sightings_loader import (
|
||||
SightingRecord,
|
||||
VERIFIED_SIGHTINGS,
|
||||
load_verified_sightings,
|
||||
)
|
||||
from .sightings_clean import (
|
||||
filter_non_genuine,
|
||||
deduplicate_sightings,
|
||||
apply_quality_filters,
|
||||
BAD_NOTE_MARKERS,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"SightingRecord",
|
||||
"VERIFIED_SIGHTINGS",
|
||||
"load_verified_sightings",
|
||||
"filter_non_genuine",
|
||||
"deduplicate_sightings",
|
||||
"apply_quality_filters",
|
||||
"BAD_NOTE_MARKERS",
|
||||
]
|
||||
261
src/collect/data/sightings_clean.py
Normal file
261
src/collect/data/sightings_clean.py
Normal file
|
|
@ -0,0 +1,261 @@
|
|||
"""
|
||||
sightings_clean -- quality filtering and deduplication for sightings DataFrames.
|
||||
|
||||
Purpose:
|
||||
Provides functions that remove non-genuine records (computed/aggregate/
|
||||
timetable-sourced), deduplicate cross-source overlaps, and apply geographic
|
||||
and temporal quality filters. These routines were extracted from
|
||||
src/pipeline.py to isolate data-cleaning concerns.
|
||||
|
||||
Input DataFrame schema (expected from load_verified_sightings or equivalent):
|
||||
prayer - str, "fajr" or "isha"
|
||||
date - str or datetime.date, local calendar date
|
||||
utc_dt - datetime (timezone-aware UTC)
|
||||
lat - float, decimal degrees
|
||||
lng - float, decimal degrees
|
||||
elevation_m - float, metres above sea level
|
||||
source - str, citation string
|
||||
notes - str, observer notes
|
||||
angle - float (optional, needed for apply_quality_filters)
|
||||
|
||||
Output DataFrame schema:
|
||||
Same columns as input, with invalid/duplicate rows removed.
|
||||
|
||||
Key functions:
|
||||
filter_non_genuine(df) -- drop records computed from angles, not observed
|
||||
deduplicate_sightings(df) -- drop same prayer+date+location duplicates
|
||||
apply_quality_filters(df) -- drop future dates, polar lats, implausible angles
|
||||
|
||||
SPORT: .opencode/phases/sport/packages.md -- pray-calc-ml row
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from datetime import date as date_type
|
||||
|
||||
import pandas as pd
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Markers that identify non-genuine records.
|
||||
# A record whose notes OR source field contains any of these strings (case-
|
||||
# insensitive) is dropped before angle computation because its time was not
|
||||
# directly observed but inferred from a published mean angle or taken from
|
||||
# a government timetable.
|
||||
# ---------------------------------------------------------------------------
|
||||
BAD_NOTE_MARKERS = (
|
||||
# Time-computation markers
|
||||
"time inferred",
|
||||
"aggregate representative",
|
||||
"seasonal representative",
|
||||
"representative date",
|
||||
"time computed",
|
||||
"back-calculated from",
|
||||
# Government timetable sources
|
||||
"JAKIM official",
|
||||
"Diyanet official",
|
||||
"Diyanet Turkey",
|
||||
"Diyanet research",
|
||||
"Ministry of Awqaf",
|
||||
"Ministry of Habous",
|
||||
"Moroccan Ministry",
|
||||
"Iranian Supreme Court",
|
||||
"Bangladesh Islamic Foundation",
|
||||
"Nigeria Islamic astronomy consensus",
|
||||
"MJC South Africa standard",
|
||||
"Umm al-Qura standard",
|
||||
"Pakistan astronomical estimates",
|
||||
"Dubai Awqaf",
|
||||
"Oman Ministry",
|
||||
"Kerala Islamic Body",
|
||||
"Jordanian Ministry",
|
||||
# Community aggregate sources
|
||||
"AFIC community observations",
|
||||
"Community observations",
|
||||
# Khalid Shaukat seasonal aggregates
|
||||
"Moonsighting.com / Khalid Shaukat",
|
||||
# Per-source exclusions of known synthetic datasets
|
||||
"Hamidi 2007-2008 Isha",
|
||||
"OIF UMSU 2017-2020",
|
||||
"Kassim Bahali 2018, Sains Malaysia 47(11)",
|
||||
"Rashed et al. 2025, NRIAG J., Alexandria",
|
||||
"Bandung/Jombang study 2012",
|
||||
"urban LP mean",
|
||||
"Saksono & Fulazzaky 2020",
|
||||
"Saksono 2020, NRIAG J.",
|
||||
"Hassan et al. 2016, NRIAG J. 5:9-15",
|
||||
"Rashed et al. 2022, IJMET",
|
||||
"Pinem et al. 2024",
|
||||
# Generic note-pattern filters
|
||||
"equinox aggregate",
|
||||
"solstice aggregate",
|
||||
"equinox inferred",
|
||||
"solstice inferred",
|
||||
"rep date",
|
||||
# Wrong Isha criterion (shafaq ahmar is not the dataset criterion)
|
||||
"Shafaq Ahmar",
|
||||
"shafaq ahmar",
|
||||
"red dusk twilight",
|
||||
)
|
||||
|
||||
# Angle plausibility bounds (degrees).
|
||||
# No confirmed genuine sighting falls outside these ranges.
|
||||
FAJR_MIN_DEG = 7.0
|
||||
ISHA_MIN_DEG = 10.0
|
||||
MAX_DEG = 22.0
|
||||
|
||||
# Geographic filter: drop polar latitudes (no meaningful Fajr/Isha).
|
||||
POLAR_LAT_THRESHOLD = 70.0
|
||||
|
||||
|
||||
def filter_non_genuine(df: pd.DataFrame) -> pd.DataFrame:
|
||||
"""
|
||||
Remove records whose time was computed from a known angle rather than
|
||||
directly observed by a human or calibrated instrument.
|
||||
|
||||
Checks both the ``notes`` and ``source`` columns against BAD_NOTE_MARKERS
|
||||
(case-insensitive). Any match in either column causes the row to be dropped.
|
||||
|
||||
Parameters:
|
||||
df: DataFrame with at least ``notes`` and ``source`` columns.
|
||||
|
||||
Returns:
|
||||
Filtered DataFrame with non-genuine rows removed. Index is reset.
|
||||
"""
|
||||
bad_notes = df["notes"].apply(
|
||||
lambda n: any(m.lower() in str(n).lower() for m in BAD_NOTE_MARKERS)
|
||||
)
|
||||
bad_source = df["source"].apply(
|
||||
lambda s: any(m.lower() in str(s).lower() for m in BAD_NOTE_MARKERS)
|
||||
)
|
||||
mask = bad_notes | bad_source
|
||||
n_dropped = mask.sum()
|
||||
if n_dropped > 0:
|
||||
dropped = df[mask]
|
||||
logger.info(
|
||||
"filter_non_genuine: dropping %d non-genuine record(s) "
|
||||
"(inferred/aggregate/timetable-sourced):",
|
||||
n_dropped,
|
||||
)
|
||||
for src, cnt in dropped["source"].value_counts().items():
|
||||
logger.info(" %3d %s", cnt, src)
|
||||
return df[~mask].copy().reset_index(drop=True)
|
||||
|
||||
|
||||
def deduplicate_sightings(df: pd.DataFrame) -> pd.DataFrame:
|
||||
"""
|
||||
Remove duplicate records that refer to the same sighting published in
|
||||
multiple sources.
|
||||
|
||||
Two rows are considered duplicates when they share the same (prayer, date,
|
||||
lat rounded to 3 dp, lng rounded to 3 dp). The first occurrence is kept;
|
||||
later occurrences are dropped. Rounding to 3 decimal places corresponds to
|
||||
~111 m spatial resolution, which is sufficient to identify same-location
|
||||
cross-source overlaps.
|
||||
|
||||
Parameters:
|
||||
df: DataFrame with ``prayer``, ``date``, ``lat``, ``lng`` columns.
|
||||
|
||||
Returns:
|
||||
Deduplicated DataFrame. Index is reset.
|
||||
"""
|
||||
df = df.copy()
|
||||
df["_lat_r"] = df["lat"].round(3)
|
||||
df["_lng_r"] = df["lng"].round(3)
|
||||
dup_mask = df.duplicated(subset=["prayer", "date", "_lat_r", "_lng_r"], keep="first")
|
||||
n_dups = dup_mask.sum()
|
||||
if n_dups > 0:
|
||||
logger.info(
|
||||
"deduplicate_sightings: removing %d cross-source duplicate(s) "
|
||||
"(same prayer+date+location):",
|
||||
n_dups,
|
||||
)
|
||||
for _, row in df[dup_mask].iterrows():
|
||||
logger.info(
|
||||
" %s %s lat=%.3f lng=%.3f -- %s",
|
||||
row["prayer"].upper(),
|
||||
row["date"],
|
||||
row["lat"],
|
||||
row["lng"],
|
||||
row["source"],
|
||||
)
|
||||
df = df[~dup_mask].copy()
|
||||
df = df.drop(columns=["_lat_r", "_lng_r"])
|
||||
return df.reset_index(drop=True)
|
||||
|
||||
|
||||
def apply_quality_filters(df: pd.DataFrame) -> pd.DataFrame:
|
||||
"""
|
||||
Apply geographic, temporal, and angle plausibility filters.
|
||||
|
||||
Removes:
|
||||
- Future-dated records (OpenFajr publishes predictions for the full year;
|
||||
these are not yet-observed, so they must be excluded from the dataset).
|
||||
- Polar latitudes (|lat| > 70°): Fajr and Isha have no well-defined
|
||||
twilight boundary at extreme latitudes.
|
||||
- Null Island coordinates (lat~0, lng~0): GPS default / missing coordinates.
|
||||
- Records with implausible solar depression angles:
|
||||
Fajr: outside [FAJR_MIN_DEG, MAX_DEG]
|
||||
Isha: outside [ISHA_MIN_DEG, MAX_DEG]
|
||||
- Records where ``angle`` is NaN (angle computation failed).
|
||||
|
||||
Parameters:
|
||||
df: DataFrame with ``date``, ``lat``, ``lng``, ``prayer``, and ``angle``
|
||||
columns. ``date`` may be a string ("YYYY-MM-DD") or datetime.date.
|
||||
|
||||
Returns:
|
||||
Filtered DataFrame. Index is reset.
|
||||
"""
|
||||
df = df.copy()
|
||||
today_str = str(date_type.today())
|
||||
|
||||
# Drop future-dated records
|
||||
future = df["date"].astype(str) > today_str
|
||||
if future.any():
|
||||
logger.info(
|
||||
"apply_quality_filters: dropping %d future-dated record(s)", future.sum()
|
||||
)
|
||||
df = df[~future].copy()
|
||||
|
||||
# Drop polar stations
|
||||
polar = df["lat"].abs() > POLAR_LAT_THRESHOLD
|
||||
if polar.any():
|
||||
logger.info(
|
||||
"apply_quality_filters: dropping %d polar record(s) (|lat| > %g°)",
|
||||
polar.sum(),
|
||||
POLAR_LAT_THRESHOLD,
|
||||
)
|
||||
df = df[~polar].copy()
|
||||
|
||||
# Drop Null Island
|
||||
null_island = (df["lat"].abs() < 0.01) & (df["lng"].abs() < 0.01)
|
||||
if null_island.any():
|
||||
logger.info(
|
||||
"apply_quality_filters: dropping %d Null Island record(s)", null_island.sum()
|
||||
)
|
||||
df = df[~null_island].copy()
|
||||
|
||||
# Drop implausible angles (requires "angle" column present)
|
||||
if "angle" in df.columns:
|
||||
fajr_bad = (df["prayer"] == "fajr") & (
|
||||
(df["angle"] < FAJR_MIN_DEG) | (df["angle"] > MAX_DEG)
|
||||
)
|
||||
isha_bad = (df["prayer"] == "isha") & (
|
||||
(df["angle"] < ISHA_MIN_DEG) | (df["angle"] > MAX_DEG)
|
||||
)
|
||||
bad = fajr_bad | isha_bad | df["angle"].isna()
|
||||
if bad.any():
|
||||
logger.info(
|
||||
"apply_quality_filters: dropping %d record(s) with implausible "
|
||||
"angles (Fajr: <%.0f° or >%.0f° / Isha: <%.0f° or >%.0f°)",
|
||||
bad.sum(),
|
||||
FAJR_MIN_DEG,
|
||||
MAX_DEG,
|
||||
ISHA_MIN_DEG,
|
||||
MAX_DEG,
|
||||
)
|
||||
df = df[~bad].copy()
|
||||
|
||||
return df.reset_index(drop=True)
|
||||
129
src/collect/data/sightings_loader.py
Normal file
129
src/collect/data/sightings_loader.py
Normal file
|
|
@ -0,0 +1,129 @@
|
|||
"""
|
||||
sightings_loader -- raw verified sightings data and loader function.
|
||||
|
||||
Purpose:
|
||||
Provides the canonical VERIFIED_SIGHTINGS list (manually compiled
|
||||
human-observed Fajr and Isha sightings), the SightingRecord TypedDict
|
||||
schema, and load_verified_sightings() which converts the list to a
|
||||
timezone-aware DataFrame.
|
||||
|
||||
Input:
|
||||
verified_sightings_data.json in the same directory -- 661 records
|
||||
compiled from primary sources listed below. To add records, append
|
||||
to the JSON file; the structure matches SightingRecord exactly.
|
||||
|
||||
Output DataFrame schema (from load_verified_sightings):
|
||||
date - datetime.date, local calendar date
|
||||
utc_dt - datetime (timezone-aware UTC), computed from date_local +
|
||||
time_local + utc_offset
|
||||
lat - float, decimal degrees (north positive)
|
||||
lng - float, decimal degrees (east positive)
|
||||
elevation_m - float, metres above sea level (0 = unknown, to be looked up)
|
||||
prayer - str, "fajr" or "isha"
|
||||
source - str, citation string
|
||||
notes - str, observer notes
|
||||
|
||||
Key functions:
|
||||
load_verified_sightings() -> pd.DataFrame
|
||||
Converts VERIFIED_SIGHTINGS to a timezone-aware DataFrame.
|
||||
|
||||
Primary sources (see verified_sightings_data.json for per-record citations):
|
||||
[OIF] = OIF UMSU (Observatory, University of Muhammadiyah North Sumatra)
|
||||
Published in NRIAG J. Astronomy & Geophysics; multiple papers
|
||||
[NRIAG] = National Research Institute of Astronomy and Geophysics, Egypt
|
||||
Papers: Hassan et al. 2014, Rashed et al. 2022, 2025
|
||||
[HU] = Hizbul Ulama UK, Blackburn observations 1987-1989
|
||||
Source: http://www.hizbululama.org.uk/files/salat_timing.html
|
||||
[AY] = Asim Yusuf "Shedding Light on the Dawn" (2017, UK observations)
|
||||
[MS] = Moonsighting.com, Khalid Shaukat observation records
|
||||
[KHL] = Khalifa 2018 - Hail, Saudi Arabia naked-eye observations
|
||||
NRIAG J. Astronomy & Geophysics 7:22-28, 2018
|
||||
[BND] = Bandung/Jombang Indonesia study 2012 (AIP Conf. Proc. 1454)
|
||||
[KAS] = Kassim Bahali et al. 2018, Malaysia+Indonesia DSLR study
|
||||
Sains Malaysia 47(11)
|
||||
|
||||
SPORT: .opencode/phases/sport/packages.md -- pray-calc-ml row
|
||||
"""
|
||||
|
||||
import json
|
||||
from datetime import datetime, timezone, timedelta
|
||||
from pathlib import Path
|
||||
from typing import TypedDict
|
||||
|
||||
import pandas as pd
|
||||
|
||||
|
||||
class SightingRecord(TypedDict):
|
||||
prayer: str # "fajr" or "isha"
|
||||
date_local: str # ISO date string, YYYY-MM-DD
|
||||
time_local: str # HH:MM (24h) in local time
|
||||
utc_offset: float # hours offset from UTC
|
||||
lat: float # decimal degrees (north positive)
|
||||
lng: float # decimal degrees (east positive)
|
||||
elevation_m: float # metres above sea level (0 = unknown, will be looked up)
|
||||
source: str # citation
|
||||
notes: str
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Load VERIFIED_SIGHTINGS from the companion JSON file.
|
||||
#
|
||||
# The data lives in verified_sightings_data.json (same directory) to keep
|
||||
# this module under the 1600-line budget while allowing the dataset to grow
|
||||
# beyond 600+ records without bloating Python source files.
|
||||
#
|
||||
# To add new records: append a SightingRecord-shaped dict to the JSON array.
|
||||
# Run `python -m src.pipeline --no-elevation-lookup` after adding records.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_DATA_FILE = Path(__file__).parent / "verified_sightings_data.json"
|
||||
|
||||
|
||||
def _load_data() -> list[SightingRecord]:
|
||||
"""Load VERIFIED_SIGHTINGS from the companion JSON file at import time."""
|
||||
with _DATA_FILE.open(encoding="utf-8") as fh:
|
||||
raw: list[dict] = json.load(fh)
|
||||
# Cast to SightingRecord -- JSON gives us float/str/int natively.
|
||||
# Ensure elevation_m is float (JSON may store integers).
|
||||
for rec in raw:
|
||||
rec["utc_offset"] = float(rec["utc_offset"])
|
||||
rec["lat"] = float(rec["lat"])
|
||||
rec["lng"] = float(rec["lng"])
|
||||
rec["elevation_m"] = float(rec["elevation_m"])
|
||||
return raw # type: ignore[return-value]
|
||||
|
||||
|
||||
# Module-level constant: load once on import.
|
||||
VERIFIED_SIGHTINGS: list[SightingRecord] = _load_data()
|
||||
|
||||
|
||||
def load_verified_sightings() -> pd.DataFrame:
|
||||
"""
|
||||
Return all manually compiled verified sightings as a DataFrame with
|
||||
utc_dt (timezone-aware UTC) computed from date_local + time_local + utc_offset.
|
||||
|
||||
Output columns: date, utc_dt, lat, lng, elevation_m, prayer, source, notes
|
||||
|
||||
Returns:
|
||||
pd.DataFrame with one row per sighting. utc_dt is timezone-aware UTC.
|
||||
"""
|
||||
rows = []
|
||||
for s in VERIFIED_SIGHTINGS:
|
||||
offset = timedelta(hours=s["utc_offset"])
|
||||
local_dt = datetime.strptime(
|
||||
f"{s['date_local']} {s['time_local']}", "%Y-%m-%d %H:%M"
|
||||
)
|
||||
utc_dt = (local_dt - offset).replace(tzinfo=timezone.utc)
|
||||
rows.append(
|
||||
{
|
||||
"date": local_dt.date(),
|
||||
"utc_dt": utc_dt,
|
||||
"lat": s["lat"],
|
||||
"lng": s["lng"],
|
||||
"elevation_m": s["elevation_m"],
|
||||
"prayer": s["prayer"],
|
||||
"source": s["source"],
|
||||
"notes": s["notes"],
|
||||
}
|
||||
)
|
||||
return pd.DataFrame(rows)
|
||||
7273
src/collect/data/verified_sightings_data.json
Normal file
7273
src/collect/data/verified_sightings_data.json
Normal file
File diff suppressed because it is too large
Load diff
20
src/collect/models/__init__.py
Normal file
20
src/collect/models/__init__.py
Normal file
|
|
@ -0,0 +1,20 @@
|
|||
"""
|
||||
models -- feature engineering for DPC angle prediction.
|
||||
|
||||
Modules:
|
||||
sightings_features -- derive model input features from sightings DataFrames
|
||||
"""
|
||||
|
||||
from .sightings_features import (
|
||||
add_day_of_year,
|
||||
add_seasonal_features,
|
||||
build_feature_matrix,
|
||||
FEATURE_COLUMNS,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"add_day_of_year",
|
||||
"add_seasonal_features",
|
||||
"build_feature_matrix",
|
||||
"FEATURE_COLUMNS",
|
||||
]
|
||||
132
src/collect/models/sightings_features.py
Normal file
132
src/collect/models/sightings_features.py
Normal file
|
|
@ -0,0 +1,132 @@
|
|||
"""
|
||||
sightings_features -- feature engineering for DPC solar depression angle prediction.
|
||||
|
||||
Purpose:
|
||||
Derives model-ready input features from a sightings DataFrame. The target
|
||||
variable is the solar depression angle (fajr_angle or isha_angle). These
|
||||
features inform the pray-calc Dynamic Prayer Calculation (DPC) formula and
|
||||
any downstream ML model that replaces or calibrates it.
|
||||
|
||||
Input DataFrame schema:
|
||||
utc_dt - datetime (timezone-aware UTC)
|
||||
lat - float, decimal degrees (north positive)
|
||||
lng - float, decimal degrees (east positive)
|
||||
elevation_m - float, metres above sea level
|
||||
|
||||
Output DataFrame schema (from build_feature_matrix):
|
||||
All input columns PLUS the columns listed in FEATURE_COLUMNS:
|
||||
day_of_year - int, 1-366, derived from utc_dt (seasonality proxy)
|
||||
lat_rad - float, latitude in radians
|
||||
sin_doy - float, sin(2*pi*day_of_year/365.25), circular seasonal encoding
|
||||
cos_doy - float, cos(2*pi*day_of_year/365.25), circular seasonal encoding
|
||||
lat_sin_doy - float, interaction: lat * sin_doy
|
||||
lat_cos_doy - float, interaction: lat * cos_doy
|
||||
|
||||
Key functions:
|
||||
add_day_of_year(df) -> pd.DataFrame
|
||||
Adds day_of_year column from utc_dt.
|
||||
add_seasonal_features(df) -> pd.DataFrame
|
||||
Adds circular sin/cos encodings and lat interactions.
|
||||
build_feature_matrix(df) -> pd.DataFrame
|
||||
Applies both transforms; returns df with all FEATURE_COLUMNS present.
|
||||
|
||||
SPORT: .opencode/phases/sport/packages.md -- pray-calc-ml row
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import math
|
||||
|
||||
import pandas as pd
|
||||
|
||||
# The complete set of feature columns added by build_feature_matrix.
|
||||
# ML models consuming these features should select from FEATURE_COLUMNS plus
|
||||
# lat, lng, elevation_m as their X matrix.
|
||||
FEATURE_COLUMNS = [
|
||||
"day_of_year",
|
||||
"lat_rad",
|
||||
"sin_doy",
|
||||
"cos_doy",
|
||||
"lat_sin_doy",
|
||||
"lat_cos_doy",
|
||||
]
|
||||
|
||||
_TWO_PI = 2.0 * math.pi
|
||||
_DAYS_PER_YEAR = 365.25
|
||||
|
||||
|
||||
def add_day_of_year(df: pd.DataFrame) -> pd.DataFrame:
|
||||
"""
|
||||
Add ``day_of_year`` (1-366) derived from the ``utc_dt`` column.
|
||||
|
||||
``day_of_year`` is the primary seasonality feature for the DPC angle
|
||||
prediction: the solar depression angle at Fajr/Isha varies systematically
|
||||
across the year at any given latitude.
|
||||
|
||||
Parameters:
|
||||
df: DataFrame with a timezone-aware ``utc_dt`` column.
|
||||
|
||||
Returns:
|
||||
Copy of df with ``day_of_year`` (int) added.
|
||||
"""
|
||||
df = df.copy()
|
||||
df["day_of_year"] = df["utc_dt"].apply(lambda dt: dt.timetuple().tm_yday)
|
||||
return df
|
||||
|
||||
|
||||
def add_seasonal_features(df: pd.DataFrame) -> pd.DataFrame:
|
||||
"""
|
||||
Add circular seasonal encodings and latitude interaction terms.
|
||||
|
||||
Circular encoding avoids the discontinuity at day 1 / day 365 that a raw
|
||||
``day_of_year`` integer would introduce for models that treat features as
|
||||
continuous. The interaction terms capture the fact that seasonal variation
|
||||
in the depression angle is stronger at higher latitudes.
|
||||
|
||||
Requires ``day_of_year`` and ``lat`` columns (call add_day_of_year first
|
||||
if ``day_of_year`` is not already present).
|
||||
|
||||
New columns added:
|
||||
lat_rad - latitude converted to radians
|
||||
sin_doy - sin(2*pi*day_of_year/365.25)
|
||||
cos_doy - cos(2*pi*day_of_year/365.25)
|
||||
lat_sin_doy - lat * sin_doy
|
||||
lat_cos_doy - lat * cos_doy
|
||||
|
||||
Parameters:
|
||||
df: DataFrame with ``day_of_year`` (int) and ``lat`` (float) columns.
|
||||
|
||||
Returns:
|
||||
Copy of df with the five new feature columns added.
|
||||
"""
|
||||
df = df.copy()
|
||||
doy = df["day_of_year"]
|
||||
lat = df["lat"]
|
||||
|
||||
df["lat_rad"] = lat * (math.pi / 180.0)
|
||||
df["sin_doy"] = (doy * _TWO_PI / _DAYS_PER_YEAR).apply(math.sin)
|
||||
df["cos_doy"] = (doy * _TWO_PI / _DAYS_PER_YEAR).apply(math.cos)
|
||||
df["lat_sin_doy"] = lat * df["sin_doy"]
|
||||
df["lat_cos_doy"] = lat * df["cos_doy"]
|
||||
return df
|
||||
|
||||
|
||||
def build_feature_matrix(df: pd.DataFrame) -> pd.DataFrame:
|
||||
"""
|
||||
Apply all feature engineering transforms and return the complete feature
|
||||
matrix.
|
||||
|
||||
Convenience wrapper that calls add_day_of_year then add_seasonal_features.
|
||||
After this call, df contains all columns in FEATURE_COLUMNS.
|
||||
|
||||
Parameters:
|
||||
df: DataFrame with ``utc_dt`` (timezone-aware datetime) and ``lat``
|
||||
(float) columns.
|
||||
|
||||
Returns:
|
||||
Copy of df with all FEATURE_COLUMNS added. Original columns are
|
||||
preserved unchanged.
|
||||
"""
|
||||
df = add_day_of_year(df)
|
||||
df = add_seasonal_features(df)
|
||||
return df
|
||||
File diff suppressed because it is too large
Load diff
202
src/evaluate.py
Normal file
202
src/evaluate.py
Normal file
|
|
@ -0,0 +1,202 @@
|
|||
"""
|
||||
Train and evaluate regression models for solar depression angle prediction.
|
||||
|
||||
Inputs: data/processed/fajr_angles.csv, data/processed/isha_angles.csv
|
||||
Outputs: precision/recall/MAE structured report to stdout
|
||||
|
||||
Uses a time-aware 80/20 train/test split (cutoff 2023-01-01) to avoid
|
||||
look-ahead bias. Evaluates two models: Ridge Regression (baseline) and
|
||||
Gradient Boosting Regressor (recommended). See docs/ml-training-plan.md
|
||||
for feature engineering rationale and architecture decisions.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import math
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from sklearn.ensemble import GradientBoostingRegressor
|
||||
from sklearn.linear_model import Ridge
|
||||
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
|
||||
from sklearn.preprocessing import StandardScaler
|
||||
|
||||
|
||||
# ── Path resolution ────────────────────────────────────────────────────────────
|
||||
|
||||
def _repo_root() -> Path:
|
||||
"""Walk up from this file until we find the data/processed/ directory."""
|
||||
candidate = Path(__file__).resolve().parent
|
||||
for _ in range(5):
|
||||
if (candidate / "data" / "processed").is_dir():
|
||||
return candidate
|
||||
candidate = candidate.parent
|
||||
raise FileNotFoundError(
|
||||
"Cannot locate data/processed/ directory. "
|
||||
"Run this script from within the pray-calc-ml repository."
|
||||
)
|
||||
|
||||
|
||||
# ── Feature construction ───────────────────────────────────────────────────────
|
||||
|
||||
FEATURES = ["lat", "lng", "elevation_m", "day_of_year", "sin_doy", "cos_doy", "abs_lat"]
|
||||
|
||||
TRAIN_CUTOFF = "2023-01-01" # time-aware split boundary
|
||||
|
||||
|
||||
def build_features(df: pd.DataFrame) -> pd.DataFrame:
|
||||
"""
|
||||
Add cyclic day-of-year encoding and absolute latitude.
|
||||
|
||||
Cyclic sin/cos encoding prevents the year-boundary discontinuity
|
||||
(day 365 → day 1) that a raw integer would introduce.
|
||||
abs_lat captures the symmetric effect of polar proximity without
|
||||
forcing the model to learn it from signed latitude.
|
||||
"""
|
||||
df = df.copy()
|
||||
radians = 2 * math.pi * df["day_of_year"] / 365
|
||||
df["sin_doy"] = np.sin(radians)
|
||||
df["cos_doy"] = np.cos(radians)
|
||||
df["abs_lat"] = df["lat"].abs()
|
||||
return df
|
||||
|
||||
|
||||
def split_data(
|
||||
df: pd.DataFrame,
|
||||
target_col: str,
|
||||
) -> tuple[pd.DataFrame, pd.Series, pd.DataFrame, pd.Series]:
|
||||
"""
|
||||
Time-aware train/test split on the TRAIN_CUTOFF boundary.
|
||||
|
||||
Returns (X_train, y_train, X_test, y_test).
|
||||
A random shuffle would allow future observations to inform past predictions,
|
||||
which is unrealistic for deployment. Temporal split avoids this.
|
||||
"""
|
||||
mask_train = df["date"] < TRAIN_CUTOFF
|
||||
train = df[mask_train]
|
||||
test = df[~mask_train]
|
||||
|
||||
if train.empty or test.empty:
|
||||
raise ValueError(
|
||||
f"Train/test split produced an empty set "
|
||||
f"(train={len(train)}, test={len(test)}). "
|
||||
f"Check TRAIN_CUTOFF={TRAIN_CUTOFF!r} against the dataset date range."
|
||||
)
|
||||
|
||||
X_train = train[FEATURES]
|
||||
y_train = train[target_col]
|
||||
X_test = test[FEATURES]
|
||||
y_test = test[target_col]
|
||||
return X_train, y_train, X_test, y_test
|
||||
|
||||
|
||||
# ── Evaluation helpers ─────────────────────────────────────────────────────────
|
||||
|
||||
def precision_at_threshold(y_true: np.ndarray, y_pred: np.ndarray, thr: float = 0.5) -> float:
|
||||
"""Fraction of predictions within `thr` degrees of the true value."""
|
||||
return float(np.mean(np.abs(y_pred - y_true) < thr))
|
||||
|
||||
|
||||
def evaluate_model(
|
||||
model,
|
||||
X_train: pd.DataFrame,
|
||||
y_train: pd.Series,
|
||||
X_test: pd.DataFrame,
|
||||
y_test: pd.Series,
|
||||
scaler: StandardScaler,
|
||||
) -> dict:
|
||||
"""
|
||||
Fit `model` on scaled training data, score on scaled test data.
|
||||
|
||||
Returns a dict with MAE, RMSE, R², and precision at 0.5°.
|
||||
The scaler is fit on training data only; test data is transformed, not fit.
|
||||
"""
|
||||
X_train_scaled = scaler.transform(X_train)
|
||||
X_test_scaled = scaler.transform(X_test)
|
||||
|
||||
model.fit(X_train_scaled, y_train)
|
||||
y_pred = model.predict(X_test_scaled)
|
||||
|
||||
mae = mean_absolute_error(y_test, y_pred)
|
||||
rmse = math.sqrt(mean_squared_error(y_test, y_pred))
|
||||
r2 = r2_score(y_test, y_pred)
|
||||
prec = precision_at_threshold(y_test.to_numpy(), y_pred, thr=0.5)
|
||||
|
||||
return {"mae": mae, "rmse": rmse, "r2": r2, "prec_0_5": prec}
|
||||
|
||||
|
||||
# ── Reporting ──────────────────────────────────────────────────────────────────
|
||||
|
||||
def print_results(prayer: str, results: dict[str, dict]) -> None:
|
||||
"""Print a structured results table to stdout."""
|
||||
header = f"=== {prayer} angle prediction ==="
|
||||
print(header)
|
||||
print(f"{'Model':<22} {'MAE':>6} {'RMSE':>6} {'R²':>6} {'Prec@0.5°':>10}")
|
||||
print("-" * 55)
|
||||
for model_name, m in results.items():
|
||||
print(
|
||||
f"{model_name:<22} "
|
||||
f"{m['mae']:>6.2f} "
|
||||
f"{m['rmse']:>6.2f} "
|
||||
f"{m['r2']:>6.2f} "
|
||||
f"{m['prec_0_5']:>9.1%}"
|
||||
)
|
||||
print()
|
||||
|
||||
|
||||
# ── Main pipeline ──────────────────────────────────────────────────────────────
|
||||
|
||||
def run_pipeline(csv_path: Path, target_col: str, prayer_label: str) -> None:
|
||||
"""
|
||||
Full train/evaluate loop for one prayer dataset.
|
||||
|
||||
Loads the CSV, builds features, splits on the time boundary, scales,
|
||||
then trains and scores both Ridge and GradientBoosting models.
|
||||
"""
|
||||
df = pd.read_csv(csv_path, dtype={"date": str})
|
||||
df = build_features(df)
|
||||
|
||||
X_train, y_train, X_test, y_test = split_data(df, target_col)
|
||||
|
||||
# Fit scaler on training data only — no data leakage from test set
|
||||
scaler = StandardScaler()
|
||||
scaler.fit(X_train[FEATURES])
|
||||
|
||||
models = {
|
||||
"Ridge": Ridge(alpha=1.0),
|
||||
"GradientBoosting": GradientBoostingRegressor(
|
||||
n_estimators=200,
|
||||
learning_rate=0.05,
|
||||
max_depth=4,
|
||||
subsample=0.8,
|
||||
random_state=42,
|
||||
),
|
||||
}
|
||||
|
||||
results = {}
|
||||
for name, model in models.items():
|
||||
results[name] = evaluate_model(model, X_train, y_train, X_test, y_test, scaler)
|
||||
|
||||
print_results(prayer_label, results)
|
||||
|
||||
|
||||
def main() -> None:
|
||||
root = _repo_root()
|
||||
processed = root / "data" / "processed"
|
||||
|
||||
fajr_csv = processed / "fajr_angles.csv"
|
||||
isha_csv = processed / "isha_angles.csv"
|
||||
|
||||
for path in (fajr_csv, isha_csv):
|
||||
if not path.exists():
|
||||
print(f"ERROR: dataset not found: {path}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
run_pipeline(fajr_csv, target_col="fajr_angle", prayer_label="Fajr")
|
||||
run_pipeline(isha_csv, target_col="isha_angle", prayer_label="Isha")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
1
tests/__init__.py
Normal file
1
tests/__init__.py
Normal file
|
|
@ -0,0 +1 @@
|
|||
# tests package
|
||||
223
tests/test_sightings_clean.py
Normal file
223
tests/test_sightings_clean.py
Normal file
|
|
@ -0,0 +1,223 @@
|
|||
"""
|
||||
Tests for src/collect/data/sightings_clean.py
|
||||
|
||||
Verifies filter_non_genuine, deduplicate_sightings, and apply_quality_filters
|
||||
against known bad/good records and edge cases.
|
||||
"""
|
||||
|
||||
import sys
|
||||
sys.path.insert(0, str(__import__("pathlib").Path(__file__).parent.parent))
|
||||
|
||||
import pandas as pd
|
||||
import pytest
|
||||
from datetime import datetime, timezone, timedelta
|
||||
|
||||
from src.collect.data.sightings_clean import (
|
||||
filter_non_genuine,
|
||||
deduplicate_sightings,
|
||||
apply_quality_filters,
|
||||
BAD_NOTE_MARKERS,
|
||||
FAJR_MIN_DEG,
|
||||
ISHA_MIN_DEG,
|
||||
MAX_DEG,
|
||||
)
|
||||
|
||||
|
||||
def _make_row(prayer="fajr", date="2023-03-20", lat=40.0, lng=29.0,
|
||||
elevation_m=100.0, source="Test source", notes="naked eye"):
|
||||
"""Build a minimal valid sighting row dict."""
|
||||
utc_dt = datetime(2023, 3, 20, 4, 0, 0, tzinfo=timezone.utc)
|
||||
return {
|
||||
"prayer": prayer,
|
||||
"date": date,
|
||||
"utc_dt": utc_dt,
|
||||
"lat": lat,
|
||||
"lng": lng,
|
||||
"elevation_m": elevation_m,
|
||||
"source": source,
|
||||
"notes": notes,
|
||||
}
|
||||
|
||||
|
||||
class TestFilterNonGenuine:
|
||||
def test_genuine_record_is_kept(self):
|
||||
"""A record with no bad markers should pass through unchanged."""
|
||||
df = pd.DataFrame([_make_row(notes="naked eye; per-night obs")])
|
||||
result = filter_non_genuine(df)
|
||||
assert len(result) == 1
|
||||
|
||||
def test_bad_notes_marker_drops_row(self):
|
||||
"""A record with 'time inferred' in notes must be dropped."""
|
||||
df = pd.DataFrame([
|
||||
_make_row(notes="time inferred from D0=14.5°"),
|
||||
_make_row(notes="naked eye observation"),
|
||||
])
|
||||
result = filter_non_genuine(df)
|
||||
assert len(result) == 1
|
||||
assert "naked eye observation" in result.iloc[0]["notes"]
|
||||
|
||||
def test_bad_source_marker_drops_row(self):
|
||||
"""A record matching a bad marker in source must be dropped."""
|
||||
df = pd.DataFrame([
|
||||
_make_row(source="Umm al-Qura standard 18° Fajr, Makkah"),
|
||||
_make_row(source="Genuine researcher 2023"),
|
||||
])
|
||||
result = filter_non_genuine(df)
|
||||
assert len(result) == 1
|
||||
|
||||
def test_shafaq_ahmar_dropped(self):
|
||||
"""Records with Shafaq Ahmar criterion must be dropped."""
|
||||
df = pd.DataFrame([
|
||||
_make_row(notes="Shafaq Ahmar (red dusk) observed"),
|
||||
_make_row(notes="Shafaq al-Abyad (white dusk)"),
|
||||
])
|
||||
result = filter_non_genuine(df)
|
||||
assert len(result) == 1
|
||||
assert "al-Abyad" in result.iloc[0]["notes"]
|
||||
|
||||
def test_case_insensitive_match(self):
|
||||
"""Bad marker matching is case-insensitive."""
|
||||
df = pd.DataFrame([
|
||||
_make_row(notes="AGGREGATE REPRESENTATIVE date; 4 records"),
|
||||
])
|
||||
result = filter_non_genuine(df)
|
||||
assert len(result) == 0
|
||||
|
||||
def test_empty_dataframe(self):
|
||||
"""Empty DataFrame returns empty DataFrame without error."""
|
||||
df = pd.DataFrame(columns=["prayer", "date", "lat", "lng", "notes", "source"])
|
||||
result = filter_non_genuine(df)
|
||||
assert len(result) == 0
|
||||
|
||||
def test_bad_note_markers_is_nonempty(self):
|
||||
"""BAD_NOTE_MARKERS must have at least 10 entries."""
|
||||
assert len(BAD_NOTE_MARKERS) >= 10
|
||||
|
||||
|
||||
class TestDeduplicateSightings:
|
||||
def test_no_duplicates_unchanged(self):
|
||||
"""DataFrame with no duplicates is returned unchanged."""
|
||||
rows = [
|
||||
_make_row(lat=40.0, lng=29.0, date="2023-01-01"),
|
||||
_make_row(lat=51.5, lng=-0.1, date="2023-01-01"),
|
||||
]
|
||||
df = pd.DataFrame(rows)
|
||||
result = deduplicate_sightings(df)
|
||||
assert len(result) == 2
|
||||
|
||||
def test_exact_duplicate_removed(self):
|
||||
"""Two identical (prayer, date, lat, lng) rows: keep first, drop second."""
|
||||
rows = [
|
||||
_make_row(lat=40.0, lng=29.0, date="2023-06-15", source="Source A"),
|
||||
_make_row(lat=40.0, lng=29.0, date="2023-06-15", source="Source B"),
|
||||
]
|
||||
df = pd.DataFrame(rows)
|
||||
result = deduplicate_sightings(df)
|
||||
assert len(result) == 1
|
||||
assert result.iloc[0]["source"] == "Source A"
|
||||
|
||||
def test_near_duplicate_within_111m_removed(self):
|
||||
"""Two records rounded to same 3dp lat/lng are duplicates."""
|
||||
rows = [
|
||||
_make_row(lat=40.001, lng=29.001, date="2023-06-15", source="Source A"),
|
||||
_make_row(lat=40.0009, lng=29.0009, date="2023-06-15", source="Source B"),
|
||||
]
|
||||
df = pd.DataFrame(rows)
|
||||
result = deduplicate_sightings(df)
|
||||
assert len(result) == 1
|
||||
|
||||
def test_same_location_different_prayer_both_kept(self):
|
||||
"""Same lat/lng but different prayer type: both kept."""
|
||||
rows = [
|
||||
_make_row(prayer="fajr", lat=40.0, lng=29.0, date="2023-06-15"),
|
||||
_make_row(prayer="isha", lat=40.0, lng=29.0, date="2023-06-15"),
|
||||
]
|
||||
df = pd.DataFrame(rows)
|
||||
result = deduplicate_sightings(df)
|
||||
assert len(result) == 2
|
||||
|
||||
def test_same_location_different_date_both_kept(self):
|
||||
"""Same lat/lng and prayer but different date: both kept."""
|
||||
rows = [
|
||||
_make_row(lat=40.0, lng=29.0, date="2023-06-15"),
|
||||
_make_row(lat=40.0, lng=29.0, date="2023-06-16"),
|
||||
]
|
||||
df = pd.DataFrame(rows)
|
||||
result = deduplicate_sightings(df)
|
||||
assert len(result) == 2
|
||||
|
||||
def test_no_lat_lng_temp_columns_in_output(self):
|
||||
"""Deduplication must not leave _lat_r or _lng_r temp columns in output."""
|
||||
df = pd.DataFrame([_make_row()])
|
||||
result = deduplicate_sightings(df)
|
||||
assert "_lat_r" not in result.columns
|
||||
assert "_lng_r" not in result.columns
|
||||
|
||||
|
||||
class TestApplyQualityFilters:
|
||||
def _df_with_angle(self, prayer="fajr", angle=14.0, lat=40.0,
|
||||
date="2023-01-15", lng=29.0):
|
||||
utc_dt = datetime(2023, 1, 15, 4, 0, tzinfo=timezone.utc)
|
||||
return pd.DataFrame([{
|
||||
"prayer": prayer, "date": date, "utc_dt": utc_dt,
|
||||
"lat": lat, "lng": lng, "elevation_m": 100.0,
|
||||
"angle": angle, "source": "test", "notes": "test",
|
||||
}])
|
||||
|
||||
def test_valid_fajr_angle_kept(self):
|
||||
df = self._df_with_angle(prayer="fajr", angle=14.0)
|
||||
assert len(apply_quality_filters(df)) == 1
|
||||
|
||||
def test_valid_isha_angle_kept(self):
|
||||
df = self._df_with_angle(prayer="isha", angle=17.5)
|
||||
assert len(apply_quality_filters(df)) == 1
|
||||
|
||||
def test_fajr_below_min_dropped(self):
|
||||
"""Fajr angle below FAJR_MIN_DEG is dropped."""
|
||||
df = self._df_with_angle(prayer="fajr", angle=FAJR_MIN_DEG - 0.1)
|
||||
assert len(apply_quality_filters(df)) == 0
|
||||
|
||||
def test_isha_below_min_dropped(self):
|
||||
"""Isha angle below ISHA_MIN_DEG is dropped."""
|
||||
df = self._df_with_angle(prayer="isha", angle=ISHA_MIN_DEG - 0.1)
|
||||
assert len(apply_quality_filters(df)) == 0
|
||||
|
||||
def test_angle_above_max_dropped(self):
|
||||
"""Angle above MAX_DEG is dropped for both prayers."""
|
||||
df_fajr = self._df_with_angle(prayer="fajr", angle=MAX_DEG + 0.1)
|
||||
df_isha = self._df_with_angle(prayer="isha", angle=MAX_DEG + 0.1)
|
||||
assert len(apply_quality_filters(df_fajr)) == 0
|
||||
assert len(apply_quality_filters(df_isha)) == 0
|
||||
|
||||
def test_polar_lat_dropped(self):
|
||||
"""Records with |lat| > 70 are dropped."""
|
||||
df = self._df_with_angle(lat=71.0)
|
||||
assert len(apply_quality_filters(df)) == 0
|
||||
|
||||
def test_null_island_dropped(self):
|
||||
"""Records at lat~0, lng~0 are dropped."""
|
||||
utc_dt = datetime(2023, 1, 15, 4, 0, tzinfo=timezone.utc)
|
||||
df = pd.DataFrame([{
|
||||
"prayer": "fajr", "date": "2023-01-15", "utc_dt": utc_dt,
|
||||
"lat": 0.0, "lng": 0.0, "elevation_m": 0.0,
|
||||
"angle": 14.0, "source": "test", "notes": "test",
|
||||
}])
|
||||
assert len(apply_quality_filters(df)) == 0
|
||||
|
||||
def test_future_date_dropped(self):
|
||||
"""Records with dates in the future are dropped."""
|
||||
from datetime import date
|
||||
future = str(date(date.today().year + 1, 1, 1))
|
||||
utc_dt = datetime(date.today().year + 1, 1, 1, 4, 0, tzinfo=timezone.utc)
|
||||
df = pd.DataFrame([{
|
||||
"prayer": "fajr", "date": future, "utc_dt": utc_dt,
|
||||
"lat": 40.0, "lng": 29.0, "elevation_m": 100.0,
|
||||
"angle": 14.0, "source": "test", "notes": "test",
|
||||
}])
|
||||
assert len(apply_quality_filters(df)) == 0
|
||||
|
||||
def test_nan_angle_dropped(self):
|
||||
"""Records with NaN angle are dropped."""
|
||||
import math
|
||||
df = self._df_with_angle(angle=float("nan"))
|
||||
assert len(apply_quality_filters(df)) == 0
|
||||
141
tests/test_sightings_features.py
Normal file
141
tests/test_sightings_features.py
Normal file
|
|
@ -0,0 +1,141 @@
|
|||
"""
|
||||
Tests for src/collect/models/sightings_features.py
|
||||
|
||||
Verifies add_day_of_year, add_seasonal_features, and build_feature_matrix
|
||||
produce correct outputs with the expected shapes and value ranges.
|
||||
"""
|
||||
|
||||
import sys
|
||||
sys.path.insert(0, str(__import__("pathlib").Path(__file__).parent.parent))
|
||||
|
||||
import math
|
||||
import pytest
|
||||
import pandas as pd
|
||||
from datetime import datetime, timezone
|
||||
|
||||
from src.collect.models.sightings_features import (
|
||||
add_day_of_year,
|
||||
add_seasonal_features,
|
||||
build_feature_matrix,
|
||||
FEATURE_COLUMNS,
|
||||
)
|
||||
|
||||
|
||||
def _make_df(utc_dts, lats):
|
||||
"""Build a minimal DataFrame for feature testing."""
|
||||
rows = []
|
||||
for dt, lat in zip(utc_dts, lats):
|
||||
rows.append({
|
||||
"utc_dt": dt,
|
||||
"lat": lat,
|
||||
"lng": 30.0,
|
||||
"elevation_m": 100.0,
|
||||
})
|
||||
return pd.DataFrame(rows)
|
||||
|
||||
|
||||
class TestAddDayOfYear:
|
||||
def test_march_equinox(self):
|
||||
"""March 20 should be day 79 (non-leap year)."""
|
||||
df = _make_df([datetime(2023, 3, 20, 4, 0, tzinfo=timezone.utc)], [40.0])
|
||||
result = add_day_of_year(df)
|
||||
assert "day_of_year" in result.columns
|
||||
assert result.iloc[0]["day_of_year"] == 79
|
||||
|
||||
def test_jan1_is_day_1(self):
|
||||
"""January 1 must be day 1."""
|
||||
df = _make_df([datetime(2023, 1, 1, 0, 0, tzinfo=timezone.utc)], [0.0])
|
||||
result = add_day_of_year(df)
|
||||
assert result.iloc[0]["day_of_year"] == 1
|
||||
|
||||
def test_dec31_is_day_365_nonleap(self):
|
||||
"""Dec 31 in a non-leap year must be day 365."""
|
||||
df = _make_df([datetime(2023, 12, 31, 0, 0, tzinfo=timezone.utc)], [0.0])
|
||||
result = add_day_of_year(df)
|
||||
assert result.iloc[0]["day_of_year"] == 365
|
||||
|
||||
def test_dec31_is_day_366_leap(self):
|
||||
"""Dec 31 in a leap year must be day 366."""
|
||||
df = _make_df([datetime(2024, 12, 31, 0, 0, tzinfo=timezone.utc)], [0.0])
|
||||
result = add_day_of_year(df)
|
||||
assert result.iloc[0]["day_of_year"] == 366
|
||||
|
||||
def test_original_columns_preserved(self):
|
||||
"""add_day_of_year must not remove any existing columns."""
|
||||
df = _make_df([datetime(2023, 6, 21, 0, 0, tzinfo=timezone.utc)], [50.0])
|
||||
result = add_day_of_year(df)
|
||||
for col in df.columns:
|
||||
assert col in result.columns
|
||||
|
||||
|
||||
class TestAddSeasonalFeatures:
|
||||
def setup_method(self):
|
||||
dt = datetime(2023, 6, 21, 0, 0, tzinfo=timezone.utc)
|
||||
self.df = add_day_of_year(_make_df([dt], [45.0]))
|
||||
|
||||
def test_all_feature_columns_added(self):
|
||||
"""All five new seasonal feature columns must be present."""
|
||||
result = add_seasonal_features(self.df)
|
||||
for col in ("lat_rad", "sin_doy", "cos_doy", "lat_sin_doy", "lat_cos_doy"):
|
||||
assert col in result.columns, f"Missing: {col}"
|
||||
|
||||
def test_sin_cos_unit_circle(self):
|
||||
"""sin_doy^2 + cos_doy^2 must equal 1.0 (unit circle)."""
|
||||
result = add_seasonal_features(self.df)
|
||||
for _, row in result.iterrows():
|
||||
magnitude = math.sqrt(row["sin_doy"] ** 2 + row["cos_doy"] ** 2)
|
||||
assert abs(magnitude - 1.0) < 1e-9, f"Not on unit circle: {magnitude}"
|
||||
|
||||
def test_lat_rad_conversion(self):
|
||||
"""lat_rad must equal lat * pi/180."""
|
||||
df = add_day_of_year(_make_df(
|
||||
[datetime(2023, 3, 20, 0, 0, tzinfo=timezone.utc)], [45.0]
|
||||
))
|
||||
result = add_seasonal_features(df)
|
||||
expected = 45.0 * (math.pi / 180.0)
|
||||
assert abs(result.iloc[0]["lat_rad"] - expected) < 1e-10
|
||||
|
||||
def test_latitude_zero_gives_zero_interactions(self):
|
||||
"""At lat=0, lat_sin_doy and lat_cos_doy must be 0."""
|
||||
df = add_day_of_year(_make_df(
|
||||
[datetime(2023, 6, 21, 0, 0, tzinfo=timezone.utc)], [0.0]
|
||||
))
|
||||
result = add_seasonal_features(df)
|
||||
assert result.iloc[0]["lat_sin_doy"] == 0.0
|
||||
assert result.iloc[0]["lat_cos_doy"] == 0.0
|
||||
|
||||
|
||||
class TestBuildFeatureMatrix:
|
||||
def test_all_feature_columns_present(self):
|
||||
"""build_feature_matrix must add all FEATURE_COLUMNS."""
|
||||
dt = datetime(2023, 9, 22, 0, 0, tzinfo=timezone.utc)
|
||||
df = _make_df([dt], [30.0])
|
||||
result = build_feature_matrix(df)
|
||||
for col in FEATURE_COLUMNS:
|
||||
assert col in result.columns, f"Missing feature: {col}"
|
||||
|
||||
def test_original_columns_preserved(self):
|
||||
"""build_feature_matrix must not remove original columns."""
|
||||
dt = datetime(2023, 9, 22, 0, 0, tzinfo=timezone.utc)
|
||||
df = _make_df([dt], [30.0])
|
||||
result = build_feature_matrix(df)
|
||||
for col in ("utc_dt", "lat", "lng", "elevation_m"):
|
||||
assert col in result.columns
|
||||
|
||||
def test_multi_row_input(self):
|
||||
"""build_feature_matrix must handle multiple rows correctly."""
|
||||
dts = [
|
||||
datetime(2023, 1, 1, 0, 0, tzinfo=timezone.utc),
|
||||
datetime(2023, 6, 21, 0, 0, tzinfo=timezone.utc),
|
||||
datetime(2023, 12, 31, 0, 0, tzinfo=timezone.utc),
|
||||
]
|
||||
lats = [10.0, 40.0, -30.0]
|
||||
df = _make_df(dts, lats)
|
||||
result = build_feature_matrix(df)
|
||||
assert len(result) == 3
|
||||
assert result.iloc[0]["day_of_year"] == 1
|
||||
assert result.iloc[2]["day_of_year"] == 365
|
||||
|
||||
def test_feature_columns_constant(self):
|
||||
"""FEATURE_COLUMNS list must have exactly 6 elements."""
|
||||
assert len(FEATURE_COLUMNS) == 6
|
||||
116
tests/test_sightings_loader.py
Normal file
116
tests/test_sightings_loader.py
Normal file
|
|
@ -0,0 +1,116 @@
|
|||
"""
|
||||
Tests for src/collect/data/sightings_loader.py
|
||||
|
||||
Verifies that VERIFIED_SIGHTINGS has the expected structure, SightingRecord
|
||||
TypedDict fields are consistent, and load_verified_sightings() returns a
|
||||
correctly shaped DataFrame with UTC-aware timestamps.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from datetime import timezone
|
||||
|
||||
import sys
|
||||
sys.path.insert(0, str(__import__("pathlib").Path(__file__).parent.parent))
|
||||
|
||||
from src.collect.data.sightings_loader import (
|
||||
VERIFIED_SIGHTINGS,
|
||||
SightingRecord,
|
||||
load_verified_sightings,
|
||||
)
|
||||
|
||||
|
||||
class TestVerifiedSightingsData:
|
||||
def test_sightings_list_is_nonempty(self):
|
||||
"""VERIFIED_SIGHTINGS must contain at least one record."""
|
||||
assert len(VERIFIED_SIGHTINGS) > 0, "VERIFIED_SIGHTINGS is empty"
|
||||
|
||||
def test_all_records_have_required_keys(self):
|
||||
"""Every record must contain all SightingRecord fields."""
|
||||
required = {"prayer", "date_local", "time_local", "utc_offset", "lat", "lng",
|
||||
"elevation_m", "source", "notes"}
|
||||
missing_any = []
|
||||
for i, rec in enumerate(VERIFIED_SIGHTINGS):
|
||||
missing = required - set(rec.keys())
|
||||
if missing:
|
||||
missing_any.append((i, missing))
|
||||
assert not missing_any, f"Records missing keys: {missing_any[:3]}"
|
||||
|
||||
def test_prayer_values_are_fajr_or_isha(self):
|
||||
"""Each record's prayer field must be 'fajr' or 'isha'."""
|
||||
invalid = [r for r in VERIFIED_SIGHTINGS if r["prayer"] not in ("fajr", "isha")]
|
||||
assert not invalid, f"{len(invalid)} records with invalid prayer value"
|
||||
|
||||
def test_date_local_format(self):
|
||||
"""date_local must be in YYYY-MM-DD format."""
|
||||
from datetime import datetime
|
||||
invalid = []
|
||||
for i, r in enumerate(VERIFIED_SIGHTINGS):
|
||||
try:
|
||||
datetime.strptime(r["date_local"], "%Y-%m-%d")
|
||||
except ValueError:
|
||||
invalid.append((i, r["date_local"]))
|
||||
assert not invalid, f"Invalid date_local formats (first 3): {invalid[:3]}"
|
||||
|
||||
def test_time_local_format(self):
|
||||
"""time_local must be HH:MM (24-hour)."""
|
||||
from datetime import datetime
|
||||
invalid = []
|
||||
for i, r in enumerate(VERIFIED_SIGHTINGS):
|
||||
try:
|
||||
datetime.strptime(r["time_local"], "%H:%M")
|
||||
except ValueError:
|
||||
invalid.append((i, r["time_local"]))
|
||||
assert not invalid, f"Invalid time_local formats (first 3): {invalid[:3]}"
|
||||
|
||||
def test_lat_lng_are_numeric_and_in_range(self):
|
||||
"""lat must be in [-90, 90], lng in [-180, 180]."""
|
||||
bad_lat = [r for r in VERIFIED_SIGHTINGS if not (-90 <= r["lat"] <= 90)]
|
||||
bad_lng = [r for r in VERIFIED_SIGHTINGS if not (-180 <= r["lng"] <= 180)]
|
||||
assert not bad_lat, f"{len(bad_lat)} records with lat out of range"
|
||||
assert not bad_lng, f"{len(bad_lng)} records with lng out of range"
|
||||
|
||||
|
||||
class TestLoadVerifiedSightings:
|
||||
def setup_method(self):
|
||||
self.df = load_verified_sightings()
|
||||
|
||||
def test_returns_dataframe(self):
|
||||
"""load_verified_sightings must return a pandas DataFrame."""
|
||||
import pandas as pd
|
||||
assert isinstance(self.df, pd.DataFrame)
|
||||
|
||||
def test_row_count_matches_list(self):
|
||||
"""DataFrame must have the same number of rows as VERIFIED_SIGHTINGS."""
|
||||
assert len(self.df) == len(VERIFIED_SIGHTINGS)
|
||||
|
||||
def test_required_columns_present(self):
|
||||
"""All required output columns must be present."""
|
||||
required = {"date", "utc_dt", "lat", "lng", "elevation_m", "prayer", "source", "notes"}
|
||||
missing = required - set(self.df.columns)
|
||||
assert not missing, f"Missing columns: {missing}"
|
||||
|
||||
def test_utc_dt_is_timezone_aware(self):
|
||||
"""utc_dt column must contain timezone-aware datetimes in UTC."""
|
||||
for dt in self.df["utc_dt"].head(10):
|
||||
assert dt.tzinfo is not None, f"utc_dt {dt} is not timezone-aware"
|
||||
assert dt.tzinfo == timezone.utc, f"utc_dt {dt} is not UTC"
|
||||
|
||||
def test_prayer_column_values(self):
|
||||
"""prayer column must only contain 'fajr' or 'isha'."""
|
||||
unique = set(self.df["prayer"].unique())
|
||||
assert unique <= {"fajr", "isha"}, f"Unexpected prayer values: {unique}"
|
||||
|
||||
def test_utc_conversion_blackburn_fajr(self):
|
||||
"""Spot-check UTC conversion for a known Blackburn BST record.
|
||||
|
||||
1987-09-21, 05:30 BST (UTC+1) should convert to 04:30 UTC.
|
||||
"""
|
||||
blackburn = self.df[
|
||||
(self.df["date"].astype(str) == "1987-09-21")
|
||||
& (self.df["prayer"] == "fajr")
|
||||
& (self.df["lat"].round(2) == 53.75)
|
||||
]
|
||||
assert len(blackburn) >= 1, "Could not find Blackburn 1987-09-21 fajr record"
|
||||
utc_dt = blackburn.iloc[0]["utc_dt"]
|
||||
assert utc_dt.hour == 4, f"Expected hour 4 (UTC), got {utc_dt.hour}"
|
||||
assert utc_dt.minute == 30, f"Expected minute 30, got {utc_dt.minute}"
|
||||
173
tests/test_sightings_stats.py
Normal file
173
tests/test_sightings_stats.py
Normal file
|
|
@ -0,0 +1,173 @@
|
|||
"""
|
||||
Tests for src/collect/analysis/sightings_stats.py
|
||||
|
||||
Verifies angle_summary, geographic_coverage, and print_dataset_report
|
||||
return correct types and values. Plot functions are tested for return type only
|
||||
(no display).
|
||||
"""
|
||||
|
||||
import sys
|
||||
sys.path.insert(0, str(__import__("pathlib").Path(__file__).parent.parent))
|
||||
|
||||
import pytest
|
||||
import pandas as pd
|
||||
from datetime import datetime, timezone
|
||||
|
||||
from src.collect.analysis.sightings_stats import (
|
||||
angle_summary,
|
||||
geographic_coverage,
|
||||
plot_angle_distribution,
|
||||
plot_angle_vs_latitude,
|
||||
plot_angle_vs_day_of_year,
|
||||
print_dataset_report,
|
||||
)
|
||||
|
||||
|
||||
def _make_fajr_df(n=10):
|
||||
"""Build a minimal Fajr DataFrame with day_of_year and fajr_angle."""
|
||||
rows = []
|
||||
for i in range(n):
|
||||
day = (i * 36) + 1 # spread across the year
|
||||
dt = datetime(2023, 1, 1, 4, 0, tzinfo=timezone.utc)
|
||||
rows.append({
|
||||
"date": f"2023-01-{i+1:02d}",
|
||||
"utc_dt": dt,
|
||||
"lat": 30.0 + i * 0.5,
|
||||
"lng": 29.0,
|
||||
"elevation_m": 100.0,
|
||||
"prayer": "fajr",
|
||||
"fajr_angle": 12.0 + i * 0.3,
|
||||
"day_of_year": day,
|
||||
"source": f"Source {i}",
|
||||
"notes": "test",
|
||||
})
|
||||
return pd.DataFrame(rows)
|
||||
|
||||
|
||||
def _make_isha_df(n=5):
|
||||
"""Build a minimal Isha DataFrame."""
|
||||
rows = []
|
||||
for i in range(n):
|
||||
rows.append({
|
||||
"date": f"2023-03-{i+1:02d}",
|
||||
"utc_dt": datetime(2023, 3, i+1, 19, 0, tzinfo=timezone.utc),
|
||||
"lat": 25.0 + i,
|
||||
"lng": 46.0,
|
||||
"elevation_m": 620.0,
|
||||
"prayer": "isha",
|
||||
"isha_angle": 17.0 + i * 0.2,
|
||||
"day_of_year": 60 + i,
|
||||
"source": f"Isha Source {i}",
|
||||
"notes": "test",
|
||||
})
|
||||
return pd.DataFrame(rows)
|
||||
|
||||
|
||||
class TestAngleSummary:
|
||||
def test_returns_series(self):
|
||||
"""angle_summary must return a pandas Series."""
|
||||
df = _make_fajr_df()
|
||||
result = angle_summary(df, "fajr_angle")
|
||||
assert isinstance(result, pd.Series)
|
||||
|
||||
def test_count_is_correct(self):
|
||||
"""count in summary must equal number of non-NaN angle rows."""
|
||||
df = _make_fajr_df(10)
|
||||
result = angle_summary(df, "fajr_angle")
|
||||
assert result["count"] == 10.0
|
||||
|
||||
def test_mean_is_plausible(self):
|
||||
"""Mean angle must be within the data range."""
|
||||
df = _make_fajr_df(10)
|
||||
result = angle_summary(df, "fajr_angle")
|
||||
assert df["fajr_angle"].min() <= result["mean"] <= df["fajr_angle"].max()
|
||||
|
||||
def test_isha_angle_col(self):
|
||||
"""Works correctly with isha_angle column."""
|
||||
df = _make_isha_df()
|
||||
result = angle_summary(df, "isha_angle")
|
||||
assert result["count"] == 5.0
|
||||
|
||||
|
||||
class TestGeographicCoverage:
|
||||
def test_returns_dict(self):
|
||||
"""geographic_coverage must return a dict."""
|
||||
df = _make_fajr_df()
|
||||
result = geographic_coverage(df)
|
||||
assert isinstance(result, dict)
|
||||
|
||||
def test_required_keys_present(self):
|
||||
"""All expected keys must be in the result dict."""
|
||||
df = _make_fajr_df()
|
||||
result = geographic_coverage(df)
|
||||
for key in ("lat_min", "lat_max", "unique_locs", "date_min", "date_max", "total_records"):
|
||||
assert key in result, f"Missing key: {key}"
|
||||
|
||||
def test_total_records_correct(self):
|
||||
"""total_records must equal len(df)."""
|
||||
df = _make_fajr_df(7)
|
||||
result = geographic_coverage(df)
|
||||
assert result["total_records"] == 7
|
||||
|
||||
def test_lat_range_correct(self):
|
||||
"""lat_min and lat_max must match DataFrame lat column bounds."""
|
||||
df = _make_fajr_df(10)
|
||||
result = geographic_coverage(df)
|
||||
assert abs(result["lat_min"] - df["lat"].min()) < 1e-6
|
||||
assert abs(result["lat_max"] - df["lat"].max()) < 1e-6
|
||||
|
||||
def test_unique_locs_counts_groups(self):
|
||||
"""unique_locs must count distinct (lat, lng) pairs."""
|
||||
df = _make_fajr_df(10)
|
||||
expected = len(df.groupby(["lat", "lng"]))
|
||||
result = geographic_coverage(df)
|
||||
assert result["unique_locs"] == expected
|
||||
|
||||
|
||||
class TestPrintDatasetReport:
|
||||
def test_runs_without_error(self, capsys):
|
||||
"""print_dataset_report must run to completion without exceptions."""
|
||||
fajr = _make_fajr_df(5)
|
||||
isha = _make_isha_df(3)
|
||||
print_dataset_report(fajr, isha)
|
||||
out = capsys.readouterr().out
|
||||
assert "Fajr dataset: 5" in out
|
||||
assert "Isha dataset: 3" in out
|
||||
|
||||
def test_empty_datasets(self, capsys):
|
||||
"""print_dataset_report handles empty DataFrames without crashing."""
|
||||
fajr = pd.DataFrame(columns=["fajr_angle"])
|
||||
isha = pd.DataFrame(columns=["isha_angle"])
|
||||
print_dataset_report(fajr, isha)
|
||||
out = capsys.readouterr().out
|
||||
assert "Fajr dataset: 0" in out
|
||||
assert "Isha dataset: 0" in out
|
||||
|
||||
|
||||
class TestPlotFunctions:
|
||||
def test_plot_angle_distribution_returns_figure(self):
|
||||
"""plot_angle_distribution must return a matplotlib Figure."""
|
||||
import matplotlib
|
||||
matplotlib.use("Agg") # non-interactive backend for tests
|
||||
df = _make_fajr_df(10)
|
||||
fig = plot_angle_distribution(df, "fajr_angle")
|
||||
import matplotlib.figure
|
||||
assert isinstance(fig, matplotlib.figure.Figure)
|
||||
|
||||
def test_plot_angle_vs_latitude_returns_figure(self):
|
||||
"""plot_angle_vs_latitude must return a matplotlib Figure."""
|
||||
import matplotlib
|
||||
matplotlib.use("Agg")
|
||||
df = _make_fajr_df(10)
|
||||
fig = plot_angle_vs_latitude(df, "fajr_angle")
|
||||
import matplotlib.figure
|
||||
assert isinstance(fig, matplotlib.figure.Figure)
|
||||
|
||||
def test_plot_angle_vs_day_of_year_returns_figure(self):
|
||||
"""plot_angle_vs_day_of_year must return a matplotlib Figure."""
|
||||
import matplotlib
|
||||
matplotlib.use("Agg")
|
||||
df = _make_fajr_df(10)
|
||||
fig = plot_angle_vs_day_of_year(df, "fajr_angle")
|
||||
import matplotlib.figure
|
||||
assert isinstance(fig, matplotlib.figure.Figure)
|
||||
Loading…
Reference in a new issue