{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Fajr and Isha Angle: Exploratory Analysis\n", "\n", "This notebook explores the compiled datasets of verified human sightings to find patterns\n", "in how the solar depression angle at Fajr and Isha varies with:\n", "\n", "- **Latitude** — distance from the equator\n", "- **Day of Year (TOY)** — seasonality\n", "- **Elevation** — metres above sea level\n", "\n", "Run the pipeline first:\n", "```bash\n", "python -m src.pipeline --no-elevation-lookup\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import matplotlib.ticker as ticker\n", "from pathlib import Path\n", "\n", "ROOT = Path.cwd().parent\n", "\n", "fajr = pd.read_csv(ROOT / 'data/processed/fajr_angles.csv', parse_dates=['utc_dt'])\n", "isha = pd.read_csv(ROOT / 'data/processed/isha_angles.csv', parse_dates=['utc_dt'])\n", "\n", "print(f'Fajr records: {len(fajr)}')\n", "print(f'Isha records: {len(isha)}')\n", "print(f'Fajr latitude range: {fajr[\"lat\"].min():.1f}° to {fajr[\"lat\"].max():.1f}°')\n", "print(f'Fajr date range: {fajr[\"date\"].min()} to {fajr[\"date\"].max()}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Angle Distribution Overview" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n", "\n", "axes[0].hist(fajr['fajr_angle'], bins=60, color='steelblue', alpha=0.8, edgecolor='white')\n", "axes[0].axvline(fajr['fajr_angle'].mean(), color='red', linestyle='--', label=f'Mean {fajr[\"fajr_angle\"].mean():.2f}°')\n", "axes[0].axvline(fajr['fajr_angle'].median(), color='orange', linestyle='--', label=f'Median {fajr[\"fajr_angle\"].median():.2f}°')\n", "axes[0].set_xlabel('Solar Depression Angle (°)')\n", "axes[0].set_ylabel('Count')\n", "axes[0].set_title(f'Fajr Angle Distribution (n={len(fajr):,})')\n", "axes[0].legend()\n", "\n", "if len(isha) > 0:\n", " axes[1].hist(isha['isha_angle'], bins=20, color='darkorange', alpha=0.8, edgecolor='white')\n", " axes[1].axvline(isha['isha_angle'].mean(), color='red', linestyle='--', label=f'Mean {isha[\"isha_angle\"].mean():.2f}°')\n", " axes[1].set_xlabel('Solar Depression Angle (°)')\n", " axes[1].set_ylabel('Count')\n", " axes[1].set_title(f'Isha Angle Distribution (n={len(isha):,})')\n", " axes[1].legend()\n", "\n", "plt.tight_layout()\n", "plt.savefig(ROOT / 'data/processed/angle_distribution.png', dpi=150, bbox_inches='tight')\n", "plt.show()\n", "\n", "print('\\nFajr angle percentiles:')\n", "print(fajr['fajr_angle'].describe().to_string())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Latitude vs Fajr Angle" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(14, 6))\n", "\n", "# Scatter for non-Birmingham records (smaller dataset, more geographic variety)\n", "bham = fajr[fajr['lat'].between(52.4, 52.5)]\n", "other = fajr[~fajr['lat'].between(52.4, 52.5)]\n", "\n", "ax.scatter(bham['lat'], bham['fajr_angle'], alpha=0.1, s=8, color='steelblue', label=f'Birmingham OpenFajr (n={len(bham):,})')\n", "ax.scatter(other['lat'], other['fajr_angle'], alpha=0.8, s=40, color='red', zorder=5, label=f'Other locations (n={len(other):,})')\n", "\n", "# Mean by latitude band\n", "fajr['lat_band'] = (fajr['lat'] / 5).round() * 5 # round to nearest 5°\n", "band_means = fajr.groupby('lat_band')['fajr_angle'].mean()\n", "ax.plot(band_means.index, band_means.values, 'k--', linewidth=2, label='Band mean (5° bins)')\n", "\n", "ax.set_xlabel('Latitude (°)')\n", "ax.set_ylabel('Fajr Depression Angle (°)')\n", "ax.set_title('Fajr Angle vs Latitude')\n", "ax.legend()\n", "ax.grid(True, alpha=0.3)\n", "ax.axhline(fajr['fajr_angle'].mean(), color='gray', linestyle=':', alpha=0.5, label='Overall mean')\n", "\n", "plt.tight_layout()\n", "plt.savefig(ROOT / 'data/processed/fajr_vs_latitude.png', dpi=150, bbox_inches='tight')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Seasonality (Day of Year) vs Fajr Angle — Birmingham" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Birmingham has 4,000+ records — ideal for TOY analysis\n", "bham = fajr[fajr['lat'].between(52.4, 52.5)].copy()\n", "\n", "fig, axes = plt.subplots(2, 1, figsize=(14, 10))\n", "\n", "# Raw scatter\n", "axes[0].scatter(bham['day_of_year'], bham['fajr_angle'], alpha=0.3, s=5, color='steelblue')\n", "axes[0].set_xlabel('Day of Year')\n", "axes[0].set_ylabel('Fajr Depression Angle (°)')\n", "axes[0].set_title('Birmingham Fajr Angle vs Day of Year (raw)')\n", "axes[0].set_xticks([1, 60, 121, 182, 244, 305, 365])\n", "axes[0].set_xticklabels(['Jan', 'Mar', 'May', 'Jul', 'Sep', 'Nov', 'Dec'])\n", "axes[0].grid(True, alpha=0.3)\n", "\n", "# Rolling mean (30-day window)\n", "bham_sorted = bham.sort_values('day_of_year')\n", "bham_sorted['rolling_mean'] = bham_sorted['fajr_angle'].rolling(window=30, center=True).mean()\n", "axes[1].scatter(bham_sorted['day_of_year'], bham_sorted['fajr_angle'], alpha=0.15, s=5, color='steelblue')\n", "axes[1].plot(bham_sorted['day_of_year'], bham_sorted['rolling_mean'], 'r-', linewidth=2, label='30-day rolling mean')\n", "axes[1].axhline(bham['fajr_angle'].mean(), color='gray', linestyle='--', label=f'Overall mean {bham[\"fajr_angle\"].mean():.2f}°')\n", "axes[1].set_xlabel('Day of Year')\n", "axes[1].set_ylabel('Fajr Depression Angle (°)')\n", "axes[1].set_title('Birmingham Fajr Angle vs Day of Year (smoothed)')\n", "axes[1].set_xticks([1, 60, 121, 182, 244, 305, 365])\n", "axes[1].set_xticklabels(['Jan', 'Mar', 'May', 'Jul', 'Sep', 'Nov', 'Dec'])\n", "axes[1].legend()\n", "axes[1].grid(True, alpha=0.3)\n", "\n", "plt.tight_layout()\n", "plt.savefig(ROOT / 'data/processed/birmingham_seasonality.png', dpi=150, bbox_inches='tight')\n", "plt.show()\n", "\n", "# Stats by season\n", "bham['season'] = pd.cut(bham['day_of_year'],\n", " bins=[0, 80, 172, 266, 355, 366],\n", " labels=['Winter', 'Spring', 'Summer', 'Autumn', 'Winter2'])\n", "print('Birmingham Fajr angle by season:')\n", "print(bham.groupby('season')['fajr_angle'].describe().to_string())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Latitude × Season Interaction" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# For non-Birmingham locations with per-season data\n", "other = fajr[~fajr['lat'].between(52.4, 52.5)].copy()\n", "\n", "fig, ax = plt.subplots(figsize=(14, 6))\n", "\n", "scatter = ax.scatter(other['day_of_year'], other['fajr_angle'],\n", " c=other['lat'], cmap='RdYlBu', s=80, alpha=0.8,\n", " vmin=-40, vmax=55)\n", "\n", "cbar = plt.colorbar(scatter, ax=ax)\n", "cbar.set_label('Latitude (°)')\n", "ax.set_xlabel('Day of Year')\n", "ax.set_ylabel('Fajr Depression Angle (°)')\n", "ax.set_title('Fajr Angle vs Season, colored by Latitude')\n", "ax.set_xticks([1, 60, 121, 182, 244, 305, 365])\n", "ax.set_xticklabels(['Jan', 'Mar', 'May', 'Jul', 'Sep', 'Nov', 'Dec'])\n", "ax.grid(True, alpha=0.3)\n", "\n", "plt.tight_layout()\n", "plt.savefig(ROOT / 'data/processed/lat_season_interaction.png', dpi=150, bbox_inches='tight')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Elevation vs Fajr Angle" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Elevation effect — compare sites with different elevations at similar latitudes\n", "other = fajr[~fajr['lat'].between(52.4, 52.5)].copy()\n", "\n", "fig, ax = plt.subplots(figsize=(10, 6))\n", "\n", "scatter = ax.scatter(other['elevation_m'], other['fajr_angle'],\n", " c=other['lat'].abs(), cmap='viridis', s=80, alpha=0.8)\n", "\n", "cbar = plt.colorbar(scatter, ax=ax)\n", "cbar.set_label('|Latitude| (°)')\n", "ax.set_xlabel('Elevation (m)')\n", "ax.set_ylabel('Fajr Depression Angle (°)')\n", "ax.set_title('Fajr Angle vs Elevation')\n", "ax.grid(True, alpha=0.3)\n", "\n", "# Correlation\n", "corr = other[['elevation_m', 'fajr_angle']].corr().iloc[0, 1]\n", "ax.text(0.05, 0.95, f'Pearson r = {corr:.3f}', transform=ax.transAxes,\n", " fontsize=12, verticalalignment='top')\n", "\n", "plt.tight_layout()\n", "plt.savefig(ROOT / 'data/processed/elevation_effect.png', dpi=150, bbox_inches='tight')\n", "plt.show()\n", "\n", "print(f'Elevation vs Fajr angle correlation: {corr:.3f}')\n", "\n", "# Key elevation comparisons\n", "print('\\nHigh-elevation sites (>500m):')\n", "high_elev = other[other['elevation_m'] > 500].groupby(['lat', 'elevation_m'])['fajr_angle'].mean()\n", "print(high_elev.to_string())\n", "\n", "print('\\nLow-elevation sites (<50m):')\n", "low_elev = other[other['elevation_m'] < 50].groupby(['lat', 'elevation_m'])['fajr_angle'].mean()\n", "print(low_elev.to_string())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Geographic Coverage Map" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Site coverage summary\n", "all_data = pd.concat([\n", " fajr[['lat', 'lng', 'elevation_m', 'source']].assign(prayer='fajr'),\n", " isha[['lat', 'lng', 'elevation_m', 'source']].assign(prayer='isha'),\n", "])\n", "\n", "sites = all_data.groupby(['lat', 'lng', 'elevation_m']).agg(\n", " n_fajr=('prayer', lambda x: (x == 'fajr').sum()),\n", " n_isha=('prayer', lambda x: (x == 'isha').sum()),\n", ").reset_index()\n", "\n", "print(f'Unique observation sites: {len(sites)}')\n", "print(f'Latitude range: {sites[\"lat\"].min():.2f}° to {sites[\"lat\"].max():.2f}°')\n", "print()\n", "print('Sites with most records:')\n", "print(sites.sort_values('n_fajr', ascending=False).head(10).to_string())\n", "\n", "fig, ax = plt.subplots(figsize=(16, 8))\n", "sc = ax.scatter(sites['lng'], sites['lat'],\n", " s=np.sqrt(sites['n_fajr'] + sites['n_isha']) * 8 + 20,\n", " c=sites['lat'], cmap='RdYlBu', alpha=0.8, edgecolors='black', linewidth=0.5)\n", "cbar = plt.colorbar(sc, ax=ax)\n", "cbar.set_label('Latitude (°)')\n", "ax.set_xlabel('Longitude (°)')\n", "ax.set_ylabel('Latitude (°)')\n", "ax.set_title('Observation Sites (bubble size = record count)')\n", "ax.axhline(0, color='gray', linestyle='--', alpha=0.5, linewidth=0.8)\n", "ax.grid(True, alpha=0.3)\n", "ax.set_xlim(-100, 185)\n", "ax.set_ylim(-45, 65)\n", "\n", "plt.tight_layout()\n", "plt.savefig(ROOT / 'data/processed/site_map.png', dpi=150, bbox_inches='tight')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Simple Linear Regression: Fajr Angle ~ f(lat, day_of_year, elevation)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import LinearRegression\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.metrics import r2_score, mean_absolute_error\n", "import numpy as np\n", "\n", "# Use all data\n", "features = ['lat', 'day_of_year', 'elevation_m']\n", "X = fajr[features].copy()\n", "\n", "# Add squared terms for non-linearity\n", "X['lat_abs'] = fajr['lat'].abs()\n", "X['lat_sq'] = fajr['lat'] ** 2\n", "X['doy_sin'] = np.sin(2 * np.pi * fajr['day_of_year'] / 365.25)\n", "X['doy_cos'] = np.cos(2 * np.pi * fajr['day_of_year'] / 365.25)\n", "X['doy_sin2'] = np.sin(4 * np.pi * fajr['day_of_year'] / 365.25)\n", "X['doy_cos2'] = np.cos(4 * np.pi * fajr['day_of_year'] / 365.25)\n", "\n", "y = fajr['fajr_angle']\n", "\n", "scaler = StandardScaler()\n", "X_scaled = scaler.fit_transform(X)\n", "\n", "model = LinearRegression()\n", "model.fit(X_scaled, y)\n", "y_pred = model.predict(X_scaled)\n", "\n", "print(f'R² = {r2_score(y, y_pred):.4f}')\n", "print(f'MAE = {mean_absolute_error(y, y_pred):.4f}°')\n", "print()\n", "print('Feature coefficients:')\n", "for feat, coef in zip(X.columns, model.coef_):\n", " print(f' {feat:15s}: {coef:.4f}')\n", "\n", "# Residuals\n", "residuals = y - y_pred\n", "print(f'\\nResidual stats:')\n", "print(residuals.describe().to_string())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Residual plot\n", "fig, axes = plt.subplots(1, 3, figsize=(16, 5))\n", "\n", "axes[0].scatter(fajr['lat'], residuals, alpha=0.1, s=5)\n", "axes[0].axhline(0, color='red', linestyle='--')\n", "axes[0].set_xlabel('Latitude')\n", "axes[0].set_ylabel('Residual (°)')\n", "axes[0].set_title('Residuals vs Latitude')\n", "\n", "axes[1].scatter(fajr['day_of_year'], residuals, alpha=0.1, s=5)\n", "axes[1].axhline(0, color='red', linestyle='--')\n", "axes[1].set_xlabel('Day of Year')\n", "axes[1].set_ylabel('Residual (°)')\n", "axes[1].set_title('Residuals vs Day of Year')\n", "\n", "axes[2].scatter(fajr['elevation_m'], residuals, alpha=0.3, s=20)\n", "axes[2].axhline(0, color='red', linestyle='--')\n", "axes[2].set_xlabel('Elevation (m)')\n", "axes[2].set_ylabel('Residual (°)')\n", "axes[2].set_title('Residuals vs Elevation')\n", "\n", "plt.suptitle('Linear Regression Residuals')\n", "plt.tight_layout()\n", "plt.savefig(ROOT / 'data/processed/regression_residuals.png', dpi=150, bbox_inches='tight')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Isha Angle Analysis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if len(isha) > 0:\n", " fig, axes = plt.subplots(1, 3, figsize=(16, 5))\n", "\n", " axes[0].scatter(isha['lat'], isha['isha_angle'], color='darkorange', alpha=0.8, s=60)\n", " axes[0].set_xlabel('Latitude (°)')\n", " axes[0].set_ylabel('Isha Depression Angle (°)')\n", " axes[0].set_title('Isha Angle vs Latitude')\n", " axes[0].grid(True, alpha=0.3)\n", "\n", " axes[1].scatter(isha['day_of_year'], isha['isha_angle'], color='darkorange', alpha=0.8, s=60)\n", " axes[1].set_xlabel('Day of Year')\n", " axes[1].set_ylabel('Isha Depression Angle (°)')\n", " axes[1].set_title('Isha Angle vs Season')\n", " axes[1].set_xticks([1, 60, 121, 182, 244, 305, 365])\n", " axes[1].set_xticklabels(['Jan', 'Mar', 'May', 'Jul', 'Sep', 'Nov', 'Dec'])\n", " axes[1].grid(True, alpha=0.3)\n", "\n", " axes[2].scatter(isha['elevation_m'], isha['isha_angle'], color='darkorange', alpha=0.8, s=60)\n", " axes[2].set_xlabel('Elevation (m)')\n", " axes[2].set_ylabel('Isha Depression Angle (°)')\n", " axes[2].set_title('Isha Angle vs Elevation')\n", " axes[2].grid(True, alpha=0.3)\n", "\n", " plt.suptitle(f'Isha Analysis (n={len(isha)} records)')\n", " plt.tight_layout()\n", " plt.savefig(ROOT / 'data/processed/isha_analysis.png', dpi=150, bbox_inches='tight')\n", " plt.show()\n", "\n", " print('Isha angle stats by latitude band:')\n", " isha['lat_band'] = pd.cut(isha['lat'], bins=[-40, -10, 10, 30, 45, 60],\n", " labels=['30-40°S', '10°S-10°N', '10-30°N', '30-45°N', '45-60°N'])\n", " print(isha.groupby('lat_band')['isha_angle'].describe().to_string())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Summary and Hypotheses for ML\n", "\n", "### Observed patterns:\n", "\n", "1. **Latitude effect**: Near-equatorial sites (Malaysia, Indonesia, 2°-7°) show higher Fajr angles (~16°-17°) compared to mid-latitude sites (UK ~13°, Egypt ~14°). This is counter-intuitive but physically explainable: the sun's arc through the horizon zone is steeper at low latitudes, so each degree of depression corresponds to a shorter time interval.\n", "\n", "2. **Seasonality (TOY)**: At fixed latitude, Fajr angle is lower in summer than winter. This is clear in the Birmingham dataset (10+ years of data). Summer twilight is shorter and the sun's path through the horizon zone is shallower.\n", "\n", "3. **Elevation**: Higher-elevation sites tend toward slightly higher angles. Desert observatory sites (Kottamia 477m, Hail 1020m, Tehran 1191m) show angles on the higher end. This is consistent with the physical effect: elevated observers see through less atmosphere, so the first light of dawn appears at a slightly steeper angle.\n", "\n", "4. **Latitude × Season interaction**: The seasonal swing is larger at high latitudes (Birmingham has a ~3° range from summer to winter) and smaller at equatorial sites (Malaysian sites show < 1° seasonal variation).\n", "\n", "### Next steps for ML:\n", "\n", "- Train gradient boosted models (XGBoost, LightGBM) on all available data\n", "- Key features: `lat`, `lat_abs`, `lat_sq`, `day_of_year`, `doy_sin`, `doy_cos`, `elevation_m`, `lat × doy_sin`, `lat × doy_cos`\n", "- Expand Isha dataset (currently only 43 records) before training Isha model\n", "- Outlier analysis: identify records that deviate significantly from the fitted model and investigate whether they represent data entry errors, unusual atmospheric conditions, or genuine outliers\n", "- Cross-validation strategy: leave-one-location-out (not random split) to test generalization to unseen locations" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.14.0" } }, "nbformat": 4, "nbformat_minor": 4 }