pray-calc-ml/research/malaysia-light-pollution-2024.md
Aric Camarata ada08e7ec4 data: expand dataset from 5.9k to 91k records via 6 new SQM sources
2026-03-22 16:39:29 -04:00


Alteration of Twilight Sky Brightness Profile by Light Pollution

Authors: Muhamad Syazwan Faid, Mohd Zambri Zainuddin, Nazri Muslim, Nor Hazmin Sabri, Zainol Abidin Ibrahim, Chong Hun Yih
Year: 2024
Journal: Scientific Reports, 14: 26237
DOI: 10.1038/s41598-024-76550-3
PMID: 39496720, PMCID: PMC11535048
URL: https://www.nature.com/articles/s41598-024-76550-3
Sites studied: 12 sites across Malaysia (urban/suburban/pristine) + 1 in Australia
Observation method: SQM (Sky Quality Meter)
Date range: 2008-2022 (84 total observations, 72 validated)
Mean angle (Fajr): Pristine: -17.49 deg, Suburban: -15.67 deg, Urban: -11.50 deg (brightness stability onset)

Sites

| # | Location | Coordinates | Classification | Zenith brightness (mag/arcsec²) |
|---|----------|-------------|----------------|---------------------------------|
| 1 | Putrajaya | 2°54'N, 101°41'E | Urban | 17.11 |
| 2 | Teluk Kemang | 2°27'N, 101°51'E | Rural | 19.5 |
| 3 | Tanjung Balau | 1°48'N, 104°24'E | Rural | 19.78 |
| 4 | Pantai Masjid Tengku Zaharah | 5°24'N, 103°57'E | Rural | 19.85 |
| 5 | Pantai Batu Buruk | 5°19'N, 103°9'E | Rural | 19.23 |
| 6 | Coonabarabran, Australia | 31°15'S, 149°16'E | Pristine | 21.59 |
| 7 | Pantai Mek Mas | 6°19'N, 102°9'E | Pristine | 21.3 |
| 8 | Balai Cerap Unisza | 5°24'N, 102°35'E | Pristine | 20.08 |
| 9 | Pulau Bunting | 5°51'N, 100°20'E | Pristine | 18.94 (rejected) |
| 10 | Simpang Mengayau, Sabah | 7°12'N, 116°30'E | Pristine | 21.64 |
| 11 | Pantai Melawi Bachok | 5°24'N, 102°35'E | Pristine | - |
| 12 | Pantai Nenasi | 3°36'N, 103°30'E | Pristine | - (rejected) |
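
For ingestion, the degree-minute coordinates in this table would need to become signed decimal degrees. A minimal sketch of that conversion, assuming the degree-minute-hemisphere format used above; the helper name `dm_to_decimal` is hypothetical and is not part of ingest.py:

```python
import re

# Hypothetical helper: accepts strings such as "2°54'N" or "2 54'N".
_DM = re.compile(r"^\s*(\d+)\s*[° ]\s*(\d+)\s*'\s*([NSEW])\s*$")


def dm_to_decimal(text: str) -> float:
    """Convert a degree-minute coordinate with a hemisphere letter to decimal degrees."""
    match = _DM.match(text)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees, minutes, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60.0
    # South and west are negative by the usual sign convention.
    return -value if hemisphere in "SW" else value


# Putrajaya, row 1 of the table.
lat = dm_to_decimal("2°54'N")
lon = dm_to_decimal("101°41'E")
print(f"{lat:.4f}, {lon:.4f}")
```

Minute-level precision is all the table provides, so the converted values are only good to roughly a kilometre or two.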

Summary

A large-scale SQM study comparing twilight brightness profiles across urban, suburban, and pristine sites in Malaysia and Australia. A total of 84 twilight SQM datasets were collected from 2014-2022, of which 72 were validated; the 12 rejected datasets came from 2 sites with inconsistent data.

The key finding: light pollution shifts the "brightness stability point" (the solar depression angle at which twilight brightness becomes indistinguishable from the full-night baseline) toward markedly smaller depression angles:

  • Pristine: -17.49 degrees (close to the classical 18 deg)
  • Suburban: -15.67 degrees (about 1.8 deg closer to the horizon than pristine)
  • Urban: -11.50 degrees (about 6 deg closer to the horizon than pristine)

This means that in heavily light-polluted areas the SQM cannot register the onset of dawn until the sun is much closer to the horizon, because urban skyglow already exceeds the natural twilight brightness at larger depression angles.
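
The shift sizes quoted above follow directly from the three mean angles. A minimal sketch of that arithmetic, with the values copied from this note and expressed as positive solar depression angles; the dictionary and variable names are illustrative only:

```python
# Mean brightness-stability angles (solar depression, degrees) as reported in this note.
STABILITY_ANGLE_DEG = {"pristine": 17.49, "suburban": 15.67, "urban": 11.50}

# Shift of each class relative to the pristine reference.
shift_vs_pristine = {
    site_class: round(STABILITY_ANGLE_DEG["pristine"] - angle, 2)
    for site_class, angle in STABILITY_ANGLE_DEG.items()
}

for site_class, shift in shift_vs_pristine.items():
    print(f"{site_class}: {shift:.2f} deg")
```

These are differences between class means only; the paper's aggregate reporting does not support per-site or per-night shifts.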

Teluk Kemang long-term data (2008-2016) show progressive brightening that correlates with population growth: full-night brightness went from 19.69 to 19.55 mag/arcsec² over 8 years (a lower magnitude means a brighter sky).
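
Because the magnitude scale is logarithmic, the Teluk Kemang change can be converted to a linear brightness ratio with the standard relation ratio = 10^(0.4 × Δm). A minimal sketch of that conversion; the function name is illustrative and the two inputs are the values quoted in this note:

```python
def brightness_ratio(mag_before: float, mag_after: float) -> float:
    """Linear sky-brightness ratio (after / before) from two mag/arcsec^2 readings."""
    # A smaller magnitude means a brighter sky, hence before - after in the exponent.
    return 10 ** (0.4 * (mag_before - mag_after))


ratio = brightness_ratio(19.69, 19.55)
print(f"{(ratio - 1) * 100:.0f}% brighter")
```

For these two readings the 0.14 mag difference corresponds to roughly a 14% increase in zenith sky brightness.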

Data Availability

AGGREGATE ONLY. The paper reports:

  • Mean brightness stability parameters per classification (Table 3)
  • Full-night zenith brightness per site (Table 2)
  • Teluk Kemang historical trend (Table 4)
  • NO per-night dates or individual SQM time series

The 72 validated data points are summarized as profiles, not individual observations. No supplementary data files with raw per-night data were published.

For ML Training

This paper is primarily useful for understanding how light pollution affects apparent dawn angle. The aggregate brightness stability values can serve as calibration points:

  • Pristine sites: dawn detection at ~17.5 deg (useful as SQM benchmark)
  • The per-site zenith brightness values help classify observation quality

Per-night training rows cannot be extracted from this paper. It is already partially covered by abdelhadi_2022_malaysia_sqm.csv in our dataset (same research group, overlapping sites).