Why You Can Trust Our Data: 46,000 Dive Logs Collected Over 20 Years

2026-03-16

Starting from 118 observations in 2006, our diving visibility database has grown to more than 46,000 over 20 years. Here's how we collected and verified it.

46,483 total observations · 42 dive sites · 20-year data span · 8+ source types

Data Accumulation History

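| Year | Cumulative observations |
|------|-------------------------|
| 2006 | 118 |
| 2010 | 1,000 |
| 2015 | 2,500 |
| 2018 | 3,163 |
| 2021 | 4,763 |
| 2023 | 5,200 |
| 2025 | 5,392 |
| 2026 | 46,483 |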

Cumulative observation count by year. The large jump in 2026 reflects full integration of the site data.

Diversity of Data Sources

Japanese dive shops publish daily logs on a variety of blog platforms. No single API covers them all, so we had to build a custom scraper for each platform.

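| Source | Observations | Key sites |
|--------|--------------|-----------|
| ExBlog | 7,800 | Yonaguni, Osezaki |
| WordPress REST API | 5,200 | Ito, others |
| CSV (manual) | 4,460 | IOP, Akinohama |
| Custom site scrape | 12,000 | Futo, Kushimoto, Kumomi, Echizen |
| Hatena Blog | 2,095 | Omijima |
| FC2 Blog | 1,533 | Kerama |
| Blogspot | 1,392 | Tajiri |
| Wix Blog | 2,696 | Hirasawa |
| Others | 9,307 | Ishigaki, Kerama, etc. |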
Each scraper uses regex patterns tuned to how that blog writes visibility, handling variants such as '透明度' (transparency), '透視度' (see-through distance), 'vis', and more.
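As a rough illustration, a keyword-anchored pattern can pull candidate visibility snippets out of a log post. This is a minimal sketch, not our production scraper: the pattern, the `find_visibility_snippets` name, and the assumption that the value follows the keyword within a few characters are all hypothetical, and in practice each site needs its own tuning.

```python
import re
from typing import List

# Hypothetical pattern: a visibility keyword followed by a value such as
# "15m", "15〜20m" or "10m↑". Real scrapers would tune this per site.
VIS_PATTERN = re.compile(
    r"(?:透明度|透視度|\bvis(?:ibility)?)"     # visibility keyword
    r"\s*[:：]?\s*"                            # optional ASCII or full-width colon
    r"(\d+(?:\.\d+)?\s*m?"                     # first value with optional unit
    r"(?:\s*[〜~～-]\s*\d+(?:\.\d+)?\s*m?)?"    # optional range end, e.g. "〜20m"
    r"\s*↑?)",                                 # optional arrow meaning "or more"
    re.IGNORECASE,
)


def find_visibility_snippets(text: str) -> List[str]:
    """Return the raw value text that follows each visibility keyword."""
    return [m.group(1).strip() for m in VIS_PATTERN.finditer(text)]


print(find_visibility_snippets("本日の透明度15m、水温22℃。午後は透視度10m↑でした。"))
```

Anchoring on the keyword keeps unrelated numbers, such as the water temperature (水温) in the example post, out of the results.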

Top 10 Sites by Observation Count

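| # | Site | Observations | Since |
|---|------|--------------|-------|
| 1 | Yonaguni | 4,826 | 2010 |
| 2 | Futo | 3,493 | 2013 |
| 3 | Kushimoto | 3,168 | 2015 |
| 4 | IOP | 3,151 | 2006 |
| 5 | Hirasawa | 2,696 | 2015 |
| 6 | Echizen | 2,652 | 2012 |
| 7 | Mikomoto | 2,263 | 2011 |
| 8 | Omijima | 2,095 | 2016 |
| 9 | Kumomi | 1,980 | 2018 |
| 10 | Ito | 1,980 | 2016 |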

Yonaguni Leads by a Wide Margin

Yonaguni Diving Service (YDS) has posted to ExBlog almost daily since 2010, contributing 4,826 records. This consistent logging culture is the backbone of the database's value.

Challenges in Building the Database

Challenge 1: Inconsistent Formats

Shops write visibility in different formats: '透明度15m', 'vis 15', '15〜20m', '透視度10m↑' (where ↑ means "or more"). Extracting minimum and maximum values required site-specific regex patterns.
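Here is a minimal sketch of turning one of these strings into a (min, max) pair. It assumes the snippet has already been isolated near a visibility keyword, so the first number in it is the visibility. The function name `parse_visibility` and the choice to read '↑' as "at least this value, upper bound unknown" (returned as `None`) are illustrative assumptions, not necessarily how our pipeline records it.

```python
import re
from typing import Optional, Tuple

# First number, optional "m", optional range end, optional "↑" ("or more").
VALUE_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*m?"
    r"(?:\s*[〜~～-]\s*(\d+(?:\.\d+)?)\s*m?)?"
    r"\s*(↑)?"
)


def parse_visibility(snippet: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (min_m, max_m); max_m is None when only a lower bound is given."""
    m = VALUE_RE.search(snippet)
    if m is None:
        return None                       # no number found
    low = float(m.group(1))
    if m.group(2) is not None:            # "15〜20m" -> (15.0, 20.0)
        return low, float(m.group(2))
    if m.group(3) is not None:            # "10m↑" -> at least 10 m
        return low, None
    return low, low                       # "15m" or "vis 15" -> single reading


for s in ["透明度15m", "vis 15", "15〜20m", "透視度10m↑"]:
    print(s, parse_visibility(s))
```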

Challenge 2: Outliers and Errors

Errors slip in: a 215 m typo at Miyakejima, a Saipan trip log mixed into Futo's data, and a physically impossible 100 m reading at Amami. We identified and removed 11 such outliers by hand, and data quality control remains an ongoing effort.
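Our cleanup was manual, but a simple automated check could shortlist candidates for review. The sketch below (hypothetical function `flag_outliers`) combines a hard plausibility cap with the modified z-score of Iglewicz and Hoaglin, computed per site from the median absolute deviation (MAD) with the usual 0.6745 scaling and 3.5 cutoff. The 60 m cap is an assumed value, not a figure from our pipeline.

```python
from statistics import median
from typing import List, Sequence


def flag_outliers(
    values: Sequence[float], hard_cap: float = 60.0, z_cutoff: float = 3.5
) -> List[int]:
    """Return indices of one site's readings that deserve a manual look.

    A reading is flagged if it exceeds an absolute cap (assumed 60 m here) or
    if its modified z-score, 0.6745 * |x - median| / MAD, exceeds z_cutoff.
    Flagged readings are candidates for review, not automatic deletions.
    """
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    flagged = []
    for i, v in enumerate(values):
        too_high = v > hard_cap
        # With MAD == 0 the z-score is undefined, so only the cap applies.
        unusual = mad > 0 and 0.6745 * abs(v - med) / mad > z_cutoff
        if too_high or unusual:
            flagged.append(i)
    return flagged


print(flag_outliers([15, 18, 20, 12, 215, 17]))
```

The per-site score matters for cases like the Saipan mix-up: a reading can be plausible in absolute terms and still sit far outside one site's usual range.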

Challenge 3: Blog Closures and Migrations

Dive shop blogs sometimes shut down or migrate. Hirasawa moved from Livedoor to Wix, requiring a completely new scraper. Continuous maintenance is essential.
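One way to keep per-platform scrapers maintainable is a small registry: one extractor per platform plus a site-to-platform map, so a migration means writing one new extractor and changing one mapping entry. This sketch is hypothetical (the registry, the site keys, and the `<article data-date>` page layout are invented for illustration); only the WordPress handler follows a real format, the JSON returned by the WordPress REST API's `/wp-json/wp/v2/posts` endpoint.

```python
import json
import re
from html import unescape
from typing import Callable, Dict, List, Tuple

Post = Tuple[str, str]  # (date string, plain-text body)
EXTRACTORS: Dict[str, Callable[[str], List[Post]]] = {}


def register(platform: str):
    """Decorator that files an extractor function under a platform name."""
    def wrap(fn: Callable[[str], List[Post]]) -> Callable[[str], List[Post]]:
        EXTRACTORS[platform] = fn
        return fn
    return wrap


TAG_RE = re.compile(r"<[^>]+>")


def to_text(html: str) -> str:
    """Crude tag stripping and whitespace collapsing (a real scraper would parse HTML)."""
    return " ".join(unescape(TAG_RE.sub(" ", html)).split())


@register("wordpress")
def from_wordpress(payload: str) -> List[Post]:
    # The WordPress REST API returns a JSON array of posts, each with a
    # "date" field and rendered HTML under content.rendered.
    return [(p["date"], to_text(p["content"]["rendered"])) for p in json.loads(payload)]


ARTICLE_RE = re.compile(r'<article data-date="([^"]+)">(.*?)</article>', re.S)


@register("html_articles")
def from_html_articles(payload: str) -> List[Post]:
    # Hypothetical layout: one <article data-date="..."> element per daily log.
    return [(d, to_text(body)) for d, body in ARTICLE_RE.findall(payload)]


# Hypothetical site -> platform map; a migration changes one entry here.
SITE_PLATFORM = {"site_a": "wordpress", "site_b": "html_articles"}


def scrape(site: str, payload: str) -> List[Post]:
    """Dispatch a fetched page or API payload to the site's current extractor."""
    return EXTRACTORS[SITE_PLATFORM[site]](payload)
```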

The Value of This Database

Foundation for AI

46,000 observations train our LightGBM models for visibility and water temperature prediction. More data per site generally means better accuracy.

Seasonal & Long-term Trends

Twenty years of data reveal seasonal patterns as well as the effects of climate change and the Kuroshio meander. Trends that are invisible in short-term data become apparent.
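To see a long-term trend, the seasonal cycle generally has to be removed first; otherwise a year that happens to have more summer logs looks clearer than it was. Here is a minimal sketch under that assumption: subtract each calendar month's mean from its readings, average the anomalies per year, and fit an ordinary least-squares slope. The function name and the (date, metres) input shape are hypothetical, and this is not necessarily the method behind our published trends.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def deseasonalized_trend(obs: Iterable[Tuple[date, float]]) -> float:
    """Least-squares slope, in metres per year, of annual mean anomalies."""
    obs = list(obs)
    # 1. Seasonal baseline: mean visibility for each calendar month.
    by_month: Dict[int, List[float]] = defaultdict(list)
    for d, vis in obs:
        by_month[d.month].append(vis)
    baseline = {month: mean(vals) for month, vals in by_month.items()}

    # 2. Anomaly = reading minus its month's baseline, averaged per year.
    by_year: Dict[int, List[float]] = defaultdict(list)
    for d, vis in obs:
        by_year[d.year].append(vis - baseline[d.month])
    years = sorted(by_year)
    if len(years) < 2:
        raise ValueError("need at least two years of data")
    anomalies = [mean(by_year[y]) for y in years]

    # 3. Ordinary least-squares slope of annual anomaly against year.
    x_bar, y_bar = mean(years), mean(anomalies)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, anomalies))
    den = sum((x - x_bar) ** 2 for x in years)
    return num / den
```

For example, if one year has readings for every month and the next only for July to September, the raw annual means differ even when nothing has changed, while the anomaly-based slope stays at zero.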

Practical Info for Divers

Data-driven answers to "when and where should I dive for the best visibility?", based on thousands of real measurements rather than anecdotes.
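For the "when" part, one simple approach is to rank calendar months by mean visibility per site, skipping months with too few readings to trust. The sketch below is illustrative only: `best_month_by_site` and the 30-reading minimum are hypothetical choices, not the logic of our forecast app.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def best_month_by_site(
    obs: Iterable[Tuple[str, int, float]], min_count: int = 30
) -> Dict[str, Tuple[int, float]]:
    """Map each site to (month, mean visibility in m) for its clearest month."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for site, month, vis in obs:
        buckets[(site, month)].append(vis)

    best: Dict[str, Tuple[int, float]] = {}
    for (site, month), vals in buckets.items():
        if len(vals) < min_count:  # too few readings to trust the mean
            continue
        avg = mean(vals)
        # Strictly greater: on a tie, the first month encountered is kept.
        if site not in best or avg > best[site][1]:
            best[site] = (month, avg)
    return best
```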

About the Data

46,483 observations collected from dive shop daily logs covering 42 dive sites across Japan (2006 to March 2026), with 11 outliers removed. Scrapers run three times a day via GitHub Actions to keep the data current.

🌊 Check Visibility Forecasts

View AI-powered 7-day visibility forecasts for 30+ dive sites across Japan.
