Script for creating text data files, graphs and 7z archives from Tutiempo dataset


Author Topic: Script for creating text data files, graphs and 7z archives from Tutiempo dataset  (Read 2526 times)


Offline TommyAst

  • Newbie
  • *
  • Posts: 43
  • Reputation: 17
Hi

Here is a script that creates text data files by year, plus one solid TXT file per station, from the page www.tutiempo.net/en/Climate/. It covers more than 300 selected stations with a synoptic ID (from the GSOD database): all Czech, Slovak and Polish stations, and more than 200 selected stations from the rest of the world.
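
As far as the post describes it, the solid file of a station is all of its yearly files joined together. A minimal sketch of such a merge step, assuming a hypothetical layout of one directory per station with files named YYYY.txt that each start with one header line; the function name and the layout are illustrative and not taken from the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join the yearly files of one station into one file.
# $1 = station directory containing YYYY.txt files (assumed layout)
# $2 = output file (should not itself match the YYYY.txt pattern)
merge_station_years() {
    local station_dir="$1" out_file="$2"
    # The glob expands in sorted order, so years come out chronologically.
    # awk keeps the header line of the first file and skips it in the rest.
    awk 'FNR == 1 && NR != 1 { next } { print }' \
        "$station_dir"/[0-9][0-9][0-9][0-9].txt > "$out_file"
}
```

If the yearly files have no header line, a plain cat over the same glob would do the job.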

The script also creates yearly graphs (temperature, precipitation, wind, visibility, pressure, humidity, and occurrence of rain, snow, thunderstorm and fog). The graphs are PNG images with maximum compression (1280x960, 256 colors).
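
A graph step of this kind could look roughly as follows. This is only a sketch under assumptions: the data layout ("YYYY-MM-DD value" per line), the file names and the function name are made up, and the real Gnuplot scripts in the archive may work quite differently. The function just prints a gnuplot script; the commented lines show how it might be piped to gnuplot and how ImageMagick could then reduce the image to 256 colors.

```shell
#!/usr/bin/env bash
# Print a gnuplot script for one station-year temperature graph.
# $1 = input data file (hypothetical layout: "YYYY-MM-DD value" per line)
# $2 = output PNG path
make_gnuplot_script() {
    local data_file="$1" png_file="$2"
    cat <<EOF
set terminal png size 1280,960
set output '${png_file}'
set xdata time
set timefmt '%Y-%m-%d'
set format x '%m/%d'
set ylabel 'Temperature (C)'
plot '${data_file}' using 1:2 with lines title 'Mean temperature'
EOF
}

# Example (not run here; needs gnuplot and ImageMagick installed):
# make_gnuplot_script 2013.txt 2013_T.png | gnuplot
# convert 2013_T.png -colors 256 -define png:compression-level=9 2013_T_small.png
```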

Finally, the script creates 7-Zip archives (maximum compression: LZMA/Ultra/1024 MiB/273): one with the solid station files, one with the yearly files, one with the yearly files and graphs, and one with the graphs.
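
For reference, these settings could map onto 7z command-line switches roughly like this. A minimal sketch, assuming "LZMA/Ultra/1024 MiB/273" means method / compression level / dictionary size / word size; the function name and the archive and directory names are made up for illustration and are not taken from the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print 7z switches for an LZMA "Ultra" archive.
# $1 = dictionary size in MiB (default 1024, as quoted in the post)
lzma_ultra_switches() {
    local dict_mib="${1:-1024}"
    # -t7z     : 7z archive format
    # -m0=lzma : LZMA compression method
    # -mx=9    : "Ultra" compression level
    # -md=Nm   : dictionary size in MiB
    # -mfb=273 : word size (fast bytes), 273 is the LZMA maximum
    printf '%s\n' "-t7z -m0=lzma -mx=9 -md=${dict_mib}m -mfb=273"
}

# Example (not run here; needs 7z installed):
# 7z a $(lzma_ultra_switches 1024) stations_solid.7z station_files/
```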

The script runs on Linux. You need these programs: Gnuplot, 7-Zip, wget, gawk or awk, and ImageMagick for converting images. 7-Zip uses a 1024 MiB dictionary, which requires about 11 GiB of RAM. You can use a smaller dictionary (16, 32, 48, 64, 96, 128, 256, 384, 512 or 768 MiB), which takes less RAM. Disk space used is about 2-3.5 GB (less after compression). The script takes about 10-25 hours to run (6-9 hours downloading, 3-9 hours creating and converting graphs, 1-4 hours 7z compression).
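
The RAM figure quoted above suggests a rough rule of thumb of about 11 times the dictionary size. The sketch below picks the largest dictionary from the listed sizes that fits a given amount of RAM under that rule. The factor 11 is extrapolated from the single data point in the post and is only an assumption; real memory use depends on the 7-Zip version and settings. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose an LZMA dictionary size for the available RAM.
# $1 = RAM available for compression, in MiB
# Prints the dictionary size in MiB, or returns 1 if even 16 MiB is too much.
pick_dict_mib() {
    local ram_mib="$1" best="" size
    for size in 16 32 48 64 96 128 256 384 512 768 1024; do
        # Assumed rule of thumb: compression needs about 11 x dictionary size.
        if [ $(( size * 11 )) -le "$ram_mib" ]; then
            best="$size"
        fi
    done
    [ -n "$best" ] || return 1
    printf '%s\n' "$best"
}

# Example: with 4 GiB free for compression
# pick_dict_mib 4096
```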

The number of files during the process is about 200 000 - 400 000 (most of them are graphs).

The bash script and the Gnuplot scripts are available here:
http://meteotommy.twilightsparkle.cz/Tutiempo_SCRIPTS.7z

First, you have to define your user name at the beginning of the bash script. You have to change the user name and the ${W} variable in the bash script. The data are in /home/${USER_NAME}/TUTIEMPO/ and the scripts (Gnuplot and bash) are in /home/${USERNAME}/SKRIPT/Tutiempo/
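
A configuration header along these lines would match the paths given above. This is only a sketch: it assumes that USER_NAME and USERNAME in the post refer to the same setting, the names DATA_DIR and SCRIPT_DIR are made up, and the ${W} variable is left out because the post does not say what it holds.

```shell
#!/usr/bin/env bash
# Assumption: USER_NAME and USERNAME in the post are the same setting.
# Falls back to the current login name if USER_NAME is not set.
USER_NAME="${USER_NAME:-$(id -un)}"

# Hypothetical variable names; the paths themselves are from the post.
DATA_DIR="/home/${USER_NAME}/TUTIEMPO"           # data
SCRIPT_DIR="/home/${USER_NAME}/SKRIPT/Tutiempo"  # bash and Gnuplot scripts

printf 'data:    %s\nscripts: %s\n' "$DATA_DIR" "$SCRIPT_DIR"
```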

Do not run these scripts in parallel in many places; it might overload the www.tutiempo.net server.
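
One way to avoid accidental parallel runs on the same machine is a simple lock directory, combined with a pause between requests. A minimal sketch, not part of the actual script: the lock path and function names are hypothetical, and a local lock cannot coordinate runs on different machines.

```shell
#!/usr/bin/env bash
# Hypothetical lock location; override LOCK_DIR to use another path.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/tutiempo_download.lock}"

# mkdir is atomic: it fails if the directory already exists,
# so only one process at a time can hold the lock.
acquire_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

# Example (not run here; needs wget and network access):
# if acquire_lock; then
#     # --wait / --random-wait pause between requests to spare the server
#     wget --wait=2 --random-wait --tries=3 --timeout=30 \
#          --input-file=station_urls.txt --directory-prefix=download/
#     release_lock
# else
#     echo "Another download seems to be running" >&2
# fi
```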
« Last Edit: October 08, 2014, 19:02:44 pm by TommyAst »



Offline TommyAst

Re: Script for creating text data files, graphs and 7z archives from Tutiempo dataset
« Reply #1 on: October 14, 2014, 13:02:01 pm »
The script has run successfully and some errors were fixed.

The run time is longer than estimated: about 30 hours creating the yearly TXT files, 20-50 hours creating and converting graphs, 1-4 hours 7z compression, and 30 hours creating the solid TXT files. The total time is 3.5-5 days.

The amount of data is 3.4 GiB of archives. Solid TXT files: about 360 MiB, 50 MiB after compression, 437 files. TXT files by year without graphs: 440 folders, 14 113 files, about 250 MiB, 51 MiB after compression. TXT files by year with graphs: 440 folders, 252 406 files, about 2.5 GiB, about 1.6 GiB compressed.

There is a mistake in the creation of the graphs of occurrence of rain, snow, thunder and fog in this version.

TXT data and graphs are available here:
http://meteotommy.twilightsparkle.cz/OBR/tutiempo_gsod/

And the whole 7z archives:
http://meteotommy.twilightsparkle.cz/Tutiempo_GSOD_Archives/


Offline pdjakow

  • Administrator
  • Father Dictator
  • *****
  • Posts: 2757
  • Age: 39
  • Location: Wrocław
  • Reputation: 197
  • Gender: Male
    • http://meteomodel.pl
Re: Script for creating text data files, graphs and 7z archives from Tutiempo dataset
« Reply #2 on: October 15, 2014, 10:25:51 am »
Thanks Tommy!

It's very useful. Data for the old Wroclaw site:
http://meteotommy.twilightsparkle.cz/OBR/tutiempo_gsod/POLAND/WROCLAW_I_124250/


GFS/WRF Numerical Models
http://meteomodel.pl

Offline charly

  • Restricted
  • President
  • *
  • Posts: 1078
  • Age: 2013
  • Location: Pabianice
  • Reputation: 44
  • Gender: Male
  • One man's freedom is another man's terror
Re: Script for creating text data files, graphs and 7z archives from Tutiempo dataset
« Reply #3 on: October 21, 2014, 12:42:06 pm »
Tommy, why do those links not work anymore? Is this only temporary?
also seen online as aqu32 or chochlik

Current conditions at the Pabianice weather station:
http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=IWOJEWDZ30

Offline TommyAst

Re: Script for creating text data files, graphs and 7z archives from Tutiempo dataset
« Reply #4 on: October 26, 2014, 23:21:03 pm »
Charly,

The HDD in the twilightsparkle.cz server had an error and the server was down for three days (a full backup had been created a few days earlier). Everything has now been restored from the backup.

Offline charly

Re: Script for creating text data files, graphs and 7z archives from Tutiempo dataset
« Reply #5 on: October 27, 2014, 00:10:32 am »
Thanks Tommy, but I had noticed it even before your post ;-)

Offline TommyAst

The HTML structure of the page was recently changed, so I rewrote the scripts.

Here is a new version of the data. Images are sorted by data type and there are new types: occurrence of fog, rain, snow and thunder. I have added some new stations; the number of selected stations in the world is now close to 350 (plus all stations in the Czech Republic, Slovakia and Poland).

http://meteotommy.twilightsparkle.cz/OBR/tutiempo_gsod/

The run for the complete dataset from Tutiempo Climate is in progress. But it is a huge amount of data and it takes a very long time.


Offline TommyAst

Re: Script for creating text data files, graphs and 7z archives from Tutiempo dataset
« Reply #7 on: August 23, 2015, 10:55:17 am »
A new version of the Tutiempo-GSOD graphs has been created, now with more stations and more types of graphs.

http://meteotommy.twilightsparkle.cz/OBR/tutiempo_gsod/

The run took 10 days and created 450 000 - 500 000 files and 12-13 GiB of data.

Offline TommyAst

Re: Script for creating text data files, graphs and 7z archives from Tutiempo dataset
« Reply #8 on: April 23, 2017, 23:44:52 pm »
Graphs for the last year, 2016, have been created and I added more stations from around the world.

Maxima and minima in the years 2015-2016 are taken from hourly data more often than in previous years.

The latest Tutiempo dataset:
http://meteotommy.twilightsparkle.cz/OBR/tutiempo_gsod/

It covers only the Czech Republic, Slovakia and Poland with all stations, plus around 500 selected stations from the rest of the world. I have also created data and graphs for the complete Tutiempo-GSOD database, but that is about 100 GiB (about 55 GiB as a 7z archive) and about 10 000 000 files.

The source data are at: https://en.tutiempo.net/climate
They originally come from the GSOD (Global Summary of the Day) database: ftp://ftp.ncdc.noaa.gov/pub/data/gsod/

Unfortunately, the HTML structure of Tutiempo changed again in March 2017. The parsing scripts for the continents-countries-stations levels stopped working, but the script for parsing the page with climate data still works. The new version of Gnuplot also does not work so well with the scripts; strangely, it produces graphs with different colors.