Ogimet reap from html to txt or csv




TommyAst

  • Junior Member
  • Posts: 160
  • Reputation: 18
Ogimet reap from html to txt or csv
« on: April 24, 2014, 17:18:54 pm »
To extract daily extremes for a state, a continent, or the whole world (from the Ogimet data ranking, which starts in September 1999), use:

#!/bin/bash

W="/home/user/POC/OGIMET/" ## main directory for the downloaded HTML files and the generated TXT/CSV files
mkdir -p "${W}Ogimet_Data/" ## output directory for the CSV files written below

for State in Czech_Republic_Czec Slovakia_Slova Europe_Eur; do ## states and continents to extract extremes for
## for State in Europe_Eur; do

first_day=${first_day:-2013-05-01} ## first day to process
## you can use either format for the first day: 2010-04-01 or 20100401
number_of_days=${number_of_days:-125} ## number of days (starting with the first day)
ofs=${ofs:-"\t"} # ' ' '#' ## field separator for the output file (not used below; the output is ';'-separated)

for ((n=0;n<$number_of_days;n++)); do
day=`date +"%Y%m%d" -d "$first_day $n day"`
out=`date +"%d.%m.%Y" -d "$first_day $n day"`

## here the HTML stream is parsed and the extreme values are extracted
## the HTML files must already be in ${W}Ogimet_Data_Ranking/${State}/${day::4}/${day::6}/ (you can also download them from the server)
cd ${W}Ogimet_Data_Ranking/${State}/${day::4}/${day::6}/ ##
XXX=`cat ${day}.htm | tr "\n" " " | sed 's; *<tr;\n<tr;g' | grep Td | cut -d\> -f 3,6,11 | sed 's;<[^>]*>;@;g' | sed 's;<[^>]*;@;' | sed 's;[()/,];;g' | grep "^ *1 " | cut -d "&" -f 1 | tr "\n" ";" | tr "/" " " | sed 's/  */ /g' | sed 's/ mm//g' | sed 's/ ; /;/g' | tr " " "_" | sed 's;_@;@;g' | tr "@" ";"`
## XXX=`cat ${day}.htm | tr "\n" " " | sed 's; *<tr;\n<tr;g' | grep Td | cut -d\> -f 3,6,11 | sed 's;<[^>]*>;@;g' | sed 's;<[^>]*;@;' | sed 's;[()/,];;g' | cut -d "&" -f 1 | tr "\n" ";" | tr "/" " " | sed 's/  */ /g' | sed 's/ mm//g' | sed 's/ ; /;/g' | tr " " "_" | sed 's;_@;@;g' | tr "@" ";" | sed 's/;;/;/g' | sed 's/;1;/;@;1;/g' | sed 's/_1;/@;1;/g' | sed 's/;_/;/g' | sed 's/_;/;/g'`
## Q=`echo "${out}@${XXX}" | sed 's/@_/;/g' | sed 's/\;\;/;/g'`
Q=`echo "${out};${XXX}" | sed 's/\;\;/;/g'`
echo "${Q}"  >> ${W}Ogimet_Data/${State}.csv ## write date and extremes
done ## all days for one state/continent/world ready
echo "Ready ${State}"
done ## All states ready

echo 'ALL READY'

## cat ${day}.htm | tr "\n" " " | sed 's; *<tr;\n<tr;g' | grep Td | cut -d\> -f 3,6,11 | sed 's;<[^>]*>; ;g' | sed 's;<[^>]*;;' | sed 's;[()/];;g' | grep "^ *1 " | cut -d "&" -f 1 | tr "\n" ";" | tr "/" " " | sed 's/  */ /g' | sed 's/ mm//g' ## all values on the page
## grep "^ *1 " selects which rank of extremes is kept (1st, 2nd, 3rd ... up to 512); here it is the first
## cat ${day}.htm | tr "\n" " " | sed 's; *<tr;\n<tr;g' | grep Td | cut -d\> -f 3,6,11 | sed 's;<[^>]*>;@;g' | sed 's;<[^>]*;@;' | sed 's;[()/,];;g' | cut -d "&" -f 1 | tr "\n" ";" | tr "/" " " | sed 's/  */ /g' | sed 's/ mm//g' | sed 's/ ; /;/g' | tr " " "_" | sed 's;_@;@;g' | tr "@" ";" | sed 's/;;/;/g' | sed 's/;1;/;@;1;/g' | sed 's/_1;/@;1;/g' | sed 's/;_/;/g' | sed 's/_;/;/g'
## whole table: values separated by ';', stations separated by '@'
## cat ${day}.htm | tr "\n" " " | sed 's; *<tr;\n<tr;g' | grep Td | cut -d\> -f 3,6,11 | sed 's;<[^>]*>; ;g' | sed 's;<[^>]*;;' | sed 's;[()/];;g' | tr "/" " " | sed 's/  */ /g' | sed 's/ mm//g' | cut -d "&" -f 1 ## writing the whole table
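The rank filter `grep "^ *1 "` is hard-coded to first place. A minimal sketch of the same step with the rank as a parameter, assuming (as that pattern does) that each parsed row begins with optional spaces, the rank number, and a space. The function name `keep_rank` is hypothetical, and the sample rows are made up, not real Ogimet output:

```shell
#!/bin/bash
# keep_rank N: print only the rows whose leading number is exactly N.
# Assumes the row layout implied by grep "^ *1 ": optional spaces, rank, space.
keep_rank() {
  local rank=$1
  # Accept only a plain number, so the argument cannot inject regex syntax.
  [[ $rank =~ ^[0-9]+$ ]] || return 2
  # The trailing space stops rank 1 from also matching 10, 11, 12, ...
  grep -E "^ *${rank} "
}

# Usage with made-up rows:
printf ' 1 Station_A 35.2\n 2 Station_B 34.9\n 10 Station_C 31.0\n' | keep_rank 1
```

In the script, `grep "^ *1 "` could then become `keep_rank 1` (or `keep_rank 2` for second place, and so on).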
« Last edit: April 24, 2014, 17:35:11 pm by TommyAst »
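The day loop in the script relies on GNU `date` to add `$n day` to the first day. The same idea as a small standalone function, a sketch assuming GNU coreutils `date` (`-d` is a GNU extension). `list_days` is a hypothetical name, and `-u` is added so the calculation is done in UTC and does not depend on the local time zone:

```shell
#!/bin/bash
# list_days FIRST_DAY N: print N consecutive dates (YYYYMMDD) starting at FIRST_DAY.
# Requires GNU date; -u makes the date arithmetic run in UTC.
list_days() {
  local first_day=$1 number_of_days=$2 n
  for ((n = 0; n < number_of_days; n++)); do
    date -u -d "$first_day $n day" +%Y%m%d || return 1
  done
}

# Usage: three days starting on 1 May 2013
list_days 2013-05-01 3
```

Each printed value plays the role of `${day}` in the script, so `${day::4}` and `${day::6}` still give the year and year-month directory names.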



TommyAst
Re: Ogimet reap from html to txt or csv
« Reply #1 on: April 24, 2014, 17:19:19 pm »
The next script is for Ogimet daily data (from ISD SYNOP) and converts it from HTML to TXT or CSV. The station list is for the Czech Republic and Slovakia; for Poland you can insert a different station list. Ideally the columns in the final text files would be sorted, but sorting columns parsed from the HTML table is difficult.
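The script below builds one `gsynres` URL per station and month. A sketch of that URL construction as a separate function, with the query parameters copied from the script (`ind`, `ndays`, `ano`, `mes`, `day`, `hora`). The function name `gsynres_url` is hypothetical, and whether Ogimet still accepts exactly these parameters is an assumption:

```shell
#!/bin/bash
# gsynres_url STATION YEAR MONTH DAY HOUR NDAYS
# Prints the Ogimet daily-summary URL in the same form the script uses.
# STATION is the 6-digit ID from the station list; only its first five
# digits go into ind=, mirroring ${ST::5} in the script.
gsynres_url() {
  local st=$1 y=$2 m=$3 d=$4 h=$5 ndays=$6
  printf 'http://www.ogimet.com/cgi-bin/gsynres?lang=en&ind=%s&ndays=%s&ano=%s&mes=%s&day=%s&hora=%s&ord=DIR&Send=Send\n' \
    "${st:0:5}" "$ndays" "$y" "$m" "$d" "$h"
}

# Usage: 31 days ending on 31 January 2013, 23 UTC, for station 117230 (Brno_Turany)
gsynres_url 117230 2013 01 31 23 31
```

The result can be passed to `wget --user-agent="Mozilla/5.0" -qO- "$(gsynres_url ...)"` exactly as `${URL}` is in the script.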

#!/bin/bash

W="/home/USER/POC/" ## MAIN DIRECTORY FOR DATA

## parsing GSOD data from the Ogimet web site is not as difficult as parsing Ogimet daily data:
## wget -qO- "http://www.ogimet.com/cgi-bin/gsodres?lang=en&mode=0&state=Czec&ind=11520&ord=DIR&ano=2013&mes=01&day=31&ndays=31" | sed "s/<img/<TD @/g" | grep "TD" | sed "s/font color=/@/g" | sed "s/>----</><@----><font></g" | sed "s/ align=\"center\" bgcolor=/><@></g" | cut -d "@" -f 2 | sed "s/src=\"\/sigw/\"\#000000\">/g" | sed "s/\/sw_/sw_/g" | sed "s/\" alt=\"F\">/<\/font><\/TD>/g" | sed "s/TD align=/font/g" | sed "s;\(/[0-9]*<\);\1font><;" | grep font | sed "s/><\"\#/\&/g" | cut -d "<" -f 1 | sed "s/\&\([A-Z]*\">\)/>\&/g" | cut -d ">" -f 2 | tr "\n" ";" | tr "\&" "\n" | tail -n+2 | sed "s/sw_niebla.png;/FG /g" | sed "s/sw_lluvia.png;/DE /g" | sed "s/sw_nieve.png;/SN /g" | sed "s/sw_tormenta.png;/TS /g" | sed "s/sw_nsw_azul.png;//g"

## Ogimet daily summaries without column sorting and with translated weather symbols:
## wget --user-agent="Mozilla/5.0" -qO- "${URL}" | sed "s/<[^>]*src=\"\/sigw\/\([^>\"]*\)\"[^>]*>/\1/g" | sed "s/<td>/\t/g" | sed "s/<[^>]*>//g" | tr "\n" " " | sed "s/  *sw_nsw/ sw_nsw/g" | sed "s/    */\n/g" | grep "png" | sed "s/^\([0-9]*\)\/\([0-9]*\)/\2.\1.${Y}/" | sed "s/ \([^s]\)/;\1/g" | sed "s/ /;/" | sed "s/sw_bruma_noche.png/X/g" | sed "s/sw_bruma.png/X/g" | sed "s/sw_nsw_azul.png/XX/g" | sed "s/sw_nsw_azul_noche.png/XX/g" | sed "s/sw_cubierto.png/Z/g" | sed "s/sw_cubierto_noche.png/Z/g" | sed "s/sw_lunanub.png/PJ/g" | sed "s/sw_nuboso.png/PJ/g" | sed "s/sw_luna.png/J/g" | sed "s/sw_sol.png/J/g" | sed "s/sw_lluvia.png/DE/g" | sed "s/sw_lluvia_noche.png/DE/g" | sed "s/sw_lluvia_d.png/DEl/g" | sed "s/sw_lluvia_d_noche.png/DEl/g" | sed "s/sw_llovizna_noche.png/DEm/g" | sed "s/sw_llovizna.png/DEm/g" | sed "s/sw_nieve.png/SN/g" | sed "s/sw_nieve_noche.png/SN/g" | sed "s/sw_nieve_d.png/SNl/g" | sed "s/sw_nieve_d_noche.png/SNl/g" | sed "s/sw_aguanieve.png/DE_SN/g" | sed "s/sw_aguanieve_noche.png/DE_SN/g" | sed "s/sw_lluvia_eng.png/DE_LD/g" | sed "s/sw_lluvia_eng_noche.png/DE_LD/g" | sed "s/sw_chu_lluvia.png/DP/g" | sed "s/sw_chu_lluvia_noche.png/DP/g" | sed "s/sw_chu_nieve.png/SP/g" | sed "s/sw_chu_nieve_noche.png/SP/g" | sed "s/sw_aguanieve_gran.png/DE_SNk/g" | sed "s/sw_aguanieve_gran_noche.png/DE_SNk/g" | sed "s/sw_cb_sct.png/CB/g" | sed "s/sw_cb_sct_noche.png/CB/g" | sed "s/sw_relampago.png/BL/g" | sed "s/sw_relampago_noche.png/BL/g" | sed "s/sw_tormenta_noche.png/Bourka/g" | sed "s/sw_tormenta.png/Bourka/g" | sed "s/sw_niebla.png/FG/g" | sed "s/sw_niebla_noche.png/FG/g" | sed "s/sw_niebla_eng.png/FG_Freezing/g" | sed "s/sw_niebla_eng_noche.png/FG_Freezing/g" | sed "s/sw_cencellada.png/FG_Ice/g" | sed "s/sw_cencellada_noche.png/FG_Ice/g" | sed "s/sw_cri_hielo_noche.png/SN_Cristals/g" | sed "s/sw_cri_hielo.png/SN_Cristals/g" | sed "s/sw_nieve_gran.png/SNk/g" | sed "s/sw_nieve_gran_noche.png/SNk/g" | sed "s/sw_calima.png/Opar/g" | sed "s/sw_calima_noche.png/Opar/g" | sed "s/----//g" | sed "s/---//g" | sed "s/;--;/;;/g" | sed "s/;-;/;;/g" | sed "s/;Tr;/;0.04;/g" >> ${Stanice}_${ST}.csv

## Parsing the page with extremes (Ogimet data ranking): keep rank 1 or n (first XXX/Q suggestion), or extract all values on the page, at most 512 (second XXX/Q suggestion)
## XXX=`cat ${day}.htm | tr "\n" " " | sed 's; *<tr;\n<tr;g' | grep Td | cut -d\> -f 3,6,11 | sed 's;<[^>]*>;@;g' | sed 's;<[^>]*;@;' | sed 's;[()/,];;g' | grep "^ *1 " | cut -d "&" -f 1 | tr "\n" ";" | tr "/" " " | sed 's/  */ /g' | sed 's/ mm//g' | sed 's/ ; /;/g' | tr " " "_" | sed 's;_@;@;g' | tr "@" ";"`
## XXX=`cat ${day}.htm | tr "\n" " " | sed 's; *<tr;\n<tr;g' | grep Td | cut -d\> -f 3,6,11 | sed 's;<[^>]*>;@;g' | sed 's;<[^>]*;@;' | sed 's;[()/,];;g' | cut -d "&" -f 1 | tr "\n" ";" | tr "/" " " | sed 's/  */ /g' | sed 's/ mm//g' | sed 's/ ; /;/g' | tr " " "_" | sed 's;_@;@;g' | tr "@" ";" | sed 's/;;/;/g' | sed 's/;1;/;@;1;/g' | sed 's/_1;/@;1;/g' | sed 's/;_/;/g' | sed 's/_;/;/g'`
## Q=`echo "${out}@${XXX}" | sed 's/@_/;/g' | sed 's/\;\;/;/g'`
## Q=`echo "${out};${XXX}" | sed 's/\;\;/;/g'`

## save page tmp.html ## wget --user-agent="Mozilla/5.0" -qO- "${URL}" >  tmp.html
## reap text to processed.txt ##  wget --user-agent="Mozilla/5.0" -qO- "${URL}" | sed "s/<[^>]*src=\"\/sigw\/\([^>\"]*\)\"[^>]*>/\1/g" | sed "s/<[^>]*>//g" | tr "\n" " " | sed "s/  *sw_nsw/ sw_nsw/g" | sed "s/    */\n/g" | grep "png" | sed "s/^\([0-9]*\)\/\([0-9]*\)/\2.\1.${Y}/" | sed "s/ \([^s]\)/;\1/g" | sed "s/ /;/" >processed.txt

## column headers in the Ogimet table (sloupec = column) ##  "\">Date" "\">Daily" "\">Snow" "\">Vis" "\">Sun" "\">Cl" "\">Prec." "\">Pres." "\">Gust" "\">Int." "\">Dir." "\">Hr." "\">Avg" "\">min" "\">Max"
if [ "$sloupec" = "\">Date" ]; then i=64; fi
if [ "$sloupec" = "\">Max" ]; then i=1; fi
if [ "$sloupec" = "\">min" ]; then i=2; fi
if [ "$sloupec" = "\">Avg" ]; then i=3; fi
if [ "$sloupec" = "\">Hr." ]; then i=4; fi
if [ "$sloupec" = "\">Dir." ]; then i=5; fi
if [ "$sloupec" = "\">Int." ]; then i=6; fi
if [ "$sloupec" = "\">Gust" ]; then i=7; fi
if [ "$sloupec" = "\">Pres." ]; then i=8; fi
if [ "$sloupec" = "\">Prec." ]; then i=9; fi
if [ "$sloupec" = "\">Cl" ]; then i=10; fi
if [ "$sloupec" = "\">Sun" ]; then i=11; fi
if [ "$sloupec" = "\">Vis" ]; then i=12; fi
if [ "$sloupec" = "\">Snow" ]; then i=13; fi
if [ "$sloupec" = "\">Daily" ]; then i=14; fi
## sorting columns (insert an empty field where a column is missing) ## for sloupec in "\">Date" "\">Daily" "\">Snow" "\">Vis" "\">Sun" "\">Cl" "\">Prec." "\">Pres." "\">Gust" "\">Int." "\">Dir." "\">Hr." "\">Avg" "\">min" "\">Max"; do if [ `grep "<TH[^>]*>$sloupec</TH>" tmp.html | wc -l` -eq 0 ]; then i=55; cat processed.txt | sed "s/ \([^s]\)/;\1/g" | sed "s/ /;/" | sed "s/\(\([^;]*;\)\{$i\}\)/\1;/" >tmp.txt; mv tmp.txt processed.txt; fi; done
## rename weather symbols ## cat processed.txt | sed "s/sw_bruma_noche.png/X/g" | sed "s/sw_bruma.png/X/g" | sed "s/sw_nsw_azul.png/XX/g" | sed "s/sw_nsw_azul_noche.png/XX/g" | sed "s/sw_cubierto.png/Z/g" | sed "s/sw_cubierto_noche.png/Z/g" | sed "s/sw_lunanub.png/PJ/g" | sed "s/sw_nuboso.png/PJ/g" | sed "s/sw_luna.png/J/g" | sed "s/sw_sol.png/J/g" | sed "s/sw_lluvia.png/DE/g" | sed "s/sw_lluvia_noche.png/DE/g" | sed "s/sw_lluvia_d.png/DEl/g" | sed "s/sw_lluvia_d_noche.png/DEl/g" | sed "s/sw_llovizna_noche.png/DEm/g" | sed "s/sw_llovizna.png/DEm/g" | sed "s/sw_nieve.png/SN/g" | sed "s/sw_nieve_noche.png/SN/g" | sed "s/sw_nieve_d.png/SNl/g" | sed "s/sw_nieve_d_noche.png/SNl/g" | sed "s/sw_aguanieve.png/DE_SN/g" | sed "s/sw_aguanieve_noche.png/DE_SN/g" | sed "s/sw_lluvia_eng.png/DE_LD/g" | sed "s/sw_lluvia_eng_noche.png/DE_LD/g" | sed "s/sw_chu_lluvia.png/DP/g" | sed "s/sw_chu_lluvia_noche.png/DP/g" | sed "s/sw_chu_nieve.png/SP/g" | sed "s/sw_chu_nieve_noche.png/SP/g" | sed "s/sw_aguanieve_gran.png/DE_SNk/g" | sed "s/sw_aguanieve_gran_noche.png/DE_SNk/g" | sed "s/sw_cb_sct.png/CB/g" | sed "s/sw_cb_sct_noche.png/CB/g" | sed "s/sw_relampago.png/BL/g" | sed "s/sw_relampago_noche.png/BL/g" | sed "s/sw_tormenta_noche.png/Bourka/g" | sed "s/sw_tormenta.png/Bourka/g" | sed "s/sw_niebla.png/FG/g" | sed "s/sw_niebla_noche.png/FG/g" | sed "s/sw_niebla_eng.png/FG_Freezing/g" | sed "s/sw_niebla_eng_noche.png/FG_Freezing/g" | sed "s/sw_cencellada.png/FG_Ice/g" | sed "s/sw_cencellada_noche.png/FG_Ice/g" | sed "s/sw_cri_hielo_noche.png/SN_Cristals/g" | sed "s/sw_cri_hielo.png/SN_Cristals/g" | sed "s/sw_nieve_gran.png/SNk/g" | sed "s/sw_nieve_gran_noche.png/SNk/g" | sed "s/sw_calima.png/Opar/g" | sed "s/sw_calima_noche.png/Opar/g" | sed "s/----//g" | sed "s/---//g" | sed "s/;--;/;;/g" | sed "s/;-;/;;/g" | sed "s/;Tr;/;0.04;/g" >> ${Stanice}_${ST}.csv
## sleep ${Pause} ## Pause is only set below; the pause is applied inside the download loop


Yst=1999 ## first year
Yend=2014 ## last year
Pause='0.36' ## pause after every file
Pause2='5.54' ## pause between stations

mkdir -p ${W}Ogimet_Data/Ogimet_Data_CSV/CRSK/ ## directory with data
cd ${W}Ogimet_Data/Ogimet_Data_CSV/CRSK/

for ST in 117230 114140 117820 115180 118160 117220 115460 116240 117660 114231 115410 114060 119160 114570 115090 119276 118800 116930 115400 116480 118580 119930 114232 114870 119680 119271 116360 116280 116980 114233 114234 116030 119180 119300 119270 117100 117870 114180 114640 119780 116920 118550 116520 116430 118260 119274 114500 114480 119340 117350 115670 115200 117480 117500 116590 118670 119273 114230 117300 119030 119275 119330 119760 116830 119380 115380 114380 115020 116790 118650 118410 117740 116690 118560 118190 118010 119550 119580 119520; do ## Stations to do

if [ "$ST" = "117230" ]; then Stanice="Brno_Turany"; fi  ## list of stations - ID and name
if [ "$ST" = "114140" ]; then Stanice="Karlovy_Vary"; fi
if [ "$ST" = "117820" ]; then Stanice="Ostrava_Mosnov"; fi
if [ "$ST" = "115180" ]; then Stanice="Praga_Ruzyne"; fi
if [ "$ST" = "118160" ]; then Stanice="Bratislava_Ivanka"; fi
if [ "$ST" = "117220" ]; then Stanice="SOKOLNICE_BRNO"; fi
if [ "$ST" = "115460" ]; then Stanice="ROZNO_C_BUDEJOVICE"; fi
if [ "$ST" = "116240" ]; then Stanice="CASLAV"; fi
if [ "$ST" = "117660" ]; then Stanice="CERVENA"; fi
if [ "$ST" = "114231" ]; then Stanice="CESKE_BUDEJOVICE"; fi
if [ "$ST" = "115410" ]; then Stanice="CESKE_BUDEJOVICE"; fi
if [ "$ST" = "114060" ]; then Stanice="CHEB"; fi
if [ "$ST" = "119160" ]; then Stanice="CHOPOK"; fi
if [ "$ST" = "114570" ]; then Stanice="CHURANOV"; fi
if [ "$ST" = "115090" ]; then Stanice="DOKSANY"; fi
if [ "$ST" = "119276" ]; then Stanice="Dolny_Hricov"; fi
if [ "$ST" = "118800" ]; then Stanice="DUDINCE"; fi
if [ "$ST" = "116930" ]; then Stanice="DUKOVANY"; fi
if [ "$ST" = "115400" ]; then Stanice="HOSIN"; fi
if [ "$ST" = "116480" ]; then Stanice="HRADEC_KRALOVE"; fi
if [ "$ST" = "118580" ]; then Stanice="HURBANOVO"; fi
if [ "$ST" = "119930" ]; then Stanice="KAMENICA_NAD_CIROCH"; fi
if [ "$ST" = "114232" ]; then Stanice="KBELY"; fi
if [ "$ST" = "114870" ]; then Stanice="KOCELOVICE"; fi
if [ "$ST" = "119680" ]; then Stanice="Kosice"; fi
if [ "$ST" = "119271" ]; then Stanice="Kosice_Barca"; fi
if [ "$ST" = "116360" ]; then Stanice="KOSTELNI_MYSLOVA"; fi
if [ "$ST" = "116280" ]; then Stanice="KRAMOLIN_KRESIN"; fi
if [ "$ST" = "116980" ]; then Stanice="KUCHAROVICE"; fi
if [ "$ST" = "114233" ]; then Stanice="KUNOVICE"; fi
if [ "$ST" = "114234" ]; then Stanice="LIBEREC"; fi
if [ "$ST" = "116030" ]; then Stanice="LIBEREC"; fi
if [ "$ST" = "119180" ]; then Stanice="LIESEK"; fi
if [ "$ST" = "119300" ]; then Stanice="LOMNICKY_STIT"; fi
if [ "$ST" = "119270" ]; then Stanice="LUCENEC"; fi
if [ "$ST" = "117100" ]; then Stanice="LUKA"; fi
if [ "$ST" = "117870" ]; then Stanice="LYSA_HORA"; fi
if [ "$ST" = "114180" ]; then Stanice="MARIANSKE_LAZNE"; fi
if [ "$ST" = "114640" ]; then Stanice="MILESOVKA"; fi
if [ "$ST" = "119780" ]; then Stanice="MILHOSTOV"; fi
if [ "$ST" = "116920" ]; then Stanice="NAMEST_NAD_OSLAV"; fi
if [ "$ST" = "118550" ]; then Stanice="NITRA"; fi
if [ "$ST" = "116520" ]; then Stanice="PARDUBICE"; fi
if [ "$ST" = "116430" ]; then Stanice="PEC_POD_SNEZKOU"; fi
if [ "$ST" = "118260" ]; then Stanice="Piestany"; fi
if [ "$ST" = "119274" ]; then Stanice="Piestany"; fi
if [ "$ST" = "114500" ]; then Stanice="MIKULKA_PIZEN"; fi
if [ "$ST" = "114480" ]; then Stanice="PLZEN_LINE"; fi
if [ "$ST" = "119340" ]; then Stanice="Poprad_Tatry"; fi
if [ "$ST" = "117350" ]; then Stanice="PRADED_MOUNTAIN"; fi
if [ "$ST" = "115670" ]; then Stanice="KBELY_PRAHA"; fi
if [ "$ST" = "115200" ]; then Stanice="LIBUS_PRAHA"; fi
if [ "$ST" = "117480" ]; then Stanice="PREROV"; fi
if [ "$ST" = "117500" ]; then Stanice="PREROV"; fi
if [ "$ST" = "116590" ]; then Stanice="PRIBYSLAV"; fi
if [ "$ST" = "118670" ]; then Stanice="PRIEVIDZA"; fi
if [ "$ST" = "119273" ]; then Stanice="Prievidza"; fi
if [ "$ST" = "114230" ]; then Stanice="PRIMDA"; fi
if [ "$ST" = "117300" ]; then Stanice="SERAK"; fi
if [ "$ST" = "119030" ]; then Stanice="Sliac"; fi
if [ "$ST" = "119275" ]; then Stanice="Sliac"; fi
if [ "$ST" = "119330" ]; then Stanice="STRBSKE_PLESO"; fi
if [ "$ST" = "119760" ]; then Stanice="STROPKOV_TISINEC"; fi
if [ "$ST" = "116830" ]; then Stanice="SVRATOUCH"; fi
if [ "$ST" = "119380" ]; then Stanice="TELGART"; fi
if [ "$ST" = "115380" ]; then Stanice="TEMELIN"; fi
if [ "$ST" = "114380" ]; then Stanice="TUSIMICE"; fi
if [ "$ST" = "115020" ]; then Stanice="USTI_NAD_LABEM"; fi
if [ "$ST" = "116790" ]; then Stanice="USTI_NAD_ORLICI"; fi
if [ "$ST" = "118650" ]; then Stanice="ZILINA"; fi
if [ "$ST" = "118410" ]; then Stanice="ZILINA_HRICOV"; fi
if [ "$ST" = "117740" ]; then Stanice="HOLESOV"; fi
if [ "$ST" = "116690" ]; then Stanice="Polom"; fi
if [ "$ST" = "118560" ]; then Stanice="Mochovce"; fi
if [ "$ST" = "118190" ]; then Stanice="Jaslovske_Bohunice"; fi
if [ "$ST" = "118010" ]; then Stanice="Malacky"; fi
if [ "$ST" = "119550" ]; then Stanice="Presov"; fi
if [ "$ST" = "119580" ]; then Stanice="Kojovska_Hola"; fi
if [ "$ST" = "119520" ]; then Stanice="Poprad_Ganovce"; fi

A=${Yst}
B=${Yend}
for Y in `seq $A $B`; do
for M in 01 02 03 04 05 06 07 08 09 10 11 12; do
K="$Y$M"

if [ "$M" = "01" ]; then C=31; fi ## Number of days in the month
if [ "$M" = "02" ]; then C=28; fi
if [ "$K" = "189602" ]; then C=29; fi
if [ "$K" = "190402" ]; then C=29; fi
if [ "$K" = "190802" ]; then C=29; fi
if [ "$K" = "191202" ]; then C=29; fi
if [ "$K" = "191602" ]; then C=29; fi
if [ "$K" = "192002" ]; then C=29; fi
if [ "$K" = "192402" ]; then C=29; fi
if [ "$K" = "192802" ]; then C=29; fi
if [ "$K" = "193202" ]; then C=29; fi
if [ "$K" = "193602" ]; then C=29; fi
if [ "$K" = "194002" ]; then C=29; fi
if [ "$K" = "194402" ]; then C=29; fi
if [ "$K" = "194802" ]; then C=29; fi
if [ "$K" = "195202" ]; then C=29; fi
if [ "$K" = "195602" ]; then C=29; fi
if [ "$K" = "196002" ]; then C=29; fi
if [ "$K" = "196402" ]; then C=29; fi
if [ "$K" = "196802" ]; then C=29; fi
if [ "$K" = "197202" ]; then C=29; fi
if [ "$K" = "197602" ]; then C=29; fi
if [ "$K" = "198002" ]; then C=29; fi
if [ "$K" = "198402" ]; then C=29; fi
if [ "$K" = "198802" ]; then C=29; fi
if [ "$K" = "199202" ]; then C=29; fi
if [ "$K" = "199602" ]; then C=29; fi
if [ "$K" = "200002" ]; then C=29; fi
if [ "$K" = "200402" ]; then C=29; fi
if [ "$K" = "200802" ]; then C=29; fi
if [ "$K" = "201202" ]; then C=29; fi
if [ "$M" = "03" ]; then C=31; fi
if [ "$M" = "04" ]; then C=30; fi
if [ "$M" = "05" ]; then C=31; fi
if [ "$M" = "06" ]; then C=30; fi
if [ "$M" = "07" ]; then C=31; fi
if [ "$M" = "08" ]; then C=31; fi
if [ "$M" = "09" ]; then C=30; fi
if [ "$M" = "10" ]; then C=31; fi
if [ "$M" = "11" ]; then C=30; fi
if [ "$M" = "12" ]; then C=31; fi

X=${C} ## number of days in the month (X=C)
for D in ${C}; do ## last day in month
## for D in `seq -w $C`; do ## daily
for H in 23; do
## DATE="${Y}${M}${D}${H}" ## for html files in a directory
## cat ${W}CRSK/${DATE::4}/${DATE::6}/${DATE::8}/${DATE::8}.htm ## base directory of the saved HTML files (adjust as needed)

## for html files from web server
URL="http://www.ogimet.com/cgi-bin/gsynres?lang=en&ind=${ST::5}&ndays=${X}&ano=${Y}&mes=${M}&day=${D}&hora=${H}&ord=DIR&Send=Send" ## Ogimet daily data for one month
wget --user-agent="Mozilla/5.0" -qO- "${URL}" | sed "s/<[^>]*src=\"\/sigw\/\([^>\"]*\)\"[^>]*>/\1/g" | sed "s/<td>/\t/g" | sed "s/<[^>]*>//g" | tr "\n" " " | sed "s/  *sw_nsw/ sw_nsw/g" | sed "s/    */\n/g" | grep "png" | sed "s/^\([0-9]*\)\/\([0-9]*\)/\2.\1.${Y}/" | sed "s/ \([^s]\)/;\1/g" | sed "s/ /;/" | sed "s/sw_bruma_noche.png/X/g" | sed "s/sw_bruma.png/X/g" | sed "s/sw_nsw_azul.png/XX/g" | sed "s/sw_nsw_azul_noche.png/XX/g" | sed "s/sw_cubierto.png/Z/g" | sed "s/sw_cubierto_noche.png/Z/g" | sed "s/sw_lunanub.png/PJ/g" | sed "s/sw_nuboso.png/PJ/g" | sed "s/sw_luna.png/J/g" | sed "s/sw_sol.png/J/g" | sed "s/sw_lluvia.png/DE/g" | sed "s/sw_lluvia_noche.png/DE/g" | sed "s/sw_lluvia_d.png/DEl/g" | sed "s/sw_lluvia_d_noche.png/DEl/g" | sed "s/sw_llovizna_noche.png/DEm/g" | sed "s/sw_llovizna.png/DEm/g" | sed "s/sw_nieve.png/SN/g" | sed "s/sw_nieve_noche.png/SN/g" | sed "s/sw_nieve_d.png/SNl/g" | sed "s/sw_nieve_d_noche.png/SNl/g" | sed "s/sw_aguanieve.png/DE_SN/g" | sed "s/sw_aguanieve_noche.png/DE_SN/g" | sed "s/sw_lluvia_eng.png/DE_LD/g" | sed "s/sw_lluvia_eng_noche.png/DE_LD/g" | sed "s/sw_chu_lluvia.png/DP/g" | sed "s/sw_chu_lluvia_noche.png/DP/g" | sed "s/sw_chu_nieve.png/SP/g" | sed "s/sw_chu_nieve_noche.png/SP/g" | sed "s/sw_aguanieve_gran.png/DE_SNk/g" | sed "s/sw_aguanieve_gran_noche.png/DE_SNk/g" | sed "s/sw_cb_sct.png/CB/g" | sed "s/sw_cb_sct_noche.png/CB/g" | sed "s/sw_relampago.png/BL/g" | sed "s/sw_relampago_noche.png/BL/g" | sed "s/sw_tormenta_noche.png/Bourka/g" | sed "s/sw_tormenta.png/Bourka/g" | sed "s/sw_niebla.png/FG/g" | sed "s/sw_niebla_noche.png/FG/g" | sed "s/sw_niebla_eng.png/FG_Freezing/g" | sed "s/sw_niebla_eng_noche.png/FG_Freezing/g" | sed "s/sw_cencellada.png/FG_Ice/g" | sed "s/sw_cencellada_noche.png/FG_Ice/g" | sed "s/sw_cri_hielo_noche.png/SN_Cristals/g" | sed "s/sw_cri_hielo.png/SN_Cristals/g" | sed "s/sw_nieve_gran.png/SNk/g" | sed "s/sw_nieve_gran_noche.png/SNk/g" | sed "s/sw_calima.png/Opar/g" | sed "s/sw_calima_noche.png/Opar/g" | sed "s/----//g" | sed "s/---//g" | sed "s/;--;/;;/g" | sed "s/;-;/;;/g" | sed "s/;Tr;/;0.04;/g" >> ${Stanice}_${ST}.csv
sleep ${Pause}

done ## hours
done ## days
done ## months
done ## years
## fi
sleep ${Pause2}
CU=`date -d today`
echo "Ready ${Stanice} ${ST} $CU"
done ## stations

echo 'READY CZECH AND SLOVAKIA'
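The month-length block in the script hard-codes the leap-year Februaries and only lists them up to 2012, so with a later `Yend` February 2016 would get 28 days. A sketch of computing the month length with the Gregorian leap-year rule in plain Bash arithmetic; `days_in_month` is a hypothetical name:

```shell
#!/bin/bash
# days_in_month YEAR MONTH: print the number of days in that month (Gregorian calendar).
# MONTH may be zero-padded ("02"); 10# forces base 10 so "08" and "09" are not read as octal.
days_in_month() {
  local y=$((10#$1)) m=$((10#$2))
  case $m in
    1|3|5|7|8|10|12) echo 31 ;;
    4|6|9|11) echo 30 ;;
    2)
      # Leap year: divisible by 4, except centuries that are not divisible by 400.
      if (( (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 )); then
        echo 29
      else
        echo 28
      fi
      ;;
    *) return 1 ;;  # not a valid month number
  esac
}

# Usage: the whole if-list could be replaced by C=$(days_in_month "$Y" "$M")
days_in_month 2012 02
```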
« Last edit: April 24, 2014, 17:41:41 pm by TommyAst »
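The weather-symbol translation runs about fifty separate `sed` processes per file. A sketch of the same mapping in a single `sed -E` call, with the codes copied from the scripts. It assumes, as holds for every pair in the scripts, that each night icon has the same name as the day icon with `_noche` before `.png`, so night names are folded onto day names first. `translate_symbols` is a hypothetical name:

```shell
#!/bin/bash
# translate_symbols: read ';'-separated rows on stdin, replace Ogimet icon file
# names with the short weather codes used in the scripts, then clean up the
# '---' and 'Tr' placeholders the same way the original sed chain does.
translate_symbols() {
  sed -E \
    -e 's/(sw_[a-z_]+)_noche\.png/\1.png/g' \
    -e 's/sw_bruma\.png/X/g' \
    -e 's/sw_nsw_azul\.png/XX/g' \
    -e 's/sw_cubierto\.png/Z/g' \
    -e 's/sw_(lunanub|nuboso)\.png/PJ/g' \
    -e 's/sw_(luna|sol)\.png/J/g' \
    -e 's/sw_lluvia\.png/DE/g' \
    -e 's/sw_lluvia_d\.png/DEl/g' \
    -e 's/sw_llovizna\.png/DEm/g' \
    -e 's/sw_nieve\.png/SN/g' \
    -e 's/sw_nieve_d\.png/SNl/g' \
    -e 's/sw_aguanieve\.png/DE_SN/g' \
    -e 's/sw_lluvia_eng\.png/DE_LD/g' \
    -e 's/sw_chu_lluvia\.png/DP/g' \
    -e 's/sw_chu_nieve\.png/SP/g' \
    -e 's/sw_aguanieve_gran\.png/DE_SNk/g' \
    -e 's/sw_cb_sct\.png/CB/g' \
    -e 's/sw_relampago\.png/BL/g' \
    -e 's/sw_tormenta\.png/Bourka/g' \
    -e 's/sw_niebla\.png/FG/g' \
    -e 's/sw_niebla_eng\.png/FG_Freezing/g' \
    -e 's/sw_cencellada\.png/FG_Ice/g' \
    -e 's/sw_cri_hielo\.png/SN_Cristals/g' \
    -e 's/sw_nieve_gran\.png/SNk/g' \
    -e 's/sw_calima\.png/Opar/g' \
    -e 's/----//g' -e 's/---//g' \
    -e 's/;--;/;;/g' -e 's/;-;/;;/g' \
    -e 's/;Tr;/;0.04;/g'
}

# Usage with a made-up row:
echo '01.02.2013;sw_lluvia_noche.png;sw_sol.png;Tr;---' | translate_symbols
```

In the download loop, everything after `sed "s/ /;/"` could then be replaced by `| translate_symbols >> ${Stanice}_${ST}.csv`.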


TommyAst
Re: Ogimet reap from html to txt or csv
« Reply #2 on: April 24, 2014, 17:32:57 pm »
This post is for extracting various types of Ogimet data from HTML to TXT or CSV files, without column sorting. The station list is for the Czech Republic and Slovakia; you can insert a station list for Poland or another country.
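Swapping in another country's stations means editing both the `for ST in ...` list and the long `if` list of names. A sketch of keeping IDs and names in one Bash associative array instead (requires Bash 4 or newer). Only a few entries from the scripts' station list are shown, and `STATIONS` and `station_name` are hypothetical names:

```shell
#!/bin/bash
# Station ID -> name table (Bash 4+ associative array).
# A few entries copied from the scripts; add the rest, or a Polish list, in the same form.
declare -A STATIONS=(
  [117230]=Brno_Turany
  [114140]=Karlovy_Vary
  [117820]=Ostrava_Mosnov
  [115180]=Praga_Ruzyne
  [118160]=Bratislava_Ivanka
)

# station_name ID: print the station name, or return 1 if the ID is unknown.
station_name() {
  local id=$1
  [[ -n $id && -n ${STATIONS[$id]+set} ]] || return 1
  printf '%s\n' "${STATIONS[$id]}"
}

# Loop over all IDs in the table (sorted, because associative-array key order is not defined):
for ST in $(printf '%s\n' "${!STATIONS[@]}" | sort); do
  Stanice=$(station_name "$ST")
  echo "$ST $Stanice"
done
```

With this, the station loop and the name lookup come from the same table, so a new country only needs a new table.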

Here is the code for Bash. You need GAWK, SED and WGET installed on a Linux machine.

#!/bin/bash

W="/home/grb/procyon/Earth_Observatory/POC/" ## CHOOSE MAIN DIRECTORY ===============================================================================

## Ogimet daily summaries without column sorting:
## wget --user-agent="Mozilla/5.0" -qO- "${URL}" | sed "s/<[^>]*src=\"\/sigw\/\([^>\"]*\)\"[^>]*>/\1/g" | sed "s/<td>/\t/g" | sed "s/<[^>]*>//g" | tr "\n" " " | sed "s/  *sw_nsw/ sw_nsw/g" | sed "s/    */\n/g" | grep "png" | sed "s/^\([0-9]*\)\/\([0-9]*\)/\2.\1.${Y}/" | sed "s/ \([^s]\)/;\1/g" | sed "s/ /;/" | sed "s/sw_bruma_noche.png/X/g" | sed "s/sw_bruma.png/X/g" | sed "s/sw_nsw_azul.png/XX/g" | sed "s/sw_nsw_azul_noche.png/XX/g" | sed "s/sw_cubierto.png/Z/g" | sed "s/sw_cubierto_noche.png/Z/g" | sed "s/sw_lunanub.png/PJ/g" | sed "s/sw_nuboso.png/PJ/g" | sed "s/sw_luna.png/J/g" | sed "s/sw_sol.png/J/g" | sed "s/sw_lluvia.png/DE/g" | sed "s/sw_lluvia_noche.png/DE/g" | sed "s/sw_lluvia_d.png/DEl/g" | sed "s/sw_lluvia_d_noche.png/DEl/g" | sed "s/sw_llovizna_noche.png/DEm/g" | sed "s/sw_llovizna.png/DEm/g" | sed "s/sw_nieve.png/SN/g" | sed "s/sw_nieve_noche.png/SN/g" | sed "s/sw_nieve_d.png/SNl/g" | sed "s/sw_nieve_d_noche.png/SNl/g" | sed "s/sw_aguanieve.png/DE_SN/g" | sed "s/sw_aguanieve_noche.png/DE_SN/g" | sed "s/sw_lluvia_eng.png/DE_LD/g" | sed "s/sw_lluvia_eng_noche.png/DE_LD/g" | sed "s/sw_chu_lluvia.png/DP/g" | sed "s/sw_chu_lluvia_noche.png/DP/g" | sed "s/sw_chu_nieve.png/SP/g" | sed "s/sw_chu_nieve_noche.png/SP/g" | sed "s/sw_aguanieve_gran.png/DE_SNk/g" | sed "s/sw_aguanieve_gran_noche.png/DE_SNk/g" | sed "s/sw_cb_sct.png/CB/g" | sed "s/sw_cb_sct_noche.png/CB/g" | sed "s/sw_relampago.png/BL/g" | sed "s/sw_relampago_noche.png/BL/g" | sed "s/sw_tormenta_noche.png/Bourka/g" | sed "s/sw_tormenta.png/Bourka/g" | sed "s/sw_niebla.png/FG/g" | sed "s/sw_niebla_noche.png/FG/g" | sed "s/sw_niebla_eng.png/FG_Freezing/g" | sed "s/sw_niebla_eng_noche.png/FG_Freezing/g" | sed "s/sw_cencellada.png/FG_Ice/g" | sed "s/sw_cencellada_noche.png/FG_Ice/g" | sed "s/sw_cri_hielo_noche.png/SN_Cristals/g" | sed "s/sw_cri_hielo.png/SN_Cristals/g" | sed "s/sw_nieve_gran.png/SNk/g" | sed "s/sw_nieve_gran_noche.png/SNk/g" | sed "s/sw_calima.png/Opar/g" | sed "s/sw_calima_noche.png/Opar/g" | sed "s/----//g" | sed "s/---//g" | sed "s/;--;/;;/g" | sed "s/;-;/;;/g" | sed "s/;Tr;/;0.04;/g" >> ${Stanice}_${ST}.csv

## save html page to tmp.html ## wget --user-agent="Mozilla/5.0" -qO- "${URL}" >  tmp.html
## reap values to processed.txt ##  wget --user-agent="Mozilla/5.0" -qO- "${URL}" | sed "s/<[^>]*src=\"\/sigw\/\([^>\"]*\)\"[^>]*>/\1/g" | sed "s/<[^>]*>//g" | tr "\n" " " | sed "s/  *sw_nsw/ sw_nsw/g" | sed "s/    */\n/g" | grep "png" | sed "s/^\([0-9]*\)\/\([0-9]*\)/\2.\1.${Y}/" | sed "s/ \([^s]\)/;\1/g" | sed "s/ /;/" >processed.txt

Yst=1999 ## first year
Yend=2013 ## last year
Pause='2.36' ## pause after every file
Pause2='25.54' ## pause between stations

## now choose which kind of data you are going to parse: ===========================================================================================
## WO="${W}Ogimet_Data/GSOD/"
## WO="${W}Ogimet_Data/GSOD_ST/"
## WO="${W}Ogimet_Data/Ogimet_Daily/"
WO="${W}Ogimet_Data/Ogimet_HOD/"
## WO="${W}Ogimet_Data/Ogimet_Climat/"

mkdir -p "${WO}CRSK/"
cd "${WO}CRSK/"

for ST in 117230 114140 117820 115180 118160 117220 115460 116240 117660 114231 115410 114060 119160 114570 115090 119276 118800 116930 115400 116480 118580 119930 114232 114870 119680 119271 116360 116280 116980 114233 114234 116030 119180 119300 119270 117100 117870 114180 114640 119780 116920 118550 116520 116430 118260 119274 114500 114480 119340 117350 115670 115200 117480 117500 116590 118670 119273 114230 117300 119030 119275 119330 119760 116830 119380 115380 114380 115020 116790 118650 118410 117740 116690 118560 118190 118010 119550 119580 119520; do ## station IDs
## station list

if [ "$ST" = "117230" ]; then Stanice="Brno_Turany"; fi  ## list of station ID and name
if [ "$ST" = "114140" ]; then Stanice="Karlovy_Vary"; fi
if [ "$ST" = "117820" ]; then Stanice="Ostrava_Mosnov"; fi
if [ "$ST" = "115180" ]; then Stanice="Praga_Ruzyne"; fi
if [ "$ST" = "118160" ]; then Stanice="Bratislava_Ivanka"; fi
if [ "$ST" = "117220" ]; then Stanice="SOKOLNICE_BRNO"; fi
if [ "$ST" = "115460" ]; then Stanice="ROZNO_C_BUDEJOVICE"; fi
if [ "$ST" = "116240" ]; then Stanice="CASLAV"; fi
if [ "$ST" = "117660" ]; then Stanice="CERVENA"; fi
if [ "$ST" = "114231" ]; then Stanice="CESKE_BUDEJOVICE"; fi
if [ "$ST" = "115410" ]; then Stanice="CESKE_BUDEJOVICE"; fi
if [ "$ST" = "114060" ]; then Stanice="CHEB"; fi
if [ "$ST" = "119160" ]; then Stanice="CHOPOK"; fi
if [ "$ST" = "114570" ]; then Stanice="CHURANOV"; fi
if [ "$ST" = "115090" ]; then Stanice="DOKSANY"; fi
if [ "$ST" = "119276" ]; then Stanice="Dolny_Hricov"; fi
if [ "$ST" = "118800" ]; then Stanice="DUDINCE"; fi
if [ "$ST" = "116930" ]; then Stanice="DUKOVANY"; fi
if [ "$ST" = "115400" ]; then Stanice="HOSIN"; fi
if [ "$ST" = "116480" ]; then Stanice="HRADEC_KRALOVE"; fi
if [ "$ST" = "118580" ]; then Stanice="HURBANOVO"; fi
if [ "$ST" = "119930" ]; then Stanice="KAMENICA_NAD_CIROCH"; fi
if [ "$ST" = "114232" ]; then Stanice="KBELY"; fi
if [ "$ST" = "114870" ]; then Stanice="KOCELOVICE"; fi
if [ "$ST" = "119680" ]; then Stanice="Kosice"; fi
if [ "$ST" = "119271" ]; then Stanice="Kosice_Barca"; fi
if [ "$ST" = "116360" ]; then Stanice="KOSTELNI_MYSLOVA"; fi
if [ "$ST" = "116280" ]; then Stanice="KRAMOLIN_KRESIN"; fi
if [ "$ST" = "116980" ]; then Stanice="KUCHAROVICE"; fi
if [ "$ST" = "114233" ]; then Stanice="KUNOVICE"; fi
if [ "$ST" = "114234" ]; then Stanice="LIBEREC"; fi
if [ "$ST" = "116030" ]; then Stanice="LIBEREC"; fi
if [ "$ST" = "119180" ]; then Stanice="LIESEK"; fi
if [ "$ST" = "119300" ]; then Stanice="LOMNICKY_STIT"; fi
if [ "$ST" = "119270" ]; then Stanice="LUCENEC"; fi
if [ "$ST" = "117100" ]; then Stanice="LUKA"; fi
if [ "$ST" = "117870" ]; then Stanice="LYSA_HORA"; fi
if [ "$ST" = "114180" ]; then Stanice="MARIANSKE_LAZNE"; fi
if [ "$ST" = "114640" ]; then Stanice="MILESOVKA"; fi
if [ "$ST" = "119780" ]; then Stanice="MILHOSTOV"; fi
if [ "$ST" = "116920" ]; then Stanice="NAMEST_NAD_OSLAV"; fi
if [ "$ST" = "118550" ]; then Stanice="NITRA"; fi
if [ "$ST" = "116520" ]; then Stanice="PARDUBICE"; fi
if [ "$ST" = "116430" ]; then Stanice="PEC_POD_SNEZKOU"; fi
if [ "$ST" = "118260" ]; then Stanice="Piestany"; fi
if [ "$ST" = "119274" ]; then Stanice="Piestany"; fi
if [ "$ST" = "114500" ]; then Stanice="MIKULKA_PIZEN"; fi
if [ "$ST" = "114480" ]; then Stanice="PLZEN_LINE"; fi
if [ "$ST" = "119340" ]; then Stanice="Poprad_Tatry"; fi
if [ "$ST" = "117350" ]; then Stanice="PRADED_MOUNTAIN"; fi
if [ "$ST" = "115670" ]; then Stanice="KBELY_PRAHA"; fi
if [ "$ST" = "115200" ]; then Stanice="LIBUS_PRAHA"; fi
if [ "$ST" = "117480" ]; then Stanice="PREROV"; fi
if [ "$ST" = "117500" ]; then Stanice="PREROV"; fi
if [ "$ST" = "116590" ]; then Stanice="PRIBYSLAV"; fi
if [ "$ST" = "118670" ]; then Stanice="PRIEVIDZA"; fi
if [ "$ST" = "119273" ]; then Stanice="Prievidza"; fi
if [ "$ST" = "114230" ]; then Stanice="PRIMDA"; fi
if [ "$ST" = "117300" ]; then Stanice="SERAK"; fi
if [ "$ST" = "119030" ]; then Stanice="Sliac"; fi
if [ "$ST" = "119275" ]; then Stanice="Sliac"; fi
if [ "$ST" = "119330" ]; then Stanice="STRBSKE_PLESO"; fi
if [ "$ST" = "119760" ]; then Stanice="STROPKOV_TISINEC"; fi
if [ "$ST" = "116830" ]; then Stanice="SVRATOUCH"; fi
if [ "$ST" = "119380" ]; then Stanice="TELGART"; fi
if [ "$ST" = "115380" ]; then Stanice="TEMELIN"; fi
if [ "$ST" = "114380" ]; then Stanice="TUSIMICE"; fi
if [ "$ST" = "115020" ]; then Stanice="USTI_NAD_LABEM"; fi
if [ "$ST" = "116790" ]; then Stanice="USTI_NAD_ORLICI"; fi
if [ "$ST" = "118650" ]; then Stanice="ZILINA"; fi
if [ "$ST" = "118410" ]; then Stanice="ZILINA_HRICOV"; fi
if [ "$ST" = "117740" ]; then Stanice="HOLESOV"; fi
if [ "$ST" = "116690" ]; then Stanice="Polom"; fi
if [ "$ST" = "118560" ]; then Stanice="Mochovce"; fi
if [ "$ST" = "118190" ]; then Stanice="Jaslovske_Bohunice"; fi
if [ "$ST" = "118010" ]; then Stanice="Malacky"; fi
if [ "$ST" = "119550" ]; then Stanice="Presov"; fi
if [ "$ST" = "119580" ]; then Stanice="Kojovska_Hola"; fi
if [ "$ST" = "119520" ]; then Stanice="Poprad_Ganovce"; fi

A=${Yst} ## first year; must be set, otherwise seq $A $B starts counting at 1
B=${Yend}
for Y in `seq $A $B`; do
for M in 01 02 03 04 05 06 07 08 09 10 11 12; do
K="$Y$M"

if [ "$M" = "01" ]; then C=31; fi ## Number of days in the month
if [ "$M" = "02" ]; then C=28; fi
if [ "$K" = "189602" ]; then C=29; fi
if [ "$K" = "190402" ]; then C=29; fi
if [ "$K" = "190802" ]; then C=29; fi
if [ "$K" = "191202" ]; then C=29; fi
if [ "$K" = "191602" ]; then C=29; fi
if [ "$K" = "192002" ]; then C=29; fi
if [ "$K" = "192402" ]; then C=29; fi
if [ "$K" = "192802" ]; then C=29; fi
if [ "$K" = "193202" ]; then C=29; fi
if [ "$K" = "193602" ]; then C=29; fi
if [ "$K" = "194002" ]; then C=29; fi
if [ "$K" = "194402" ]; then C=29; fi
if [ "$K" = "194802" ]; then C=29; fi
if [ "$K" = "195202" ]; then C=29; fi
if [ "$K" = "195602" ]; then C=29; fi
if [ "$K" = "196002" ]; then C=29; fi
if [ "$K" = "196402" ]; then C=29; fi
if [ "$K" = "196802" ]; then C=29; fi
if [ "$K" = "197202" ]; then C=29; fi
if [ "$K" = "197602" ]; then C=29; fi
if [ "$K" = "198002" ]; then C=29; fi
if [ "$K" = "198402" ]; then C=29; fi
if [ "$K" = "198802" ]; then C=29; fi
if [ "$K" = "199202" ]; then C=29; fi
if [ "$K" = "199602" ]; then C=29; fi
if [ "$K" = "200002" ]; then C=29; fi
if [ "$K" = "200402" ]; then C=29; fi
if [ "$K" = "200802" ]; then C=29; fi
if [ "$K" = "201202" ]; then C=29; fi
if [ "$M" = "03" ]; then C=31; fi
if [ "$M" = "04" ]; then C=30; fi
if [ "$M" = "05" ]; then C=31; fi
if [ "$M" = "06" ]; then C=30; fi
if [ "$M" = "07" ]; then C=31; fi
if [ "$M" = "08" ]; then C=31; fi
if [ "$M" = "09" ]; then C=30; fi
if [ "$M" = "10" ]; then C=31; fi
if [ "$M" = "11" ]; then C=30; fi
if [ "$M" = "12" ]; then C=31; fi

X=${C} ## number of days, for monthly file use X=C
for D in ${C}; do ## last day of the month only (one monthly file)
## for D in `seq -w $C`; do ## use this for one file per day
for H in 23; do
## DATE="${Y}${M}${D}${H}"

## cat ${WO}CRSK/${DATE::4}/${DATE::6}/${DATE::8}/${DATE::8}.htm ## path to the HTML files; use this for HTML files stored on disk

## Ogimet daily
## URL="http://www.ogimet.com/cgi-bin/gsynres?lang=en&ind=${ST::5}&ndays=${X}&ano=${Y}&mes=${M}&day=${D}&hora=${H}&ord=DIR&Send=Send" ## Ogimet data, monthly
## wget --user-agent="Mozilla/5.0" -qO- "${URL}" |
##   sed "s/<[^>]*src=\"\/sigw\/\([^>\"]*\)\"[^>]*>/\1/g" | sed "s/<td>/\t/g" | sed "s/<[^>]*>//g" | tr "\n" " " |
##   sed "s/  *sw_nsw/ sw_nsw/g" | sed "s/    */\n/g" | grep "png" | sed "s/^\([0-9]*\)\/\([0-9]*\)/\2.\1.${Y}/" | sed "s/ \([^s]\)/;\1/g" | sed "s/ /;/" |
##   sed -e "s/sw_bruma_noche.png/X/g" -e "s/sw_bruma.png/X/g" \
##       -e "s/sw_nsw_azul.png/XX/g" -e "s/sw_nsw_azul_noche.png/XX/g" \
##       -e "s/sw_cubierto.png/Z/g" -e "s/sw_cubierto_noche.png/Z/g" \
##       -e "s/sw_lunanub.png/PJ/g" -e "s/sw_nuboso.png/PJ/g" \
##       -e "s/sw_luna.png/J/g" -e "s/sw_sol.png/J/g" \
##       -e "s/sw_lluvia.png/DE/g" -e "s/sw_lluvia_noche.png/DE/g" \
##       -e "s/sw_lluvia_d.png/DEl/g" -e "s/sw_lluvia_d_noche.png/DEl/g" \
##       -e "s/sw_llovizna_noche.png/DEm/g" -e "s/sw_llovizna.png/DEm/g" \
##       -e "s/sw_nieve.png/SN/g" -e "s/sw_nieve_noche.png/SN/g" \
##       -e "s/sw_nieve_d.png/SNl/g" -e "s/sw_nieve_d_noche.png/SNl/g" \
##       -e "s/sw_aguanieve.png/DE_SN/g" -e "s/sw_aguanieve_noche.png/DE_SN/g" \
##       -e "s/sw_lluvia_eng.png/DE_LD/g" -e "s/sw_lluvia_eng_noche.png/DE_LD/g" \
##       -e "s/sw_chu_lluvia.png/DP/g" -e "s/sw_chu_lluvia_noche.png/DP/g" \
##       -e "s/sw_chu_nieve.png/SP/g" -e "s/sw_chu_nieve_noche.png/SP/g" \
##       -e "s/sw_aguanieve_gran.png/DE_SNk/g" -e "s/sw_aguanieve_gran_noche.png/DE_SNk/g" \
##       -e "s/sw_cb_sct.png/CB/g" -e "s/sw_cb_sct_noche.png/CB/g" \
##       -e "s/sw_relampago.png/BL/g" -e "s/sw_relampago_noche.png/BL/g" \
##       -e "s/sw_tormenta_noche.png/Bourka/g" -e "s/sw_tormenta.png/Bourka/g" \
##       -e "s/sw_niebla.png/FG/g" -e "s/sw_niebla_noche.png/FG/g" \
##       -e "s/sw_niebla_eng.png/FG_Freezing/g" -e "s/sw_niebla_eng_noche.png/FG_Freezing/g" \
##       -e "s/sw_cencellada.png/FG_Ice/g" -e "s/sw_cencellada_noche.png/FG_Ice/g" \
##       -e "s/sw_cri_hielo_noche.png/SN_Cristals/g" -e "s/sw_cri_hielo.png/SN_Cristals/g" \
##       -e "s/sw_nieve_gran.png/SNk/g" -e "s/sw_nieve_gran_noche.png/SNk/g" \
##       -e "s/sw_calima.png/Opar/g" -e "s/sw_calima_noche.png/Opar/g" \
##       -e "s/----//g" -e "s/---//g" -e "s/;--;/;;/g" -e "s/;-;/;;/g" -e "s/;Tr;/;0.04;/g" >> "${Stanice}_${ST}.csv"

## hourly without weather symbols and without column sorting:
URL="http://www.ogimet.com/cgi-bin/gsynres?ind=${ST::5}&lang=en&decoded=yes&ndays=${X}&ord=DIR&ano=${Y}&mes=${M}&day=${D}&hora=${H}"
## wget --user-agent="Mozilla/5.0" -qO- "${URL}" | sed "s/<[^>]*src=\"\/sigw\/\([^>\"]*\)\"[^>]*>/\1/g" | sed "s/<td>/\t/g" | sed "s/<[^>]*>//g" | tr "\n" " " | sed "s/  *sw_nsw/ sw_nsw/g" | sed "s/    */\n/g"
## hourly with weather symbols and without column sorting:
## download the page, split the table rows, keep the rows of ${M}/${Y}, then replace the weather-icon names by short codes ("Tr", trace precipitation, is written as 0.04)
wget --user-agent="Mozilla/5.0" -qO- "${URL}" |
  sed "s/png/png \n/g" | sed "s/sw_/^sw_/g" | sed "s/\">/^/g" | sed "s/<\//^/g" | sed "s/png/png^/g" | cut -d "^" -f 2-5 |
  sed "s/\/sigw\//\n/g" | sed "s/@sw/@^sw/g" | sed "s/TD>/^TD>^/g" | sed "s/-----^/^-----^/g" | sed "s/TABLE>/^TABLE>^/g" |
  sed "s/^@/@/g" | sed "s/^^/@/g" | sed "s/^^/@/g" | sed "s/@^/@/g" | sed "s/sw_/^sw_/g" | sed "s/@^/^/g" | sed "s/@/^/g" |
  sed "s/<tr>/tr>/g" | sed "s/tr>/^@^/g" | cut -d "^" -f 2 | sed "s/TD>//g" | sed "s/TR>/@/g" | sed "s/TABLE/@/g" |
  tr "\n" " " | sed "s/@ @/@/g" | tr "@" "\n" | grep "[0-9]" | gawk '/-----/' | tr " " ";" | grep "${M}/" | grep "/${Y}" |
  sed -e "s/png;;/png /g" \
      -e "s/sw_bruma_noche.png/X/g" -e "s/sw_bruma.png/X/g" \
      -e "s/sw_nsw_azul.png/XX/g" -e "s/sw_nsw_azul_noche.png/XX/g" \
      -e "s/sw_cubierto.png/Z/g" -e "s/sw_cubierto_noche.png/Z/g" \
      -e "s/sw_lunanub.png/PJ/g" -e "s/sw_nuboso.png/PJ/g" \
      -e "s/sw_luna.png/J/g" -e "s/sw_sol.png/J/g" \
      -e "s/sw_lluvia.png/DE/g" -e "s/sw_lluvia_noche.png/DE/g" \
      -e "s/sw_lluvia_d.png/DEl/g" -e "s/sw_lluvia_d_noche.png/DEl/g" \
      -e "s/sw_llovizna_noche.png/DEm/g" -e "s/sw_llovizna.png/DEm/g" \
      -e "s/sw_nieve.png/SN/g" -e "s/sw_nieve_noche.png/SN/g" \
      -e "s/sw_nieve_d.png/SNl/g" -e "s/sw_nieve_d_noche.png/SNl/g" \
      -e "s/sw_aguanieve.png/DE_SN/g" -e "s/sw_aguanieve_noche.png/DE_SN/g" \
      -e "s/sw_lluvia_eng.png/DE_LD/g" -e "s/sw_lluvia_eng_noche.png/DE_LD/g" \
      -e "s/sw_chu_lluvia.png/DP/g" -e "s/sw_chu_lluvia_noche.png/DP/g" \
      -e "s/sw_chu_nieve.png/SP/g" -e "s/sw_chu_nieve_noche.png/SP/g" \
      -e "s/sw_aguanieve_gran.png/DE_SNk/g" -e "s/sw_aguanieve_gran_noche.png/DE_SNk/g" \
      -e "s/sw_cb_sct.png/CB/g" -e "s/sw_cb_sct_noche.png/CB/g" \
      -e "s/sw_relampago.png/BL/g" -e "s/sw_relampago_noche.png/BL/g" \
      -e "s/sw_tormenta_noche.png/Bourka/g" -e "s/sw_tormenta.png/Bourka/g" \
      -e "s/sw_niebla.png/FG/g" -e "s/sw_niebla_noche.png/FG/g" \
      -e "s/sw_niebla_eng.png/FG_Freezing/g" -e "s/sw_niebla_eng_noche.png/FG_Freezing/g" \
      -e "s/sw_cencellada.png/FG_Ice/g" -e "s/sw_cencellada_noche.png/FG_Ice/g" \
      -e "s/sw_cri_hielo_noche.png/SN_Cristals/g" -e "s/sw_cri_hielo.png/SN_Cristals/g" \
      -e "s/sw_nieve_gran.png/SNk/g" -e "s/sw_nieve_gran_noche.png/SNk/g" \
      -e "s/sw_calima.png/Opar/g" -e "s/sw_calima_noche.png/Opar/g" \
      -e "s/----//g" -e "s/---//g" -e "s/;--;/;;/g" -e "s/;-;/;;/g" -e "s/;Tr;/;0.04;/g" >> "${Stanice}_${ST}.csv"
## sed "s/png;/png /g" | sed "s/png;/png /g" | sed "s/png;/png /g" | sed "s/png;/png /g"

## for GSOD data from station:
## URL="http://www.ogimet.com/cgi-bin/gsodres?lang=en&mode=0&state=Czec&ind=${ST::5}&ord=DIR&ano=${Y}&mes=${M}&day=${D}&ndays=${X}"
## wget -qO- "${URL}" | sed "s/<img/<TD @/g" | grep "TD" | sed "s/font color=/@/g" | sed "s/>----</><@----><font></g" | sed "s/ align=\"center\" bgcolor=/><@></g" | cut -d "@" -f 2 | sed "s/src=\"\/sigw/\"\#000000\">/g" | sed "s/\/sw_/sw_/g" | sed "s/\" alt=\"F\">/<\/font><\/TD>/g" | sed "s/TD align=/font/g" | sed "s;\(/[0-9]*<\);\1font><;" | sed "s/  align=\"center\" colspan/><@></g" | sed "s/ bgcolor=\"/<\/font><\/\&TD>/g" | grep font | sed "s/><\"\#/\&/g" | cut -d "<" -f 1 | sed "s/\&\([A-Z]*\">\)/>\&/g" | cut -d ">" -f 2 | tr "\n" ";" | tr "\&" "\n" | tail -n+2 | sed "s/sw_niebla.png;/FG /g" | sed "s/sw_lluvia.png;/DE /g" | sed "s/sw_nieve.png;/SN /g" | sed "s/sw_tormenta.png;/TS /g" | sed "s/sw_nsw_azul.png;//g" |  sed "s/  ;/%/g" | tr "%" "\n" >>  ${Stanice}_${ST}.csv

## for GSOD from the list of the whole state for one day:
## URL="http://www.ogimet.com/cgi-bin/gsodres?lang=en&mode=1&state=${ST}&ind=&ord=DIR&ano=${Y}&mes=${M}&day=${D}&ndays="
## wget -qO- "${URL}" | sed "s/<img/<TD @/g" | sed "s/comienzo del epilogo/@\"\#000000\">>TD<\/font><\/TD>/g" | grep "TD" | sed "s/font color=/@/g" | sed "s/>----</><@----><font></g" | sed "s/ align=\"center\" bgcolor=/><@></g" | cut -d "@" -f 2 | sed "s/src=\"\/sigw/\"\#000000\">/g" | sed "s/\/sw_/sw_/g" | sed "s/\" alt=\"F\">/<\/font><\/TD>/g" | sed "s/TD align=/font/g" | sed "s;\(/[0-9]*<\);\1font><;" | sed "s/  align=\"center\" colspan/><@></g" | sed "s/ bgcolor=\"/<\/font><\/\&TD>/g" | sed "s/<\/a>/<\/font>/g" | grep font | sed "s/;\">/@\">\%/g" | sed "s/><font>/@/g" |  sed "s/><\"\#//g" | sed "s/\"\#/@/g" | cut -d "@" -f 2 | cut -d ">" -f 2 | cut -d "<" -f 1 | tr "\n" ";" | tr "%" "\n" | sed "s/.png;;;/.png;\%/g" | tr "%" "\n" | tail -n+2 | sed "s/sw_niebla.png;/FG /g" | sed "s/sw_lluvia.png;/DE /g" | sed "s/sw_nieve.png;/SN /g" | sed "s/sw_tormenta.png;/TS /g" | sed "s/sw_nsw_azul.png;//g"

## Parsing the page with extremes (Ogimet data ranking): either keep rank 1 (or n) only (first XXX and Q variant), or reap all values on the page with extremes, maximum 512 (second XXX and Q variant)
## XXX=`cat ${day}.htm | tr "\n" " " | sed 's; *<tr;\n<tr;g' | grep Td | cut -d\> -f 3,6,11 | sed 's;<[^>]*>;@;g' | sed 's;<[^>]*;@;' | sed 's;[()/,];;g' | grep "^ *1 " | cut -d "&" -f 1 | tr "\n" ";" | tr "/" " " | sed 's/  */ /g' | sed 's/ mm//g' | sed 's/ ; /;/g' | tr " " "_" | sed 's;_@;@;g' | tr "@" ";"`
## XXX=`cat ${day}.htm | tr "\n" " " | sed 's; *<tr;\n<tr;g' | grep Td | cut -d\> -f 3,6,11 | sed 's;<[^>]*>;@;g' | sed 's;<[^>]*;@;' | sed 's;[()/,];;g' | cut -d "&" -f 1 | tr "\n" ";" | tr "/" " " | sed 's/  */ /g' | sed 's/ mm//g' | sed 's/ ; /;/g' | tr " " "_" | sed 's;_@;@;g' | tr "@" ";" | sed 's/;;/;/g' | sed 's/;1;/;@;1;/g' | sed 's/_1;/@;1;/g' | sed 's/;_/;/g' | sed 's/_;/;/g'`
## Q=`echo "${out}@${XXX}" | sed 's/@_/;/g' | sed 's/\;\;/;/g'`
## Q=`echo "${out};${XXX}" | sed 's/\;\;/;/g'`

sleep ${Pause}

done ## hours
done ## days
done ## months
done ## years
## fi
sleep ${Pause2}
CU=`date -d today`
echo "Ready ${Stanice} ${ST} $CU"
done ## all stations ready

echo 'READY CZECH AND SLOVAKIA'
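The sed chains above translate Ogimet's weather-icon file names (the `sw_*.png` images on the gsynres pages) into short codes. The same table can also be kept in one place as a lookup function. This is a minimal sketch in POSIX sh (it runs in bash too), not part of the original script: the function name `icon_to_code` is hypothetical, and the codes are copied from the sed rules above. Night icons (`*_noche.png`) get the same code as the day icon, and icons not in the table are returned unchanged, as the sed chain leaves them.

```shell
# icon_to_code NAME: print the short code for one Ogimet icon file name.
# Hypothetical helper; the table mirrors the sed rules in the script.
icon_to_code() (
  base=${1%.png}          # sw_tormenta_noche.png -> sw_tormenta_noche
  base=${base%_noche}     # night variant -> day name (same code)
  case $base in
    sw_bruma) echo X ;;
    sw_nsw_azul) echo XX ;;
    sw_cubierto) echo Z ;;
    sw_lunanub|sw_nuboso) echo PJ ;;
    sw_luna|sw_sol) echo J ;;
    sw_lluvia) echo DE ;;
    sw_lluvia_d) echo DEl ;;
    sw_llovizna) echo DEm ;;
    sw_nieve) echo SN ;;
    sw_nieve_d) echo SNl ;;
    sw_nieve_gran) echo SNk ;;
    sw_aguanieve) echo DE_SN ;;
    sw_aguanieve_gran) echo DE_SNk ;;
    sw_lluvia_eng) echo DE_LD ;;
    sw_chu_lluvia) echo DP ;;
    sw_chu_nieve) echo SP ;;
    sw_cb_sct) echo CB ;;
    sw_relampago) echo BL ;;
    sw_tormenta) echo Bourka ;;
    sw_niebla) echo FG ;;
    sw_niebla_eng) echo FG_Freezing ;;
    sw_cencellada) echo FG_Ice ;;
    sw_cri_hielo) echo SN_Cristals ;;
    sw_calima) echo Opar ;;
    *) printf '%s\n' "$1" ;;   # unknown icon: keep the original name
  esac
)
```

For example, `icon_to_code sw_tormenta_noche.png` prints `Bourka`. Because it handles one icon name per call, it fits a `while read` loop over fields that are already split; for whole files the single `sed -e ...` call is likely faster.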

TommyAst
Re: Ogimet reap from html to txt or csv
« Reply #3 on: May 19, 2015, 18:50:45 pm »
The Ogimet SYNOP reap for the whole period, hour by hour (1.1.1999-15.5.2015), is ready.
Because you have to wait about 220 before each request, it takes a very long time.
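With a long pause before every request, an interrupted run should be able to resume without asking Ogimet again for months that are already saved. This is a minimal sketch in POSIX sh, not the author's script: the helper names `days_in_month`, `gsynres_month_url` and `polite_fetch`, the one-file-per-month layout, and the `PAUSE` variable (assumed to be in seconds, default 220) are all assumptions. The URL parameters are the ones used in the hourly request of the script above, with `day` set to the last day of the month and `ndays` to the number of days in it.

```shell
# days_in_month YEAR MONTH: 28..31, Gregorian leap-year rule.
days_in_month() (
  y=$1
  m=${2#0}                          # accept both "08" and "8"
  case $m in
    1|3|5|7|8|10|12) echo 31 ;;
    4|6|9|11) echo 30 ;;
    2)
      # every 4th year is a leap year, except centuries not divisible by 400
      if [ $((y % 4)) -eq 0 ] && { [ $((y % 100)) -ne 0 ] || [ $((y % 400)) -eq 0 ]; }; then
        echo 29
      else
        echo 28
      fi ;;
    *) echo "days_in_month: bad month '$2'" >&2; exit 1 ;;
  esac
)

# gsynres_month_url STATION YEAR MONTH: hourly decoded SYNOP for one month,
# same parameters as the script's hourly URL (hora=23, day = ndays = last day).
gsynres_month_url() (
  d=$(days_in_month "$2" "$3") || exit 1
  printf 'http://www.ogimet.com/cgi-bin/gsynres?ind=%s&lang=en&decoded=yes&ndays=%s&ord=DIR&ano=%s&mes=%s&day=%s&hora=23\n' \
    "$1" "$d" "$2" "$3" "$d"
)

# polite_fetch URL OUTFILE: skip files saved by an earlier run,
# otherwise download once and then pause before the next request.
polite_fetch() {
  if [ -s "$2" ]; then
    return 0                        # already saved: no request, no pause
  fi
  if wget --user-agent="Mozilla/5.0" -qO "$2.part" "$1"; then
    mv "$2.part" "$2"               # only complete downloads get the final name
    rc=$?
  else
    rc=$?
    rm -f "$2.part"                 # do not keep a partial file
  fi
  sleep "${PAUSE:-220}"             # assumption: pause in seconds
  return "$rc"
}
```

For example, `polite_fetch "$(gsynres_month_url 11520 2015 04)" 11520_201504.html` (a hypothetical file name) saves April 2015 for station index 11520 and then waits. Writing to a `.part` file first means a download that fails halfway is retried on the next run instead of being skipped.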

The dataset is available here:

http://uloz.to/xgRj1dkU/ogimet-synop-by-termins-hourly-7z

Ogimet SYNOP data, hour by hour, for the whole available period (1.1.1999-15.5.2015):
Folders:     6194
Files:       143509
Size:        8415755333 bytes
Compressed:  1020407924 bytes
Format:      TXT, undecoded SYNOP