Bulk downloading www.expedia.com

www.expedia.com is a nice site where you can find maps of the whole world. However, you cannot download them: you must browse them online through a very uncomfortable web application. This is why I tried to download their maps and view them offline. I almost succeeded; here is how I tried to do it.


First attempt. Right-click on the map image and open it in a new window.

Take a look at the URL of the image. It must contain the position of the region to display. By manipulating this URL I could view different regions and then download all of them automatically.

But the image URL is not readable. Apparently the key variable (B08240F0BA8020D3DF16) is the position of the region to display, encoded somehow. Or maybe, even worse, the position is stored on the server side and the key is just an index into a server-side array.

OK, let's check whether the key changes when I reload the site. I reload the site, then open the image in a new window again...

The key has changed. So most probably the position is stored on the server, and I cannot manipulate the image's URL to view and download different regions.


Second attempt. Again I take a look at the page with my map.

The URL is interesting:

http://www.expedia.com/pub/agent.dll?qscr=mmvw&lmap=2&hlbl=0&xofs=3.3333330407003814e-7&yofs=0&clvl=4&msds=EX01228A527Di%24B9%24D3%24B2p%24B9%24D3%24B294001%21701000%214%24FF%2150%21Q%24FF0%218%24FF%24190K%24C3%24A1nmbm%242C.Moumnvfjm%242C.Fwmjo%212%24FF70K%24C3%24A1nmbm%24B23%2436I%24EE%245BD%2440%2414%247BJ%24F4K%24AE%2411%24C0w0001000%214%24FF%24BCK%2150%212%24FF0000%212%24FF%216i%24EE%243F%2414000%216%24FF%21I0&cbak=1

There are two interesting variables: xofs and yofs. Their names sound like a position. I try changing them to (100, 100) and open the page:

http://www.expedia.com/pub/agent.dll?qscr=mmvw&lmap=2&hlbl=0&xofs=100&yofs=100&clvl=4&msds=EX01228A527Di%24B9%24D3%24B2p%24B9%24D3%24B294001%21701000%214%24FF%2150%21Q%24FF0%218%24FF%24190K%24C3%24A1nmbm%242C.Moumnvfjm%242C.Fwmjo%212%24FF70K%24C3%24A1nmbm%24B23%2436I%24EE%245BD%2440%2414%247BJ%24F4K%24AE%2411%24C0w0001000%214%24FF%24BCK%2150%212%24FF0000%212%24FF%216i%24EE%243F%2414000%216%24FF%21I0&cbak=1

Fine! The map has moved. This way I can view any place in the world. Now I have to automate it with wget.
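Changing the two offsets by hand gets tedious, so the substitution can be scripted. This is only a minimal sketch, assuming xofs and yofs are the only values that need to change and that they are plain numbers. set_offsets is a made-up helper name, and the short URL in the usage line is a stand-in for the real one with its long msds value.

```shell
#!/bin/sh
# set_offsets URL X Y: print URL with its xofs and yofs values replaced.
# (Hypothetical helper; X and Y are assumed to be plain numbers.)
set_offsets() {
  printf '%s\n' "$1" |
    sed -E "s/([?&]xofs=)[^&]*/\\1$2/; s/([?&]yofs=)[^&]*/\\1$3/"
}

# Usage with a shortened stand-in URL:
set_offsets 'http://www.expedia.com/pub/agent.dll?qscr=mmvw&xofs=0&yofs=0&clvl=4' 100 200
```

All other query parameters pass through untouched, so the same helper should work for any map page URL of this shape.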

$ wget -r -l2 'http://www.expedia.com/pub/agent.dll?qscr=mmvw&lmap=2&hlbl=0&xofs=100&yofs=100&clvl=4&msds=EX01228A527Di%24B9%24D3%24B2p%24B9%24D3%24B294001%21701000%214%24FF%2150%21Q%24FF0%218%24FF%24190K%24C3%24A1nmbm%242C.Moumnvfjm%242C.Fwmjo%212%24FF70K%24C3%24A1nmbm%24B23%2436I%24EE%245BD%2440%2414%247BJ%24F4K%24AE%2411%24C0w0001000%214%24FF%24BCK%2150%212%24FF0000%212%24FF%216i%24EE%243F%2414000%216%24FF%21I0&cbak=1'

Filename too long: wget derives the local file name from the URL. Try it a different way, with an explicit output file name.

$ wget 'http://www.expedia.com/pub/agent.dll?qscr=mmvw&lmap=2&hlbl=0&xofs=100&yofs=100&clvl=4&msds=EX01228A527Di%24B9%24D3%24B2p%24B9%24D3%24B294001%21701000%214%24FF%2150%21Q%24FF0%218%24FF%24190K%24C3%24A1nmbm%242C.Moumnvfjm%242C.Fwmjo%212%24FF70K%24C3%24A1nmbm%24B23%2436I%24EE%245BD%2440%2414%247BJ%24F4K%24AE%2411%24C0w0001000%214%24FF%24BCK%2150%212%24FF0000%212%24FF%216i%24EE%243F%2414000%216%24FF%21I0&cbak=1' -O test.html

Done. Now I'll take a look at it and try to find the image name in it.

$ less test.html

"Scripting must be enabled before you can continue". Hey, I can't enable JavaScript in wget!


So I have to automate Opera somehow (it has JavaScript) and use it to view the map piece by piece and download it to my hard disk. Then I will use some smart script to join the pieces together.

I can instruct Opera from the command line to open a URL, just:

$ opera 'URL'

But I can't instruct it to save the page or the image.

Well, but Opera stores the images it opens in its cache! So I wrote this dirty script:

#!/usr/bin/perl

# Zero-pad a number to 7 digits so that the file names sort in numeric order.
sub i2a {
  my ($i) = @_;
  return sprintf("%07d", $i);
}

#for ($y=600; $y<5000; $y = $y + 200) {
#for ($x=-1000; $x<9000; $x = $x + 200) {
$x=100; 
$y=200;
$url = "http://www.expedia.com/pub/agent.dll?qscr=mmvw\&lmap=2\&hlbl=0\&xofs=$x\&yofs=$y\&clvl=8\&msds=1DAC2E74%24845Ew%2480%2427Ew94002%21701000%214%24FF1%21O0%214%24FF000%218%24FF%24180M%24E1laga%242C.Andaluc%24EDa%242C.Spain00%24230Second.Level.Component.Capital.City%24197%24D1%2439%24C3%245CB%2440%24B8%240F%242F%243E%24BB%24B2%2411%24C0%24C8%2170%243E000%24BCM%2170%2414000%214%24FF%21K0\&cbak=1";
#$url="http://www.expedia.com/pub/agent.dll?qscr=mmvw\&lmap=2\&hlbl=0\&xofs=$x\&yofs=$y\&clvl=4\&msds=EX01D9C703C4i%24B9%24D3%24B2p%24B9%24D3%24B294001%21701000%214%24FF%2150%21Q%24FF0%218%24FF%24160Fmbspf%242C.Imsg%242C.Wgsyvbmn%212%24FF60Fmbspf%24186%24E6Yk%2481D%2440%2437zF%24DA%24DE%24E2%2421%24C0.0001000%214%24FFEiz0000%212%24FF0000%212%24FF%216i%24EE%243F%2414000%216%24FF%21I0\&cbak=1";
# Remove old GIFs from the cache so that only images from this page load remain.
unlink(glob("/home/piotr/.opera/cache4/*.gif"));
system("opera '$url'");
# Give Opera time to load the page and write the map image to its cache.
sleep(25);
foreach $fname (glob("/home/piotr/.opera/cache4/*.gif")) {
  $size = -s $fname;
#  print("$fname: $size\n");
  # The map is assumed to be the GIF between 4000 and 90000 bytes in size.
  if (($size > 4000) && ($size < 90000)) {
    $xa = &i2a($x+1000);
    $ya = &i2a($y);
    system("cp", $fname, "mapa-$xa-$ya.gif");
  }
}
print("x=$x y=$y\n");
#}
#}


It clears Opera's cache and then opens the URL. Then it looks in the cache for a GIF with a size between 4000 and 90000 bytes and assumes it is the map. This way the script downloads any given piece of the map (chosen with the $x and $y variables).
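The size filter can also be tried straight from the shell, which is handy for checking what ends up in the cache before running the whole script. A minimal sketch, assuming the same size limits as in the script; find_map_gifs is a made-up name, and the -maxdepth option is available in GNU and BSD find but is not in POSIX.

```shell
#!/bin/sh
# find_map_gifs DIR: list the GIF files directly in DIR that are larger
# than 4000 bytes and smaller than 90000 bytes (hypothetical helper).
find_map_gifs() {
  find "$1" -maxdepth 1 -type f -name '*.gif' -size +4000c -size -90000c
}

# Usage (the cache path is the one from the script above):
#   find_map_gifs /home/piotr/.opera/cache4
```

If this prints more than one file, the size window is too wide for that page and the script would copy the wrong image over the right one.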


Now I should download a lot of pieces and then join them. The problem is that, as I found out, the amount by which I have to change x to get exactly neighbouring pieces (without overlaps or gaps) is different for different values of y. So I gave up, because I had to go to sleep. But maybe someone will start where I stopped and successfully download the whole world from www.expedia.com?
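For anyone who picks this up, one possible way around the varying step is to measure the x step once per row and feed the script a small table. This is only a sketch of that idea, not something I got working: grid_offsets is a made-up name, the steps are assumed to be positive whole numbers, and the two rows in the example are invented values, not measurements.

```shell
#!/bin/sh
# grid_offsets X_MIN X_MAX: read "y x_step" pairs from stdin and print one
# "x y" line per tile, using a different x step for each row.
# (Hypothetical helper; the step for each row has to be measured by hand.)
grid_offsets() {
  while read -r y step; do
    x=$1
    while [ "$x" -lt "$2" ]; do
      printf '%s %s\n' "$x" "$y"
      x=$((x + step))
    done
  done
}

# Invented example: two rows with different x steps.
printf '600 200\n800 250\n' | grid_offsets -1000 -400
```

Each printed pair could then be passed to the download script as its $x and $y values.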