ArcGIS Map Server Cache in Amazon S3
Amazon S3 Tile Cache loads quicker for everyone
We’ve been loading our tile caches in Amazon S3 for quite some time now and it looks like others are trying to take advantage of the service. I’ve come to the conclusion that using S3 for your tile cache makes a ton of sense for both performance and reliability. Our S3 tile caches are more reliable than our file servers at serving up the tiles, and they do it much faster. Is anyone else noticing the benefits of S3, or has it been problematic for you?


S3 is great for tile caches. The only problem with it is that the two-connections-per-hostname limit of most web browsers ensures that only 2 tiles are loaded at a time. There is no mechanism with S3 to “alias” several domain names to the same bucket (e.g.: tile1.example.com, tile2.example.com, …) to get around this browser limitation. Maybe someday Amazon will allow that.
James, have you done any experimentation with CloudFront?
http://aws.amazon.com/cloudfront/
This would allow for faster access for global customers, but it would also work around the multiple “alias” restriction that Matt brought up, as you’re able to define up to 10 CNAMEs for each distribution.
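To make the multiple-hostname idea concrete, here is a minimal Python sketch of how a client could spread tile requests across several aliases of the same cache. The hostnames and the `tile/<level>/<row>/<col>` path are placeholders, not anything Amazon or ESRI prescribes. The one design point worth keeping is that the host is derived from the tile's own coordinates, so a given tile always comes from the same hostname and the browser cache stays effective.

```python
# Hypothetical CNAMEs that all point at the same tile cache.
TILE_HOSTS = [
    "tile1.example.com",
    "tile2.example.com",
    "tile3.example.com",
    "tile4.example.com",
]


def tile_url(level, row, col, hosts=TILE_HOSTS):
    """Build a tile URL, picking the host deterministically from row and col.

    Neighbouring tiles land on different hosts, so the browser can open
    more connections in parallel, while any single tile always maps to
    the same host.
    """
    host = hosts[(row + col) % len(hosts)]
    # The path layout below is an assumption for illustration only.
    return f"http://{host}/tile/{level}/{row}/{col}"


if __name__ == "__main__":
    for col in range(4):
        print(tile_url(5, 10, col))
```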
Been using S3 for a while now for tile caching and very happy with the results. The only complaints I have heard have been from users in areas where Amazon doesn’t have infrastructure nearby (i.e., not North America or Europe). Seems there is latency and what not.
http://www.bitcurrent.com/amazons-new-cdn-more-than-just-footprint-in-asia/
CloudFront seems to have promise, but file updates to the service are not automatically propagated throughout to all of the servers…at least not yet according to Amazon.
http://paulstamatiou.com/2008/12/08/how-to-getting-started-with-amazon-cloudfront
“The next issue with CloudFront deals with origin to edge server communication. CloudFront grabs files from the origin server (S3) when it sees a new file that the edge servers don’t yet have, but other than that it won’t necessarily update all edge servers the instant a file is modified (and retains the same name). It can take up to a day for all edge servers to have the same file. As Wayne Pan mentioned, the best solution is to version your files and give your application the logic it needs to be able to change the files it uses, rather than rely on the same file and same file name and have different CloudFront edge servers potentially serve up different versions of the same file.”
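For a tile cache, one way to read the “version your files” advice is to put a cache version into every tile key and bump it whenever the cache is rebuilt, so edge servers never see a changed file under an old name. The sketch below is only an illustration in Python; the `v<N>/` prefix and the path layout are assumptions, not something CloudFront or S3 requires.

```python
def versioned_tile_key(cache_version, level, row, col):
    """Return an object key that changes whenever the cache version changes.

    The "v<N>/tile/<level>/<row>/<col>" layout is hypothetical; any scheme
    works as long as a rebuilt tile gets a new name.
    """
    if cache_version < 1:
        raise ValueError("cache_version must be 1 or greater")
    return f"v{cache_version}/tile/{level}/{row}/{col}"


if __name__ == "__main__":
    # After republishing the cache, the application switches from 1 to 2.
    print(versioned_tile_key(1, 5, 10, 255))
    print(versioned_tile_key(2, 5, 10, 255))
```

The application then only needs to know the current version number to request the right tiles.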
Hey,
I have been using Google App Engine to store tiles for 4 months. I’m very happy with the tile-serving performance, but I don’t know how S3 performs, so I can’t compare.
I have also created 2-4 instances of the same tile set and serve them from different services (or URLs). This works around the browser limitations.
hey all
@alper – you can assign multiple cname entries for a single GAE application. you do have to register and associate a domain first but then you can add multiple sub-domains that all point at your main GAE app (say tile1.yourdomain.com, tile2.yourdomain.com etc). this allows you to have a single cache with multiple subdomain entry points to get around the browser limits
http://code.google.com/appengine/articles/domains.html
http://www.google.com/support/a/bin/answer.py?answer=91080&hl=en
S3 is great for tile caches (Arc2Earth will automatically publish to an S3 account) but I’ll agree with Matt and Jason on the single sub-domain access. I asked them about this when S3 was first available; I think there was a good security/technical reason why they did not offer multiple subdomains and did not plan to in the future (wish I could remember it right now).
CDN – full auto update of the cache would be great; in the meantime you could set up a simple EC2 app to publish against. It writes to S3 and then manually informs the CDN about updates (not just newly added tiles)
cheers
brian
Hey Brian,
Thanks for the tip. There is also another reason to use multiple applications: to avoid exceeding the CPU and other quota limits.
By the way, we are looking forward to seeing the new version of Arc2Earth, “Cloud Services”. Is there a release date for the beta version?
Thx.
I am migrating my SVG mapping applications to an EC2 machine running TileCache. The tiles are stored in various S3 buckets and so far the performance is better than our server. What would be nice is for ArcMap to connect to something like WMS-C or TMS directly.
@alper – yea, they are rolling out the GAE billing now (beta) so quota limits won’t be an issue going forward. but a clever idea anyway
A2E Cloud Services are coming along very well, I have about 5 blog posts piled up that I will get out next month.
@bruce – tilecache on EC2 is great. fwiw, although it’s not built in to ArcMap, Arc2Earth V2.1 Map Tile Layer allows you to create your own tile layer now (before it was hard coded to VE, Yahoo etc). So you can add any url template and it will download/cache/display the tiles as long as they conform to the commonly used world mercator spec. (anything you can display in GM/VE)
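For anyone unfamiliar with the world mercator tile scheme mentioned here, this Python sketch shows the usual arithmetic for turning a longitude/latitude into tile indices and filling them into a URL template. The `{z}/{x}/{y}` placeholder syntax and the hostname are made up for illustration; they are not Arc2Earth’s actual template format.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Convert WGS84 lon/lat (degrees) to tile x/y in the common
    spherical-mercator scheme, with tile (0, 0) at the top-left corner."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge of the world stay inside the grid.
    x = min(n - 1, max(0, x))
    y = min(n - 1, max(0, y))
    return x, y


def fill_template(template, zoom, x, y):
    """Fill a hypothetical '{z}/{x}/{y}' style URL template."""
    return template.format(z=zoom, x=x, y=y)


if __name__ == "__main__":
    x, y = lonlat_to_tile(-78.64, 35.78, 10)
    print(fill_template("http://tile1.example.com/{z}/{x}/{y}.png", 10, x, y))
```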
We have been doing this for quite some time now for our clients and it has been working out great. Check out my blog post where I outlined an interesting case study. We use Bucket Explorer to upload our caches and it allows you to set permissions on the entire bucket and then everything put in that bucket inherits those permissions.
http://www.roktech.net/devblog/index.cfm/2008/8/14/Map-Cache-Tile-Hosting
I’m not sure how many pipes the js api will handle, but we have used 4 ’tileservers’ with no problem. So we use our servers in conjunction with S3, which makes for very fast and reliable tile loads.
Just as a note, you can get at least two domains with s3 proper, making for 4 simultaneous requests from IE7 and FF2 (and 12 from most modern browsers; see http://stevesouders.com/ua/)
To get both urls going with default s3, just follow the instructions at http://docs.amazonwebservices.com/AmazonS3/latest/index.html?VirtualHosting.html for linking up your cname to your bucket, and then use both it and the s3.amazonaws.com domain.
This assumes, of course, that you don’t mind your clients potentially noticing one of your tile urls having s3 in it for the extra push.
It’s great to see this post. I have never had the chance to store caches in S3 or on any other external servers. My question is how safe our data is there (i.e., legally). Sorry for the naive question, but I’d still like to know more about this.
Anyone with advice on how to integrate tiles hosted on s3 into an application built on the esri javascript api and dojo?
I’m certain that some of our departments would be interested in this idea, but we are just barely digging into the javascript api at this point.
To focus my previous question… anyone know how to set up the rest endpoints on s3?
I realized that once this is done, you just have to set the tileserver options when you add your tiled layer and you are all set.
Brett,
We ended up writing a Windows program that takes a series of parameters (folder on the server, location on Amazon, etc.), converts the disk-based folder structure to the REST endpoint folder structure, and loads it all into S3 at once. The only difference in the column and row folder/file names is that the REST folders are decimal values and the disk folders are in hex.
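The hex-to-decimal renaming described above could look roughly like the following Python sketch. It is not our actual program. It assumes the on-disk tiles are laid out as `L<level>/R<hex row>/C<hex col>.<ext>` and that the target key is `tile/<level>/<row>/<col>`; check both against your own cache and REST endpoint before relying on it. The upload step is left out entirely.

```python
import re

# Assumed on-disk layout: .../L05/R0000000a/C000000ff.png
_TILE_RE = re.compile(r"L(\d+)/R([0-9a-fA-F]+)/C([0-9a-fA-F]+)\.(\w+)$")


def disk_path_to_rest_key(disk_path, prefix="tile"):
    """Map a disk cache path to a REST-style key.

    Row and column folder/file names are hex on disk and decimal in the
    REST form. The "tile" prefix is an assumption.
    """
    match = _TILE_RE.search(disk_path.replace("\\", "/"))
    if match is None:
        raise ValueError(f"not a recognised tile path: {disk_path!r}")
    level, row_hex, col_hex, _ext = match.groups()
    return f"{prefix}/{int(level)}/{int(row_hex, 16)}/{int(col_hex, 16)}"


if __name__ == "__main__":
    print(disk_path_to_rest_key(r"Layers\_alllayers\L05\R0000000a\C000000ff.png"))
```

Because the resulting key has no file extension, the uploader would presumably also need to set the image Content-Type on each object so browsers treat it as a tile.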