The GIS Interchange File

All too often we have to ask people to resend datasets because they get blocked by email, one important file gets left off, or systems just don’t recognize a file type. I ran into a problem today where a company FTP site is rejecting a shapefile because it doesn’t recognize the .shp, .shx, and .dbf extensions. I thought I could get around it by zipping the data, but the site appears to scan zip files for extension types. So the “solution” was to zip the shapefile, change the zip’s extension to .doc, and tell the recipient to change the extension back to .zip.
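
For anyone who wants to script that workaround, here is a minimal sketch using only the Python standard library. It refuses to build the archive if one of the required shapefile parts is missing, which covers the "one important file gets left off" case. The function name `bundle_shapefile` and the file names are hypothetical, and it assumes lowercase extensions sitting next to the .shp file.

```python
import zipfile
from pathlib import Path

# Parts a shapefile needs to be readable, plus common optional sidecars.
REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".cpg")

def bundle_shapefile(shp_path, out_path):
    """Zip a shapefile's parts into out_path; raise if a required part is missing.

    Hypothetical helper for illustration. out_path may carry any extension
    (for example .doc); the contents are still an ordinary zip archive.
    """
    shp = Path(shp_path)
    missing = [ext for ext in REQUIRED if not shp.with_suffix(ext).exists()]
    if missing:
        raise FileNotFoundError(f"{shp.stem} is missing: {', '.join(missing)}")
    parts = [shp.with_suffix(ext) for ext in REQUIRED + OPTIONAL
             if shp.with_suffix(ext).exists()]
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for part in parts:
            # Store only the file name so the archive unpacks flat.
            zf.write(part, arcname=part.name)
    return [p.name for p in parts]
```

Calling `bundle_shapefile("roads.shp", "roads.doc")` would write the disguised archive in one step; the recipient still has to rename it back to .zip.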

This kind of stuff happens way too often. Personal Geodatabases have the problem of the .mdb extension, which is rejected outright by most email systems, and other formats aren’t readily usable on folks’ systems. The “old days” were easy because we all used coverages and shared them via the .e00 format, which was almost always accepted by everyone. It’s amazing how we take such steps back over time; you’d think data sharing would be easier than it was in 1995.

How do you folks share data? KML, GML, Etch A Sketch, e00, zip, web services, etc?

Update: Jason Birch has some ideas about using SQLite as an interchange format. Well worth the read.
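
To make the single-file idea concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The table layout (`layer_meta`, `features`, geometry stored as WKT text) is an assumption made up for illustration; it is not the specification being worked on in OGR and FDO, and the function names are hypothetical.

```python
import sqlite3

def write_features(db_path, features, srid):
    """Store (name, wkt) pairs and a coordinate system ID in one SQLite file.

    Hypothetical layout for illustration only, not a published spec.
    """
    con = sqlite3.connect(db_path)
    with con:  # commits on success, rolls back on error
        con.execute("CREATE TABLE IF NOT EXISTS layer_meta "
                    "(key TEXT PRIMARY KEY, value TEXT)")
        con.execute("CREATE TABLE IF NOT EXISTS features "
                    "(id INTEGER PRIMARY KEY, name TEXT, geom_wkt TEXT)")
        con.execute("INSERT OR REPLACE INTO layer_meta VALUES ('srid', ?)",
                    (str(srid),))
        con.executemany("INSERT INTO features (name, geom_wkt) VALUES (?, ?)",
                        features)
    con.close()

def read_features(db_path):
    """Return (srid, [(name, wkt), ...]) from a file written by write_features."""
    con = sqlite3.connect(db_path)
    srid = con.execute(
        "SELECT value FROM layer_meta WHERE key = 'srid'").fetchone()[0]
    rows = con.execute(
        "SELECT name, geom_wkt FROM features ORDER BY id").fetchall()
    con.close()
    return int(srid), rows
```

Attributes, geometry, and the coordinate system reference all travel in one file, and text columns are not bound by DBF-style field width limits.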


36 Comments

  1. AA says:

    I usually post them on a website (unlinked) and then provide the user with a direct link to the file being transferred. Files can easily be zipped if necessary. They can just grab it using http.

  2. James Fee says:

    That is a safe way of getting around anything.

    My only thought is that there has to be a better way to exchange GIS files than zipping them up. I know most modern operating systems can open zip files, but it still requires an extra step to use them.

  3. Mac says:

    How about exporting it out as an XML file? The file size may be large, but it will be one file. And you can specify just the table structure (schema) and/or data.

    Mac

  4. Lefty says:

    The problem with XML is that you need something to translate it back and forth. If both sides don’t have the ability to do so, then it’s a dead file.

    I think this stuff usually happens when one of the parties to the transfer isn’t technically savvy. Most of us reading this blog can probably figure out a way to get around it, but when sharing with non-technical folks, it becomes problematic.

  5. Jeff Konnen says:

    XML Workspace is a good approach but is unfortunately reserved to ArcEditor users. ArcView users cannot use the Geodatabase-XML…

    The future is probably not in sending data but in putting up a service, as you can do with FME Server, where the user can specify the format, extent, coordinate system and data model in which he wants to download data.

    The data can then be streamed to the user or he may choose to CLIP, ZIP and SHIP the data to get a local copy in the structure of his choice.

  6. Bill says:

    We have taken to zipping up data and changing the extension to “piz”. I then created a file extension handler to handle “piz” files easily.

    It’s like pushing water with a broom. Whatever you pick will soon be filtered by the IA types anyway. It’s their entertainment.

  7. Allen says:

    I’m really interested in hearing what people do to solve this problem. Our organization has had many problems sharing data because of issues like these.

    I’d love to just share a PDF and have the users be able to import it into their GIS system. We looked at GeoPDF, but it requires too many things to fall into place to work well.

  8. James Fee says:

    Bill, I tried that but the FTP site didn’t recognize piz as a file extension. Why IT folks want to lock things down that much is beyond me.

  9. JC says:

    Outgoing=ZIP; I.T. set us up with our own folder on the FTP server, and all staff have permissions to post; then we just send a link; the only problems we’ve had with this method (unzipping) are with real estate agents that fashion themselves as GIS savvy (but they can’t unzip??), in which case I’ll send a KMZ.

    Incoming=Still a problem… The SPAM catcher routinely rejects various filetypes, including ZIP; but I.T. has added a rule to route GIS-bound rejects into a “quarantine” folder, and we can request the data out of that from I.T…. Still a hassle, and you don’t always know when data was sent to you…

  10. Ryan Arp says:

    I usually FTP to a site or use yousendit.com which emails a link that you email to your email recipient. It has worked pretty well for me so far, but I still end up zipping the file.

    Long live the e00 file..

  11. rmcculley says:

    The problem, as I see it, is that everyone’s needs are different. My share method is determined by what the receiver wants. All too often I wind up exporting to .dxf as the common ground. Otherwise I send zipped shapefiles (usually uploaded to the webserver like AA). I have only once had someone request GML.

  12. Gretch says:

    My share method is determined by what the receiver wants.

    Yea, but I think that is the problem. There are just way too many file formats out there for no reason. I mean really how many vector formats does GIS really need (I won’t step into raster for now, but even there we’ve got too many)?

    I am hopeful that GML will be that interchange format, but until ESRI gets serious about supporting it, it will just be a bizarre file format.

    Shapefiles really need to be put to death. DBF was a great format in its time, but not in 2008.

  13. Matt Perry says:

    The foolproof combo is zipped shapefiles over authenticated https. (Assuming you have a web server configured for it.)

  14. Evan Brown says:

    I’ve faced many of the challenges listed here already with the same frustration and disillusion. With GIS services being touted so heavily, my organization is looking at a server or services based solution for data sharing. An interesting possibility is the TITAN product from ERDAS.

    The TITAN client software enables users to publish data through the TITAN network. GDAL is used to publish a wide variety of raster and vector formats. Published data can be pushed out to TITAN clients or to other clients via WMS or KML. Users can share data with permissions – either publicly to everyone on the TITAN network or only to specified users or workgroups. I hear that 2009 calls for plans to download data in its native format – that will be very cool. By the way, the TITAN client also consumes OGC WMS WCS and ECWP for imagery.

    The price sure is right – the client is free. Users can publish and share data up to a limit. Check it out at http://www.erdas.com/erdasProductsconnect.aspx

    My organization is really interested in experimenting with TITAN. Such technology eliminates the need for specialized GIS software for our collaborators. The familiar globe user interface used by the TITAN client is becoming ubiquitous. Plus we choose what data is shared and with whom we share it. I think services GIS has great potential for data sharing.

  15. Russ says:

    sneakernet. Anybody worth sharing data with is worth putting it on a cd or dvd, going to visit them and drinking coffee and eating donuts.

  16. zac spitzer says:

    Mapguide has the concept of MGP’s which are basically zip files which back up both the XML and the data (if you stored the datafiles inside the repository)

    They make transferring mapguide stuff around really simple

  17. Jason Birch says:

    Sorry, but take off your sepia-tinted sunglasses :) e00 sucked too back in the day. Compression and endianness? Inter-version compatibility? Double vs. Single precision (PC Arc/Info)?

    Back when e00 was less-than-perfect, you also had to deal with a lot more small GIS packages that had reasonable pockets of penetration (in BC, anyway). Import/export to things like TerraSoft, Intergraph, PaMap, MapInfo and others was a royal pain.

    Maybe there was a golden age in between then and now where everyone was using ESRI and the e00 format was perfect.

  18. Jarlath says:

    What we need is an industry giant like Microsoft to come along, drive everyone out of business, and push us to one format. Remember the days before MS Word, there were a half a dozen word processing formats.

    On a more serious note our university has a nice little file transfer system, it allows a user to upload a file up to 1GB and notifies the recipient of its availability via email and provides a URL to download. Unfortunately, with the size of LIDAR and orthophotos I often end up mailing around external hard drives as a way of sharing data.

  19. Chris M says:

    I know this only works for ESRI 9.X users, but the file-based geodatabases seem to be taking on the role formerly played by .e00 files. They are platform independent, and the lack of that is a big drawback of personal GDBs. It also looks like they will be getting some new features in 9.3 as far as replication goes. I am not really sure how they would fare against most email filters.

  20. thrtruth says:

    fax machine and crayons.
    seriously though how about a rar file? tar ball maybe? i usually do the same as AA. e00… yuck!

  21. rmcculley says:

    I think it isn’t possible to have the perfect data exchange format. There are too many variables. Spatial data isn’t just shared amongst GIS software anymore. There are all the virtual globes, various CAD packages, and graphics packages, each with its own specific needs.
    I would argue that the problem with GML3 is the complexity involved with trying to be all things to everyone.

  22. AlbertW says:

    rmcculley: Why does it have to be so complex? Vector data should be easy and shapefiles are pretty much that data interchange format (but of course the 3+ files are a problem).

    Maybe KML is that future interchange format, especially since they already have a compressed version (KMZ). Seems logical to me, of course we need more support for KML/KMZ, but with OGC maybe we are there.

  23. ChrisW says:

    Russ – sneakernet is OK so long as your data is not too confidential, but: http://news.bbc.co.uk/1/hi/england/manchester/7269965.stm

    As for interchange mechanisms, er, isn’t that what WMS/WFS etc are supposed to be for?

    Right now we still seem to be at the “downloading the internet to my PC” stage.

    Darn – they’ve just updated the Dilbert page – gotta go download the web again….

  24. Jason Birch says:

    I’d agree that KMZ has the most legs right now, but I’d be reluctant to use it as a definitive exchange format because of the common representation hacks and the inability of most GIS packages to understand the extended attribute data.

    I would love to see SQLite used for this kind of interchange format. It’s got all of the benefits of SHP (open, easy) and none of the drawbacks (multiple files, limited field widths, proprietary projection extension).

    There’s been some recent work in the OGR and FDO open source projects to settle on a common specification for representing spatial data in SQLite. This doesn’t help the proprietary users much, but it’s a step in the right direction.

    I tried to pingback on my recent entry on this, but either James has this turned off or my pingback capabilities are broken. Here’s the URL if you’re interested:

    http://www.jasonbirch.com/nodes/2008/05/06/184/sqlite-for-fdo-with-sugar-free-ogr

  25. James Fee says:

    I turned off the pingback because of spam. I grew tired of it. I’ll throw that link up in the main article, Jason.

  26. j says:

    Currently, most of our work is being done in spatial databases, so a lot of it is sending ldf’s and mdf’s back and forth. We also do Oracle occasionally, but SQL Server is dominant in our client base. We also send/receive Manifold .map files to/from clients using that.

    Luckily – we control our own FTP site. This means that we can post what we want and direct clients to the site, or tell clients how to upload to it.

    Cheers.

  27. AA says:

    Seems like everyone is thinking about it:

    http://www.gcn.com/print/27_10/46221-1.html

  28. George Silva says:

    rapidshare outright. safe and easy.

  29. mark says:

    .rar is the best, unfortunately not everyone has winrar. Splitting the files up into several smaller chunks is probably the best way of handling large volumes of data.

  30. Steve Grise says:

    Every ArcGIS Desktop User has “Quick Import” and “Quick Export” tools in the ArcGIS Toolboxes. This allows import/export of Simple Features GML.

    It is not shown by default in the list of Toolboxes but it is there under “Data Interoperability Tools” (not part of the extension so perhaps confusing). With the Data Interop extension you can work with those files directly in ArcMap etc. – and the performance is pretty good.

    For me this is much better than shapefiles because I can do xml/xslt work with GML. I’ve been doing some geology/GeoSciML work lately and SF GML is handy and strangely fun once you get used to xslt.

    ArcGIS 9.3 has some good capabilities on client and server side for SF GML and WFS.

    The other obvious ArcGIS format is FileGDB. You end up with a lot of files within a zip file but a compressed FileGDB is amazingly small. I haven’t seen an email server that strips them.

  31. Robyn says:

    In the spirit of honest blogging- I’m in marketing at YouSendIt and caught Ryan’s comment on my google alerts.

    Technically speaking you guys are WAY over my head- but I do know how to send big files, easily.
    Our desktop application (available free at http://www.yousendit.com/cms/applications) lets you just drag and drop files or FOLDERS of any type. (The folders just automatically zip up and send.)

    Pretty simple!

    Use the promo code “RHORBP21” for a free one month upgrade. Feel free to shoot me an email if you get stuck…
    Robyn@yousendit.com

    Good Luck!

  32. jesse says:

    Thanks for the spam, Robyn.

  33. Robyn says:

    Hey, really sorry if that seemed spam-ish.
    I was really just trying to be helpful… thought I was pretty honest about where I was coming from. Just wanted to give you guys access to something that might help… no worries if you’re not interested. And really, if you’ve got any questions feel free to contact me directly.

  34. Gummibärli says:

    A neutral interchange format such as Interlis (http://www.interlis.ch) is worth a look, but it goes far beyond just exchanging geodata (modelling, checking, etc.).
    It’s pure genius, but trying to replace the ubiquitous Shapefile with Interlis is like promoting Esperanto (or Interlingua) in international business: nice idea, but hopeless…

  35. Doug Nebert says:

    Sending files as email attachments seems an awkward way to go, especially in the days of large data volume and the potential for custom retrieval. The use of OGC Web Feature Service (WFS) is a way to request and retrieve, via HTTP GET or POST, GML features. The Simple Features GML encoding, mentioned earlier, is roughly equivalent to a shapefile, whereas full GML 3.X will allow you to encode more complex structures analogous to a geodatabase. Also, if one sets up the hosting webserver to perform compression, a client that can uncompress on-the-fly can request it that way, or without compression if not. Most GIS packages can now communicate with WFS using GML (and even KML sometimes) to get custom retrievals of vector data.

  36. Matthew Snape says:

    The de facto interchange format is the shapefile. The point of an interchange format is that everyone can read it, and support for shp is better than for any other format.
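
For readers curious what the WFS retrieval Doug Nebert describes looks like in practice, here is a rough sketch that builds a WFS 1.1.0 GetFeature request as an HTTP GET URL. The endpoint and layer name in the usage note are placeholders, not real services, and `getfeature_url` is a hypothetical helper. It only assembles the URL and does not contact a server.

```python
from urllib.parse import urlencode

def getfeature_url(endpoint, type_name, bbox=None, max_features=None):
    """Build a WFS 1.1.0 GetFeature URL using key-value-pair encoding.

    endpoint and type_name are supplied by the caller; the values used in
    any example call are placeholders, not real services.
    """
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
    }
    if bbox is not None:
        # Four numbers; axis order depends on the coordinate system in use.
        params["bbox"] = ",".join(str(v) for v in bbox)
    if max_features is not None:
        params["maxFeatures"] = str(max_features)
    return endpoint + "?" + urlencode(params)
```

Something like `getfeature_url("http://example.com/wfs", "topp:roads", bbox=(-112.5, 33.0, -111.5, 34.0))` yields a link that can be emailed in place of an attachment, and the server answers with GML for just that extent.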