Sharing Data – Downloads Are The Key

Last week, Directions Magazine had a podcast about sharing data. The question was APIs or downloads. Now of course I’m partial to data downloads as I work for WeoGeo, but even before I worked here, I was a big proponent of raw data. Personally, I believe it is one of the best ways for citizens to keep track of their government (local to federal). Not that I’m wearing a tin foil hat, but stats are built to lie and APIs tend to deliver what their “owners” want them to deliver. Raw data means everyone has an opportunity to check each other’s work. Of course, raw data can be manipulated as well, but manipulation is harder to hide when anyone can inspect the source.

Mike Daisey

Data in APIs can make a nice story, but they don’t always tell the truth

But APIs do serve a purpose: they allow developers to work with data that they might not understand. Let’s say for example that there is a great dataset of Superfund cleanup locations in Esri File Geodatabase format. Most of us reading this blog post know exactly what to do with that: either load it into our Esri tools or use OGR to convert it to another format. But Joe Developer wouldn’t know that he needs either one, and even if the instructions are there, GIS tools are very difficult to use. Thus even if you offer raw downloads, they might not be useful to many of the people who want the data.
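For readers who haven’t done the OGR conversion before, here is a rough sketch of what it looks like when scripted from Python. It only wraps the `ogr2ogr` command line tool that ships with GDAL, and it assumes your GDAL build can read File Geodatabases. The file and layer names (`superfund.gdb`, `cleanup_sites`) and the helper functions are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Output file extensions mapped to OGR driver names.
DRIVERS = {".kml": "KML", ".csv": "CSV", ".geojson": "GeoJSON"}


def build_ogr2ogr_command(source, destination, layer=None):
    """Build the ogr2ogr argument list for converting source to destination.

    ogr2ogr takes the destination first, then the source, then an
    optional layer name.
    """
    suffix = Path(destination).suffix.lower()
    if suffix not in DRIVERS:
        raise ValueError(f"No driver configured for '{suffix}' files")
    command = ["ogr2ogr", "-f", DRIVERS[suffix], str(destination), str(source)]
    if layer:
        command.append(layer)
    return command


def convert(source, destination, layer=None):
    """Run the conversion; requires ogr2ogr (GDAL) on the PATH."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found; install GDAL first")
    subprocess.run(build_ogr2ogr_command(source, destination, layer), check=True)
```

Calling `convert("superfund.gdb", "superfund.kml", "cleanup_sites")` would then write a KML file that opens in Google Earth. Even this small script assumes you know what a driver and a layer are, which is exactly the hurdle Joe Developer runs into.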

One of the biggest reasons I joined WeoGeo was to work on how we could allow organizations (commercial or government) to share data in raw formats, while allowing the users of that data to convert it into formats that are actually useful for their needs. Thus a user presented with an Esri FGDB could have it delivered as KML or CSV and then use tools they are familiar with to get the data ready for their use. Because this happens on the server side of WeoGeo, the end user doesn’t have to worry about what the native format is, only about the format or formats they care about.

Content is King

Consuming Data

We’ve worked really hard with our partners to enable the power of enterprise-strength ETL. We use our APIs to deliver raw data downloads, not pre-formatted, structured data that you must shoehorn into your workflows. This is important because it gives you the power to use the data as you see fit, not as some API developer thinks you should. Clearly we at WeoGeo are focused on location data, but there are tons of other datasets that organizations should also make available as bulk downloads.

Now I don’t want to make too much of a stink about APIs. They do serve a purpose and there is nothing wrong with having them if the data is available for download first. In fact it is probably a great idea to offer both: data downloads for those who want to work with the data and APIs for those who want to stick points on a map.

If you are looking for a great and simple way to deliver data downloads on multiple platforms, you can get started with WeoGeo and share your data right away.

  • Casey McLaughlin

    James, what part do you think usable meta-data plays in raw data? I’ve seen lots of data used incorrectly, especially when locational accuracy isn’t factored into an analysis. One issue with dumping raw data out there is people taking it without knowing its limits or purpose. I see crowd-sourced data very similarly in that its limits aren’t known. Modelers can cry “garbage in, garbage out” but what comes out when you don’t know what you’re putting into your process? Again this SHOULD be the responsibility of the data consumer but what part should/can the data owner play in making sure data is used within the proper context? I’m all for (okay, mostly for) just putting data out there, but what “costs” come with that philosophy?

  • James Ashton

    Timely comment for work we are planning around here. I am glad you seem to have settled on both, as there are usually distinct and very different clients for the data. Accommodating everyone, or maximizing uptake, will take extra effort.

    There are those that need the raw data for their purposes, but we cannot forget that most ordinary citizens just want a map of the data, not the data. Let’s face it, most people are not GIS analysts, and users can be lazy, and they want it now…. It pays to automate what you feel your clients should see in the data.

    Although the Open Government pushes in North America are advocating making things available (raw, or ESRI-centric formats complete with necessary but nauseating ISO metadata), perhaps it will be the “popular”, consumable, already-generated webmaps of the open data that can help justify capability and presence to inform citizens.

    James