Is FTP Access the Best We Can Do?
I can almost predict that every conversation about data sharing will have one person stand up and declare, “Just give me FTP access and I’ll be fine.” I used to think that way, and while I would probably still like file-based access to datasets, I just can’t see FTP being a viable data transfer method anymore. Sure, it makes it easy to grab a data dump, but nothing in it lets users know whether the data has been updated (other than, I suppose, checking the metadata). So many times I see people using old data because they have no idea it has been updated. Personally, I don’t like the idea of offering up spatial data web services for data I don’t control, and most others should be worried about that as well. Users want to grab data from the source, not from some middleman who probably knows less about the data than its creator.
There has been a huge jump into SDI since the pork bandwagon started up in Washington, and I’ll be honest… I haven’t paid much attention. One thing I am sure of is that I don’t want to see something introduced that offers only two choices, WxS and FTP. Data needs to be both discoverable and usable, and I’m not sure WxS and FTP get us there. WxS, no matter what its defenders might say, is not discoverable, and FTP is not secure and has no method of tracking changes.
AtomPub looks to me like the best method of publishing and sharing datasets. There is a huge risk here of inventing something new when a superb solution already exists. Workflows change quickly, and WxS/FTP can’t adjust nimbly enough. Read “How to GET a Cup of Coffee” and think about how much easier this could all be.
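As a rough illustration of why Atom helps with the “has it been updated?” problem: suppose a publisher exposed its dataset catalogue as an Atom feed (RFC 4287), with one entry per dataset and the entry’s `updated` element bumped whenever the data changes. A client could then find out what changed without re-downloading anything. The sketch below is a minimal client-side check using only the Python standard library; the feed contents, URNs and function names are hypothetical, and it assumes the publisher keeps `updated` accurate. It only reads a feed; creating and editing entries is the part AtomPub (RFC 5023) adds on top.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed: one Atom entry per dataset, `updated` set when the data changes.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example dataset feed</title>
  <id>urn:example:datasets</id>
  <updated>2009-06-02T12:00:00Z</updated>
  <author><name>Example publisher</name></author>
  <entry>
    <title>parcels</title>
    <id>urn:example:datasets:parcels</id>
    <updated>2009-06-02T12:00:00Z</updated>
  </entry>
  <entry>
    <title>roads</title>
    <id>urn:example:datasets:roads</id>
    <updated>2009-01-15T08:30:00Z</updated>
  </entry>
</feed>"""


def parse_rfc3339(value: str) -> datetime:
    """Parse an Atom (RFC 3339) timestamp into an aware datetime."""
    # Older Python versions' fromisoformat() rejects a trailing "Z", so normalise it.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def entries_updated_since(feed_xml: str, since: datetime) -> list[tuple[str, datetime]]:
    """Return (id, updated) for every entry whose `updated` is later than `since`."""
    root = ET.fromstring(feed_xml)
    changed = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", namespaces=ATOM_NS)
        if entry_id is None or updated is None:
            continue  # skip malformed entries rather than failing the whole check
        stamp = parse_rfc3339(updated)
        if stamp > since:
            changed.append((entry_id, stamp))
    return changed


last_sync = datetime(2009, 3, 1, tzinfo=timezone.utc)  # when we last pulled data
for dataset_id, when in entries_updated_since(SAMPLE_FEED, last_sync):
    print(f"{dataset_id} changed at {when.isoformat()}")
    # prints: urn:example:datasets:parcels changed at 2009-06-02T12:00:00+00:00
```

The point of the design is that the “is my copy stale?” question becomes a cheap read of one small document, instead of a full re-download or a manual metadata check.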

Gatekeepers want to limit you to FTP/WxS so that you can't change the world.


From what I saw at the ESRI PUG, if you are talking about sharing geodata, ESRI has created a better way to share data with ArcGIS 9.3.1. They are releasing Layer Packages, which are like .lyr files that contain the data too. These Layer Packages can then be uploaded to the new ArcGIS Online, a sort of geodata-sharing network from which others can retrieve them, either by invitation or through open access.
When someone downloads the data, they get the layer data and symbology as the data creator intended. The only issue is that edits are still disconnected from the source. But if you download the layer package again whenever the creator updates it, you can stay somewhat up to date.
Layer Packages don’t do anything more than FTP does for data sharing. You still can’t get updates pushed out.
James, thanks for the “GETTING” coffee link. That explained more to me about what AtomPub is than I ever knew. I can totally see how it is the solution.
Good stuff:
You might also like Webber’s “Does my bus look big in this” talk: http://www.infoq.com/presentations/soa-without-esb.
Interesting, so if AtomPub is as good as it appears, why isn’t anyone using it?
Those nobodies in Redmond are: Windows Live Platform News: Microsoft Standardizes on AtomPub for Web Services and Other Stories.
While we are peanuts compared to the MS-olith we’ve been using AtomPub for federation and data sharing in GeoCommons with great success.
I’m sure “layers” will be very cool when it is launched, but is anything other than ESRI products going to be able to read it? I can’t see how it would be in anyone’s best interest to create a data-sharing destination that is only usable by one vendor. Well of course unless ESRI is building it – i.e. GeoNetwork.
The future is certainly not in sharing entire files via FTP, but in setting up services to share data.
FME Server can be used to stream data in whatever format you like, so users don’t have to worry about updating their data because it is always up to date.
It can be WMS/WFS, but it can also be any other format you need: GeoRSS, GeoJSON, geoWhatever…
Layer packages seem a bit yesterday to me.
Let’s all go ahead and email layers around the world… Why not serve them as a data stream rather than emailing them?
I hope these layer packages are only a first step in ESRI’s data sharing efforts.
Jojo: “Interesting, so if AtomPub is as good as it appears, why isn’t anyone using it?”
Google’s data APIs (http://code.google.com/apis/gdata/) are based on Atom too. But what do they know, right?
But I don’t see how Atom gets you round the problem of dealing with local data sets that are no longer in sync with their original source. It doesn’t matter how you request the data extract – Atom REST API, OGC W*S or whatever – or what format it’s in – ESRI geodatabase, GML etc – because once the data is extracted to your local site, you still don’t know if the source has been updated since you grabbed it, unless you explicitly check for this in some way.
And as an ex-database developer new to GIS, it strikes me that this problem is worse in GIS because of the apparent dependence on big flat-file snapshots of data that get passed around, rather than only querying the data you want from a central database when you want it.
But what do I know, right?
Chris,
You are exactly right. Historically, the default mode of operation has been that Joe “I’m doing a one-off project” GISer grubs for any data in any format to get a project done. That is what creates preconceived notions of how to accomplish data sharing (or, more ideally, information sharing).
Sean Gorman: “…of course unless ESRI is building it – i.e. GeoNetwork”
Sorry to seem obtuse, but what’s the link between ESRI and GeoNetwork? I thought GeoNetwork was open-source from the UN FAO.
ChrisW, for the integrity of your enterprise’s data a central database can’t be beat. But we can’t very well put the web, or even significant pieces of it, in one central database. Decentralization is a feature of the Web — not a bug, but the consequence is that you can never take data currency for granted. Your applications have to cache like a browser does.
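Caching “like a browser does” usually means HTTP conditional requests: keep the `ETag` or `Last-Modified` value from your last download, send it back as `If-None-Match` or `If-Modified-Since`, and treat a `304 Not Modified` reply as “your local copy is still current”. Below is a minimal sketch in Python, assuming the data source actually sends those validators (not every file server or service does). The class and function names are hypothetical, and the network call is injected so the caching logic can be exercised without a live server.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    status: int
    headers: dict[str, str]  # header names lower-cased for case-insensitive lookup
    body: bytes


@dataclass
class CachedCopy:
    etag: Optional[str]
    last_modified: Optional[str]
    body: bytes


def urllib_fetch(url: str, headers: dict[str, str]) -> Response:
    """Hypothetical default fetcher: a plain HTTP GET via the standard library."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req) as r:
            return Response(r.status, {k.lower(): v for k, v in r.headers.items()}, r.read())
    except urllib.error.HTTPError as e:
        # urllib reports 304 Not Modified as an HTTPError; turn it back into a response.
        if e.code == 304:
            return Response(304, {k.lower(): v for k, v in e.headers.items()}, b"")
        raise


class ConditionalCache:
    """Keeps one local copy per URL and revalidates it with a conditional GET."""

    def __init__(self, fetch: Callable[[str, dict[str, str]], Response] = urllib_fetch):
        self._fetch = fetch
        self._store: dict[str, CachedCopy] = {}

    def get(self, url: str) -> tuple[bytes, bool]:
        """Return (body, changed); changed is False when the server answered 304."""
        cached = self._store.get(url)
        headers: dict[str, str] = {}
        if cached is not None:
            if cached.etag:
                headers["If-None-Match"] = cached.etag
            if cached.last_modified:
                headers["If-Modified-Since"] = cached.last_modified
        resp = self._fetch(url, headers)
        if resp.status == 304 and cached is not None:
            return cached.body, False  # source unchanged since our last download
        if resp.status == 200:
            self._store[url] = CachedCopy(
                resp.headers.get("etag"), resp.headers.get("last-modified"), resp.body
            )
            return resp.body, True
        raise RuntimeError(f"unexpected HTTP status {resp.status} for {url}")
```

Against a server that supports ETags, two calls to `ConditionalCache().get(url)` would download the file once and then get a cheap 304 on the second call, which is exactly the “you can never take data currency for granted, so ask” behaviour, without shipping the whole dataset again.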
I have to follow the implications of the INSPIRE directive (not James’s favourite subject, though I’m keen to hear his expanded comments on its pros and particularly its cons!) for my organisation and the country as a whole. The INSPIRE ‘download services’ include the option of simple dataset download via HTTP/FTP etc. alongside the queryable WFS option. I’m all for supporting AtomPub (it looks interesting) and ESRI layer packages, but it’s important we provide various options to potential users. Some of them will be more technically inclined or GIS-literate and will naturally gravitate towards the new, exciting means of discovering, accessing and downloading data; however, I suspect the majority of users will be more than happy with a simple file download. It’s important we don’t abandon the simple, accessible solution even if it’s clearly not the best!
Sean Gillies: “we can’t very well put the web, or even significant pieces of it, in one central database. Decentralization is a feature of the Web”
Not suggesting otherwise – last time I tried downloading the web I just couldn’t get that little hourglass thingy to go away…
I know there are lots of good reasons why GIS people use flat-file formats in certain circumstances, even if these are originally extracted from a DB. But from what little I’ve seen, I wonder whether this is always necessary or appropriate, and whether it isn’t sometimes just a case of “we’ve always done it that way”. It reminds me of a place I worked some years ago, a major utilities company that kept most of its data in spreadsheets scattered throughout the company, and only decided it needed a more disciplined, DB-based approach when it realised it didn’t know how many customers it had and which ones owed it money. The mainstream IT world has been forced through this process: working out where to centralise its data resources, how to manage the use of snapshots etc. in a disciplined manner with the available technology, and generally how to wean itself away from using local spreadsheet silos by default. There is a difference between decentralisation and chaos, after all. All I’m saying is that, to this outsider, this still looks like more of a real issue in GIS, perhaps exacerbated by some of the technology, which seems to encourage silos, and other comments by more experienced GIS people seem to recognise the same problem. So if my database hammer won’t work on these nails, what’s your solution?