Let’s Save Metadata
When you see the word metadata I’m sure you begin to sweat. You get that lump in your throat and suppressed memories bubble to the surface (none of which are good).

It isn’t hard to see why: metadata as we’ve been exposed to it is just not human readable and thus barely human usable. Working in the government sector as a consultant exposed me to the two worst words any DoD consultant can hear: “metadata required”.
We deal with four letter acronyms all the time, right? FGDC. Even the website is built on Plone, which of course feels more like an Ivy League research project than the traditional SharePoint website we’d all expect from a government agency. One should be scared navigating it and trying to find information. Anyway, what is it about metadata as we’ve been utilizing it (FGDC or ISO) that is just so painful?
Machine Readable vs. Human Readable
So FGDC or ISO metadata is complex, but there could be good reasons for this. They both try to address every conceivable possibility that might need describing in geo-data. If both were primarily designed for allowing servers to talk with each other, I’m not sure any of us would have any problem with it (nor would we really be looking for it). But servers rarely read and write metadata on their own without human interaction. The reality of the situation is that we poor humans have to ingest and parse metadata regularly.
<XML>Yikes</XML>
Well this brings me to what I see as the biggest problem with metadata. It is almost always in XML format. Now don’t get me wrong, XML does have its purpose. In fact I could list probably thousands of times that XML is the right answer. Sometimes it works and works well, other times you end up with a whole bunch of brackets and text that blends together. With a good eye you can parse out what you need, but there is so much noise there that it almost feels like a “Where’s Waldo” exercise. XML does a good job of organizing data for machines, but it doesn’t do it in ways that are easily readable by people.
What Human Readable Metadata Should Focus on
So some person sends you a dataset for a project you are working on. There are some questions you want answered before you commit to using the dataset:
- Who is responsible for the dataset?
- What is the dataset representing?
- When was it created?
- Where are its extents (projection, datum, etc)?
- How was it created?
- Why was it created?
The problem with metadata today is that those questions are hard to parse out of metadata. If you know what to search for you might be able to find it relatively quickly, but the simple fact is that if I want to see those answers for a dataset, they should be exposed to me first.
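To make the idea concrete, here is a minimal Python sketch that pulls the six answers out of an FGDC-style XML record and prints them first. The element paths use the FGDC CSDGM short names, but which element answers which question is my own assumption, and the sample record is made up.

```python
import xml.etree.ElementTree as ET

# Which FGDC element answers which question is an assumption made
# for this sketch; the standard itself does not define such a mapping.
PATHS = {
    "who": "idinfo/citation/citeinfo/origin",
    "what": "idinfo/descript/abstract",
    "when": "idinfo/citation/citeinfo/pubdate",
    "how": "dataqual/lineage/procstep/procdesc",
    "why": "idinfo/descript/purpose",
}
BOUNDS = ("westbc", "eastbc", "southbc", "northbc")


def summarize(xml_text):
    """Return the six answers as a dict; missing elements become None."""
    root = ET.fromstring(xml_text.strip())
    bounding = root.find("idinfo/spdom/bounding")
    where = None
    if bounding is not None:
        where = {name: bounding.findtext(name) for name in BOUNDS}
    return {
        "who": root.findtext(PATHS["who"]),
        "what": root.findtext(PATHS["what"]),
        "when": root.findtext(PATHS["when"]),
        "where": where,
        "how": root.findtext(PATHS["how"]),
        "why": root.findtext(PATHS["why"]),
    }


# A made-up record, trimmed to the elements the sketch reads.
SAMPLE = """
<metadata>
  <idinfo>
    <citation><citeinfo>
      <origin>Example County GIS</origin>
      <pubdate>20100215</pubdate>
      <title>Hydro polygons</title>
    </citeinfo></citation>
    <descript>
      <abstract>Lakes and ponds digitized from orthophotos.</abstract>
      <purpose>Base mapping for county planning.</purpose>
    </descript>
    <spdom><bounding>
      <westbc>-105.1</westbc><eastbc>-104.6</eastbc>
      <northbc>39.9</northbc><southbc>39.6</southbc>
    </bounding></spdom>
  </idinfo>
  <dataqual><lineage><procstep>
    <procdesc>Heads-up digitizing at 1:2,400.</procdesc>
  </procstep></lineage></dataqual>
</metadata>
"""

if __name__ == "__main__":
    for question, answer in summarize(SAMPLE).items():
        print(f"{question.upper():6} {answer}")
```

Everything else in the record can stay where it is; the point is only that these six values get shown before the rest.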
Metadata Style Sheets
One way people have tried to make FGDC metadata (and ISO to some extent) more readable is through the use of style sheets. Many ESRI users are exposed to this inside their ArcCatalog. That drop-down list that lets you choose different ways of viewing the metadata is a style sheet selector. This means that you can take that ugly XML metadata and parse it out in ways that are easier to read. I’ve not seen much in the way of usability improvements on this front. At WeoGeo we offer human readable metadata on our dataset information pages. Others are doing it as well, but there is really no standard as to how this should be organized.
So Who Cares About FGDC/ISO?
Honestly you really shouldn’t care. You should care, though, about getting information describing the data you are working with. I think most of the issue with both metadata standards is that it is just too hard to put data in and too hard to get the relevant information out. Committee-designed standards such as these always end up being way too much for real world use. We need to make sure we get the who, what, when, where, how and why of the dataset, and to do this we need to look at the geo-data creation tools and how they help us input metadata. Data creators should have an easy time filling out those 6 things about their data. The issues are in the weeds of the metadata standards. But out on the fringes of the metadata requirements, creation tools (such as ArcCatalog) can help us manage things. Databases should be tracking who created the data (their name/address/etc), when it was last modified, any lookup tables, aliases for field names, links to additional information and anything else that is being used for that dataset. Not having to track all that down gives the creator of the data enough focus to make the who, what, when, where, how and why so much better than it would be if they had to enter everything.
And on the display end of things, I’d like to see UI experts work at creating better human readable metadata style sheets that hide the details that you don’t need to see at first glance and expose what we as users of data need at first glance. It is easy enough to expand the details “below the fold” of a metadata page.
What Now?
It is up to all of us. We are stuck with the metadata standards, so changing them at this point isn’t feasible. At WeoGeo we’re committed to bringing complex/detailed FGDC/ISO metadata to users in easy-to-digest forms. What I’d like, though, is to hear from others trying to crack this same nut and see if we can collaborate on this more, so that in this age of NSDIs we still have usable metadata for people to make decisions with.


So, you shouldn’t care about the standards unless you’re required? I can buy into that, if it’ll get more people authoring metadata. I’d be thrilled if we could get the average user to fill in a few fields (accuracy, contact, vintage, restrictions) in the ArcCatalog metadata editor.
But how are we going to get the authorship rate up without requirements? Pro-metadata propaganda? A worm that infects GIS users, preventing them from sharing data without well-formed metadata?
I completely agree that the UI needs to be improved. Even a stylesheet with just those key elements would be an improvement.
I think something simple like the “who, what, when, where, how, why” is how we start. Those are easy enough to fill out, but you have to work through the tough ArcCatalog interface to get there (or use another metadata authoring tool).
I’m open for suggestions because I think the basic stuff is easy enough to fill out.
That interface kills any amateur attempts at filling out the basics. I can never remember where to put the contact information so that it shows up in the correct spot. If ESRI can make a Metadata Lite editor (using the 5Ws) I guarantee the authorship rate will dramatically increase.
…and it needs to be easy to access, like an option as you right click on the layer.
Couldn’t agree more! If it was EASY ACCESS from ArcMap people would fill it out more often!
Last week it was The Who at halftime, this week a photo reference from Godfather I: I, for one, appreciate the pop culture references aimed directly at middle-aged males.
As for thinking about metadata on a holiday Monday–I’ll get right to it after I get my fill of Olympic Pairs Skating short-program analysis.
BT
Wow this is awfully pragmatic for someone as disruptive as you James.
I do like the idea of using stylesheets to help simplify the process and expose the users to only the stuff that is needed at a minimum. Anything is better than nothing and unfortunately nothing is usually what we get with metadata (or even worse, inaccurate).
Long time ‘listener, first time caller’ here. I think you have a great idea. I would see it as more of a tiered metadata approach. There would be maybe 3 tiers of metadata: simple (your basic columns), intermediate (this would get into more depth) and then fully compliant (FGDC or ISO).
The simple could easily be part of the creation of a shapefile, you know when you define the projection you fill in 6 basic fields. This would suit 90% of the applications.
That would work and with the style sheets you could “convert” the ISO/FGDC compliant metadata into simple very easily and still have the detail for those pencil pushers in Sector 7G.
Actually, “compliant” metadata is not that many fields and answers most of James’s questions. The real issue is that most FGDC editors are built so you can enter ALL the fields. There is no way to make a nice editor that does all the fields to the spec. If someone just pulled out the required fields and the ones that cover 80% of usage, we could probably get a decent editor.
Instead of Who, What, When, Where, How, Why, I think a more pragmatic approach is required:
dodge, duck, dip, dive, and dodge
If you can dodge a wrench then you can dodge the responsibility of filling out metadata!
One of my bitches about Metadata is that it’s not really tied closely enough to the data. Metadata should be embedded in the data at the lowest level possible. For instance, I have run into far too many instances where the extent listed in the Metadata is way greater than the actual extent of the data (e.g., try getting some hydro polys for Denver from geodata.gov – you have to sort through every quad sheet in Wyoming because the extent was published incorrectly).
My two cardinal rules of metadata:
Finally, metadata that doesn’t point to a URL/URI to get the actual data should be banned from the Interwebs. It’s meaningless. I’m probably not going to pickup a phone and try to track down some professor from East Bumblefrick State College.
My point exactly.
I think data and metadata should be connected, and ESRI (all others as well!) should build their software so that it demands the information before it allows you to store it.
If not, too many possibilities arise to duck the issue.
I disagree, there’s a number of reasons I’d prefer users contact me about some data. Like it’s too large to put on my server, it’s proprietary, highly technical, needs a data sharing agreement, and I know you’re not going to read the metadata, so I’m going to review it with you.
I like this idea.
The tricky part is getting metadata to travel with the data it describes. Instead of using feature level metadata, perhaps it would be more practical to have tools that convert back and forth between metadata and normal data.
For example, say I concatenate features from different featureclasses each having different “Who” elements. I’d like an easy way to add a Who field and calc it beforehand. Going the other way, I’d like to summarize the Who field into a Who metadata element. I guess then the problem is knowing whose Who.
Maybe you hate XML (or RDF), but it would make sense to have a “namespace” level metadata and an instance-level metadata. I mention RDF because it is already a standard that allows for both. And, for a namespace to be valid, it has to be a resolvable URI.
Heck, even for things like contact info, one could simply use their FOAF URI.
Internally we’ve tackled this issue by building our own little metadata editor which focuses on those same key questions and a few more specific to our industry.
One trick we did add was the ability to import metadata from another dataset so that previous information is retained and the lineage is documented. It also saves the user a lot of time!
I’d second Cor’s point that if GIS software forced/encouraged metadata entry when creating data we’d see a lot more useful metadata. Even better would be if metadata was maintained through data processing so that if I download a dataset and clipped a portion out, or reprojected, the metadata should be retained from the downloaded data with additional data to document those operations.
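A rough Python sketch of what “metadata maintained through data processing” could look like: each operation copies the source metadata and appends a record of what was done. The dictionary layout, field names and the `derive_metadata` helper are all hypothetical, not how any GIS package actually stores this.

```python
import copy
from datetime import date


def derive_metadata(source, operation, who, when=None):
    """Copy the source metadata and append a record of the operation.

    The source dictionary is left untouched, so the downloaded
    dataset's own metadata stays as it was delivered.
    """
    derived = copy.deepcopy(source)
    step = {
        "operation": operation,
        "who": who,
        "when": (when or date.today()).isoformat(),
    }
    derived.setdefault("lineage", []).append(step)
    return derived


# Made-up example: download, clip, then reproject.
downloaded = {"title": "Hydro polygons", "who": "Original Agency", "lineage": []}
clipped = derive_metadata(downloaded, "clip to study area", "me", date(2010, 2, 15))
reprojected = derive_metadata(clipped, "reproject", "me", date(2010, 2, 16))

if __name__ == "__main__":
    for step in reprojected["lineage"]:
        print(step["when"], step["operation"], "by", step["who"])
```

The original “who” survives every step, and the lineage list grows by one entry per operation without anyone typing it in.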
The “Who” should be easy. If some of the major vendors coughESRIcough would allow you to enter your contact info when you first run the software, then apply this “Who” data to all new datasets created, then you have the first question already answered. This also goes with “When”.
My problem with metadata is that the main standards seem to have a ton of data redundancy. If I enter myself as the Point of Contact, I still have to enter myself as Originator and Metadata Contact. There should be a way to group all three of these. In addition, a lot of the fields in the standards are confusing. What I might enter in Logical Consistency Report will be different than what the guy next to me will enter.
The way current metadata works breaks basic database normalization rules. My contact info is stored in every metadata file I’ve created but I can’t change it easily if/when it changes. Using a URI that points to an external identity, via FOAF or OpenID or whatever, eliminates this problem. So I guess this leads to a third rule for metadata:
In the case of URIs, the Semantic Web already has a simple scheme called “sameAs” which can be used to dereference changing roles. For instance, if I get a promotion and find myself in charge of the USGS NHD data, there may be a URI that points to the role of “NHD Steward” that is a simple sameAs triple. By changing it to point to my FOAF rather than who ever is currently NHD Steward, any NHD metadata points to me.
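As a plain-dictionary stand-in for that idea (no real RDF or FOAF library involved, and every URI and name below is invented), the lookup could be sketched in Python like this: metadata stores only a URI, contact details live in one place, and a role URI is an alias that can be re-pointed.

```python
# Contact details are stored exactly once, keyed by a personal URI.
CONTACTS = {
    "https://example.org/people/alice#me": {"name": "Alice Example"},
    "https://example.org/people/bob#me": {"name": "Bob Example"},
}

# A role URI that currently points at a person, sameAs-style.
SAME_AS = {
    "https://example.org/roles/nhd-steward": "https://example.org/people/alice#me",
}


def resolve_contact(uri, contacts=CONTACTS, same_as=SAME_AS):
    """Follow aliases until a contact record is found."""
    seen = set()
    while uri in same_as:
        if uri in seen:
            raise ValueError(f"alias loop at {uri}")
        seen.add(uri)
        uri = same_as[uri]
    return contacts[uri]


# Every metadata record refers to the role, never to a person's details.
record = {"title": "NHD flowlines", "contact": "https://example.org/roles/nhd-steward"}

if __name__ == "__main__":
    print(resolve_contact(record["contact"])["name"])
    # A new steward takes over: one alias changes, no metadata file does.
    SAME_AS["https://example.org/roles/nhd-steward"] = "https://example.org/people/bob#me"
    print(resolve_contact(record["contact"])["name"])
```

Changing the steward touches one alias rather than every metadata file, which is the normalization win being argued for here.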
< snark on >
You totally lost me at RDF. What the hell folks – this is the path that led us to FGDC. Really, if we want human readable how about something like:
Name: My frickin name
Address: My address
URL: http//lame0.com
contact address: same as above
Now how hard was that to read. Let’s start easy covering 80% of the use cases and then go from there.
< /snark on>
The usual definition of metadata, that it is “data about the data”, becomes problematic because metadata is then often seen as an afterthought of data creation. Workflows with inherent metadata creation in all stages of geo-work should be encouraged if not required.
For my own datasets, I simply add a text file with the following info:
Title
Abstract
Keywords
Identifier/URI
Publisher/Originator
Publication Date
License
Extents
Projection
Format
Scale
Distribution
This may not be machine readable, but I prefer my metadata to be human readable. Machine readability comes secondary, IMO.
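A small Python sketch of generating such a text file, so the fields always come out in the same order and blanks are visible rather than silently missing. The “Field: value” layout and the “(not provided)” marker are my assumptions; the field list is the one above.

```python
FIELDS = [
    "Title", "Abstract", "Keywords", "Identifier/URI",
    "Publisher/Originator", "Publication Date", "License",
    "Extents", "Projection", "Format", "Scale", "Distribution",
]


def render_sidecar(values):
    """Render one 'Field: value' line per field, in a fixed order."""
    lines = []
    for field in FIELDS:
        value = str(values.get(field, "")).strip()
        lines.append(f"{field}: {value or '(not provided)'}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up values for illustration only.
    text = render_sidecar({
        "Title": "Road centrelines, Example County",
        "Projection": "UTM zone 13N, NAD83",
        "Format": "Shapefile",
    })
    print(text, end="")
```

Writing the result next to the data with the same base name (for example `roads.txt` beside `roads.shp`) keeps the two together when the folder is zipped.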
I include a pdf version of my metadata with the same name as the data and zip it all up in one bundle
Forget about data exchange – how hard is it to currently keep metadata with internal datasets?
Thank goodness ESRI has now implemented storing geoprocessing logs in the metadata similar to the Workstation logs, but it’s still very hard to find which month a dataset was created. I still have to use the system date stamps on files and hope I haven’t done a Windoze copy which resets them all. (Only xcopy can save the original dates).
If you create a new dataset, then it would really help if a metadata file was opened with that action recorded, and a who, but it isn’t automated.
Since all the XML formats have extensions, there would be no barrier to adding tags for the rest of us to use.
Has anyone looked at a GPX file’s metadata? All the tags are there, but various translators and exporters just ignore them. It’s not just users that are at fault, the software should honour them too.
Metadata again and again… Do we REALLY have a problem? What are the problems? Complexity? Organization with the geodata? Technical issues with the metadata editors? No one likes it?
For me the problem is: Metadata today is complex stuff created from computer nerds for computer nerds. They’ve screwed it up and discussed it to death. The data maintainer has lost the sense for metadata. It’s a “must-do” not a “must-have”.
Unfortunately there is no way out. You can begin with easy metadata, for sure. Good luck. But over the last few years the politics behind the standards have pushed data maintainers to the most complex profile of the most complex ISO format. Simplicity is futile. You have to fill out everything! It’s bureaucratic.
We (EDINA, The University of Edinburgh) have created a geo-portal and metadata editor tool for UK academia; we are now migrating towards making both support INSPIRE-compliant metadata, basically a life-line from the EU which mandates that data creators create metadata.
Academia has been a difficult challenge to date, despite all the arguments for metadata creation in support of data discovery and data management. We have created a metadata editor tool (GeoDoc) which supports the export of metadata into ISO 19115, FGDC, DDI, Dublin Core and UK GEMINI formats, the latter an ISO 19115 compliant profile for the UK. This includes the functionality to export metadata records into our UK AGMAP (UK academic ISO/UK GEMINI compliant profile) format, which is presented in a user friendly pdf file; we will be extending this to export all standard formats into pdf as part of the enhancement of GeoDoc to support INSPIRE metadata and UK GEMINI 2, an INSPIRE-compliant version for the UK.
A complete nightmare to update UK AGMAP to be INSPIRE and UK GEMINI 2 compliant. INSPIRE is not completely ISO 19115 compliant and UK GEMINI 2 not completely INSPIRE compliant, yet UK AGMAP 2 will comply with INSPIRE, UK GEMINI 2 and ISO 19115.
This reflects the reality of politics and all the rest played within these ‘standards’ committees; they seem oblivious to the GI community? Word is out that there are issues with ISO 19115 and 19139, so it goes to show that even the experts are lost! Not sure if all this could implode with the release of the North American Profile (NAP), which will supersede FGDC, but my impression from examining NAP is that it will lead many out of the darkness, plus most countries will use NAP; implementing ISO 19115 requires considerable expertise and time, so it is not an affordable option for many, especially to create profiles for their specific communities. As noted, it has been a challenge for us, and we anticipate that GeoDoc 2 will be the first INSPIRE-compliant tool to be made available in the UK, but the challenge remains to encourage academics and researchers to create metadata; there is considerable paranoia surrounding IPR and data sharing, not to mention data derived from Ordnance Survey products, but this (OS data) is under review, and more might be made available for free or at a significant reduction in cost. Most importantly for metadata and data sharing, the issue of residual data rights won’t have as great an impact, but the mindset still has to change, as revealed by this row at the University of East Anglia and the climate data there.
Anyway, a perspective to offer from the UK. Previous experience suggests that the public sector less problematic and more accepting of metadata; private sector recognises the value as well to some degree, but we still face challenges in academia, but at least with OS spatial data review and INSPIRE, we hope that the Go-Geo! portal service and GeoDoc simplify the process to encourage more metadata creation. GeoDoc simplifies the process with drop down lists and map interface to capture bounding box coordinates, plus simple interfaces to export metadata and submit for QC and publication on the GoGeo portal. The mandatory elements presented in UK AGMAP suffice for discovery level metadata, but extended for those wishing to use tool for data management.
Whoever takes up the Sisyphean task of “fixing” metadata must take a page out of GEICO’s book. Simplicity is the key. Unless a caveman can do it, users won’t read or write meaningful metadata. And relevant metadata must be stored and travel with the data.
FGDC to GIS community:
“SO EASY even a Caveman can do it!!”
GIS community:
“Unbelievable! Where’s my coat? …suede, with fringe?”
Atanas,
Very true…as it is the core belief by many who are “not required” to create metadata. There are other reasons why some choose not to create metadata and to distribute it (legal issues, accuracy concerns, wrong data use). Considering everything, we as GIS professionals must realize it is mandatory (like a map maker must provide a legend) for GIS data. Your Simplicity statement is also true. For what TurboTax has done to the tax creation process, we need a tool that will do the same for the metadata creation process. Hopefully, EME is a step in the right direction. Congrats EPA!
Metadata, schmededata
Notwithstanding the faultless reasoning of our august brethren in the Federal Government Department of Casuistry, for the most part most folks don’t share data beyond their own worlds. And if you’re not really sharing or publishing data, then you don’t need to hassle with bureaucratic irrelevancies like FGDC compliance. [as if anyone outside the Federal gov't gives a rip about the FGDC, anyway...but that's another story entirely]
For the most part, metadata is organic – provenance lives in the head of the person working the program. That knowledge is passed along to apprentices or close partners and friends (it’s a social thing) or it goes away.
If a metadata falls in the forest, does anyone care?
Archie,
Imagine if the money to create data was coming out of your pocket. Data is created by one of your employees…they leave…along goes the metadata in their heads. Sometimes no apprentices are there to take over before they leave. Also picture yourself as the person hired to fill in for that person. “WTF is this data and how accurate is it” might be your first response.
“If a metadata falls in the forest, does anyone care?”…. I personally do. Take a step back and look at metadata and its importance to data. If you need further convincing of how valuable it can be, watch Antique Roadshow…you’ll notice items with documented history go for a lot more money.
Sorry, modification to Archie statement:
“If a metadata falls in the forest, does anyone care?”
Yes…if me and my data are directly under it.
My comments are not refuting the reasons why metadata should be used [it's essential, really] but are a realistic view from the general state of practice across the industry as a whole.
Pockets of intelligence exist, generally within workgroups or well-organized enterprises.
But SHARE that metadata? I think not.
Not unless someone pays to have it published and maintained and can resolve the issue of accessing my data to which it applies.
The real story here isn’t metadata so much as it is sharing and publishing data. Metadata is good, but in practice it’s little used because most folks don’t publish their data very widely.
The FGDC is proposing a new tag for compliant metadata that should easily relay the salient points about the data. The tag is <WTF>Enter WTF info</WTF>. The committee is said to be very excited about this new tag.
Annnnnd…full circle.
The original FGDC spec, circa ’94-’95 did not specify any type of machine-readable format. You might recall that the spec was the “Content Standards for Digital Geospatial Metadata.” When they said “content,” that was precisely what they meant. Valid metadata could be in plain text, rtf, .doc, xml, etc.
This had the twin effects of making it nearly impossible to validate metadata, but also making it such that you could throw the few items of required metadata into a text file and be in compliance with the 4/94 Executive Order (12906) mandating inclusion of metadata with geospatial data.
Of course, things have changed dramatically since then, particularly once ISO came along.
But the goal should always be to make your data more usable, both internal to your organization and external for sharing purposes.
Very odd revisionist thinking here. Yes FGDC said it was a content standard. But the standard specifies structural relationships among the data elements, so it was not as much of a free-for-all as you say. Moreover the community of people who actually did work on this stuff found, by experimenting, that it did make sense to standardize on a couple of parseable formats (first indented text or SGML, then later XML).
Most importantly, this all happened way before ISO came along, this started in 1995 with the availability of mp.
Three things ESRI could do to help:
use the FAQ style sheet as an input template option, rather than as just an output display option
have a “Frequent Contacts” feature in ArcCatalog that would allow storage of the users’ personal and professional data contacts so that that info can be easily clicked to be embedded into any metadata being written. This is the most repetitive part of the current process, and should be easy to fix
Improve how the metadata Title is constructed. This sounds so basic, but it’s really important over time. Somehow require the user to create a human friendly Title for their data right up front, so that the metadata doesn’t default to some cryptic file name. The human friendly Title could then have two things appended to the end (using commas): Owner/Organization Acronym, and Date. Over time, displaying your data lists using this type of format for the title is very user friendly and informative….
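The title rule in that last item is simple enough to sketch in a few lines of Python. The function name, the ISO date format and the example values are my own assumptions; the shape (friendly title, then owner acronym, then date, joined with commas) is as described.

```python
from datetime import date


def build_title(friendly_title, owner_acronym, data_date):
    """Build 'Friendly title, OWNER, YYYY-MM-DD' and refuse an empty title."""
    title = friendly_title.strip()
    if not title:
        raise ValueError("a human-friendly title is required")
    return f"{title}, {owner_acronym.strip()}, {data_date.isoformat()}"


if __name__ == "__main__":
    # Made-up owner acronym and date for illustration.
    print(build_title("Road centrelines for Example County", "ECGIS", date(2010, 2, 15)))
```

Raising an error on a blank title is what stops the metadata from quietly falling back to a cryptic file name.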
Arc 10 should address some of this (or so i’ve been told). The “3 Tab editor” extension has been a real timesaver for me and is a wonderful tool, but i don’t think it will work post 9.3.1.
re: making good titles
This is a critical point, and not just for metadata. It’s not uncommon for me to encounter a scientific report whose title simply doesn’t give me enough information to know what the report is about. Often the cause of the problem is that the person writing the title assumes that the reader is working within the same context, so for example “this is in a collection of digital orthophoto quads” is an unstated assumption, and the title becomes something like “Lexington, KY”.
Regarding TC Haddad’s reference to the following:
“‘Frequent Contacts’ feature in ArcCatalog that would allow storage of the users’ personal and professional data contacts so that that info can be easily clicked to be embedded into any metadata being written.”
We have this functionality built into the GeoDoc metadata editor tool which I mentioned in previous posting. GeoDoc also validates records and refers metadata creator to fields which need to be addressed. GeoDoc also requires the user to enter a minimum of 100 characters in the abstract field. Probably one of the more important elements for discovery level metadata, and cannot be automated, this can ensure that the metadata creator is prompted to provide more information than a repeat of the dataset’s title.
Getting back to automating the ‘contact details’ information, these details are generally mandatory, so anything to hasten the process and minimise the pain of creating metadata!
agree with the importance of the Abstract… my point about good Titles was that if properly constructed they function as mini-abstracts, giving a future user the “what-where-who-when” portion of the equation very concisely. “Title” was intended to be human friendly – which is why defaulting to a file name is not a great idea (filename belongs as part of a URI). The software could help the original author construct this and the results would be much improved future user experience.
for example:
why are we distributing extended metadata with the data anymore?
what other systems use such a strategy?
for example most open source software changes frequently, so it’s never my first choice to go to the docs distributed with it. instead i go to the docs online for that software package. so all i need is the bare bones of metadata locally including a link back to the online source. what’s more if you had a GIS data package management system it could go fetch that bare bones material from the online source which could be a wiki page with some FGDC tags associated with the divs (something like that).
Well, if the data you have in hand is not the same as the latest online docs, then you are not using the correct info. I think if you are distributing a set of file based data then a static metadata record to go along with it makes more sense. That was the state of the data when it was sent/downloaded/whatever.
i hear ya, so store the metadata wiki in a revision control system (or build it around a RCS like Trac) that way you can reference the version of the metadata relevant to your version of the data. plus you will be able to then compare what you have to what is the latest greatest.
As a way around this problem (versioning aside!) the simple approach we used was to use ArcCatalog style sheets which are exported directly into our Open Source Portal for external access & recovery. Parallel to this (capture once, use twice) is exporting parsed extracts into HTML for our internal users, i.e. attached to layers in our intranet application.
Example of simplicity below:
[Cadastre]
LAYER/S NAME: Cadastre (multiple)
DATA DESCRIPTION: District Parcel boundaries, Rail Segments, Road Segments, Hydro segments, Road Centrelines
SOURCE DATE: ~14th of every month
SUPPLIER: SKM (sourced from LINZ CRS)
CAPTURE METHOD: Surveyed boundaries and digitised road centrelines
ACCURACY: Boundaries generally <=2m rural (but can be up to +/- 40m), generally <=0.5m urban (but can be up to +/- 2.5m)
CUSTODIAN/CONTACT: GIS Team, Ph Ext 8473
OWNERSHIP: Corporate Services as Stewards
COPYRIGHT: Crown Copyright Reserved
DISCLAIMER: Standard one. Optionally can add accuracy information
USAGE: Can supply Parcels etc. to 3rd Party ONLY if copyright is acknowledged.
MAINTENANCE/UPDATE: Monthly
COMPLETENESS: 100%
COMMENTS: LandOnline contains surveying accuracy data used as source. Parcel layer is combined with property and rating database into the Property layer.
RE: “my point about good Titles was that if properly constructed they function as mini-abstracts, giving a future user the ‘what-where-who-when’ portion of the equation very concisely. ‘Title’ was intended to be human friendly…….”
You’re absolutely right – very important point. How often I have seen the filename as a title and we provide a 200 page guideline document which provides examples, but it seems many metadata creators ignore this doc. We even provide direct links from each element in the GeoDoc form to that element’s information section in the guidelines, but no one seems to bother.
Anyway, TC Haddad’s point with regards to ‘title’ is spot on! I was suggesting that often the abstract is written more like a title than a description, hence the reason the abstract will fail validation if fewer than 100 characters entered.
If there is any interest in our work- critical comments welcomed, you can find via link through my name or http://www.gogeo.ac.uk
Unfortunately, GeoDoc requires a username and password and is currently available only to those in UK academia, though we want to make it open access. I alluded to the problem amongst those in academia who are concerned about releasing any information about research they are conducting, even if describing a dataset created as part of research, so with this authentication service in place, a user of GeoDoc can create metadata which they can store in a private directory. We even offer to set up institutional nodes and are considering peer level nodes which allow metadata creators to publish their records so that only those with the same affiliation can search and access them.
Now that I recall, I just checked GeoDoc and we do impose a character limit on title as well, though not significant, but metadata creator must enter at least 10 characters for title or GeoDoc will not validate record. Perhaps this should be more, but anyway, it isn’t just imposed on the abstract field, but also title. We have an element for ‘alternative title’ too.
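For anyone wanting the same kind of check in their own tooling, the minimum-length rules mentioned in this thread (100 characters for the abstract, 10 for the title) can be sketched in Python as below. This is not GeoDoc’s code; the record layout and message wording are assumptions.

```python
# Thresholds as quoted in this thread; the rest is a hypothetical sketch.
MIN_LENGTHS = {"title": 10, "abstract": 100}


def validate(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field, minimum in MIN_LENGTHS.items():
        text = (record.get(field) or "").strip()
        if len(text) < minimum:
            problems.append(
                f"{field}: {len(text)} characters, need at least {minimum}"
            )
    return problems


if __name__ == "__main__":
    # A file name as title and a one-word abstract both fail.
    for problem in validate({"title": "roads.shp", "abstract": "Roads"}):
        print(problem)
```

Returning the list of problems, rather than just pass or fail, is what lets an editor point the author at the fields that still need attention.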
An excellent topic for discussion for the GIS community. First off, a statement. If a map maker produced a map without a legend, that mapmaker would be out of business pdq. Why is it that some GIS organizations think metadata is something that is optional? With the complications of data sharing and how data is used, metadata MUST be part of the data creation process.
I have been involved with the creation of FGDC compliant metadata for over 12 years. I’ve taught users the easiest way possible to create metadata according to the standard as well. I can go on and on about this subject and its critical importance to GIS data, but I will be brief and cause as little pain as possible.
What works? My agency requires all users to create FGDC compliant metadata for any spatial data that is accessible to all on our internal data drives and for anything posted on the web for download. Mandates by the top officials to protect their data holdings are paramount!! It works when you take the “choice” away from the people you employ. If you owned a company that produced a product, would you distribute it without the instruction manual?
After the buy-in from the organization, making the process of creating metadata and distributing it, so the GIS community can acquire the data, is also critical. This works vice-versa too!! The route I took when creating training materials (using ESRI’s ArcCatalog) is to implement “templates” that are already populated with generic information common among ALL data sets (the stuff that doesn’t change all the time, for ex: company required access/use constraints, basic contact info, etc.). Why enter these things each time? Then, you can use the auto-capture capabilities of ArcCatalog (other tools have this feature as well) to fill in information about the data set automatically upon importing the template, for ex: spatial domain and coordinate system info, attribute labels, object counts, etc.
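The template-plus-auto-capture workflow described above can be sketched in a few lines. This is only an illustration of the layering idea, not how ArcCatalog does it internally: the field names, the `REQUIRED` list, and the `build_record` helper are all hypothetical, and plain dictionaries stand in for the real FGDC XML.

```python
# Hypothetical field names for illustration; these are not FGDC element names.
TEMPLATE = {
    "access_constraints": "Internal use only",
    "use_constraints": "Not for navigation",
    "contact_org": "Example Agency GIS Office",
}

# Assumed list of fields the organization insists on.
REQUIRED = [
    "title", "abstract", "purpose",
    "access_constraints", "use_constraints", "contact_org",
    "bounding_box", "coordinate_system",
]


def build_record(template, captured, entered):
    """Layer the three sources; later sources override earlier ones."""
    record = {**template, **captured, **entered}
    # Report which required fields are still empty after layering.
    missing = [name for name in REQUIRED if not record.get(name)]
    return record, missing


# Values a tool could read from the data set itself (made-up numbers).
captured = {
    "bounding_box": (-75.6, 39.7, -75.4, 39.9),
    "coordinate_system": "NAD83 / UTM zone 18N",
    "feature_count": 1204,
}

# The few things only the data creator knows.
entered = {
    "title": "County road centerlines",
    "abstract": "Centerlines of public roads maintained by the county.",
}

record, missing = build_record(TEMPLATE, captured, entered)
print("Still to fill in:", missing)
```

The point of the layering is that the creator only ever sees the short `missing` list, while the template and the captured values supply everything else.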
The trick is the interface. The FGDC editor in ArcCatalog is intimidating in itself. So many tabs, so many fields to fill in. We address this with a GIS checklist telling users what tabs to go to and what fields are mandatory, and especially how to fill in the critical “who, what, where, when and why” of the data in the editor. Basically, we wanted to take the decision-making out of the hands of the metadata creator…since they are the creator (most of the time)…they have all we need in their heads…it’s getting it into the editor and in the proper place that’s the challenge. Let the metadata work for you: get as much into the metadata as possible so no one is left asking “what about this?”. Creating compliant metadata according to the federal standard not only protects the organization’s data holdings for years upon years, but makes the data more valuable…period.
From day one (and from what I’ve read on this thread), what has been necessary is a simplified tool that incorporates “saved/stored” metadata in a database and cuts out all the fields that are not mandatory to GIS data (though some may argue all the fields are). The Feds do suggest that each organization go thru the standard and prioritize what’s applicable to its data (besides the mandatory stuff, of course). I had wished ESRI would build this tool, but I believe one is just about ready for all to use. Please try the EPA’s Metadata Editor 3.0 http://www.epa.gov/geospatial/eme.html
It will be embedded in ArcCatalog when you load it. It’s a choice in the editor selection. If you are starting from scratch, it’s the way to go. If you already have metadata on layers and are looking for a user-friendly editor, be careful, as some parameters you set can have a dramatic impact on some of the content already in the metadata…specifically having the EPA synchronizer on. Create a sample personal or file gdb and test it there to see how the tool works. Some modifications are coming and I soon will be implementing this tool. Why? Because everyone in GIS should take metadata seriously as a “best practice”; it’s a critical component of the data set and its creation is integrated with the data in some metadata tools (ArcCatalog, Intergraph, EME). I get very angry when I hear some “advanced” GIS professionals say “Oh, no one reads the metadata anyway”. To me it’s a cop-out for a professional because it basically tells me “I don’t know how to create it …so I’ll attack it.” With a tool like EME and loading as much data in beforehand, there should only be a handful of fields to fill in if you learn the tool (it may take time, but it will save time and lots of money). With the template I use, we only really need 12 data specific fields to be filled in…the template and auto-capture provide the rest. With a tool like EME, I envision cutting this down even further given the integration of databases.
Don’t worry about ISO yet (it’s largely based on FGDC). The N. American profile isn’t complete yet. If you create metadata using the above tactics, you will be way ahead of the game. There will be a converter available that will make all your metadata ISO compliant …if you so choose.
Lastly, think of the big picture. GIS technology is advancing and incorporating metadata (written in xml) in ways never envisioned. Metadata isn’t just about data documentation anymore; it’s being used as a data discovery mechanism in online mapping services/applications. Be professional and do the right thing. It’s not the horse’s head in the bed anymore….it’s like having something better in bed with you…okay…I’ve gone overboard now and crossed the metadata geek threshold.
PS: Never read an xml formatted metadata record. Any FGDC compliant tool will also export html files and text files that are much easier to read. Try exporting the FGDC FAQ htm for the many novices out there. It puts all the metadata in a question and answer format that better addresses the “who, what, where, when and why” that we all know and love.
[...] Fee has launched a discussion entitled “Let’s Save Metadata”. Unfortunately, metadata mostly gets the short end of the stick by many people developing [...]
You can’t build a simple editor for a metadata spec as ‘rich’ as ISO/FGDC with the ability to essentially repeat an entire metadata doc in itself (CI_Citation) and a plethora of profiles that add descriptive elements. There is an audience for this rich metadata. The EPA metadata editor is simpler because it focuses on a subset of FGDC and limits repeatability of elements.
Dublin Core covers the essential metadata while satisfying Jason’s dodge, duck, dip, dive, and dodge desire. Try this simplified metadata at the Geoportal sandbox. A title and URL are enough for discovery purposes. Or register the ArcGIS or OGC service or KML/RSS link via its URL and the Geoportal will extract whatever it can find from the service directly. No manual entry needed.
To me these self-describing resources (W*S, KML, RSS, ArcGIS REST/SOAP, …) are the way to go as we’re evolving to a web of linked content.
I don’t want a simple editor for ISO/FGDC. I want a simple editor that writes out useful metadata and leaves the complex stuff for others.
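To make the “a title and URL are enough” point concrete, here is a minimal sketch of such a discovery record using Python’s standard library. The Dublin Core namespace and the `title` and `identifier` element names are real; the `record` wrapper element, the `minimal_record` helper, and the example URL are assumptions for illustration and are not what the Geoportal produces.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def minimal_record(title, url):
    """Build a two-element discovery record: a title and where to find it."""
    root = ET.Element("record")  # wrapper name is an assumption
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    ET.SubElement(root, f"{{{DC_NS}}}identifier").text = url
    return ET.tostring(root, encoding="unicode")


xml_text = minimal_record(
    "County road centerlines (WMS)",
    "http://example.org/wms?service=WMS",
)
print(xml_text)
```

Everything else a catalog wants (extent, layers, formats) could then come from interrogating the service at that URL rather than from manual entry.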
The problem with supporting ISO/FGDC is you cause people to not fill out ANY metadata. Most of the time we end up with empty or very incomplete metadata which isn’t useful for anyone. Giving a simple option to folks will increase adoption of useful metadata.
Like a TurboTax for Metadata (hoping you find the TurboTax UI at least somewhat simpler than the ArcCatalog ISO editor)? I’ll throw this in the discussion:
A simple editor for metadata as comprehensive as FGDC/ISO will not result in more people completing more metadata.
Metadata (just like code documentation) is and will always be an afterthought.
I don’t accept that at all. People are happy tagging their photos, filling out titles and descriptions without thinking twice about it. Why is that?
I think it is for two reasons:
Solve these two easy problems and we’ll get more people filling it out. Comparing metadata to code documentation is exactly how we got into this in the first place. Metadata is easy to provide; it only seems otherwise because our only choices are FGDC or ISO in many tools. Have a pragmatic choice and I’ll bet you things change.
Why is that? Because they only have to fill in the title and some tags (and even those are optional). Hence my statement that completing metadata doesn’t become easier when the requirements from the spec are complex: simplify the metadata requirements. Another reason: ego/pride. They want people to find their photos/videos (whether for fame or increased likelihood of getting ad revenue). That is not the mindset of GIS professionals. They create their content for themselves, not thinking about what others may do with it.
Are you the Marten who works on geodata.gov?
If so, I can only say your responses don’t surprise me in the least bit. Totally out of whack with reality.
The point many are making here is, I think, clear: we need to break out from this failure of metadata and do what feels right, add basic data to existing metadata standards and screw the “required”.
I like your thinking here James. Supporting FGDC/ISO standards underneath will make this simple meta readable by most GIS packages out of the box. I’m going to experiment a little with this next week with some ISO metadata and see how well this concept works.
I’m saying screw the spec. Throw FGDC/ISO out for most uses. Just because a committee says these are standards, that doesn’t mean we as users need to support them, they of course are non-binding.
Why can’t all those points in your second reason be applicable to geospatial data?
In principle they can and for example separating metadata specs between discovery and detailed definition makes sense as well. This view was generated from an auto-generated Dublin Core metadata record for the WMS service when someone registered it with the Geoportal sandbox I mentioned earlier.
Registration is simplified, evaluation for usefulness is supported through interacting with the service directly.
This argument has been around for a while. see my opinion from 2006 (previous life).
http://geoinfo.dlsi.uji.es/geodatos/meta_intv5i7.pdf
http://geoinfo.dlsi.uji.es/geodatos/metapart2_intv5i8.pdf
Main point is that archival, fitness-for-purpose, and discovery metadata are very different animals. Also the world has changed in the past decade: now Google and ArcGIS Online find your resources without huge user documentation exercises.
Think of all the metadata you could have filled out rather than arguing about it. Better to light a candle than curse the darkness
We have a simple system at work to ensure metadata is created:
We still have old datasets with patchy metadata but we are doing well with all new creations.
Plus any data that is not modelled (we use a case tool), peer reviewed and metadata’d is approved for immediate destruction.
People soon learn to behave …
Primary source data should have formal metadata. Derivative works require less formal documentation. Project data needs next to nothing as the project itself is the metadata.
Primary data like survey control, aerial photography, parcel base data, etc. better have darn good metadata because so much further data is based on it.
Derivative data and project data can get by with much less. Typically within the course of the project, the methodology they use to come to the conclusion (or data) is the metadata itself.
I hope most people would agree primary source data better have some good documentation that would qualify as metadata. I think most of the gripes about metadata have to do with the feeling that formal metadata needs to be created for derivative and project data.
Under “What human readable metadata should focus on” you need to add a biggie: an explanation of field names! Even datasets that have metadata will often forget this.
What do the field names mean, what sort of values should the users expect to see there (and by implication, what should be regarded as indicating a problem), what are the units where the field values are numeric, and what do abbreviations or codes mean. To say, as some here argue, that nobody needs to know this information is pure nonsense.
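A field-level data dictionary along those lines does not need a heavyweight standard to get started. Here is a minimal sketch; the `FieldDoc` structure, the `undocumented` helper, and the sample field names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FieldDoc:
    """What a user needs to know about one attribute field."""
    name: str
    meaning: str
    units: str = ""                             # only relevant for numeric fields
    codes: dict = field(default_factory=dict)   # code value -> what it means


def undocumented(field_names, docs):
    """Return the fields in the data set that have no usable description."""
    documented = {d.name for d in docs if d.meaning.strip()}
    return [name for name in field_names if name not in documented]


docs = [
    FieldDoc("RD_CLASS", "Functional road class",
             codes={"1": "Highway", "2": "Arterial", "3": "Local"}),
    FieldDoc("LEN_M", "Segment length", units="metres"),
]

# Field names as they appear in the (hypothetical) attribute table.
table_fields = ["RD_CLASS", "LEN_M", "SRC"]
print("Needs an explanation:", undocumented(table_fields, docs))
```

Running a check like this before publishing catches the cryptic column that everyone in the office understands and nobody outside it does.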
My sentiments exactly Mr. Schwitzer. Perhaps those that argue against standardized metadata should apply the same logic to the standardized forms when doing their taxes. See where that gets them with the IRS. As long as the “money” to create the data isn’t coming out of their pockets, there will be resistance from those unwilling to do the right “professional” thing and document their data according to the federal standard. Using templates and the capabilities of some of the metadata tools out there, the process to document and produce a robust, detailed metadata record is not as hard as some on this board think.
Under “What human readable metadata should focus on” you need to add a biggie: an explanation of field names! Even datasets that have metadata will often forget this.
Yes, this is important, and it seems to be an issue with regard to ISO and its emphasis on feature catalogues. Those in the respective disciplines must agree on and provide them, which is obviously a slow and painful process, with posturing coming into play when defining features.
At least FGDC allows for attribute description; I believe NAP retains this as well?
I think James’ blog sank too low on Hitwise so he launched a metadata thread to get us geo-dorks all spun up and commenting.
It’s a cunning plan, Mr. Fee, but we’re on to you now.
I’ve been involved in some recent work to make metadata easier. There are 3 parts:
A definition of Essential metadata. Some people working on cadastral datasets (primarily PLSS and related content) are defining a spec based on the Content Standard for Digital Geospatial Metadata. I can’t locate a final document right now but a draft is available here. It’s a pragmatic approach that supports the standards.
Enter common information at the Geodatabase level. While people end up publishing metadata for each dataset, most organizations manage many datasets in a Geodatabase. One good solution is to manage point of contact information and other common elements at the Geodatabase level, and manage the entity and attribute descriptions at the dataset/field level. The only real trick is to select a Geodatabase (top-level, the silver can) in ArcCatalog and open the FGDC metadata editor to enter the common information.
Tools that create html reports, and also individual xml documents for each dataset for publishing purposes (you need the complete metadata doc for each dataset when publishing). We’ve been working on a Cadastral Metadata Reporter that accepts a Geodatabase XML Workspace document as input. Towards the bottom of the page you can see some output from the tools. I wrote a brief description here.
The group is currently working on guidelines and training for Cadastral data publishers. Imagine just having 1 set of phone numbers, dates, etc to manage in your metadata…
Definitely interested in your comments and feedback on the approach.
Leave the blogosphere for a few days and miss all the fun…
As someone that’s written (hopefully) FGDC compliant metadata for dozens of layers, I’d say the biggest problem is the editing tools. There’s nothing wrong with XML on the back end, as long as the authors don’t have to see it.
Automating as much of the process as possible is good, except there’s a lot that ArcCatalog records automatically that I don’t want the public seeing (like my database and server names) and that isn’t editable in the default editor, so it’s off to hack the XML manually. You’d be surprised at what you might be distributing in your metadata XML that doesn’t display in any of the default style sheets.
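That manual XML hacking can be scripted. The sketch below strips unwanted elements before a record is published; the tag names `server` and `database`, the sample document, and the `scrub` helper are hypothetical stand-ins, since the actual element names ArcCatalog writes are not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names; substitute whatever your tool actually writes.
SENSITIVE_TAGS = {"server", "database"}


def scrub(xml_text, sensitive=SENSITIVE_TAGS):
    """Remove every element whose tag is in `sensitive`, at any depth."""
    root = ET.fromstring(xml_text)
    removed = []
    # Snapshot the element list first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in sensitive:
                parent.remove(child)
                removed.append(child.tag)
    return ET.tostring(root, encoding="unicode"), removed


sample = (
    "<metadata>"
    "<title>Roads</title>"
    "<connection><server>gis-prod-01</server>"
    "<database>sde_main</database></connection>"
    "</metadata>"
)

clean, removed = scrub(sample)
print("Removed:", removed)
print(clean)
```

Returning the list of removed tags alongside the cleaned XML gives you a quick audit of what was about to go out the door.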
I’m probably wasting my time anyway, as I still get dozens of calls from so-called GIS professionals asking me questions that are clearly answered in the metadata.
Those of you that are 9.4/10 beta testers and haven’t looked at the metadata, or aren’t beta testers, are probably going to be very unhappy when 10 comes out and you are faced with ESRI’s “ArcGIS Metadata” and no(?) tools for updating/exporting FGDC. But apparently you can roll your own if you know XAML and ArcObjects…
The editing tools need an overhaul definitely. Last I asked, ESRI told me that customers didn’t think metadata was that important so they didn’t put much effort into it.
However, metadata also needs to be managed better and hiding it in SDE is a horrible way to do it for those of us in the dark ages still using middleware. Nobody has stepped up and made a small portable XML editor and manager.
We’ve been working with them on that – ArcGIS 10 will have an FGDC export capability (a download, but we will keep pushing them for it to be included in the inevitable 10sp1).
There will also be Python tools to manipulate metadata in scripting (long needed!)
Import/Export capability is key – ESRI can’t provide the functionality we want (refresh bounding box etc) without extending the XML pretty widely, which means to make XML usable by others you need to export to meet a standard like ISO. The new “ArcGIS metadata” is still XML and will be extensible. They have also made some efforts toward a metadata interface being exposed to users casually so there is more hope they will be entering basic information describing the data, and this info is written into the complex XML under the hood without them knowing it.
We have some real issues with abandoning FGDC at this time – the ISO 19139, even if you roll in 19115 to document tables and fields, still is missing some of the good stuff (especially source documents) that FGDC provides. FGDC says they are working on these issues as they finalize a NAP that they will recommend for adoption to replace FGDC.
Metadata in Plain Language articulates a set of questions that metadata (FGDC) records can store. Turning the questions around provides a format in which the metadata can be presented as an FAQ. This work was done in 1998, by the way.
When counseling (consoling?) people who are writing metadata for the first time, I usually point to those questions and ask them, honestly, which of the questions do users of their data not need to know. Sometimes there are things that can be left out–that’s no disgrace. But any real user of the data needs to know, at some point, a lot of this information.
When mp introduced the FAQ output format, I argued that this was not the only friendly presentation of FGDC metadata, but that other people could and should devise additional presentation formats that emphasized some aspects of metadata and de-emphasized others. Even with XSL this hasn’t happened much, though I think our friends at ESRI have done some interesting work within the ArcCatalog viewer interface.
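As a rough idea of how small an alternative presentation can be, here is a sketch that turns a metadata record into a question-and-answer listing. It is not mp’s FAQ output: the question wording, the dictionary keys, and the `as_faq` helper are assumptions made for this example.

```python
# Assumed pairing of plain-language questions with record keys.
QUESTIONS = [
    ("What does the data set describe?", "abstract"),
    ("Who produced the data set?", "originator"),
    ("Why was the data set created?", "purpose"),
    ("How was the data set created?", "lineage"),
    ("How can someone get a copy of the data set?", "distribution"),
]


def as_faq(record):
    """Render a metadata dictionary as plain-text questions and answers."""
    lines = []
    for question, key in QUESTIONS:
        # Make gaps visible rather than silently dropping the question.
        answer = record.get(key, "").strip() or "Not documented."
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
        lines.append("")
    return "\n".join(lines).rstrip()


faq_text = as_faq({
    "abstract": "Centerlines of public roads maintained by the county.",
    "originator": "Example Agency GIS Office",
})
print(faq_text)
```

Showing “Not documented.” for unanswered questions doubles as the honest conversation starter mentioned above: which of these does a user of the data really not need to know?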
[...] more free flowing approach, then it becomes way less hard. Take metadata for example. James Fee recently wrote about metadata and the challenges of how to make it accessible. When asked about metadata in the [...]