If there's one area where librarians and archivists can claim to be truly ahead of the game, it's data consistency. Consistent records and name authority files will propel libraries into deeper semantic engagement. Museums, archives & historical societies will vary and, in some cases, be more difficult, possibly even impossible, to engage with.
But the value of museum & archive collections is HUGE.
That's the focus of Linked Open Data in Libraries, Archives, and Museums (LOD-LAM).
The work of @musebrarian and the good folks building the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) could be one answer, but moving everyone into that standardized framework might be like pushing a boulder up a cliff.
How can we work with collections as they are?
Google Scholar abandoned OAI-PMH (a traditional library standard for integrating collections) in favor of using sitemaps and metadata embedded in HTML.
But, this is just one approach to organizing data and doesn't include crawling, indexing and ranking or other SEO & data-mining tactics.
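To make the embedded-metadata approach concrete, here is a minimal Python sketch that renders one collection record as `<meta>` tags for a page's `<head>`. The `DC.` prefix follows the common Dublin Core convention for HTML; the function name and the sample field names are hypothetical, and this is not a description of what any particular crawler requires.

```python
from html import escape


def record_to_meta_tags(record):
    """Render a flat metadata record as Dublin Core style <meta> tags.

    `record` is a dict of field name -> value. The field names used by the
    caller are illustrative assumptions, not a required vocabulary.
    """
    tags = []
    for field, value in record.items():
        # Escape both the name and the value so quotes or angle brackets
        # in the source data cannot break the HTML attribute.
        name = escape(f"DC.{field}", quote=True)
        content = escape(str(value), quote=True)
        tags.append(f'<meta name="{name}" content="{content}">')
    return "\n".join(tags)


if __name__ == "__main__":
    print(record_to_meta_tags({"title": "Cemetery Index", "date": "1892"}))
```

A crawler that reads embedded metadata only needs the page itself, which is why this route asks less of a small institution than running a harvesting endpoint.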
My two cents in reply to @musebrarian's question about whether we should build LOD on top of OAI-PMH: absolutely. We should leverage all the work that's been done and look for solutions to engage the wonky, disorganized collections.
After working with data from many county historical societies and normalizing it to be exposed in a single portal, I can't help but wonder, is there a solution that doesn't require normalization of collections?
Hi Jenel,
Can you say a little more about the kind of normalization you are thinking about? IMLS DCC does some "normalization" of data values for indexing purposes (dates, geographic coverage). But there are other ways to interpret that.
The question may be not "whether" normalization is necessary, but who does it, and for what purposes.
p.s. and at what cost?
Hi Richard,
Nearly all the county historical societies' digitized collections that I've worked with are not following metadata standards. I was lucky if they were using spreadsheets. Lots of old versions of Word docs that used a return to indicate the next record. But these documents contain valuable data. Mostly vital statistics (birth & death certificate indexes, marriage records, cemetery indexes), but also object & monograph collection descriptions.
The way we were ingesting the data required 'normalization', which required me to dump the data into a spreadsheet and clean it up so that we could index it in our search engine.
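For anyone curious what that cleanup looks like, here is a rough Python sketch of the kind of thing I mean: text where a return marks the next record goes in, spreadsheet-style rows come out. The comma between fields, the column names, and both function names are assumptions for illustration; the real documents were much messier.

```python
import csv
import io


def parse_return_delimited(text, field_names, delimiter=","):
    """Split text into one record per line and map values to field names.

    Assumes each non-blank line is one record and fields are separated by
    `delimiter` (an assumption; real documents vary widely).
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are not records
        values = [v.strip() for v in line.split(delimiter)]
        # Pad short rows so every record has every column.
        values += [""] * (len(field_names) - len(values))
        # zip() silently drops any values beyond the known columns.
        records.append(dict(zip(field_names, values)))
    return records


def records_to_csv(records, field_names):
    """Serialize the records as CSV text, ready for a spreadsheet or indexer."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_names, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "Smith, John, 1892\n\nDoe, Jane\n"
    columns = ["surname", "given", "year"]
    print(records_to_csv(parse_return_delimited(sample, columns), columns))
```

Notice that the row is all that survives: anything the original document implied by position, grouping, or surrounding text is gone once the data is flattened this way.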
So, and perhaps I'm out in left field here, I'm wondering if there's a way to maintain the relational aspect of the data without spreadsheeting it out, perhaps through XML and web services or something, and then apply semantic technologies.
I see the main cost as a loss of the relationships in the data, which would pretty much make the data meaningless.
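As a sketch of what I have in mind, each field could stay attached to its record as a subject-predicate-object statement instead of a spreadsheet cell. The Python below writes N-Triples-style lines using only the standard library; the `http://example.org/` namespace, the function names, and the sample fields are all hypothetical, and a real project would pick an established vocabulary.

```python
from urllib.parse import quote

# Hypothetical namespace, used only for illustration.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record_id, record):
    """Emit one statement per non-empty field, all sharing the record's URI.

    The shared subject is what keeps the fields related to one another.
    """
    subject = f"<{BASE}record/{quote(record_id, safe='')}>"
    lines = []
    for field, value in record.items():
        if value == "":
            continue  # an empty field makes no statement
        predicate = f"<{BASE}field/{quote(field, safe='')}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    for line in record_to_ntriples("marriage-1", {"groom": "John Smith", "year": "1892"}):
        print(line)
```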
Your thoughts?
Thanks!
jenel