[personal profile] lehser
After reading Google Books: A Metadata Train Wreck on Language Log, I suddenly realized that I've already put in my time listening to complaints about the quality of Google Books metadata.  Only, I put in my time before it actually happened.

I've got (I think) a pretty good case that I know how a bunch of those errors crept in, and a much weaker circumstantial case that beginning to negotiate with Google about scanning the library's books caused the circumstances that inevitably produced those errors.  Let me see if I can make those cases.


Her name was (not) Margaret.  She came over most Sundays to play music at my house.  She'd worked for more than 20 years cataloging at the University of Michigan library (and had finally attained a postdoc's salary).  She mostly worked in one of the special collections.  But suddenly, either very late in 2003 or early in 2004, her usual work was brought to a screeching halt.  Apparently, completely computerizing the card catalog had just become a class-one, all-hands emergency.  According to what I heard from Margaret, there was no explanation given, just pressure to get it done fast.

"What's the all-fired hurry, anyway?" grumbled Margaret, week after week, for half an hour on end.  "My boss looks at the boxes on my desk, and *screeches* at me to go faster.  My co-workers get through more boxes, but they don't care *at* *all* about getting it right!"  Other rants came from an involuntary move from hourly pay (with paid overtime) to salaried (with mandatory, uncompensated) overtime.  (Later on, I think I remember some references to a related "top-secret" project, but I may be inserting that.) 

Eventually, my 50-plus-year-old friend took a second job, tending the gas station at a Meijer south of town.  She worked evenings and/or weekends, which left her exhausted during the week.  All the while, of course, the pressure to enter cards from the card catalog into the computer never relented.


Now, Margaret is only one person, at one university library, and she's a careful person who cared about the quality of her work.  Presumably she couldn't have single-handedly introduced "hundreds of thousands" of errors into the Google Books metadata.  But I suspect her circumstances were far from atypical.   (And, frankly, I haven't gone through the estimation process, to consider how many cards she *might* have handled, what a reasonable error rate might be, etc.)  And bear in mind, this is all second-hand knowledge.

For one thing, the project of, essentially, producing the UofM's share of that metadata has all the hallmarks of a project that kills quality.  The classic project quadrangle is budget, schedule, scope, and quality, of which it's possible to constrain (at most) three.  Budget: check--mandatory, uncompensated overtime and no new librarians.  Schedule: check--unreasonable deadlines and all.  Scope: check--digitize all of the card catalog.  Quality: ends up in the dirt, or worse.

The best(?) part is that I suspect that the Google Books scanning project caused all this fuss to begin with.  Wikipedia (http://en.wikipedia.org/wiki/Google_Book_Search#2004) reminds us that in 2004 Google announced a partnership with UofM (among others) to digitize all their books over the next several years.  I have no direct knowledge, but I strongly suspect that negotiations had been going on for some time ahead of that.  My common sense also tells me that the money that Google was giving to the university (or the library, but money is fungible) would start flowing sooner or faster if more of the books were ready to scan.  And "ready to scan" means that their metadata was available digitally.

So if Google Books metadata is a train wreck, well, the UofM set the switches, and set the train's speed, but Google incentivized running on time, not safety.  And now Google has to clean up the mess, pretty much by itself.



