Resource Discovery at UW Libraries: March 2008

Monday, March 31, 2008

Generic View from Nowhere

I stumbled upon this article by Andrew Abbott of the University of Chicago via Library Juice. So, as is my wont, I will share some of the passages that resonated with me. Again, this isn't an exhaustive critique of the paper, just some of the passages that struck me as important to the Resource Discovery Task Force. Note that Abbott speaks only of humanist and social science researchers, not scientific researchers. Quotes from the paper are italicized.

Central to that investigation [Future of the Library Task Force] was a study of digital versus physical use of library materials, an analysis which showed clearly what we should have guessed ahead of time - that students who are heavy physical users of the library are also heavy electronic users and vice versa. The idea that electronic research was actually replacing physical research - at least at the expert end of the scale - proved wrong.

I think this is something to bear in mind. I often fall into the trap of thinking in terms of digital versus physical, when perhaps I should really think about heavy versus light users of libraries. Is format even an issue? Will the tools librarians build help?

More broadly, that library researchers have projects with clear designs is a myth. A few library researchers may actually have such clear designs. And the rest of us pretend to have had them after the fact.

Abbott underlines that humanistic research in libraries is a very organic endeavor. There is no clear path through the literature. Browsing and reading are part of the process, and it is a part that librarians, for the most part, are not privy to.

Not only is known item searching a relatively minor part of expert library research, precisely structured research questions are also a relatively minor part of expert library research.

Again, Abbott points to the importance of browsing, which any tools librarians provide will need to support.

Everything I could find out about stack behavior in the 1950s indicated that faculty and graduate students weren't using catalogs, even for known-item searches. Nor were they using most of the wonderful apparatus I had written about, built for them by Wilson and ALA and the library profession. They were just wandering into the stacks and trolling. They were indeed standing in the stacks and reading whole chapters, then pulling something else off the shelf and reading that.

Is there any chance researchers will use the tools librarians build? If Abbott's research is any indication, scholars disengaged from librarians in the 1920s for a variety of reasons, in large part because librarians represented what Abbott calls a universalist approach, as opposed to scholars' inclination toward a partial or specialty approach to subject access.

But the message was everywhere the same. Faculty and graduate students got their references either from hearsay or from other people's footnotes or reference lists, just as - in fact - I was doing myself.

Now if faculty and graduate students were getting their research bibliography via hearsay or other professionals' published work, why were they doing this? The answer, at least theoretically, seemed obvious. What these sources had that the general bibliographical tools lacked was selectivity.

In my opinion, this is a major problem with bibliographic tools. Quality isn't addressed in any but a cursory fashion. Catalogs don't tell a researcher what the best book on Joyce is, and that, in many instances, is exactly the information library researchers need.

Finding something is easy. It's knowing you ought to be looking for it that is hard.

It was the librarians' contention that there ought to be one master index, but the research scholars always want partial indexes, indexes slanted their way, organized by their way of seeing the world, not by a generic view from nowhere.

library researchers started withdrawing from this universalist project in the 1920s and gradually erected a system of specialty tools and a set of research practices that enabled them to bypass the hugely inefficient searches that were the only possibility under the universal bibliographical system.

That's all for now. Back to building the master index.

Thursday, March 27, 2008

Faceted or guided searching question

I have a question for you. A bit more than two years ago, North Carolina State University introduced a major new look for its library catalog interface (see http://www.lib.ncsu.edu/catalog/). The interface is based on products from Endeca that facilitate ‘guided navigation’. The look has changed a bit since it was first deployed, but its central feature, suggesting ways patrons might focus or refine a search through a faceted display of key subjects, formats, or dates off to the side, was a major library catalog innovation. Instead of asking patrons to refine their query up front, the catalog quickly gives them many ways to continue their search.
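Guided navigation of this kind rests on a simple idea: after a search, count how many of the matching records fall under each value of a few fields (subject, format, date) and show those counts as clickable refinements. The sketch below illustrates that idea in plain Python over a few made-up records; the field names and records are hypothetical, and it makes no claim about how Endeca or NCSU actually implement it.

```python
from collections import Counter

# Hypothetical result set: records that matched a patron's query.
results = [
    {"title": "Ulysses", "format": "Book", "subject": "Joyce, James", "year": 1922},
    {"title": "Joyce: A Biography", "format": "Book", "subject": "Joyce, James", "year": 1959},
    {"title": "Dubliners (audio)", "format": "Audio", "subject": "Short stories", "year": 2004},
    {"title": "James Joyce Quarterly", "format": "Journal", "subject": "Joyce, James", "year": 1963},
]

# Hypothetical choice of fields to show in the side panel.
FACET_FIELDS = ["format", "subject"]

def facet_counts(records, fields):
    """Return {field: Counter(value -> number of matching records)}."""
    return {f: Counter(r[f] for r in records if f in r) for f in fields}

def decade_facet(records):
    """Group publication years into decades, e.g. 1922 -> '1920s'."""
    return Counter(f"{r['year'] // 10 * 10}s" for r in records if "year" in r)

def refine(records, field, value):
    """Apply a facet the patron clicked: keep only records with that value."""
    return [r for r in records if r.get(field) == value]

if __name__ == "__main__":
    for field, counts in facet_counts(results, FACET_FIELDS).items():
        print(field, counts.most_common())
    print("decade", decade_facet(results).most_common())
    narrowed = refine(results, "format", "Book")
    print([r["title"] for r in narrowed])
```

Because the counts are computed from the current result set, each refinement recomputes them, which is what lets the side panel keep suggesting next steps after every click.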

Just about every other major library catalog and citation product quickly developed this type of faceted browsing off to the side, including MetaLib and Primo from Ex Libris, WorldCat Local from OCLC, and Encore from Innovative Interfaces.

How much do you actually see patrons using these guided navigation aids off to the side after they run a search? We now have them in our MetaLib QuickSearches, available right on the main page of our library website (http://www.library.wisc.edu/). What’s your take on how much they are used?

A nice overview of faceting is in Wikipedia.

Friday, March 14, 2008

Notes on Information behaviour of the researcher of the future


As I read through the executive summary of "Information Behaviour of the Researcher of the Future," I noted passages that would be of interest to the Resource Discovery Task Force. I offer these notes below with some explanation of why I believe they are important for the Task Force to consider. These notes are not exhaustive: I comment only on the passages that caught my eye, so to speak, and I encourage others to read the report and offer their own take on it in future blog posts. Quotes from the report are in italics. A link to the study's project page is at the end of the post.

they [Google Generation] exhibit a strong preference for expressing themselves in natural language rather than analysing which key words might be more effective. (12)

I feel this finding is very significant. Many library workers, including myself, enjoy a powerful advanced search. That said, research studies, the rise of Google, and my own experience at public service desks all point the same way: I don't see much power searching, and keywords reign. I remember one reference meeting at which a librarian unveiled the top ten search queries from a database. I only remember the top query, but the others were just as unimpressive. The top search query that fateful month was "protein." Yes, a single ubiquitous word, at least in this engineering database, stole the honor. So much for sophisticated searching!

CIBER’s considered view is that the real issue that the library community should be concerned about is the rise of the e-book, not social networking. (17)

This is a timely finding, given the release of the Google Book Search API. With more and more books being digitized by a variety of entities, a challenge for any resource discovery tool will be to point users to available print as well as digitized versions of a text.

for library interfaces, there is evidence that multimedia can quickly lose its appeal, providing short-term novelty. (19)

I think we all know this, but boy is it hard to resist some of these bells and whistles. This finding brings AquaBrowser to mind, at least for me. In my humble opinion, its visual interface would not prove useful to me as a searcher. The Resource Discovery Task Force has demo'd some implementations of AquaBrowser, if you are curious:

Aquabrowser
Columbus Metropolitan Library implementation
Oklahoma State University implementation
University of Chicago implementation

But there is no evidence in the serious literature that young people are expert searchers, nor that the search skills of young people has improved with time. (22)

This finding definitely bucks the trend of most media coverage of the Google Generation. That said, it does match my experience in library instruction and public service. Libraries offer a complicated information landscape with unmarked borders. Students typically (I'm generalizing here, I know) don't have a firm understanding of what a library catalog IS, never mind how to search one effectively, nor do they have an intimate understanding of the composition of the information landscape before them. Such an understanding would include the publication process, awareness of aggregators, licensed versus purchased content, etc. Without it, students and other users are at a distinct disadvantage compared to library workers. We are the insiders. I don't say all this to toot the library worker horn, but I believe the "tacit knowledge" we possess as library workers enriches our search behavior. Even simple tactics such as double-checking the accuracy of our systems give us library workers a leg up; I know I don't always believe SFX.

Students usually prefer the global searching of Google to more sophisticated but more time-consuming searching provided by the library, where students must make separate searches of the online catalog and every database of potential interest, after first identifying which databases might be relevant. In addition, not all searches of library catalogues or databases yield full-text materials, and NetGen students want not just speedy answers, but full gratification of their information requests on the spot. (31)

The above quote reminds me of a presentation Steve Frye gave at the Reference Retreat in January 2008, in which he compared QuickSearch sets with Google Scholar. That presentation, like the quote, made me think that a fruitful avenue for the future would be to develop QuickSearch sets with particular users in mind (personalized search). The Library has already developed some QuickSearch sets, and I think improving their variety and usefulness would be a helpful service to users. I realize there are technical and performance issues to consider, but for now I can dream.

The report identifies "power browsing" as an information-seeking behavior that the new discovery tool should address in order to be useful. The authors define it this way: "[a] new form of online reading behaviour is beginning to emerge, one based on skimming titles, contents pages and abstracts: we call this 'power browsing'." (8, 19, 31)

The authors of the study seem to denigrate "power browsing," at least on my initial reading. To my mind, power browsing is just efficient searching behavior: users want to quickly ascertain whether an article or book is relevant to their project. Nothing wrong with that. For the Resource Discovery Task Force, this behavior underlines that a resource discovery tool should lend itself to power browsing. In other words, a searcher should be able to quickly and easily access digitized content, full text, reviews, book covers, tables of contents, indexes, tags, etc.

The significance of this for research libraries is threefold:

•they need to make their sites more highly visible in cyberspace by opening them up to search engines

•they should abandon any hope of being a one-stop shop

•they should accept that much content will seldom or never be used, other than perhaps a place from which to bounce (31)

making simplicity their core mission. (31)

personal/social searching guidance offered so successfully by Amazon for many years? (33)

Finally, the authors leave us with these conclusions. More food for thought. Simplicity is an elusive goal, in my opinion: resources change, interfaces change... I do think that whatever resource discovery tool we adopt should have some sort of recommendation system akin to Amazon's "Customers Who Bought This Item Also Bought." Well, I'm running out of steam, but I'm anxious to hear others' thoughts on this report and resource discovery.
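An "also bought" style recommendation can be approximated by counting how often two items turn up together in the same patron's history. The sketch below does this over a hypothetical list of anonymized circulation histories; the data, item IDs, and function names are illustrative only, and this is plain co-occurrence counting, not Amazon's actual algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical anonymized histories: each set is the items one patron borrowed.
histories = [
    {"ulysses", "dubliners", "portrait"},
    {"ulysses", "dubliners"},
    {"ulysses", "mrs_dalloway"},
    {"dubliners", "portrait"},
]

def build_cooccurrence(histories):
    """Map each item to a Counter of the items borrowed alongside it."""
    co = defaultdict(Counter)
    for items in histories:
        # sorted() makes pair generation independent of set ordering
        for a, b in combinations(sorted(items), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def also_borrowed(co, item, n=3):
    """Top-n items most often borrowed together with `item`."""
    return [other for other, _ in co[item].most_common(n)]

if __name__ == "__main__":
    co = build_cooccurrence(histories)
    print("Patrons who borrowed Ulysses also borrowed:", also_borrowed(co, "ulysses"))
```

The co-occurrence table can be rebuilt offline on a schedule, so showing recommendations at search time is just a lookup.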

Jon Udell offers further analysis and criticism of the report on his blog.

Google Generation Project page

Tuesday, March 11, 2008

Vendor-Supplied Versus Open Source


The products you saw on Friday, 3/7/08, and at our last meeting do not replace our library catalog, MadCat. Instead, they require us to export data from MadCat, process it, possibly merge it with data from one or more other sources, and then regularly feed that data into a powerful indexer and search environment for our patrons to use.
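The export, process, merge, and feed cycle described here can be sketched in a few steps. This is only an illustration: the record layouts, the use of an OCLC number as the merge key, and the field names are assumptions, not MadCat's actual export format, and the last step merely produces a list of flat JSON documents of the general kind a search indexer such as Solr accepts, rather than sending them anywhere.

```python
import json

# Hypothetical records exported from the catalog (e.g. after conversion from MARC).
catalog_export = [
    {"oclc": "123", "title": "Ulysses ", "author": "Joyce, James", "location": "Memorial Library"},
    {"oclc": "456", "title": "Dubliners", "author": "Joyce, James", "location": "College Library"},
]

# Hypothetical enrichment data from a second source, keyed the same way.
other_source = [
    {"oclc": "123", "toc": ["Telemachus", "Nestor", "Proteus"]},
    {"oclc": "789", "toc": ["Chapter 1"]},  # no matching catalog record
]

def normalize(record):
    """Clean up fields so records from different sources line up."""
    rec = dict(record)
    if "title" in rec:
        rec["title"] = rec["title"].strip()
    return rec

def merge(catalog, extra):
    """Attach extra-source fields to catalog records sharing an OCLC number."""
    by_key = {r["oclc"]: normalize(r) for r in catalog}
    for r in extra:
        if r["oclc"] in by_key:  # only enrich records the catalog already has
            by_key[r["oclc"]].update({k: v for k, v in r.items() if k != "oclc"})
    return list(by_key.values())

def to_index_documents(records):
    """Shape merged records into flat documents for an indexer."""
    return [
        {"id": r["oclc"], "title": r["title"], "author": r["author"],
         "location": r["location"], "toc": r.get("toc", [])}
        for r in records
    ]

if __name__ == "__main__":
    docs = to_index_documents(merge(catalog_export, other_source))
    print(json.dumps(docs, indent=2))
```

Running the same steps on a schedule would keep the search index in step with the catalog. Dropping outside records that have no catalog match is just one possible merge policy.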

So a main goal is faster, easier, more powerful, and more flexible searching and retrieval of data. Another goal is the ability to change as new features, new ideas, and new methods of presenting data become available.

Are we going to be stuck with a rigid look and feel that is simply ‘newer’? Or could we output our data in a way that, as new ideas come along or mobile devices become more common, lets us readily adapt it to a new look and a different way of working that suits our rapidly changing needs?

And can our output be pre-sorted and relevancy-ranked according to criteria we have some control over?
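One way to keep relevancy ranking under local control is to make the weights explicit configuration rather than something buried inside a product. The sketch below scores records by counting query-term matches per field, multiplying by per-field weights, and boosting locally held items. The weights, field names, and local-holdings boost are all hypothetical choices for illustration; real search engines such as Lucene use more sophisticated scoring (TF-IDF or BM25, for example).

```python
import re

# Hypothetical, locally controlled ranking configuration.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "notes": 1.0}
LOCAL_HOLDINGS_BOOST = 1.5  # multiplier for items we own or license

def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def score(record, query):
    """Weighted count of query-term occurrences across the configured fields."""
    terms = tokenize(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(record.get(field, ""))
        total += weight * sum(tokens.count(t) for t in terms)
    if record.get("local"):
        total *= LOCAL_HOLDINGS_BOOST
    return total

def rank(records, query):
    """Return records with a positive score, best first."""
    scored = [(score(r, query), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for s, r in scored if s > 0]

if __name__ == "__main__":
    sample = [
        {"title": "Protein folding", "subject": "Biochemistry", "local": False},
        {"title": "Structural biology", "subject": "Protein structure",
         "notes": "protein models", "local": True},
        {"title": "Bridges", "subject": "Civil engineering", "local": True},
    ]
    for r in rank(sample, "protein"):
        print(r["title"], score(r, "protein"))
```

Changing the ranking then means editing two constants and re-running, which is the kind of control the question above is asking about.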

So the question is: do we pay up front for a vendor-supplied solution where these “paths” for exporting data have already been set up for us, and the look and feel is only moderately under our control? Or do we use vendors who provide the infrastructure but offer APIs that let us design the exact interface we want? An API (application programming interface) is “a source code interface that an operating system or environment provides to support requests for services to be made of it by computer programs.” (This definition is quoted in Wikipedia from a Computerworld article.)

AquaBrowser, Primo, Endeca, and WorldCat Local are all vendor-supplied; all have these capabilities, and all have already been tested at large institutions like ours. The Digital Library Federation is also working on a list of features that any API from an ILS should be prepared to support.

Another option is to use WorldCat Local’s API, which would let us use WorldCat Local’s underlying structure but write our own public interface that makes ‘calls’ back to the data. David Walker very recently showed his interface code based on the WorldCat Local API at Code4Lib, so this capability is already functioning at some level. Clearly OCLC recognizes the importance of offering multiple options for different types of organizations, and of allowing local control and innovation on a stable underlying base.
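The appeal of the API route is that the presentation layer becomes our own code, while the data and search stay on the provider's side. The sketch below shows that separation in the abstract. `SearchBackend` is a hypothetical interface standing in for any vendor API (it is not OCLC's actual WorldCat API), and `FakeBackend` returns canned data so the example runs without a network.

```python
from typing import Callable, Protocol

class SearchBackend(Protocol):
    """Hypothetical interface that any vendor or local search service could satisfy."""
    def search(self, query: str, limit: int) -> list[dict]: ...

class FakeBackend:
    """Stand-in for a remote API; returns canned results for illustration."""
    def __init__(self, records: list[dict]):
        self.records = records

    def search(self, query: str, limit: int) -> list[dict]:
        q = query.lower()
        return [r for r in self.records if q in r["title"].lower()][:limit]

def render_plain(results: list[dict]) -> str:
    """One possible local look and feel: a numbered text list."""
    return "\n".join(f"{i}. {r['title']} ({r['year']})" for i, r in enumerate(results, 1))

def render_mobile(results: list[dict]) -> str:
    """A second presentation of the same data, e.g. for small screens."""
    return " | ".join(r["title"] for r in results)

def run_search(backend: SearchBackend, query: str,
               render: Callable[[list[dict]], str], limit: int = 10) -> str:
    """Our interface code: call back to the data, then present it our way."""
    return render(backend.search(query, limit))

if __name__ == "__main__":
    backend = FakeBackend([{"title": "Ulysses", "year": 1922},
                           {"title": "Dubliners", "year": 1914}])
    print(run_search(backend, "ulysses", render_plain))
    print(run_search(backend, "s", render_mobile))
```

Swapping in a different backend, or adding a new rendering function for a new device, leaves the other half untouched.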

Whichever way we choose, we’ll need staff to set up the processing from MadCat and other sources. And if we go the Open Source route, which could potentially offer us the most flexibility, we will have to invest in staff who can make changes and implement new features. A vendor-supplied solution might lessen some of this cost of change, but then, depending on the vendor, we could end up right back where we are now: running on a dinosaur catalog infrastructure while the web world changes rapidly around us.

If we go the Open Source route, it is worth remembering a quote by Richard Stallman that Steve Meyer recently reminded me of: “Think free as in free speech, not as in free beer.” (I should add that I am somewhat misusing Stallman’s quote here. He considers the Open Source movement a very watered-down version of the Free Software Movement and really isn’t a supporter of it; he wants software to be truly free for ethical reasons, not just for the practical reasons behind the Open Source movement.)

The main point I want to make is that whatever solution we choose is going to cost people, $$, and time. So I think the most important decision we can make is to choose a platform and a path that keep the door open to changing the look and feel, and even the underlying structure, easily and rapidly. The data needs to be in our control. I mean, aren’t you sick of asking ‘Can’t we do xxxx with MadCat?’ only to have someone like Curran Riley or Edie Dixon say ‘No, we can only change yyyy, not xxxx’? :-)

But one thing to keep in mind is our size. It’s far more work to do this level of relevancy ranking ‘on the fly’ on large sets of data, which is why either a vendor or an Open Source solution really needs us to export and pre-process the data. We have on the order of 6 million bibliographic records, and we want to mesh them with data from additional sources. Steve Meyer recently told me he had learned from our very knowledgeable DoIT LIRA staff that once you get beyond about 1 million records, indexing becomes a completely different beast and FAR more complicated.
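Part of what changes at millions of records is that the data can no longer be handled comfortably as one in-memory unit, so export and indexing jobs commonly stream records in fixed-size batches. The sketch below shows only that one piece; the batch size and the `send_batch` callback are hypothetical placeholders for whatever the real indexer expects, and batching by itself does not solve the other indexing issues at this scale.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

def batched(records: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` records without loading them all."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch

def index_in_batches(records: Iterable[dict],
                     send_batch: Callable[[list[dict]], None],
                     size: int = 10_000) -> int:
    """Feed records to a (hypothetical) indexer callback one batch at a time."""
    sent = 0
    for batch in batched(records, size):
        send_batch(batch)
        sent += len(batch)
    return sent

if __name__ == "__main__":
    # A generator stands in for a large catalog export;
    # only one batch is held in memory at a time.
    fake_export = ({"id": str(n)} for n in range(25_000))
    sizes = []
    total = index_in_batches(fake_export, lambda b: sizes.append(len(b)))
    print(total, sizes)
```

Python 3.12 and later also ship `itertools.batched`, which does the same job but yields tuples instead of lists.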

VuFind, the only Open Source project we’ve demo’d so far (at the last forum), has not yet tackled the technical issue of handling millions of records; it currently indexes well under 1 million. However, it is built on an Open Source foundation (Lucene and Solr from the Apache Software Foundation) that can handle larger databases. We do have excellent technical staff here at DoIT, but we’ll need more if we choose to undertake this level of work.

Other Open Source projects built on underlying structures that can support our needs are also coming along. One of them is the eXtensible Catalog project, as Karen pointed out.

Thank you.

What Should the "Catalog" of the Future Include?

I scanned the front sides of the blue sheets so you can see how your colleagues reacted to this question. I've also pulled out some common themes in case you prefer CliffsNotes. Is anything missing from this list?

Emphasis on Local Holdings:

"Somewhere in-between. Emphasis needs to be on authoritative - hard to find (not easily googlable)."
"Only stuff we license and own because no one starts a search at the library for things that google can find faster."
"Catalog is good for managing physical content...the electronic "stuff" should be discoverable through general search tools that normal people use."

Librarians Should Select:

"No, there should be values applied."

"Everything in the info universe should be eligible for inclusion. Subject experts should continue to decide what to include."
"It should include everything we - or our faculty - think is relevant/useful for research & instruction here."

It's Really Up to the Users:

"What do users expect of the Libraries catalog - that should drive the answer."
"I hope that it will include all types of information - or somehow seamlessly integrate campus and non-campus resources. Why? I think that users would appreciate 'one-stop shopping.'"

"I do wonder if patrons expect the catalog to be a finding aid for library items."

Access and Findability are More Important than Scope:

"Only link to things that students/faculty/staff could access readily."

"I think that the 'concept of scope' is not as important as the ease of finding information."

"The question should be: from the library's single search interface, what resources in addition to the things owned or leased should the searcher be able to access. That single interface should search multiple discovery tools, of which the catalog is only one of many."

Monday, March 10, 2008

Open Forum Recap 3/7

We started last Friday's Open Forum by asking attendees to respond to the following questions:

"In the future, what should be the scope of the Libraries' online catalog - that is, what kinds of information should it contain? Should it only include library owned or leased items or should everything in the information universe be included? Why?"

We received some very thoughtful responses and I will post about those a little later.

Next, we demonstrated two next-generation resource discovery tools: Ex Libris' Primo and OCLC's WorldCat Local. Following our quick presentations of these products, we asked you to vote on which one you preferred. Eighteen of you preferred WorldCat Local, three liked Primo, and three of you didn't like either of them :).

You can try searching Primo and WorldCat Local on your own at the following sites:

Primo
University of Iowa Libraries
University of Minnesota Libraries
Vanderbilt University Libraries

WorldCat Local
San Mateo County Library
University of Washington Libraries
(Remember, anyone can create a WorldCat account to see personalization features.)

Karen also discussed an open source project centered at the University of Rochester called eXtensible Catalog. The project does not have a product to demo at this stage.

Finally, Sue closed the session by reading a wonderful piece she wrote about why we might want to consider exploring open source options. She is going to post the text of that document in a separate entry.

It certainly was a very full hour. We hope you didn't find it too overwhelming! Thanks again for your continued support.
