Thursday, June 5, 2008

Executive Summary of Our Final Report

LSC accepted our final report and is currently considering our recommendations!

Below is the executive summary of the final report. The entire report, including appendices, is available here: http://staff.library.wisc.edu/rdetf/RDETF-final-report.pdf.

Stay tuned for our final open forum. We haven't set a date yet, but it is likely to be in July due to vacation schedules.

Thanks again for all of your feedback and support! Your participation at the open forums and all of the articles and ideas you shared with us really enriched this experience.


Executive Summary and Conclusions

This report recommends that the Libraries decouple the discovery interface from the ILS and implement a discovery interface that is aligned with user behaviors and expectations. It also recommends investigating the feasibility of replacing WorldCat FirstSearch with WorldCat Local to facilitate resource discovery beyond local collections. Additional conclusions drawn from the environmental, user, and product scans include: redoubling efforts toward developing a single sign-on across library and UW resources, implementing direct linking in SFX, and supporting a culture of assessment so that we better understand our users and are present in the online and physical spaces in which they work and play.

The conclusions in this report are tuned toward the library catalog because the library has other projects underway for improving and implementing access to non-catalog data. It is expected that these recommendations will apply to additional types of data beyond the library catalog.

The new discovery environment must:

  1. Decouple the interface from the ILS so that it is sleek, lean, and enabled for rapid change.
  2. Maintain complete control over the discovery interface, data, and index. Nothing should be unchangeable.
  3. Emphasize simplicity in the interface. As Lorcan Dempsey noted: "'simple search' but supported by smart results and rich browse" (single search box, single sign on, clean layout).
  4. Include sophisticated search and result functionalities (faceted browsing and/or topical clustering, natural language, obvious relevancy ranking, searching within results, clarity via FRBRization).
  5. Seamlessly integrate and deliver UW collections and resources at the campus and at the system level (library catalogs, library web sites, digital collections, museums, archives).
  6. Adapt to user behaviors and expectations (personalization, recommendations, "did you mean?" functionality, internationalization).
  7. Encourage personalization and customization of the discovery environment in MyUW and course management systems, including Learn@UW and Moodle.
  8. Deliver library search functionality, links and services where our users work and play, including off-campus resources (Amazon, iGoogle, Facebook, WorldCat).
  9. Compare well in design and user experience to popular Internet destinations. Resource discovery in the libraries must become Fast, Smart, and Engaging to compete in the current and future information marketplace.
  10. Be staffed for excellence and continuous change (developers, graphic and interaction designers, and public services staff). This includes collaboration and leadership within the Open Source community.

Recommendations

  1. Implement a decoupled interface for resource discovery (library catalogs, library web sites, digital collections, museums, archives) that meets the requirements of our vision.
  2. Enhance current discovery environment by:
    1. Continuously assessing, analyzing, and developing new tools and functionality for discovery.
      1. Maintain current awareness of browser extensions and library toolbars (LibX, Conduit.com).
      2. Develop widgets for personalized web pages (iGoogle, Facebook, NetVibes, PageFlakes).
      3. Promote use of information management tools (such as Zotero, Google Notebook, del.icio.us, RefWorks, EndNote, and Papers (Mac only)).
    2. Promoting library data reuse by exposing all freely available library metadata to direct harvesting by indexers.
    3. Using NetID for My MadCat Account and Library Express instead of the eleven-digit ID.
    4. Enabling a persistent sign-on into library and campus resources using NetID.
    5. Finding ways to be more social and expand beyond work and research needs to encourage inquisitive exploration of all types.
    6. Encouraging Ex Libris to fully realize the DLF ILS Discovery Task Force API recommendations (referred to as the "Berkeley Accord") to allow the development of local discovery applications using library data.
    7. Implementing direct linking to full-text content wherever possible; in particular, within FindIt and MadCat.
    8. Allocating staff time to analyze and improve the accuracy of FindIt linking. Determine when and why FindIt fails and if there is anything we can do to make this better. Are there highly requested journals which we should license or obtain faster?
    9. Identifying and addressing discovery needs via mobile devices as soon as possible.
    10. Adding a single search box to the Libraries Web site with target selections for MadCat, the Libraries Web site, and QuickSearch for Articles (see, e.g., http://www.lib.virginia.edu/).
    11. Strongly considering implementing WorldCat Local as a public interface for the UW System OPAC and WorldCat FirstSearch.
    12. Improving MadCat now by:
      1. Including value-added information, such as book covers, sample passages of text, reviews, and RSS feeds of journals' tables of contents. Adding and enabling user-generated content, like LibraryThing for Libraries.
      2. Making relevancy ranking the default search results display for more than just a 'words anywhere' search.
      3. Investigating adding MadCat to the MetaLib General Resources QuickSearch set.
      4. Linking to Google Books through the Google Books API and/or linking to CIC Google Collections Archive (in progress).
      5. Adding icons to indicate format in results lists.
      6. Relabeling fields to make them meaningful to patrons (subject links become "Find more like this").
      7. Improving the call number browse.
      8. Providing direct export to RefWorks and other citation managers.
      9. Displaying persistent links on brief and full records.
      10. Exploring linking to Amazon, Wikipedia, etc. for contextual information.
      11. Highlighting searched keywords in results.
      12. Enabling automatic stemming/truncation, if possible.
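Recommendation 12.4 above suggests linking to Google Books through its API. As a rough sketch of what such a link-out involves, the snippet below normalizes an ISBN as it might appear in a MARC record and builds a lookup URL against Google's public Books API volumes endpoint. The helper names (`normalizeIsbn`, `booksApiUrl`) are my own illustrations, not part of any MadCat code.

```javascript
// Sketch: build a Google Books lookup URL for a catalog record's ISBN.
// The helper names are illustrative, not from any existing library system.

function normalizeIsbn(raw) {
  // Strip hyphens and spaces; uppercase a trailing "x" check digit.
  return raw.replace(/[-\s]/g, "").toUpperCase();
}

function booksApiUrl(isbn) {
  // The volumes endpoint returns JSON metadata, cover links, and preview
  // availability for the given ISBN.
  return "https://www.googleapis.com/books/v1/volumes?q=isbn:" +
         encodeURIComponent(normalizeIsbn(isbn));
}

console.log(booksApiUrl("0-14-028333-1"));
// -> https://www.googleapis.com/books/v1/volumes?q=isbn:0140283331
```

A catalog page could call something like this per record to decide whether to show a "Preview at Google Books" link alongside the holdings.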

Thursday, May 15, 2008

Two weeks of online library interaction

You Must See This

The Resource Discovery Task Force (RDTF) tested 28 library homepages for two weeks, using the online tool CrazyEgg (CE) to record user site interaction. We collected a lot of data, which you really *must see* to begin to understand the significance of the study.
How It Works

Using a tiny bit of JavaScript, CE records the operating system, browser, referrer, associated search terms, window size, and time to click for every user-generated click on the webpage (technically, for the 95% of the world with JavaScript enabled). This data is aggregated into multiple views to help you understand how users interact with your site: where they click, how long they take to find a link, etc.
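For the curious, here is a minimal sketch of the kind of per-click record such a script might assemble. The field names and the `buildClickEvent` helper are my own illustrative assumptions; CE's actual snippet and payload are its own.

```javascript
// Assemble a per-click record like the ones CE aggregates into heatmaps.
// All names here are illustrative, not CE's actual code.
function buildClickEvent(page, click) {
  return {
    userAgent: page.userAgent,                 // browser + operating system
    referrer: page.referrer,                   // where the visitor came from
    windowSize: page.width + "x" + page.height,
    x: click.x,                                // coordinates for the heatmap
    y: click.y,
    msToClick: click.time - page.loadedAt,     // "time to click"
  };
}

// In a browser this would be wired up roughly like so:
// document.addEventListener("click", function (e) {
//   report(buildClickEvent(
//     { userAgent: navigator.userAgent, referrer: document.referrer,
//       width: window.innerWidth, height: window.innerHeight,
//       loadedAt: pageLoadedAt },
//     { x: e.pageX, y: e.pageY, time: Date.now() }
//   ));
// });
```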

The screen captures of the heatmaps we've collected show which spots on the pages were most frequently clicked: the hotter the area, the more clicks.

Analyzing the Results

Individually, libraries can learn a lot about which links on their site are popular and which links are not. It's simple to see the results, change a few links, reorganize your page a bit and retest. In a few iterations of your design, you'll greatly improve the usability of your site.

On our campus, we have a common library site template. Looking across all the libraries using the template, here's what I believe to be true:
  • Headers - institutionalized and standardized content works well. Our headers see very consistent use across all the implementations. I believe this means they are well designed and very effective.
  • Databases - if you look at the Business Library's results, you'll see their users really want simple access to database links. Looking across all the homepages, quick, homepage-level access to subject-specific database links is the right way to go.
  • Search - our library template buries the optional search box in the footer of the design. When the search box is elevated, as on Wendt Library's site, users opt for search much more frequently. Having a consistent and comprehensive search solution across the campus library websites would be a major boost to usability.
  • Serial Content - many library sites have "dynamic" content indicating news and events or recent additions to the collection. These links are not frequently clicked, which makes me think we need to consider the staff cost of generating this content. Certainly, we should strongly consider downsizing the footprint of this content on our homepages.
Usability and Maturity

What do we do with all this new information about user interaction on our sites? Answer: we begin to shape a better, more user-friendly web presence for our libraries. Jakob Nielsen wrote a classic pair of web posts on the 8 stages of usability maturity. These posts are a great read and help illustrate the difficulty of achieving great usability in any corporation.

The stages:
  1. Hostility Toward Usability
  2. Developer-Centered Usability
  3. Skunkworks Usability
  4. Dedicated Usability Budget
  5. Managed Usability
  6. Systematic Usability Process
  7. Integrated User-Centered Design
  8. User-Driven
I think our libraries are somewhere between stages 3 and 4 at the moment. We have a few large projects (such as the RDTF) in the works to measure and recommend usability enhancements. There is a formal staff-time commitment (LWS Web Site Committee) towards improved design and functionality across our library system.

Gathering data on user interaction is critical to improving usability. Every webmaster in our libraries should seriously consider this study. We've done some good work toward improving our sites, but we have a lot of work left to do to make them great. I hope this study leads to more and continued user-data gathering across campus. I also hope our future LWS brownbags lead to a greater sense of web-development community within the libraries.

Your Turn

If you made it to the brownbag Wednesday, you saw me demo the "confetti" view CE provides. This is where the CE data truly shines. Unfortunately, I cannot give everyone on campus the password/login to the CE site itself to produce these reports; having that information would allow you to add/delete tests or cancel our account altogether.

This was a one-off study, so all we can make available are the screenshots and data collected during the testing phase. However, you and your library *should* strongly consider signing up for your own CE account (their product is amazing, so be nice and purchase a paying account!). Running tests across many of your pages helps you gain a better perspective on how your site could be improved to better serve your patrons.

If you have any questions about buying or using CE, just let me know. BTW, I'm not paid by CE in any way; I am just a big fan.

Questions? Comments?

Please let me know what you think of these screenshots. I would love to see many people comment on their reactions to seeing this data.

Cheers,
- Eric for the RDTF

Monday, April 21, 2008

Rescheduled May Open Forum

The Resource Discovery Exploratory Task Force open forum that was set to occur on Friday, May 2 is being RESCHEDULED for

Wednesday, May 14
Noon to 1:00 pm
Memorial 126

Please move the May 2nd forum in your personal calendars to this new May 14th date.

Much of this forum will be devoted to discussing information received through our online user survey, focus groups, and web site data gathering tests. A more detailed agenda will be forthcoming.

Thanks for your continued support!

Monday, April 7, 2008

Open Forum Recap 04/04

At last Friday's open forum, we showed several innovative tools and initiatives that are just getting started or are not quite scalable to our environment. We asked forum attendees to pay special attention to the features offered by each and to contrast them with the features offered by the commercial products demonstrated at the March open forum.

We began by asking the following question: "Describe the best web-based service you've experienced. What made it excellent? What distinguished it from the crowd?" Leave a comment describing the best web-based service you've received. I'll post about the responses from the forum crowd a little later.

Next, Sue Dentinger demoed University of Virginia's Project Blacklight. Blacklight is a prototype of a faceted discovery tool for catalog data and beyond. So far, they have indexed 3.7 million MARC records, a 500 text object subset from their digital collections repository, and 320 Tang Dynasty Chinese poems. Blacklight, like VuFind, is based on Solr/Lucene.

Our second demonstration, presented by Allan Barclay, was LibraryThing for Libraries. LibraryThing for Libraries enriches the catalog by drawing on content contributed by the collective intelligence of LibraryThing members. LibraryThing for Libraries adds book recommendations, tag clouds, and links to other editions and translations of a work to the OPAC. Allan showed us LibraryThing for Libraries implementations at the Danbury Public Library and San Francisco State University.

Albert Quattrucci showed us Scriblio, an open source OPAC based on WordPress, a blog publishing platform. Scriblio includes several innovative features, like Google Book Search integration and a "text this to your cellphone" option. Albert showed us Plymouth State University's implementation of Scriblio.

Finally, I showed the demo version of the Open Library. The Open Library is an open source project of the Internet Archive and is financially supported by Brewster Kahle. Aaron Swartz, who co-authored RSS when he was 14 years old, is the project's leader. The goal of the Open Library is to create at least one wiki-like Web page for every book ever published. The Open Library will be truly free to the people in that everyone will have the ability to create, catalog, and contribute content.

The forum ended with a brief discussion of the scope of resource discovery. How can we make other UW-Madison collections, like museum holdings and departmental resources, more findable and accessible?

Monday, March 31, 2008

Generic View from Nowhere

I stumbled upon, via Library Juice, this article by Andrew Abbott of the University of Chicago. So, as is my wont, I will share some of the passages that resonated with me. Again, this isn't an exhaustive critique of the paper, just some of the passages that struck me as important to the Resource Discovery Task Force. Note that Abbott speaks only of humanist and social science researchers, not scientific researchers. Quotes from the paper are italicized.

Central to that investigation [Future of the Library Task Force] was a study of digital versus physical use of library materials, an analysis which showed clearly what we should have guessed ahead of time - that students who are heavy physical users of the library are also heavy electronic users and vice versa. The idea that electronic research was actually replacing physical research - at least at the expert end of the scale - proved wrong.

I think that this is something to bear in mind. I often fall into the trap of digital versus physical, but perhaps I should really think about heavy versus light users of libraries. Is format even an issue? Will tools librarians build help?

More broadly, that library researchers have projects with clear designs is a myth. A few library researchers may actually have such clear designs. And the rest of us pretend to have had them after the fact.

Abbott underlines the fact that humanistic research in libraries is a very organic endeavor. There is no clear path through the literature. Browsing and reading are part of the process, a part that librarians, for the most part, are not privy to.

Not only is known item searching a relatively minor part of expert library research, precisely structured research questions are also a relatively minor part of expert library research.

Again, Abbott points to the importance of the practice of browsing for any tools librarians provide.

Everything I could find out about stack behavior in the 1950s indicated that faculty and graduate students weren't using catalogs, even for known-item searches. Nor were they using most of the wonderful apparatus I had written about, built for them by Wilson and ALA and the library profession. They were just wandering into the stacks and trolling. They were indeed standing in the stacks and reading whole chapters, then pulling something else off the shelf and reading that.

Is there any chance researchers will use tools librarians build? If Abbott's research is any indication, scholars disengaged from librarians in the 1920s for a variety of reasons, in large part because librarians represent what Abbott calls a universalist approach, as opposed to scholars' inclination for a partial or specialty approach to subject access.

But the message was everywhere the same. Faculty and graduate students got their references either from hearsay or from other people's footnotes or reference lists, just as - in fact - I was doing myself.

Now if faculty and graduate students were getting their research bibliography via hearsay or other professionals' published work, why were they doing this? The answer, at least theoretically, seemed obvious. What these sources had that the general bibliographical tools lacked was selectivity.

In my opinion, this is a major problem with bibliographic tools. Quality isn't addressed in any but a cursory fashion. Catalogs don't tell a researcher what the best book on Joyce is. And that in many instances is exactly the information library researchers need.

Finding something is easy. It's knowing you ought to be looking for it that is hard.

It was the librarians' contention that there ought to be one master index, but the research scholars always want partial indexes, indexes slanted their way, organized by their way of seeing the world, not by a generic view from nowhere.

library researchers started withdrawing from this universalist project in the 1920s and gradually erected a system of specialty tools and a set of research practices that enabled them to bypass the hugely inefficient searches that were the only possibility under the universal bibliographical system.

That's all for now. Back to building the master index.


Thursday, March 27, 2008

Faceted or guiding searching question

I have a question for you. A bit more than two years ago, North Carolina State University came out with a major new look for its library catalog interface. See: http://www.lib.ncsu.edu/catalog/ This interface is based on products from Endeca that facilitate ‘guided navigation’. While the look has changed a bit since it was first deployed, the ability to suggest ways patrons may want to focus or refine their search, using a faceted display of key subjects, formats, or dates off to the side, was a major library catalog innovation. Instead of asking folks to refine their search query up front, you quickly give them many ways to continue their search.
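To make the mechanics concrete, here is a toy sketch of how a faceted display derives its side-bar counts from a result set: group the current results by a field and tally. Endeca, Solr, and similar engines compute this inside the index far more efficiently; the sample records here are invented for illustration.

```javascript
// Tally how many records in the current result set carry each value of a
// field (format, date, subject, ...); the tallies become the facet links.
function facetCounts(records, field) {
  const counts = {};
  for (const rec of records) {
    const value = rec[field] || "Unknown";
    counts[value] = (counts[value] || 0) + 1;
  }
  return counts;
}

const results = [
  { title: "Ulysses", format: "Book" },
  { title: "Dubliners", format: "Book" },
  { title: "James Joyce Quarterly", format: "Journal" },
];

console.log(facetCounts(results, "format"));
// -> { Book: 2, Journal: 1 }
```

Each entry would render as a clickable refinement ("Book (2)", "Journal (1)") that filters the result set and recomputes the counts.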

Now just about every other major library citation and catalog product has quickly developed this type of faceted browsing off to the side, including MetaLib and Primo from Ex Libris, WorldCat Local from OCLC, Encore from Innovative Interfaces, etc.

I’m wondering how much you really see patrons using these guided navigation aids that are off to the side after they do a search. We have this now in our MetaLib quicksearches, available right on the main page of our library website at http://www.library.wisc.edu/ . What’s your take on how much these are used?

A nice overview of faceting is in Wikipedia.

Friday, March 14, 2008

Notes on Information behaviour of the researcher of the future


As I read through the Executive summary of "Information Behaviour of the Researcher of the Future," I noted passages that would be of interest to the Resource Discovery Task Force. I offer these notes below with some explanation of why I believe they are important for the Task Force to consider. These notes are not exhaustive, and I encourage others to read the article and offer their take on the report in future blog posts. The passages I comment on below are only the passages that caught my eye, so to speak. Quotes from the report are in italics. A link to the study's project page is at the end of the post as well.

they [Google Generation] exhibit a strong preference for expressing themselves in natural language rather than analysing which key words might be more effective. (12)

I feel this finding is very significant. Many library workers, including myself, enjoy a powerful advanced search. That said, many research studies, the rise of Google, and my experience at public service desks all point to the same thing: I don't see much power searching; keywords reign. I remember one reference meeting at which a librarian unveiled the top ten search queries from a database. I only remember the top search query, but the other top searches were just as unimpressive. The top search query that fateful month was protein. Yes, a single ubiquitous word, at least in this engineering database, stole the honor. So much for sophisticated searching!

CIBER’s considered view is that the real issue that the library community should be concerned about is the rise of the e-book, not social networking. (17)

This is a timely finding with the release of the Google Book Search API. With more and more books being digitized by a variety of entities, a challenge for any resource discovery tool will be to point users to print as well as digitized versions of a text.

for library interfaces, there is evidence that multimedia can quickly lose its appeal, providing short-term novelty. (19)

I think we all know this fact, but boy is it hard to resist some of these bells and whistles. This brings to mind, to me at least, Aquabrowser. In my humble opinion, I don't think that the visual interface would prove useful to me as a searcher. The Resource Discovery Task Force has demo'd some implementations of Aquabrowser, if you are curious:

Aquabrowser
Columbus Metropolitan Library implementation
Oklahoma State University implementation
University of Chicago implementation


But there is no evidence in the serious literature that young people are expert searchers, nor that the search skills of young people has improved with time. (22)

This finding definitely bucks the trend of most media coverage of the Google Generation. That said, this finding does coincide with my experience in library instruction and public service. Libraries offer a complicated information landscape with unmarked borders. Students typically (I'm generalizing here, I know) don't have a firm understanding of what a library catalog IS, never mind how to search a catalog effectively, nor do students have an intimate understanding of the composition of the information landscape before them. An intimate understanding would include: the publication process, awareness of aggregators, licensed versus purchased content, etc. Without this understanding students and other users are at a distinct disadvantage compared to library workers. We are the insiders. I don't say all this to toot the library worker horn, but this "tacit knowledge" that we possess as library workers does, I believe, enrich our search behavior. Even simple tactics such as double-checking the accuracy of our systems give us library workers a leg up--I know I don't believe SFX all the time.

Students usually prefer the global searching of Google to more sophisticated but more time-consuming searching provided by the library, where students must make separate searches of the online catalog and every database of potential interest, after first identifying which databases might be relevant. In addition, not all searches of library catalogues or databases yield full-text materials, and NetGen students want not just speedy answers, but full gratification of their information requests on the spot. (31)

The above quote reminds me of a presentation that Steve Frye gave at the Reference Retreat in January 2008. He showed QuickSearch sets in comparison to Google Scholar. This made me think, as does the above quote, that a fruitful avenue for the future would be to develop QuickSearch sets with certain users in mind (personalized search). The Library has already developed some QuickSearch sets, but if we could improve the variety and usefulness of the QuickSearch sets, I think this would be a helpful service to users. I realize there are technical issues and performance issues to consider, but for now I can dream.

From the report, "power browsing" is an information-seeking behavior that any new discovery tool should address in order to be useful. The authors define it this way: a new form of online reading behaviour is beginning to emerge, one based on skimming titles, contents pages and abstracts: we call this ‘power browsing’. (8, 19, 31)

The authors of the study seem to denigrate "power browsing"; at least, that is my initial impression. To my mind, power browsing is just efficient searching behavior. Users want to quickly ascertain whether an article or book is relevant to their project. Nothing wrong with that. For the Resource Discovery Task Force, this behavior underlines that a resource discovery tool should lend itself to power browsing. In other words, a searcher should quickly and easily access digitized content, full text, reviews, book covers, tables of contents, indexes, tags, etc.


The significance of this for research libraries is threefold:

• they need to make their sites more highly visible in cyberspace by opening them up to search engines

• they should abandon any hope of being a one-stop shop

• they should accept that much content will seldom or never be used, other than perhaps a place from which to bounce (31)

making simplicity their core mission. (31)

personal/social searching guidance offered so successfully by Amazon for many years? (33)

Finally, the authors leave us with these conclusions. More food for thought. Simplicity is an elusive goal in my opinion. Resources change, interfaces change.... I do think that whatever resource discovery tool we adopt should have some sort of recommendation system akin to Amazon's "Customers Who Bought Items Like This Also Bought." Well, I'm running out of steam, but I'm anxious to hear others' thoughts on this report and resource discovery.

Jon Udell offers some further analysis and criticism of the report at his blog.


Google Generation Project page