Tuesday, March 11, 2008

Vendor – Supplied Versus Open Source

The products you saw Friday 3/7/08 and at our last meeting do not replace our library catalog MadCat. Instead they require us to export data from MadCat, process it, possibly merge it with data from one or more other sources, then feed that data on a regular basis into a powerful indexer and search environment for our patrons to use.
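To make that pipeline a little more concrete, here is a minimal Python sketch of the export, merge, and index steps. Everything in it is an assumption for illustration: the record fields, the sample data, and the function names `merge_records`, `build_index`, and `search` are hypothetical and are not the format or method MadCat or any of these products actually uses.

```python
from collections import defaultdict


def merge_records(catalog_records, other_records):
    """Merge two lists of dict records on a shared 'id' field.

    Fields from other_records fill in or override the catalog fields.
    """
    merged = {rec["id"]: dict(rec) for rec in catalog_records}
    for rec in other_records:
        merged.setdefault(rec["id"], {}).update(rec)
    return merged


def build_index(records):
    """Build a simple inverted index: lowercase title word -> set of record ids."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for word in rec.get("title", "").lower().split():
            index[word].add(rec_id)
    return index


def search(index, query):
    """Return the ids of records whose title contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


# Hypothetical sample data standing in for a catalog export and a second source.
catalog = [
    {"id": "b1", "title": "Walden"},
    {"id": "b2", "title": "A Sand County Almanac"},
]
other = [
    {"id": "b2", "full_text_url": "http://example.org/b2"},
    {"id": "d9", "title": "Sand Dunes of Wisconsin"},
]

records = merge_records(catalog, other)
index = build_index(records)
print(sorted(search(index, "sand")))
```

The point of the sketch is only the shape of the work: the merge and the index are built ahead of time, on a schedule, so the patron's search is a quick lookup rather than a scan of every record.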

So a main goal is faster, easier, more powerful and more flexible searching and retrieving of data. Another goal is having the ability to change as new features and new ideas and new methods of presenting data become available.

Are we going to be stuck with a rigid look and feel that is simply ‘newer’? Or could we output our data in a way that, as new ideas come along or new mobile devices become more available, our data can be readily adapted to have a new look and work in a different way that suits our rapidly changing needs?

And can our output of data be pre-sorted and relevancy-ranked according to criteria we possibly have control over?

So the question is: do we pay up front for a vendor-supplied solution where these “paths” of exporting data have already been set up for us, and the look and feel is only moderately under our control? Or do we use vendors who provide the infrastructure but use APIs to let us design the exact interface we want? An API (application programming interface) is a source code interface that an operating system or environment provides to support requests for services to be made of it by computer programs. (This is a Wikipedia quote from a Computerworld article.)
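As a rough illustration of what "design the exact interface we want" can mean, the Python sketch below keeps the data behind one function and puts two different presentations in front of it. The `search_api` function is a hypothetical stand-in, not any vendor's real API, and the two renderers are assumptions made up for this example.

```python
import json
from html import escape


def search_api(query):
    """Hypothetical stand-in for a vendor search API: returns plain data only."""
    catalog = [
        {"title": "Walden", "author": "Thoreau, Henry David", "year": 1854},
        {"title": "Silent Spring", "author": "Carson, Rachel", "year": 1962},
    ]
    return [rec for rec in catalog if query.lower() in rec["title"].lower()]


def render_html(results):
    """One possible look: an HTML list for a desktop web page."""
    items = "".join(
        f"<li>{escape(rec['title'])} ({rec['year']})</li>" for rec in results
    )
    return f"<ul>{items}</ul>"


def render_compact_json(results):
    """Another possible look: compact JSON for a small-screen client."""
    return json.dumps([{"t": rec["title"], "y": rec["year"]} for rec in results])


results = search_api("walden")
print(render_html(results))
print(render_compact_json(results))
```

Because both renderers call the same function, a new look for a new device means writing one more renderer, not changing how the data is stored or searched.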

AquaBrowser, Primo, Endeca and WorldCat Local are all vendor-supplied, and all are capable of this and already tested in large institutions like ours. The Digital Library Federation is also working on a list of features any API from an ILS should be prepared to support.

Another option is to use WorldCat Local’s API, which gives us the ability to use WorldCat Local’s underlying structure but write our own public interface using ‘calls’ back to the data. David Walker very recently showed his interface code based on the WorldCat Local API at Code4Lib, so this capacity is functioning at some level right now. Clearly OCLC is recognizing the importance of offering multiple options for differing types of organizations and the importance of allowing local control and innovation using a stable underlying base.

Whichever way we choose, we’ll need staff to set up the processing from MadCat and other sources. And if we go the Open Source route, which could potentially offer us the most flexibility, we have to make a staff investment to be able to make the changes and implement new features. Some of this cost of change might be lessened with a vendor-supplied solution. But then, depending on the vendor, we could be right back where we are now: running on a dinosaur catalog infrastructure while the web world changes so rapidly around us.

If we go the Open Source route, it is worth keeping in mind a quote by Richard Stallman that Steve Meyer recently reminded me of: “Think free as in free speech, not as in free beer.” (I should add that I am somewhat misusing Stallman’s quote here. He considers the Open Source movement a very watered-down version of the Free Software Movement and he really wasn’t a supporter of it. He wants software to be free for ethical reasons, not just the practical reasons behind the Open Source movement.)

The main point I want to make is that whatever solution we take is going to cost people, $$, and time. So the most important decision I think we can make is to choose a platform and a path where we keep the doors open to make at least look and feel and even underlying structure changes very easily and very rapidly. The data needs to be in our control. I mean, aren’t you sick of asking ‘Can’t we do xxxx with MadCat?’ and having someone like Curran Riley or Edie Dixon say ‘No, we can only change yyyy, not xxxx.’? :-)

But one thing to keep in mind is our size. It’s far more work and effort to do this level of relevancy ranking ‘on the fly’ on large sets of data. And that’s why either a vendor or Open Source solution really needs us to export and pre-process the data. We have on the order of 6 million bibliographic records and we want to mesh this with data from additional sources. Steve Meyer was recently telling me that he had learned from our very knowledgeable DoIT LIRA staff that once you get over about 1 million records, the processing and work needed to index that many records is a completely different beast and FAR more complicated.
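Here is a small Python sketch of why pre-processing helps with relevancy ranking. The scores are worked out once, when the data is exported and indexed, so a search only has to add up stored numbers and sort. The field weights, record fields, and function names are assumptions chosen for illustration; this is not how Lucene, Solr, or any vendor product actually computes relevance.

```python
from collections import defaultdict

# Hypothetical weights: criteria a library could tune for itself.
FIELD_WEIGHTS = {"title": 3.0, "subject": 1.0}


def build_weighted_index(records):
    """Precompute, once at export time, a score for each (word, record) pair."""
    index = defaultdict(lambda: defaultdict(float))
    for rec in records:
        for field, weight in FIELD_WEIGHTS.items():
            for word in rec.get(field, "").lower().split():
                index[word][rec["id"]] += weight
    return index


def ranked_search(index, query):
    """At query time, only add up the stored scores and sort by them."""
    totals = defaultdict(float)
    for word in query.lower().split():
        for rec_id, score in index.get(word, {}).items():
            totals[rec_id] += score
    # Highest score first; ties are broken by record id for a stable order.
    return sorted(totals, key=lambda rec_id: (-totals[rec_id], rec_id))


# Hypothetical sample records.
records = [
    {"id": "b1", "title": "Prairie Restoration", "subject": "ecology wisconsin"},
    {"id": "b2", "title": "Wisconsin Ecology", "subject": "prairie"},
]

index = build_weighted_index(records)
print(ranked_search(index, "prairie"))
print(ranked_search(index, "wisconsin ecology"))
```

In this toy version a word in the title counts three times as much as a word in the subject field. With millions of records the expensive part is building the index, which is exactly the step that gets done ahead of time rather than while a patron waits.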

VuFind, the only Open Source project we’ve demo’d so far (at the last forum), hasn’t yet tackled the technical issue of handling this many records, meaning millions. It is currently indexing well under 1 million records. However, it is built on an underlying Open Source structure (using Lucene and Solr from the Apache foundation) which has the ability to handle larger databases. We do have excellent technical staff here at DoIT, but we’ll need more if we choose to undertake this level of work.

And there are other Open Source projects coming along that use an underlying structure that can support our needs. One of these is the eXtensible Catalog project, as Karen pointed out.

Thank you.

What Should the "Catalog" of the Future Include?

I scanned the front sides of the blue sheets so you can see how your colleagues reacted to this question. I've also pulled out some common themes in case you prefer the CliffsNotes version. Is anything missing from this list?

Emphasis on Local Holdings:
  • "Somewhere in-between. Emphasis needs to be on authoritative - hard to find (not easily googlable)."
  • "Only stuff we license and own because no one starts a search at the library for things that google can find faster."
  • "Catalog is good for managing physical content...the electronic "stuff" should be discoverable through general search tools that normal people use."

Librarians Should Select:
  • "No, there should be values applied."
  • "Everything in the info universe should be eligible for inclusion. Subject experts should continue to decide what to include."
  • "It should include everything we - or our faculty - think is relevant/useful for research & instruction here."

It's Really Up to the Users:
  • "What do users expect of the Libraries catalog - that should drive the answer."
  • "I hope that it will include all types of information - or somehow seamlessly integrate campus and non-campus resources. Why? I think that users would appreciate 'one-stop shopping.'"
  • "I do wonder if patrons expect the catalog to be a finding aid for library items."

Access and Findability are More Important than Scope:
  • "Only link to things that students/faculty/staff could access readily."
  • "I think that the 'concept of scope' is not as important as the ease of finding information."
  • "The question should be: from the library's single search interface, what resources in addition to the things owned or leased should the searcher be able to access. That single interface should search multiple discovery tools, of which the catalog is only one of many."

Monday, March 10, 2008

Open Forum Recap 3/7

We started last Friday's Open Forum by asking attendees to respond to the following questions:

"In the future, what should be the scope of the Libraries' online catalog - that is, what kinds of information should it contain? Should it only include library owned or leased items or should everything in the information universe be included? Why?"

We received some very thoughtful responses and I will post about those a little later.

Next, we demonstrated two next-generation resource discovery tools: Ex Libris' Primo and OCLC's WorldCat Local. Following our quick presentations of these products, we asked you to vote on which one you preferred. Eighteen of you voted for WorldCat Local, 3 liked Primo, and 3 didn't like either of them :).

You can try searching Primo and WorldCat Local on your own at the following sites:

Primo
University of Iowa Libraries
University of Minnesota Libraries
Vanderbilt University Libraries

WorldCat Local
San Mateo County Library
University of Washington Libraries
(Remember, anyone can create a WorldCat account to see personalization features.)

Karen also discussed an open source project centered at the University of Rochester called eXtensible Catalog. They do not have a product to demo at this point in the project.

Finally, Sue closed the session by reading a wonderful piece she wrote about why we might want to consider exploring open source options. She is going to post the text of that document in a separate entry.

It certainly was a very full hour. We hope you didn't find it too overwhelming! Thanks again for your continued support.

Tuesday, February 19, 2008

Why do people use a university's library catalog?

So what do people come to MadCat, our library catalog at the University of Wisconsin - Madison, looking for? Do they already have a book, a journal, or a whole bibliography of items in mind? Are they very familiar with a topic and want to see if there's anything new or anything they missed? Did a friend mention an interesting book they had just read at last night's party? Or are they brand new to a topic and just want to educate themselves on it?

How do we make a catalog of what our library owns, can get you to the full text of, or has a license for, work best for all these different needs? Our library catalog, although very impressive, is not comprehensive. We don't own everything or have access to everything available. However, unlike most online bookstores, it does go quite far back in time and provide a wealth of material no longer available on the open market. Even as books and newspapers become digitized and available electronically, we'll still need to identify and track what is physically present in our collections.

But what does our public need when they search for something, and what suits their needs best? How can we get them to what they want with the least amount of effort, yet also suit our need to track and know where something physically is at any moment? And how can we allow our collection materials to mesh easily with all the other materials a researcher may have collected from other sources, if that is what they want, or simply deliver the one item they need at that particular moment? Is there one interface that can suit a variety of needs, whether you know exactly what you want or you just want to browse a topic?

As we search for the best software being developed to improve our patrons' library catalog experience, we'll be asking patrons: what is it you like or dislike about online bookstores or catalogs like Amazon? And how does searching Amazon compare with doing the same search in a library catalog with new features, such as the catalog at the University of Washington using WorldCat Local or the University of Iowa using Primo?

So please join us in determining some interface differences. Find a favorite title in all three sites above. Now that you've found that title, misspell or perhaps 'rearrange' the words of that book or item so that your query won't exactly match. How well does each interface do? Are there differences in handling something that doesn't quite match up? What do you like or dislike about each of these interfaces? Let us know!
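For the curious, here is a minimal Python sketch of one way an interface could cope with the misspelled or rearranged titles in the experiment above. It uses `difflib.get_close_matches` from Python's standard library and sorts the words of each title so that word order doesn't matter. The `suggest_titles` and `normalize` names, the sample titles, and the 0.6 cutoff are assumptions for illustration; none of the three sites is known to work this way.

```python
import difflib


def normalize(text):
    """Lowercase the text and sort its words so word order is ignored."""
    return " ".join(sorted(text.lower().split()))


def suggest_titles(query, titles, cutoff=0.6):
    """Return up to three titles that closely resemble the query."""
    by_key = {normalize(title): title for title in titles}
    matches = difflib.get_close_matches(
        normalize(query), list(by_key), n=3, cutoff=cutoff
    )
    return [by_key[key] for key in matches]


# Hypothetical sample titles.
titles = ["A Sand County Almanac", "Walden", "Silent Spring"]

print(suggest_titles("County Almanack Sand", titles))  # misspelled and rearranged
print(suggest_titles("Waldon", titles))                # misspelled
```

Raising the cutoff makes the matching stricter, and lowering it returns looser suggestions, which is the kind of trade-off worth watching for as you compare the sites.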

Tuesday, February 5, 2008

Responses from Open Forum Questions

At last Thursday's open forum, we asked you to answer the following questions:
  • What will the library search experience be like in five years?
  • What are the first things you'd change?
We collected 27 cards from you. Here is a ranked list of your responses to our first question. In five years, we hope that our search experiences will include:

  1. Ability to search large numbers of databases at once – 17
  2. Improved online browsing (facets, related results) – 11
  3. User-enriched records (tags, reviews, comments) – 9
  4. Customization, different views of the same database (including the ability to create lists and store records) – 9
  5. Intuitive searching – 8
  6. Spell correction and suggestions for alternate search terms – 7
  7. Visual-based searching – 6
  8. Focus on user-centered design – 6
  9. Integration into users' online work environment – 6
  10. Fast searching – 2
  11. Removal of library lingo – 2
  12. Full-text everything / full-text searching – 2
  13. Relevancy Ranking – 2
  14. All formats in a single record – 1
What are the first things you'd change?

Most of you didn't indicate what should be changed first. Those who did mainly suggested that spell checking, faceted browsing, and enriched records should be priorities.

Is there anything missing from this list? Is the ranked list an accurate depiction of your change priorities?

Thursday, January 31, 2008

Open Forum Recap 1/31

A special thanks to everyone who made it to the Resource Discovery Open Forum this afternoon!

For those of you who couldn't make it, here is what we covered. Allan Barclay demoed AquaBrowser and Eric Larson showed us VuFind.

AquaBrowser
Columbus Metropolitan Library implementation
Oklahoma State University implementation
University of Chicago implementation

VuFind
Live Demo
Brochure

We asked everyone to answer the following questions at the beginning of the session:

"What will the library search experience be like in five years?"
"What are the first things you'd change?"

If you weren't at the forum, leave a comment with your answers to these questions. I'll summarize the responses from forum attendees in an upcoming post.

Tuesday, January 22, 2008

Where Do We Start and What Do Our Users Really Want?

Thanks to everyone for commenting. It is important to acknowledge that people begin searches differently and that these processes change according to the task at hand. Searching for information is personal and so we definitely want to select tools that can be customized by our users.

I asked you to describe how you begin a search for information to stress the importance of focusing on the user and making decisions based on their needs. The comments revealed two common threads in how you search for information. Many start their searches with a search engine and many value recommendations from trusted peers in social networks. Let's take a look at some recent studies to give us a rough idea of how our students might compare.

According to OCLC's 2005 report, College Students' Perceptions of Libraries and Information Resources, 89 percent of college student information searches begin with a search engine. Library Web sites were selected by only 2 percent of students as a place to begin an information search. Search engines were rated higher than libraries in the areas of reliability, cost-effectiveness, ease of use, convenience, and speed. So it seems that many of our users might agree with Sue in that whatever they use "it's got to be simple and FAST."

Only 2 percent of college students are starting searches at library Web sites, but this doesn't mean that they aren't using libraries. The recent Pew Internet study, Information Searches that Solve Problems, found that young adults (18-29) are the heaviest users of libraries when looking for information to solve problems. They are also the most likely visitors to the library for any purpose and especially value access to computers and the Internet.

So, if we know we have young adults in our libraries (I sure know they are in College Library!), how do we get them to use library resources? Well, we can start by asking them how they would improve library tools and by paying attention to what they already use.

Researchers at Idaho University Libraries conducted focus groups with undergraduate library users and asked them to describe a "dream information machine." The students imagined a machine that was a "mind reader," that was "intuitive," and could determine information needs without them having to verbalize them. The "dream machine" would be able to solve all of their information needs by searching a comprehensive collection of information resources. The ideal information source would also be portable and always available. What would your ideal information source look like?

Where does social networking fit into all of this? Many of you reported that you start a search for information by consulting with a respected peer either in person or online. Does this mean that librarians should be in social networking sites or that social networking should be in the catalog? There will probably never be a consensus about whether or not librarians should be in Facebook and MySpace. The recent University of Michigan Library Web Survey found that 23% of library users would be interested in contacting a librarian in Facebook or MySpace, nearly half wouldn't be interested in contacting a librarian this way, and the rest don't use social networking sites. Many librarians in the blogosphere took this to mean that we shouldn't be in social networking sites. To me, this means that I should be in social networking sites for the 23% interested in talking to me (as long as I'm not stalking those who aren't). Let's wait and see how our users feel about enriching the catalog with user reviews and ratings before we make any assumptions there.

The Resource Discovery Task Force does plan to conduct user surveys and focus groups to get a better idea of what our users value. Let's talk more about user needs and how to fill them at our open forum on Thursday, January 31 from 12:30 - 1:30 in Memorial 126.

Happy first day of class!