Tag Archives: online searching

corporatizing copyright

Ursula Le Guin has some stones.  This whole Google digital books settlement is a bit complicated, but it boils down to something more than opting in and out for authors.  It’s about signing away your authorship, and about forcing companies like the great and powerful Goog to negotiate with you before you sign, not after they’ve been caught.  Le Guin says it better:

The “opt-out” clause in the Settlement is most disturbing:

First, it seems unfair that, by the terms of the class-action settlement, authors can officially present objections to the Court only by being “opted in” to the settlement and thereby subjecting themselves to its terms.

Second, while the “opt-out” clause appears to offer authors an easy way to defend their copyright, in fact it disguises an assault on authors’ rights. Google, like any other publisher or entity, should be required to obtain permission from the owner to purchase or use copyrighted material, item by item.

The free and open dissemination of information and of literature, as it exists in our Public Libraries, can and should exist in the electronic media. All authors hope for that. But we cannot have free and open dissemination of information and literature unless the use of written material continues to be controlled by those who write it or own legitimate right in it. We urge our government and our courts to allow no corporation to circumvent copyright law or dictate the terms of that control.

Google has some stones as well, dictating the terms of its own settlement to authors of works it’s digitized without consent.  Perhaps Google is trying to claim some perverted sense of fair use by chumming with libraries to assist in digitization rather than bothering to negotiate with authors or forking out the dough to buy the items it wants to scan from Amazon or AbeBooks.

media specialists and college librarians the same?

Here’s a recent article from the NYT, talking about information literacy among elementary school students, and the work it takes for media specialists to break through to their “patrons”.  It’s amazing how the perceptions of information literacy and web habits among children mirror those of college students.

It’s an interesting article that details the gamut of issues that librarians are facing, including:

  • Budget cuts
  • Librarians on the front lines battling info illiteracy
  • Dealing with outdated collections and limited funds
  • Actually making a difference

Here’s the scary part:

Even teachers find that they learn from Ms. Rosalia. “I was aware that not everything on the Internet is believable,” said Joanna Messina, who began taking her fifth-grade classes to the library this year. “But I wouldn’t go as far as to evaluate the whole site or look at the authors.”


During a lunch period earlier this month, Gagik Sargsyan, 13, slunk into the library and opened a laptop to research a social studies paper on the 1930s and 1940s.

“Have you looked at any books?” Ms. Rosalia asked.

A look of horror came over Gagik’s face. “No,” he said.

Not that surprising, really. But the stagnation of info-seeking behavior among students of all levels is self-evident: the OPACs, catalogs, and databases go largely unused.  It does seem that students are taught little more than to fill in bubbles when not surfing the web.

the incidental opac

Librarians, I’ve come to understand, facilitate things.  Just like those late-night, seedy, ever-anonymous entrepreneurs on street corners and in beer gardens, possessing the ability to procure certain items on short notice for other unnamed yet interested parties, librarians too embrace their responsibility of passing on their coveted contraband of information, or of retrieving it.

And considering information retrieval, I’m incessantly perplexed by the utter obliviousness users have toward their library catalog.  It’s as if users take pride, relishing a certain sense of entitlement, in their lack of curiosity toward navigating library resources.  Hence, the librarian is forced to find new ways to shuffle these students like cattle through the slaughterhouse of information literacy or competency.

I’m not all that surprised that we’re now reduced to vomit-inducing displays of flashing lights and multimedia just to get students’ attention.  Should users actually spend five minutes exploring their OPAC (or listening to their librarians), they might actually learn how supremely practical subject headings can be.

Take, for example, AquaBrowser, a different kind of OPAC designed to display relationships based on search terms.  My local public library uses it, alongside the option of a more traditional OPAC.  AquaBrowser draws a visual diagram of one’s search terms, highlighting possible misspellings, relationships, translations, or thesaurus terms for one’s search.

I personally like it; however, I feel it’s designed for the user who has no idea what they’re looking for, and I’d posit that those users are for the most part uncommon. Traditional OPACs will get users to their items just as fast, if not faster, assuming they know what they’re looking for.

Users want to know if their materials are already checked out before they want to know what you have.  Therefore, the fact that you have an OPAC is incidental; it will be used primarily when one’s primary request has become unavailable.

Egads, you may be thinking…what is my point anyway?  OPACs that visually diagram your search, supplemental and wondrous as they may be, aren’t necessarily more useful than standard OPACs, which are merely less “dynamic” in the Web 2.0 sense.

Users, particularly college-level users mind you, aren’t familiar with their collections, and thus their OPACs.  I suppose that’s part of what makes us librarians freaks…we willingly, almost involuntarily, befriend our collection regardless of whether a copy of Mall Cop has already been ordered and is on its way. Getting users to use the catalog for its own sake is herculean.

resisting google: not so futile

Not too long ago I mused upon the idea that some search engine companies are trying to provide more human interaction when one has an online reference question, either by doing the searching or by offering suggestions on how to perform the search.  This quasi-virtual reference seems to be catching on, and librarians are suddenly becoming more recognized for the credibility they bring to their reference work.

This sentiment is the impetus for a new project that aims to compete with the likes of the great Goog: Reference Extract.  The project, an ever-growing collaboration of libraries, aims to differ from Google by deriving credibility from the shrewd linkages librarians make when applying sound information literacy principles. Said better than I could say it:

Users will enter a search term and get results weighted towards sites most often referred to by librarians at institutions such as the Library of Congress, the University of Washington, the State of Maryland, and over 1,400 libraries worldwide.

The issue of credibility is interesting when compared to the measure of relevancy and popularity Google bases its index on.  The issue of credibility is more fully explained:

In essence linkages between web pages by anyone is replaced by citations to web pages by highly trained librarians in their daily work of answering the questions of scholars, policy makers and the general population. Instead of page rank, the team refers to this as “reference weighting.”

That is to say, it is no great leap to believe that working one-on-one with a librarian would yield highly credible results, but it also appears that gathering the sites librarians point to across these one-on-one interactions and making them searchable continues to yield highly credible results. Further since the librarians answer question on very wide range of topics, their answers can be applied to a general purpose search engine.
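To make the quoted idea concrete, here’s a minimal sketch of “reference weighting” as I understand it (made-up data and function names, not RefEx’s actual code): instead of walking a link graph, you count how often librarians cite each site across their reference answers and rank by that.

```python
from collections import Counter

# Hypothetical sketch of "reference weighting" -- not RefEx's actual code.
# Instead of ranking a page by who links to it (PageRank-style), rank it
# by how often librarians cite it while answering reference questions.
librarian_answers = [
    ["loc.gov/wpa-narratives", "archives.gov/1930s"],   # made-up citations
    ["loc.gov/wpa-narratives", "nara.gov/records"],
    ["nara.gov/records", "loc.gov/wpa-narratives"],
]

def reference_weight(answers):
    """Normalize each site's citation count into a weight summing to 1."""
    counts = Counter(url for answer in answers for url in answer)
    total = sum(counts.values())
    return {url: n / total for url, n in counts.items()}

weights = reference_weight(librarian_answers)
ranked = sorted(weights, key=weights.get, reverse=True)
# Sites cited in more librarian answers float to the top of the ranking.
```

A real system would of course fold these weights into full-text search; the point is only that the ranking signal comes from librarian citations rather than raw hyperlinks.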

I find it clever that the organizers of RefEx measured their index by using the custom search engine provided by Google…beating it at its own game perhaps.

It is important to note that by using the Google Custom Search Engine service the exact same technology was used to search and rank the results, the only thing that varied was that one was an open web search, and one was limited to only those pointed to by reference librarians. So, even outside of the library website context the credibility of librarians is retained.

We may index fewer pages, but the ones we point to are more informationally literate. One question to walk away with: does less material indexed = more reliable?  Philosophically speaking, words like popular, relevant, and useful will cause debate; academically speaking, this justifies the librarian’s attempt to wean those frothing, zombie-like patrons away from The Google and toward our subscribed databases, online resources, and guides.  And with RefEx, Google’s helping us do it.

search engine overload…or overlord?

Seems like search engines have been springing up all over the place.  Soon enough we’ll need search engines to search search engines (oh wait…we already have those). In any case, the emergence of a new breed of mechasearchers has me wondering whether Google might be spreading itself a bit too thin with all the gizmos it has in development.  I’m curious about the avenues these particular developers are taking so that they just might be the ones to slay the great Goog.  Three current avenues are particularly intriguing.

Preserve what little humanity we have left with ChaCha

ChaCha is a company that is building on the idea that it is not so much the technology that is delivering your indexed content as it is the humanoids manipulating the technology.

Thus Spake Zara-chacha:

ChaCha is conversational, fun, and easy to use. Simply ask your question like you are talking to a smart friend and ChaCha’s advanced technology instantly routes it to the most knowledgeable person on that topic in our guide community. Your answer is then returned to your phone as a text message within a few minutes.

Not that it’s necessary to use a live guide, as their search engine works perfectly fine, but hooking a live one can be helpful, especially if you’re not near a pulsing box of pixellation and you have your phone with you.  Texting your searches seems to be all the rage, but mind you, standard rates may apply.

Make it sound as human as possible with Powerset

Taming the beast is the aim of Powerset, the beast being search technology that cannot understand our queries.  So, as with ChaCha, there is nothing wrong with us; it’s that blasted speech syntax that computers simply can’t understand.  Powerset spells it out for us:

Powerset’s goal is to change the way people interact with technology by enabling computers to understand our language. While this is a difficult challenge, we believe that now is the right time to begin the journey. Powerset is first applying its natural language processing to search, aiming to improve the way we find information by unlocking the meaning encoded in ordinary human language.

So, with the intent of sparing us technical, complicated search strings, Powerset wants our search results directly related to the flow of our informal speech patterns.  In its infancy, Powerset currently indexes only Wikipedia articles, though it offers the viewing options, references, and citations one would expect from a typical Wikipedia entry.

Index early, index often with Cuil

And then there’s Cuil, apparently created by defectors from the great Goog who have started their own search engine. Though it moves like Shaquille O’Neal running a not-so-fast break, it’s definitely gaining momentum. So much so that it boasts possessing the world’s biggest index:

The Internet has grown exponentially in the last fifteen years but search engines have not kept up—until now. Cuil searches more pages on the Web than anyone else—three times as many as Google and ten times as many as Microsoft.

Rather than rely on superficial popularity metrics, Cuil searches for and ranks pages based on their content and relevance. When we find a page with your keywords, we stay on that page and analyze the rest of its content, its concepts, their inter-relationships and the page’s coherency.

Then we offer you helpful choices and suggestions until you find the page you want and that you know is out there. We believe that analyzing the Web rather than our users is a more useful approach, so we don’t collect data about you and your habits, lest we are tempted to peek. With Cuil, your search history is always private.

It’s a very interesting claim, too, that Cuil has no interest whatsoever in collecting user data or user habits, or in indexing by popularity.  In any case, Cuil certainly intends to raise the stakes.

Three different philosophies, three different search engines.

privacy: or, google, thou art now a ‘brary

Still trying to wrap my head around this whole YouTube ruling.  Especially in light of what Congress, now with a single-digit approval rating, has just decided with respect to telecom immunity.  What is clear, infopeeps, is its similarity to the situation libraries have faced since 9/11: requests for patron circulation habits and records.

Quick…blame Jon Stewart!

The order comes as part of a $1 billion copyright infringement lawsuit brought against YouTube’s owner, Google, by Viacom, the media company that owns large cable networks such as MTV, VH1 and Nickelodeon. Viacom alleges that YouTube encourages people to upload significant amounts of pirated copyrighted programs and that users do so by the thousands, profiting YouTube and Google. It wants to prove that pirated videos uploaded to the site — video clips of Jon Stewart‘s “The Daily Show,” for instance — are more heavily viewed than amateur content.

The article goes on to mention steadfast assurances that Viacom will not go after individual users, but rather compare copyrighted vs. non-copyrighted content. Google, meanwhile, has taken the library stance:

“We are pleased the court put some limits on discovery, including refusing to allow Viacom to access users’ private videos and our search technology,” Google senior litigation counsel Catherine Lacavera said in a statement. “We are disappointed the court granted Viacom’s overreaching demand for viewing history. We will ask Viacom to respect users’ privacy and allow us to anonymize the logs before producing them under the court’s order.”

Nice middle road approach, I suppose.  But how reliable is that anonymizing?

But making the records anonymous is not fail-safe. In 2006, an AOL researcher inadvertently posted three months’ worth of searches typed in by 650,000 anonymous AOL users. Although their identities were masked — each user was given a randomly generated unique identification number — the search terms, which included names, home towns and interests, could be collated and used to identify a person, as an enterprising New York Times reporter showed.

Forget IP information, user addresses, and unique logins for a second. Did anyone even consider that one’s search terms can be studied to understand online research and viewing patterns?  Subtle and far-fetched, perhaps, but it is indeed a behind-the-scenes method for piecing together the jigsaw puzzlery of a user’s searching patterns.
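As a rough illustration of why random IDs don’t equal anonymity (made-up log entries, loosely modeled on the widely reported AOL case), collating queries per pseudonymous ID rebuilds exactly the per-person profile the masking was supposed to prevent:

```python
from collections import defaultdict

# Pseudonymized search log: names replaced by random user IDs, but the
# query text kept intact -- the same shape as the AOL release.
log = [
    (4417749, "landscapers in lilburn ga"),
    (4417749, "dog that urinates on everything"),
    (4417749, "homes sold in shadow lake subdivision gwinnett county"),
    (8675309, "weather boston"),
]

def collate(entries):
    """Group queries by pseudonymous ID, rebuilding per-user profiles."""
    profiles = defaultdict(list)
    for uid, query in entries:
        profiles[uid].append(query)
    return dict(profiles)

profiles = collate(log)
# One ID's combined queries name a town, a subdivision, and personal
# details -- enough for a reporter to narrow an "anonymous" ID to a person.
```

No decryption needed: the de-anonymization is just a group-by over data that was published on purpose.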

Disturbing thought…will this decision get society, and ultimately librarians (preferably in reverse order), to shift this debate toward clarifying copyright before it becomes necessary to invade privacy?  We’re a motley lot, us ‘brarians…do we have the attention span for that? I guess it’s just easier for Viacom to get the data than to think about fair use.  Does ALA have a response to the ruling?

Meld this to your minds, infomaniacs…is it not ironic how fake news so powerfully informs people to the extent they are frenzied to embrace such online havoc?

Jon Stewart is powerful indeed.