Friday, January 20, 2006

Trusted Search Sites

When you do your research, finding web sites with a search engine like Google or Yahoo, or a site like Wikipedia, raises the question of which results you can trust and actually use in your paper. Here are two short articles from librarians who have the same problem:

January 2006
How Does Google Determine Which Web Sites Are the Most "Trusted"?
In the debut issue of the Google Librarian newsletter, we published an article by quality engineer Matt Cutts explaining how Google collects and ranks search results. The most common question we heard in response was "How does Google determine which web sites are the most 'trusted'?" Here, his reply:

This question goes to the heart of what we do. You already know the short answer: Google uses more than 100 different factors, including the PageRank algorithm, to determine whether a site is trusted or reputable. If you think of the internet as a democracy, a web page that links to another page is "voting" for the value of the page. As we explain in our Technology Overview, PageRank interprets a link from Page A to Page B as a vote for Page B by Page A. PageRank then assesses a page's importance by the number of votes it receives. But that's not the end of the story. If Page A itself has more votes from other pages, the vote carries more weight. Or to put it another way, if more people trust your site, your trust is more valuable.
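To make the "voting" idea concrete, here is a minimal sketch of the simplified, textbook form of PageRank. It is not Google's production code, which the article says is not fully disclosed. The function name `pagerank`, the toy link graph, the damping factor of 0.85 and the fixed iteration count are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated vote-passing.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal trust

    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of votes.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own rank among the pages it votes for,
                # so a vote from a highly ranked page carries more weight.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: A and C vote for B, and D votes for C.
toy_web = {"A": ["B"], "C": ["B"], "D": ["C"], "B": []}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph B ends up on top because it collects two votes, and C outranks A because C has a vote of its own (from D) while A has none.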

In addition to using the PageRank algorithm, we automatically analyze the content of pages we crawl. This goes beyond scanning page-based text, which webmasters can easily manipulate through meta-tags. We also look at factors like fonts and the placement of words on a page. And we examine the content of neighboring pages, which can provide more clues as to whether the page we're looking at is trusted and will be relevant to users.

The long answer is more complicated. Since how we determine search results is the core of our business, there are some ingredients in our "special sauce" that we can't share. In addition, it goes without saying that we're on constant guard against people exploiting the information to achieve artificially high placement in our search results. At the same time, Google was born in a university research environment, and there is a large and growing body of academic work exploring and analyzing our technology. That includes the grand-daddy of them all, The PageRank Citation Ranking: Bringing Order to the Web, the original Stanford University paper by Larry Page, Sergey Brin, Rajeev Motwani and Terry Winograd. If you'd like to take a look, Google Scholar is a good place to start (especially if you click on the citations as well as the papers themselves).

Finally, you might also want to check out this link, which takes you to a collection of technology papers written by people now at Google. It contains oldies-but-goodies like the Stanford paper on PageRank, but also brand new research about everything from algorithms to artificial intelligence. Enjoy!

January 2006
Beyond Algorithms: A Librarian's Guide to Finding Web Sites You Can Trust
Okay, so your favorite search engine has turned up thousands of web sites on the topic of your choosing. Which ones should you trust?

As a librarian who runs a web site catering to people with a hunger for authoritative resources, I'm often asked that question. As a result, my colleagues and I have developed a five-point system for separating the wheat from the chaff. While we pride ourselves on our small but well-groomed collection of reliable, trustworthy, librarian-selected web sites, there's really no magic to what we do. It's simple methodology you can use at the reference desk or any other place you find yourself staring at a page of search results and wondering where to begin.

Whether we're selecting new web sites for our newsletter or deciding whether to toss or keep sites already in our collection, we rely primarily on what we call the "big five show-stoppers": availability, credibility, authorship, external links and legality.

1. Availability

Is the site up and running? Is the information freely available?

The first question can be answered fairly easily — either the web site is there, or it's not — but the second question is more complicated. Many web sites put information behind walls of one sort or another. It may be worth it for you to pay a fee or register to gain access to a web site, but at the Librarians' Internet Index (LII), we pass along only freely available sites because our working assumption is that when you're hunting for information, either for yourself or for a library patron, free access is good. In addition, we are leery of sites that require registration to view most or all of the site, since it's often unclear how your personal information will be used.

Of course, this isn't a hard-and-fast rule. We don't reject a site just because it has some information behind a wall. For instance, Open WorldCat is a terrific database for locating books in libraries worldwide, and if the book is available for purchase, you'll find at least one link to an online store. There's no harm in that (except to bookaholics like us, where it's dangerous to our pocketbooks). But if most or all of the WorldCat site were fee-based, it wouldn't be very useful to anyone who isn't a subscriber.

Shortcut: To determine if information "behind the wall" is worth your time and/or money, skim the web site's mission statement, "About" page, or registration sign-up page. For example, the Ellis Island Foundation makes it clear that by registering for free, you'll be able to take full advantage of the site's functionality.

2. Credibility

Does the web site contribute current, accurate information? Are the site's authors qualified to present the content provided?

In reviewing the sites we've rejected for LII in the past six months, we found that the majority had credibility problems. Either the content was clearly substandard (including, for instance, recipes that misstated quantities, or definitions we knew to be wrong) or the author lacked the credentials to present the content on the site.

We don't rule out personal web sites, but we scrutinize them carefully. Sometimes we select sites maintained by hobbyists when the content is fun or recreational, such as Patently Absurd, a web site featuring weird patents. Sometimes we select sites when we can use our own subject knowledge to assess the content, as we did when we chose the yummy web site, Tiramisu: Heaven In Your Mouth. But personal web sites aren't always what they seem, and we wouldn't want anyone following health advice from a quack, or using a knitting pattern that results in the proverbial sweater with three arms.

We're always surprised when potentially good web sites don't provide information about the author's credentials right up front. If we aren't sure about a site, we write the author. If they don't respond, or we're not convinced of their credibility when they answer, we reject the site.

Shortcut: Look for an "About" page or an author biography.

Shortcut: There are some sources that you can nearly always trust. Many librarians busy helping patrons at the desk, over the phone, or in instant messaging sessions use Google searches limited to the .edu or .gov domains to quickly winnow the search to sites known to be authoritative. For example, a Google search for "breast cancer site:gov" will yield high-quality web sites.
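If you run domain-limited searches often, the query can be assembled in a few lines. This is only a sketch: the helper name `domain_limited_search_url` is made up for illustration, and the `https://www.google.com/search?q=` address format is an assumption about how the search page accepts queries, not something stated in the article.

```python
from urllib.parse import urlencode


def domain_limited_search_url(terms, domain):
    """Build a search URL restricted to one domain, e.g. 'gov' or 'edu'.

    The base URL below is an assumed address format for the search page.
    """
    query = f"{terms} site:{domain}"
    return "https://www.google.com/search?" + urlencode({"q": query})


# The example from the article: breast cancer results limited to .gov sites.
print(domain_limited_search_url("breast cancer", "gov"))
```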

3. Authorship

At LII we're very skeptical of web sites with more than a couple of typographical or grammatical errors. In addition to how poorly it would reflect on us to point someone to a grammatically challenged web site, it's a big hint that the content on the site is generally not up to snuff.

We do make some exceptions for web sites translated from languages other than English, if we can find someone to verify that the content in the original language has correct spelling and grammar. The English is a little rocky on the lovely web site, Paris at the Time of Philippe Auguste, but we have it on good word that the original French is très bien.

Shortcut: If you think a web site has more than the average number of typos, copy a representative page and dump it into a Word document for a spell-check.

4. External Links

Nothing kills a web site's reputation faster than broken links leading elsewhere.

Broken links are a flag that the author is not paying attention to the content. Give web sites some latitude, though; there was a time when one broken link among many would cause us to reject a web site, but it's more common nowadays for people to move content to another URL, making it difficult for even the most fastidious webmasters to keep up. If you spot a broken link on a site you like and use, let the webmaster know; we appreciate these tips, and so do people at the sites we communicate with. But if you see many broken links rather than just a few, that's a cue to pass the site by.

Shortcut: Look for evidence that the web site maintains its links, such as notes indicating when a page was last updated, and beware of student project web sites and personal web pages with many, many links!
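The broken-link test can also be roughed out in code. The sketch below uses only the Python standard library. The names `LinkCollector`, `page_links`, `is_reachable` and `broken_links` are invented for this example, and treating any failed or error response to a HEAD request as "broken" is a simplifying assumption (some healthy servers refuse HEAD requests).

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute http(s) links found in <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            absolute = urljoin(self.base_url, href)
            if absolute.startswith(("http://", "https://")):
                self.links.append(absolute)


def page_links(html, base_url):
    """Return every http(s) link on the page, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def is_reachable(url, timeout=10):
    """Assumption: a link counts as alive if a HEAD request succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # Covers HTTP errors, DNS failures, timeouts and malformed URLs.
        return False


def broken_links(html, base_url, check=is_reachable):
    """Return the links that the checker reports as unreachable."""
    return [url for url in page_links(html, base_url) if not check(url)]
```

A page where `broken_links` returns one or two entries out of many deserves the latitude described above; a long list is the cue to pass the site by. The `check` parameter lets you substitute your own test, for instance one that retries with a normal GET request.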

5. Legality

The author of a legitimate web site will ensure that she is legally entitled to publish the content on her site, working within copyright and fair use guidelines.

It's common to hear the author of a web site claim she is engaging in "fair use." Sometimes this is a reasonable argument, such as when an author uses examples of an artist's work in order to discuss it. Sometimes it's just a smokescreen — an excuse to justify posting someone else's work.

Shortcut: It's a lot easier to assess whether a web site complies with copyright law when you're familiar with its basic principles. Brad Templeton's guide to common copyright myths is a good primer.

Shortcut: Trust your instincts. If a web site looks and feels like a rip-off, it probably is. Take a chunk of its text and paste it into Google to see if it shows up elsewhere.

Shortcut: Avoid fan sites, lyric sites, paper mills, and any site posting newspaper or magazine articles (the full articles, not quotes or links) without also posting explicit permission statements.

So there you have it — the big five show-stoppers. Of course, once a web site makes it past the first cut, there are more finely grained heuristics for gauging authority. But you'll have what you need to be sure it's worth your time to dig deeper: a site you can trust.

Digging Deeper: A Few More Questions to Consider

1. Does the author provide sources for information?
2. If the site provides opinion rather than facts, are these opinions clearly identifiable as such?
3. Who are the audiences for this site? Is the site appropriate for the intended audiences?
4. Does the point of view provide balance to the information seeker?
5. How does the site compare with other sites on the same subject?

Karen G. Schneider, a librarian and writer, is the Director of Librarians' Internet Index (LII). Her personal blog is Free Range Librarian. She freelances for the library press, most recently at the ALA TechSource blog.

Sunday, January 15, 2006

More Costs

I have just run the item below on the cost of the war in Iraq for the Americans. Here is a new one covering the U.S. of A. and their British allies, with an additional cost-estimate editorial of my own.
Economic View
When Talk of Guns and Butter Includes Lives Lost

By LOUIS UCHITELLE

Published: January 15, 2006


As the toll of American dead and wounded mounts in Iraq, some economists are arguing that the war's costs, broadly measured, far outweigh its benefits.

Studies of previous wars focused on the huge outlays for military operations. That is still a big concern, along with the collateral impact on such things as oil prices, economic growth and interest on the debt run up to pay for the war. Now some economists have added in the dollar value of a life lost in combat, and that has fed antiwar sentiment.

"The economics profession in general is paying more attention to the cost of lives cut short or curtailed by injury and illness," said David Gold, an economist at the New School. "The whole tobacco issue has encouraged this research."

The economics of war is a subject that goes back centuries. But in the cost-benefit analyses of past American wars, a soldier killed or wounded in battle was typically thought of not as a cost but as a sacrifice, an inevitable and sad consequence in achieving a victory that protected and enhanced the country. The victory was a benefit that offset the cost of death.

That halo still applies to World War II, which sits in the American psyche as a defensive war in response to attack. The lives lost in combat helped preserve the nation, and that is a considerable and perhaps immeasurable benefit.

Through the cold war, economists generally avoided calculations of the cost of a human life. Even during Vietnam, the focus of economic studies was on guns and butter - the misguided insistence of the Johnson administration that America could afford a full-blown war and uncurtailed civilian spending. The inflation in the 1970's was partly a result of the Vietnam era.

Cost-benefit analysis, applied to war, all but ceased after Vietnam and did not pick up again until the fall of 2002 as President Bush moved the nation toward war in Iraq. "We are doing this research again," said William D. Nordhaus, a Yale economist, "because the Iraq war is so contentious."

Mr. Nordhaus is the economist who put the subject back on the table with the publication of a prescient prewar paper that compared the coming conflict to a "giant roll of the dice." He warned that "if the United States had a string of bad luck or misjudgments during or after the war, the outcome could reach $1.9 trillion," once all the secondary costs over many years were included.

So far, the string of bad luck has materialized, and Mr. Nordhaus's forecast has been partially fulfilled. In recent studies by other economists, the high-end estimates of the war's actual cost, broadly measured, are already moving into the $1 trillion range. For starters, the outlay just for military operations totaled $251 billion through December, and that number is expected to double if the war runs a few more years.

The researchers add to this the cost of disability payments and of lifelong care in Veterans Administration hospitals for the most severely injured - those with brain and spinal injuries, roughly 20 percent of the 16,000 wounded so far. Even before the Iraq war, these outlays were rising to compensate the aging veterans of World War II and Korea. But those wars were accepted by the public, and the costs escape public notice.

Not so Iraq. In a war that has lost much public support, the costs stand out and the benefits - offsetting the costs and justifying the war - are harder to pinpoint. In a paper last September, for example, Scott Wallsten, a resident scholar at the conservative American Enterprise Institute, and Katrina Kosec, a research assistant, listed as benefits "no longer enforcing U.N. sanctions such as the 'no-fly zone' in northern and southern Iraq and people no longer being murdered by Saddam Hussein's regime."

Such benefits, they found, fall well short of the costs. "Another possible impact of the conflict, is a change in the probability of future major terrorist attacks," they wrote. "Unfortunately, experts do not agree on whether the war has increased or decreased this probability. Clearly, whether the direct benefits of the war exceed the costs ultimately relies at least in part on the answer to that question."

The newest research was a paper posted last week on the Web (www2.gsb.columbia.edu/faculty/jstiglitz/cost_of_war_in_iraq.pdf) by two antiwar Democrats from the Clinton administration: Joseph E. Stiglitz of Columbia University and Linda Bilmes, now at the Kennedy School of Government at Harvard. Their upper-end, long-term cost estimate tops $1 trillion, based on the death and damage caused by the war to date. They assumed an American presence in Iraq through at least 2010, and their estimate includes the war's contribution to higher domestic petroleum prices. They also argue that while military spending has contributed to economic growth, that growth would have been greater if the outlays had gone instead to highways, schools, civilian research and other more productive investment.

The war has raised the cost of Army recruiting, they argue, and has subtracted from income the wages given up by thousands of reservists who left civilian jobs to fight in Iraq at lower pay.

Just as Mr. Wallsten and Ms. Kosec calculated the value of life lost in battle or impaired by injury, so did Mr. Stiglitz and Ms. Bilmes - putting the loss at upwards of $100 billion. That is more than double the Wallsten-Kosec estimate. Both studies draw on research undertaken since Vietnam by W. Kip Viscusi, a Harvard law professor.

The old way of valuing life calculated the present value of lost earnings, a standard still used by the courts to compensate accident victims, generally awarding $500,000 a victim, at most. Mr. Viscusi, however, found that Americans tend to value risk differently. He found that society pays people an additional $700 a year, on average, to take on risky work in hazardous occupations. Given one death per 10,000 risk-takers, on average, the cost to society adds up to $7 million for each life lost, according to Mr. Viscusi's calculation. Mr. Stiglitz and Ms. Bilmes reduced this number to about $6 million, keeping their estimate on the conservative side, as they put it.
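The arithmetic behind the $7 million figure is short enough to write out. The numbers come straight from the paragraph above; the variable names are just labels chosen for this sketch.

```python
# Figures quoted in the article for Mr. Viscusi's calculation.
risk_premium_per_worker = 700   # extra dollars per year paid for risky work
workers_per_death = 10_000      # one death per 10,000 risk-takers, on average

# Together, the 10,000 workers are paid this much to accept one expected death.
value_per_life = risk_premium_per_worker * workers_per_death
print(f"${value_per_life:,}")

# The older lost-earnings standard tops out at about $500,000 per victim.
court_award_ceiling = 500_000
print(value_per_life / court_award_ceiling)
```

The second figure shows how far apart the two standards are: the risk-based value is 14 times the court ceiling quoted in the article.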

None of the heroism or sacrifice for country shows up in the recent research, and for a reason.

"We did not have to fight this war, and we did not have to go to war when we did," Mr. Stiglitz said. "We could have waited until we had more safe body armor and we chose not to wait.

(My editorial) The cost will be much, much greater than the above estimates, because in the end the Americans and their suck hole British helpers will be responsible for the total cost of the economic and civilian damage done to Iraq. Some estimates place the civilian deaths at over 200,000, and if we use the conservative estimate of $6 million per person we come up with an additional figure of $1.2 trillion. Damage to the infrastructure and lost production can easily come to an additional $1 trillion, and of course there should be additional punitive and emotional damage claims amounting to another $2 trillion. The total for the Americans and their British buddies would be well over $5 trillion, and then they would be getting off lucky.
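For anyone who wants to check the editorial's sums, here is the tally as a short sketch. The first three inputs are the editorial's own figures. The $1 trillion used for the published estimates is an assumption on my part, taken from the "tops $1 trillion" figure in the article.

```python
TRILLION = 10**12

# Figures from the editorial.
civilian_deaths = 200_000
value_per_life = 6_000_000                      # the conservative per-person figure
civilian_loss = civilian_deaths * value_per_life
infrastructure_and_production = 1 * TRILLION
punitive_and_emotional = 2 * TRILLION

additional = civilian_loss + infrastructure_and_production + punitive_and_emotional

# Assumption: the published long-term estimate is taken as $1 trillion.
published_estimate = 1 * TRILLION
total = additional + published_estimate

print(f"Civilian loss: ${civilian_loss / TRILLION:.1f} trillion")
print(f"Additional:    ${additional / TRILLION:.1f} trillion")
print(f"Total:         ${total / TRILLION:.1f} trillion")
```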

Sunday, January 08, 2006

Cost of Iraq War

Someone was writing about the cost of the Iraq War, and I thought they might be interested in this:
Real Costs of the Iraq War, posted by Jeremy Lyon
Joseph Stiglitz, a Nobel prize-winning economist, has co-authored a report in which he projects the real costs of the Iraq war to be at least $1 trillion. Those costs include not only the operational war expenses, but also such costs as lifetime care for critically wounded soldiers. Compare that figure with Larry Lindsey's pre-war estimate of $200 billion, characterized by the Bush White House as "too high." [sysrick]