The Internet

Google's Weakness, AltaVista's Strength

Posted by timothy
from the not-all-contrarianism dept.
Cory Doctorow has an article on oreillynet called "How I Learned to Stop Worrying and Love the Panopticon," which begins "How much ass does Google kick? All of it." (We linked to it a few days ago.) Reader Richard Seltzer writes with a reaction to Doctorow's article, below. Your results may vary, but this kind of skepticism can only make the competing search engines better.

Some people love the results they get at Google, others are often disappointed. To a large extent, both the pluses and the minuses derive from Google's ranking system, which (as the folks at Google explain at www.google.com/technology) depends largely on the number of links to a particular page, the relevance of the content on those linking pages to the content on the target page, and the quality of the pages doing the linking.
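
To make the idea of link-based ranking concrete, here is a minimal sketch of a simplified PageRank-style iteration. It is not Google's actual system, which is not described in detail here; the function name `rank_pages`, the damping value, the iteration count, and the page names in the usage note are all assumptions chosen for illustration. The sketch covers only link counts and the score of the linking pages, not content relevance.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by inbound links, weighted by the score of the linker.

    links: dict mapping a page name to the list of pages it links to.
    Simplified illustration only; damping and iterations are assumed values.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with an equal score for every page.
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline score regardless of links.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores
```

With a hypothetical link graph such as `{"old-hub": ["established"], "established": ["old-hub"], "another": ["established", "old-hub"], "new-site": ["established"]}`, the page `new-site` has no inbound links and ends up with only the baseline score, while `established` scores highest. A scheme like this rewards pages that other well-linked pages already point to.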

Thanks to that complex and brilliant system, over time, the best pages often rise to the top of search lists. But that takes time -- a lot of time.

It works great for old, established sites to which many other old, established sites have linked. (It works great for my site :-) www.samizdat.com ). But new sites, regardless of the quality of their content, get short shrift. It takes 2-3 months for the new pages to get into the Google index. Then it takes time -- perhaps years -- for other "important" sites to discover the new site and link to it; and then months more for the new versions of those pages with those new links to get into the Google index.

So if I'm looking for content that is likely to have been on the Internet for a year or more, Google is great. But if I'm looking for fresh content, I'll go elsewhere.

For me, for years "elsewhere" meant AltaVista -- for two reasons. AltaVista used to add new pages to its index, for free, within two days of submission, while other search engines typically took weeks or even months. That meant they had the freshest content. In addition, AltaVista provided you with a set of very precise commands that couldn't be matched anywhere else.

Over the last year, as AltaVista has struggled to become profitable, they have destroyed their beautiful free submission process, trying to force Web sites to pay for submission. Free submissions (which typically come from the kinds of content-rich sites that I'm interested in) now seem to take three months or more -- no better than the other search engines and often worse.

Fortunately, the powerful commands remain -- for instance, the ability to exclude as well as include terms in your query. AltaVista lets you use minus signs and plus signs to indicate what you really don't want and what you do want. And for some specialized searches the exclusion is essential.

For instance, say you want to know what Web pages outside of your own site have links to your pages. At Google, I can do a search for link:samizdat.com or get the same results by going to their "Advanced" search and using their "page specific search" to find pages that link to a particular page. But my results are then littered with pages from my own site -- information I don't need and don't want. At AltaVista, I can search for +link:samizdat.com -host:samizdat.com and get exactly what I want -- finding out who thinks enough of my pages to have linked to me without my having contacted them: a valuable list of well-wishers and potential partners.
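
The include/exclude pattern above can be sketched as a small query builder. This only assembles the query string; it does not contact any search engine. The `link:` and `host:` operators are the ones named in the text, while the function names `build_query` and `external_backlinks_query` are hypothetical.

```python
def build_query(include=(), exclude=()):
    """Join terms into one query: '+' marks required terms, '-' excluded ones."""
    parts = ["+" + term for term in include]
    parts += ["-" + term for term in exclude]
    return " ".join(parts)


def external_backlinks_query(domain):
    """Build a query for pages that link to `domain` but are not hosted on it."""
    return build_query(include=["link:" + domain], exclude=["host:" + domain])
```

Calling `external_backlinks_query("samizdat.com")` produces the string `+link:samizdat.com -host:samizdat.com`, the same query quoted above.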

Similarly, Google lets me restrict a search to a particular Web site. For instance, if I include in my query the term site:samizdat.com, or in Advanced search under Domains I choose to restrict the search to that domain, then yes, I get results only from that site. But to use that command, I need to have additional query terms: site:samizdat.com alone generates no results.

At AltaVista, however, I can search for host:samizdat.com and get a complete list of all the pages at my site that are in the AltaVista index. Or I can search for url:samizdat.com/isyn and get a list of all the pages in that directory at my site that are in the AltaVista index. Or I can search for url:samizdat.com/consult.html to see if that particular page is in the index.

In other words, AltaVista provides a higher level of precision and the ability to get information that is particularly valuable to people in charge of Web sites and Web-based marketing projects. And if they'd just fix their free submission process and provide the service they used to, they'd kick Google's ass for searches for current information.

P.S. -- The folks at Google are very proud that their system defies human tampering. In fact, what they've done is encouraged the development of bizarre business models structured to take advantage of their link-based ranking system. For instance, Webseed Publishing now has over 1000 sites, all with different domain names. These content-rich sites are each run by different dedicated individuals. (I'm one of them :-) In many cases, the content deserves high rankings for its quality. You might wonder why the umbrella business for all these sites bothers to maintain over a thousand different domain names, when it would be far simpler and cheaper to have them as directories under a single domain. But because the domains are different, the many thousands of links these sites have to one another all count toward the automated calculation of their popularity and quality at Google, giving them all a boost in the rankings and hence bringing Webseed more traffic and hence more revenue.
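
A rough sketch of why the separate domains might matter, under the assumption (implied above, but not confirmed by Google) that a link-based ranking counts only links between different hosts. The function names and the `.example` domains are made up for illustration.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercased host part of an absolute URL."""
    return urlparse(url).netloc.lower()


def count_cross_host_links(links):
    """Count links whose source and target sit on different hosts.

    links: iterable of (source_url, target_url) pairs.
    ASSUMPTION: links within a single host do not count toward ranking.
    """
    return sum(1 for source, target in links if host_of(source) != host_of(target))


# Hypothetical content: three sections that all link to one another.
sections = ["jazz", "chess", "poetry"]

# Layout 1: the sections as directories under one made-up domain.
one_domain = [
    (f"http://network.example/{a}/", f"http://network.example/{b}/")
    for a in sections for b in sections if a != b
]

# Layout 2: the same sections, each on its own made-up domain.
many_domains = [
    (f"http://{a}.example/", f"http://{b}.example/")
    for a in sections for b in sections if a != b
]
```

Under that assumption, `count_cross_host_links(one_domain)` is 0 while `count_cross_host_links(many_domains)` is 6: the same set of mutual links only registers once each section has its own domain name.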

P.P.S. -- AltaVista appears to be making a comeback. Six years ago, when I was in the Internet Business Group at Digital and Digital owned AltaVista, about a third of the traffic to my Web site came by way of AltaVista. Whenever AltaVista had a glitch, I saw it immediately in my traffic stats. In fact, I sometimes was able to alert the engineers at AltaVista about problems before they had noticed them themselves. Over the years, due to increased competition from other search engines and also due to the business folks at AltaVista making bad decisions and jettisoning great capabilities/services (like 2-day free submissions, their affiliate program, LiveTopics, and newsgroup search), the number of people finding my pages by way of AltaVista plummeted. By January 2002, only 1% of my traffic was coming by way of AltaVista, despite the fact that as a long-standing fan and also as co-author of the book The AltaVista Search Revolution, I had lots of information about AltaVista at my site. I was actually getting twice as much traffic from the International Atomic Energy Agency (part of the UN), when I had no information at all related to atomic energy. But in recent weeks the traffic from AltaVista has climbed sharply. It now amounts to 6% of my total. I wish I knew why that was happening. In any case, I hope that trend continues.

  • by Agthorr (135998) on Wednesday March 13, 2002 @05:09PM (#3159139) Homepage

    You can add your own toolbars for any search engine. I have several samples for Mozilla [mozilla.org] on my webpage [barsoom.org]. I also include a very brief description on how to add other search engines, and/or add them to IE.

    -- Agthorr

  • by drew_kime (303965) on Wednesday March 13, 2002 @05:10PM (#3159143) Homepage Journal
    Go here [altavista.com].
  • Copernic (Score:2, Informative)

    by ZaneMcAuley (266747) on Wednesday March 13, 2002 @05:12PM (#3159161) Homepage Journal
    I use the Copernic 2001 Pro search client, so I get the best (and worst) features of them all :D
  • Not only that... (Score:5, Informative)

    by SlashChick (544252) <erica AT erica DOT biz> on Wednesday March 13, 2002 @05:13PM (#3159165) Homepage Journal
    ...but you can also make Google pop up when you click the "Search" button in IE. [google.com] This makes Google searching even easier since you can have the search window open on the left and hit your search results on the right. (Yay for "tabbed browsing", IE style.)

    Also, the coolest feature of the Google toolbar IMHO is not even the instant search, but the "Highlight" button. Gone forever is hitting Ctrl-F and typing in a search term. Just search for something in Google, go to a result, and hit "Highlight" -- the search terms are instantly highlighted. This saves me an incredible amount of time when I'm searching through, say, mailing list archives.

    The Google toolbar is one of the biggest reasons I use IE. (Well, that and the fact that page developers, including myself, follow the rule of thumb "Design so that it looks good in IE and works in Netscape.") But anyway, I digress. If you're using IE, check out toolbar.google.com [google.com] and download it.
  • by nstrom (152310) on Wednesday March 13, 2002 @05:13PM (#3159172)
    Similarly, Google lets me restrict a search to a particular Web site. For instance, if I include in my query the term site:samizdat.com or in Advanced search under Domains I choose to restrict the search to that domain, Yes, I get results only from that site. But to use that command, I need to have additional query terms: site:samizdat.com alone generates no results.

    You can use the following workaround to do a site: search on Google without any keywords. Just do "site:yoursite.com -stuff" where stuff is gibberish (bang on the keyboard a bit). For example, this search [google.com] shows 1,290 pages from samizdat.com. On the other hand, an AltaVista search for that site shows 1,090 hits for pages on that site.

    I don't know why Google doesn't allow simultaneous "site:" and "link:" searching, as that is something many users would like to do.
  • A is best for... (Score:2, Informative)

    by bmooney28 (537716) on Wednesday March 13, 2002 @05:13PM (#3159173) Homepage
    In my experience, Google websearch is best for specific web searches. The Dmoz.org directory is best for broad, directory-style searches, where you know the broad category your search fits into and you want to find several sites that have this topic in common. (Yahoo held first place in this category before the advertisement bombardment.) Google websearch is also among the best for file searches: try including "index of" (with quotes) in a search for a specific file (example: "index of" passwords.doc for interesting results). Google websearch is best for up-to-date news story searches (try including "news" in the search query). Limewire is best for music and video searches, both general and specific. Overall, Google is best for nearly all searches, in my opinion, and is usually more effective than using the search boxes on specific websites.
  • by d-e-w (173678) on Wednesday March 13, 2002 @05:18PM (#3159195)
    For certain types of links, Google can take as little as a couple of days. I know that I can find articles (from many different news sites, who allow individual articles to be spidered) usually within 48 hours. Sometimes within 24 hours. That's wonderful for me, if I can remember a news article I've seen recently and wish to reference, but can't remember the site I saw it on.

    As for new sites, it's been taking a week or so recently. Usually if I don't see it in a week, I head over to their add URL page and submit it.

    Talking about Google only using ratings (via number of links) is simplistic. Their index/search algorithms are obviously much more complex than that, and appear to utilize a wide range of methods beyond simply rating.
  • by wytcld (179112) on Wednesday March 13, 2002 @05:18PM (#3159202) Homepage
    For a text-intensive site [jazzhouse.org] that's been around a few years, and that the search engines were informed of years ago, 4 of the top 10 most frequent visitors are Google bots. None of the 10 is from AltaVista. And Google searches send a lot more people our way too.

    Now I just don't see how AltaVista can give anyone more current results if their bots are featherbedding.
    ___

  • by Cheshyre (43113) on Wednesday March 13, 2002 @05:18PM (#3159204) Homepage
    At AltaVista, I can search for +link:samizdat.com -host:samizdat.com and get exactly what I want
    In Google, +link:samizdat.com -site:samizdat.com does the same thing.
  • by GeekLife.com (84577) on Wednesday March 13, 2002 @05:26PM (#3159266) Homepage
    What would you do if your best friend [google.com] cuddles up with your biggest enemy [microsoft.com]?

    It's alive. [google.com]
  • by gafferted (560272) on Wednesday March 13, 2002 @05:28PM (#3159283)
    In fact, I sometimes was able to alert the engineers at AltaVista about problems before they had noticed them themselves.

    Alas, they can no longer be reached. Their search engine is seriously broken. It picks on a site and hits it hard and repeatedly.

    They will make 100,000 requests within 24 hours on a site with only 20,000 static items. On our co-operative co-located server, we host around 80 sites, many of which are content rich. When AltaVista chooses to visit just one of them, our total bandwidth usage jumps by an order of magnitude.

    We have been unable to get past their front-line support, and I am not prepared to maintain robots.txt on all of our members' sites just to control their broken robot, so we had no alternative but to block their entire subnet at our firewall.

    If anyone has evidence that the AV robot is fixed, I'd be happy to let them back in.

  • by blamanj (253811) on Wednesday March 13, 2002 @05:30PM (#3159298)
    Yes, he's clearly wrong here, as the practitioners of Google bombing [corante.com] have noted. It can take only a few days to have an effect.

    I suspect this is due to a more frequent crawl at sites Google considers interesting, so if you put up a site and no one hears about it for a while, it could take longer, but in general they're quite responsive.
  • by epeus (84683) on Wednesday March 13, 2002 @05:37PM (#3159344) Homepage Journal
    The author is basing this on outdated information. Google knows to crawl sites that change frequently more often than those that don't. Here is a concrete example:

    I posted Two Kinds of Order by John Marks [demon.co.uk] on March 11th, and mentioned this to some colleagues who might be interested. I linked to it from a Weblog [blogspot.com] or two [blogspot.com], and Doc Searls [weblogs.com] did too.
    Today it is number 1 [google.com] on a search for 'two kinds of order' out of over 2 million, and a search for John Marks [google.com] brings the page up in 5th position, despite there being lots of other John Marks's on the net.

    That's what I call fast (and relevant).
  • by joshv (13017) on Wednesday March 13, 2002 @05:59PM (#3159474)
    The point of the original article was hidden in the last few paragraphs. He was making a point about various governments' attempts at universal surveillance, i.e. attempting to log all packet traffic, etc.

    His discussion of web search techniques was to illustrate the nature of the problem these would-be omniscients face. Because the data they collect does not have the richly linked nature of web content, all that these government entities will be left with is mountains of meaningless data. They will be stuck using AltaVista-like searching and matching techniques.

    And we all know how useful AltaVista is these days.

    -josh

  • by nagora (177841) on Wednesday March 13, 2002 @06:03PM (#3159488)
    Macromedia has managed to get to the point where many ads are Flash. These are REALLY shit and annoying while trying to read a page. The result is that I've removed Flash from my browser and, you know what? I don't miss it. This is the reason I think they want the Web to be "all Flash" - if the only people that use it are the page-spammers then everyone else can switch it off and actually have a better experience of the Web.

    So, I wonder if Flash might implode on the basis of their success in the ad market coupled with all the problems of using Flash to generate your pages, plus the simple fact that almost no Flash site actually delivers anything that's still interesting after the first visit so who'd miss it?

    TWW

  • by YetAnotherLogin (534226) on Wednesday March 13, 2002 @06:29PM (#3159622)
    what's [google.com] your [google.com] point? [google.com]
  • by Cryogenes (324121) on Wednesday March 13, 2002 @06:44PM (#3159723)
    Searching on Google for
    two kinds of order
    yields, as you say, a bit over 2 million results, but your site is not number one. It is not even in the first thirty (I did not bother to look any further). Your site is, indeed, number one on a search for the phrase (note the quotations)
    "two kinds of order"
    However, that is number one out of 185, which is a lot less impressive than the 2 million which you claim.

    Do you believe in death after life?

  • by scotty (5588) on Wednesday March 13, 2002 @06:45PM (#3159730) Homepage
    Normally Google has a 4-week spider/update cycle: sites only receive a deep crawl every 4 weeks, and Google only updates its index/cached content every 4 weeks. However, since late last year, Google has been indexing the index page of sites with high PageRank *daily*, and you will see a date next to the search result for the sites that have been indexed more frequently. For example, a search for slashdot [google.com] on Google revealed that the index page was updated on the 13th of March.

    Consequently, if there is a link to a new page appearing on the index page, and you happen to have (very) high PageRank, the new page might get indexed as well, outside the 4-week frame.
  • by tigris (192178) on Wednesday March 13, 2002 @07:13PM (#3159911)
    Eh? When I cut and paste

    +link:samizdat.com -site:samizdat.com

    at Google, I only get 4 results.
  • Re:Wow (Score:2, Informative)

    by eeek (83889) on Monday March 18, 2002 @03:55PM (#3182494) Homepage
    One thing makes Google my preferred search engine now: it gives me relevant results. AltaVista appears to favor paid submissions over the actual search terms and returns mostly crap on my searches.
