The Internet

Google's Weakness, AltaVista's Strength 326

Cory Doctorow has an article on oreillynet called "How I Learned to Stop Worrying and Love the Panopticon," which begins "How much ass does Google kick? All of it." (We linked to it a few days ago.) Reader Richard Seltzer writes with a reaction to Doctorow's article, below. Your results may vary, but this kind of skepticism can only make the competing search engines better.

Some people love the results they get at Google; others are often disappointed. To a large extent, both the pluses and the minuses derive from Google's ranking system, which (as the folks at Google explain at www.google.com/technology) depends largely on the number of links to a particular page, the relevance of the content on those linking pages to the content on the target page, and the quality of the pages doing the linking.
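To make the link-based idea concrete, here is a minimal sketch of a simplified, PageRank-style calculation. It is an illustration only: Google's production ranking is not public, the relevance-of-content factor is left out entirely, and the function name `rank_pages`, the damping value 0.85, and the toy site names are assumptions made for this example.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages from a dict mapping each page to the pages it links to.

    Simplified illustration: a page's score is shared equally among the
    pages it links to, so a link from a high-scoring page is worth more.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical toy web: nobody links to "new-site" yet.
toy_web = {
    "old-site": ["hub"],
    "hub": ["old-site"],
    "blog": ["old-site", "hub"],
    "new-site": ["old-site"],
}
scores = rank_pages(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy web, "new-site" has no inbound links, so it is left with only the baseline score, while "old-site" and "hub" reinforce each other and rise to the top.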

Thanks to that complex and brilliant system, over time, the best pages often rise to the top of search lists. But that takes time -- a lot of time.

It works great for old, established sites to which many other old, established sites have linked. (It works great for my site :-) www.samizdat.com ). But new sites, regardless of the quality of their content, get short shrift. It takes 2-3 months for the new pages to get into the Google index. Then it takes time -- perhaps years -- for other "important" sites to discover the new site and link to it; and then months more for the new versions of those pages with those new links to get into the Google index.

So if I'm looking for content that is likely to have been on the Internet for a year or more, Google is great. But if I'm looking for fresh content, I'll go elsewhere.

For me, for years "elsewhere" meant AltaVista -- for two reasons. AltaVista used to add new pages to its index, for free, within two days of submission, while other search engines typically took weeks or even months. That meant they had the freshest content. In addition, AltaVista provided you with a set of very precise commands that couldn't be matched anywhere else.

Over the last year, as AltaVista has struggled to become profitable, they have destroyed their beautiful free submission process, trying to force Web sites to pay for submission. Free submissions (which typically come from the kinds of content-rich sites that I'm interested in) now seem to take three months or more -- no better than the other search engines and often worse.

Fortunately, the powerful commands remain -- for instance, the ability to exclude as well as include terms in your query. AltaVista lets you use minus signs and plus signs to indicate what you really don't want and what you do want. And for some specialized searches the exclusion is essential.

For instance, say you want to know what Web pages outside of your own site have links to your pages. At Google, I can do a search for link:samizdat.com or get the same results by going to their "Advanced" search and using their "page specific search" to find pages that link to a particular page. But my results are then littered with pages from my own site -- information I don't need and don't want. At AltaVista, I can search for +link:samizdat.com -host:samizdat.com and get exactly what I want -- finding out who thinks enough of my pages to have linked to me without my having contacted them: a valuable list of well-wishers and potential partners.
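The effect of combining +link: with -host: can be sketched as a simple filter over a list of linking pages. This is only an illustration of the idea, not how AltaVista implements it: the function name `external_backlinks` and the sample URLs are hypothetical, and treating subdomains such as www.samizdat.com as part of the excluded host is an assumption.

```python
from urllib.parse import urlparse


def external_backlinks(linking_urls, own_host):
    """Keep linking pages whose host is neither own_host nor a subdomain of it."""
    own_host = own_host.lower()
    kept = []
    for url in linking_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == own_host or host.endswith("." + own_host):
            continue  # a page on my own site: not an outside well-wisher
        kept.append(url)
    return kept


# Hypothetical results of a plain link: search, littered with my own pages.
backlinks = [
    "http://www.samizdat.com/index.html",
    "http://samizdat.com/consult.html",
    "http://example.org/links.html",
    "http://notsamizdat.com/page.html",
]
external = external_backlinks(backlinks, "samizdat.com")
print(external)
```

Matching on the parsed host rather than on a raw substring keeps a different domain such as notsamizdat.com from being excluded by accident.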

Similarly, Google lets me restrict a search to a particular Web site. For instance, if I include in my query the term site:samizdat.com, or in Advanced search under Domains I choose to restrict the search to that domain, then yes, I get results only from that site. But to use that command, I need to have additional query terms: site:samizdat.com alone generates no results.

At AltaVista, however, I can search for host:samizdat.com and get a complete list of all the pages at my site that are in the AltaVista index. Or I can search for url:samizdat.com/isyn and get a list of all the pages in that directory at my site that are in the AltaVista index. Or I can search for url:samizdat.com/consult.html to see if that particular page is in the index.
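As a rough sketch of what a url: query does from the user's side, the snippet below filters a small list of indexed pages by URL prefix. The exact matching rules AltaVista uses are not described here, so prefix matching after dropping the scheme and a leading "www." is an assumption, and `url_query`, `strip_scheme`, and the sample index entries are hypothetical.

```python
def strip_scheme(url):
    """Drop a leading http:// or https:// and a leading www. so prefixes compare cleanly."""
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
            break
    if url.startswith("www."):
        url = url[len("www."):]
    return url


def url_query(index, prefix):
    """Return indexed URLs whose address starts with the given prefix."""
    return [url for url in index if strip_scheme(url).startswith(prefix)]


# Hypothetical index contents.
index = [
    "http://www.samizdat.com/index.html",
    "http://www.samizdat.com/isyn/intro.html",
    "http://www.samizdat.com/isyn/search.html",
    "http://www.samizdat.com/consult.html",
    "http://example.org/samizdat.com/isyn/copy.html",
]
whole_site = url_query(index, "samizdat.com")
one_directory = url_query(index, "samizdat.com/isyn")
one_page = url_query(index, "samizdat.com/consult.html")
print(len(whole_site), len(one_directory), len(one_page))
```

The same filter answers all three questions from the paragraph above: what is indexed for the whole site, for one directory, and for one specific page.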

In other words, AltaVista provides a higher level of precision and the ability to get information that is particularly valuable to people in charge of Web sites and Web-based marketing projects. And if they'd just fix their free submission process and provide the service they used to, they'd kick Google's ass for searches for current information.

P.S. -- The folks at Google are very proud that their system defies human tampering. In fact, what they've done is encouraged the development of bizarre business models structured to take advantage of their link-based ranking system. For instance, Webseed Publishing now has over 1000 sites, all with different domain names. These content-rich sites are each run by different dedicated individuals. (I'm one of them :-) In many cases, the content deserves high rankings for its quality. You might wonder why the umbrella business for all these sites bothers to maintain over a thousand different domain names, when it would be far simpler and cheaper to have them as directories under a single domain. But because the domains are different, the many thousands of links these sites have to one another all count toward the automated calculation of their popularity and quality at Google, giving them all a boost in the rankings and hence bringing Webseed more traffic and hence more revenue.
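The incentive described in the P.S. can be sketched with a toy link counter. It takes the premise above at face value, namely that only links between different hosts count toward popularity; whether Google actually discounts same-site links this way is not confirmed here. The function name `count_cross_host_links` and the `.example` hosts are hypothetical.

```python
from itertools import permutations
from urllib.parse import urlparse


def count_cross_host_links(links):
    """Count inbound links per target host, ignoring links within a single host."""
    counts = {}
    for source, target in links:
        src_host = urlparse(source).hostname
        dst_host = urlparse(target).hostname
        if src_host != dst_host:
            counts[dst_host] = counts.get(dst_host, 0) + 1
    return counts


sections = ["a", "b", "c"]

# Same content as directories under one domain: every link is internal.
single_domain = [
    (f"http://umbrella.example/{s}/", f"http://umbrella.example/{t}/")
    for s, t in permutations(sections, 2)
]

# Same content split across separate domains: every link is external.
separate_domains = [
    (f"http://{s}.example/", f"http://{t}.example/")
    for s, t in permutations(sections, 2)
]

single_counts = count_cross_host_links(single_domain)
separate_counts = count_cross_host_links(separate_domains)
print(single_counts)
print(separate_counts)
```

With three sections the single-domain layout earns no countable links, while the split layout gives each domain two; with over a thousand domains the same effect is correspondingly larger.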

P.P.S. -- AltaVista appears to be making a comeback. Six years ago, when I was in the Internet Business Group at Digital and Digital owned AltaVista, about a third of the traffic to my Web site came by way of AltaVista. Whenever AltaVista had a glitch, I saw it immediately in my traffic stats. In fact, I sometimes was able to alert the engineers at AltaVista about problems before they had noticed them themselves. Over the years, due to increased competition from other search engines and also due to the business folks at AltaVista making bad decisions and jettisoning great capabilities/services (like 2-day free submissions, their affiliate program, LiveTopics, and newsgroup search), the number of people finding my pages by way of AltaVista plummeted. By January 2002, only 1% of my traffic was coming by way of AltaVista, despite the fact that as a long-standing fan and also as co-author of the book The AltaVista Search Revolution, I had lots of information about AltaVista at my site. I was actually getting twice as much traffic from the International Atomic Energy Agency (part of the UN), when I had no information at all related to atomic energy. But in recent weeks the traffic from AltaVista has climbed sharply. It now amounts to 6% of my total. I wish I knew why that was happening. In any case, I hope that trend continues.

This discussion has been archived. No new comments can be posted.

  • Wow (Score:3, Interesting)

    by Anonymous Coward on Wednesday March 13, 2002 @03:39PM (#3158900)
    I'm amazed, an entire article astroturfing AltaVista. Sadly, the author is a bit short-sighted, and doesn't realize how quickly stuff appears in Google's cache (often within weeks, less than a month), or that even if something accidentally ranked lower because of the number of links a given page receives, it still ends up in the first page or two anyway. *Sigh*
    • Re:Wow (Score:4, Interesting)

      by neuroticia ( 557805 ) <neuroticia AT yahoo DOT com> on Wednesday March 13, 2002 @05:00PM (#3159478) Journal
      Agreed. New sites that I've posted have been up within a week or two, and new content to already indexed sites usually shows up within search results anywhere from a day to a few weeks later.

      If the site is unique to its topic then it will appear higher in the rankings immediately as opposed to *yet another PHP site* which might never climb higher than number 80,991. This is not necessarily harmful to the surfers though the owners of the site will not be pleased.

      If it's taking your sites a long time to show up in the rankings then chances are it's not a Google problem so much as, well... Is your site really that unique after all? Are you using the same search terms that the average user looking for your site is going to use? If you're a shoe store in Massachusetts your customers wouldn't find you by searching for shoes - they'd find you by searching for "Shoes" and "MA".

      I'm always finding new content with Google, but I never use it to find up-to-the-minute stuff. I never use *any* search engine to find that. I ask myself what it is I want to know and go to a news site related to that item. Chances are that NO ONE has it indexed yet. Not Google, not Altavista.

      Isn't that what everyone does?

      -Sara
    • Re:Wow (Score:3, Interesting)

      by Anonymous Coward
      I take it Google must actively search some pages, such as well known news pages.

      Some who are more familiar with eastern and Australian news might know that a few days ago a young Melbourne couple were detained in China, and sent back to Australia for unfolding a banner. The story isn't important here, what is, is that I went to school with the girl, actually that's not important either, what is important is that I plugged Emma Dodrell (the young lass's name) into google _that night_, less than 12 hours on, and got 4 related articles from news sites around the world.

      Somewhere the gears are churning.
  • long after banner ads had come to altavista, you could avoid them easily by using its text-only mode.
    powerful commands and no ads... what a concept!

    i only switched to google after altavista finally got rid of their text-only page.
  • What I don't find on one, I look for on the other. If I can't find what I want on either, I change my criteria. And so on until I either find what I want, something close to what I want, or fall asleep trying.
  • by Anonymous Coward on Wednesday March 13, 2002 @03:45PM (#3158936)
    Whenever Google results have been disappointing, I hop over to AltaVista and search there.

    For me, Google doesn't have to be the perfect search engine - it's already enough. I type in google.com and it loads damn near instantly. There's no annoying advertisements, and I can search in h4x0r or Sveedish Chef, bork bork bork.

    If I can't find what I want on Google, fine, I'll use another engine. And what's wrong with that? We honestly can't have too many search engines (Well, business problems aside), because each one ends up with different ranking systems, different data pulled up from queries, etc.
    • by basso ( 230632 ) on Wednesday March 13, 2002 @04:45PM (#3159393)
      I type in google.com and it loads damn near instantly.

      The Google folks were at a local user group meeting a few months ago. They told us that they have byte counters -- the human kind -- monitoring how many bytes each page served takes. Their mission is to keep the count down.

      They got very noisy applause for that statement.
  • So waitaminute ... (Score:5, Insightful)

    by SirSlud ( 67381 ) on Wednesday March 13, 2002 @03:45PM (#3158938) Homepage
    .. are you suggesting that different goals require different tools, possibly made by different companies? Don't let the OS market know this, or we will kill the thriving flamebait OS war scene. :(

    Actually, there is lots of good information you provide on the capabilities of search engines. I, for one, would love to see more "A is good for this, B is good for this", instead of simply grouping and competing A & B, suggesting that one can only use one.

    IMHO, this is where (free) web services really rule - I can't buy 5 different cars for 5 different reasons I use cars, but in the case of these types of services, the cost of using and switching between these services is very next-to-nil. Hopefully, web services will start encouraging companies to share again, as Google and Altavista may very well demonstrate that sharing market segments with other players makes everyone happier in the long run.
    • I for one have not used Altavista since google came out. I do lots of research, on many different topics (I used to do debate). And I have never had any reason to go back to Altavista. Indeed, this article has encouraged me to try out Altavista again, but I would have liked the author to show exactly in what respects Altavista is better than Google.
    • A is best for... (Score:2, Informative)

      by bmooney28 ( 537716 )
      In my experience, Google websearch is best for specific web searches... Dmoz.org directory is best for broad Directory style searches, where you know the broad category that your search fits into, and you wish to find several sites that have this topic in common. (Yahoo, prior to advertisement bombardments held first place in this category) Google websearch is also among the best for file searches... try including "index of" (with quotes) in a search for a specific file.. (example: "index of" passwords.doc for interesting results) Google websearch is best for up to date news story searches... (try including "news" in the search query.) Limewire is best for music and video searches, both general and specific. Overall, Google is best for nearly all searches, in my opinion... and is usually more effective than using search boxes on specific websites...
  • by baptiste ( 256004 ) <mike@bapt[ ]e.us ['ist' in gap]> on Wednesday March 13, 2002 @03:48PM (#3158961) Homepage Journal
    Seems a bit extreme to me. My sites have shown up in Google fairly quickly AND I've found that Google tends to index the most - grabbing new stuff faster than the others.

    Now it took months to get into DMOZ, but we did. Yahoo - still hasn't accepted us into our proper category even after 2 or 3 tries over a year and a half.

    I think Google could benefit by adding some more advanced filtering commands like Altavista has - I agree they are nice. But the bottom line is, for obscure sites, once you get in Google, look out. Months later we finally got into the other mainstream search indexes (we submitted to them all at the same time) and in the end Google is THE place for referrals. By orders of magnitude. YMMV, but it seems the other search indexes blew it when they killed free submits since folks know that they will only return paid sites (plus rank skewing, for $$$, etc)

    Only time will tell, but I use Google daily and am happy with the results and performance - no other search engine comes close IMHO

    • For certain types of links, Google can take as little as a couple of days. I know that I can find articles (from many different news sites, who allow individual articles to be spidered) usually within 48 hours. Sometimes within 24 hours. That's wonderful for me, if I can remember a news article I've seen recently and wish to reference, but can't remember the site I saw it on.

      As for new sites, it's been taking a week or so recently. Usually if I don't see it in a week, I head over to their add URL page and submit it.

      Talking about Google only using ratings (via number of links) is simplistic. Their index/search algorithms are obviously much more complex than that, and appear to utilize a wide range of methods beyond simply rating.
      • by scotty ( 5588 )
        Normally Google has this 4 weeks spider/update cycle - sites will only receive a deep crawl every 4 weeks, and Google will only update its index/cached content every 4 weeks. However, since late last year, Google has been indexing the index page of the sites with high PageRank *daily*, and you will see a date next to the search result for the sites that have been indexed more frequently. For example, search for slashdot [google.com] on Google revealed that the index page has been updated on the 13th of March.

        Consequently, if there is a link to a new page appearing on the index page, and you happen to have (very) high PageRank, the new page might get indexed as well, outside the 4-week frame.
    • by blamanj ( 253811 ) on Wednesday March 13, 2002 @04:30PM (#3159298)
      Yes, he's clearly wrong here, as the practitioners of Google bombing [corante.com] have noted. It can take only a few days to have an effect.

      I suspect this is due to a more frequent crawl at sites Google considers interesting, so if you put up a site and no one hears about it for a while, it could take longer, but in general they're quite responsive.
    • While reading the article, I was pretty sure such a delay was simply not true. So I tried "bush nuclear" as keywords (a current hot topic). Guess what ?
      The first 6 links were less than 12 hours old. So maybe "Months" should be understood as "Hours".
      I also used to use Altavista.
      A very long time ago...
  • by kb3edk ( 463011 ) on Wednesday March 13, 2002 @03:49PM (#3158975)
    Altavista used to be my search engine of choice, but I gradually abandoned it around 3-4 years ago - shortly after it was spun off from DEC I noticed a general decline in quality.

    The one thing I've noticed about these "flaws" in Google "exposed" on /. today is that they are being done in an organized fashion by intelligent (and somewhat witty) people. I agree that there is significant potential for Google-bombing to be exploited for commercial gain in the coming years. But I don't think it can be nearly as bad as some of the awful stuff that's done with meta tags. I'm sticking with Google (for now) because it is still lightning fast and doesn't put a bunch of crap up on my screen.
  • by edrugtrader ( 442064 ) on Wednesday March 13, 2002 @03:50PM (#3158981) Homepage
    i wish more sites would develop tool bars similar to google... it is extremely convenient.

    on all my windows boxes it is one of the first things i install.

    google is probably the best search tool right now, and they make using it a breeze. altavista used to be the best search tool, but they made it harder and harder to use, and then the search tool lost its top spot. totally different situation. if google loses its top spot in the search tool field, i'll still use it for its ease of use.
    • You can add your own toolbars for any search engine. I have several samples for Mozilla [mozilla.org] on my webpage [barsoom.org]. I also include a very brief description on how to add other search engines, and/or add them to IE.

      -- Agthorr

    • Not only that... (Score:5, Informative)

      by SlashChick ( 544252 ) <erica@eric[ ]iz ['a.b' in gap]> on Wednesday March 13, 2002 @04:13PM (#3159165) Homepage Journal
      ...but you can also make Google pop up when you click the "Search" button in IE. [google.com] This makes Google searching even easier since you can have the search window open on the left and hit your search results on the right. (Yay for "tabbed browsing", IE style.)

      Also, the coolest feature of the Google toolbar IMHO is not even the instant search, but the "Highlight" button. Gone forever is hitting Ctrl-F and typing in a search term. Just search for something in Google, go to a result, and hit "Highlight" -- the search terms are instantly highlighted. This saves me an incredible amount of time when I'm searching through, say, mailing list archives.

      The Google toolbar is one of the biggest reasons I use IE. (Well, that and the fact that page developers, including myself, follow the rule of thumb "Design so that it looks good in IE and works in Netscape.") But anyway, I digress. If you're using IE, check out toolbar.google.com [google.com] and download it.
      • For easy access to search engines try the page in my sig... I have it set as my home page so I just hit Alt+Home and then type in a search term.

        I still use the Google toolbar for searching within a page though. Somehow, the toolbar picks up the search terms when the search page is loaded.

        Another cool feature is that you don't even have to perform a search to be able to use the highlight and search features. Just type the strings into the Toolbar and the search buttons appear right away. You can then search within the current page.

        Finally, you can press Alt+G to bring the cursor into the toolbar. Pressing Alt+Enter instead of Enter in the toolbar is the equivalent of the Feeling Lucky feature.

      • Re:Not only that... (Score:3, Interesting)

        by yota ( 165006 )
        The Google toolbar is one of the biggest reasons I use IE. (Well, that and the fact that page developers, including myself, follow the rule of thumb "Design so that it looks good in IE and works in Netscape.") But anyway, I digress. If you're using IE, check out toolbar.google.com [google.com] and download it.
        There's an implementation of the Googlebar for Mozilla too and it works nicely. It's not as cool as the original one but it's improving quickly. You can find it here: http://googlebar.mozdev.org [mozdev.org]

        Andrea

      • AOL could be switching from IE to Mozilla [newsforge.com] - hadn't you better rethink?
    • Opera has Google search as the default search in the toolbar right next to the address bar. Out of all the search toolbar things, Opera's is the only one I could bring myself to use.
    • check out lsdie : unformed.hypermar.net

      (i wrote it)

      it's for ie, it installs a little bar like the IE Search bar but a lot smaller, and no ads, and has support for selecting among about 20 different search engines, easily selectable in a convenient interface....
    • Try The Googlizer [linux.org.uk] - an X-based cut-paste "example" which is hugely useful.

      It takes the current clipboard, and opens a Google search for it. Small tweaking (it's only a short amount of code) would, I'm sure, make it work for other sites. Telsa claims it as working with Gnome, but I use IceWM, and it works great. A real boon, and no wasted space in my netscape toolbar.

  • Faster and Faster (Score:5, Insightful)

    by erasmus_ ( 119185 ) on Wednesday March 13, 2002 @03:50PM (#3158989)
    Perhaps AltaVista is indeed better (or used to be, as the author points out) at indexing new content, but I'd never know, as I have been using Google exclusively almost since its public debut. However, I think that this point will become less and less important.

    Yes, it's true that Google's algorithm prevents new content from being ranked high, because no one has linked to it yet necessarily, but that's by design - it is indeed at that point unproven in terms of quality. However, the spidering process can use improvement so that when many many people link to this new site just a few days later, it now ranks higher.

    Google specifically mentions (in previous interviews I read with employees) that they're always working on updating the speed, as well as the precision. The longterm goal is to significantly decrease the amount of time it takes to respider everything, and therefore make the info more relevant faster. I trust that they will continue to improve, and eventually this differentiation between "Altavista is better for new stuff, Google for old" will go away completely.
  • I love AltaVista (Score:4, Interesting)

    by Anonymous Coward on Wednesday March 13, 2002 @03:51PM (#3158992)
    Boolean mumbo-jumbo? That's the best PART of AltaVista. Google limits queries to 10 words? That stinks! Google is great for simple queries about common subjects. AltaVista's boolean query is great for finding that site whose link you can't remember but you remember some of the words that were close together. AltaVista's boolean query is great for finding information on little-known subjects where you can pretty well guess what keywords will be near each other. I used to use AltaVista's boolean query exclusively. Now, I find it's best to try both AltaVista and Google. Each finds content the other won't.
  • by AdamBa ( 64128 ) on Wednesday March 13, 2002 @03:53PM (#3159015) Homepage
    It is amazing how much lameness people put up with from search engines right now. It's one of those things where people will look back in ten years and be amazed. Think of all the fiddling around you do with search terms to try to find what you want...gak! Search engines need to figure out what a page is actually about -- only then will they be reasonable.

    Of course you can find things with search engines now. Google's "trick" of counting links helps a little bit for a particular class of query, which is when you know the name of an organization and you want to find its site...it works well because more people will link to the site as opposed to other sites that discuss it. But as I have written elsewhere, if AltaVista is 99% lame, then maybe Google is only 97% lame...which is three times better, but still terrible if you take a step back.

    Now Google is doing a lot of good things outside from its basic search engine, which should be applauded. The caches, saving old Usenet posts, the image and catalog searches, etc. are all good things -- but they don't affect its basic ability to search well.

    Further karma ho' expounding can be found right here [osopinion.com].

    - adam

  • by graveytrain ( 218936 ) <lynn@x.hjsoft.com> on Wednesday March 13, 2002 @03:55PM (#3159034) Homepage
    After reading this article, tons of /.'ers are now hitting altavista and doing a

    +link:mysite.com -host:mysite.com

    to see how many people have linked to them :) (myself included) :)
  • by ACK!! ( 10229 ) on Wednesday March 13, 2002 @03:56PM (#3159036) Journal
    Sure I go to AltaVista and others after hitting a brick wall with Google but that is very rare for me. Perhaps the issue is when I do searches I am looking for info on technical issues usually revolving around compiling this or that GNU package or Service.

    No tool is the best tool for every purpose and perhaps many people should give other search engines a try and see the strengths.

    However, I don't really see that point of an article that is simply a Hoorah for one service over another with differing models of profit and aims.

    The author had simply pointed out that AltaVista as opposed to other search engines has advanced searching abilities including the ability to exclude terms. No, it has to be an AltaVista over Google article.

    Different tools for different times and different uses.

  • luck (Score:5, Funny)

    by Deanasc ( 201050 ) on Wednesday March 13, 2002 @03:56PM (#3159046) Homepage Journal
    If it wasn't for the "I'm feeling lucky" button then some days I'd have no luck at all.
    • Re:luck (Score:3, Funny)

      by tswinzig ( 210999 )
      If it wasn't for the "I'm feeling lucky" button then some days I'd have no luck at all.

      Then technically you're breaking Google's TOS. You are supposed to be feeling lucky BEFORE you press the button.
  • by Brecker ( 66870 ) on Wednesday March 13, 2002 @04:01PM (#3159080)
    Google has exclusions, site and link queries too.

    See http://www.google.com/help/refinesearch.html
  • 5 years ago Altavista was my search engine of choice, both for my own searches and as the number 1 engine for getting my clients' websites ranked in.

    Back then you could submit to Altavista, and have a good ranking within a week.
    Over time, the relevance of the returned results dropped dramatically and the time to get a site listed ballooned, quite often taking longer than Yahoo!

    Then Google came along and I haven't looked back since. I've consistently been able to find the results I'm after thanks to the way Google indexes sites.
    I'm now able to almost guarantee clients that their sites, whether old sites that are being revamped or new sites that are freshly hatched, will be ranked well within Google and also ranked within a short period of time. I think the longest I've ever had to wait for a site to be fully indexed is three months.

    Plus the indexing of database generated pages and PDF documents by Google is a life saver. Without this feature a lot of the content I develop would be lost.

    I think it will take a miracle to get Altavista back on track. I wish it was as great as it once was, but for now it's relegated to one of the less important engines both from a searching and a submitting point of view.
  • by tiltowait ( 306189 ) on Wednesday March 13, 2002 @04:02PM (#3159082) Homepage Journal
    ... because it is so good.

    I'm a librarian. It is the most difficult time in history to do library research. There are hundreds of overlapping commercial databases out there, each with their own coverage, interface, and search engines.

    Students used to locating information with Google are appalled at the steps it takes to locate a scholarly journal. You need to browse a list of subject databases, search them, then locate a printed copy of the journal via our catalog (a growing but still small percent of journals are available online).

    Someday searching the various literary databases may be as easy as Google, but in the meantime there are drastic capitalist impediments to making it easy to do library research.

    ... so ask a Librarian if you ever need help ...
  • I've used google exclusively for the past 2 years. Never, not once, have I had to go to another search engine. 99% of the time, what I search for with google, I will find what I am looking for within the first page (10 results), very very often in the first 2 or 3 results.

    I have no need for altavista. I don't care if you use altavista. Google works just fine for me. If altavista works just fine for you, so be it. Use it. No one cares.

    All this speculation on the future of google recently is ludicrous. "google bombing" poses no threat. The people who work there are extremely talented. If it becomes a problem, they will undoubtedly fix it.

    Google is the most popular search engine in the world, and with good reason. They are not going to give that up.

    So will everyone please just sit down, shut up, and stop bickering. Use whatever tool works best for you.
    • this /. blind worship of google's "extreme talent" is quite amusing. sure, there are some clever people there, the thing grew out of Stanford from what i understand, but, there are clever people in all companies. what tends to stifle the clever peoples' talent is business case - i.e. is your stuff gonna give Return On Investment. whilst your vc is throwing money at you it is quite easy to seem like the coolest company (as far as i understand google is a private company and can say what they want about their finances, profitable or not).

      thing is, i'm sure there are some extremely talented people working at inktomi, altavista, etc. but, those companies have been around long enough to have to 'fess up to the accountants and justify the work they do.

      google, i think, is just hitting that stage - the google competition, whilst being an ingenious idea to most of you guys, suggests to me (cynical engineer type that i am) that they have run stone dry of ideas...
  • Here's a very relavant /. example:

    The other day /. posted that netscape 6 is supposedly spyware [slashdot.org]. one poster replied that
    He was going to screw up the spyware system by searching google for "CROSSDRESSING MONKEY PORNO" a bunch. I replied with a physical link for search google for this [slashdot.org]. Sometime later an anonymous coward posted that the /. article had become the #1 result for those search words. It has since fallen back to the original results, but it shows that google can be tampered with using lots of hits.

    These posts on /. today can argue all they want, but IMO Google's results are qualitatively more relevant than altavista's. So if this is going to be a problem, we haven't seen it yet.

    -Sean
  • by Anonymous Coward
    Sitting right here [google.com] is how to get links that refer to your page. If you bothered to read, it clearly states that site: is a modifier, ie. needs more input to work. Once again, a pointless argument because someone couldn't do a little research. It took me all of five seconds to get this info.
  • I don't mean to be a reactionary or anything, and I could be totally misreading this...but the author describes Webseed Publishing's business as very much the same kind of "Google Bombing" discussed earlier today.

    The way I'm interpreting that is abuse of Google's ranking system. It's an inherently dishonest business practice and I'm led to the conclusion that (Webseed Publishing && affiliates)==dicks.
  • Copernic (Score:2, Informative)

    by ZaneMcAuley ( 266747 )
    I use the Copernic 2001 Pro search client, so I get the best (and worst) features of them all :D
  • by SkyLeach ( 188871 ) on Wednesday March 13, 2002 @04:13PM (#3159170) Homepage
    I would like to see a program and specification that dictates a formal data format for information in a mathematical schema. This could be the foundation for a universal translator and certainly a decent means of doing a search engine.

    The idea is pretty simplistic, although the implementation is complex.

    Any communication takes place by translating an idea into a sensory input form.

    Examples: Sight (written language, video, sign-language), Touch (braille, texture), Sound (conversation, music), Taste (Like water for chocolate?), Smell (pheromones?).

    Obviously, not all of these mediums are easy to work with, but we can certainly start with written language.

    All languages use the same basic principle: convey relevant information about a central subject. How they go about doing it is different even between versions of the same language (British English vs. American English).

    If we described an objective hierarchy of physical objects described by pure mathematics and implanted them into a central, world-wide database then open-source parsers for each language could handle the task of translating any written text, in any supported language, into this common language. If correctly implemented a search engine could enter into a short dialogue with a person performing a search and then return information very specifically relevant to what the user was searching for.

    Example dialogue:
    [user]I want information on Mary Jane Carpenter.
    [google]There is a very famous person by that name. Her official website is [here]. [Here] is a list of fansites and [here] are some other sites which discuss her. That name is mentioned in [these] sites, but it is unclear if they are talking about the same person. [Here] is a list of other people with that name.
    [user]The person I am looking for isn't famous.
    [google]Then you are probably looking for one of [these] people.
    [user] Are any of those people from St. Louis?
    [google] [Here] is a site dedicated to a Mary Jane Carpenter from St. Louis.

    This may sound like an impossible stretch but it really isn't. The famous Mary Jane Carpenter has a unique id on her object and many thousands of attributes which uniquely distinguish it from any other Mary Jane Carpenters. Ambiguity is resolved by the same rules that govern conversation: context.

    If I have a page that contains no content other than Mary Jane Carpenter sucks! then a simple fuzzy logic routine should be able to infer that the Mary Jane Carpenter I am talking about is probably the famous one. Other clues could be gained from other parts of my site or other documents which have me as a source.

    I realize that I am talking about a HUGE database, but it sure would be handy...
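
    As a rough illustration of the "unique id plus attributes, disambiguated by context" idea above, here is a minimal Python sketch. Everything in it is hypothetical: the Entity record, the rank_candidates function and the sample attributes ("singer", "teacher") are invented for the example, and counting shared attributes stands in for the fuzzy logic the comment has in mind.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One record in the hypothetical world-wide object database."""
    uid: int
    name: str
    attributes: set = field(default_factory=set)


def rank_candidates(entities, name, context_words):
    """Return the entities called `name`, best context match first."""
    context = {w.lower() for w in context_words}
    matches = [e for e in entities if e.name == name]
    # Score = number of stored attributes that also appear in the context.
    return sorted(matches, key=lambda e: len(e.attributes & context), reverse=True)


# Invented sample data: two people sharing one name.
people = [
    Entity(1, "Mary Jane Carpenter", {"famous", "singer"}),
    Entity(2, "Mary Jane Carpenter", {"st. louis", "teacher"}),
]

best = rank_candidates(people, "Mary Jane Carpenter", ["St. Louis"])[0]
print(best.uid)
```

    Each turn of the example dialogue would add words to the context, so the ranking shifts as the user supplies more detail.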
    • Sounds good! Why don't you enter the Google programming contest? This idea would be interesting to implement, even if in testbed conditions. Incidentally, Teoma (or Vivisimo -- forget which) already offers something like this, though not on the scale you outline (they have "topic clusters").
  • by afidel ( 530433 ) on Wednesday March 13, 2002 @04:13PM (#3159171)
    And that is that I sometimes NEED to use the near keyword in altavista to get a complex search to work correctly. If google added the near keyword I would get rid of my quicklink bar entry for altavista's advanced search.

    p.s.
    the advanced search page is all text, not even a banner ad so it's almost faster than google to load.
  • by nstrom ( 152310 ) on Wednesday March 13, 2002 @04:13PM (#3159172)
    Similarly, Google lets me restrict a search to a particular Web site. For instance, if I include in my query the term site:samizdat.com or in Advanced search under Domains I choose to restrict the search to that domain, Yes, I get results only from that site. But to use that command, I need to have additional query terms: site:samizdat.com alone generates no results.

    You can use the following workaround to do a site: search on google without any keywords. Just do "site:yoursite.com -stuff" where stuff is gibberish (bang on the keyboard a bit). For example, this search [google.com] shows 1,290 pages from samizdat.com. On the other hand, an altavista search for that site shows 1,090 hits for pages on that site.

    I don't know why Google doesn't allow simultaneous "site:" and "link:" searching, as that is something many users would like to do.
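
    The workaround above can be scripted. This is only a sketch: the function name and the filler term are made up, and whether Google still accepts the trick is not something the code checks.

```python
from urllib.parse import urlencode


def site_only_query(domain, filler="qzxjvkw"):
    """Build a Google search URL that lists pages from `domain` alone.

    `filler` is an arbitrary gibberish term (an assumption of this sketch);
    excluding it supplies the extra query term that site: needs.
    """
    query = f"site:{domain} -{filler}"
    return "http://www.google.com/search?" + urlencode({"q": query})


print(site_only_query("samizdat.com"))
```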
    • Seems to be a new thing (only spotted it a week or so ago), but searching for http://slashdot.org/ gives info about that page - links to, about, similar, contain, and the Google cache.
  • by socratic method ( 15936 ) on Wednesday March 13, 2002 @04:13PM (#3159174)
    Holy shit! That guy got to plug his website NINE TIMES in an article. I can't imagine how much he had to pay for exposure like that. Next we'll be seeing ads like this:

    Features: ICMP echo requests are 37337!
    Posted by CmdrTaco on 03:35 PM -- Wednesday March 13 2002
    from the leet-nettools-impress-chicks dept.

    Hey, Slashdotters! I just found this 37337 tool called ping. You can use it to send an ICMP echo request to IBM.COM. You just type "ping ibm.com"...
    And it pings IBM.COM! Check it out:

    >ping ibm.com
    Pinging www.ibm.com [129.42.17.99] with 32 bytes of data:
    Reply from ibm.com : bytes 32 time 80ms TTL=128
    Reply from ibm.com : bytes 32 time 80ms TTL=128
    Reply from ibm.com : bytes 32 time 80ms TTL=128
    Reply from ibm.com : bytes 32 time 80ms TTL=128
    Reply from ibm.com : bytes 32 time 80ms TTL=128
    Reply from ibm.com : bytes 32 time 80ms TTL=128
    Reply from ibm.com : bytes 32 time 80ms TTL=128
    Reply from ibm.com : bytes 32 time 80ms TTL=128
    Reply from ibm.com : bytes 32 time 80ms TTL=128

    (Read more...)


    Seriously -- I'm sure more curious people clicked over to samizdat.com than clicked on any of the other ads on the screen (thinkgeek and ibm for me). Maybe there is something to text ads on community sites (ala kuro5hin)

    sm
  • by Anonymous Coward
    A lot of people ignore the single biggest innovation for quality results that Google did: default 'and' states for keywords. I worked at AltaVista for a year and tried to convince people that it was the way to go but no one would listen. When combined with their ranking technology [which is impressive but not infallible] it yields the best results.

    fun fact: I also tried to get a proposal started for AltaVista to acquire Google in the summer of '99. Aren't you glad I failed?
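
    To make the default-'and' point concrete, here is a toy Python comparison of AND and OR matching over three invented documents. It is not how either engine is implemented, just a sketch of why requiring every keyword narrows the result list.

```python
def search(docs, query, mode="and"):
    """Return ids of docs matching the query terms.

    mode="and": every term must appear in the document.
    mode="or":  any single term is enough.
    """
    terms = query.lower().split()
    combine = all if mode == "and" else any
    return [
        doc_id
        for doc_id, text in docs.items()
        if combine(term in text.lower().split() for term in terms)
    ]


# Invented sample documents.
docs = {
    1: "jazz musician obituaries",
    2: "jazz festival schedule",
    3: "classical musician biography",
}

print(search(docs, "jazz musician"))             # AND: only doc 1 has both words
print(search(docs, "jazz musician", mode="or"))  # OR: all three match
```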
  • by wytcld ( 179112 ) on Wednesday March 13, 2002 @04:18PM (#3159202) Homepage
    For a text-intensive site [jazzhouse.org] that's been around a few years, and that the search engines were informed of years ago, 4 of the top 10 most frequent visitors are Google bots. None of the 10 is from AltaVista. And Google searches send a lot more people our way too.

    Now I just don't see how AltaVista can give anyone more current results if their bots are featherbedding.
    ___

  • by Cheshyre ( 43113 ) on Wednesday March 13, 2002 @04:18PM (#3159204) Homepage
    At AltaVista, I can search for +link:samizdat.com -host:samizdat.com and get exactly what I want
    In Google, +link:samizdat.com -site:samizdat.com does the same thing.
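
    What both queries ask for is "pages that link to the domain but are not hosted on it". A small Python sketch of that filter over a list of (source, target) link pairs follows; the function name and the sample URLs on example.org and example.net are invented.

```python
from urllib.parse import urlparse


def external_backlinks(links, domain):
    """Return pages that link to `domain` but are not hosted on it.

    `links` is an iterable of (source_url, target_url) pairs.
    """
    def on_domain(url):
        host = urlparse(url).hostname or ""
        return host == domain or host.endswith("." + domain)

    return sorted({src for src, dst in links if on_domain(dst) and not on_domain(src)})


# Invented sample link data.
links = [
    ("http://www.samizdat.com/a.html", "http://www.samizdat.com/b.html"),
    ("http://example.org/reviews.html", "http://www.samizdat.com/"),
    ("http://example.net/misc.html", "http://example.org/"),
]

print(external_backlinks(links, "samizdat.com"))
```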
  • The mailto link "Richard Seltzer" is woefully malformed.
    "mailto:seltzer@samizdat.com or http://www.samizdat"

    Please fix it.

    When it is fixed, please don't fuck up my karma by marking this as redundant.

    I would consider subscribing if it would guarantee proper links and spellchecking.
  • Mr. Seltzer thinks that the shortcomings of Google are that it doesn't allow for more "powerful" or "expressive" queries like "link:samizdat.com" or "url:samizdat.com/isyn". The question is: how many people really use such queries? How many times have you (also not the typical user, but let's assume so) wanted to see who links to a particular site? Typically, someone who knows that site well or has already found it will look for such information. As far as I'm concerned, Google does a tremendous job of finding informative sites for me, quickly. Usually when I search, I have a keyword or two in mind, and start with that. Within a couple of clicks (or just 1, if "I'm feeling lucky") I'm on my way. Probably Mr. Seltzer is biased because he is ex-Digital or something, and was pleasantly surprised at the uptick in Altavista referrals to his sites.
  • by GeekLife.com ( 84577 ) on Wednesday March 13, 2002 @04:26PM (#3159266) Homepage
    What would you do if your best friend [google.com] cuddles up with your biggest enemy [microsoft.com]?

    It's alive. [google.com]
  • by gafferted ( 560272 ) on Wednesday March 13, 2002 @04:28PM (#3159283)
    In fact, I sometimes was able to alert the engineers at AltaVista about problems before they had noticed them themselves.

    Alas, they can no longer be reached. Their search engine is seriously broken. It picks on a site and hits it hard and repeatedly.

    They will make 100,000 requests on a site with only 20,000 static items within 24 hours. On our co-operative co-located server, we host around 80 sites, many of which are content rich. When Alta Vista chooses to visit just one of them, our total bandwidth usage jumps by an order of magnitude.

    We have been unable to get past their front line support, and I am not prepared to maintain robots.txt on all of our members' sites just to control their broken robot, so we had no alternative but to block their entire subnet at our firewall.

    If anyone has evidence that the AV robot is fixed, I'd be happy to let them back in.
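
    For anyone wondering what the per-site robots.txt alternative would look like, here is a sketch using Python's standard urllib.robotparser. The crawler name "ExampleBot" and the example.org URLs are placeholders, not the real AltaVista robot's details, and the rules only help if the robot honours them at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out one misbehaving crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks this question before every fetch.
print(parser.can_fetch("ExampleBot", "http://example.org/page.html"))
print(parser.can_fetch("OtherBot", "http://example.org/page.html"))
```

    With around 80 member sites, the same file would have to be copied to every site root, which is the maintenance burden the firewall block avoids.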

  • Learn your tools (Score:5, Insightful)

    by Snowfox ( 34467 ) <snowfox@nospaM.snowfox.net> on Wednesday March 13, 2002 @04:34PM (#3159316) Homepage
    Google is similar to a UNIX command-line tool. It's a well-defined and simple interface, and enough information is provided for you to use it effectively if you just do a little reading.

    Read the excellent information Google has provided about how the engine works, and use the engine with its inner-workings in mind. When you meet the machine half-way instead of trying to dumb it down for the user, you'll get a hell of a lot more done.

    In Google's case, taking half a minute to think about what you're looking for, then tossing in a few related bits of jargon or other words relevant to the context you're after does amazing things. With a little forethought, you can almost always find what you're after and be down to a page with nothing but relevant links with just an extra word or two added as filters.

  • by douglips ( 513461 ) on Wednesday March 13, 2002 @04:35PM (#3159324) Homepage Journal
    I had lots of information about AltaVista at my site. I was actually getting twice as much traffic from the International Atomic Energy Agency (part of the UN), when I had no information at all related to atomic energy.


    According to http://www.leekillough.com/robots.html [leekillough.com] - iaea.org is commonly used as a fake referrer by spam harvesters.

    [iaea.org is a] fake referrer that's often used -- [deny requests with that referrer] unless your pages are related in some way to atomic energy and could really be linked to from www.iaea.org
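
    The "deny requests with that referrer" advice could be sketched like this in Python. The function name is made up and the set holds only the one host named above; how the check gets wired into a real web server is left out.

```python
from urllib.parse import urlparse

# Referrer hosts treated as forged; the quoted advice names only iaea.org.
FAKE_REFERRER_HOSTS = {"iaea.org"}


def is_fake_referrer(referer_header):
    """True if the Referer header's host is, or is a subdomain of, a listed host."""
    host = (urlparse(referer_header).hostname or "").lower()
    return any(host == bad or host.endswith("." + bad) for bad in FAKE_REFERRER_HOSTS)


print(is_fake_referrer("http://www.iaea.org/"))
print(is_fake_referrer("http://www.google.com/search?q=altavista"))
```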
  • by epeus ( 84683 ) on Wednesday March 13, 2002 @04:37PM (#3159344) Homepage Journal
    The author is basing this on outdated information. Google knows to crawl sites that change frequently more often than those that don't. Here is a concrete example:

    I posted Two Kinds of Order by John Marks [demon.co.uk] on March 11th, and mentioned this to some colleagues who might be interested. I linked to it from a Weblog [blogspot.com] or two [blogspot.com], and Doc Searls [weblogs.com] did too.
    Today it is number 1 [google.com] on a search for 'two kinds of order' out of over 2 million, and a search for John Marks [google.com] brings the page up in 5th position, despite there being lots of other John Marks's on the net.

    That's what I call fast (and relevant)
    • Searching on Google for
      two kinds of order
      yields, as you say, a bit over 2 million results, but your site is not number one. It is not even in the first thirty (I did not bother to look any further). Your site is, indeed, number one on a search for the phrase (note the quotations)
      "two kinds of order"
      However, that is number one out of 185, which is a lot less impressive than the 2 million which you claim.
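
      The gap between the two result counts comes from the two kinds of match. A toy Python sketch, with two invented page texts and made-up function names:

```python
def contains_all_words(text, query):
    """Unquoted search: every word appears somewhere, in any order."""
    words = text.lower().split()
    return all(term in words for term in query.lower().split())


def contains_phrase(text, query):
    """Quoted search: the words appear consecutively and in order."""
    words = text.lower().split()
    terms = query.lower().split()
    return any(
        words[i:i + len(terms)] == terms
        for i in range(len(words) - len(terms) + 1)
    )


# Invented page texts.
page_a = "Two Kinds of Order by John Marks"
page_b = "in order to sort two kinds of fruit"

for page in (page_a, page_b):
    print(contains_all_words(page, "two kinds of order"),
          contains_phrase(page, "two kinds of order"))
```

      Both pages pass the unquoted test, but only the first contains the exact phrase, so the quoted search competes against a far smaller pool.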

      Do you believe in death after life?

      • I clicked on the link in my post above, and it was number one again. Maybe you're hitting a different google server than I am and the cache is out of date. Odd.
        I saw something like this when playing with Pocket GoogleWhacker - sometimes the score for a word would vary from time to time. Distributed systems are like that. I'm sure it will stabilise.
  • by JhAgA ( 24929 ) on Wednesday March 13, 2002 @04:40PM (#3159367)
    As it was over-explained, Google ranks pages according to how many links elsewhere point to that page.
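
    For reference, link-based ranking can be sketched as a textbook PageRank power iteration. This is a simplification for illustration, not Google's production system; the four-page web, the damping factor of 0.85 and the iteration count are assumptions of the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Every page that appears as a target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank evenly over all pages.
            targets = outlinks or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Invented four-page web.
web = {
    "old-hub": ["old-site"],
    "old-site": ["old-hub"],
    "fan-page": ["old-site"],
    "new-site": ["old-site"],  # links out, but nobody links to it yet
}

ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

    A page whose links cannot be read, such as one built entirely in Flash, would look like a page with no outlinks here: it could still receive rank but would pass none on.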

    Remember this post from Slashdot [slashdot.org] ? It is about Macromedia wanting Flash to be used to design the entirety of a site.

    So, I don't suppose Google can fetch the URLs inside a Flash file (correct me if I'm wrong), so if Macromedia's dream comes true, how would Google cope with it?

    BTW, how would any search engine deal with such a catastrophe? :D

    Cheers.
    • Macromedia has managed to get to the point where many ads are Flash. These are REALLY shit and annoying while trying to read a page. The result is that I've removed Flash from my browser and, you know what? I don't miss it. This is the reason I think they want the Web to be "all Flash" - if the only people that use it are the page-spammers then everyone else can switch it off and actually have a better experience of the Web.

      So, I wonder if Flash might implode on the basis of their success in the ad market coupled with all the problems of using Flash to generate your pages, plus the simple fact that almost no Flash site actually delivers anything that's still interesting after the first visit so who'd miss it?

      TWW

  • Google treats new sites as having low utility, but that doesn't mean that Google is out of luck on new content. Google knows that certain web sites, especially web logs (like Slashdot itself) and news sites are updated very frequently, and re-indexes them more often. Thus, if you're interested in current events, Google will tend to return results on current events from "reputable" sites. (I've been unable to find a reliable reference for this; you can check out this one from DaveNet [userland.com].)

    This doesn't help you out if you're trying to get your new business noticed, which is something site managers care about desperately. It also doesn't help you find the new business that appeared two weeks ago that might be able to help with your problem. Sadly, it's generally the same business owners who care about that case, too, since in general somebody has already beaten you to the punch with their web site and the customer gets the problem solved, without you.

    No, it's not perfect, but it solves the problems of web searchers very, very often. It may be less good for web site owners, but compared to the searchers they are in the minority.
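
    One way a crawler could revisit busy sites more often is to shrink the revisit interval whenever a page has changed and stretch it when it hasn't. The Python sketch below is a guess at the general idea only; the halving and doubling rule, the 1-day and 64-day limits and the function name are all assumptions, and nothing here is known to match what Google does.

```python
def next_interval(current_days, changed, min_days=1, max_days=64):
    """Halve the revisit interval if the page changed, otherwise double it."""
    proposed = current_days / 2 if changed else current_days * 2
    return max(min_days, min(max_days, proposed))


# A weblog that has changed on every visit drifts toward the minimum...
blog_interval = 16
for _ in range(5):
    blog_interval = next_interval(blog_interval, changed=True)

# ...while a static page drifts toward the maximum.
static_interval = 16
for _ in range(5):
    static_interval = next_interval(static_interval, changed=False)

print(blog_interval, static_interval)
```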
  • OK, so this is only marginally on topic, but I think the experimental result was interesting.


    The other day I played with the Google advertising generator, just to see how much an ad would cost and how it worked, not with any intention of advertising. (Check it out, it's fun.) Anyway, I pretended to be advertising a local special-interest club where I am a member. By the time I had picked the advertising keywords that gave me the ad traffic that I wanted, those very same words typed into the search box brought up the club's web site as the third link on page one.


    I would advertise why, exactly?

  • The problem with popularity-based systems such as Google and Freenet is that popular != good. For years, the most popular television show in America was Married with Children, a program of such ungodly awful lowest-common-denominator content that it frankly horrifies me to think that alien civilizations may someday receive those television signals and decide not to contact us. The books on the bestsellers lists are often the lowest grade of junk -- Danielle Steele and Sidney Sheldon, anyone? -- and, as is often lamented by Slashdot stories and posters, the most popular songs on the radio are also of dubious value. The same applies, mutatis mutandis, to the web.

    When rating systems -- including Slashdot's -- increase the visibility of what is already popular, they only serve to reinforce the status quo. What's "cool" stays cool, not even necessarily because the audience is that monotonically unimaginative, but because new and different things are filtered out. If, for example, Microsoft actually managed to produce a solid, reliable, inexpensive, and reasonably licensed piece of software, this is about the last place you'd hear about it. With Google's link-popularity system, websites presenting unpopular or dissenting views are much, much harder to find than knee-jerk me-too reactionary sites. This is no small issue, considering that the benefits of a free society are built, ultimately, upon dissent -- and the ability to spread dissent. This is no less true when the dissent is artistic than when it is political.

    Almost a decade ago, I used to laugh at the efforts of old-media companies to transform the web into another form of television. It's a lot harder to laugh now, as the chief gateways to the net for the vast majority of the population use sophisticated software to dumb down the net. Sure, the "other" stuff is still out there, but if you can't find it, it may as well not be.
  • by joshv ( 13017 ) on Wednesday March 13, 2002 @04:59PM (#3159474)
    The point of the original article was hidden in the last few paragraphs. He was making a point about various governments' attempts at universal surveillance, i.e. attempting to log all packet traffic, etc...

    His discussion of web search techniques was to illustrate the nature of the problem these would-be omniscients face. Because the data they collect does not have the richly linked nature of web content, all that these government entities will be left with is mountains of meaningless data. They will be stuck using AltaVista-like searching and matching techniques.

    And we all know how useful Altavista is these days.

    -josh

    • by Nerds ( 126684 )
      You're the other guy who actually reads the articles! Wow, it's like finding my identical hand twin.

  • So if I'm looking for content that is likely to have been on the Internet for a year or more, Google is great. But if I'm looking for fresh content, I'll go elsewhere.

    How do you explain this [google.com], then? It's a standard Google search for the terms 'Andrea', 'Yates' and 'verdict'. The top link is hardly a year old, but rather an extremely recent and relevant link [cnn.com] to CNN's site about the trial verdict.
  • I just did a quick search on Google for information on The Fightin' Whities (a basketball team in Colorado). The first link that came up is from a newspaper article on the team published two days ago. I can't think of any other search engine that would index something that quickly.
  • by child_of_mercy ( 168861 ) <johnboy@the- r i otact.com> on Wednesday March 13, 2002 @09:52PM (#3160733) Homepage
    the first time I saw googlebots in my access logs i nearly fell over, we hadn't asked anyone to index our site.

    but I have to admit to being very impressed, every month the googlebots come to visit, they don't disrupt the site (the National Library of Australia hit us with a denial of service attack called "Pandora" when they tried to suck down the entire site in one go, complete with recursing loops), and they rank us very highly (perhaps too highly, there are more authoritative sites in our region, we do more comment).

    anyway I suspect the author forgot that most users of search engines aren't website owners hoping to be indexed, but people doing searches.

    Sites that have been regularly updated for a couple of years tend to be a better source of information than those slapped up yesterday.

This is clearly another case of too many mad scientists, and not enough hunchbacks.
