<strong>Death of Google</strong><br />Jai, My Mind's Panorama. Published 2007-10-27, updated 2007-11-02.<br /><br /><div style="text-align: justify;">Unarguably, Google is the most phenomenal breakthrough to happen on the WWW scene. Tim <span class="blsp-spelling-error" id="SPELLING_ERROR_0">Berners</span>-Lee's invention could not have been this efficacious had Google not been founded on September 7<span class="blsp-spelling-error" id="SPELLING_ERROR_1">th</span>, 1998. Without Google, the web would just be a huge archipelago of resources, with no way of telling the significant ones from the insignificant ones. Undeniably, Google has become imperative for bringing out the intrinsic value of the web.<br /><br />What is so unique about Google that makes it the <span class="blsp-spelling-corrected" id="SPELLING_ERROR_2">entranceway</span> (sorry, Yahoo) to the web? It's not that Google was the first search engine to come into existence; there were others (a few still exist) like AltaVista and Yahoo, but none could reach where Google is today. What makes Google special is that it ranks search results using an ingenious algorithm called <a href="http://en.wikipedia.org/wiki/PageRank">PageRank</a>. PageRank attaches a numeric relevance to a resource (a web page), broadly based on two parameters:<br /></div><ol style="text-align: justify;"><li>The number of resources referring (hyperlinking) to the given resource.</li><br /><li>The relevance of the resources referring to the given resource.</li></ol><div style="text-align: justify;">The basic (and quite powerful) idea behind this algorithm is to find a middle path between the traditional approaches, removing their shortcomings while keeping their advantages. 
There were two predominant approaches in the <span class="blsp-spelling-error" id="SPELLING_ERROR_3">pre</span>-Google era:<br /></div><ol style="text-align: justify;"><li><span style="font-weight: bold;">Web Directory</span>: Yahoo started off as a <span style="font-style: italic;">Web Directory,</span> where flesh-and-blood humans <span style="font-weight: bold;">manually</span> index the web by listing the most relevant sites in various categories. To find information, one can either browse individual categories or search the indexed sites for pages containing a particular keyword. This approach scores high on relevance but scales poorly, so the rapid growth of the web rendered it unusable: the finished product (if it can ever be finished) is always incomplete.</li><br /><li><span style="font-weight: bold;">Relevance Based on Keyword Usage</span>: This approach to <span style="font-weight: bold;">automatic</span> indexing of the web sends software agents called web bots or spiders across the web and then determines the relevance of a web page for a particular keyword from the frequency and location (title/body) of that keyword in the page. This approach, though scalable, suffers from low relevance, as frequent use of a keyword does not guarantee relevance. Also, because relevance is assigned to a page solely on the basis of keyword usage, it is easy for a rogue webmaster to stuff a page with keywords and deceive the spiders into treating it as highly relevant, irrespective of its actual content, so that it shows up higher in the search results.<br /></li></ol><p style="text-align: justify;"><span style="font-weight: bold;">The Google Approach</span><br />The <span style="font-style: italic;">PageRank</span> approach used by Google is an elegant middle path between the two approaches given above. 
It resembles approach <span style="font-style: italic;">two</span> above in that it sends spiders across the web to index web pages automatically, and hence scales well. However, it does not rely on keyword usage in a page to determine its relevance; instead it exploits the link structure of the web, determining the popularity/relevance of a web page from the number and relevance of the other web pages referring to it.<br /><br />It is remarkable how similar this approach is to the <span style="font-style: italic;">Web Directory</span> approach, in the sense that both consider the involvement of flesh-and-blood human beings important in determining the relevance of a resource. But there is an important difference: while <span style="font-style: italic;">Web Directories</span> explicitly rank relevant web pages, <span style="font-style: italic;">PageRank </span>treats a reference (a hyperlink) as an implicit vote of relevance. The vote is also given a higher <span class="blsp-spelling-error" id="SPELLING_ERROR_4">weight</span> if it comes from a page that is itself highly relevant. This way it saves mankind from the impossible feat of manually ranking each web page.<br /><span style="font-weight: bold;"><br />Google and Web 2.0</span><br />Web 2.0 is here. The web as a network is being utilized more than it has ever been. Today everything is about user-generated content, to the extent that Time magazine named <span style="font-style: italic;">You</span> its Person of the Year for 2006. 
Be it <span class="blsp-spelling-error" id="SPELLING_ERROR_5"><a href="http://www.youtube.com/">YouTube</a></span>, <a href="http://www.blogger.com/">Blogger</a>, <span class="blsp-spelling-error" id="SPELLING_ERROR_6"><a href="http://www.wikipedia.org/">Wikipedia</a></span>, <span class="blsp-spelling-error" id="SPELLING_ERROR_7"><a href="http://www.myspace.com/">MySpace</a></span>, <a href="http://www.ebay.com/">eBay</a>, <span class="blsp-spelling-error" id="SPELLING_ERROR_8"><a href="http://www.flickr.com/">Flickr</a></span>, <a href="http://www.secondlife.com/">Second Life</a>, <a href="http://del.icio.us/">del.icio.us</a>, <a href="http://twitter.com/">Twitter</a> and so on, it's all about user-generated content.<br /><br />How does Google fit into this new form of the web, where user-generated content heavily outstrips all other forms of content? Is there still a need for Google? The first reason these questions arise is that most Web 2.0 applications have a search mechanism of their own: <span class="blsp-spelling-error" id="SPELLING_ERROR_9">YouTube</span>, <span class="blsp-spelling-error" id="SPELLING_ERROR_10">Flickr </span>and del.icio.us use tags attached to the content entered by their users to determine the type of content. Blogger, <span class="blsp-spelling-error" id="SPELLING_ERROR_11">Wikipedia</span>, <span class="blsp-spelling-error" id="SPELLING_ERROR_12">MySpace</span> and eBay have search engines of their own, as they are the sole owners of their content. 
The argument is simply that someone who wants to find a particular article in <span class="blsp-spelling-error" id="SPELLING_ERROR_13">Wikipedia</span> will prefer <span class="blsp-spelling-error" id="SPELLING_ERROR_14">Wikipedia's</span> built-in search engine to Google.<br /><br />The second and more powerful reason, which is the main topic of this post, is that <span class="blsp-spelling-error" id="SPELLING_ERROR_15">Google's</span> democratic model will have a hard time dealing with the authoritarian nature of the <span style="font-style: italic;">User Generated Content Service Providers</span>. One shocking example, which has gone largely unnoticed until now, relates to how <span class="blsp-spelling-error" id="SPELLING_ERROR_16">Wikipedia</span> works behind the scenes.<br /><br /><span style="font-weight: bold;"><span class="blsp-spelling-error" id="SPELLING_ERROR_17">Wikipedia</span> and Google</span><br /><span class="blsp-spelling-error" id="SPELLING_ERROR_18">Wikipedia</span> is another great thing to happen to the web after Google. For the uninitiated, <span class="blsp-spelling-error" id="SPELLING_ERROR_19">Wikipedia</span> is an online encyclopedia with over two million articles in the English version. Its power lies in the fact that all of its content is generated by users (non-experts included) and is continuously reviewed and commented upon by <span class="blsp-spelling-error" id="SPELLING_ERROR_20">Wikipedia</span> editors for the various factors that make an article great. This way <span class="blsp-spelling-error" id="SPELLING_ERROR_21">Wikipedia</span> can keep growing without limit, which is impossible for any encyclopedia authored by a limited set of experts, while still maintaining a more than reasonable standard of quality for its articles.<br /><br />So far, so good. From the last paragraph it looks as though <span class="blsp-spelling-error" id="SPELLING_ERROR_22">Wikipedia</span> is the epitome of democracy. 
Sadly, that's not true. Though the content on <span class="blsp-spelling-error" id="SPELLING_ERROR_23">Wikipedia</span> is generated by its users, control of that information lies in the hands of <span class="blsp-spelling-error" id="SPELLING_ERROR_24">Wikipedia</span>, and when a single authority controls information on this scale, the results can be devastating if it is not cautious.<br /><br />A careful look at Google search results, whatever the search term, reveals a pattern. Almost every time, the first page contains a link to some article on <span class="blsp-spelling-error" id="SPELLING_ERROR_25">Wikipedia</span>. A striking example: search for <span style="font-style: italic;"><a href="http://www.google.co.in/search?hl=en&q=sergey+brin&meta="><span class="blsp-spelling-error" id="SPELLING_ERROR_26">Sergey</span> <span class="blsp-spelling-error" id="SPELLING_ERROR_27">Brin</span> </a></span>and the first result that pops up links to the <span class="blsp-spelling-error" id="SPELLING_ERROR_28">Wikipedia</span> entry, ahead of the <span style="font-style: italic;">Google Corporate Information</span> page itself!<br /><br />I am by no means stating that the Google page is more relevant to the search term than the <span class="blsp-spelling-error" id="SPELLING_ERROR_30">Wikipedia</span> one; in fact, the <span class="blsp-spelling-error" id="SPELLING_ERROR_31">Wikipedia</span> page provides much more information than its Google counterpart. All I am trying to say is that it's quite possible for a relatively irrelevant page in <span class="blsp-spelling-error" id="SPELLING_ERROR_32">Wikipedia</span> to show up higher than a more relevant page elsewhere, owing to the way <span class="blsp-spelling-error" id="SPELLING_ERROR_33">Wikipedia</span> has structured its content. Let me explain. 
An article in <span class="blsp-spelling-error" id="SPELLING_ERROR_34">Wikipedia</span> (as it appears on the web) is simply an HTML page with external links to pages outside <span class="blsp-spelling-error" id="SPELLING_ERROR_35">Wikipedia</span> and internal links to pages inside <span class="blsp-spelling-error" id="SPELLING_ERROR_36">Wikipedia</span>. <span class="blsp-spelling-error" id="SPELLING_ERROR_37">Wikipedia</span> also provides a WYSIWYG (What You See Is What You Get) editor for easy editing of its pages. While this takes the burden of encoding the page in HTML off the author, it also takes away a lot of control. Interestingly, <span class="blsp-spelling-error" id="SPELLING_ERROR_38">Wikipedia's</span> editor has two different hypertext buttons, one for linking to pages inside <span class="blsp-spelling-error" id="SPELLING_ERROR_39">Wikipedia</span> and the other for creating an external link. Why such differential treatment? Why two buttons where one could have sufficed?<br /><br />The answer to these questions is a shocking revelation of how irresponsible control of information on this enormous scale can lead to authoritarian behavior in the WWW era. A look at the HTML source of any <span class="blsp-spelling-error" id="SPELLING_ERROR_40">Wikipedia</span> article reveals that while all external links carry an extra attribute, <span style="font-style: italic;">rel='<span class="blsp-spelling-error" id="SPELLING_ERROR_41">nofollow</span>'</span>, in their link tag, the attribute is missing from all the internal links. 
The presence of the <span style="font-style: italic;">rel</span> attribute with the value <span style="font-style: italic;"><span class="blsp-spelling-error" id="SPELLING_ERROR_42">nofollow</span></span> discounts the vote of relevance from the source page to the target page, a vote that would otherwise have helped increase the relevance of the target page in Google's (and a few other search engines') database.<br /></p><div style="text-align: justify;">As noted on <a href="http://en.wikipedia.org/wiki/Nofollow">this</a> page, <em>rel='<span class="blsp-spelling-error" id="SPELLING_ERROR_43">nofollow</span>'</em> was originally suggested by Google to combat <a href="http://en.wikipedia.org/wiki/Spamdexing"><span class="blsp-spelling-error" id="SPELLING_ERROR_44">spamdexing</span></a>, which is nothing but stealing the high relevance or page rank of a site by linking from an <span class="blsp-spelling-error" id="SPELLING_ERROR_45">original</span> page to an irrelevant spam page. The problem is more prominent on Web 2.0 sites, where it's easy to author content (and hence create links) on a highly relevant site (say, a forum). Some Web 2.0 sites have worked around the problem by enabling <em>rel='<span class="blsp-spelling-error" id="SPELLING_ERROR_46">nofollow</span>'</em> by default for user-generated content, with no way of disabling it. <span class="blsp-spelling-error" id="SPELLING_ERROR_47">Wikipedia</span> is one of them.<br /><br />Sadly, <span class="blsp-spelling-error" id="SPELLING_ERROR_48">Wikipedia</span> has enabled <em>rel='<span class="blsp-spelling-error" id="SPELLING_ERROR_49">nofollow</span>'</em> only for external links, i.e. links pointing to pages outside <span class="blsp-spelling-error" id="SPELLING_ERROR_50">Wikipedia</span>, and not for its internal links, i.e. links pointing to pages within <span class="blsp-spelling-error" id="SPELLING_ERROR_51">Wikipedia</span>. 
What this means is that the author of an article on <span class="blsp-spelling-error" id="SPELLING_ERROR_52">Wikipedia</span> might have <span class="blsp-spelling-corrected" id="SPELLING_ERROR_53">painstakingly</span> gone through hundreds of web pages to create the article and happily returned the credit by linking to all the references he found highly relevant. But because the links to his references are external, the <em>rel='<span class="blsp-spelling-error" id="SPELLING_ERROR_54">nofollow</span>'</em> attribute means he has not helped their page rank by linking to them. What he may also have done inadvertently, while linking to <span class="blsp-spelling-error" id="SPELLING_ERROR_55">Wikipedia</span> pages from his article, whether to follow the convention of linking terms to existing articles on <span class="blsp-spelling-error" id="SPELLING_ERROR_56">Wikipedia</span> or to create the <em>See Also </em>section of his article, is pass on the relevance of the current page to all the pages he linked to!<br /><br />This type of structuring leads to a <em>Viral Effect</em>, where one highly ranked page on <span class="blsp-spelling-error" id="SPELLING_ERROR_57">Wikipedia</span> contributes to the page ranks of all the pages it links to, which in turn contribute to the pages they link to, and so on. This ensures that all pages on <span class="blsp-spelling-error" id="SPELLING_ERROR_58">Wikipedia</span> have a reasonably good page rank irrespective of their actual content. A slap in <span class="blsp-spelling-error" id="SPELLING_ERROR_59">Google's</span> face. Besides degrading search results, it also discredits the references used to create the page by not contributing to their rank. A slap in the face of original content providers. 
What this also means is that, in times to come, <span class="blsp-spelling-error" id="SPELLING_ERROR_60">Wikipedia</span> will invariably be one of the top-ranked results for any search term, and Google will simply be a menu card with a single item on offer. External pages, if required, will be reached from the references section of the <span class="blsp-spelling-error" id="SPELLING_ERROR_61">Wikipedia</span> article returned by the search. In other words, Google will be dead.<br /><br /><strong>Corroborations</strong><br />Here are a few instances corroborating this authoritarian behaviour (knowing or unknowing) on <span class="blsp-spelling-error" id="SPELLING_ERROR_62">Wikipedia's</span> part and the <em>Viral Effect</em> it leads to.<br /></div><ul style="text-align: justify;"><li><a href="http://lists.wikimedia.org/pipermail/wikien-l/2007-January/061137.html">Brion <span class="blsp-spelling-error" id="SPELLING_ERROR_63">Vibber's</span> (<span class="blsp-spelling-error" id="SPELLING_ERROR_64">Wikimedia</span>) email about turning <em>rel='<span class="blsp-spelling-error" id="SPELLING_ERROR_65">nofollow</span>'</em> back on for <span class="blsp-spelling-error" id="SPELLING_ERROR_66">Wikipedia</span></a></li><li>A comparison of the <a href="http://en.wikipedia.org/wiki/Yahoo%21_Pipes">third</a> and <a href="http://www.readwriteweb.com/archives/yahoo_pipes_web_database.php">fourth</a> ranked search results for the search term <em>Yahoo Pipes, </em>where <span class="blsp-spelling-error" id="SPELLING_ERROR_67"><em>readwriteweb</em></span> gives a much tauter description than the corresponding <span class="blsp-spelling-error" id="SPELLING_ERROR_68">Wikipedia</span> entry. 
</li><li><span class="blsp-spelling-error" id="SPELLING_ERROR_69">Wikipedia's</span> <em>What links here</em> section, which lists all internal articles linking to the current page and hence contributing to its page rank. This section invariably contains a bloated list with many relatively unrelated pages linking to the current page out of context.</li></ul><p style="text-align: justify;"><strong>Conclusion</strong><br />This article should by no means be treated as an <span class="blsp-spelling-corrected" id="SPELLING_ERROR_70">attempt</span> to bring ignominy on <span class="blsp-spelling-error" id="SPELLING_ERROR_71">Wikipedia's</span> name. <span class="blsp-spelling-error" id="SPELLING_ERROR_72">Wikipedia</span> is a great humanitarian project. I personally refer to it regularly for all my doubts. The purpose of this article is just to point out that Google will need a renovation in the changing WWW scene, and that <span class="blsp-spelling-error" id="SPELLING_ERROR_73">Wikipedia</span> could have been a bit more responsible by disclosing the differential treatment of internal and external links to its editor-users and by cautioning them about using internal links. Another solution could be to allow users to enable or disable <em>rel='nofollow' </em>for every link, to distinguish the two classes of links (enabled/disabled) by an easily visible feature (say, different colors) on the article page, and to hope that spamdexing will be curbed by <a href="http://en.wikipedia.org/wiki/User:Brion_VIBBER">assuming the same good faith</a> that drives Wikipedia. But that's just a suggestion. Being an outsider, I don't have the faintest idea of what it takes to manage the world's largest encyclopedia; the people at Wikimedia know a lot better, and I leave the solution to them. 
I also hope that resolving this is high on <span class="blsp-spelling-error" id="SPELLING_ERROR_74">Wikipedia's</span> priority list and that the <span class="blsp-spelling-error" id="SPELLING_ERROR_75">Wikimedia</span> people are being held back only by the technical difficulties of implementing an ingenious solution, like all their solutions in the past.</p><p style="text-align: justify;"><strong></strong> </p>
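<p style="text-align: justify;">The <em>Viral Effect</em> argued in this post can be illustrated with a minimal sketch. It assumes a simplified textbook form of PageRank (power iteration with a damping factor of 0.85), not Google's actual ranking system, and it models <em>rel='nofollow'</em> by simply ignoring the link when votes are counted. The page names (<em>blog.example</em>, <em>wiki/A</em>, <em>wiki/B</em>, <em>source.example</em>) and the functions <em>pagerank</em> and <em>build_links</em> are hypothetical and exist only for this illustration.</p>

```python
from collections import defaultdict


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over (source, target, followed) link triples.

    Assumption: a link with followed=False stands in for rel='nofollow'
    and is ignored when votes are counted, so it passes no rank on.
    """
    pages = sorted({p for src, dst, _ in links for p in (src, dst)})
    outgoing = defaultdict(list)
    for src, dst, followed in links:
        if followed:
            outgoing[src].append(dst)

    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages with no followed outgoing links spread their rank evenly
        # over all pages, so the ranks always sum to 1.
        dangling = sum(rank[p] for p in pages if not outgoing[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its vote equally among its followed links.
        for src in pages:
            for dst in outgoing[src]:
                new_rank[dst] += damping * rank[src] / len(outgoing[src])
        rank = new_rank
    return rank


def build_links(external_followed):
    """Hypothetical four-page web: one article with one internal and one external link."""
    return [
        ("blog.example", "wiki/A", True),                 # outside page votes for the article
        ("wiki/A", "wiki/B", True),                       # internal link, always followed
        ("wiki/A", "source.example", external_followed),  # external reference
    ]


with_nofollow = pagerank(build_links(external_followed=False))
all_followed = pagerank(build_links(external_followed=True))

for page in sorted(with_nofollow):
    print(f"{page:15s} nofollow={with_nofollow[page]:.3f} followed={all_followed[page]:.3f}")
```

<p style="text-align: justify;">In this toy graph, marking the external reference as nofollow sends all of <em>wiki/A</em>'s vote to the internal page <em>wiki/B</em>, while the external source is left with only the baseline share that every page receives. When both links are followed, the vote is split evenly between the two.</p>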