Let's Unleash the Secret Behind the Search Engine Giant. Presented by: Prakhar Gethe (CEO and Co-Founder, Team Zenith)

How google works and functions: A complete Approach


Page 1: How google works and functions: A complete Approach

Let's Unleash the Secret Behind the Search Engine Giant

Presented by: Prakhar Gethe (CEO and Co-Founder, Team Zenith)

Page 2: How google works and functions: A complete Approach

TOPICS TO BE COVERED

• Facts About Google
• How a Search Engine Works
  ** Types of Search Engine
• How Google Works
  ** Google Architecture
  ** Google Web Crawler
  ** Google Indexer
  ** Google Query Processor
• Google Working Infographic
• What Is SEO
  ** SEO Techniques
• What Is Google Digging
  ** Methods of Google Digging
• Technology Requirements of Creating a Search Engine

Page 3: How google works and functions: A complete Approach

04/11/2023 PSIT CS SOCIETY 3

FACTS ABOUT GOOGLE

• Google was founded by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University.
• It was founded on 4 September 1998.
• Google processes approximately 20 petabytes of user-generated data every day. (A petabyte is 10 to the 15th power bytes.)
• In June 2006, the Oxford English Dictionary (OED) added “Google” as a verb.
• A Google employee is called a “Googler”, while a new team member is called a “Noogler”.
• The name ‘Google’ was an accident: a misspelling by the founders, who were going for ‘Googol’.
• The Google home page is so bare mainly because the founders didn’t know HTML and just wanted a quick interface. In fact, the submit button was a long time coming, and hitting the RETURN key was the only way to burst Google into life.
• Google has the largest network of translators in the world.
• On average, Google has acquired more than one company every week since 2010.

Page 4: How google works and functions: A complete Approach



• Google might be the only company with the explicit goal to REDUCE the amount of time people spend on its site.

• The world watches 450,000 years of YouTube videos each month, over twice as long as modern humans have existed.

• Google has photographed 5 million miles of road for its Street View maps

• Google.com, home to arguably the world's most important internet company, contains 23 markup errors in its code.

Page 5: How google works and functions: A complete Approach


HOW A SEARCH ENGINE WORKS

A search engine is a program that searches for and identifies items in a database that correspond to keywords or characters specified by the user; it is used especially for finding particular sites on the Internet. Put simply, a search engine is a database system designed to index and categorize Internet addresses, otherwise known as URLs.

FACTS ABOUT SEARCH ENGINES

Search Engine Popularity

The most popular search engines on the web:
• Google 55.2%
• Yahoo 21.7%
• MSN Search 9.6%
• AOL Search 3.8%
• Terra Lycos 2.6%
• AltaVista 2.2%
• AskJeeves 1.5%

Page 6: How google works and functions: A complete Approach


Number of Words Used in Search Phrases

• 2-word phrases 32.58%
• 3-word phrases 25.61%
• 1-word phrases 19.02%
• 4-word phrases 12.83%
• 5-word phrases 5.64%
• 6-word phrases 2.32%
• 7-word phrases 0.98%

When People Search

The breakdown of surfer traffic by day of the week:
• Monday 15.31%
• Tuesday 15.23%
• Thursday 14.73%
• Wednesday 14.62%
• Friday 14.48%
• Saturday 13.08%
• Sunday 12.55%

Screen Resolutions

The most popular screen resolutions on the web:
• 1024 x 768 48.3%
• 800 x 600 31.7%
• 1280 x 1024 13.6%
• 1152 x 864 4.0%
• 640 x 480 1.0%
• 1600 x 1200 1.0%
• 1152 x 870 0.2%

Page 7: How google works and functions: A complete Approach


TYPES OF SEARCH ENGINES

Automatic: These search engines are based on information that is collected, sorted and analyzed by software programs, commonly referred to as "robots", "spiders", or "crawlers". These spiders crawl through web pages collecting information which is then analyzed and categorized into an "index". When you conduct a search using one of these search engines, you are really searching the index. The results of the search will depend on the contents of that index and its relevancy to your query.

Types of search engine: Automatic, Directories, Meta, PPC

Page 8: How google works and functions: A complete Approach


Directories: A directory is a searchable subject guide of Web sites that have been reviewed and compiled by human editors. These editors decide which sites to list and in which categories.

Meta: Meta search engines do not crawl the web themselves. They send the user's query to several other search engines and deliver a combined summary of those results to the end user.

Pay-per-click (PPC): A search engine that determines ranking according to the dollar amount you pay for each click from that search engine to your site. Examples of PPC search engines are Overture.com and FindWhat.com. The highest ranking goes to the highest bidder.

Page 9: How google works and functions: A complete Approach


How Do Search Engines Rank Web Pages?

When ranking Web pages, search engines follow specific criteria, which may vary from one search engine to another. Naturally, they want to place the most popular (or relevant) pages at the top of their list. Search engines look at keywords and phrases, content, HTML meta tags, and link popularity, to name just a few, to determine the value of a Web page.

How Do Search Engines Work?

Search engines compile their databases with the aid of spiders (a.k.a. robots). These search engine spiders crawl the Internet from link to link, identifying Web pages. Once search engine spiders find a Web site, they index the content on those pages, making the URLs available to Internet users.

In turn, owners of Websites submit their URLs to search engines for crawling and, ultimately, inclusion in their databases. This is known as search engine submission.

When you use a search engine to find something on the Internet, you're basically asking it to scan its database and match your keywords and phrases with the content of the URLs it has on file at that time. Spiders regularly return to the URLs they index to look for changes. When changes occur, the index is updated to reflect the new information.
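The crawl, index, and search cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `WEB` dictionary is a made-up in-memory stand-in for the Internet, and `crawl`, `build_index`, and `search` are hypothetical helper names, not part of any real search engine.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example/home": ("welcome to the search engine tutorial",
                       ["a.example/about", "b.example/news"]),
    "a.example/about": ("about this tutorial site", ["a.example/home"]),
    "b.example/news": ("search engine news and updates", []),
    "c.example/orphan": ("nobody links here", []),
}

def crawl(seed):
    """Follow links breadth-first from the seed and return the pages found."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = {}
    for url, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []
```

Calling `search(build_index(crawl("a.example/home")), "search engine")` looks only at the index, never at the pages themselves. The page `c.example/orphan` is never found because no crawled page links to it, which is the gap that search engine submission is meant to close.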

Page 10: How google works and functions: A complete Approach


HOW GOOGLE WORKS

Google runs on a distributed network of thousands of low-cost computers and can therefore carry out fast parallel processing. Parallel processing is a method of computation in which many calculations can be performed simultaneously, significantly speeding up data processing. Google has three distinct parts:

• Googlebot, a web crawler that finds and fetches web pages.
• The indexer, which sorts every word on every page and stores the resulting index of words in a huge database.
• The query processor, which compares your search query to the index and recommends the documents that it considers most relevant.
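As a rough illustration of the parallel processing idea, the sketch below splits a toy document collection into shards and scans each shard with its own worker before merging the partial results. The `SHARDS` data and the function names are assumptions made for this example, and threads on one machine merely stand in for the many separate computers described above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for the documents held by one machine.
SHARDS = [
    ["google search engine", "web crawler basics"],
    ["search index design", "cooking recipes"],
    ["query processor for search", "travel guide"],
]

def count_matches(shard, term):
    """Work done independently on one shard: count documents containing the term."""
    return sum(1 for doc in shard if term in doc.split())

def parallel_count(term, shards=SHARDS):
    """Scan every shard with its own worker, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: count_matches(shard, term), shards))
    return sum(partials)
```

Because no shard depends on another, adding shards and workers shortens the scan without changing the answer.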

Page 11: How google works and functions: A complete Approach


Google Architecture

Various data structures used in the Google architecture:
• Repository
• Lexicon
• Document Index
• Hit Lists
• Forward Index
• Inverted Index
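To make these names concrete, here is a minimal sketch of how the structures relate to one another. It uses plain Python dictionaries and two toy documents, and the URLs are invented. It shows only the roles of the structures, not Google's actual storage formats.

```python
# Repository: the stored text of each fetched page, keyed by docID (toy data).
repository = {
    0: "google search engine",
    1: "search engine index",
}

# Document index: per-document metadata, here just a hypothetical URL.
document_index = {0: "example.com/a", 1: "example.com/b"}

lexicon = {}        # word -> wordID
forward_index = {}  # docID -> {wordID: hit list of word positions}

for doc_id, text in repository.items():
    hits = {}
    for position, word in enumerate(text.split()):
        # Give each new word the next free wordID.
        word_id = lexicon.setdefault(word, len(lexicon))
        hits.setdefault(word_id, []).append(position)
    forward_index[doc_id] = hits

# Inverted index: the forward index regrouped by wordID instead of docID.
inverted_index = {}
for doc_id, hits in forward_index.items():
    for word_id, hit_list in hits.items():
        inverted_index.setdefault(word_id, {})[doc_id] = hit_list
```

The forward index answers "which words does this document contain?", while the inverted index answers "which documents contain this word?", which is the question a query needs answered.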

Page 12: How google works and functions: A complete Approach


Googlebot, Google’s Web Crawler

Googlebot is Google’s web crawling robot, which finds and retrieves pages on the web and hands them off to the Google indexer. It’s easy to imagine Googlebot as a little spider scurrying across the strands of cyberspace, but in reality Googlebot doesn’t traverse the web at all. It functions much like your web browser, by sending a request to a web server for a web page, downloading the entire page, then handing it off to Google’s indexer.

Googlebot consists of many computers requesting and fetching pages much more quickly than you can with your web browser. In fact, Googlebot can request thousands of different pages simultaneously. To avoid overwhelming web servers, or crowding out requests from human users, Googlebot deliberately makes requests of each individual web server more slowly than it’s capable of doing.
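The "deliberately slower per server" behaviour can be illustrated with a small scheduling sketch. It performs no network requests: it only assigns each URL a start time so that two requests to the same host are never closer together than a chosen gap. The function name `schedule` and the two-second `per_host_delay` are assumptions for this example, not Googlebot's real settings.

```python
from urllib.parse import urlsplit

def schedule(urls, per_host_delay=2.0):
    """Give each URL the earliest start time that keeps a gap between hits on one host."""
    next_free = {}  # host -> earliest time the next request to that host may start
    plan = []
    for url in urls:
        host = urlsplit(url).netloc
        start = next_free.get(host, 0.0)
        plan.append((start, url))
        next_free[host] = start + per_host_delay
    return sorted(plan)
```

Requests to different hosts can all start at time zero and run side by side, while requests to one host are spread out, which is how a crawler can be fast overall and still gentle on each server.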

Page 13: How google works and functions: A complete Approach


Google’s Indexer

Googlebot gives the indexer the full text of the pages it finds. These pages are stored in Google’s index database. This index is sorted alphabetically by search term, with each index entry storing a list of documents in which the term appears and the location within the text where it occurs. This data structure allows rapid access to documents that contain user query terms.

To improve search performance, Google ignores (doesn’t index) common words called stop words (such as the, is, on, or, of, how, why, as well as certain single digits and single letters). Stop words are so common that they do little to narrow a search, and therefore they can safely be discarded. The indexer also ignores some punctuation and multiple spaces, as well as converting all letters to lowercase, to improve Google’s performance.
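The normalization steps described above (lowercasing, dropping punctuation and extra spaces, discarding stop words) can be sketched as follows. The stop-word list is an assumption taken from the examples in the text, and dropping every single-character token is a simplification of "certain single digits and single letters". The index built here records each term with the documents and word positions where it occurs.

```python
import re

# Assumed stop-word list, taken from the examples given in the text.
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why"}

def normalize(text):
    """Lowercase, drop punctuation and extra spaces, then remove stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS and len(w) > 1]

def build_positional_index(docs):
    """Map each kept term to a list of (doc_id, word position) entries."""
    index = {}
    for doc_id, text in docs.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            if word in STOP_WORDS or len(word) == 1:
                continue  # stop words are skipped but still occupy a position
            index.setdefault(word, []).append((doc_id, position))
    return index
```

Iterating over `sorted(index)` walks the terms alphabetically, mirroring the "sorted alphabetically by search term" layout described above.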

Page 14: How google works and functions: A complete Approach


[Figure: traditional indexing method compared with Google Caffeine]

Page 15: How google works and functions: A complete Approach


Google’s Query Processor

The query processor has several parts, including the user interface (search box), the “engine” that evaluates queries and matches them to relevant documents, and the results formatter.

PageRank is Google’s system for ranking web pages. A page with a higher PageRank is deemed more important and is more likely to be listed above a page with a lower PageRank.

Google considers over a hundred factors when ranking pages and determining which documents are most relevant to a query, including the popularity of the page (its PageRank), the position and size of the search terms within the page, and the proximity of the search terms to one another on the page. A patent application discusses other factors that Google considers when ranking a page.
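The link-popularity part of this ranking can be illustrated with the publicly described PageRank iteration: each page repeatedly shares its score among the pages it links to. The sketch below is a toy version under stated assumptions. The damping value of 0.85 is the commonly cited default, the handling of pages without links is one simple convention, and none of this reflects how Google combines PageRank with its other factors.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank: `links` maps each page to the list of pages it links to.

    Every link target must also appear as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # Assumed convention: a page with no links shares its rank with all pages.
            targets = outgoing or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

For example, with `links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}`, page C is linked from three pages and ends up with the highest score, while D, which nobody links to, keeps only the baseline.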

Page 16: How google works and functions: A complete Approach


Let’s see how Google processes a query.

Page 17: How google works and functions: A complete Approach


Page 18: How google works and functions: A complete Approach


SEO: Search Engine Optimization

Search Engine Optimization is the process of improving the visibility of a website on organic ("natural" or unpaid) search engine result pages (SERPs) by incorporating search-engine-friendly elements into the website. A successful search engine optimization campaign will include, as part of the improvements, carefully selected, relevant keywords, which the on-page optimization is designed to make prominent for search engine algorithms.

Search engine optimization is broken down into two basic areas: on-page and off-page optimization.

On-page optimization refers to the elements that make up a web page, such as HTML code, textual content, and images.

Off-page optimization refers predominantly to backlinks (links pointing to the site being optimized from other relevant websites).

Page 19: How google works and functions: A complete Approach


SEO cont.…

Various SEO techniques:
• Optimize your title tags
• Create compelling meta descriptions
• Utilize keyword-rich headings
• Add ALT text (alt attributes) to your images
• Create a sitemap
• Build internal links between pages
• Update your site regularly
• Image Optimization
• URL Optimization
• Directory Submission
• Commenting
• Social Networking
• Guest Posting
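A few of the on-page techniques listed above can be checked mechanically. The sketch below uses Python's standard `html.parser` module to report whether a page has a title tag and a meta description, how many headings it has, and how many images lack alt text. The class name `OnPageAudit` and the `audit` helper are hypothetical, and the checks are deliberately shallow: they say nothing about how any search engine scores a page.

```python
from html.parser import HTMLParser

class OnPageAudit(HTMLParser):
    """Collect a few simple on-page signals while parsing an HTML document."""

    def __init__(self):
        super().__init__()
        self.has_title = False
        self.has_meta_description = False
        self.images_missing_alt = 0
        self.heading_count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.has_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            # Only count a description that actually has content.
            self.has_meta_description = bool(attrs.get("content"))
        elif tag == "img" and not attrs.get("alt"):
            self.images_missing_alt += 1
        elif tag in ("h1", "h2", "h3"):
            self.heading_count += 1

def audit(html):
    """Parse the HTML string and return the collected signals as a dictionary."""
    parser = OnPageAudit()
    parser.feed(html)
    return {
        "title": parser.has_title,
        "meta_description": parser.has_meta_description,
        "headings": parser.heading_count,
        "images_missing_alt": parser.images_missing_alt,
    }
```

Running `audit` on a page's HTML gives a quick checklist of which on-page items still need attention.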

Page 20: How google works and functions: A complete Approach


GOOGLE DIGGING

The art of searching for any content using Google is called Google digging, the art of googling, or sometimes even Google hacking.

Google Dorks are search techniques that can be used to refine a search:

1) intitle:
2) filetype:
3) site:
4) related:
5) inurl:
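These operators are simply typed into the search box in the form `operator:value`, followed by any ordinary keywords. As a small illustration, the hypothetical helpers below assemble such a query string and turn it into a search URL. The domain `example.com` and the keywords are placeholders.

```python
from urllib.parse import quote_plus

def build_query(keywords="", **operators):
    """Combine operator:value pairs (e.g. site=, filetype=) with plain keywords."""
    parts = [f"{name}:{value}" for name, value in operators.items()]
    if keywords:
        parts.append(keywords)
    return " ".join(parts)

def search_url(query):
    """URL-encode the query so it can be opened in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)
```

For instance, `build_query("annual report", site="example.com", filetype="pdf")` produces `site:example.com filetype:pdf annual report`, which restricts the results to PDF files on one site.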

Page 21: How google works and functions: A complete Approach


GOOGLE DIGGING cont….

Page 22: How google works and functions: A complete Approach


Technology Requirements Of Creating Search Engine

Various technologies can be used to create a search engine and its web crawlers, bots, and query indexer.

For back-end
• ASP.NET
• PHP
• Python
• Perl
• Or your customized language

For database
• MySQL
• Oracle technology
• Any NoSQL database
• Or any customized database

For front-end
• JavaScript
• XML
• JSON
• Dart etc.

Page 23: How google works and functions: A complete Approach


Source : Wikipedia

Cont……

Page 24: How google works and functions: A complete Approach


Let's thank Google for such a wonderful technology and search engine.

Page 25: How google works and functions: A complete Approach


Questions, comments, and feedback are welcome.