
    Net BASED WEB CRAWLER

The WWW provides us with great amounts of useful information electronically available as hypertext. This large pool of hypertext changes dynamically and is semantically unstructured, which makes finding related and valuable information difficult. Therefore, a web crawler for the automatic discovery of valuable information from the Web, or Web mining, is important for us nowadays. In reality, this web crawler is a program which automatically traverses the web by downloading documents and following links from page to page. Crawlers are mainly used by search engines to gather data for indexing. Other possible applications include page validation, structural analysis and visualization, update notification, mirroring, and personal web assistants/agents. Web crawlers are also known as spiders, robots, worms, etc. A crawler can only download a limited number of pages in a given time, so it needs to prioritize its downloads. The high rate of change implies that by the time the crawler is downloading the last pages from a site, it is very likely that new pages have been added to the site, or that pages have already been updated or even deleted.
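
As a rough illustration of such a traversal, the sketch below fetches a seed page, extracts its links, and follows them breadth-first. It uses only the Python standard library; the seed URL, page limit, and timeout are illustrative assumptions, and a real crawler would also need robots.txt handling, politeness delays, and better error handling.

# A minimal breadth-first crawler sketch using only the standard library.
# The page limit and timeout are illustrative assumptions, not from the text.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=50):
    """Traverse the web from seed_url, downloading documents and following links."""
    frontier = deque([seed_url])       # pages waiting to be downloaded
    visited = set()                    # pages already fetched (avoid re-downloading)
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
        except Exception:
            continue                   # skip unreachable or non-HTML pages
        visited.add(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            frontier.append(urljoin(url, link))   # resolve relative links
    return visited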

    GENERAL-PURPOSE WEB CRAWLER

General-purpose web crawlers collect and process the entire contents of the Web in a centralized location, so that it can be indexed in advance to answer many user queries. In the early days, when the Web was still not very large, a simple or random crawling method was enough to index the whole web. However, now that the Web has grown very large, a crawler can have large coverage but rarely refresh its crawls, or it can have good coverage and fast refresh rates but lack good ranking functions or support for advanced query capabilities that need more processing power. Therefore, more advanced crawling methodologies are needed because of limited resources such as time and network bandwidth.

    BEHAVIOR OF WEB CRAWLER

The behavior of a Web crawler is the outcome of a combination of policies: a selection policy that states which pages to download, a re-visit policy that states when to check for changes to the pages, a politeness policy that states how to avoid overloading Web sites, and a parallelization policy that states how to coordinate distributed Web crawlers.
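
A minimal sketch of how two of these policies might interact follows, assuming a priority-scored frontier for the selection policy and a fixed per-host delay for the politeness policy; both choices, and the delay value, are illustrative assumptions rather than anything prescribed above.

# Sketch of a frontier that combines a selection policy (priority ordering)
# with a politeness policy (minimum delay between requests to the same host).
import heapq
import time
from urllib.parse import urlparse

class PoliteFrontier:
    def __init__(self, per_host_delay=2.0):
        self.heap = []                  # entries are (negative priority, url)
        self.last_fetch = {}            # host -> timestamp of last request
        self.per_host_delay = per_host_delay

    def add(self, url, priority):
        """Selection policy: higher-priority pages are downloaded first."""
        heapq.heappush(self.heap, (-priority, url))

    def next_url(self):
        """Politeness policy: wait if this host was contacted too recently."""
        if not self.heap:
            return None
        _, url = heapq.heappop(self.heap)
        host = urlparse(url).netloc
        elapsed = time.time() - self.last_fetch.get(host, 0.0)
        if elapsed < self.per_host_delay:
            time.sleep(self.per_host_delay - elapsed)
        self.last_fetch[host] = time.time()
        return url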


DATA LEAKAGE DETECTION

ABSTRACT: Data sets are distributed by a data distributor to so-called third-party agents or third-party users. In some circumstances, the distributed data appears to have been leaked by some unknown user. This study proposes data distribution strategies that improve the prospects of detecting and reducing data leakage without relying on changes to the released data, such as watermarks. Alternatively, realistic but fake data records can be created in order to identify and detect data leakage by a third-party user. A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or on somebody's laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases we can also inject realistic but fake data records to further improve our chances of detecting leakage and identifying the guilty party.
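
As a minimal sketch of the allocation-and-scoring idea described above, suppose the distributor records which real and fake records each agent received; agents can then be scored by how much of the leaked set overlaps their allocation, with fake records weighted heavily because they cannot have been gathered independently. The record IDs, agent names, and weight below are illustrative assumptions, not part of the abstract.

def suspicion_scores(leaked, allocations, fakes, fake_weight=10.0):
    """Return a score per agent; a higher score means a more likely leaker."""
    scores = {}
    for agent, records in allocations.items():
        overlap = leaked & records
        n_fake = len(overlap & fakes)      # planted fake records found in the leak
        n_real = len(overlap) - n_fake     # genuine records found in the leak
        scores[agent] = n_real + fake_weight * n_fake
    return scores

# Example: agent "A2" received the fake record "f1", which later appears leaked.
allocations = {
    "A1": {"r1", "r2", "r3"},
    "A2": {"r2", "r4", "f1"},              # "f1" is a fake record given only to A2
}
fakes = {"f1"}
leaked = {"r2", "r4", "f1"}
print(suspicion_scores(leaked, allocations, fakes))   # A2 scores highest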


    Image Authentication Using Digital Watermarking

    Abstract:

A digital Watermark is a digital signal or pattern inserted into a digital image. Since this signal or pattern is present in each unaltered copy of the original image, the digital Watermark may also serve as a digital signature for the copies. The desirable characteristics of a Watermark are:

The Watermark should be resilient to standard manipulations of any nature.

It should be statistically irremovable.

Every Watermarking system consists of at least two different parts:

    Watermark Embedding Unit

    Watermark Detection and Extraction Unit

In this project, we discuss an algorithm for embedding and detecting the Watermark in a still image. A robust, secure, invisible Watermark is imprinted on the image I, and the Watermarked image WI is distributed. The author keeps the original image I. To prove that an image WI', or a portion of it, has been pirated, the author shows that WI' contains his Watermark (for this purpose, he can, but does not have to, use his original image I). The best a pirate can do is to try to remove the original Watermark W (which is impossible if the Watermark is secure).
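
To make the two units concrete, the sketch below uses a simplified additive scheme: the embedding unit adds a secret, key-generated pseudo-random pattern W to the image I, and the detection unit correlates the residual WI' - I against that pattern. This is only an illustration of the embedding/detection split under those assumptions, not the project's actual algorithm; the key, the strength alpha, and the detection threshold are assumed values.

# Illustrative additive watermarking: embed a keyed pseudo-random pattern,
# then detect it by correlating the residual with the regenerated pattern.
import numpy as np

def embed(image, key, alpha=2.0):
    """Embedding unit: WI = I + alpha * W, where W is keyed pseudo-random noise."""
    rng = np.random.default_rng(key)
    w = rng.standard_normal(image.shape)
    return image.astype(float) + alpha * w

def detect(suspect, original, key, threshold=0.5):
    """Detection unit: correlate the residual (WI' - I) with the keyed pattern."""
    rng = np.random.default_rng(key)
    w = rng.standard_normal(original.shape)
    residual = suspect.astype(float) - original.astype(float)
    corr = np.corrcoef(residual.ravel(), w.ravel())[0, 1]
    return corr > threshold, corr

# Example: embed into a random "image", then detect the author's Watermark.
I = np.random.randint(0, 256, size=(64, 64))
WI = embed(I, key=1234)
present, score = detect(WI, I, key=1234)
print(present, round(score, 3))        # True, with correlation close to 1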