
    Net BASED WEB CRAWLER

The WWW provides us with great amounts of useful information electronically available as hypertext. This large pool of hypertext changes dynamically and is semantically unstructured, which makes finding related and valuable information difficult. Therefore, a web crawler for the automatic discovery of valuable information from the Web, or Web mining, is important for us nowadays. In reality, this web crawler is a program which automatically traverses the web by downloading documents and following links from page to page. Crawlers are mainly used by search engines to gather data for indexing. Other possible applications include page validation, structural analysis and visualization, update notification, mirroring, and personal web assistants/agents. Web crawlers are also known as spiders, robots, worms, etc. A crawler can only download a limited number of pages in a given time, so it needs to prioritize its downloads. The high rate of change implies that by the time the crawler is downloading the last pages from a site, it is very likely that new pages have been added to the site, or that pages have already been updated or even deleted.
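
As a rough illustration of such a traversal, the sketch below fetches a seed page, extracts its links, and follows them breadth-first. It uses only the Python standard library; the seed URL, page limit, and timeout are illustrative assumptions, and a real crawler would also need robots.txt handling, politeness delays, and better error handling.

# A minimal breadth-first crawler sketch using only the standard library.
# The page limit and timeout are illustrative assumptions, not from the text.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=50):
    """Traverse the web from seed_url, downloading documents and following links."""
    frontier = deque([seed_url])       # pages waiting to be downloaded
    visited = set()                    # pages already fetched (avoid re-downloading)
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
        except Exception:
            continue                   # skip unreachable or non-HTML pages
        visited.add(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            frontier.append(urljoin(url, link))   # resolve relative links
    return visited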

    GENERAL-PURPOSE WEB CRAWLER

General-purpose web crawlers collect and process the entire contents of the Web in a centralized location, so that it can be indexed in advance to answer many user queries. In the early days, when the Web was still not very large, a simple or random crawling method was enough to index the whole web. However, now that the Web has grown very large, a crawler can have large coverage but rarely refresh its crawls, or it can have good coverage and fast refresh rates but lack good ranking functions or support for advanced query capabilities that need more processing power. Therefore, more advanced crawling methodologies are needed because of limited resources such as time and network bandwidth.

    BEHAVIOR OF WEB CRAWLER

The behavior of a Web crawler is the outcome of a combination of policies: a selection policy that states which pages to download, a re-visit policy that states when to check for changes to the pages, a politeness policy that states how to avoid overloading Web sites, and a parallelization policy that states how to coordinate distributed Web crawlers.
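
A minimal sketch of how two of these policies might interact follows, assuming a priority-scored frontier for the selection policy and a fixed per-host delay for the politeness policy; both choices, and the delay value, are illustrative assumptions rather than anything prescribed above.

# Sketch of a frontier that combines a selection policy (priority ordering)
# with a politeness policy (minimum delay between requests to the same host).
import heapq
import time
from urllib.parse import urlparse

class PoliteFrontier:
    def __init__(self, per_host_delay=2.0):
        self.heap = []                  # entries are (negative priority, url)
        self.last_fetch = {}            # host -> timestamp of last request
        self.per_host_delay = per_host_delay

    def add(self, url, priority):
        """Selection policy: higher-priority pages are downloaded first."""
        heapq.heappush(self.heap, (-priority, url))

    def next_url(self):
        """Politeness policy: wait if this host was contacted too recently."""
        if not self.heap:
            return None
        _, url = heapq.heappop(self.heap)
        host = urlparse(url).netloc
        elapsed = time.time() - self.last_fetch.get(host, 0.0)
        if elapsed < self.per_host_delay:
            time.sleep(self.per_host_delay - elapsed)
        self.last_fetch[host] = time.time()
        return url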


DATA LEAKAGE DETECTION

ABSTRACT: Data sets are distributed by a data distributor to so-called third-party agents or third-party users. In some circumstances, the distributed data appears to have been leaked by some unknown user. This study proposes data distribution strategies that improve the prospects of detecting and reducing data leakage without relying on changes to the released data, such as watermarks. Alternatively, realistic but fake data records can be created in order to identify and detect data leakage by a third-party user. A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or on somebody's laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases we can also inject realistic but fake data records to further improve our chances of detecting leakage and identifying the guilty party.
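
As a minimal sketch of the allocation-and-scoring idea described above, suppose the distributor records which real and fake records each agent received; agents can then be scored by how much of the leaked set overlaps their allocation, with fake records weighted heavily because they cannot have been gathered independently. The record IDs, agent names, and weight below are illustrative assumptions, not part of the abstract.

def suspicion_scores(leaked, allocations, fakes, fake_weight=10.0):
    """Return a score per agent; a higher score means a more likely leaker."""
    scores = {}
    for agent, records in allocations.items():
        overlap = leaked & records
        n_fake = len(overlap & fakes)      # planted fake records found in the leak
        n_real = len(overlap) - n_fake     # genuine records found in the leak
        scores[agent] = n_real + fake_weight * n_fake
    return scores

# Example: agent "A2" received the fake record "f1", which later appears leaked.
allocations = {
    "A1": {"r1", "r2", "r3"},
    "A2": {"r2", "r4", "f1"},              # "f1" is a fake record given only to A2
}
fakes = {"f1"}
leaked = {"r2", "r4", "f1"}
print(suspicion_scores(leaked, allocations, fakes))   # A2 scores highest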


    Image Authentication Using Digital Watermarking

    Abstract:

A digital Watermark is a digital signal or pattern inserted into a digital image. Since this signal or pattern is present in each unaltered copy of the original image, the digital Watermark may also serve as a digital signature for the copies. The desirable characteristics of a Watermark are:

The Watermark should be resilient to standard manipulations of any nature.

It should be statistically irremovable.

Every Watermarking system consists of at least two different parts:

    Watermark Embedding Unit

    Watermark Detection and Extraction Unit

In this project, we discuss an algorithm for embedding and detecting the Watermark in a still image. A robust, secure, invisible Watermark is imprinted on the image I, and the Watermarked image WI is distributed. The author keeps the original image I. To prove that an image WI', or a portion of it, has been pirated, the author shows that WI' contains his Watermark (for this purpose, he can, but does not have to, use his original image I). The best a pirate can do is to try to remove the original Watermark W (which is impossible if the Watermark is secure).
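
To make the two units concrete, the sketch below uses a simplified additive scheme: the embedding unit adds a secret, key-generated pseudo-random pattern W to the image I, and the detection unit correlates the residual WI' - I against that pattern. This is only an illustration of the embedding/detection split under those assumptions, not the project's actual algorithm; the key, the strength alpha, and the detection threshold are assumed values.

# Illustrative additive watermarking: embed a keyed pseudo-random pattern,
# then detect it by correlating the residual with the regenerated pattern.
import numpy as np

def embed(image, key, alpha=2.0):
    """Embedding unit: WI = I + alpha * W, where W is keyed pseudo-random noise."""
    rng = np.random.default_rng(key)
    w = rng.standard_normal(image.shape)
    return image.astype(float) + alpha * w

def detect(suspect, original, key, threshold=0.5):
    """Detection unit: correlate the residual (WI' - I) with the keyed pattern."""
    rng = np.random.default_rng(key)
    w = rng.standard_normal(original.shape)
    residual = suspect.astype(float) - original.astype(float)
    corr = np.corrcoef(residual.ravel(), w.ravel())[0, 1]
    return corr > threshold, corr

# Example: embed into a random "image", then detect the author's Watermark.
I = np.random.randint(0, 256, size=(64, 64))
WI = embed(I, key=1234)
present, score = detect(WI, I, key=1234)
print(present, round(score, 3))        # True, with correlation close to 1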