7/25/2019 Python Quebrar Captch Python Ocr

http://slidepdf.com/reader/full/python-quebrar-captch-python-ocr 1/4

Python OCR… or how to break CAPTCHAs

http://blog.c22.cc/2010/10/12/python-ocr-or-how-to-break-captchas/

After my little stint writing the scr.im PoC script, a few people on Twitter reminded me of a blog post that Andreas Riancho from Bonsai-sec wrote back in February. Andreas (the creator of the excellent W3AF tool) wrote a short Python script to take a CAPTCHA image and perform an OCR on it. As a geek, this piqued my interest, but the one problem I had with it was that the script relied on the pytesser Python library, which is Windows only!

There were a few issues with that:

1. It's Windows only and I prefer to avoid Windows unless there's no other choice
2. The project only ever reached version 0.0.1
3. The project has been abandoned since May 2007

So, not wanting to give up on something that looked fun, and also useful, I started a search for an alternative. I quickly found that the pytesser Python library is a wrapper around the tesseract-ocr project, and that there had been some work on another Python library called Python-Tesseract that looks like it does the job (and isn't platform dependent).

After installing tesseract-ocr (apt-get install tesseract-ocr on Backtrack) I downloaded the Python-tesseract files and modified the script from Andreas Riancho a little (the actual changes to make things work are minimal). I also changed a few things to get the script to reasonably accurately decode scr.im captcha images.
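Before wiring up the Python bindings, it can help to confirm that the tesseract binary itself works. The following is a minimal Python 3 sketch, assuming a Tesseract 4 or later binary on the PATH (older 3.x releases spell the page-segmentation flag `-psm`). The helper names and the choice of `--psm 8` (treat the image as a single word) are illustrative assumptions, not part of the original script.

```python
import subprocess


def build_tesseract_command(image_path, psm=8):
    """Return the argv list for a one-shot tesseract run that prints to stdout."""
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing it to <outputbase>.txt.
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def ocr_with_cli(image_path, psm=8):
    """Run the tesseract binary (assumed to be installed) and return its text."""
    result = subprocess.run(
        build_tesseract_command(image_path, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `ocr_with_cli("input-NEAREST.tif")` on a prepared image should return the recognised text, provided the binary is installed.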

#!/usr/bin/python
# [PoC] tesseract OCR script - tuned for scr.im captcha
#
# Chris John Riley
# blog.c22.cc
# contact [AT] c22 [DOT] cc
# 12/10/2010
# Version: 1.0
#

# Changelog
# 0.1> Initial version taken from Andreas Riancho's
#      example script (bonsai-sec.com)
# 1.0> Altered to use Python-tesseract, tuned image
#      manipulation for scr.im specific captchas
#

from PIL import Image

im = Image.open('captcha.jpg')  # Your image here!
im = im.convert("RGBA")

pixdata = im.load()

# Make the letters bolder for easier recognition
for y in xrange(im.size[1]):
    for x in xrange(im.size[0]):
        if pixdata[x, y][0] < 90:
            pixdata[x, y] = (0, 0, 0, 255)

for y in xrange(im.size[1]):
    for x in xrange(im.size[0]):
        if pixdata[x, y][1] < 136:
            pixdata[x, y] = (0, 0, 0, 255)

for y in xrange(im.size[1]):
    for x in xrange(im.size[0]):
        if pixdata[x, y][2] > 0:
            pixdata[x, y] = (255, 255, 255, 255)

im.save("input-black.gif", "GIF")

# Make the image bigger (needed for OCR)
im_orig = Image.open('input-black.gif')
big = im_orig.resize((1000, 500), Image.NEAREST)

ext = ".tif"
big.save("input-NEAREST" + ext)

# Perform OCR using tesseract-ocr library
from tesseract import image_to_string

image = Image.open('input-NEAREST.tif')
print image_to_string(image)

The majority of this code is preparation; the actual OCR job is performed in the final lines using the image_to_string call. Simple, isn't it!
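The preparation boils down to a per-pixel rule, which can be sketched in Python 3 without PIL. This is an illustrative restatement rather than the script itself: `clean_pixel` and `clean_image` are hypothetical names, the image is assumed to be a list of rows of RGBA tuples, and the three passes are folded into one function (equivalent here, because each pass only inspects the pixel it rewrites).

```python
BLACK = (0, 0, 0, 255)
WHITE = (255, 255, 255, 255)


def clean_pixel(pixel, red_max=90, green_max=136):
    """Apply the script's three threshold passes to one RGBA pixel."""
    r, g, b, _a = pixel
    # Passes 1 and 2: a dark red or green channel means "part of a letter".
    if r < red_max or g < green_max:
        return BLACK
    # Pass 3: anything left with a non-zero blue channel becomes background.
    if b > 0:
        return WHITE
    # Bright pixels with no blue at all are left untouched, as in the script.
    return pixel


def clean_image(rows):
    """Return a new image (list of rows) with every pixel cleaned."""
    return [[clean_pixel(p) for p in row] for row in rows]


sample = [[(30, 30, 30, 255), (200, 200, 200, 255), (120, 100, 180, 255)]]
print(clean_image(sample))
```

The default thresholds (90 and 136) are the ones used in the script; exposing them as parameters makes it easier to retune the rule for a different captcha.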

The above script is tuned to the scr.im captcha image, as can be seen in the examples below:

As you can see, after running it through some filters (thanks Andreas), the CAPTCHA becomes a lot clearer, and significantly easier to OCR. Even in this case however, tesseract-ocr sometimes returns the value as W6BHP instead of W68HP. Still, that's an easy mistake to make… and I'm sure with more tweaking, the preparation could be perfected!
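Misreads between look-alike characters such as B and 8 can also be handled after the OCR step, if the target allows more than one guess, by generating the plausible alternatives for the returned string. The Python 3 sketch below is an illustration under stated assumptions, not part of the original script: the `CONFUSABLE` table and the `candidates` helper are hypothetical, and the table would need to be extended for a real captcha font.

```python
from itertools import product

# Hypothetical confusion table (illustrative only): each character maps to
# itself first, followed by the characters it is commonly mistaken for.
CONFUSABLE = {"B": "B8", "8": "8B"}


def candidates(text):
    """Return every string obtained by swapping confusable characters.

    The unmodified OCR guess always comes first.
    """
    options = [CONFUSABLE.get(ch, ch) for ch in text]
    return ["".join(combo) for combo in product(*options)]


print(candidates("W6BHP"))
```

The number of candidates doubles with every confusable character in the string, so this only stays practical for short captchas and small tables.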

So, next time somebody says "we implemented a CAPTCHA to prevent scripted attacks", you can take it with a pinch of salt!

Links:

• [PoC] scr.im.tesseract.py script -> here
• Breaking Weak CAPTCHA in 26 Lines of Code -> bonsai-sec.com
• Pytesser -> here
• Tesseract-OCR -> here
• Python-Tesseract -> here