Off Topic: Heh, lately I've been thinking of doing my master's thesis on the security of different classes of CAPTCHA tests
Pros:
* Harder for computers than a text CAPTCHA
* A decent image database
Cons:
* Image quality - on some pictures it's unclear what is actually shown, partly because of the image quality and partly because of the critter itself
* Price - knowing MS, this service won't stay free for much longer
* Slow - compared to ordinary CAPTCHAs, 12x more images have to be fetched
* Unfamiliarity - the question is how long it takes the average user to figure out what is being asked of them, especially if they don't know English.
* Similarity - regardless of the protective mechanisms, a system like this can be broken too: math + good AI + botnet + brute force + cheap human labor. It's only a matter of will and money.
And so that this post isn't pure theorizing, here's a bit of code:
Code:
"""
A script for testing Asirra CAPTCHA (http://research.microsoft.com/asirra/) security

The idea behind this script is pretty simple:
 * First, let's say we have a way to learn which animal is in a picture (cheap
   human labor or AI - neural network).
 * Then, with the help of this script, we calculate how many CAPTCHA requests
   we need to make in order to collect a specific number of animal pictures.
 * Once we have a rough estimate of the needed requests, we employ our bot-network
   to get the pictures.
 * We classify the pictures using the selected learning algo.
 * Now we train our Agent (AI) to recognize the rest of the unknown pictures.
 * CAPTCHA PASSED :)
"""
__author__ = 'Petar Maric - http://www.petarmaric.com/'

TOTAL_PICTURES = 2*10**6 # They say "It's powered by over two million photos"
PICTURES_PER_VIEW = 4*3 # CAPTCHA test picture grid is 4x3 pictures
# List of how many CAPTCHA requests to make
TRIES_LIST = range(5*10**4, 5*10**5, 5*10**4)

###############################
# You can look, but no touching
###############################
import random

ALL_PICTURES = range(TOTAL_PICTURES)

def num_pictures_learned(num_tries):
    """Returns the number of learned pictures"""
    pictures_learned = set()
    for _ in range(num_tries):
        # Each request shows PICTURES_PER_VIEW distinct, uniformly random pictures
        pictures_learned.update(random.sample(ALL_PICTURES, PICTURES_PER_VIEW))
    return len(pictures_learned)

def main():
    for num_tries in TRIES_LIST:
        learned = num_pictures_learned(num_tries)
        learned_percent = 100.0 * learned / TOTAL_PICTURES
        print("Learned %d/%d (%.2f%%) with %d tries." % (
            learned,
            TOTAL_PICTURES,
            learned_percent,
            num_tries
        ))

if __name__ == "__main__":
    main()
Result:
Code:
Learned 518207/2000000 (25.91%) with 50000 tries.
Learned 901901/2000000 (45.10%) with 100000 tries.
Learned 1187203/2000000 (59.36%) with 150000 tries.
Learned 1396808/2000000 (69.84%) with 200000 tries.
Learned 1553736/2000000 (77.69%) with 250000 tries.
Learned 1669414/2000000 (83.47%) with 300000 tries.
Learned 1754926/2000000 (87.75%) with 350000 tries.
Learned 1818577/2000000 (90.93%) with 400000 tries.
Learned 1865694/2000000 (93.28%) with 450000 tries.
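The simulation isn't strictly necessary, by the way. If we assume (as the script already does) that every request shows 12 distinct pictures picked uniformly at random and independently of earlier requests, then a given picture is missed by one request with probability 1 - 12/N, so the expected number of distinct pictures after n requests is N * (1 - (1 - 12/N)^n). Below is a rough sketch of that formula and its inverse. The function names expected_learned and tries_needed are mine, and the uniform-sampling assumption is just that, an assumption about how Asirra serves pictures.

```python
import math

TOTAL_PICTURES = 2 * 10**6   # same figure as in the script above
PICTURES_PER_VIEW = 4 * 3    # 12 pictures per CAPTCHA request


def expected_learned(num_tries, total=TOTAL_PICTURES, per_view=PICTURES_PER_VIEW):
    """Expected number of distinct pictures seen after num_tries requests.

    Assumption: each request shows per_view distinct pictures drawn
    uniformly at random, independently of all earlier requests.
    """
    # Probability that one particular picture is missed by every request
    miss_all = (1.0 - per_view / total) ** num_tries
    return total * (1.0 - miss_all)


def tries_needed(target_fraction, total=TOTAL_PICTURES, per_view=PICTURES_PER_VIEW):
    """Smallest number of requests whose expected coverage reaches target_fraction."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    # Solve 1 - (1 - per_view/total)**n >= target_fraction for n
    n = math.log(1.0 - target_fraction) / math.log(1.0 - per_view / total)
    return math.ceil(n)


if __name__ == "__main__":
    for num_tries in range(5 * 10**4, 5 * 10**5, 5 * 10**4):
        learned = expected_learned(num_tries)
        print("Expected %.0f/%d (%.2f%%) with %d tries." % (
            learned, TOTAL_PICTURES, 100.0 * learned / TOTAL_PICTURES, num_tries))
    print("Tries needed for 90%% coverage: %d" % tries_needed(0.90))
```

Under these assumptions the formula gives roughly 25.9% at 50000 tries, in line with the simulated 25.91% above, and it runs instantly instead of sampling millions of random numbers.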