Regexp and our letters [Archive]

Dragan Babić

05. 09. 2006., 19:00

I have the following code:

$result = ereg ("^[A-Za-z0-9\ ]+$", $theinput )

It is supposed to check whether $theinput consists only of the listed characters. How can I extend this check to include our letters, so that it lets them through?

This does not work:
$result = ereg ("^[A-Za-z0-9ćč\ ]+$", $theinput )

nor does

$result = ereg ("^[A-Za-z0-9\ |ć|č| ]+$", $theinput )

(ć and č are there only as an illustration; of course it needs to work for all of our letters.)

DejanVesic

05. 09. 2006., 20:06

I have worked very little with PHP, but if I remember correctly, you have to use the multibyte versions of those functions, and before that set the appropriate encoding context:

mb_regex_encoding('UTF-8');

and then use mb_ereg()
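
A minimal sketch of that approach, assuming the script file and the input are both UTF-8 encoded. The helper name hasOnlyAllowedChars and the explicit list of letters are illustrative assumptions, not something taken from the posts above. As far as I know, mb_ereg() defaults to Ruby-style syntax, where ^ and $ can also match at line breaks, so the sketch anchors with \A and \z instead:

```php
<?php
// Assumption: this file and the input string are UTF-8 encoded.
mb_regex_encoding('UTF-8');

// Hypothetical helper, named only for this example.
function hasOnlyAllowedChars(string $input): bool
{
    // ASCII letters, digits, space, plus the Serbian Latin letters.
    // \A and \z anchor to the start and end of the whole string.
    $pattern = '\A[A-Za-z0-9čćžšđČĆŽŠĐ ]+\z';

    // Cast to bool so the result has the same type across PHP versions.
    return (bool) mb_ereg($pattern, $input);
}
```

With this sketch, hasOnlyAllowedChars("Čačak 2006") should come back true and hasOnlyAllowedChars("a_b") false. Cyrillic letters are not in the list, so they would need to be added explicitly.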

ivanhoe

05. 09. 2006., 20:24

Here it is from the help:
If you want to perform regular expressions on Unicode strings, the PCRE functions will NOT be of any help. You need to use the Multibyte extension: mb_ereg(), mb_eregi(), mb_ereg_replace() and so on. When doing so, be careful to set the default text encoding to the same encoding used by the text you are searching and replacing in. You can do that with the mb_regex_encoding() function. You will probably also want to set the default encoding for the other mb_* string functions with mb_internal_encoding().
So when dealing with, say, French text, I start with these:

<?php
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
setlocale(LC_ALL, 'fr-fr');
?>
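
To illustrate why the internal encoding matters for the other mb_* string functions, here is a small sketch comparing a byte count with a character count. It assumes PHP 7 or later (for the \u{...} escapes) and the mbstring extension; the variable names are made up for the example.

```php
<?php
mb_internal_encoding('UTF-8');

// "ćč" written with Unicode escapes, so the example does not depend
// on how this file itself is encoded.
$text = "\u{0107}\u{010D}";

$byteCount = strlen($text);    // counts bytes
$charCount = mb_strlen($text); // counts characters in the internal encoding

echo $byteCount, " bytes, ", $charCount, " characters\n"; // prints "4 bytes, 2 characters"
```

Each of the two letters takes two bytes in UTF-8, which is why byte-oriented functions and character-oriented functions disagree about the length.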

dinke

05. 09. 2006., 20:53

I struggled with this the other day when I was parsing blogs from Planetoid (in Cyrillic, no less :). I somehow managed without the mb_* functions, but for this it will be hard to do without them. All in all, set the encoding to the appropriate one and try [[:alnum:]], i.e.:

mb_regex_encoding('UTF-8');
$result = mb_ereg("^[[:alnum:][:space:]]+$", $theinput);

It should work; if not, let us know and we'll keep thinking :)

Br@nkoR

06. 09. 2006., 13:33

$result = preg_match('/^[\p{L}\d\s]+$/', $theinput);

Dragan Babić

06. 09. 2006., 15:15

Branko nailed it! :)

Jovana says thanks. ;)

godza

06. 09. 2006., 16:24

What a legend you are: you stay quiet, stay quiet, then just show up and drop the solution :)

dinke

06. 09. 2006., 16:37

http://www.php.net/preg_match

Izzy
17-Aug-2006 12:27

Concerning the German umlauts (and other language-specific chars as accented letters etc.): If you use unicode (utf-8), you can match them easily with the unicode character property \pL (match any unicode letter) and the "u" modifier, so e.g.

<?php preg_match("/[\w\pL]/u",$var); ?>

would really match all "words" in $var - whether they contain umlauts or not. Took me a while to figure this out, so maybe this comment will save the day for someone else :-)

The man wrote it out nicely, but nobody bothers to read. And I thought I knew everything about regular expressions ;)
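
Putting the quoted note together with the pattern posted earlier in the thread, a minimal sketch could look like the following. It assumes the input is UTF-8 encoded; the function name isLettersDigitsSpaces is hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper; assumes $input is a UTF-8 encoded string.
function isLettersDigitsSpaces(string $input): bool
{
    // \p{L} = any Unicode letter, \d = digit, \s = whitespace.
    // The trailing "u" modifier makes PCRE treat both the pattern and
    // the subject as UTF-8 instead of as single bytes.
    // preg_match() returns 1 on a match, 0 on no match, and false on
    // error (for example, when the subject is not valid UTF-8).
    return preg_match('/^[\p{L}\d\s]+$/u', $input) === 1;
}
```

With this sketch, both "Čačak 2006" and the Cyrillic "Ђорђе" should pass, while "a_b" should be rejected because the underscore is not a letter, a digit, or whitespace.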