libglyr Reference Manual
#include <glyr/misc.h>
size_t glyr_levenshtein_strcmp (const char *string, const char *other);
size_t glyr_levenshtein_strnormcmp (const char *string, const char *other);
Provides miscellaneous helper functions for tasks you may run into in certain cases. This includes:
String utilities (e.g. glyr_levenshtein_strcmp())
size_t glyr_levenshtein_strcmp (const char *string, const char *other);
Computes the Levenshtein distance between string and other.
See Also: http://de.wikipedia.org/wiki/Levenshtein-Distanz
In very simple words: glyr_levenshtein_strcmp() checks whether two strings are 'similar'. The similarity is returned as a size_t ranging from 0 (an exact match) to MAX(strlen(string), strlen(other)).
The function is fully UTF-8 aware and calls g_utf8_normalize() beforehand.
Example:
Equilibrium <=> Aqquilibrim will return 3, since:
Equilibrium -> Aquilibrium // one Edit: 'E' -> 'A'
Aquilibrium -> Aquilibrim // one Delete: 'u' -> ''
Aquilibrim -> Aqquilibrim // one Insert: '' -> 'q'
string : first string to compare
other : second string to compare
Returns : the Levenshtein distance (number of Edits, Deletes and Inserts needed to turn string into other)
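To make the edit counting in the example above concrete, here is a minimal sketch of a Levenshtein distance in C. It is not libglyr's implementation: `sketch_levenshtein` is a hypothetical name, it compares raw bytes, and it skips the g_utf8_normalize() step, so it is only meaningful for ASCII input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT part of libglyr: byte-wise Levenshtein distance
 * using two rolling rows. No UTF-8 handling, so use ASCII input only.
 * Returns (size_t)-1 if memory allocation fails. */
static size_t sketch_levenshtein(const char *string, const char *other)
{
    assert(string != NULL && other != NULL);

    size_t n = strlen(string);
    size_t m = strlen(other);
    size_t *prev = malloc((m + 1) * sizeof *prev);
    size_t *curr = malloc((m + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return (size_t)-1;
    }

    /* Turning the empty prefix into other[0..j) takes j inserts. */
    for (size_t j = 0; j <= m; j++)
        prev[j] = j;

    for (size_t i = 1; i <= n; i++) {
        /* Turning string[0..i) into the empty prefix takes i deletes. */
        curr[0] = i;
        for (size_t j = 1; j <= m; j++) {
            size_t cost = (string[i - 1] == other[j - 1]) ? 0 : 1;
            size_t best = prev[j - 1] + cost;      /* edit (or match) */
            if (prev[j] + 1 < best)
                best = prev[j] + 1;                /* delete */
            if (curr[j - 1] + 1 < best)
                best = curr[j - 1] + 1;            /* insert */
            curr[j] = best;
        }
        /* The row just filled becomes the previous row. */
        size_t *tmp = prev;
        prev = curr;
        curr = tmp;
    }

    size_t result = prev[m];
    free(prev);
    free(curr);
    return result;
}
```

Under these assumptions, `sketch_levenshtein("Equilibrium", "Aqquilibrim")` yields 3, matching the one Edit, one Delete and one Insert listed in the example. Keeping only two rows holds memory use proportional to the length of other rather than to the product of both lengths.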
size_t glyr_levenshtein_strnormcmp (const char *string, const char *other);
Same as glyr_levenshtein_strcmp(), but tries to normalize the two strings as best it can. This includes lowercasing, stripping HTML, UTF-8 normalization, stripping annotations like (CD 1), and reordering names such as Clapton, Eric -> Eric Clapton.
For very short strings the function may return very high values in order to prevent accidental matches. See below.
Internally glyr_levenshtein_strcmp() is used, so this function is UTF-8 aware as well.
Example:
Adios <=> Weiß and 19 <=> 21 return 4 and 2 respectively with glyr_levenshtein_strcmp() (and may therefore pass a maximum threshold of e.g. 4), but a much higher value with glyr_levenshtein_strnormcmp().
string : first string to compare
other : second string to compare
Returns : the Levenshtein distance
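The normalization steps named in the description of glyr_levenshtein_strnormcmp() can be illustrated with a small sketch. This is only an assumption about what such a step might look like, not libglyr's code: `sketch_normalize` is a hypothetical name, it handles ASCII only, and it covers just three of the listed steps (lowercasing, dropping parenthesized parts such as (CD 1), and turning "Last, First" into "First Last"). HTML stripping and UTF-8 normalization are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration, NOT part of libglyr. Returns a newly allocated,
 * normalized copy of input (the caller frees it), or NULL if allocation fails. */
static char *sketch_normalize(const char *input)
{
    assert(input != NULL);

    size_t len = strlen(input);
    char *buf = malloc(len + 2);
    char *out = malloc(len + 2);
    if (buf == NULL || out == NULL) {
        free(buf);
        free(out);
        return NULL;
    }

    /* Step 1: lowercase everything and drop parenthesized parts. */
    size_t n = 0;
    int depth = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)input[i];
        if (c == '(') {
            depth++;
            continue;
        }
        if (c == ')') {
            if (depth > 0)
                depth--;
            continue;
        }
        if (depth == 0)
            buf[n++] = (char)tolower(c);
    }

    /* Trim trailing whitespace left behind by the removed parts. */
    while (n > 0 && isspace((unsigned char)buf[n - 1]))
        n--;
    buf[n] = '\0';

    /* Step 2: reorder "last, first" into "first last". */
    char *comma = strchr(buf, ',');
    const char *first = NULL;
    if (comma != NULL) {
        first = comma + 1;
        while (isspace((unsigned char)*first))
            first++;
    }
    if (first != NULL && *first != '\0') {
        *comma = '\0';            /* buf now holds only the last name */
        strcpy(out, first);
        strcat(out, " ");
        strcat(out, buf);
    } else {
        strcpy(out, buf);         /* nothing to reorder */
    }

    free(buf);
    return out;
}
```

With this sketch, "Clapton, Eric" becomes "eric clapton" and "Album (CD 1)" becomes "album", so two differently formatted spellings of the same name compare as equal before any distance is computed.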