Misc

Misc — Random utils that may be useful for users of the API for different reasons

Stability Level

Stable, unless otherwise indicated

Synopsis

#include <glyr/misc.h>

size_t              glyr_levenshtein_strcmp             (const char *string,
                                                         const char *other);
size_t              glyr_levenshtein_strnormcmp         (const char *string,
                                                         const char *other);

Description

Provides helper functions for tasks that users of the API may run into in certain cases. This includes fuzzy string comparison based on the Levenshtein distance.

Details

glyr_levenshtein_strcmp ()

size_t              glyr_levenshtein_strcmp             (const char *string,
                                                         const char *other);

Computes the Levenshtein distance between string and other. See also: http://de.wikipedia.org/wiki/Levenshtein-Distanz

In very simple words: glyr_levenshtein_strcmp() checks how 'similar' two strings are. The similarity is returned as a size_t ranging from 0 (exact match) to MAX(strlen(string), strlen(other)); the lower the value, the more alike the strings.

This is fully UTF-8 aware and calls g_utf8_normalize() beforehand.

Example:

Note

Equilibrium <=> Aqquilibrim returns 3, since:

Equilibrium -> Aquilibrium // one Edit: 'E' -> 'A'

Aquilibrium -> Aquilibrim // one Delete: 'u' -> ''

Aquilibrim -> Aqquilibrim // one Insert: '' -> 'q'
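The edit counting above can be reproduced with a small dynamic-programming routine. The following is only a sketch of the general technique, not glyr's implementation: sketch_levenshtein() is a hypothetical name, it compares bytes rather than UTF-8 characters, and it performs no normalization, so it is only exact for plain ASCII input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT part of glyr: byte-wise Levenshtein distance.
 * Unlike glyr_levenshtein_strcmp() it does no UTF-8 handling and no
 * normalization. Returns (size_t)-1 if memory allocation fails. */
static size_t sketch_levenshtein(const char *string, const char *other)
{
    assert(string != NULL && other != NULL);

    size_t len_a = strlen(string);
    size_t len_b = strlen(other);

    /* A single row of the DP table is enough: row[j] holds the distance
     * between the processed prefix of `string` and the first j bytes
     * of `other`. */
    size_t *row = malloc((len_b + 1) * sizeof *row);
    if (row == NULL)
        return (size_t)-1;

    for (size_t j = 0; j <= len_b; j++)
        row[j] = j;                 /* empty prefix: j Inserts */

    for (size_t i = 1; i <= len_a; i++) {
        size_t diagonal = row[0];   /* value from the previous row, column j-1 */
        row[0] = i;                 /* i Deletes to reach the empty string */

        for (size_t j = 1; j <= len_b; j++) {
            size_t above = row[j];  /* value from the previous row, column j */
            size_t cost = (string[i - 1] == other[j - 1]) ? 0 : 1;

            size_t best = diagonal + cost;      /* Edit (or plain match) */
            if (above + 1 < best)
                best = above + 1;               /* Delete */
            if (row[j - 1] + 1 < best)
                best = row[j - 1] + 1;          /* Insert */

            diagonal = above;
            row[j] = best;
        }
    }

    size_t result = row[len_b];
    free(row);
    return result;
}
```

With this sketch, sketch_levenshtein("Equilibrium", "Aqquilibrim") yields 3, matching the three steps listed in the note.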

string :

first string to compare

other :

second string to compare

Returns :

the Levenshtein distance (number of Edits, Deletes and Inserts needed to turn string into other)

glyr_levenshtein_strnormcmp ()

size_t              glyr_levenshtein_strnormcmp         (const char *string,
                                                         const char *other);

Same as glyr_levenshtein_strcmp(), but tries to normalize the two strings as well as it can first. This includes lowercasing (strdown), stripping HTML, UTF-8 normalization, stripping additions like "(CD 1)", and reordering names such as "Clapton, Eric" -> "Eric Clapton".

For very small strings the function may return very high values in order to prevent accidental matches. See below.

Internally glyr_levenshtein_strcmp() is used, so this is UTF-8 aware as well.
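To illustrate the kind of normalization described above, here is a minimal sketch covering three of the listed steps: lowercasing, dropping parenthesized additions like "(CD 1)", and turning "Last, First" into "First Last". It is not glyr's actual code. sketch_normalize() is a hypothetical name, only ASCII is lowercased, and HTML stripping and UTF-8 normalization are left out entirely.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT part of glyr. Returns a newly allocated,
 * simplified copy of `input` (caller frees it), or NULL on allocation
 * failure. Only ASCII letters are lowercased. */
static char *sketch_normalize(const char *input)
{
    assert(input != NULL);

    size_t len = strlen(input);
    char *work = malloc(len + 1);
    char *out = malloc(len + 1);
    if (work == NULL || out == NULL) {
        free(work);
        free(out);
        return NULL;
    }

    /* Step 1: lowercase and skip everything inside parentheses. */
    size_t n = 0;
    int depth = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)input[i];
        if (c == '(') {
            depth++;
        } else if (c == ')') {
            if (depth > 0)
                depth--;
        } else if (depth == 0) {
            work[n++] = (char)tolower(c);
        }
    }

    /* Trim trailing whitespace left behind by the stripping. */
    while (n > 0 && isspace((unsigned char)work[n - 1]))
        n--;
    work[n] = '\0';

    /* Step 2: reorder "last, first" into "first last". The result is
     * never longer than `work`, so `out` is large enough. */
    char *comma = strchr(work, ',');
    const char *first = NULL;
    if (comma != NULL) {
        first = comma + 1;
        while (*first == ' ')
            first++;
        *comma = '\0';              /* `work` now holds only the last name */
    }

    if (first != NULL && *first != '\0') {
        strcpy(out, first);
        strcat(out, " ");
        strcat(out, work);
    } else {
        strcpy(out, work);
    }

    free(work);
    return out;
}
```

Under these assumptions, both "Clapton, Eric" and "Eric Clapton (CD 1)" reduce to "eric clapton", so a subsequent distance computation would report 0 for the pair.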

Example:

Note

Adios <=> Weiß and 19 <=> 21 return 4 and 2 respectively with glyr_levenshtein_strcmp() (and so may pass a maximum threshold of e.g. 4), but return a much higher value with glyr_levenshtein_strnormcmp().

string :

first string to compare

other :

second string to compare

Returns :

the Levenshtein distance of the normalized strings