Canonical Strings, or, why I like Python
I needed a quick and easy function to map strings into a canonical form in which punctuation, upper/lower case, and word order don't matter. For example, "!$%!@$!@!This!?! is... a test" should count as equal to "a test this is". Less than a minute later I was good to go with:
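import re

# Matches any single ASCII punctuation character.
re_punctuation = re.compile(
    r"[`~!@#\$%\^&\*\(\)\-_\+={\[}\]\\|;:\'\",<\.>/\?]")

def GetCanonical(text):
    # Lowercase, replace punctuation with spaces, then split on whitespace.
    canonical = re_punctuation.sub(" ", text.lower()).split()
    # Sort so that word order no longer matters.
    canonical.sort()
    return ' '.join(canonical)

# True: both strings reduce to "a is test this".
print(GetCanonical("This is a test") == GetCanonical("a test this is"))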
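The same idea can probably be written without spelling out the punctuation by hand. What follows is only a sketch, assuming that `string.punctuation` (the standard library's list of ASCII punctuation) is an acceptable stand-in for the hand-written character class; the names `get_canonical` and `_PUNCTUATION` are mine, not part of the original function.

```python
import re
import string

# Assumption: string.punctuation is an acceptable replacement for the
# hand-written class. re.escape keeps characters such as ], \, ^ and -
# literal inside the brackets.
_PUNCTUATION = re.compile("[" + re.escape(string.punctuation) + "]")


def get_canonical(text):
    """Return text lowercased, stripped of punctuation, with words sorted."""
    # Replace punctuation with spaces so "is...a" still splits into two words.
    words = _PUNCTUATION.sub(" ", text.lower()).split()
    # sorted() returns a new list, so no separate in-place sort is needed.
    return " ".join(sorted(words))


if __name__ == "__main__":
    print(get_canonical("!$%!@$!@!This!?! is... a test"))
```

Note that both versions only know about ASCII punctuation, so curly quotes or other Unicode marks would pass through untouched.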
Labels: python

