@@ -59,24 +59,23 @@ RiveScript as a library for Python 2 and 3, respectively.
 
 ## UTF-8 SUPPORT
 
-Version 1.05 adds experimental support for UTF-8 in RiveScript. It is not
-enabled by default. Enable it by passing a `True` value for the `utf8`
-option in the constructor, or by using the `--utf8` (or `-u` for short)
-option to the interactive mode.
-
-By default (without UTF-8 mode on), triggers may only contain basic ASCII
-characters (no foreign characters), and the user's message is stripped of
-all characters except letters/numbers and spaces. This means that, for
-example, you can't capture a user's e-mail address in a RiveScript reply,
-because of the @ and . characters.
-
-When UTF-8 mode is enabled, these restrictions are lifted. Triggers are only
-limited to not contain certain metacharacters like the backslash, and the
-user's message is only stripped of backslashes and HTML angled brackets (to
-protect from obvious XSS if you use RiveScript in a web application). The
-`<star>` tags in RiveScript will capture the user's "raw" input, so you can
-write replies to get the user's e-mail address or store foreign characters
-in their name.
+RiveScript supports Unicode but it is not enabled by default. Enable it by
+passing a `True` value for the `utf8` option in the constructor, or by using the
+`--utf8` argument to the standalone interactive mode.
+
+In UTF-8 mode, most characters in a user's message are left intact, except for
+certain metacharacters like backslashes and common punctuation characters like
+`/[.,!?;:]/`.
+
+If you want to override the punctuation regexp, you can provide a new one by
+assigning the `unicode_punctuation` attribute of the bot object after
+initialization. Example:
+
+```python
+import re
+bot = RiveScript(utf8=True)
+bot.unicode_punctuation = re.compile(r'[.,!?;:]')
+```
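Editor's note (not part of the commit): the effect of the punctuation regexp can be illustrated with the standard `re` module alone. This is a minimal sketch, not the library's internal code. The helper name `strip_punctuation` is hypothetical, and it assumes that matched characters and backslashes are simply removed from the message.

```python
import re

# Default punctuation pattern as quoted in the README text.
DEFAULT_PUNCTUATION = re.compile(r'[.,!?;:]')


def strip_punctuation(message, pattern=DEFAULT_PUNCTUATION):
    """Hypothetical helper: drop backslashes and every character matched by `pattern`.

    Assumption: matches are replaced with the empty string. The real library
    may apply further normalization that is not modeled here.
    """
    message = message.replace("\\", "")
    return pattern.sub("", message)


# Non-ASCII letters are left intact; only the listed punctuation is removed.
print(strip_punctuation("¿Cómo estás? Bien, gracias!"))

# With the default pattern the "." in an e-mail address is stripped.
print(strip_punctuation("mail me at user@example.com!"))

# A custom pattern that omits "." keeps the address usable.
keep_dots = re.compile(r'[,!?;:]')
print(strip_punctuation("mail me at user@example.com!", keep_dots))
```

Under these assumptions, a pattern that leaves out `.` is one way to let e-mail addresses pass through unchanged.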
 
 Regardless of whether UTF-8 mode is on, all input messages given to the bot
 are converted (if needed) to Python's `unicode` data type. So, while it's
@@ -127,7 +126,7 @@ The `status` will be `ok` on success, or `error` if there was an error. The
 ```
 The MIT License (MIT)
 
-Copyright (c) 2015 Noah Petherbridge
+Copyright (c) 2016 Noah Petherbridge
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal