Commit 7c5cb2a

Author: Noah Petherbridge
Support unicode_punctuation attribute, prep for 1.10.0 release
1 parent 4c27071 commit 7c5cb2a

12 files changed

Lines changed: 70 additions & 35 deletions

Changes renamed to Changes.md

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
 Revision history for the Python package RiveScript.
 
+1.10.0 Feb 16 2016
+- Add configurable `unicode_punctuation` attribute to strip out punctuation
+when running in UTF-8 mode.
+
 1.8.1 Nov 19 2015
 - Add `@` to the list of characters that disqualifies a trigger from being
 considered "atomic"

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 The MIT License (MIT)
 
-Copyright (c) 2015 Noah Petherbridge
+Copyright (c) 2016 Noah Petherbridge
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

MANIFEST.in

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 include setup.py
 recursive-include rivescript *.py
 recursive-include docs *.*
-include Changes
+include Changes.md
 include README.md
 include LICENSE
 recursive-include brain *.rive

README.md

Lines changed: 18 additions & 19 deletions
@@ -59,24 +59,23 @@ RiveScript as a library for Python 2 and 3, respectively.
 
 ## UTF-8 SUPPORT
 
-Version 1.05 adds experimental support for UTF-8 in RiveScript. It is not
-enabled by default. Enable it by passing a `True` value for the `utf8`
-option in the constructor, or by using the `--utf8` (or `-u` for short)
-option to the interactive mode.
-
-By default (without UTF-8 mode on), triggers may only contain basic ASCII
-characters (no foreign characters), and the user's message is stripped of
-all characters except letters/numbers and spaces. This means that, for
-example, you can't capture a user's e-mail address in a RiveScript reply,
-because of the @ and . characters.
-
-When UTF-8 mode is enabled, these restrictions are lifted. Triggers are only
-limited to not contain certain metacharacters like the backslash, and the
-user's message is only stripped of backslashes and HTML angled brackets (to
-protect from obvious XSS if you use RiveScript in a web application). The
-`<star>` tags in RiveScript will capture the user's "raw" input, so you can
-write replies to get the user's e-mail address or store foreign characters
-in their name.
+RiveScript supports Unicode but it is not enabled by default. Enable it by
+passing a `True` value for the `utf8` option in the constructor, or by using the
+`--utf8` argument to the standalone interactive mode.
+
+In UTF-8 mode, most characters in a user's message are left intact, except for
+certain metacharacters like backslashes and common punctuation characters like
+`/[.,!?;:]/`.
+
+If you want to override the punctuation regexp, you can provide a new one by
+assigning the `unicode_punctuation` attribute of the bot object after
+initialization. Example:
+
+```python
+import re
+bot = RiveScript(utf8=True)
+bot.unicode_punctuation = re.compile(r'[.,!?;:]')
+```
 
 Regardless of whether UTF-8 mode is on, all input messages given to the bot
 are converted (if needed) to Python's `unicode` data type. So, while it's
@@ -127,7 +126,7 @@ The `status` will be `ok` on success, or `error` if there was an error. The
 ```
 The MIT License (MIT)
 
-Copyright (c) 2015 Noah Petherbridge
+Copyright (c) 2016 Noah Petherbridge
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

python-rivescript.spec

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 %global desc A scripting language to make it easy to write responses for a chatterbot.
 
 Name: python-%{srcname}
-Version: 1.8.1
+Version: 1.10.0
 Release: 1%{?dist}
 Summary: %{sum}
 

rivescript/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
 
 # The MIT License (MIT)
 #
-# Copyright (c) 2015 Noah Petherbridge
+# Copyright (c) 2016 Noah Petherbridge
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal
@@ -39,7 +39,7 @@
 __docformat__ = 'plaintext'
 
 __all__ = ['rivescript']
-__version__ = '1.8.1'
+__version__ = '1.10.0'
 
 from .rivescript import RiveScript, RiveScriptError, NoMatchError, NoReplyError,\
     ObjectError, DeepRecursionError, NoDefaultRandomTopicError, RepliesNotSortedError

rivescript/__main__.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 # The MIT License (MIT)
 #
-# Copyright (c) 2015 Noah Petherbridge
+# Copyright (c) 2016 Noah Petherbridge
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal

rivescript/interactive.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 # The MIT License (MIT)
 #
-# Copyright (c) 2015 Noah Petherbridge
+# Copyright (c) 2016 Noah Petherbridge
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal

rivescript/python.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 # The MIT License (MIT)
 #
-# Copyright (c) 2015 Noah Petherbridge
+# Copyright (c) 2016 Noah Petherbridge
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal

rivescript/rivescript.py

Lines changed: 22 additions & 7 deletions
@@ -2,7 +2,7 @@
 
 # The MIT License (MIT)
 #
-# Copyright (c) 2015 Noah Petherbridge
+# Copyright (c) 2016 Noah Petherbridge
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal
@@ -106,12 +106,26 @@ def __init__(self, debug=False, strict=True, depth=50, log="", utf8=False):
         str log: Specify a log file for debug output to go to (instead of STDOUT).
         int depth: Specify the recursion depth limit.
         bool utf8: Enable UTF-8 support."""
-        # Instance variables.
-        self._debug = debug  # Debug mode
-        self._log = log  # Debug log file
-        self._utf8 = utf8  # UTF-8 mode
-        self._strict = strict  # Strict mode
-        self._depth = depth  # Recursion depth limit
+
+        ###
+        # User configurable fields.
+        ###
+
+        # Debugging
+        self._debug = debug  # Debug mode
+        self._log = log  # Debug log file
+
+        # Unicode stuff
+        self._utf8 = utf8  # UTF-8 mode
+        self.unicode_punctuation = re.compile(r'[.,!?;:]')
+
+        # Misc.
+        self._strict = strict  # Strict mode
+        self._depth = depth  # Recursion depth limit
+
+        ###
+        # Internal fields.
+        ###
         self._gvars = {}  # 'global' variables
         self._bvars = {}  # 'bot' variables
         self._subs = {}  # 'sub' variables
@@ -1584,6 +1598,7 @@ def _format_message(self, msg, botreply=False):
         # (to protect from obvious XSS attacks).
         if self._utf8:
             msg = re.sub(RE.utf8_meta, '', msg)
+            msg = re.sub(self.unicode_punctuation, '', msg)
 
             # For the bot's reply, also strip common punctuation.
             if botreply: