Fix edge cases for int() casting of strings #1589
Fix edge cases for int() casting of strings #1589faze-geek wants to merge 2 commits intolcompilers:mainfrom
int() casting of strings #1589Conversation
| ch++; | ||
| } | ||
| ival = std::stoi(c); | ||
| ival = std::stoi(str); |
There was a problem hiding this comment.
Edge case to be looked into :
def f():
print(int(' ')) # Only whitespace without any numeric character.
f()
(lp) C:\Users\kunni\lpython>python try.py
Traceback (most recent call last):
File "C:\Users\kunni\lpython\try.py", line 3, in <module>
f()
File "C:\Users\kunni\lpython\try.py", line 2, in f
print(int(' '))
ValueError: invalid literal for int() with base 10: ' '
(lp) C:\Users\kunni\lpython>src\bin\lpython try.py
std::exception: invalid stoi argument
Been a bit busy with university exams, will handle this soon.
There was a problem hiding this comment.
Does this case throw an error?
There was a problem hiding this comment.
Yes this case throws an error - The stoi catches invalid input and throws an error. I'll have to add a if condition and throws a corresponding semantic error. Thats it right?
There was a problem hiding this comment.
Yes, I think, we have to check and throw a SemanticError.
There was a problem hiding this comment.
Other than this edge case I feel all cases are handled !
|
@czgdp1807 @Thirumalai-Shaktivel Kindly consider this pull request ready for review. The CI logs fail as of now, but I'll handle it at once after having collective reviews. |
|
LGTM! |
Have a look at the CI logs once please. I'm not sure if I had updated the references after the latest commit. |
| assert i32(int("+3")) == 3 | ||
| assert i32(int("\n3")) == 3 | ||
| assert i32(int("3\n")) == 3 | ||
| assert i32(int("\r\t\n3\r\t\n")) == 3 |
There was a problem hiding this comment.
There is an AssertionError among these tests.
Check all the cases if it gives the correct output.
| if (*ch == '-' || *ch == '+') { | ||
| ch++; | ||
| } |
There was a problem hiding this comment.
It seems currently + and - signs are being ignored. We have error at the following place.
(lp) lpython$ git diff
diff --git a/integration_tests/test_str_to_int.py b/integration_tests/test_str_to_int.py
index 765dd1e0f..b67dcab32 100644
--- a/integration_tests/test_str_to_int.py
+++ b/integration_tests/test_str_to_int.py
@@ -3,21 +3,27 @@ from ltypes import i32
def f():
i: i32
i = i32(int("314"))
+ print(i)
assert i == 314
i = i32(int("-314"))
i = i32(int("-314"))
+ print(i)
assert i == -314
s: str
s = "123"
i = i32(int(s))
+ print(i)
assert i == 123
s = "-123"
i = i32(int(s))
+ print(i)
assert i == -123
s = " 1234"
i = i32(int(s))
+ print(i)
assert i == 1234
s = " -1234 "
i = i32(int(s))
+ print(i)
assert i == -1234
assert i32(int(" 3 ")) == 3
@@ -25,5 +31,6 @@ def f():
assert i32(int("\n3")) == 3
assert i32(int("3\n")) == 3
assert i32(int("\r\t\n3\r\t\n")) == 3
+ print("Done")
f()
(lp) lpython$ lpython integration_tests/test_str_to_int.py
314
314
AssertionErrorThere was a problem hiding this comment.
Also, since we check for +/- sign for only at the start, I think the following could fail (it fails currently):
(lp) lpython$ python3 examples/expr2.py
-3
3
(lp) lpython$ lpython examples/expr2.py
semantic error: invalid literal for int() with base 10: ' -3 '
--> examples/expr2.py:4:19
|
4 | print(i32(int(" -3 ")))
| ^^^^^^^^^^^^
Note: if any of the above error or warning messages are not clear or are lacking
context please report it to us (we consider that a bug that must be fixed).| if(*ch == ' '){ | ||
| ch++; | ||
| continue; | ||
| } | ||
| if(*ch == '\\'){ | ||
| ch++; | ||
| if (*ch == 'n' || *ch == 'r' || *ch == 't') { | ||
| ch++; | ||
| continue; | ||
| } | ||
| throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | ||
| } | ||
| if (*ch == '.') { | ||
| throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | ||
| } | ||
| if (*ch < '0' || *ch > '9') { | ||
| throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | ||
| } |
There was a problem hiding this comment.
I think the above could be simplified as follows:
| if(*ch == ' '){ | |
| ch++; | |
| continue; | |
| } | |
| if(*ch == '\\'){ | |
| ch++; | |
| if (*ch == 'n' || *ch == 'r' || *ch == 't') { | |
| ch++; | |
| continue; | |
| } | |
| throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | |
| } | |
| if (*ch == '.') { | |
| throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | |
| } | |
| if (*ch < '0' || *ch > '9') { | |
| throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | |
| } | |
| if(*ch == ' ' || *ch == '\n' || *ch == '\r' || *ch == '\t'){ | |
| ch++; | |
| continue; | |
| } | |
| if (*ch < '0' || *ch > '9') { | |
| throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | |
| } |
There was a problem hiding this comment.
I think ch holds the whole escaped character. So, we need not check for \n (or others) in two steps. The above approach performs the check in one step.
Example:
#include <iostream>
#include <cstring>
using namespace std;
int main()
{
char *mystr = "abc def \n\r\t pqr xyz";
char *ch = mystr;
while (*ch) {
if (*ch == ' ') {
// do not print anything
} else if (*ch == '\n') {
std::cout << "newline character" << std::endl;
} else if (*ch == '\r') {
std::cout << "carriage return" << std::endl;
} else if (*ch == '\t') {
std::cout << "tab character" << std::endl;
} else {
std::cout << *ch << std::endl;
}
ch++;
}
return 0;
}Output:
a
b
c
d
e
f
newline character
carriage return
tab character
p
q
r
x
y
z| if (*ch < '0' || *ch > '9') { | ||
| throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | ||
| } | ||
| str+=*ch; |
There was a problem hiding this comment.
(Minor: I guess two spaces here would look good).
| str+=*ch; | |
| str += *ch; |
All of these cases below throw the
invalid literalsemantic error on master as of now.On branch