Skip to content

Commit 0ca8d56

Browse files
committed
libstdc++: Fix std::chrono::tzdb to work with vanguard format
I found some issues in the std::chrono::tzdb parser by testing the tzdata "vanguard" format, which uses new features that aren't enabled in the "main" and "rearguard" data formats. Since 2024a the keyword "minimum" is no longer valid for the FROM and TO fields in a Rule line, which means that "m" is now a valid abbreviation for "maximum". Previously we expected either "mi" or "ma". For backwards compatibility, a FROM field beginning with "mi" is still supported and is treated as 1900. The "maximum" keyword is only allowed in TO now, because it makes no sense in FROM. To support these changes the minmax_year and minmax_year2 classes for parsing FROM and TO are replaced with a single years_from_to class that reads both fields. The vanguard format makes use of %z in Zone FORMAT fields, which caused an exception to be thrown from ZoneInfo::set_abbrev because no % or / characters were expected when a Zone doesn't use a named Rule. The ZoneInfo::to(sys_info&) function now uses format_abbrev_str to replace any %z with the current offset. Although format_abbrev_str also checks for %s and STD/DST formats, those only make sense when a named Rule is in effect, so won't occur when ZoneInfo::to(sys_info&) is used. This change also implements a feature that has always been missing from time_zone::_M_get_sys_info: finding the Rule that is active before the specified time point, so that we can correctly handle %s in the FORMAT for the first new sys_info that gets created. This requires implementing a poorly documented feature of zic, to get the LETTERS field from a later transition, as described at https://mm.icann.org/pipermail/tz/2024-April/058891.html In order for this to work we need to be able to distinguish an empty letters field (as used by CE%sT where the variable part is either empty or "S") from "the letters field is not known for this transition". The tzdata file uses "-" for an empty letters field, which libstdc++ was previously replacing with "" when the Rule was parsed. Instead, we now preserve the "-" in the Rule object, so that "" can be used for the case where we don't know the letters (and so need to decide it). libstdc++-v3/ChangeLog: * src/c++20/tzdb.cc (minmax_year, minmax_year2): Remove. (years_from_to): New class replacing minmax_year and minmax_year2. (format_abbrev_str, select_std_or_dst_abbrev): Move earlier in the file. Handle "-" for letters. (ZoneInfo::to): Use format_abbrev_str to expand %z. (ZoneInfo::set_abbrev): Remove exception. Change parameter from reference to value. (operator>>(istream&, Rule&)): Do not clear letters when it contains "-". (time_zone::_M_get_sys_info): Add missing logic to find the Rule in effect before the time point. * testsuite/std/time/tzdb/1.cc: Adjust for vanguard format using "GMT" as the Zone name, not as a Link to "Etc/GMT". * testsuite/std/time/time_zone/sys_info_abbrev.cc: New test.
1 parent 629257b commit 0ca8d56

File tree

3 files changed

+274
-103
lines changed

3 files changed

+274
-103
lines changed

libstdc++-v3/src/c++20/tzdb.cc

+163-102
Original file line numberDiff line numberDiff line change
@@ -342,51 +342,103 @@ namespace std::chrono
342342
friend istream& operator>>(istream&, on_day&);
343343
};
344344

345-
// Wrapper for chrono::year that reads a year, or one of the keywords
346-
// "minimum" or "maximum", or an unambiguous prefix of a keyword.
347-
struct minmax_year
345+
// Wrapper for two chrono::year values, which reads the FROM and TO
346+
// fields of a Rule line. The FROM field is a year and TO is a year or
347+
// one of the keywords "maximum" or "only" (or an abbreviation of those).
348+
// For backwards compatibility, the keyword "minimum" is recognized
349+
// for FROM and interpreted as 1900.
350+
struct years_from_to
348351
{
349-
year& y;
352+
year& from;
353+
year& to;
350354

351-
friend istream& operator>>(istream& in, minmax_year&& y)
355+
friend istream& operator>>(istream& in, years_from_to&& yy)
352356
{
353-
if (ws(in).peek() == 'm') // keywords "minimum" or "maximum"
357+
string s;
358+
auto c = ws(in).peek();
359+
if (c == 'm') [[unlikely]] // keyword "minimum"
354360
{
355-
string s;
356-
in >> s; // extract the rest of the word, but only look at s[1]
357-
if (s[1] == 'a')
358-
y.y = year::max();
359-
else if (s[1] == 'i')
360-
y.y = year::min();
361-
else
362-
in.setstate(ios::failbit);
361+
in >> s; // extract the rest of the word
362+
yy.from = year(1900);
363+
}
364+
else if (int num = 0; in >> num) [[likely]]
365+
yy.from = year{num};
366+
367+
c = ws(in).peek();
368+
if (c == 'm') // keyword "maximum"
369+
{
370+
in >> s; // extract the rest of the word
371+
yy.to = year::max();
372+
}
373+
else if (c == 'o') // keyword "only"
374+
{
375+
in >> s; // extract the rest of the word
376+
yy.to = yy.from;
363377
}
364378
else if (int num = 0; in >> num)
365-
y.y = year{num};
379+
yy.to = year{num};
380+
366381
return in;
367382
}
368383
};
369384

370-
// As above for minmax_year, but also supports the keyword "only",
371-
// meaning that the TO year is the same as the FROM year.
372-
struct minmax_year2
385+
bool
386+
select_std_or_dst_abbrev(string& abbrev, minutes save)
373387
{
374-
minmax_year to;
375-
year from;
388+
if (size_t pos = abbrev.find('/'); pos != string::npos)
389+
{
390+
// Select one of "STD/DST" for standard or daylight.
391+
if (save == 0min)
392+
abbrev.erase(pos);
393+
else
394+
abbrev.erase(0, pos + 1);
395+
return true;
396+
}
397+
return false;
398+
}
376399

377-
friend istream& operator>>(istream& in, minmax_year2&& y)
378-
{
379-
if (ws(in).peek() == 'o') // keyword "only"
380-
{
381-
string s;
382-
in >> s; // extract the whole keyword
383-
y.to.y = y.from;
384-
}
385-
else
386-
in >> std::move(y.to);
387-
return in;
388-
}
389-
};
400+
// Set the sys_info::abbrev string by expanding any placeholders.
401+
void
402+
format_abbrev_str(sys_info& info, string_view letters = {})
403+
{
404+
if (size_t pos = info.abbrev.find('%'); pos != string::npos)
405+
{
406+
if (info.abbrev[pos + 1] == 's')
407+
{
408+
// Expand "%s" to the variable part, given by Rule::letters.
409+
if (letters == "-")
410+
info.abbrev.erase(pos, 2);
411+
else
412+
info.abbrev.replace(pos, 2, letters);
413+
}
414+
else if (info.abbrev[pos + 1] == 'z')
415+
{
416+
// Expand "%z" to the UT offset as +/-hh, +/-hhmm, or +/-hhmmss.
417+
hh_mm_ss<seconds> t(info.offset);
418+
string z(1, "+-"[t.is_negative()]);
419+
long val = t.hours().count();
420+
int digits = 2;
421+
if (int m = t.minutes().count())
422+
{
423+
digits = 4;
424+
val *= 100;
425+
val += m;
426+
if (int s = t.seconds().count())
427+
{
428+
digits = 6;
429+
val *= 100;
430+
val += s;
431+
}
432+
}
433+
auto sval = std::to_string(val);
434+
z += string(digits - sval.size(), '0');
435+
z += sval;
436+
info.abbrev.replace(pos, 2, z);
437+
}
438+
}
439+
else
440+
select_std_or_dst_abbrev(info.abbrev, info.save);
441+
}
390442

391443
// A time zone information record.
392444
// Zone NAME STDOFF RULES FORMAT [UNTIL]
@@ -462,19 +514,17 @@ namespace std::chrono
462514
info.offset = offset();
463515
info.save = minutes(m_save);
464516
info.abbrev = format();
517+
format_abbrev_str(info); // expand %z
465518
return true;
466519
}
467520

468521
private:
469522
friend class time_zone;
470523

471524
void
472-
set_abbrev(const string& abbrev)
525+
set_abbrev(string abbrev)
473526
{
474-
// In practice, the FORMAT field never needs expanding here.
475-
if (abbrev.find_first_of("/%") != abbrev.npos)
476-
__throw_runtime_error("std::chrono::time_zone: invalid data");
477-
m_buf = abbrev;
527+
m_buf = std::move(abbrev);
478528
m_pos = 0;
479529
m_expanded = true;
480530
}
@@ -544,9 +594,7 @@ namespace std::chrono
544594

545595
// Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S
546596

547-
in >> quoted(rule.name)
548-
>> minmax_year{rule.from}
549-
>> minmax_year2{rule.to, rule.from};
597+
in >> quoted(rule.name) >> years_from_to{rule.from, rule.to};
550598

551599
if (char type; in >> type && type != '-')
552600
in.setstate(ios::failbit);
@@ -557,7 +605,7 @@ namespace std::chrono
557605
if (save_time.indicator != at_time::Wall)
558606
{
559607
// We don't actually store the save_time.indicator, because we
560-
// assume that it's always deducable from the actual offset value.
608+
// assume that it's always deducible from the offset value.
561609
auto expected = save_time.time == 0s
562610
? at_time::Standard
563611
: at_time::Daylight;
@@ -567,8 +615,6 @@ namespace std::chrono
567615
rule.save = save_time.time;
568616

569617
in >> rule.letters;
570-
if (rule.letters == "-")
571-
rule.letters.clear();
572618
return in;
573619
}
574620

@@ -719,58 +765,6 @@ namespace std::chrono
719765
#endif // TZDB_DISABLED
720766
};
721767

722-
#ifndef TZDB_DISABLED
723-
namespace
724-
{
725-
bool
726-
select_std_or_dst_abbrev(string& abbrev, minutes save)
727-
{
728-
if (size_t pos = abbrev.find('/'); pos != string::npos)
729-
{
730-
// Select one of "STD/DST" for standard or daylight.
731-
if (save == 0min)
732-
abbrev.erase(pos);
733-
else
734-
abbrev.erase(0, pos + 1);
735-
return true;
736-
}
737-
return false;
738-
}
739-
740-
// Set the sys_info::abbrev string by expanding any placeholders.
741-
void
742-
format_abbrev_str(sys_info& info, string_view letters = {})
743-
{
744-
if (size_t pos = info.abbrev.find("%s"); pos != string::npos)
745-
{
746-
// Expand "%s" to the variable part, given by Rule::letters.
747-
info.abbrev.replace(pos, 2, letters);
748-
}
749-
else if (size_t pos = info.abbrev.find("%z"); pos != string::npos)
750-
{
751-
// Expand "%z" to the UT offset as +/-hh, +/-hhmm, or +/-hhmmss.
752-
hh_mm_ss<seconds> t(info.offset);
753-
string z(1, "+-"[t.is_negative()]);
754-
long val = t.hours().count();
755-
if (minutes m = t.minutes(); m != m.zero())
756-
{
757-
val *= 100;
758-
val += m.count();
759-
if (seconds s = t.seconds(); s != s.zero())
760-
{
761-
val *= 100;
762-
val += s.count();
763-
}
764-
}
765-
z += std::to_string(val);
766-
info.abbrev.replace(pos, 2, z);
767-
}
768-
else
769-
select_std_or_dst_abbrev(info.abbrev, info.save);
770-
}
771-
}
772-
#endif // TZDB_DISABLED
773-
774768
// Implementation of std::chrono::time_zone::get_info(const sys_time<D>&)
775769
sys_info
776770
time_zone::_M_get_sys_info(sys_seconds tp) const
@@ -839,12 +833,72 @@ namespace std::chrono
839833
info.abbrev = ri.format();
840834

841835
string_view letters;
842-
if (i != infos.begin())
836+
if (i != infos.begin() && i[-1].expanded())
837+
letters = i[-1].next_letters();
838+
839+
if (letters.empty())
843840
{
844-
if (i[-1].expanded())
845-
letters = i[-1].next_letters();
846-
// XXX else need to find Rule active before this time and use it
847-
// to know the initial offset, save, and letters.
841+
sys_seconds t = info.begin - seconds(1);
842+
const year_month_day date(chrono::floor<days>(t));
843+
844+
// Try to find a Rule active before this time, to get initial
845+
// SAVE and LETTERS values. There may not be a Rule for the period
846+
// before the first DST transition, so find the earliest DST->STD
847+
// transition and use the LETTERS from that.
848+
const Rule* active_rule = nullptr;
849+
sys_seconds active_rule_start = sys_seconds::min();
850+
const Rule* first_std = nullptr;
851+
for (const auto& rule : rules)
852+
{
853+
if (rule.save == minutes(0))
854+
{
855+
if (!first_std)
856+
first_std = &rule;
857+
else if (rule.from < first_std->from)
858+
first_std = &rule;
859+
else if (rule.from == first_std->from)
860+
{
861+
if (rule.start_time(rule.from, {})
862+
< first_std->start_time(first_std->from, {}))
863+
first_std = &rule;
864+
}
865+
}
866+
867+
year y = date.year();
868+
869+
if (y > rule.to) // rule no longer applies at time t
870+
continue;
871+
if (y < rule.from) // rule doesn't apply yet at time t
872+
continue;
873+
874+
sys_seconds rule_start;
875+
876+
seconds offset{}; // appropriate for at_time::Universal
877+
if (rule.when.indicator == at_time::Wall)
878+
offset = info.offset;
879+
else if (rule.when.indicator == at_time::Standard)
880+
offset = ri.offset();
881+
882+
// Time the rule takes effect this year:
883+
rule_start = rule.start_time(y, offset);
884+
885+
if (rule_start >= t && rule.from < y)
886+
{
887+
// Try this rule in the previous year.
888+
rule_start = rule.start_time(--y, offset);
889+
}
890+
891+
if (active_rule_start < rule_start && rule_start < t)
892+
{
893+
active_rule_start = rule_start;
894+
active_rule = &rule;
895+
}
896+
}
897+
898+
if (active_rule)
899+
letters = active_rule->letters;
900+
else if (first_std)
901+
letters = first_std->letters;
848902
}
849903

850904
const Rule* curr_rule = nullptr;
@@ -2069,9 +2123,11 @@ namespace std::chrono
20692123
istringstream in2(std::move(rules));
20702124
in2 >> rules_time;
20712125
inf.m_save = duration_cast<minutes>(rules_time.time);
2126+
// If the FORMAT is "STD/DST" then we can choose the right one
2127+
// now, so that we store a shorter string.
20722128
select_std_or_dst_abbrev(fmt, inf.m_save);
20732129
}
2074-
inf.set_abbrev(fmt);
2130+
inf.set_abbrev(std::move(fmt));
20752131
}
20762132

20772133
// YEAR [MONTH [DAY [TIME]]]
@@ -2082,7 +2138,12 @@ namespace std::chrono
20822138
abbrev_month m{January};
20832139
int d = 1;
20842140
at_time t{};
2141+
// XXX DAY should support ON format, e.g. lastSun or Sun>=8
20852142
in >> m >> d >> t;
2143+
// XXX UNTIL field should be interpreted
2144+
// "using the rules in effect just before the transition"
2145+
// so might need to store as year_month_day and hh_mm_ss and only
2146+
// convert to a sys_time once we know the offset in effect.
20862147
inf.m_until = sys_days(year(y)/m.m/day(d)) + seconds(t.time);
20872148
}
20882149
else

0 commit comments

Comments
 (0)