Skip to content

Use YEAR + ERA for XSD 1.0 calendar year numbering#1685

Open
gdesrosiers1805 wants to merge 2 commits into
apache:mainfrom
gdesrosiers1805:bugfix/daffodil-3084
Open

Use YEAR + ERA for XSD 1.0 calendar year numbering#1685
gdesrosiers1805 wants to merge 2 commits into
apache:mainfrom
gdesrosiers1805:bugfix/daffodil-3084

Conversation

@gdesrosiers1805

@gdesrosiers1805 gdesrosiers1805 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Change date/dateTime year handling from astronomical numbering
(uuuu/EXTENDED_YEAR) to year-of-era semantics (yyyy with YEAR/ERA), so
years conform to XSD 1.0 (no year zero; 1 BCE = -0001). This covers the
implicit/default calendar pattern, lexical infoset conversion, facet
parsing/narrowing, and the binaryCalendarEpoch.

Parsing, formatting, comparison, and XPath year extraction now read years
via YEAR/ERA instead of the astronomical EXTENDED_YEAR. Facet conversion and
the binaryCalendarEpoch route through DFDLCalendarConversion rather than
hardcoded calendar patterns; the latter gains negative-year support.

BCE years require an explicit era designator (G) with yyyy-based patterns.
Explicit uuuu patterns still parse astronomically, but extracted years and
infoset output are always rendered in XSD 1.0 numbering.

DAFFODIL-3084

case class DFDLDate(
calendar: Calendar,
override val hasTimeZone: Boolean,
prolepticBCE: Boolean = false

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the need for this proleptic flag. Doesn't the Calendar contain all the necessary era information in the ERA field?

We also still have a number of references in our codebase to EXTENDED_YEAR and patterns with uuuu--that feels like a bug to me. Should we instead be setting/getting the YEAR and ERA components everywhere and using yyyy in all of our patterns? And removing all references to EXTENDED_YEAR and uuuu?

So the idea is we let ICU parse things according to dfdl:calendarPattern. When we want to create a string for the infoset or in one of the path fn:year-from-date*() functions we get the YEAR and ERA fields, and if the ERA field is BC we negate the year. Whether or not the pattern has a G in it shouldn't matter I think?. And for unparse, we read a date from the infoset and set the YEAR to the absolute value of the year and set the ERA to BC if the year was negative. It is a little extra work (which it looks like you already have logic for in a few places), but it avoids the potential of confusing EXTENDED_YEAR with YEAR. Seems we don't really want EXTENDED_YEAR or uuuu appearing anywhere in our codebase, except for tests making sure 'uuuu' patterns work correct.

Note I think this change would mean that it's impossible to use dfdl:calendarPattern to model BC years using yyyy without a G indicator (which is maybe the original reason for switching to uuuu and EXTENDED_YEAR many years ago?). But I think that's correct. Many DFDL semantics are based on ICU, and that is just a limitation of the yyyy pattern. If you want a pattern that is of the form yyyy-MM-dd without a G era indicator, and you want negative years to indicate the era, then you must use uuuu with the assumption that your years are astronomical. If your years aren't astronomical, then DFDL can't currently model them, unless it's added a new feature to support negative yyyy patterns.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the prolepticBCE flag is not recording the ERA. It looks at which pattern is used, which is why it checks that the pattern does not contain a G. No G means there is no explicit era designator, so for a BCE value the era was carried by the sign and the year was parsed astronomically (the uuuu form). If it is BCE but the pattern contains G, the era is named explicitly and ICU parses it correctly, so the flag stays false and nothing is adjusted. But if it is BCE and the pattern has no G, the date is represented astronomically which has a year zero and so is off by one from XSD 1.0 for BCE, and the flag is set true to mark that the read-out needs an adjustment to match XSD 1.0. Perhaps a better name for that flag would have been astronomicalYear?

The need for that flag stemmed from the fact that I changed EXTENDED_YEAR to YEAR + ERA (for the most part, other than where special logic was needed to help correct the astronomical representation of BCE dates), but still kept the uuuu pattern as the default/implicit pattern. So if we change the default pattern to yyyy, it eliminates the need for that. The only issue, as you already mentioned, is that we won't be able to represent BCE years without the G marker, which seems to be the correct thing to do. So I'll update any BCE tests that previously used the implicit uuuu pattern to explicitly use a yyyy pattern with a G marker and era-worded input (e.g. BC), since negative years won't parse under yyyy on their own. This should simplify things — it removes the flag, the read-side and unparse adjustments, and the EXTENDED_YEAR/uuuu references.

@gdesrosiers1805 gdesrosiers1805 force-pushed the bugfix/daffodil-3084 branch 2 times, most recently from b1e550a to 4ff33bc Compare June 17, 2026 05:48

@stevedlawrence stevedlawrence left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like, this feels liek the write approach. But I think we need to take it a bit farther and remove remaining patterns.

Comment thread daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/Facets.scala Outdated
Comment thread daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/Facets.scala Outdated
Comment thread daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/Facets.scala Outdated
Comment thread daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/RestrictionUnion.scala Outdated
Comment thread daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/RestrictionUnion.scala Outdated
Comment thread daffodil-core/src/main/scala/org/apache/daffodil/lib/calendar/DFDLCalendar.scala Outdated
Change date/dateTime year handling from astronomical numbering
(uuuu/EXTENDED_YEAR) to year-of-era semantics (yyyy with YEAR/ERA), so
years conform to XSD 1.0 (no year zero; 1 BCE = -0001). This covers the
implicit/default calendar pattern, lexical infoset conversion, facet
parsing/narrowing, and the binaryCalendarEpoch.

Parsing, formatting, comparison, and XPath year extraction now read years
via YEAR/ERA instead of the astronomical EXTENDED_YEAR. Facet conversion and
the binaryCalendarEpoch route through DFDLCalendarConversion rather than
hardcoded calendar patterns; the latter gains negative-year support.

BCE years require an explicit era designator (G) with yyyy-based patterns.
Explicit uuuu patterns still parse astronomically, but extracted years and
infoset output are always rendered in XSD 1.0 numbering.

DAFFODIL-3084
@gdesrosiers1805 gdesrosiers1805 force-pushed the bugfix/daffodil-3084 branch 4 times, most recently from 292e31e to a5d067f Compare July 3, 2026 21:58
Replace the FacetTypes object and its (Facet.Type, Value) tuple
approach with a sealed FacetOrdered trait hierarchy where each XSD
facet type is its own class carrying both the original XML string
value and its converted BigDecimal representation.

Value conversion now happens once, eliminating the redundant
String->BigDecimal->String->BigDecimal round trips
described in the existing TODO comment. Narrowing logic moves into
each facet class, replacing scattered helper methods in Facets.scala.
NodeInfo.fromXMLString is used as the single source of truth for XSD
primitive conversions in value-space facets.

Also fixes two pre-existing bugs:
- signum() != 1 incorrectly rejected maxLength="0" as invalid since
  non-negative integers include zero
- Invalid facet values that previously threw raw NumberFormatException
  now produce Schema Definition Errors with facet name context

Also removes dead code checkRangeReturnsValue and checkRange from
RestrictionUnion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants