Skip to content

feat/refactor: 'expr.transform()'; fixes & refactors to 'replace()' and rule-application#301

Open
samueltlg wants to merge 15 commits into
cortex-js:mainfrom
samueltlg:expr-transform
Open

feat/refactor: 'expr.transform()'; fixes & refactors to 'replace()' and rule-application#301
samueltlg wants to merge 15 commits into
cortex-js:mainfrom
samueltlg:expr-transform

Conversation

@samueltlg
Copy link
Copy Markdown
Contributor

@samueltlg samueltlg commented Apr 11, 2026

(Code commentary incoming)

Sorry that this was delayed again... Upon thinking it was ready to go, encountered a new set of problems/niggles, requiring a fair amount of investigation.
The change here (fixes) are the sum of quite a lot of tedious debugging!

Most of these are fixes - and an update to 'replace()' - but you may wish to discard the head/feature change here (transform()). Personally - think it is very useful (not something have added to the personal/bug-fix fork up to this point, but is something will certainly use...)

(For reference, the failing 'compilation-performance' tests, were notably also failing before any of the changes made here))


Update:

Queries & points of contention (after scanning through changes):

  • 'transform()'
    • The naming of this method is at your discretion. I think 'transform' is on-point enough (designation here borrowed from other libraries).
    • TransformOptions option 'targets'
      • Having this as an alternative to 'match' may not be to your taste; but presently this offers a solution and matching route not offered by traditional pattern-matching (- referential-identity; predicate-based). An alternative altogether, or a merge with 'match', may be preferred.
      • !There still exists the problem of this not being able to match against 'cached' (common) engine-exprs (One,Zero,Infinity...): although, for the majority of use-cases, it is hard to imagine the requirement for applying transformations to these expressions anyway (perhaps some cases of 'replace').

Identified further bugs, for reference:

  • In a 'replace()' context...:

    • During expression matching, the non-canonical expr. is used... Sometimes, however, consequently there can be an unfortunate non-match where a match would otherwise take place. Example:
      • ce.expr(['Exp', 3]).match(['Exp', '_')]) will not match, due to the non-canonical pattern matching against the canonical variant of ['Exp', 3], which notably canonicalizes to a 'Power' FN expr. (['Power', 'ExponentialE', 3]).
      • BoxedString has a missing replace() method (such that replacing always to return 'null').
  • As highlighted in fix (workaround): ensure intended canonical - non-canonical - match pattern...:

    • Requesting the canonical variant of wildcard-containing expressions causes entries (BoxedValueDefinition) to be added to global scope for the captured wildcard symbol (e.g. _ or _c).
      • If this point were addressed, this workaround should be able to be retracted.
  • A couple of extant issues present with the 'structural' Expression form:

    • Requesting the structural form of canonical vs. non-canonical exprs. presently can produce differing output.
      • For the case of a Rational number (representation), for inputs as produced by either ce.parse('2/5', {form: ...}) or ce.expr(['Divide', 2, 5], {form: ...}):
        • In case of boxing/parsing with form 'structural' or 'raw' (and then 'expr.structural'), the representation is 'Divide' (same in both cases)
        • In case of first obtaining the canonical variant, and then requesting the structural, the result is instead a BoxedNumber (type = rational, with an ExactNumericValue value).
      • For the case of an expression with an associative/commutative function head:
        • If obtaining a structural form through requesting this from a canonical expression, then the result will be both sorted (the operands) and flattened.
        • If in contrast boxing/parsing directly as 'structural' - or requesting the 'structural' formfrom of a 'raw' expr. - then expression/operands are neither sorted, or expression flattened.
      • ^Personally here have identified the source of this to be the fact that the 'canonical' variant has access to the OperatorDefinition: whilst in other cases not.

Some possible future directions:

  • expr.subs() - and possibly one or two others (methods) - could also admit a 'form' specification over 'canonical'.
  • If sticking with keeping expr.transform():
    • As per above, TransformOptions 'targets' could be merged with 'match' - or some category of alternative - according to preference.
    • A verbose switch for this method: with secondary/meta data returned such as stats on replacement quantity, replacement-expression instances (or a map of replaced->replacements), transformation-specific data (e.g. 'steps' for simplify). Perhaps overall expression validity
    • An async variant commensurate to 'evaluateAsync()', and others.

…attern variant comparison (rule application)

(this change is marked as a *workaround* rather than a 'fix' per se, since the issue it tackles instead likely has its root in the current behaviour regarding the binding of 'wildcard' symbol variants (-outside the context of pattern matching / replacement.
(See the closing of this message for a further remark on this point.))

Explanation:
- Currently, because canonicalization of expresssions involves wildcard  binding as part of symbol binding (this may be undesirable / a bug), it is presently necessary that a new scope be created upon 'match' pattern canonicalization (performed for optimization purposes) in rule application:  with this otherwise running the risk of present wildcard symbols being bound to definitions in the current scope...

For example, before this change:
```
ce.expr(['Add', 'x', 3]).replace({ match: ['_', '__'], replace: 'y'});

// →Expectedly captures wildcards, and replaces
// top-level expression.
// *However*, the canonical-request of the 'match' expression within 'applyRule'
// results in symbol/wildcard '_' being attributed a function-definition
// (type='function'): within the scope in which this 'expr.replace()' was called

// Consequently, a subsequent replace call involving a universal wildcard
ce.expr(['Add', 'x', 3]).replace({ match: ['Add', 'x', '_')], replace: 5 });

// → Returns 'null'... (assuming this call made with an unmodified/same scope as the previous)
// Ultimately because the same point-of-canonicalization results in the Universal
// Wildcard in this instance uptaking the previous, 'function' definition, resulting
// in a MathJson-internal type-error for the canonicalized match-pattern variant
// in this instance, resulting in an absent wildcard in the canonical variant,
// and leading to an early exit from rule-application 'applyRule()'

//For illustration, the canonical-variant of the initial, non-canonically boxed
// `['Add', 'x', 3]` pattern results in the canonical-variant as:

[ "Add", "x", [ "Error", [ "ErrorCode", "'incompatible-type'", "number", "function", ], ], ]
```

*Note*:
- This 'fix' may no longer be necessary, if canonicalization of wildcard-containing expressions were to no longer perform name-binding on these (this may be unintentional?)
…ition'

Before: passing around a rule with '{ match: .., condition: 'non-delimited LatexString }' would fail to yield/parse a condition.

Now: cases such as the previous result in a parsed condition (yet, presence of display/inline mode delimiters *should* still be admissible)
…fyNonCommutativeFunction

(This inadvertently resulted in *full*-rule simplification of all operands for all functions
Correcting this has resulted in *no broken tests*)
…umeric operands of 'lazy' operator definitions

Explanation
------------
'evaluateNumericSubexpressions()' (function, which is notably called
within the 'simplify' process only for operator-definitions with flag
'lazy' set to 'true', and operator 'Divide', presently fully evaluate
all (more-or-less) numeric operands during simplification.
At least a couple of problems exist with this, mentionably that
'simplify' is, use-case depending, likely not called with intention to
*evaluate all numeric* operands/expressions. However, related to this,
and really a more primary issue here, is that proceeding with this
unconditionally is troublesome in terms of simplifying in the context of
any given rules: particularly when the ruleset is a custom one, or an
incomplete subset of the default ruleset.
For example, consider the simplification of an 'Add' expression with
evaluable 'Ln' or 'Log' arguments... If the accompanying simplify
rule-set does not with it include rules covering the simplification
(perhaps, 'evaluation') of this set or category of functions, then it is
dubious that these should be evaluated prior to be being summed...
(instead - given a ruleset consisting of a single rule which handles
'Add', an input such as 'Ln(e) + Ln(e)' should understandably reduce
this to '2 * Ln(e)' (instead of simply '2'); the case therefore of a
fuller simplification being dependent on presence of separate rules
handling Ln/Log.
(Another illustrative example, is that at time of commit, input '3!'
will remain unsimplified (unevaluated)... illustrating the need for a
requiring/wrapping operator (with a 'lazy' flag) for its evaluation)

\*Stemming from this consideration, it may be worth considering a
single, or set of 'evaluation'-rules: decoupled from respective
simplification rules (plain/eager evaluation of trig., vs.
identities/construction, for example)
… considering replacement expr. sameness, also consider expression 'form'

- A switch to 'form' allows specification of 'structural', and also an explicit specification of 'raw'.
- As per inline documentation, the same/previous behaviour can be replicated, by first ensuring the entire input bears the same form as any speified replacement form.

- When considering 'successful' rule application, any differences in expression form before-> after is now considered, in spite of these anyway bearing structural similiarity.

- The same change would likely benefit from being made on `expr.subs()`.
In disuse: but set for employment particularly in test-files
'transform()' is a wrapper around 'replace()' - always acting recursively - and offers an ergonomic means to match sub-expression targets and apply common fundamental operations/transformations in an expression 'tree', without the need for custom 'Rule' logic / workarounds.

Change includes tests.

- The typing of 'TransformOptions' has necessitated that 'SimplifyOptions' be shifted from its previous home of 'types-definitions(.ts)', to 'types-kernel-evaluation.ts'. It has also been made generic over Expr/SemiExpr/CE, in a similar way as for its sibling types (such as 'RuleReplaceFunction').

if (canonical && match) {
const awc = getWildcards(match);
pushSafeScope(ce);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(See commit message - this change indeed may not be necessary, if the described present behaviour of the wildcard-symbol canonicalization (global-scope) is not as it should be).

* **Default** is: `'left-right'` (standard post-order)
*
*/
direction: 'left-right' | 'right-left';
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At your discretion! (this is anyway achievable in a workaround-way with custom rule-functions and what-not).

(I did also intend to update the 'once' rule such that it would be typed as something like 'one-rule' | 'one-replacement' | false: where 'one-rule' would assume the old behaviour of 'true', but 'one-replacement' would command an even stricter policy (currently 'true' applies at the rule level: i.e. a rule may apply multiple times; 'one-replacement' would further restrict this to one replacement/within one rule). This would also somewhat add a 'raison d'être' for this property ('direction'), too.
However, again, both of these can, with a bit of trouble, be achieved through RuleFunctions.)

// Normalize the condition to a function
let condFn: undefined | RuleConditionFunction;
if (typeof condition === 'string') {
const latex = asLatexString(condition);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking back on this did not appear to break anything. Concludes that it was one of either an outdated or mistaken check... the 'asLatexString' call appears to presently require that any string [input] be delimited by an '$' or '$$'

// canonical one. This allows subsequent .canonical calls to perform full
// canonicalization.
if (isFunction(expr) && expr.isCanonical) {
if (isFunction(expr) && forms.length > 0) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was not a test to this... but changing this appeared to correct some behaviours whilst outlining the tests for transform. Throughout the partial-canonicalization process, appears that sometimes the expression is 'intermediarily' not marked as canonical.

@@ -407,7 +407,7 @@ function simplifyNonCommutativeFunction(
let last = result.at(-1)!.value;
if (last.isSame(expr)) return steps;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small oversight... but was inadvertently applying the entire default rule-set to operands! (indeed; no test failures result)

type EvaluateOptions = KernelEvaluateOptions;
type Rule = KernelRule<Expression, ExpressionInput, ComputeEngine>;
type BoxedRule = KernelBoxedRule<Expression, ComputeEngine>;
type BoxedRuleSet = KernelBoxedRuleSet<Expression, ComputeEngine>;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(These re-constructions only previously used by SimplifyOptions (now re-located))


/**
*
* Process and transform this expression *recursively* by applying one of a set of predefined
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This doc.) an outline of justification for this method

* @category Boxed Expression
*/

export type SimplifyOptions<
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly not the perfect place for this (i.e. 'evaluate') - but seemed most sensible given the options

/** Available transformation types for `Expression.transform()`
*
* @category Boxed Expression */
export type Transformation =
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Others could be possible)

* ::Note
* The 'extended' matching routes available here are unique to *transform()* and facilitate
* convenient and more expressive matching in the context of recursive traversal.*/
targets?: Expr | Expr[] | MatchConditionFunction<Expr>;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible point of contention: distinct matching route from 'match'

@arnog
Copy link
Copy Markdown
Member

arnog commented May 10, 2026

Thanks for putting in the substantial work here, @samueltlg — the bug analyses in the commit messages were really helpful in understanding what each change is solving. There's a lot of good material in this PR.

To make it easier to land incrementally, I've cherry-picked four of the lower-risk fixes from this PR into #303 with regression tests. Authorship is preserved on each cherry-picked commit:

  • applyRule no longer returns null after operand-level rewrites
  • Rule-application optimization (early exit on canonical-variant wildcard loss)
  • Object-rule condition strings work without $...$ delimiters
  • Object-rule condition strings are canonicalized so they can evaluate

Once #303 lands, a rebase off main here will pick those up cleanly.

For the remaining changes, a few things I'd like to address before the rest is ready to merge:

Cleanup

  • test.only / describe.only markers in transform.test.ts (about 13 of them) need to be removed so the tests run in CI.
  • Lint is currently failing with real issues, not just style. The Multiple exports of name 'LatexString' (and similar for LatexDictionaryEntry, ParseLatexOptions, Parser, SerializeLatexOptions) are coming from a recent refactor on main that split global-types.ts into multiple types-*.ts files — your branch needs to pick a single source for those re-exports. There's also a no-this-alias violation around line 720, and prettier wants a sweep.

Splitting

The PR is ambitious enough that I'd like to review a few pieces independentlybefore they all land together. Could we split it into roughly:

  1. expr.transform() feature — the new method, its types, and the transform.test.ts file. This is self-contained and the most straightforward piece to review on its own.

  2. ReplaceOptions.canonicalform — this is a public API breaking change without a deprecation path. I'd like to discuss the migration story before merging: at minimum we'd want to keep canonical as a deprecated alias for one release, and the type change should be called out in CHANGELOG. Worth a small dedicated PR so this gets visibility.

  3. The simplify.ts BASIC_ARITHMETIC operand guard — this is the change you flagged as a workaround. It causes three previously-passing tests to be converted to test.todo (10!/(3!*7!), two exp(log(x)±y) cases). Could you open a tracking issue describing the underlying simplification problem? I'd rather understand the root cause and either fix it properly or accept the workaround with eyes open, rather than have it slip in alongside other changes. Happy to help debug if useful.

  4. The canonical.ts forms.length > 0 change (commit 12cabb53) — this one I'd like to look at separately because it changes how partially-canonicalized expressions are wrapped, and I want to confirm the snapshot drift it causes (e.g. Power(Multiply(j, 4), -1)Divide(1, Multiply(j, 4))) is intentional and consistent with the rest of the codebase.

The remaining smaller fixes (pushSafeScope consolidation, direction option, form-cast policy refinement) can probably go with whichever PR they're most adjacent to.

I realize this is more bureaucracy than ideal. The reason for the splitting ask is that this PR touches enough load-bearing code (rule application, simplification, canonical-form policy, the public replace() API) that I want each piece to land with focused review. It also makes bisecting much easier if anything regresses.

Thanks again — looking forward to landing this incrementally.

arnog added a commit that referenced this pull request May 10, 2026
* fix: do not return 'null' in 'applyRule' if replacement has applied recursively/to operands

* refactor: optimize rule application by exiting earlier on match-pattern canonical-variant wildcard loss

* fix: parse object-Rule condition strings without '$' delimiters and canonicalize

- Drop the '$...$' wrapping requirement so bare LaTeX condition strings parse correctly via ce.parse().
- Always canonicalize the condition so it can evaluate against substitutions.

Co-Authored-By: samueltlg <[email protected]>

* refactor: singular documentative type-cast

* test: add regression tests for cherry-picked rule-application fixes

Three tests covering the fixes from PR #301 commits 2bd9953, 591f870:

- applyRule preserves operand rewrites when function-replace returns
  undefined at top (tests the !result early-return path).
- Object-rule condition string without '$' delimiters is honored
  (tests parsing of bare LaTeX condition strings).
- Object-rule condition string is canonicalized so it can evaluate
  (tests that non-trivial conditions like 'a^2 > 0' work).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: samueltlg <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
@samueltlg
Copy link
Copy Markdown
Contributor Author

samueltlg commented May 10, 2026

Thanks Arno - those requirements sound reasonable and sensible... Indeed, the PR had initial intention of introducing expr.transform() perhaps only with one or two accompanying changes or refactors; However, the attempt to do this resulted in encountering a series of additional issues and requirements - mostly centred in 'replace()' - which simultaneously demanded attention. (The bureaucracy and preference for splitting are reasonable). Later on tonight (UTC+0) I'll make some headway in making those splits as requested (adhering to 'cleanup' points, and will contribute to CHANGELOG too...).

(Notably, the 'transform()' set of changes will have to be made finally... with its implementation requiring several of these included fixes/refactors).

@samueltlg
Copy link
Copy Markdown
Contributor Author

And glad that you are content with keeping expr.transform() - I think it's pretty cool 😎 (although, not sure if there is anything you wanted to do between the distinct 'targets' / 'matching' routes - if so just let me know & will make sure to implement).
If you'd find it easier to do the splitting yourself then also do inform; otherwise, I'll have a go at doing so today and tomorrow 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants