
@dmsnell commented Jul 11, 2025

Trac ticket: Core-63724

Replaces #7407, dmsnell#5
Coordination in #9256

Review feedback

  • Historically, the value and whole properties of the returned array contain the raw bytes parsed from the HTML (with some exceptions). This means that HTML character references are not decoded, which leaks the HTML representation into what should be a structural return value.
    • Should this refactor leave the messy return values in place, or should it decode the attribute values to match the view of the world developers likely have when calling it, namely that all values are normal PHP strings and not HTML text node strings?
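To make the question concrete, here is a small Python sketch. Python's standard `html.unescape` stands in for whatever decoding a PHP implementation would use, and the sample strings are illustrative rather than taken from the patch. Several raw spellings of one attribute value differ as bytes but decode to the same plain string, so code that compares raw values can disagree with what a browser actually sees.

```python
import html

# Four ways the same attribute value could be written in HTML source.
# These sample strings are illustrative; they are not taken from the patch.
spellings = ["a&b", "a&amp;b", "a&#38;b", "a&#x26;b"]

# Comparing raw source text: four distinct strings.
raw_distinct = set(spellings)

# Comparing decoded values: every spelling collapses to the same plain string.
decoded_distinct = {html.unescape(s) for s in spellings}

print(len(raw_distinct), len(decoded_distinct))  # 4 1
```

One caveat: `html.unescape` does not apply the HTML specification's special rule for legacy named references without a trailing semicolon inside attribute values (such as `&ampx`, which the spec leaves undecoded in an attribute), so it only approximates attribute-value decoding.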

Implementation

wp_kses_hair() is built around an impressive state machine for parsing the $attr of an HTML tag, that is, the span of text after the tag name and before the closing >. Unfortunately, that parsing code doesn’t fully implement the HTML specification and may be prone to mis-parsing.

This patch replaces the existing state machine with a straightforward use of the HTML API to parse the attributes for us, constructing a shell tag around the $attr string and reading the attributes structurally. The shell is necessary because an earlier stage of the pipeline has already separated what it considers the so-called “attribute list” from the tag.
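The shell-tag idea can be sketched outside WordPress. The Python example below is an analogy, not the patch: it swaps in the standard library's `html.parser` for the WordPress HTML API, and the shell tag name `shell`, the class `ShellTagAttributeCollector`, and the function `parse_attribute_list` are all hypothetical. `html.parser` is more lenient than, and not fully compliant with, the HTML specification's tokenizer, so treat this as an illustration of the structure only.

```python
from html.parser import HTMLParser


class ShellTagAttributeCollector(HTMLParser):
    """Records the attributes of the first start tag fed to it (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.attributes = None  # stays None until a start tag is seen

    def handle_starttag(self, tag, attrs):
        if self.attributes is not None:
            return  # only the shell tag is of interest
        self.attributes = {}
        for name, value in attrs:
            # html.parser lowercases names and decodes character references in values.
            # Like the HTML spec, keep only the first occurrence of a duplicate name.
            if name in self.attributes:
                continue
            # A valueless (boolean) attribute arrives as None; report it as True.
            self.attributes[name] = True if value is None else value


def parse_attribute_list(attr):
    """Parse an already-extracted attribute list by wrapping it in a shell tag.

    Hypothetical helper for illustration; "shell" is an arbitrary tag name.
    """
    collector = ShellTagAttributeCollector()
    collector.feed(f"<shell {attr}>")
    collector.close()
    return collector.attributes or {}


print(parse_attribute_list('class="wp-block" data-x=\'1 &amp; 2\' CHECKED id=one id=two'))
# {'class': 'wp-block', 'data-x': '1 & 2', 'checked': True, 'id': 'one'}
```

Reading attributes from a parsed tag, rather than scanning the string by hand, means quoting, duplicates, and boolean attributes are handled by the tokenizer. One limitation of this sketch: a stray `>` outside quotes in `attr` closes the shell tag early, and everything after it is silently ignored.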

Dependencies

@github-actions

Core Committers: Use this line as a base for the props when committing in SVN:

Props dmsnell.

dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 11, 2025
Trac ticket: Core-63694

`wp_kses_hair()` is built around an impressive state machine for parsing
the `$attr` of an HTML tag, that is, the span of text after the tag name
and before the closing `>`. Unfortunately, that parsing code doesn’t
fully implement the HTML specification and may be prone to mis-parsing.

This patch replaces the existing state machine with a straightforward
use of the HTML API to parse the attributes for us, constructing a shell
tag for the `$attr` string and reading the attributes structurally.
This shell is necessary because a previous stage of the pipeline has
already separated what it thinks is the so-called “attribute list” from
a tag.

Props: dmsnell
@dmsnell force-pushed the html-api/refactor-wp-kses-hair-take-3 branch from 68c7746 to b476339 July 11, 2025 22:37
@dmsnell force-pushed the html-api/refactor-wp-kses-hair-take-3 branch from 550921c to 5565848 October 21, 2025 09:23