Merge lp://staging/~facelessuser/beautifulsoup/lxml-fix into lp://staging/beautifulsoup

Proposed by Isaac Muse
Status: Merged
Merge reported by: Leonard Richardson
Merged at revision: not available
Proposed branch: lp://staging/~facelessuser/beautifulsoup/lxml-fix
Merge into: lp://staging/beautifulsoup
Diff against target: 12 lines (+1/-1)
1 file modified
bs4/element.py (+1/-1)
To merge this branch: bzr merge lp://staging/~facelessuser/beautifulsoup/lxml-fix
Reviewer Review Type Date Requested Status
Leonard Richardson Pending
Review via email: mp+371305@code.staging.launchpad.net

Commit message

This fixes the current breakage with xmlns caused by lxml 4.4.0+

Description of the change

This fixes the current breakage caused by lxml 4.4.0 where xmlns with no defined prefix would be treated as an attribute 'xmlns:' which would cause the xml document to be malformed. The issue is caused by the fact that lxml now returns, in this specific case, the prefix as 'xmlns' and the name as an empty string instead of None. Moving forward, bs4 will now check if the name is either None or empty in order to handle the scenario properly.

To post a comment you must log in.
Revision history for this message
Isaac Muse (facelessuser) wrote :

I'm not sure if we want to force " obj.name = name" to be None in this case? Or just assign the empty string. Let me know what you think. In that case, it may make more sense to do something like this:

    def __new__(cls, prefix, name, namespace=None):
        if not name:
            name = None

        if name is None:
            obj = unicode.__new__(cls, prefix)
        elif prefix is None:
            # Not really namespaced.
            obj = unicode.__new__(cls, name)
        else:
            obj = unicode.__new__(cls, prefix + ":" + name)
        obj.prefix = prefix
        obj.name = name
        obj.namespace = namespace
        return obj

Revision history for this message
Leonard Richardson (leonardr) wrote :

It's a small thing, but I think it's better to make sure name is set to None. https://www.w3.org/TR/xml-names/#defaulting says: If there is no default namespace declaration in scope, the namespace name has no value.

"No value" says None to me, not the empty string. The lxml change seems to be a side effect of a change made for convenience, rather than a change made for better compliance with a spec.

Revision history for this message
Isaac Muse (facelessuser) wrote :

> It's a small thing, but I think it's better to make sure name is set to None.

I kind of figured that would be the preference :). The more I had thought about it the more I thought it would make sense to force the convention to remain the same as it always was. I can update accordingly.

Revision history for this message
Isaac Muse (facelessuser) wrote :

Never mind, it sounds like you committed a fix already :).

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches

to status/vote changes: