Merge lp://staging/~stolowski/unity-lens-shopping/markup-cleaner-fix into lp://staging/unity-lens-shopping

Proposed by Paweł Stołowski
Status: Merged
Approved by: Michal Hruby
Approved revision: 31
Merged at revision: 26
Proposed branch: lp://staging/~stolowski/unity-lens-shopping/markup-cleaner-fix
Merge into: lp://staging/unity-lens-shopping
Diff against target: 345 lines (+273/-26)
5 files modified
Makefile.am (+1/-1)
configure.ac (+1/-0)
src/markup-cleaner.vala (+59/-25)
tests/unit/Makefile.am (+28/-0)
tests/unit/test-markup-cleaner.vala (+184/-0)
To merge this branch: bzr merge lp://staging/~stolowski/unity-lens-shopping/markup-cleaner-fix
Reviewer                  Review Type  Date Requested  Status
Michal Hruby (community)                               Approve
Review via email: mp+127275@code.staging.launchpad.net

Commit message

Fixed regular expression for capturing html tags; fixed replace_cb logic and added unit tests for MarkupCleaner.

Description of the change

Fixed regular expression for capturing html tags; fixed replace_cb logic and added unit tests for MarkupCleaner.

Michal Hruby (mhr3) wrote :

40 + internal static const string HTML_MARKUP_RE = "(</?)\\s*([^>]*?)\\s*(/?>)|(\\&(?!(([a-z]+)|(#\\d+));))";

> (?!(([a-z]+) -> perhaps (?!(\S{1,6})) instead? (the entities can be uppercase and have digits)

Please add `<a href="wooo"><small>Click me!</small></a>` to the unit tests.

review: Needs Fixing
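
To make the concern above concrete, here is a minimal Python sketch that applies the quoted pattern (with the Vala string escaping removed) to the suggested test input. It is not the MarkupCleaner code from the branch. The helper names `tags` and `bare_ampersands` are hypothetical, and it assumes Python's `re` treats this pattern the same way GLib's regex engine does.

```python
import re

# The pattern quoted in the review, minus the Vala string escaping.
HTML_MARKUP_RE = re.compile(
    r"(</?)\s*([^>]*?)\s*(/?>)|(\&(?!(([a-z]+)|(#\d+));))"
)


def tags(text):
    """Return (opener, body, closer) for every tag the pattern captures."""
    return [
        (m.group(1), m.group(2), m.group(3))
        for m in HTML_MARKUP_RE.finditer(text)
        if m.group(1) is not None
    ]


def bare_ampersands(text):
    """Return the offsets of ampersands the pattern treats as unescaped."""
    return [
        m.start()
        for m in HTML_MARKUP_RE.finditer(text)
        if m.group(4) is not None
    ]


if __name__ == "__main__":
    print(tags('<a href="wooo"><small>Click me!</small></a>'))
    print(bare_ampersands("Tom & Jerry &amp; &#38; &AMP; &frac12;"))
```

Because the lookahead only accepts lowercase letters or a numeric reference, `&AMP;` and `&frac12;` are reported as bare ampersands, which is the case the comment about uppercase names and digits points at.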
28. By Paweł Stołowski

Also support entities with numbers. Added test case for SMALL tag surrounded by A tag.

Michal Hruby (mhr3) wrote :

> (?!(([a-z]+) -> perhaps (?!(\S{1,6})) instead? (the entities can be uppercase and have digits)

As discussed on IRC, pango/glib recognizes only 5 named entities (amp, gt, lt, apos, quot) and will throw an error otherwise.

29. By Paweł Stołowski

It turns out pango supports a limited set of entities, so filter all named entities except for amp, lt, gt, quot, apos.

30. By Paweł Stołowski

Use named position in the regex for better readability.

31. By Paweł Stołowski

Lowercase entity names. Unknown entities are converted to raw string instead of being discarded.
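
The entity handling described in revisions 29 and 31 can be sketched in Python as follows. This is not the Vala code from the branch. The names `SUPPORTED`, `NAMED_ENTITY_RE` and `normalize_entities` are hypothetical, and the sketch assumes that "converted to raw string" means the unknown entity is kept as literal text with its ampersand escaped.

```python
import re

# The five named entities that pango/glib accepts, per the discussion above.
SUPPORTED = {"amp", "lt", "gt", "quot", "apos"}

# Named entities only; numeric references such as &#38; are left untouched.
NAMED_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def normalize_entities(text):
    """Lowercase supported entity names and escape unsupported ones."""

    def replace(match):
        name = match.group(1)
        if name.lower() in SUPPORTED:
            # Supported entity: emit it with a lowercase name.
            return "&" + name.lower() + ";"
        # Assumption: an unknown entity is kept as literal text,
        # with its ampersand escaped so the markup stays valid.
        return "&amp;" + name + ";"

    return NAMED_ENTITY_RE.sub(replace, text)


if __name__ == "__main__":
    print(normalize_entities("A &AMP; B &nbsp; C &lt; D &#38; E"))
```

Keeping the unknown entity as visible text, rather than dropping it, means the user still sees something where the source had `&nbsp;` or similar, and the result contains only entities from the supported set.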

Michal Hruby (mhr3) wrote :

Great now! +1

review: Approve
