SVG: Properly handle values in exponential notation
Closed, Public

Authored by Sergey Sharybin (sergey) on Jan 21 2019, 12:37 PM.

Details

Summary

Some SVG exporters output small values in exponential
notation. There is no good reason to reject those files.

This change makes the parser accept a value in any notation.
It only does so in the path point parsing, since other areas
already deal with this correctly.

Also covered the array parsing with a unit test, which
can be run as a stand-alone application.
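
As a quick illustration of why accepting either notation is cheap in Python: the built-in float() already parses plain decimal and exponential forms alike. The sample strings below are made up for illustration and are not taken from the patch or its tests.

```python
# Illustrative inputs only (assumed, not from the patch): the same small
# value written in plain and in exponential notation, plus a larger one.
samples = ["0.00001", "1e-05", "1E-5", "2.5e+3"]

# float() accepts both notations, so no special casing is needed once a
# token has been isolated from the surrounding string.
values = [float(s) for s in samples]
```

The hard part is therefore splitting the string into tokens, not converting each token.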

Event Timeline

This patch lets me import SVG files with exponential notation successfully!

io_curve_svg/svg_util.py
32

this line has a space at the end

Remove trailing whitespace and duplicated file

I feel like there may be a simpler/faster way to do this, but since it works, I think it's fine as is.

io_curve_svg/svg_util.py
62

Spelling: ommits -> omits

This revision is now accepted and ready to land. Jan 22 2019, 8:32 PM

This parsing code completes all tests successfully.

import re

def parse_array_of_floats(values_encoded: str):
    stripped = values_encoded.strip()
    if len(stripped) == 0:
        return []

    parts = re.split(r"\s*,\s*|\s+", stripped)
    return list(map(value_to_float, parts))

def value_to_float(value_encoded: str):
    if len(value_encoded) == 0:
        return 0
    return float(value_encoded)

@Jacques Lucke (JacquesLucke), that is a good point. Unfortunately, it exposed one corner case that now behaves differently: the string "1-3" should be converted to [1, -3].

I can't find an actual SVG with a polygon in it that uses this odd form, but such a case isn't against the specification, and our old code was trying to deal with it.

Will update the patch soon with an extra unit test.
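
The "1-3" corner case can be reproduced with a minimal sketch of the separator-splitting approach from the earlier comment. The helper names split_on_separators and fails_to_parse are hypothetical and exist only for this illustration.

```python
import re


def split_on_separators(values_encoded):
    """Split on commas and whitespace only, as in the earlier suggestion."""
    stripped = values_encoded.strip()
    if not stripped:
        return []
    return re.split(r"\s*,\s*|\s+", stripped)


def fails_to_parse(token):
    """Return True when float() rejects the token."""
    try:
        float(token)
    except ValueError:
        return True
    return False


# "1-3" contains no comma or whitespace, so it stays a single token,
# and float() cannot convert that token.
tokens = split_on_separators("1-3")
```

Because the minus sign doubles as a separator here, a parser that only splits on commas and whitespace cannot produce [1, -3].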

Added extra unit test, which tests one of the old corner cases

Campbell Barton (campbellbarton) requested changes to this revision. Feb 1 2019, 12:24 PM
Campbell Barton (campbellbarton) added inline comments.
io_curve_svg/svg_util.py
36

We could expose C's strtod instead?

40

Should be compiled for reuse if it's going to be called many times.

41

Would assign a var to avoid 2x calls.

61

Use {',', ' ', '\t'}

This revision now requires changes to proceed. Feb 1 2019, 12:24 PM
io_curve_svg/svg_util.py
36

We could, but it is tricky, and not something I want to dig into now.

But fine to use it once it's exposed :)

This revision is now accepted and ready to land. Feb 1 2019, 12:36 PM

Updated parsing code that completes all tests successfully.

import re

match_number = r"-?\d+([eE][-+]?\d+)?"
match_first_comma = r"^\s*(?=,)"
match_comma_pair = r",\s*,"

pattern = f"({match_number})|{match_first_comma}|{match_comma_pair}"
re_pattern = re.compile(pattern)

def parse_array_of_floats(text):
    elements = re_pattern.findall(text)
    return [value_to_float(v[0]) for v in elements]

def value_to_float(value_encoded: str):
    if len(value_encoded) == 0:
        return 0
    return float(value_encoded)

@Jacques Lucke (JacquesLucke), how's that different/better?

It's shorter, it's faster (in many cases) and I'd argue it's more readable.

Btw, is ,,, also valid for [0, 0, 0, 0]?

@Jacques Lucke (JacquesLucke), yes, ,,, should be translated to [0, 0, 0, 0]. And this is where both of our implementations fail.

Good that I fixed my code already before asking the questions ;D

match_number = r"-?\d+([eE][-+]?\d+)?"
match_first_comma = r"^\s*(?=,)"
match_comma_pair = r",\s*(?=,)"
match_last_comma = r",\s*$"

pattern = f"({match_number})|{match_first_comma}|{match_comma_pair}|{match_last_comma}"
re_pattern = re.compile(pattern)

def parse_array_of_floats(text):
    elements = re_pattern.findall(text)
    return [value_to_float(v[0]) for v in elements]

def value_to_float(value_encoded: str):
    if len(value_encoded) == 0:
        return 0
    return float(value_encoded)
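
The match_number expression above has no branch for a fractional part, so a string such as "1.5" would be read as two numbers. The sketch below is a hypothetical variant, not the committed code: it assumes values may contain a decimal point or a leading "+", and it uses non-capturing groups so that findall returns plain strings. The ",,," result relies on Python 3.7 or later, where a non-empty match may start right after an empty one.

```python
import re

# Assumed extension of match_number: optional sign, digits with an
# optional fraction (or a bare ".5"), and an optional exponent.
MATCH_NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
MATCH_FIRST_COMMA = r"^\s*(?=,)"
MATCH_COMMA_PAIR = r",\s*(?=,)"
MATCH_LAST_COMMA = r",\s*$"

RE_ARRAY = re.compile(
    f"({MATCH_NUMBER})|{MATCH_FIRST_COMMA}|{MATCH_COMMA_PAIR}|{MATCH_LAST_COMMA}"
)


def parse_array_of_floats_sketch(text):
    """Parse a number list; an omitted value between commas becomes 0.0."""
    # With a single capturing group, findall yields one string per match;
    # the comma alternatives leave that group empty.
    return [float(v) if v else 0.0 for v in RE_ARRAY.findall(text)]
```

For example, parse_array_of_floats_sketch("1-3") gives [1.0, -3.0] and parse_array_of_floats_sketch(",,,") gives [0.0, 0.0, 0.0, 0.0].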

This revision was automatically updated to reflect the committed changes.