Python lxml: syntax to selectively remove inline style properties?

I'm using Python 3.4 with the lxml.html library.

I'm trying to remove the border-bottom inline styling from HTML elements that I've targeted with a CSS selector.

Here's a code fragment showing a sample td element and my selector:

import lxml.html

html_snippet = lxml.html.fromstring("""<td valign="bottom" colspan="10" align="center" style="background-color:azure; border-bottom:1px solid #000000"><font style="font-family:Times New Roman" size="2">Estimated Future Payouts</font> \n            <br/><font style="font-family:Times New Roman" size="2">Under Non-Equity Incentive</font> \n            <br/><font style="font-family:Times New Roman" size="2">Plan Awards</font> \n        </td>""")
# cssselect() returns a list of matches; take the first one
selection = html_snippet.cssselect('td[style*="border-bottom"]')[0]
selection.attrib['style']
>>>>'background-color:azure; border-bottom:1px solid #000000'

What's the proper way to access the inline style properties so I can remove the border-bottom property from any element I target with my selector?


One approach is to split the style attribute value on ;, build a map from CSS property name to value, remove border-bottom from the map, and rebuild the style attribute by joining the remaining entries with ;. Sample implementation:

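# cssselect() returns a list, so handle every matched element
for element in html_snippet.cssselect('td[style*="border-bottom"]'):
    # split on the first ":" only, so values such as url(http://...) survive
    items = [item.split(":", 1) for item in element.attrib['style'].split(";") if ":" in item]
    properties = {key.strip(): value.strip() for key, value in items}
    properties.pop('border-bottom', None)
    element.attrib['style'] = "; ".join(key + ":" + value for key, value in properties.items())
    print(lxml.html.tostring(element))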

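This is fragile, though: it is plain string splitting, not CSS parsing, so a value that itself contains ; (inside url(...) or a quoted string, say) will break it.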
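
The split-and-rejoin idea can also be wrapped in a small helper that works on the style string alone. This is only a sketch, not a real CSS parser: the name remove_css_property is made up for illustration, and it assumes no declaration value contains a ; of its own. It keeps the remaining declarations in their original order, ignores case in the property name, and tolerates a trailing ; or stray whitespace.

```python
def remove_css_property(style, name):
    """Return `style` without any declaration for the property `name`.

    Hypothetical helper: naive string splitting, so it assumes that no
    value contains a ";" (e.g. inside url(...) or a quoted string).
    """
    kept = []
    for declaration in style.split(";"):
        # partition() splits on the first ":" only, so colons inside
        # a value (such as "url(http://...)") are left untouched
        prop, sep, value = declaration.partition(":")
        if not sep:
            continue  # empty chunk (e.g. after a trailing ";") or malformed
        if prop.strip().lower() == name.lower():
            continue  # this is the property being removed
        kept.append(prop.strip() + ":" + value.strip())
    return "; ".join(kept)


style = "background-color:azure; border-bottom:1px solid #000000"
print(remove_css_property(style, "border-bottom"))
```

With lxml, the result could then be written back for each element returned by cssselect() using element.set('style', new_style), or the attribute dropped with del element.attrib['style'] when the returned string is empty.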


Alternatively, here is a rather "crazy" option: dump the data into an HTML file, open the file in a browser via Selenium, remove the property via JavaScript, and print the element's HTML afterwards:

import os
from selenium import webdriver
from selenium.webdriver.common.by import By

data = """
<td valign="bottom" colspan="10" align="center" style="background-color:azure; border-bottom:1px solid #000000"><font style="font-family:Times New Roman" size="2">Estimated Future Payouts</font> \n            <br/><font style="font-family:Times New Roman" size="2">Under Non-Equity Incentive</font> \n            <br/><font style="font-family:Times New Roman" size="2">Plan Awards</font> \n        </td>
"""
# wrap the cell in a table so the browser keeps the <td>
with open("index.html", "w") as f:
    f.write("<body><table><tr>%s</tr></table></body>" % data)

driver = webdriver.Chrome()
driver.get("file://" + os.path.abspath("index.html"))

# find_element(By.TAG_NAME, ...) replaces find_element_by_tag_name, which Selenium 4 removed
td = driver.find_element(By.TAG_NAME, "td")
# clearing the property in the DOM removes it from the inline style
driver.execute_script("arguments[0].style['border-bottom'] = '';", td)

print(td.get_attribute("outerHTML"))

# quit() ends the session and shuts down the browser
driver.quit()

Prints:

<td valign="bottom" colspan="10" align="center" style="background-color: rgb(240, 255, 255);"><font
        style="font-family:Times New Roman" size="2">Estimated Future Payouts</font>
    <br><font style="font-family:Times New Roman" size="2">Under Non-Equity Incentive</font>
    <br><font style="font-family:Times New Roman" size="2">Plan Awards</font>
</td>