How to use HTML Tidy with Python to validate HTML content

Tidy is a well-known tool for validating HTML and XML content. It identifies mistakes and corrects many of them automatically. To use it from Python, you can use pytidylib:

sudo apt-get install tidy
pip install pytidylib
 
Or, alternatively:

easy_install pytidylib
 
Or, if you are behind a proxy:
pip install pytidylib --proxy "prxy:port"

Then, in the Python source code:

from tidylib import tidy_document, tidy_fragment
htmlFragment = """
<h1>An HTML example</h1>
<a href="http://www.blogger.com/blogger.g?blogID=4882386696817687861#">my link</a>
"""
# tidyoptions is the dictionary shown below; define it before this call
htmlFragment, errors = tidy_fragment(htmlFragment, options=tidyoptions)

You can pass options to control tidylib's behavior more finely:

tidyoptions = {
"indent": "auto",
"indent-spaces": 2,
"wrap": 72,
"markup": True,
"output-xml": False,
"input-xml": False,
"show-warnings": True,
"numeric-entities": True,
"quote-marks": True,
"quote-nbsp": True,
"quote-ampersand": False,
"break-before-br": False,
"uppercase-tags": False,
"uppercase-attributes": False
}
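
The errors value returned by tidy_fragment is a plain string with one diagnostic per line, so for validation it can help to turn it into structured data. Here is a minimal sketch, assuming the usual Tidy report format ("line N column M - Severity: message"); the names parse_tidy_errors and validate_html are hypothetical helpers, not part of pytidylib:

```python
import re

# Assumed Tidy diagnostic format, e.g.:
# "line 1 column 1 - Warning: missing <!DOCTYPE> declaration"
_DIAGNOSTIC = re.compile(r"line (\d+) column (\d+) - (\w+): (.*)")


def parse_tidy_errors(report):
    """Turn Tidy's text report into (line, column, severity, message) tuples.

    Hypothetical helper: lines that do not match the assumed format are skipped.
    """
    diagnostics = []
    for line in report.splitlines():
        match = _DIAGNOSTIC.match(line.strip())
        if match:
            row, col, severity, message = match.groups()
            diagnostics.append((int(row), int(col), severity, message))
    return diagnostics


def validate_html(html, options=None):
    """Run Tidy on a full document and return the parsed diagnostics.

    Hypothetical helper: tidylib is imported lazily, so parse_tidy_errors
    stays usable even where pytidylib is not installed.
    """
    from tidylib import tidy_document

    _, report = tidy_document(html, options=options)
    return parse_tidy_errors(report)
```

Calling validate_html(some_html, tidyoptions) returns a list you can filter by severity, for example to reject content only when an entry with severity "Error" is present.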
 
 