So we came to bleach, an HTML sanitizer. And bleach is great! It lets you pass a list of allowed tags and a dictionary of allowed attributes per tag, plus more, like filtering the allowed classes and styles per element. It also lets you pass a function that decides whether an attribute is allowed for a given tag.
Sadly, it didn't let you pass a function that decides whether a tag is allowed at all. So I went to GitHub, forked the repo and created a patch/pull request.
My patch allowed passing bleach.clean a function to check for the allowed tags. I also wrote unit tests for passing functions as allowed_attributes and allowed_tags. Unfortunately, the maintainer declined my pull request because it makes it very easy to change the behaviour of bleach from a whitelist to a blacklist. Well duh, that was the intention. But I do understand and respect that decision!
So is there another way to solve our little problem?
Whitelist in bleach
Let's take a look at how bleach does the whitelisting. Don't worry, it's really easy: bleach uses the given list of allowed tags like this:

    if element in allowed_elements:
        ...  # Do the rest of the parsing
But what if allowed_elements isn't a list? What if it's a custom object that just happens to implement __contains__?
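For readers who have not met this hook before: Python evaluates `x in obj` by calling `obj.__contains__(x)` when the object defines that method. The sketch below shows this with a made-up class (`AllowEverything` is only an illustration and not part of bleach); it does not need bleach installed.

```python
class AllowEverything(object):
    """Not a list, but usable on the right-hand side of `in`."""

    def __contains__(self, value):
        # Python calls this method to evaluate `value in obj`.
        return True


allowed_elements = AllowEverything()

print("p" in allowed_elements)       # True
print("script" in allowed_elements)  # True
```

Any code that only ever asks `element in allowed_elements` cannot tell this object apart from a real list.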
Blacklist script-tag instead of whitelisting everything else
Let's write a little blacklist object that inverts the behaviour of that membership check:

    import bleach

    class BlackList(object):
        def __contains__(self, value):
            # Everything is "allowed" except the script tag
            return value not in ['script']

    html = bleach.clean(html, tags=BlackList())
Done. This changes the whitelist of bleach.clean() into a blacklist that allows all tags except the script tag.
Our version is a little more advanced: it also takes a list argument to __init__ to set an extendable list of forbidden tags.
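A minimal sketch of what such an extended version could look like follows. This is not our actual code: the parameter name `forbidden` and the default of forbidding only the script tag are assumptions made for illustration.

```python
class BlackList(object):
    """Treats every tag as allowed unless it is in the forbidden list."""

    def __init__(self, forbidden=None):
        # Assumption: with no argument, forbid only the script tag.
        self.forbidden = list(forbidden) if forbidden is not None else ["script"]

    def __contains__(self, tag):
        # Inverted membership: "contained" means "not forbidden".
        return tag not in self.forbidden


blacklist = BlackList(["script", "iframe"])
blacklist.forbidden.append("object")  # the list can be extended later

print("p" in blacklist)       # True
print("iframe" in blacklist)  # False
print("object" in blacklist)  # False
```

Copying the argument with `list(...)` keeps later changes to the caller's list from silently altering the blacklist.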
Of course this depends on the way bleach works, which might change in future versions. But that is one of the reasons we have unit tests. Not only do they protect against developers changing one part of the code and breaking a completely different corner they didn't think of; unit tests also protect against changed behaviour in your dependencies...
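As a rough sketch of such a guard, the test below pins down the one behaviour the trick relies on: that the consumer only ever asks `tag in allowed`. To keep it runnable without bleach installed, `filter_tags` is a hypothetical stand-in for the library call; in a real suite you would call bleach.clean there instead and assert on the cleaned HTML.

```python
import unittest


class BlackList(object):
    """Reports every tag as allowed except the forbidden ones."""

    def __init__(self, forbidden=("script",)):
        self.forbidden = set(forbidden)

    def __contains__(self, tag):
        return tag not in self.forbidden


def filter_tags(tags, allowed):
    # Hypothetical stand-in for the dependency under test: it relies only
    # on the `in` check, like the whitelist snippet quoted in this post.
    return [tag for tag in tags if tag in allowed]


class BlackListContractTest(unittest.TestCase):
    def test_script_is_rejected(self):
        kept = filter_tags(["p", "script", "b"], BlackList())
        self.assertEqual(kept, ["p", "b"])

    def test_unknown_tags_pass(self):
        self.assertIn("marquee", BlackList())


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(BlackListContractTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If the dependency ever stops using a plain membership check (for example by iterating over the tags or copying them into a set), a test like this written against the real library fails right after the upgrade rather than in production.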