class: center, middle

# Whitelist to blacklist

Hacking bleach to work as a blacklist.

.footnote[© 2014 by Arnold Krille]

---

# Introduction

## Arnold Krille

- nerd'n'dad
- python
- django + web (until recently)
- gevent, greenlets + firewalls (since recently)
- knows HTML(5), CSS and stuff

.footnote[
[@kampfschlaefer](https://twitter.com/kampfschlaefer) |
[www.dancingwithpinguins.org](http://www.dancingwithpinguins.org) |
[www.arnoldarts.de](http://www.arnoldarts.de)
]

---

# Motivation

- User input for text (comments, descriptions etc.)
- Webapp with only protected access, no public comment-system or similar...
- Markdown with HTML allowed
- custom 'tags' within markdown blocks possible: \`>username>`\`

--

## HTML-sanitizing needed!

- started with a homemade tag closer
- tried regular expressions against script tags

---

# Bleach to the rescue

https://github.com/jsocol/bleach/ - http://bleach.readthedocs.org

Uses html5lib to validate/sanitize HTML via a real DOM tree

Easy to use:

```python
>>> import bleach
>>> bleach.clean('an <script>evil()</script> example')
u'an &lt;script&gt;evil()&lt;/script&gt; example'
```

---

# Bleach to the rescue (cont.)

Bleach is a whitelist:

```python
>>> import bleach
>>> bleach.clean('<b>bold</b>', tags=['b'])
u"<b>bold</b>"
>>> bleach.clean('<i>not slanted</i>', tags=['b'])
u"&lt;i&gt;not slanted&lt;/i&gt;"
```

--

### .red[But we want a blacklist :-(]

---

# Modifying bleach

bleach allows

- filter-maps for allowed attributes (and styles) on tags
- filter-functions for attributes (and styles) on tags

Why not allow a function as tag-filter?

- see [PR131](https://github.com/jsocol/bleach/pull/131)

--

- .red[rejected because it would allow turning the whitelist into a blacklist ;-)]

--

- Well, duh. What else?

---

# Hacking bleach

How does it check the allowed tags?

From [bleach/sanitizer.py line 37](https://github.com/jsocol/bleach/blob/master/bleach/sanitizer.py#L37):

```python
    if token['type'] in (tokenTypes['StartTag'], tokenTypes['EndTag'],
                         tokenTypes['EmptyTag']):
*       if token['name'] in self.allowed_elements:
            if 'data' in token:
```

What does `in` do in python?

--

It calls `__contains__` on the right-hand object!

---

# Hacking bleach

Let's define our own object!

```python
>>> class BlackList(object):
...     def __contains__(self, value):
...         return value not in ['script']
...
>>> bleach.clean('<b>Bold</b>', tags=BlackList())
u'<b>Bold</b>'
>>> bleach.clean('<script>evil()</script>', tags=BlackList())
u'&lt;script&gt;evil()&lt;/script&gt;'
```

Done. Changed the (more secure!) explicit whitelist into a blacklist.

---

# Hacking bleach

- Don't do this at home! It works nicely in our environment, it will fail horribly in other environments!
- Knowing a language is nice, knowing the internals of a language is nicer.
- Look at the double-underscore functions! For example:
  - `in` calls `__contains__`
  - `isinstance()` calls `__instancecheck__` (looked up on the class's metaclass)
  - ...

---

class: center, middle

# Fin

Thanks for your attention

.footnote[
[@kampfschlaefer](https://twitter.com/kampfschlaefer) |
[www.dancingwithpinguins.org](http://www.dancingwithpinguins.org) |
[www.arnoldarts.de](http://www.arnoldarts.de)
]
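
---

# Backup: the double-underscore hooks without bleach

A minimal sketch of the two hooks named in the talk, in plain Python 3 so it runs without bleach installed. The helper `is_allowed` and the classes `QuacksMeta`, `Quacks` and `Duck` are hypothetical names made up for this slide; `is_allowed` only imitates the membership test from the sanitizer excerpt and is not bleach's code.

```python
class BlackList(object):
    """Looks like a container: every name is 'in' it except the blocked ones."""

    def __init__(self, blocked):
        self.blocked = frozenset(blocked)

    def __contains__(self, value):
        # `value in obj` ends up calling type(obj).__contains__(obj, value)
        return value not in self.blocked


def is_allowed(tag_name, allowed_elements):
    """Hypothetical stand-in for the `in self.allowed_elements` check."""
    return tag_name in allowed_elements


class QuacksMeta(type):
    """isinstance(obj, cls) consults __instancecheck__ on the metaclass of cls."""

    def __instancecheck__(cls, instance):
        return hasattr(instance, "quack")


class Quacks(metaclass=QuacksMeta):
    """Anything with a `quack` attribute counts as an instance."""


class Duck(object):
    def quack(self):
        return "Quack"


whitelist = ["b"]
blacklist = BlackList(["script"])

print(is_allowed("b", whitelist))       # True
print(is_allowed("i", whitelist))       # False: not listed
print(is_allowed("i", blacklist))       # True: not blocked
print(is_allowed("script", blacklist))  # False: blocked
print(isinstance(Duck(), Quacks))       # True: has a quack attribute
print(isinstance(object(), Quacks))     # False
```

The calling code never changes: the same `in` expression gives whitelist behaviour with a list and blacklist behaviour with the custom object, which is also why the trick is fragile whenever a library copies or iterates over the value it was handed.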