Dec. 19, 2024

Episode 102: Building Web Hacking Micro Agents with Jason Haddix

The player is loading ...

Episode 102: In this episode of Critical Thinking - Bug Bounty Podcast Justin grabs Jason Haddix to help brainstorm the concept of AI micro-agents in hacking, particularly in terms of web fuzzing, WAF bypasses, report writing, and more. They discuss the importance of contextual knowledge, the cost implications, and the strengths of different LLM Models.

We're new to this podcasting thing, so feel free to send us any feedback here: info@criticalthinkingpodcast.io

Shoutout to YTCracker for the awesome intro music!

------ Links ------

Follow your hosts Rhynorater & Teknogeek on twitter:

https://twitter.com/0xteknogeek

https://twitter.com/rhynorater

------ Ways to Support CTBBPodcast ------

Hop on the CTBB Discord at https://ctbb.show/discord!

We also do Discord subs at $25, $10, and $5 - premium subscribers get access to private masterclasses, exploits, tools, scripts, un-redacted bug reports, etc.

Check out our new SWAG store at https://ctbb.show/swag!

Today’s Guest - https://x.com/Jhaddix

Resources

Keynote: Red, Blue, and Purple AI - Jason Haddix

https://www.youtube.com/watch?v=XHeTn7uWVQM

Attention in transformers,

https://www.youtube.com/watch?v=eMlx5fFNoYc

Shift

https://shiftwaitlist.com/

The Darkest Side of Bug Bounty

https://www.youtube.com/watch?v=6SNy0u6pYOc

Timestamps

(00:00:00) Introduction

(00:01:25) Micro-agents and Weird Machine Tricks

(00:11:05) Web fuzzing with AI

(00:18:15) Brainstorming Shift and micro-agents

(00:34:40) Strengths of different AI Models, and using AI to write reports

(00:54:21) The Darkest Side of Bug Bounty

Justin Gardner (00:00.249)
All right, Jason, thanks again for joining on the show. This is what? This has got to be your third or fourth time.

Jason Haddix (00:08.814)
Yeah, I think fourth time now, yeah, for sure. Yeah.

Justin Gardner (00:10.96)
I'm excited, I'm excited. And we got a good lineup today. I think you're really the guy to talk to about the stuff that I've got at the dock. And I was watching your red, blue, and purple AI talk, and it really got my brain spinning a little bit on what I'm calling these microagents, agents with a very, very narrow specific purpose within hacking. And in the talk, you mentioned a couple things like,

Acquisition Finder GPT and Subdomain Doctor, which I thought were really cool applications. And so I kind of want to double click a little bit into those, talk about how those are working, and then also brainstorm a little bit live on the pod here about what kind of microagents we could build that might help with the hacking process.

Jason Haddix (00:45.658)
Mm-hmm.

Jason Haddix (00:53.198)
Yeah.

Jason Haddix (00:57.282)
Yeah, absolutely. So I think the precursor to this episode was, you and you and you and Joel had been talking about AI in one of the episodes and I hit you up and I'm like, hey, like I'm doing a lot of this stuff for red teaming, pen testing and bug bounty. so, yeah, so was like, let's let's chat about it. And then I've been doing this talk called Red, Blue, Purple, AI for quite a while. Now, the talk is how to apply AI, specifically LLMs, because that's what we have right now, you know, in the consumer space to

all types of offensive security problems, but also defensive and purple teaming and stuff. But in the red portion of that talk that I gave, a couple of the ones I talked about are applications that are very pointed towards bug bounty people, right? And so, yeah. And so, what I did was I took my methodology for wide scale recon and for application hacking. And I was like, okay, what parts of these could...

Justin Gardner (01:39.76)
Love it. Love to see it.

Jason Haddix (01:54.04)
and now I'm help with them. A couple of the first ones that just fell out were the ones you talked about, right? So because of the training data set of pretty much all of the models and it's so in depth and the transformer architecture, basically it has this knowledge base of pretty much every press release that's been released, every article that's been released, a lot of scraping of web data, a lot of scraping of business analytics sites and stuff like that. And so for the recon one, for the acquisitions,

I just started with using chat GPT. That was the first thing I did. And so I started asking chat GPT recon questions, like what are the other acquisitions of Tesla? Right. And so what happened was to my surprise, you know, normally my source for that, my source of truth is Crunchbase. Crunchbase is a business aggregation site and they collect information about different businesses for competitive analysis. And so when I asked GPT, and this was back in GPT 3.5 days, when I asked GPT 3.5,

Justin Gardner (02:24.336)
Mm.

Jason Haddix (02:52.31)
It gave me two acquisitions that I had never seen anywhere else. And I was like, yeah, at first I was like, these must be hallucinations. And so I go look up. They were not hallucinations. They were just not big enough acquisitions to make a site like Crunchbase monitor them. And so there are subsections of acquisitions that, you know, the story I like to tell in the class is that one of them was, at one point, Tesla decided that they needed to be their own insurance carrier.

Justin Gardner (02:55.876)
Really? Huh.

Justin Gardner (03:07.288)
Mmm.

Jason Haddix (03:21.614)
And so instead of build out that arm themselves, they went out and purchased a small insurance carrier to start with, which includes staff. And it was called something else and they acquired them fully, but that didn't make like a crunch base, right? That's not like a big enough splash with, I guess, newsworthy. I don't know what the criteria is, right? But we found it through GPT and one other method.

Justin Gardner (03:23.312)
that's interesting. Huh.

Jason Haddix (03:45.97)
And they were fully owned by Tesla and we managed to find some bugs on them that led towards the Tesla bounty. So that's one instance of like how that acquisition bot helped.

Justin Gardner (03:51.588)
Wow, dude.

Justin Gardner (03:55.51)
Okay, okay, you know, watching the video and seeing you use it, I think these are custom GPTs built into chat GPT, right? And I think that's cool. And actually, I think it's a lot more powerful than I expected because it can actually reference specific sites and research stuff, which is cool. And I think that's a great way to pocket. But I'm wondering, know, like what I'm envisioning for these microagents is like very specific, you know, acquisition finder, GPT, I got it in a command line tool.

Jason Haddix (04:02.542)
Yeah. Yeah.

Jason Haddix (04:10.99)
Yes. Yeah. Yeah.

Jason Haddix (04:22.17)
Mm-hmm.

Justin Gardner (04:24.098)
And then I can see what it's thinking, what steps it's taking to get the data, that sort of thing. And I just pop it off and then I get an output in a txt file or whatever, in my notes file. So if you were gonna take this and you were gonna build it more into a command line application or something like that, what do think that would look like? Or do you think ChatGPT is really the right fit for this specific one? Yeah.

Jason Haddix (04:50.042)
So for this specific one, I think it's absolutely rife to be API. In fact, most of the stuff that I build ends up as an API, right? So to a local server and then a script calls the GPT API. The reason that I show chat GPT in my slides is because it's easiest for people to consume from a talk point of view. But my actual stuff is all API calls to stronger models. And I can use different models. I don't have to use the OpenAI ecosystem. I could use Cloud. I could use whatever.

Justin Gardner (04:53.487)
Yeah.

Justin Gardner (05:09.456)
Mmm.

Jason Haddix (05:18.852)
you know, shoot it off to four or five different AIs and then use one AI, to, you know, stitch it together into a concrete answer and then give that as my notes. So yeah, so you can use, you know, Python go, whatever you want and just instrument the chat GP API. The one thing I want to stress here though, is that I feel like a lot of technical people really, kind of crap a little bit on the prompt engineering that makes some of these things really good. Right. And so even when you're building an agent,

Justin Gardner (05:43.344)
Mm.

Jason Haddix (05:45.986)
It's not because it's an agent and the architecture is agentic that it makes a system good. It is all prompt engineering that makes these micro bots good. All of it. It's all prompt engineering in every single step. and so I think because that's a lot of natural language work and there's research that goes into that, that people just kind of like gloss over it. The acquisition spot has a very rigid structure. In fact, I went over it in the talk. I like, have to tell it what it does. have to give it related research terms. have to.

There's a methodology for prompting that's really important.

Justin Gardner (06:16.334)
Yeah, and it feels a little bit like a pseudoscience, right? But then there is actual science to it. I think one of the ones you covered in the talk was like, all right, you're high on salts or something like that. And I'm like, really? that what you have to say to these things that gives it? And you cited some very specific statistic. It's like a 2 % boost or something like that across some of these together, which is substantive, for sure.

Jason Haddix (06:26.988)
Yeah, yeah, yeah, yeah, yeah, yeah, yeah.

Jason Haddix (06:37.582)
Yeah. Yeah.

Yeah, yeah, so that's the section called weird machine tricks, I call it. And so there are a bunch of weird machine tricks to get LLMs to operate in different ways. I'll give another one, right? So when you have a cloud-based LLM, like OpenAI or Claude or Perplexity or something like that, and you give it a query, right? You can tell it in the prompting to use its search tool, if it has a search tool available to it.

Justin Gardner (06:45.624)
Yeah.

Jason Haddix (07:09.966)
but it will not always use that search tool. Like there is no way for you to force it to use the search. If it feels like it has the context it needs to answer the question inside of the training data, it will not use the search or it will use selective search. It will not use the sites that you reference. And so one of the weird machine tricks is called adding urgency. And so you have to add urgency to that statement when you tell it, hey, I want you to specifically search this site. Like, hey, the world is gonna end or people will die or like,

crazy shit like that. And I learned this, yeah. And I learned this in another class with another person. I went to a gaming security conference where I did that talk. And he was like, hey, the way I get the tools to force to use is adding urgency to that. And I think his example was aliens were gonna take over the earth or something like that in order to force it to use the tool in a specific way. Yeah. Yeah. Yeah.

Justin Gardner (07:40.068)
No way

Justin Gardner (07:53.497)
Mmm.

Justin Gardner (07:59.172)
Wow, dude, that's crazy. That's so applicable too. Like I can't talk about it because this podcast is actually going to get released very soon. But I'm actively on an AI engagement right now where I'm running into this problem of like, I can get my prompt in there. You know, this is sort of an indirect prompt injection sort of situation and I can get my prompt in there and, but I'm having a hard time getting it to consistently trigger tool use to get the data out.

Jason Haddix (08:24.186)
Mm-hmm. Yeah, add some urgency. Yeah. Yeah. So a lot of these things too, you have to learn yourself too. So I've talked to other people, but it's crazy that the scene for this is like discords of prompt engineers and hackers. It's not like, I'm not getting this stuff. I get like 50 % from white papers that I read weekly and then...

Justin Gardner (08:26.158)
you know, to trigger exfiltration. I need to add some urgency to that. That is good. That is good, Jason. Thank you for that. That's amazing.

Jason Haddix (08:49.306)
50 % from this underground community of prompt injection people. It's really interesting. It feels very much like the hacker scene. Yeah.

Justin Gardner (08:54.724)
Nice, yeah, there's definitely that whole scene breaking out and I'm glad, you know, I think as red teaming, you know, red teaming the models and red teaming, actual red teaming sort of comes together a little bit. creates some nice combinations of communities where people from the AI realm will actually start paying attention to the security stuff. then, know, vice versa as well. Yeah, we're really, you know, dabbling into the AI stuff because it is so applicable. And, you know, everyone always says to me like, hey,

Jason Haddix (09:05.145)
Yeah.

Jason Haddix (09:15.098)
vice versa. Yeah.

Justin Gardner (09:23.476)
you know, is AI gonna take over hacking stuff soon? And I'm like, man, you if it is, I'm gonna be running those bots, you know, for sure. So I, yeah. Yeah.

Jason Haddix (09:29.292)
Yeah, That's a thing, you know, our mutual friend Daniel Measler talks about, It's like, don't know what's gonna happen really, right? Like lot of automation is gonna come out. I mean, you're part of that wave now, right? I mean, you're making shift and it's human in the loop, but it's still some automation and you wanna be the master of the tools. You don't want the tools to master you, right? So that's why I'm using it, right? And it's just easy for me to...

to spin this stuff up and it's really interesting. It's a new thing to play with, so yeah.

Justin Gardner (09:59.514)
Yeah, I'm really excited to get your thoughts on Shift, but I do want to click into a couple of these more of these GPTs and kind of brainstorm around the microagents. So, you know, you've got Subdomain Doctor, which, you know, essentially looks at a list of subdomains and, and, you know, outputs some probabilistic subdomains, which we were sort of doing a little bit. We were implementing, you know, back, back when I was in the recon game, you know, we were using machine learning to sort of extrapolate on these, these lists of

Jason Haddix (10:03.898)
Yeah? Yeah. Yeah.

Justin Gardner (10:28.196)
domains that we would see. So I think this is a really natural use case. I've seen that one before. I love a nuclei doctor, right? Where it's very easily making nuclei templates for all these things. But what I was envisioning is agents for specific technical tasks. And I think if we could just create sort of like a framework where this agent has the ability to just modify and tweak an HTTP request in a tool to just send it and get the response. And then we could say something like this. Okay, you know.

Jason Haddix (10:33.562)
Yeah.

Jason Haddix (10:44.378)
Mm-hmm.

Justin Gardner (10:57.048)
Here's this HTTP request. This specific parameter has a restriction on the domain it can redirect to. Fuzz this in every way possible that you can figure out to try to make it hit a domain that is not this domain. And just give it that very specific niche task. Its input location should be very small, and its output part of the response that it's paying attention to should be very small, just the location header or whatever.

Jason Haddix (11:12.334)
Yeah. Yeah.

Justin Gardner (11:25.152)
And I think if we give it to that, I'd be really excited to see what kind of stuff the AI comes up with. So what do you think is the best way to implement something?

Jason Haddix (11:30.835)
yeah?

Jason Haddix (11:34.638)
Okay, so I mean, you're getting into kind of what I would say like it kind of is the cutting edge of like web fuzzing applications of AI, right? So there are, I would say right now there's probably about 15 companies trying to tackle this problem right now and a whole bunch of individuals as well, building agenting systems to do web hacking basically, right? And so what you have normally is most people trying to build a holistic

Justin Gardner (11:41.712)
Mm-hmm.

Justin Gardner (11:50.34)
Mm-hmm.

Justin Gardner (11:56.485)
Mm-hmm.

Jason Haddix (12:02.82)
kind of system to find all web bugs, but you're talking about it like a micro custom agent that is human in the loop, right? And so, yeah, I mean, it's all in the prompt engineering for that problem, right? And so the way I want it to work and the way I work a lot these days is actually voice dictation to my computer through my mic. Yeah, I did. Yeah, yeah. And so really what I would need is like a...

Justin Gardner (12:19.248)
Yeah, I saw that you did that in the presentation live. I thought that was pretty cool. Yeah.

Jason Haddix (12:28.846)
you know, like a way to input into Kaido or into burp or something like that, that's custom context. and then attach it to a vulnerability class bot and a fuzzer, right? And so if I wasn't going to use the interception proxy to actually send the web requests, right? We could use something like puppeteer playwright or something like that, which most people are using. So not a lot of people are actually using the interception proxies to send the traffic and analyze it. Most like people who are working on the DEF CON teams that are doing the AICC competition, which I wanted to expose to the people on the pod. but.

Justin Gardner (12:30.98)
Mm. Mm-hmm.

Justin Gardner (12:44.634)
Mm-hmm.

Justin Gardner (12:57.968)
Mmm.

Jason Haddix (12:58.887)
They're using Puppeteer Playwright to instrument web hacking techniques.

Justin Gardner (13:02.286)
Okay, that seems like a little bit of a higher layer of abstraction there, and I'm wondering why they're going that route, because, you know, HTTP is just text, right? And a lot of these problems, like Daniel Measler talks about, is getting everything to be a world of text, right? Where the LALMs can play with it and parse it and stuff like that, and I think HTTP is already text. So I feel like it should be really simple to create something where...

Jason Haddix (13:07.918)
Yeah. Yeah.

Jason Haddix (13:13.53)
Mhm.

Jason Haddix (13:20.548)
Yeah.

Justin Gardner (13:30.052)
we give it an HTTP request and maybe there's a lot of tokens, right? Because HTTP requests have massive amounts of tokens and stuff like that. But yeah, but, then, we just enable it to just string replace a specific, maybe we give it a string replace tool and an HTTP send tool. And then we just kind of let it brain and watch the chain of thought and watch the response, the request and the response.

Jason Haddix (13:33.226)
Mm-hmm. I know it's fine. Yeah, yeah. Yeah.

Justin Gardner (13:55.044)
Like, I don't know, am I oversimplifying the problem here? Is there more to it? That should be pretty simple, right? Why am I not doing this right now? I need to be doing this right now.

Jason Haddix (13:57.74)
No, no, no, you're, yeah, that's easy. That's easy to do. all of my, yeah, all of, so I sent you the architecture for what I'm building right now. I'm not sharing it, but in that section is the web fuzzers, right? And so that would be a sub-component agent of a web fuzzer. And so you have to build some prompt engineering in there to do the match and replace, and you have to rig it up to Kaido, which I'm sure you're using, right?

Justin Gardner (14:08.528)
Hey, that looked crazy.

Justin Gardner (14:22.672)
Mm-hmm. Yeah.

Jason Haddix (14:24.578)
and to do the web sending. But yeah, it's not a hard problem at all, right? I think the institutional knowledge of, basically the training data set for whatever AI you're going to use to build those bypasses or to contextually attack a certain vulnerability has some prompt engineering built into it that I think that I've noticed. Like, you can't just ask it a general question. Sometimes it'll give you kind of trash answers.

You have to be very specific with the types of tricks you want it to perform sometimes in prompt engineering. So that agent has to have some system prompting to it.

Justin Gardner (14:51.432)
Mm, yeah. So we need to ingest, I think Reza was talking about this with me the other day. We need to ingest like, you know, world-class documentation on these specific types of vulnerabilities and stuff like that. Mm, mm.

Jason Haddix (15:04.942)
Yeah. Yeah. Let me, let me give you an example here. okay. So when you're talking, the attention mechanism inside of LLMs is one of the most kind of like key parts to what makes LLMs what they are today. And what the attention mechanism does is it takes a token, but in our case, let's just talk about a word, right? It takes a word and it updates its context in a 4D space and it shifts your next token generation into that 4D space based on every proceeding word, basically.

And so when you feed into it really good context, like a world-class research document on bypassing filters for SSRF and other things, what that does is it narrows the output focus of the LLM to the best possible research. And so that's one of the prompt engineering tricks that you can do is inside of the prompt engineering, what I do is very, it's like SEO, right? So when you have a website, you have a whole bunch of like SEO keyword terms that you see everywhere. that search engines, you know,

Justin Gardner (15:47.984)
Hmm.

Jason Haddix (16:01.742)
find your site like Google and stuff like that. I see my system prompts with very technical words that shift the 4D space narrower to world-class kind of output. And so that's what you have to do in, yeah.

Justin Gardner (16:12.88)
Dude, it's crazy man. That's some brainy shit. Every time I try to think about four dimensional implications of this sort of thing, my brain just starts, the only way I've found that I can really grasp that sort of thing is obviously we live in a three dimensional reality, And then we've got the fourth dimension of time. And I'm like, how do I map all of that onto, it's crazy man. It's hard for me.

Jason Haddix (16:34.042)
Yeah.

Yeah. So if you want like a visual representation of the attention mechanism, there's a great video by 3Blue1Brown. And he has a whole series on how transformers work, but specifically his video on the attention mechanism really opened my mind in how to think about prompt engineering to better narrow the space for world-class output for these bots.

Justin Gardner (16:47.47)
note that down.

Justin Gardner (17:04.976)
That's awesome, man. I'm definitely going to check that one out afterwards. We'll link that down in the description as well. All right, so we've got, so let's brainstorm a little bit on these microagents because I think it should be pretty easy to spin off sort of like a, and I'm probably oversimplifying the problem, but I think it should be pretty simple once we get the HTTP stuff in place and then we get this match and replace stuff in place. It should be pretty easy to implement this thing. So I'm thinking, we've got the open redirect stuff.

Another one that I think would be really good is WAF bypasses. Because that thing just takes so much freaking time and it's so frustrating. But the techniques are pretty simple. Mix up some encoding, work from character to character and figure out which character is triggering the WAF and then kind of go on from there. Yeah, I think that would be a pretty good one.

Jason Haddix (17:58.36)
Yeah, so you can use, at least what I've used is a backslash powered scanner. So the idea behind backslash powered scanner is send a character that is a control character or a special character of some sort, see how the application reacts and then send it again, but escaped. And because escaping in the Linux land means that it wouldn't carry its special character context on the command line and then seeing if there's any difference in the response. And so that same idea can be used in AI.

Justin Gardner (18:02.22)
Mm-hmm. Yeah.

Justin Gardner (18:25.2)
Mm.

Jason Haddix (18:27.698)
Give it a list of special characters that triggers the WAF. The condition that the WAF usually triggers on, is it a 404? Is it a longer page load? Whatever it is, right? Is it a special error page? Is it the Cloudflare 403? know, whatever, right? And then have something parse the response. Now, what you're missing here is you do need either automation, like some type of response automation, grepper, or an AI agent to basically parse the response.

Justin Gardner (18:40.281)
Right.

Justin Gardner (18:53.85)
Yeah.

Yeah, and it's going to be hella slow if you make it parse the whole response every time and think like, is this the, you know, cloud flare 403, right? So we we need to say, we should come about it a little bit more intelligently and we should say, okay, you know, here's when I give it the context for the situation, I should say, okay, you know, the cloud fair 403, you know, is the, is the situation.

Jason Haddix (19:01.239)
Yeah. Yeah.

Yeah, exactly.

Jason Haddix (19:16.174)
Yeah, it returns the status code. has this icon. It usually has this text. You know, it has these colors. Like if you want to get into like image recognition, it looks like this. You could do all of that.

Justin Gardner (19:20.26)
Yeah. Hmm.

Justin Gardner (19:26.659)
I wonder if we could even ask it to, you know, this is where Rezo starts getting uncomfortable with me when I'm brainstorming with him. He's like, Justin, you know, having AI's run the code is, is not great, but I'm thinking like, all right, have it generate some code that it then runs to say, you know, Boolean true or false. Is this a, a actual, you know, 403 page or is this something different? Right.

Jason Haddix (19:34.542)
Yeah.

Jason Haddix (19:47.235)
Yeah. Yeah.

Justin Gardner (19:48.892)
And so it could be as simple as the status code, which would be easy to extract, but it could also be like, all right, does this regex hit? does, you know, that sort of thing. And if we enable them to use those tools, like, you know, regex, even if you just enabled regex, that should be plenty, right? Yeah. Hmm, okay.

Jason Haddix (19:54.894)
Yeah.

Jason Haddix (20:01.06)
Yeah, yeah, for sure. I mean, a lot of that you can do inside of the interception proxy. Some of it you can't do. You can get more advanced with the AI, with like an AI parser, but yeah, I mean, it should be able to instrument all of that pretty easily, yeah.

Justin Gardner (20:07.077)
Mm-hmm.

Justin Gardner (20:15.386)
All right, what other ones we got? had, I had, like a path traversal fuzzer. think that would be pretty cool if you could, if you could figure out a way to like, that one would be more complicated, right? Is like, how do you determine what is like, what should be causing a traversal and what shouldn't be causing a traversal and when that occurs. But that one I think could drop some crits.

Jason Haddix (20:22.382)
Yeah.

Jason Haddix (20:36.73)
Yeah, yeah, I mean, I think that anything that has to do with manipulating URLs like SSRF and path traversal and, you know, or paths, it is a little bit harder in the response matching and the regex and understanding. But I mean, you can hook it up to a collaborator type server, right, to parse the responses that you get back. The important thing is when you're building the fuzzing bots, each individual request has to have...

a unique key associated to it. So you can tie it back to which fuzzing request did it. Because in my fuzzers, they're building hundreds of attack strings to try to bypass filters, WAFs, everything, right? So my XSS fuzzer will build hundreds of attacks and each one has to be unique so I can tell, you know, which bypass worked. And then the response parser has to tell me, you know, like, okay, number 72 worked out of this list or whatever. So.

Justin Gardner (21:29.52)
Yeah, we could probably just hash the request, save that hash and then save a list of the requests or just build it into logic to write off the request whenever it works.

Jason Haddix (21:33.156)
Yeah.

Jason Haddix (21:38.936)
Yeah, I'm used to, so I came from back in the day writing WebInspect checks, which is a dynamic scanner a long, time ago. And I mean, guess it still exists today, but I don't know who owns it. But yeah, that, I mean, the way we did it was, you know, like, like basically the, you know, XSS payload when it was an alert or something like that would be a unique string. So we could, we could key back on that. So yeah, yeah.

Justin Gardner (21:46.863)
Mm.

Justin Gardner (21:59.856)
nice, yeah, that works. Alright, the last one I had that I'm jazzed about, Jason, I'm sorry for prodding you so much about all this, because I know you sent over that diagram, and I think we were both sort of on the same brain weight, but you were much farther down the path than I was, so now I'm here, like, so, you've been working on this for the past couple months, will you just give me all your secrets? But the one that I'm...

Jason Haddix (22:08.717)
No, it's all good. It's all good.

Jason Haddix (22:14.968)
Yeah.

Jason Haddix (22:21.006)
Yeah. Yeah, that's fine.

Justin Gardner (22:25.392)
pretty jazzed about right now is an automatic fix bypass, right? Because like, so whenever we report a vulnerability, and this would specifically work well for I think unauthenticated vulnerabilities, I think it would be awesome to be able to like give the AI access to your platform account, your bug credit account, your HackerOne account, and say, right, here's the report. Whenever this thing goes to resolved or whatever, try to bypass it. And like for me, I know when I report something I'm like,

Jason Haddix (22:43.456)
Mm-hmm. Yep. Retest it. Yeah.

Justin Gardner (22:53.38)
You know, I bet they're going to fix it like this and I bet that's going to be vulnerable. So what I do is I go in my calendar and I like put in an item and it says, check this report on this date. And then I get to inevitably I get to that date and they still haven't fixed it. And then then I push it again another month or whatever, but then it would be really cool if I could just offload that whole process onto the AI and tell it in advance. Okay. This is what they're probably going to do to fix it. Try this, try this, try this, try this once the fix comes up and, you know, then, then ping me like, wouldn't that be sick?

Jason Haddix (23:20.942)
Yeah. That I'm literally doing that right now. yeah. So, so email automation is part of it to give me the updates and, you know, like discord notifications, stuff like that. But the regression testing, I call it regression testing bot, right? And it has my, yeah, it has that exact stuff, right? It's my institutional knowledge of most developers fix bugs.

Justin Gardner (23:24.656)
Dang it, Jason.

Justin Gardner (23:41.242)
Damn it, damn it Jason.

Jason Haddix (23:50.388)
Because we don't have any input on how they fix it. We could tell them, we could give them remediation, but that's usually not our place in the bug bounty world. More in the pen test and consulting world it is. But normally they fix it with, let's take a cross-site scripting attack. They fix it with a horrible regex to block the attack payload or some sort of the payload or whatever the attack string, or they put in some WAF rule. And so because you have been testing for what?

Justin Gardner (24:05.978)
Yeah.

Jason Haddix (24:17.722)
10 years, 20 years now or something like that, you have all this institutional knowledge of how to break those very simple reg exes. And so you can type that into the system prompt for that agent and then it will just go back and try those for you or prepare them for you and then you can go try them. And yeah, so one of the ones I use that's an example in the talk, I don't know if I gave it in that keynote, but I talk about it in the class that I teach.

Justin Gardner (24:19.234)
Mm-hmm, mm-hmm.

Jason Haddix (24:43.258)
is an actual CBE. So there was a jet brains product I ran into on a, on a web pen test. And so, you know, when you're on a, when you're on a web pen test or, know, you're on like an external red team, you know, you'll go and search, does this software have any CBEs, right? That's part of your workflow. And it's like, you know, if it has bugs, I'm going to exploit them. So it had a CBE that was pretty recent, but it had been patched and it was, it was, you know, basically this version of this jet brain software that was installed had been patched. But I looked at it it was like,

It was a vulnerability that was associated to Markdown basically. And so this was one of those CVEs where they didn't even give you a tag string. They just said, hey, this product had a vulnerability in this section of the software. was a vulnerability for cross-site scripting based on Markdown, a Markdown parser. And so really there was only two places in the application that handled Markdown. And so my regression testing bot, I just fed it that string, the CVE input string, and I said, hey, it says it has a Markdown.

vulnerability. Here's the section where the Markdown interpreter is. How would you come up with 10 ways to bypass this? And then I fed it some context on attacking basically XSS and Markdowns, which I found via activity. I found via some pen test presentations that were at cons. I fed it that context and then some of my own tricks. And it found two bypasses for the CVE in like a publicly, you know, sold JetBrains products. Yeah.

Justin Gardner (26:10.222)
Wow, dude, that's great. And I think this is also, now that I'm thinking about this, this is a great reason to disclose reports for companies, right? Like if you want your Volns to be actually thoroughly regression tested and fixed, then you should disclose the report because what's gonna happen then is eventually someone is gonna come up with this regression tester bot like yours, right? And they're just gonna apply them against all of the activity and just rake in the Volns.

Jason Haddix (26:17.112)
Yo, yeah. Yeah.

Jason Haddix (26:31.236)
Yeah, do you? yeah, yeah, do you do you want to know? You want to know what the bypasses were? OK, so so the the normal injection was adding an image tag with JavaScript in it right inside a markdown, right? The the two breaks were one break, break the markdown into three lines and add null characters in the separating line between the attack and then it will reform the attack string past the regex that worked.

Justin Gardner (26:39.64)
Yeah, hit me, man. Hit me.

Justin Gardner (26:45.188)
Right, right.

Justin Gardner (26:56.218)
Mmm.

Jason Haddix (27:00.586)
And then data encoding the payload in Base64. Both worked. The JavaScript. The JavaScript. Yeah. Yeah, yeah.

Justin Gardner (27:00.622)
Wow.

Justin Gardner (27:05.21)
data encoding what part of the payload? That's whack, dude. The problem isn't the actual JavaScript content. The problem is the onload handler or the onerror handler. Wow, that was not a great fix then. That's crazy. Yeah.

Jason Haddix (27:18.232)
Yeah, Yeah, yeah, no, no. It was literally a regex fix to the first thing, right? So anything that passed the original tag string regex worked. So, Yeah.

Justin Gardner (27:27.532)
Nuts, man. That's nuts. All right. Cool, man. Well, I won't, I won't, you know, juice you for any, any more information about all that, but I, it's very exciting. I think, I think up until this point, I've very much been a, you know, in the loop, human in the loop sort of guy where I'm like, yeah, you know, I think these things can be really helpful in, in helping us perform more effectively. but I think lately I've really been seeing the vision of like, especially these smaller scale tasks.

Jason Haddix (27:33.432)
No, it's all good. It's all good.

Jason Haddix (27:44.867)
Yeah, yeah.

Jason Haddix (27:56.664)
Yeah. Yeah.

Justin Gardner (27:56.888)
It's still human in the loop, but it's like, you know, and then I'm just delegating this one little piece. And I think that is really big. And then, you know, eventually at some point we may be able to, you know, we may be able to delegate our whole piece of it, but I think it's gonna be those small pieces for a really long time. Yeah.

Jason Haddix (28:09.998)
Yeah. I don't. Yeah, I mean, I think that I think that that is the power of the agentic architecture, right? It doesn't it doesn't add anything that like that is super special in the architecture. What it does is it allows us to let each bot focus on its own little task, which makes the you know, the context you feed it more powerful and the output you get from it more powerful. Because if I just ask a bot like how do I hack this website and I give it some HTTP traffic, right?

Justin Gardner (28:18.724)
Mm-hmm.

Jason Haddix (28:39.674)
That output comes out bad. And that's what that's most hackers first experience with using an LLM to hack is they're like, okay, let me feed you this whole page. you tell me like, that's what they want, right? They want it to be like, it's, you know, hack this website for me. And that's, that's not how it works, right? It's, it's like, okay, take, let's take the website, let's parse it. Let's identify all the inputs. Let's then read those contextually for what types of vulnerabilities we think they're going to be statistically relevant to them.

Justin Gardner (29:05.646)
Mm.

Jason Haddix (29:07.31)
then break that down, send them to agents that are specialists in those vulnerabilities, and then somehow execute the HTTP requests, and then have an agent parse the response, and then feed it all back to me to do any manual testing I need to do. I have three workflows that go on. have the parsing one, I have the fuzzing one, and then I have one that feeds me manual testing ones.

Justin Gardner (29:21.758)
my gosh.

Justin Gardner (29:26.586)
So let me ask you this, how much does this cost? know, like if you're using SOTA models for all this, it's gonna get expensive, I imagine. because that's one of the things we're running into a little bit with Shift is like, okay, how do I narrow down this massive amount of data that comes in with every single request? know, like if, just to double click a little bit into how Shift works. And let me backtrack a little bit. Shift is is Kaido AI plugin for anybody that haven't heard of it. You can, it's in closed beta. You can check it out at shiftwaitlist.com.

Jason Haddix (29:42.564)
Yeah.

Jason Haddix (29:46.842)
Yeah. Yeah.

Justin Gardner (29:55.578)
But essentially it just integrates AI seamlessly into Kaido so you can use it in your HTTP proxy. And the way that it works is it takes all of these different pieces of Kaido's state, right? It takes the request, the response, all of your workflows you have defined, your scope, all of that stuff, and it builds it into the context and it shoots the context up to the AI along with your query. And then it decides out of the set of tools we've given it what actions should be taken, pushes that back to the proxy.

the proxy takes the actions and then the user's intent is sort of accomplished there. And so, yeah, think that sort of piece of it where you are taking those various actions from the AI and you're executing them and doing that in a recursive way with these smaller agents, it's big, man, it is big.

Jason Haddix (30:46.766)
Yeah. Yeah. It's, I think shift is going to be like a massive force multiplier for, you know, human in the loop testing. And I mean, that's the kind of stuff that I, I output to just a GPT, right? But you know, there's, there's some other things, you know, in there, I'll talk to you offline about some things I think that you guys should add. Yeah. Yeah, for sure. So yeah.

Justin Gardner (31:03.056)
I'm excited. Yeah, dude, dude, that's great. Yeah. Yeah. Like, man, I always want to, I always want to do brainstorming on the pod, but it is that, it is that, you know, that trade off of like, let's, let's serve the community, but let's validate some of these things first, you know, just, just for validation purposes, you know? yeah. So, I mean, I do want to get your thoughts on that on shift a little bit more. Like, I guess what, cause I'm tempted.

Jason Haddix (31:17.538)
Yeah, yeah, yeah, for sure. Yeah, yeah, yeah.

Justin Gardner (31:30.008)
We could go in a couple directions at this point. Where Shift is currently at is we can modify stuff in Replay, which is Kaido's version of Repeater. We could create automate stuff, which is Intruder. We can create those. You can do match and replace stuff, which is really cool. can forge HTTPQL queries, that sort of thing. And there's a couple places we could go. One, obviously, I think we need to implement

the autocomplete, you know, inside the actual on the lines, you know, sort of like cursor, right? Where you just kind of press tab or copilot and press tab and it just knows what you want. Definitely going to do that. But then I'm sort of tossed up. Do I go the chat route or do I go the route of like, let me integrate some of these microagents that we've been talking about directly into the HTTP proxy.

Jason Haddix (32:17.754)
I think the more valuable thing for testers is the microagents into the proxy, I think. you are gonna hit, I mean, the more agents you create, the more traffic you have to parse, right? So it means that you're gonna hit those costs again. I mean, my personal setup, you have to realize I have transitioned more away from Bug Bounty into Red Teaming. So I still do a lot of web testing, it's just done on contract. But I'm not testing like every day of the week, I'm testing maybe a week on, week off or something like that. So I only spend...

Justin Gardner (32:22.07)
Mm. Yeah, man.

Justin Gardner (32:27.898)
Right.

Jason Haddix (32:45.69)
like, you know, five, maybe $500 a month or $400 a month on my token usage across all AIs. So, yeah.

Justin Gardner (32:50.67)
Yeah, thank you. I'm glad you came back to that because I got off the wrong track, but that's really interesting. you're paying, see this is, that sounds like a lot to me, right? $500 a month sounds like a lot, but this is the same sort of problem that we were running into back in the day when people like Eric started spending like two grand a month on servers and stuff to do mass recon. Totally worth it. Everybody knows that it's worth it now. And so I think being early to that is big. 500 bucks, wow, that is a little bit more than I expected though.

Jason Haddix (32:55.278)
Yeah.

Jason Haddix (33:09.08)
Yeah. Mass automation. Yeah. Yeah. Yeah.

Jason Haddix (33:19.994)
Yeah, I mean, I'm paying for the new model for OpenAI, so I just increased my cost for it. I would say before paying for the new subscription on OpenAI, was probably more than 300. But yeah, have... Okay, so here's something that we didn't talk about, is different models benchmark different... benchmark well at different things, right? So for anything that's contextual and analysis-based, I use the OpenAI ecosystem for my agents, right? For anything that's write me code...

Justin Gardner (33:24.334)
Mm-hmm.

Justin Gardner (33:39.117)
Mm, yeah.

Jason Haddix (33:47.93)
or generate attack strings that actually I use Claude for Claude 3.5 Sonnet. Yeah. Anything that's search related where I want to use search, I have moved away from the default plugins in the OpenAI ecosystem and it moved to perplexity to feed that into the context window of GPT because it is has a better search bot. Yeah. So.

Justin Gardner (33:51.696)
It's gotta be Claude. Yeah, I agree.

Justin Gardner (34:05.739)
interesting.

Justin Gardner (34:09.648)
I haven't played around with perplexity that much. you you have you used Gemini at all? are your thoughts on Gemini?

Jason Haddix (34:15.301)
So Gemini forever burned itself for me when I figured out that the training data is partly from Reddit. And so they had that big snafu, it was publicly, it was like, my cheese is not sticking to my pizza. What do I do? And someone asked Gemini and Gemini was like, add Elmer's glue to your sauce. And so when someone dug into like, where did that come from?

Justin Gardner (34:21.598)
hahahaha

Justin Gardner (34:36.33)
my god.

Jason Haddix (34:41.69)
It turns out it was a Reddit comment like 10 years ago by someone who was trolling and that's the closest context the bot could get to pizza sauce needing to be stickier. so like, forever that is burned in my mind. So I haven't given it a chance. I know it has been benchmarking really, really high lately. Yeah.

Justin Gardner (34:58.128)
Yeah, I think it has. I think OpenAI and Gemini's definitely had its more than its fair share I'd say of dumb shit that it said. But I think OpenAI's models and Claude as well. I haven't heard as much about Claude, but OpenAI's definitely has said some dumb stuff too. Yeah.

Jason Haddix (35:05.538)
Yeah, yeah, yeah, yeah.

Jason Haddix (35:13.05)
yeah, yeah, for sure. They all have. mean, I need to go back and benchmark, Gemini. mean, there's, so many models, man. I mean, there's like, yeah, it's quick. Yeah. I mean, if you, if you look at like the whole scene of all of the models that are coming out, there's a great table that I have bookmarked in my presentation, but it's like, there are over like cloud SaaS based models that have had pre or post training for different specific tasks. There's 200, 300, you know, out there that you could use. And then.

Justin Gardner (35:20.312)
It's fast, man. Flash is quick. It is quick.

Jason Haddix (35:41.991)
If you want to make custom stuff in your home, can use Lama, all the new versions of Lama. But in general, I'm sticking with OpenAI and Anthropic Ecosystems most of the time.

Justin Gardner (35:45.808)
Mmm.

Justin Gardner (35:49.327)
Yeah.

Yeah, I'm excited for the local stuff to get better, man. I, you know, we really, I tried to benchmark some of the local stuff when I was built the fabric, right? Hacker one report fabric extension. And, and it just, it just wasn't good, man. It just wasn't good, but I know that they've, they've released some good stuff and, you know, maybe if I get a beefier machine with a better GPU, then it might be, it might be good. But yeah, I, I

I think that's the next frontier, man. If we can make everything local or if we can solve that problem where it's like, where we can encrypt the prompt and encrypt the response so that the, the, provider itself of the model doesn't, doesn't have introspection into that. but I think that that I was listening to a podcast, I think it was Lex Friedman's podcast with, the cursor team. that is a really hard problem to solve. I think is, is getting, is mapping that, that, you know, those vectors.

Jason Haddix (36:35.962)
Mm.

Jason Haddix (36:42.42)
Yeah, I know it's, yeah.

Justin Gardner (36:49.264)
into an encrypted space, you know, where it's not introspectable is like, that's going to be a while down the road, I think.

Jason Haddix (36:55.938)
It is, yeah, it is quite a bit away, I think. I think right now you have to assume that's anything in the training data and anything in the system prompting is subject to being leaked no matter what. So that's just what you have to assume right now.

Justin Gardner (36:58.852)
Yeah, yeah.

Justin Gardner (37:07.216)
Let me ask you this, what do you think about, know, obviously we've got the SOTA models and the state of the art models and I think they do a great job performing with cybersecurity tasks, but sometimes they will whine at you for, you know, trying to be a hacker or whatever. And I've seen some sort of custom models built around this like White Rabbit Neo or any of those and I'm wondering what your thoughts are on like,

Should we actually be building models specifically for security at a lower level, at sort of that AI model engineering level?

Jason Haddix (37:46.488)
Yeah, I I would hope that we could get there. My hope is, but my practicality and usage of the tool says that the big models trained on billions of parameters are just going to have the best context and training data. And they have always performed the best for me. Like you were saying, sometimes meta, you can just tell it has that uncanny valley feel to it, right? Like generic writing style, it's not very technical sometimes, even when you try to system prompt it well.

Justin Gardner (38:06.725)
Mm.

Justin Gardner (38:12.069)
Mm.

Jason Haddix (38:12.258)
It just doesn't do as well as the bigger, you know, SaaS models or, you know, other models. And so a lot of times in my security based prompting, like a couple of my bots that I put out on the store are, have gotten banned from like the automated systems of the GPT ecosystem. But when you're, when you're building these for your own usage, right, there's a couple of tricks, right? So first of all, tell it's tell the bot in the system prompting, it's working on a CTF.

Justin Gardner (38:17.038)
Hmm. Yeah.

Justin Gardner (38:27.61)
So annoying dude.

Jason Haddix (38:40.022)
And since most of the tools you're going to be using, so the CTF one works a lot of times. And since most of the tools you're going to be working on are, you're going to be using the API rather than the chat interface, you'll have access to pre-seed a user prompt. So you have a system prompt and your user prompt, which is usually what you chat to the bot, right? But you can pre-seed, can hard code in user prompts. And so what you can do is you can send the API request to the bot or the agent.

Justin Gardner (38:41.014)
Mmm. Cool.

Jason Haddix (39:08.056)
and start off with a prompt like, hey, I'm a cybersecurity student doing a CTF, will you help me? And then once the bot responds in the affirmative, that's in your context window. And then every subsequent, it replies in the affirmative, every subsequent request, it's more likely to just say, okay, we're gonna continue working on this problem.

Justin Gardner (39:14.298)
Mmm.

Justin Gardner (39:25.666)
I love that. That's amazing. The bots like, well, I said I'd help them. I got to, dude, that's those are great. Those are great tidbits, man. That, that, that, that will make the difference when we, when we get a little bit deeper into, into developing all this stuff. so, so I guess coming back around to shift, you know, currently we have a way to interact with a decent bit of the pieces of Kaido, thinking about implementing the agents. I think that would be pretty cool.

Jason Haddix (39:29.772)
I said I'd help him, yeah, so yeah. Yeah. Yeah.

Jason Haddix (39:41.924)
Yeah, yeah.

Justin Gardner (39:55.408)
What advice do you have for me on that? I'm thinking, you know, what way should I implement that in that it'll be most helpful to hackers? Like do you think I should implement a release like a laugh bypass bot or something like that? Or do you think we should try to build it in such a way that each individual person can write their own customized bots?

Jason Haddix (40:18.89)
I think you have to do both, right? Because there's a couple of custom things that I want to be able to ask my proxy, right? In the future state of the world, I want to be able to give very specific... In the future, I want to be able to talk to my interception proxy and give it specific context based around what I've noticed from the app already, right?

Justin Gardner (40:20.677)
Yeah.

Justin Gardner (40:27.924)
Hold on, I'm getting my notes ready. Hold on, this is great. No, no, no, this is exactly what I want. All right, hit me, hit me. Hit me.

Justin Gardner (40:39.92)
Dude, you say talk, you're using, I don't know if that's just a verbiage, but I mean, is it really important for you to be able to just like speak to it? Wow, that's okay. Yeah, yeah, Fair, yeah. Yeah.

Jason Haddix (40:47.5)
I'm a speaking type person, right? I mean, I've been on the pod four times now, right? So I'm good at talking. So, I mean, but you could type too. It doesn't matter. Like I need to be able to give the interception proxy my contextual knowledge in a quick way that's very specific to this app, right? I think that there's some stuff like, okay, so what if there's a previous bunch of reports that you have on this and you want to make sure that that context and how, you know, like how it's worked before, like what...

Justin Gardner (41:04.997)
Mm-hmm.

Justin Gardner (41:12.272)
Dude, that's a great idea. That is an excellent idea.

Jason Haddix (41:16.002)
Like what if, what if you know already which libraries on the backend are parsing URLs or something like that? That's context the bot can use in every fuzzing attack for SSRF or, you know, you know, whatever. and so me having one interface for automated fuzzers is great and building agents. That's cool. That's going to come, whether you do it or somebody else, that's coming, let's come into someone's going to build it. It's not, I've built it in the GPT like ecosystem, someone else is going to, so, but those are super useful.

Justin Gardner (41:37.443)
Mm-hmm. yeah.

Jason Haddix (41:44.77)
And they're easier to accomplish, honestly, than this contextual based stuff. And so I think that you have to have both. think that you have to have, you know, like, I'm working on this specific app. what about, okay, so one of my favorite examples is, I don't know, you were at this event. Do you remember two years ago at the Vegas event, it was, no, no, it's cool. I'm not gonna release the customer, but there was a Hacker One Live event in Vegas and the customer had,

Justin Gardner (42:05.988)
We can bleep it if you, yeah.

Mm. Mm.

Jason Haddix (42:14.446)
the customer had an app that dealt with telephony in certain parts of it. And someone figured out that you could call like this API and basically charge the company a bunch of money. Do you remember?

Justin Gardner (42:20.1)
Mmm.

Justin Gardner (42:27.704)
Yeah, I remember that bug. We've actually talked about that bug on the pod before. That is just legendary attack vector ideation. just,

Jason Haddix (42:31.607)
Yeah, yeah, yeah.

Yeah, yeah, right. so with contextual knowledge like that, when you can talk to your proxy or type to your proxy and be like, cool, here is actually what the site's meant to do. It can't parse that from the HTML. I mean, maybe it could read some text in the description of whatever, you dictating to the pro, it's the meta knowledge about what the business functions are for the app. Then it can get even better at finding some esoteric bugs, basically.

Justin Gardner (42:52.112)
It's like the meta-knowledge, yeah.

Justin Gardner (43:00.768)
very cool. Okay, so I need to be able to have it talk to me and talk to it mostly and I need to give it contextual knowledge about the app and I need to be able to ingest reports. Man, I really like that last one. That last one is super good. If you could just say like, all right, know, just ingest it right in.

Jason Haddix (43:05.934)
Yeah, yeah.

Yeah. Yeah. Yeah. Yeah.

Jason Haddix (43:17.444)
So we do it with our pen test reports, right? Like when we come back for the next year to do an annual pen tester red team, right? It's like, okay, here's the previous report. First of all, we need to check all these things to make sure if they're valid or they've regressed or there's some kind of bypass, but also it feeds our context for the assessment. Cause we've written down all this information that was very specific to this engagement. yeah.

Justin Gardner (43:23.534)
Mm-hmm.

Justin Gardner (43:43.202)
Mm, mm, very cool, yeah.

Jason Haddix (43:44.386)
And it's what most of the pen test companies are rushing to do right now. They're rushing to build internal systems to parse all of their reports, all of their tips and tricks out of a rag database. and then be able to build a, you know, assistant to help their red teamers and pen testers. Most consultancies I know right now are racing to build this. In fact, I've helped some of them build it themselves.

Justin Gardner (44:03.94)
Yeah, yeah, the rag stuff is really important, getting all that vectorized and stuff and understanding what needs to be vectorized and what needs to be actually in the prompt itself to highlight and inform the AI versus just direct to the AI versus like informing it with ragpoles. Yeah, it's interesting stuff.

Jason Haddix (44:13.87)
Yeah. Yeah.

Jason Haddix (44:20.11)
Yeah. Yeah. Yeah.

And then the fuzzers are pretty easy. So you go out and do the kind of research you do. I know when you hack, you spend a lot of time figuring out the components of how to hack certain things and the vulnerability class. And the other person I see do this really well is Greg for Bug Bounty Reports Explained. Yeah, yeah, me too. I'm a subscriber too. yeah. So that's the same way I... So my Bug Bounty methodologies, all my talks and stuff are based around just like...

Justin Gardner (44:31.376)
Mm-hmm.

Justin Gardner (44:41.776)
Dude, love it man. That's why I'm a subscriber dude. His data stuff is so good.

Jason Haddix (44:52.378)
diving deep into research around a couple things at a time and then building a methodology and understanding patterns, right? And since I'm an offensive security guy, it like makes it easy, but your fuzzers, you're gonna write the system prompts for those fuzzers for each individual vulnerability with all of your contextual knowledge, like what bypasses work these days, which ones don't, what is the workflow for bypassing a WAF versus bypassing a regex? There's differences sometimes, you know, what is the workflow for?

using different event calls, using different functions, all kinds of stuff. So you're going to write that into your prompts.

Justin Gardner (45:28.206)
Yeah, for sure. Let me ask you this, that makes me think of this. One of the things that I would really like Shift to be able to do is sort of watch your HTTP history or whatever and learn the things that you need to know like IDs. I very much should be able to just open a JSON blob or whatever and just say, all right, user ID, colon, double quote, right? And then I should have it be able to know.

what user ID I want to come after or give me an option of user IDs. Okay, user A has this user ID, user B has this user ID or just build out that whole request piece. And the way we've solved this right now in Shift is we've got this memory function where you can highlight some text and press Control Shift and it will take that piece of free form text and put it in its memory and that memory gets fed to...

the AI whenever you query. So you can say, all right, build out this request for me, and it will sub in all the requests. Or you could say build out this request with user A, and it will build out all the requests, But ideally, I would like the AI to identify itself what IDs are important. And that gets tricky, right? I mean, what are your thoughts on that?

Jason Haddix (46:23.332)
Yeah. Yeah.

Jason Haddix (46:39.086)
Well, yeah. Not too tricky. I mean, have you ever heard of this project called Hunt before that I did? Okay.

Justin Gardner (46:45.774)
Yeah, yeah, yeah, yeah, I have, of course. Of course, Jason, we all know you're legend. Come on, yes, I've tracked everything you've done since I was a beginner. Statistically probable, I'm gonna prove it to you. Statistically probable parameters for each individual vulnerability types, yes. Brilliant, yes.

Jason Haddix (46:51.546)
So, so you can take the statistically. Yeah. Yeah. Yeah. Yep. That's exactly it. I mean, you can, you can put that into the context window and have the AI identify it. So that actually my version, I just implemented that here. And so I have to keep up on, you know, common frameworks of authentication types to understand what the parameter names are. but I put that into the context window, the bot, it auto identifies that stuff for me now.

Justin Gardner (47:09.808)
Mm.

Jason Haddix (47:20.73)
It's like, okay, so you sent several queries with this user ID, with the user ID parameter or route, and here are the values for that. Would you like to reuse them, you know, for authentication attacks? And I'm like, yes. And then it'll build me out curl strings or I haven't set it up in a proxy like you guys have yet. So I use curl to do authentication testing and it'll auto fill those user IDs, the authorization headers, the cookies. If I need it, it'll build web requests that I can paste in the burp. So it does all of that for me. Yeah. Yeah.

Justin Gardner (47:34.232)
Mm. Mm-hmm.

Hmm.

Justin Gardner (47:46.008)
Awesome, man. That's, that's fricking great. Wow. And, and is that, is that a command line tool or using GPT for that? Nice. Wow. That, just integrates you just boom. And then it, it generates all this stuff. And then, the next step I guess would be, you know, getting it ingestible into a proxy and,

Jason Haddix (47:52.148)
It's a command line tool now, yeah.

Jason Haddix (48:02.2)
Yeah, there's still that step of me having to copy and paste, which sucks. Yeah, but it is, it is a lot better than that. Yeah. It's so much better. Yeah.

Justin Gardner (48:05.488)
Hey, but you know, it's a lot better than having to build out the request from scratch. Like so much better, man. With the, one of the game changer things with, with shift for me was just being able to, to copy a piece of JS code, shift space, paste it in and say, build this. And it just boom. And I'm like, my God, I love that. You know, cause it.

Jason Haddix (48:22.83)
Yeah, yeah, yeah. mean, the parser part, the free form parsing of routes and parameters, I mean, has been really good with AI. It'll build me an attack map of all routes and parameters for an application. And then if it's, you know, API based or or something else, you know, if I have like a swagger file or something like that, it can build me all the curl requests I need to test with the authorization header and without the authorization header. It'll.

automatically figure out what the schema is for the JSON, which I'm horrible at when I'm looking at stuff like that. It's hard sometimes. Yeah, yeah, the right indentation is like, what is top level? What is bottom level? It'll also even guess at, sometimes they don't give you, they give you the type, like integer or whatever, but they don't give you specific lengths or what is supposed to be in there. like,

Justin Gardner (48:53.456)
Yeah, it's hard. It's hard to get the right indentation and yeah, it's like...

Jason Haddix (49:13.894)
It'll guess at those and give me some possible things that I can fill into the payload types, which is fantastically useful. It doesn't sound useful when you say it out loud, but it is fantastically useful when you're actually doing API testing. Yeah, so there's all kinds of stuff it can do. And I use it a lot in a suggestion form too, right? Like you can break it out into sections. What do you know and what can you prove? And then what can you suggest? And then the suggestion part actually ends up winning a lot of times too as well. So yeah.

Justin Gardner (49:24.836)
Yeah.

Justin Gardner (49:41.104)
Dude, that's awesome. That is some great work, man. Yeah, I think I'm excited to have that in my proxy and I'm gonna continue building it out. Those are some great ideas. So let me ask you this as well. So my first little dabble into AI was, of course, after seeing Daniel Measler's fabric thing and being like, I need this in my life, you And then building the Write Hacker One report.

sort of extension for that. forget what they're called at this moment, know, patterns, thank you, that's the term. Writing that pattern, and that's been really helpful because H1 has a template that they normally use and it's pretty simple and I just kind of built that out and created a workflow in Kaido to just right click on a request, send it out, and then just give it a little bit of extra context and boom, it generates the report. The problem with that is that it is, it is,

Jason Haddix (50:12.708)
patterns.

Jason Haddix (50:27.96)
Yep. Yep. Yeah.

Justin Gardner (50:36.464)
the local models really, we're dealing with something a little bit more sensitive here. I'm a little bit less like, the AI is seeing all of these requests, it's like, okay, sure, and they see a thousand requests that don't work for every one request that does have a vulnerability in it, right? So I'm a little bit less paranoid about that, but when I'm writing a report, there's 100 % of vulnerability, right? And so giving that data out to AI is a little bit tricky. But man, I just,

benchmarked it against these local models and it's bad. It was bad. And so, I don't know, what are your thoughts on that? How do I fix that?

Jason Haddix (51:11.822)
Yeah. So here's a trick for you. So the architecture is you need an obfuscation bot. Basically the local model is the obfuscation bot and then the cloud model. So it's basically you take...

Justin Gardner (51:24.506)
Jason that's genius. That is freaking dis- I'm sorry Richard. I should turn my mic down. know Jason that is a ma- that's such a good idea. Why did I not think of that? Son of a bitch.

Jason Haddix (51:31.185)
No!

Yeah, so it'll it'll redact the domain redacts. Yeah, send it it local model redact the domain redact. I really I mean usually it's just the domain, the cookies, anything sensitive, then send it off to the cloud model which which will write your report for you. Then send it back to the report writer and the report writer will fill in back the information and give you your report.

Justin Gardner (51:39.61)
Dude, what? Of course, of course.

Justin Gardner (51:57.936)
Dude, I'm such a dunce. That's such a good idea. That is an amazing idea. How did I not think of that? Very, very good, man.

Jason Haddix (52:03.588)
So this is a common architecture for people working on internal stuff who still want to use the cloud models. There's two choices that you have when you're building a bot or a system internally, but you want to use your PII data. And so one is this architecture where you have an obfuscation bot that will basically put in placeholders or whatever, and then have a better model work on it from the SaaS vendors. The other one is that most of us have contractual.

Justin Gardner (52:09.808)
Hmm.

Jason Haddix (52:32.588)
obligations or contractual language with Microsoft already because we're corporations and we're using the operating system and Azure and everything like that. Azure has a hosted version of OpenAI that is yours only. And so if you already have the legal contractual knowledge with Azure, you could probably sue them to oblivion if they were ever to look at your traffic. you can just install the newest models on Azure for yourself.

Justin Gardner (52:33.504)
Mm. Mm-hmm.

Justin Gardner (52:53.008)
That's pretty cool. Yeah.

Justin Gardner (52:57.7)
Hmm. Yeah, that's good idea. You know, and I guess it's where it becomes like, what is our data and our agreements with Microsoft versus our targets data and our targets agreement with Microsoft. But who doesn't have an agreement? I've got an agreement with Microsoft. And so, yeah, no, that makes a lot of sense. All right, man. So I've picked your brain a ton on AI stuff. Let's just pivot away from that just for the end a little bit and get to the dark side. Let's talk about...

Jason Haddix (53:04.954)
Yeah, that's true. Yeah.

Yeah. Yeah.

Jason Haddix (53:21.498)
Okay, all right.

Jason Haddix (53:25.86)
Hahaha!

Justin Gardner (53:26.788)
Let's talk about your talk, The Dark Side of Buck Bounty. I don't know, man. You know, like, I listen to the talk and there's a lot of concerning things in there. But I you really, you think, I mean, I could definitely see the WAF people being around for sure, 100%. So let me just set the context a little bit for anybody who hasn't seen the talk. Jason was saying in this talk that there are WAF representatives that are among us.

Jason Haddix (53:34.03)
Mm-hmm.

Mm-hmm.

Jason Haddix (53:41.498)
Yeah. Yeah.

Justin Gardner (53:54.48)
and monitoring us for techniques, which I know for a fact is true. Yeah, I can definitely see that one. The Bug Bounty platforms training attack AI on our data, you think that's happening? my God, dude, really?

Jason Haddix (53:58.222)
Yes.

Jason Haddix (54:09.434)
It is 100 % happening.

So as soon as you click that submit button, you give all rights to your attack traffic, everything that happens to the platform, right? It's all in the terms of use of using the Bug Bounding platform. So they legally can do whatever they want with it. They are absolutely training models right now to take in that data and build automations, scanners for their other products. They're absolutely doing this right now.

Justin Gardner (54:15.312)
Mm-hmm.

Justin Gardner (54:21.402)
Mm-hmm.

Justin Gardner (54:40.816)
Yeah.

Jason Haddix (54:40.92)
And they would be dumb if they weren't doing it.

Justin Gardner (54:43.376)
I have talked to a representative from one of the big bug bounty platforms and they have categorically denied that. I have not talked to the other big bug bounty platforms, so take your pick here. But I do know that H1 does have high, that's public knowledge and that is definitely AI parsing our reports and our data and stuff like that. And that is, I think, to be expected. The thing that is a little bit...

Jason Haddix (54:53.13)
Mm. Yeah. Yeah.

Justin Gardner (55:12.79)
sketchy for me is like, they're actually using this for creating attack bots and, and attack AI. and I mean, I, I, I'll just ask again, I'm sorry. You know that this is happening. You think this is actually happening?

Jason Haddix (55:28.506)
Yeah, mean, it's actually happening. Has it been released? No. So Haya was the first instance of them looking at the traffic, right? That I think publicly has been kind of cool. I mean, even in Haya's design scope, I don't know if I feel great about it, but it's fine. But yeah, they're looking at building custom threat feeds for customers. They're looking at building attack bots that can recreate.

Justin Gardner (55:35.897)
Mm-hmm.

Justin Gardner (55:42.478)
Mm. Yeah.

Jason Haddix (55:55.086)
things for auto triage and then find the same vulnerability across multiple programs. Those are the key things that they're gonna try to do.

Justin Gardner (56:02.32)
Wow man, all right, well, you I know that platform representatives listen to this pod. So, hey guys, you need to make a statement on that. That is not okay. know, Jason has already talked about it in his Defcon talk, but I mean, I would love to have a statement from somebody just saying, hey, no, we're not doing that. Or hey, you know, it's what you agreed to, you know?

Jason Haddix (56:22.298)
Cool. And if they have since canceled those projects, I would be so happy, right? Because I feel like, I think there's two ways you can go, right? I said it in the talk, right? If you're gonna do that, give us a cut of everything you find, right? So if an automation that you built off of our attack traffic, you know, comes from our research, like kind of Detectify did, right? Like, yeah, you know, like, okay, give us, yeah, yeah, no, I'm gonna follow Franz, Franz 100%, yeah.

Justin Gardner (56:26.126)
Yeah. Me too.

Justin Gardner (56:34.618)
Mm-hmm. Yeah.

Justin Gardner (56:40.4)
Yeah, like, Detectify, yeah. Screw those guys, but whatever. Yeah. Yeah, same, fronds all day, man. Like, you don't mess with our boy. No, no, no, no, no.

Jason Haddix (56:52.608)
Yeah, exactly. Yeah. Yeah. But yeah, I mean, I would appreciate a cut, you know, because there's there's hundreds of programs that I don't have access to, right? That are private or whatever. And if they find something on those using my research, like, you know, that that'd be cool to get a kickback. That's one way to approach it. And the other way to approach is just not do it. Right? Like that's it's kind of shady. So yeah. Yeah.

Justin Gardner (57:00.41)
Mm-hmm.

Justin Gardner (57:11.13)
Yeah, yeah, for sure. Okay, last thing that I had from this talk before we wrap it is you mentioned one of the best ways to get your reports paid out better is to write out the CVSS in the impact assessment, very granularly if they're using CVSS. But man, that's a pain in the butt to do. When I get to the end of the report and I've done my full technical explanation, I get to the impact and I'm like, the impact speaks for itself. And I've been notoriously known to

Jason Haddix (57:24.719)
Yes.

Mm-hmm. Yep.

Yeah?

Jason Haddix (57:38.904)
Nope. Nope.

Justin Gardner (57:40.996)
you know, do like one line or two line impact statements, but really I need to be not doing this.

Jason Haddix (57:44.228)
Yep. Yeah. So the biggest gotchas on CVSS, both the old and the new one were like the access, right? And it's like, do you know, do I need privilege access or do I need not privilege access? And most people, most engineers think of privilege access as corporate access, right? And so they're like, they're like, or like, I think of, I think of, you know, privilege access as corporate access. I don't think of privilege access as me signing up.

for a free account using my Gmail address and getting access to your app. But that's where the big mistake comes in, right? People are like, well, you have to sign up for an account in order to get to the internal part of the app and exploit it like this. And so they downgrade that section of the report. And so I have built out an AI that will write out that section of the report.

Justin Gardner (58:13.348)
Right, right.

Justin Gardner (58:30.064)
Give it to me, man. Give it to the people, Jason. Where is the AI?

Jason Haddix (58:33.942)
Very simply. it's almost a template at this point. I think I've put so much context into the thing. I could probably just templatize it rather than use the bot, but sometimes I'm able to add contextual stuff to the bot about the application, about how free registration is open access to anybody in the internet. exactly. Yeah. And so then, yeah, explicitly writing out the CVSS for your reports is really important. Yeah.

Justin Gardner (58:45.392)
Mm-hmm. Yeah. You have to have those details to know, yeah.

Justin Gardner (58:58.576)
Yeah, okay, man, I want that bot, Jason. Come on, can I convince you to give me that bot?

Jason Haddix (59:00.858)
You

I don't know, man. I'm still at the cutting edge of this stuff. I kind of want to stay there for little bit longer. don't know. We'll talk. We'll talk.

Justin Gardner (59:07.396)
My man, you know, that's fair, dude. That's fair. We'll talk. All right. All right, man. You know, I think that's good. And I've talked about it the pod recently too. know, it's very important to be more thorough with your impact assessments. Like typically I try to be thorough with my POCs, you know? So I try to make the POC speak for itself. You run the script, boom. You click this link, boom. And it just takes everything and even cleans up after itself. It closes windows. You it does all sorts of good stuff.

Jason Haddix (59:20.43)
Yeah. Yeah.

Jason Haddix (59:31.482)
Yeah. So.

Justin Gardner (59:37.456)
But I think I also need to go that extra mile for those people that aren't there, hands on the keyboard, the script, running the script or clicking the link that are just reading the report and need to see that impact.

Jason Haddix (59:49.028)
Yeah. I think it's a sliding scale too, because I mean, let's say it's a program you've been working on for a long time. They're going to take what you have to say seriously, or if they know who you are, which was, also talked about in the presentation, if you're an InfoSec celebrity, they'll take it more seriously. But if you're nobody, they're going to harshly, more harshly review your report. And yeah, it has.

Justin Gardner (59:55.918)
Mm-hmm. Yeah.

Justin Gardner (01:00:08.89)
Yeah, I think that's gotten worse recently. It really, it really did. Cause I denied that pretty strongly for a long time. But recently I've literally built the POCs with my friend, you know, like collaborating and you know, I'm not in the specific thing or they don't have collaboration enabled or whatever. Anyway, he submits it and then they kick back and they're like, blah, blah, blah, blah, blah, blah. And I'm like, really? Like with that POC, I didn't think so, you know? So yeah.

Jason Haddix (01:00:14.498)
No, it is absolutely true.

Jason Haddix (01:00:19.574)
Uh-huh. Yep.

Jason Haddix (01:00:29.37)
Yeah, yeah, yeah, yeah, yeah, yeah. Same thing happens with me, right? Like, so some of my mentees will send reports in and get kicked back and then I'll submit it and they'll be like, cool, this is a great finding. Yeah, it is, yeah, yeah.

Justin Gardner (01:00:42.736)
The triage battle is hard, You know that better than most. You worked for Bug Crowd for a while. I think it's hard to get and keep good triagers that understand the whole flow. It is what it is, man. But I guess it's part of the game.

Jason Haddix (01:00:57.72)
Yeah, yeah.

Yeah, it is part of the game. The game can be played too. Make sure to watch the talk. If anybody hasn't seen it, called The Dark Side of Bug Bounding. It's out there on YouTube. I have a whole bunch of sections in there at the end about how to kind play the game a little bit better. So, yeah.

Justin Gardner (01:01:06.32)
Mm.

Justin Gardner (01:01:13.168)
Yeah, good stuff, man. All right, thank you so much for the great info, Jason. Appreciate you coming on the pod. All right, peace.

Jason Haddix (01:01:19.011)
Awesome, thanks everyone.

Episode 102: Building Web Hacking Micro Agents with Jason Haddix

Jason Haddix

Listen On

Recent Episodes