Security Unlocked

The Microsoft Security Podcast

Security Unlocked explores the technology and people powering Microsoft's Security solutions. In each episode, Microsoft Security evangelists Nic Fillingham and Natalia Godyla take a closer look at the latest innovations

Pearls of Wisdom in the Security Signals Report

Ep. 30
It’s our 30th episode! And in keeping with the traditional anniversary gift guide, the 30th anniversary means a gift of pearls. So from us to you, dear listener, we’ve got an episode with some pearls of wisdom! On today’s episode, hosts Nic Fillingham and Natalia Godyla welcome back returning champion Nazmus Sakib to take us through the new Security Signals Report. Sakib walks us through why the report was done and then helps us understand the findings and what they mean for security.
In This Episode You Will Learn:
How pervasive firmware is in our everyday lives
Why many people were vulnerable to firmware attacks
How companies are spending the money they allocate towards digital protection
Some Questions We Ask:
What was the hypothesis going into the Security Signals Report?
How do we protect ourselves from vulnerabilities that don’t exist yet?
Were any of the findings from the report unexpected?
Resources:
Nazmus Sakib’s LinkedIn:
Security Signals Report:
Nic Fillingham’s LinkedIn:
Natalia Godyla’s LinkedIn:
Microsoft Security Blog:
Security Unlocked: CISO Series with Bret Arsenault

Securing Hybrid Work: Venki Krishnababu, lululemon

Ep. 29
On this week’s Security Unlocked we’re featuring, for the second and final time, a special crossover episode of our sister podcast, Security Unlocked: CISO Series with Bret Arsenault. Lululemon has been at the forefront of athleisure wear since its founding in 1998, but while many of its customers see it exclusively as a fashion brand, at a deeper level this fashion empire is bolstered by a well-thought-out and well-maintained digital infrastructure that relies on a hard-working team to run it. On today’s episode, Microsoft CISO Bret Arsenault sits down with Venki Krishnababu, SVP of Global Technology Services at Lululemon. They discuss the ways in which technology plays into the brand, how Venki led a seamless transition to the remote work caused by the pandemic, and how he’s using the experiences of the past year to influence future growth in the company.
In This Episode You Will Learn:
Why Venki feels so passionately about leading with empathy
Why Venki saw moving to remote work as only the tip of the iceberg, and how he handled what lay below
Specific tools and practices that have led to Venki’s success
Some Questions We Ask:
What is the biggest lesson learned during the pandemic?
How does one facilitate effective management during this time?
How does Lululemon view the future of in-person versus remote work?
Resources:
Venki Krishnababu’s LinkedIn:
Bret Arsenault’s LinkedIn:
Nic Fillingham’s LinkedIn:
Natalia Godyla’s LinkedIn:
Microsoft Security Blog:
Security Unlocked: CISO Series with Bret Arsenault

Contact Us; Phish You!

Ep. 28
Threat actors are pesky and, once again, they’re up to no good. A new methodology has schemers abusing online forms where users submit their information, like their names, email addresses, and, depending on the type of site, some queries relating to their life. This new method indicates that the attackers have figured out a way around the CAPTCHAs that have been making us all prove we’re not robots by identifying fire hydrants since 1997. And what’s more, we’re not quite sure how they’ve done it. In this episode, hosts Natalia Godyla and Nic Fillingham sit down with Microsoft threat analyst Emily Hacker to discuss what’s going on behind the scenes as Microsoft begins to dig into this new threat and sort through how best to stop it.
In This Episode You Will Learn:
Why this attack seems to be more effective against specific professionals
Why this new method of attack has a high rate of success
How to better prepare yourself for this method of attack
Some Questions We Ask:
What is the endgame for these attacks?
What are we doing to protect against IcedID in these attacks?
Are we in need of a more advanced replacement for CAPTCHA?
Resources:
Emily Hacker:
Investigating a Unique ‘Form’ of Email Delivery for IcedID Malware
Nic Fillingham’s LinkedIn:
Natalia Godyla’s LinkedIn:
Microsoft Security Blog:
Security Unlocked: CISO Series with Bret Arsenault
https://SecurityUnlockedCISOSeries.com
Transcript
[Full transcript can be found at]
Nic Fillingham: (00:08) Hello and welcome to Security Unlocked, a new podcast from Microsoft where we unlock insights from the latest in news and research from across Microsoft security, engineering and operations teams. I'm Nic Fillingham.
Natalia Godyla: (00:20) And I'm Natalia Godyla. In each episode we'll discuss the latest stories from Microsoft Security, deep dive into the newest threat intel, research, and data science.
Nic Fillingham: (00:30) And profile some of the fascinating people working on artificial intelligence in Microsoft Security.
Natalia Godyla: (00:36) And now, let's unlock the pod.
Nic Fillingham: (00:40) Hello, the internet.
Hello, listeners. Welcome to episode 28 of Security Unlocked. Nic and Natalia back with you once again for a, a regular, uh, episode of the podcast. Natalia, how are you?
Natalia Godyla: (00:50) Hi, Nic. I'm doing well. I'm stoked to have Emily Hacker, a threat analyst at Microsoft, back on the show today.
Nic Fillingham: (00:58) Yes, Emily is back on the podcast discussing a blog that she co-authored with Justin Carroll, another returning champ here on the podcast, called Investigating a Unique "Form" of Email Delivery for IcedID Malware. The emphasis on "form" is, uh, due to the sort of wordplay there. That's from April 9th. Natalia, TL;DR here. What's, what's Emily talking about in this blog?
Natalia Godyla: (01:19) In this blog she's talking about how attackers are delivering IcedID malware through websites' contact submission forms by impersonating artists who claim that the companies used their artwork illegally. It's a new take, targeting the person managing the submission form.
Nic Fillingham: (01:34) Yeah, it's fascinating. The attackers here don't need to go and, you know, buy or steal email lists. They don't need to spin up, uh, you know, any email infrastructure or get access to botnets. They're really just finding websites that have a contact us form. Many do, and they are evading CAPTCHA here, and we talk about that with, uh, Emily: they're somehow getting around the CAPTCHA technology that's meant to weed out automation. That's sort of an interesting part of the conversation.
Nic Fillingham: (02:03) Before we get into that conversation, though, a reminder to Security Unlocked listeners that we have a new podcast. We just launched a new podcast in partnership with the CyberWire. It is Security Unlocked: CISO Series with Bret Arsenault.
Bret Arsenault is the chief information security officer, the CISO, for Microsoft, and we've partnered with him and his team, uh, as well as the CyberWire, to create a brand new podcast series where Bret gets to chat with security and technology leaders at Microsoft as well as some of his CISO peers across the industry. Fantastic conversations about some of the biggest challenges in cybersecurity today, some of the strategies that these big, big organizations are undertaking, including Microsoft, and some practical guidance that really is gonna mirror the things that are being done by security teams here at Microsoft and at some of Microsoft's biggest customers.
Nic Fillingham: (02:52) So, I urge you all to, uh, go check that one out. You can find it at the CyberWire. You can also go to, and that's CISO as in C-I-S-O. CISO or CISO, if you're across the pond,, but for now, on with the pod.
Natalia Godyla: (03:12) On with the pod.
Nic Fillingham: (03:18) Welcome back to the Security Unlocked Podcast. Emily Hacker, thanks for joining us.
Emily Hacker: (03:22) Thank you for having me again.
Nic Fillingham: (03:24) Emily, you are, uh, coming back to the podcast. You're a returning champion. Uh, this is, I think, your second appearance, and you're here-
Emily Hacker: (03:30) Yes, it is.
Nic Fillingham: (03:30) ... on behalf of your colleague, uh, Justin Carroll, who has also been on multiple times. The two of you collaborated on a blog post from April the 9th, 2021, called Investigating a Unique Form-
Emily Hacker: (03:43) (laughs)
Nic Fillingham: (03:43) ... in, uh, "Form", of Email Delivery for IcedID Malware. The form bit is a pun, a play on words.
Emily Hacker: (03:51) Mm-hmm (affirmative).
Nic Fillingham: (03:51) I- is it not?
Emily Hacker: (03:53) Oh, it definitely is. Yeah.
Nic Fillingham: (03:54) (laughs) I'm glad I picked up on that. It's a, you know, fascinating, uh, campaign that the two of you uncovered and wrote about in the blog post.
Before we jump into that, quick recap, please: if you could just reintroduce yourself to the audience. Uh, what do you do? What's your day-to-day look like? Who do you work with?
Emily Hacker: (04:09) Yeah, definitely. So, I am a threat intelligence analyst, and I'm on the Threat Intelligence Global Engagement and Response team here at Microsoft. I am specifically focused mostly on email-based threats, and, as you mentioned, on this blog I collaborated with my coworker, Justin Carroll, who is more specifically focused on endpoint threats, which is why we collaborated on this particular blog and this particular investigation, because it has both aspects. So, I spend a lot of my time investigating credential phishing, but also malicious emails that are delivering malware, such as the ones in this case, and also business email compromise-type scam emails.
Nic Fillingham: (04:48) Got it. And so readers of the Microsoft Security Blog and listeners of the Security Unlocked Podcast will know that on a regular basis, your team and other, uh, threat intelligence teams from across Microsoft publish their findings on new campaigns and new techniques on the blog. And then we try and bring those authors onto the podcast to tell us about what they found, and that's what's happened with this blog. Um, the two of you uncovered a new, unique way for attackers to deliver the IcedID malware. Can you walk us through this campaign and this technique that you both uncovered?
Emily Hacker: (05:21) Yeah, definitely. So this one was really fun because, as I mentioned, it involved both email and endpoint. As you mentioned, this one was delivering IcedID. We initially found the IcedID on the endpoint, and looking at how this was getting onto various endpoints, we identified that it was coming from Outlook, which means it's coming from email.
So we can't see too much in terms of the email itself from the endpoint; we can just see that it came from Outlook. But given the network connections that the affected machines were making directly after accessing Outlook, I was able to find the emails in our system that holds user-submitted emails, 'cause users either reported them as junk, reported them as phish, or reported them as a false positive if they think it's not a phish. And so that's where I was actually able to see the email itself and determine that there was some nefarious activity going on here.
Emily Hacker: (06:20) So the emails in this case were really interesting in that it's not actually the attacker sending an email to a victim, which is what we normally see. Normally the attacker will either, you know, compromise a bunch of senders and send out emails that way, which is what we've seen in a lot of other malware, or they'll create their own attacker infrastructure and send emails directly that way. In this case, the attackers were abusing the contact forms on websites. So if you are visiting a company's website and you're trying to contact them, a lot of times they're not going to just have a page where they offer up their emails or their phone numbers. You have to fill in that form, which feels like it goes into the void sometimes, and you don't actually know who it went to. In this case, the attackers were abusing hundreds of these contact forms, not just targeting any specific company.
Emily Hacker: (07:08) And another thing that was unique about this is that for some of the affected companies that we had observed, I went and looked at their websites, and their contact form does require a CAPTCHA. So it does appear that the attackers in this case have automated the filling out of these contact forms, and that they've automated a way around these CAPTCHAs, just given the sheer volume of these emails I'm seeing.
This is a good way of doing this because, for the attacker, this is a much higher-fidelity method of contacting these companies. They don't have to worry about having an incorrect email address, as they would if they had gotten a list off of, like, Pastebin, or, you know, purchased a list from another criminal.
Emily Hacker: (07:52) A lot of times in those cases, if they're emailing directly, there's gonna be some false emails in those lists that just don't get delivered. Contact forms are designed to be delivered, so it's gonna give the attacker a higher chance of success in terms of being delivered to a real inbox.
Natalia Godyla: (08:11) And so when we talk about the progression of the attack, they're automating this process of submitting to these contact forms. What are they submitting in the form, and what is the end goal? So there's malware somewhere in their-
Emily Hacker: (08:27) Mm-hmm (affirmative).
Natalia Godyla: (08:27) ... response. What next?
Emily Hacker: (08:29) Yeah. It's a really good question. So the emails, or rather the contact form submissions themselves, all contain a lure. The lure is that the attacker is pretending to be an artist, a photographer, an illustrator, something along those lines. There's a handful of different jobs that they're pretending to be. And they are claiming that the company they are contacting has used an image that belongs to the artist, illustrator, or photographer on their website without permission. And so the attacker is saying, "You used my art without permission. I'm going to sue you if you don't take this down. If you wanna know what art I'm talking about, click on this link and it'll show you the exact art that I'm talking about, or the exact photo," what have you. All of the emails were virtually identical in terms of the content and the lure.
Emily Hacker: (09:21) The attacker was using a bunch of different fake emails.
So when you fill out a contact form, you have to put your email so the company can contact you in reply, I guess, if they need to. Almost every single email that I looked at had a different fake attacker email, but they did all follow a really consistent pattern in terms of the name: Mel and variations on that name. So they had, like, Melanie; I saw, like, Molina. Like I said, there were hundreds of them. So the email would be Mel and then something relating to photography or illustration or art, just to add a little bit more credence, I think, to their lure. It made it look like the email address was actually associated with a real photographer. The attacker had no need to actually register or create any of those emails because they weren't sending from those emails. They were sending from the contact form. So it made it a lot easier for the attacker to appear legitimate without having to go through the trouble of creating legitimate emails.
Emily Hacker: (10:16) And then the email itself, from the recipient's view, other than the fact that it felt fishy, at least to me, but, you know, I literally do this for a living, so maybe everything just feels fishy to me... other than that, the email itself is going to appear totally legitimate. Since it's coming through the contact form, it's not going to be from an email address they don't recognize, because a lot of times these contact forms are set up in a way where they'll send from the recipient's domain. So for example, a contact form at Microsoft, and I don't know if this is how this works, but just as an example, might actually send from, or the other large percentage of these that I saw were sent from the contact form hosting provider. There are a lot of providers that host this kind of content for companies. And so the emails would be coming from those known email addresses, and the emails themselves are gonna contain all of the expected fields. All in all,
it's basically a legitimate email other than the fact that it's malicious.
Nic Fillingham: (11:17) And just reading through the sample email that you have in the blog post here, sort of grammatically speaking, it reads very legitimately, like, the-
Emily Hacker: (11:26) Mm-hmm (affirmative).
Nic Fillingham: (11:27) ... you know, the grammar and the spelling, it's colloquial, but it seems, you know, pretty legitimate. The idea of a photographer, a freelance photographer, stumbling upon their images being used without permission. You know, you hear stories of that happening. That seems to be somewhat plausible: not knowing how to contact the infringing organization, and therefore going to the generic contact us form. Like, this all seems quite plausible.
Emily Hacker: (11:52) And, definitely. And it's also one of those situations where, like I said, I do this for a living, so I read this and I was like, there's no way that's legit. But if my job was to be responsible for that email inbox where stuff like this came in, it would be hard for me to weigh the consequences of, like, is it more likely that this is a malicious email? Or is it possible that this is legit, and if I ignore it, my company is gonna get sued? Like, I feel like that would put the recipient in that weird spot of being like, "I don't want to infect the company with malware, or, you know, I don't wanna click on a phishing link if that's what this is, but also, if I don't and then we get sued, is it my fault?"
Emily Hacker: (12:33) I just feel for the recipient. So I understand why people would be clicking on this one and infecting themselves. And speaking of clicking, that is the other thing that's included in this email. That was the last bit of this email that turns it from just being weird/legitimate to totally malicious. All of the emails contain a link.
And, um, the links themselves are also abusing legitimate infrastructure. So that's, uh, the next bit of abused legitimate infrastructure that just adds that next bit of, like, believability, if that's a word, to this campaign.
Nic Fillingham: (13:05) It is a word.
Emily Hacker: (13:06) Okay, good. Believability. The links: you know, if you don't work in security, and even if you do work in security, we're all kind of trained like, "Oh, check the links, hover over the links, and make sure it's going somewhere that you expect, and make sure it's not going to, like, bad site dot bad dot bad or something," you know. But these don't do that. All of the emails contained a link. And I've looked at literally hundreds of these, and they all contain, um, a different URL, but the same domain. If you click on the link when you receive the email, it'll actually take you to a legitimate Google authentication page that'll ask you to log in with your Google credentials. Again, at every step along the way of the email portion of this attack, the attacker just took extra steps to make it seem as real as possible. For almost every piece of security advice,
Emily Hacker: (14:01) I feel like they did that thing, so it seemed more legitimate, because it's not a phishing page. It's not, like, a fake Google page that's stealing your credentials. It's a real one, where you would log in with your real Google credentials. Another thing that this does, outside of just adding an air of legitimacy to the emails, is that it can also make things difficult for some security automation products. Take a product that would be looking at emails and detonating the link to see if it's malicious. In this case, it would detonate the link and it would land on, you know, a real Google authentication page. In some cases it may not be able to authenticate, and then it would just mark these as good, because it would see what it expected to see.
So, outside of just seeming legit, it also makes, you know, security products think it's more legit as well. But from there, the, uh, user would be redirected through a series of attacker-owned domains and would eventually download a zip file, which, if they unzipped it, would contain the IcedID payload.
Emily Hacker: (15:06) So in this case, it's delivering IcedID, although this technique could be used to deliver other stuff as well. But it's not necessarily surprising that it's delivering IcedID right now, because pretty much everything I feel like I'm seeing lately is IcedID. And I don't think I'm alone in that; there are murmurings that IcedID might be replacing Emotet, now that Emotet has been taken down, in terms of being, you know, the annoyingly present malware. (laughs) So this is just one of many delivery methods that we've seen for IcedID malware lately. It's certainly, in my opinion, one of the more interesting ones, because in the past we've seen IcedID delivered a lot via email, but, um, just via, you know, the normal type of malicious email, if you will: a compromised email sender with a zip attachment. This is much more interesting.
Emily Hacker: (15:56) But in this case, if the user downloaded the payload, the payload would actually do many things. It was looking for machine information. It was looking to see what kind of security tools were in place, to see what kind of antivirus the machine was running. It was getting IP and system information. It was getting, you know, domain information, and also looking to access credentials that might be stored in your browser. And on top of that, it was also dropping Cobalt Strike, which is another fun tool that we see used in every single incident lately, it feels like, um, which means that this can give the attacker full control of a compromised device.
Natalia Godyla: (16:38) So, what are we doing to help protect customers against IcedID?
In the blog you stated that we are partnering with a couple of organizations, as well as working with Google.
Emily Hacker: (16:52) Yes. So we have notified Google of this activity because it is obviously abusing some of their infrastructure in terms of the sites at And they seem to be doing a pretty good job of finding these and taking them down pretty quickly. A lot of times when I see new emails come in, I'll go, you know, click on the link and see what it's doing, and the site will already be taken down, which is good. However, the thing about security is that a lot of times we're playing catch-up, or, like, Whack-A-Mole, where they're always just gonna be a step ahead of us, because we can't pre-block everything that they're going to do. So this is still, um, something that we're also trying to keep an eye on from the delivery side as well.
Emily Hacker: (17:34) Um, one thing to note, since these are coming from legitimate emails that are expected, is that I have seen a few of these, uh, actually, um, where the customers have their environment configured in a way where even if we mark it as phish, it still ends up delivered. So they have, like, a mail flow rule that might be like "allow anything from our contact form," which makes sense, because they wouldn't wanna be blocking legitimate requests from customers in their contact form. So with that in mind, we also wanna be looking at this from the endpoint. And so we have also written a few rules to identify the behaviors associated with this particular IcedID campaign.
Emily Hacker: (18:16) And it will notify users if the behaviors are seen on their machine, just in case, you know, they have a mail flow rule that has allowed the email through, or just in case the attackers change their tactics in the email and it didn't hit on our rule anymore or something, and a couple slipped through.
Then we would still identify this on the endpoint. And not to mention, those behaviors that the rules are hitting on occur before the actual IcedID payload is delivered. So if everything went wrong, the email got delivered, Google hadn't taken the site down yet, and the behavioral rule missed, then the payload itself is detected as IcedID by our antivirus. So there's a lot in the way of protections in place for this campaign.
Nic Fillingham: (18:55) Emily, I wanna be pretty clear here with folks listening to the podcast. So, you know, you've mentioned the a couple of times, and really, you're not saying that Google has been compromised or that the infrastructure is compromised, simply that these attackers have, uh, come up with a, you know, potentially clever way of evading some of the detections that Google undoubtedly runs to catch abuse of their hosting services, but they could just as easily have been targeting OneDrive or-
Emily Hacker: (19:25) Mm-hmm (affirmative).
Nic Fillingham: (19:25) ... some other cloud storage.
Emily Hacker: (19:25) That's correct. And we do see, you know, attackers abusing our own infrastructure. We've seen them abusing OneDrive; we've seen them abusing SharePoint. And at Microsoft, we have teams, including my team, devoted to finding when that's occurring and remediating it. And I'm sure that Google does too. And like I said, they've been doing a pretty good job of it. By the time I get to a lot of these sites, they're already down. But as I mentioned, security is a game of Whack-A-Mole. And so from Google's point of view, I don't envy the position they're in, because I've seen, like I mentioned, hundreds upon hundreds of these emails, and each one is using a unique link. So they can't just outright block this from occurring, because the attacker will just go and create another one.
Natalia Godyla: (20:05) So I have a question that's related to our earlier discussion.
You mentioned that they're evading the CAPTCHA. I thought that the CAPTCHA was one of the mechanisms in place to reduce spam.
Emily Hacker: (20:19) Mm-hmm (affirmative).
Natalia Godyla: (20:19) So how are they doing that? Does this also indicate that we're coming to a point where we need to evolve the mechanisms on the forms to be a little bit more sophisticated than CAPTCHA?
Emily Hacker: (20:33) I'm not entirely sure how the attackers are doing this, because I don't know what automation they're using. So I can't see from their end how they're evading the CAPTCHA. I can just see that some of the websites that I know they have abused have a CAPTCHA in place. I'm not entirely sure.
Nic Fillingham: (20:52) Emily, is it possible, do you think, that one of the reasons why CAPTCHA is being evaded... And we talked earlier about how the sort of grammar of these mails is actually quite sophisticated. Is it possible this is a hands-on-keyboard manual attack? That there's actually not a lot of automation, or maybe any automation, and so this is actually humans, or a human, going through, and they're evading CAPTCHA because they're actually humans and not an automated script?
Emily Hacker: (21:17) There was another blog that was released about a similar campaign that was abusing contact forms and actually using a very similar lure with the illustrators and the legal gotcha type thing, and using That was actually very well written, and it was released by Cisco Talos at the end of last year, um, at the end of 2020. I focused a lot on the email side of this: what the emails themselves looked like, how we could stop these emails from happening, and then also what was happening upon clicks after that. Like I said, we could see what was happening on the endpoint and get these to stop.
Emily Hacker: (21:55) Their blog actually focused a lot more on the technical aspect of what was being delivered, but also how it was being delivered. And one thing that they noted was that they were able to see that the submissions were performed by an automated mechanism. So Cisco Talos was able to see that these are indeed automated. I suspected that they were automated based on the sheer volume, but Talos is very good. They're a very good intelligence organization, and I felt confident upon reading their blog that this was indeed automated. How it's getting past the CAPTCHA, though, I still don't know.
Natalia Godyla: (22:35) What's next for your research on IcedID? Does this round out your team's efforts in understanding this particular threat, or are you now continuing to review the emails and understand more of the attack?
Emily Hacker: (22:52) So this is certainly not the end for IcedID. Through the Microsoft Security Intelligence Twitter account, my team and I put out a tweet just a couple of weeks ago about four different IcedID campaigns that we were seeing all at the same time. I do believe this was one of them. They don't even seem related. There was one with emails that contained, um, zip files. There was one with emails that contained password-protected zip files that was specifically targeting Italian companies. There was this one, and then there was one that was, um, pretending to be Zoom, actually. And that was a couple of weeks ago, so there's gonna be more since then. So, like I mentioned briefly earlier, it feels a little bit like people are calling IcedID the next wave, the replacement now that Emotet has been taken down.
Emily Hacker: (23:43) And I don't know necessarily that that's true. I don't know that this will be the new Emotet, so to speak. Emotet was Emotet, and IcedID is IcedID, but it does certainly feel like I've been seeing it a lot more lately.
A lot of different attackers seem to be using it, and therefore it's being delivered in different ways. So I think that it's gonna be one that my team is tracking for a while, just by nature of different attackers using it and different delivery mechanisms. And it'll be fun to see where this goes.
Nic Fillingham: (24:13) What is it about this campaign or about this particular technique that makes it your Moby Dick-
Emily Hacker: (24:17) (laughs)
Nic Fillingham: (24:17) ... if I may use the analogy?
Emily Hacker: (24:20) I don't know. I've been thinking about that. And I think it has to do with the fact that it just feels like a low blow. I think that's literally it. Like, they're abusing the companies' infrastructure. They're sending it to, like, people whose job is to make sure that their companies are okay. They're sending a fake legal threat. They're using legit Google sites. They're using legit Google authentication, and then they're downloading IcedID. Like, can you at least have the decency to send a crappy, like, unprotected zip attachment, so that-
Nic Fillingham: (24:49) (laughs)
Emily Hacker: (24:49) ... we at least know you're malicious? Like, come on. For some reason, I don't know if it's just 'cause it's different, or if it's because I'm thinking back to, like, my days before security, and if I saw this email, is this one that I would fall for? Like, maybe. And so I think that there's just something about that, and about the fact that it's making it harder to fully scope and to really block, because we don't want to block legitimate contact emails from being delivered to these companies. And obviously they don't want that either. So I think that's it.
Nic Fillingham: (25:22) What is your guidance to customers? You know, I'm a security person working at my company and I wanna go run this query. If I run this, I feel like I'm gonna get a ton of results.
What do I do from there?
Emily Hacker: (25:33) That's a good question. So this is an advanced hunting query, which can be used in the Microsoft Security portal, and it's written in advanced hunting query language. So if a customer has access to that portal, they can just copy and paste and search. But you're right. It is written fairly generically, to a point where if you don't have, you know, advanced hunting, you can still read this and search with whatever methodology, whatever searching capabilities you do have; you would just probably have to rewrite it. But what the top one is doing, 'cause I have two of them written here: the first one is looking specifically at the email itself. So that regex that's written there is the, um...
Emily Hacker: (26:16) All of the emails that we have seen associated with this have matched on that regex. This morning, like I said, I was talking to a different team that was also looking into this, and I'm trying to identify if she found, um, a third pattern. If she did, I will update the, um, AHQ. We can post AHQs publicly on the Microsoft advanced hunting query GitHub repo, which means that customers can find them if we change them later, and I'll be doing that if that's the case. But point being, this regex basically takes the very long, full URL of this and matches on the parts that are fairly specific to this email.
Emily Hacker: (27:02) So they all contain, you know... some of them contain ID, some of them don't, but they all contain that, like, nine characters; they all contain view. It's just certain parts of the URL that we're seeing consistently. And that's definitely not by itself going to bubble up just the right emails, which is why I have it joined on the email events there. And from there, I have instructed the users to replace the following query with the subject line generated by their own website's contact submission form. 
What I have in there are just a few sample subject lines. So if your website contact form generates the subject line "Contact us" or "New submission" or "Contact form", then those will work. But if the website's, you know, contact form... I've seen a bunch of different subject lines. Then what this does is that it'll join the two, so that it's only gonna bubble up emails that have that specific pattern and a subject line relating to the contact form.
Emily Hacker: (28:02) And given the searching that I've done, that should really narrow it down. I don't think there's going to be a ton in the way of other contact emails showing up for these people. I wouldn't be surprised if this did return one email and it turned out to be a malicious email related to this campaign. But if the contact form generates its own subject line per what the user inputs on the website, then, you know, the screenshots that are in the blog may help with that, but it might be more difficult to find in that case. There's a second advanced hunting query there, which will find it on the endpoint.
Natalia Godyla: (28:37) And I know we're just about at time here, but one quick question on endpoint security. So if a customer is using Microsoft Defender for Endpoint, will it identify and stop IcedID?
Emily Hacker: (28:49) Yes, it will. In this case, we're seeing Defender detecting the IcedID payload and blocking it. And that was one of the things I was talking about earlier: Defender is actually doing such a good job that it's a little bit difficult for me to see what's, uh, gonna happen next, because I'm limited to, um, seeing kind of what is happening on customer boxes. And so, because our products are doing such a good job of blocking this, it means that I don't have a great view of what the attacker was going to do next, because they can't, 'cause we're blocking it. 
So it's mostly a win. It's stopping me from seeing if they are planning on doing, you know, ransomware or whatever, but I'd rather not know if it means that our customers are protected from this.
Nic Fillingham: (29:32) Well, Emily Hacker, thank you so much for your time. Thanks to you and Justin for, for working on this. Um, we'd love to have you back again on Security Unlocked to learn more about some of the great work you're doing.
Emily Hacker: (29:41) Definitely. Thank you so much for having me.
Natalia Godyla: (29:47) Well, we had a great time unlocking insights into security, from research to artificial intelligence. Keep an eye out for our next episode.
Nic Fillingham: (29:54) And don't forget to tweet us @msftsecurity or email us at, with topics you'd like to hear on a future episode. Until then, stay safe.
Natalia Godyla: (30:05) Stay secure.
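For readers without advanced hunting, here is a rough Python re-expression of the matching logic Emily describes: a URL regex, joined to the email events and filtered on contact-form subject lines. It is only a sketch. The published query is written in the advanced hunting query language, and the URL pattern below is a made-up placeholder, not the regex from the real query. The `EmailEvent`/`UrlEvent` records and the `flag_suspicious` helper are hypothetical; only the three sample subject lines come from the episode.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL placeholder pattern. The real regex lives in the published
# advanced hunting query; this one only mimics "a Google Sites /view/ URL
# with a nine-character segment" for illustration.
SUSPICIOUS_URL = re.compile(r"sites\.google\.com/view/[a-z0-9]{9}", re.IGNORECASE)

# Sample subject lines mentioned in the episode; replace them with the
# subject line(s) your own website's contact form generates.
CONTACT_FORM_SUBJECTS = {"contact us", "new submission", "contact form"}


@dataclass
class EmailEvent:
    message_id: str
    subject: str


@dataclass
class UrlEvent:
    message_id: str
    url: str


def flag_suspicious(emails, urls):
    """Return sorted message IDs whose URL matches the pattern AND whose
    subject line looks like a contact-form submission (the 'join' step)."""
    url_hits = {u.message_id for u in urls if SUSPICIOUS_URL.search(u.url)}
    return sorted(
        e.message_id
        for e in emails
        if e.message_id in url_hits
        and e.subject.strip().lower() in CONTACT_FORM_SUBJECTS
    )


if __name__ == "__main__":
    emails = [EmailEvent("m1", "Contact us"), EmailEvent("m2", "Quarterly report")]
    urls = [
        UrlEvent("m1", "https://sites.google.com/view/abc123xyz/legal"),
        UrlEvent("m2", "https://sites.google.com/view/abc123xyz/legal"),
    ]
    print(flag_suspicious(emails, urls))  # ['m1']
```

Requiring both conditions reflects Emily's point that the URL pattern alone is not specific enough. If your contact form uses the visitor's own text as the subject, the subject filter will miss those mails, which is the harder case she mentions.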

Securing the Cloud with Mark Russinovich

Ep. 27
On this week’s Security Unlocked, we’re pulling a bait and switch! Instead of our regularly scheduled programming, we’re going to be featuring the first episode of our new podcast, Security Unlocked: CISO Series with Bret Arsenault. Each episode is going to feature Microsoft’s CISO Bret Arsenault sitting down with other top techies in Microsoft and other companies in the industry. In its inaugural episode, which we’re featuring on this episode, Bret sits down with Mark Russinovich, Chief Technology Officer of Microsoft Azure. Mark has a unique perspective on cloud technologies and offers insight into the changes that have occurred over the past few years due to advancing technology and the unique challenges brought about during the coronavirus pandemic. Enjoy this first episode of the new series and remember to subscribe so you catch all the rest that are yet to come.
In This Episode You Will Learn:
The initialism FFUUEE and why it’s important in understanding people’s resistance to adopting newer security capabilities
Mark Russinovich’s three points of advice for those looking to become more secure
Theories on improving MFA adoption across the board
Some Questions We Ask:
How do we think of cloud security now versus ten years ago?
What does a leading engineer think of moving toward a hybrid workforce?
How do you find and screen potential new team members in a remote world?
Resources
CISO Series with Bret Arsenault:
Bret Arsenault’s LinkedIn:
Mark Russinovich’s LinkedIn:
Nic Fillingham’s LinkedIn:
Natalia Godyla’s LinkedIn:
Security Unlocked: CISO Series with Bret Arsenault

Ready or Not, Here A.I. Come!

Ep. 26
Remember the good ol’ days when we spent youthful hours playing hide and seek with our friends in the park? Well, it turns out that game of hide and seek isn’t just for humans anymore. Researchers have begun putting A.I. to the test by having it play this favorite childhood game over and over, and having the software optimize its strategies through automated reinforcement learning. In today’s episode, hosts Nic Fillingham and Natalia Godyla speak with Christian Seifert and Joshua Neil about their blog post Gamifying machine learning for stronger security and AI models, and how Microsoft is releasing this new open-source code to help it learn and grow.
In This Episode, You Will Learn:
What is Microsoft’s CyberBattleSim?
What reinforcement learning is and how it is used in training A.I.
How the OpenAI Gym allowed for AI to be trained and rewarded for learning
Some Questions We Ask:
Is an A.I. threat actor science fiction or an incoming reality?
What are the next steps in training the A.I.?
Who was the CyberBattleSim created for?
Resources:
OpenAI Hide and Seek: OpenAI Plays Hide and Seek…and Breaks The Game! 🤖
Joshua and Christian’s blog post: Gamifying Machine Learning for Stronger Security and AI Models
Christian Seifert’s LinkedIn:
Joshua Neil’s LinkedIn:
Nic Fillingham’s LinkedIn:
Natalia Godyla’s LinkedIn:
Security Blog:
Security Unlocked: CISO Series with Bret Arsenault
https://SecurityUnlockedCISOSeries.com
Transcript
[Full transcript at]
Nic Fillingham: Hello and welcome to Security Unlocked! A new podcast from Microsoft where we unlock insights from the latest in news and research from across Microsoft Security Engineering and Operations teams. I'm Nic Fillingham.
Natalia Godyla: And I'm Natalia Godyla. In each episode, we'll discuss the latest stories from Microsoft Security, deep dive into the newest threat intel, research, and data science.
Nic Fillingham: And profile some of the fascinating people working on artificial intelligence in Microsoft Security.
Natalia Godyla: And now, let's unlock the pod.
Nic Fillingham: Hello, Natalia! Hello, listeners! 
Welcome to episode 26 of Security Unlocked. Natalia, how are you?
Natalia Godyla: Thank you, Nic. And welcome to all our listeners for another episode of Security Unlocked. Today, we are chatting about gamifying machine learning, super cool, and we are joined by Christian Seifert and Joshua Neil, who will share their research on building CyberBattleSim, which investigates how autonomous agents operate in a simulated enterprise environment by using high-level abstraction of computer networks and cybersecurity concepts. I sounded very legit, but I did just read that directly from the blog.
Nic Fillingham: I was very impressed.
Natalia Godyla: (laughs)
Nic Fillingham: If you had not said that you read that from the blog, I would've been like, "Wow." I would like to subscribe to a newsletter.
Natalia Godyla: (laughs)
Nic Fillingham: But this is a great conversation with, with Christian and Joshua. We talked about what reinforcement learning is, sort of as a concept, and how it applies to security. Josh and Christian also walked us through sort of why this project was created, and it's really to try and get ahead of a future where, you know, malicious actors have access to some level of automated, autonomous tooling. Uh, and so, this is a new project to sort of see what a future might look like when there are all these autonomous agents out there doing bad stuff in the cyber world.
Natalia Godyla: And there are predecessors to this work, at least in other domains. So, they used a toolkit, a Python-based OpenAI Gym interface, to build this research project, but there have been other applications in the past. OpenAI is, uh, well-known for hide-and-seek. There is a video on YouTube that shows how the AI learned over time different ways to obstruct the agent in the simulated environment, things like blocking them off using some pieces of the wall or jumping over the wall.
Nic Fillingham: The only thing we should point out is that this CyberBattleSim is an open source project. 
It's up on GitHub, and the team very much wants researchers, and really anyone who's interested in this space, to go and download it, go and run it, play around with it, and help make it better. And if you have feedback, let us know. There is contact information, uh, through the GitHub page, but you can also contact us at Security Unlocked at Microsoft dot com and we can make sure you, uh, get in contact with the team. And with that, on with the pod?
Natalia Godyla: On with the pod!
Nic Fillingham: Welcome to Security Unlocked, new guest, Christian Seifert. Thanks for joining us, and welcome returning guest, Josh Neil, back to the podcast. Both of you, welcome. Thanks for being on Security Unlocked.
Christian Seifert: Thanks for having us!
Joshua Neil: And thanks, Nic.
Nic Fillingham: Christian, I think as a, as a new guest on the podcast, could we get a little introduction for our listeners? Tell us about, uh, what you do at Microsoft. Tell us what a day-to-day looks like for you.
Christian Seifert: Sure. So I'm a, uh, research lead on the Security and Compliance team. Our overall research team supports a broad range of enterprise and consumer products and services in the security space. My team in particular is focused on protecting users from social engineering attacks. So, uh, think of, like, phishing mails, for instance. So we're supporting Microsoft Defender for Office and, um, the Microsoft Edge browser.
Nic Fillingham: Got it. And Josh, folks are obviously familiar with you from previous episodes, but a, a quick re-intro would be great.
Joshua Neil: Thanks. I currently lead the Data Science team supporting Microsoft Threat Experts, which is our managed hunting service, as well as helping with general cybersecurity research for the team.
Nic Fillingham: Fantastic. Uh, again, thank you both for your time. 
So, today on the podcast, we're gonna talk about a blog post that came out earlier this month, on April 8, called Gamifying Machine Learning for Stronger Security and AI Models, where you talk about a new project that has sort of just gone live called CyberBattleSim. First off, congratulations on maybe the coolest name for, uh, sort of a security research project. So, like, I think, you know, just hats off there. I don't know who came up with the name, but, but great job on that. Second of all, you know, Christian, if, if I could start with you, could you give us sort of an introduction or an overview: what is CyberBattleSim, and what is discussed in this blog post?
Christian Seifert: As I... before talking about the, the simulator, uh, the... let me, let me kind of take a step back and first talk about what we tried to accomplish here and, and why. So, if you think about the security space and, and machine learning in particular, a large portion of machine-learning systems utilize supervised, uh, classifiers. And here, essentially, what we have is, is kinda a labeled data set. So, uh, for example, a set of mails that we label as phish and good. And then, we extract, uh, threat-relevant features. Think of, like, maybe particular words in the body, or header values that we believe are well-suited to differentiate bad mails from good mails. And then our classifier is able to generalize and able to classify new mails that come in.
Christian Seifert: There are a few, uh, aspects to consider here. So, first of all, the classifier generalizes based on the data that we present to it. So, it's not able to identify completely unknown mails.
Christian Seifert: Second, a supervised classification approach is usually biased, because we are programming, essentially, the, the classifier and what it, uh, should do. And we're utilizing domain expertise and red teaming to kind of figure out what our threat-relevant features are, and so there's bias in that. 
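To make Christian's description of a supervised mail classifier concrete (labeled phish and good mails, features taken from body words and header values, then generalization to new mail), here is a minimal sketch. He does not name an algorithm, so multinomial naive Bayes is swapped in purely as a simple example; the feature extractor, the class name, and the four training mails are all hypothetical. The last call shows his first point in miniature: a mail made only of tokens the model never saw gets no evidence either way.

```python
import math
import re
from collections import Counter


def extract_features(mail):
    """Toy feature extraction: lowercase body words plus 'header=value' tokens."""
    words = re.findall(r"[a-z']+", mail["body"].lower())
    headers = [f"{k.lower()}={v.lower()}" for k, v in mail.get("headers", {}).items()]
    return words + headers


class NaiveBayesMailClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, mails, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for mail, label in zip(mails, labels):
            self.token_counts[label].update(extract_features(mail))
        self.vocab = set().union(*self.token_counts.values())
        return self

    def log_scores(self, mail):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / total_docs)  # class prior
            denom = sum(self.token_counts[c].values()) + len(self.vocab)
            for tok in extract_features(mail):
                # Tokens never seen in training carry no evidence at all:
                # the model can only generalize from what it was shown.
                if tok in self.vocab:
                    score += math.log((self.token_counts[c][tok] + 1) / denom)
            scores[c] = score
        return scores

    def predict(self, mail):
        scores = self.log_scores(mail)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    mails = [
        {"body": "verify your password now urgent", "headers": {"Reply-To": "mismatch"}},
        {"body": "urgent account suspended verify password"},
        {"body": "lunch meeting tomorrow agenda attached"},
        {"body": "quarterly report agenda for the meeting"},
    ]
    labels = ["phish", "phish", "good", "good"]
    clf = NaiveBayesMailClassifier().fit(mails, labels)
    print(clf.predict({"body": "please verify your password"}))  # phish
    print(clf.log_scores({"body": "wire transfer invoice"}))  # equal scores: no known features
```

The hand-picked feature extractor is also where the bias Christian describes creeps in: the model can only weigh what someone decided to extract.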
Christian Seifert: And third, a classifier needs to have the data in order to make an appropriate classification. So, if I have a classifier that classifies phish mail based on the, the content of the mail, but the threat-relevant features are in the header, then that classifier needs to have those values as well in order to make that classification. And so, my point is these classifiers are not well-suited to uncover the unknown unknowns. Anything that it has not seen, kind of a new type of attack, it is really blind to. It generalizes over data that, that we present to it.
Christian Seifert: And so, what we try to do is to build a system that is able to uncover unknown attacks, with the ultimate goal then, of course, to develop autonomous defensive components to defend against those attacks. So, that gives a little bit of context on why we're pursuing this effort. And this was inspired by reinforcement learning research in the broader research community, mostly, that is currently applied kinda in the gaming context.
Christian Seifert: So OpenAI actually came out with a neat video a couple of years ago called Hide and Seek. Uh, that video is available on YouTube. I certainly encourage listeners to check it out, but basically it was a game of laser tag where you had kinda, uh, a red team and a blue team, uh, play the game of laser tag against each other. And at first they, of course, randomly kind of shoot in the air and run around and there is really no order to the chaos. But eventually, that system learned that, “Hey, if a red team member shoots a blue team member, there's a reward,” and the blue team member also learned that running away from the red team member is, is probably a good thing to do.
Christian Seifert: And so, OpenAI kinda, uh, established the system and had the blue team and the red team play against each other, and eventually what that led to is really neat strategies that you and I probably wouldn't have come up with. 
'Cause what the AI system does, it explores the entire possible action space and as a result comes up with some unexpected strategies. So for instance, uh, there was a blue team member that kinda hid in a room, and then a red team guy figured, “Hey, if I jump on a block then I can surf in that environment and get into the room and shoot the blue team member.” So that was a little bit of an inspiration, because we wanted to also uncover these unknown unknowns in the security context.
Nic Fillingham: Got it. That's great context. Thank you, Christian. I think I have seen that video. Is that the one where one of the many unexpected outcomes was, like, one of the, the blue or red team players, like, managed to sort of, like, pick up walls and use them as shields and then create ramps to get into, like, hidden parts of the map? Uh, uh, am I thinking about the right video?
Christian Seifert: Yes, that's the right video.
Nic Fillingham: Got it. So the whole idea was that that was an experiment in, in understanding how to find the unknown unknowns, using this game, sort of, this laser tag, sort of, gaming space. Is, is that accurate?
Christian Seifert: That's right, and so, they utilized reinforcement learning in order to train those agents. Another example is, uh, DeepMind's AlphaGo Zero, playing the game of Go, and, and here, again, kind of, two players, two AI systems that play against each other, and, over time, really develop new strategies on how to play the game of Go that, you know, human players have, have not come up with.
Christian Seifert: And it eventually led to a system that achieved superhuman performance and was able to beat the champion, Lee Sedol, and I think that was back in 2017. So, really inspiring work, both by OpenAI and DeepMind.
Nic Fillingham: Got it. 
I wonder, Josh, is there anything you'd like to... before we, sort of, jump into the content of the blog and, and CyberBattleSim, is there anything you'd like to add from your perspective to, to the context that Christian set us up with?
Joshua Neil: Yeah. Thanks, Nic. I, I mean, I think we were really excited about this because... I think we all think this is a natural evolution of, of our adversaries. So, so, currently, our adversaries, the more sophisticated ones, are primarily using humans to attack our enterprises, and that means they're slow and they can make mistakes and they don't learn from the large amount of data that's there in terms of how to do attacks better, because they're humans.
Joshua Neil: But I think it's natural, and we just see this, uh, everywhere in all of technology, that people are bringing in, you know, methods to learn from the data and make decisions automatically. And so it's a natural evolution to say that attackers will be writing code to create autonomous attack capabilities that learn while they're in the enterprise: that piece of software that's launched against the enterprise as an attack will observe its environment and make decisions on the fly, automatically, from code.
Joshua Neil: As a result, that's a frightening proposition, because I think the speed at which these attacks proceed will be a lot, you know, a lot faster, but also, they'll be able to use the data to learn effective techniques that get around defenses. You know, we just see data science and machine learning and artificial intelligence doing this all over the place, and it's very effective: the ability to consume a large amount of data and make decisions on it, that's what machine learning is all about. And so, we at Microsoft are interested in exploring this ourselves, because we feel like the threat is coming and, well, let's get ahead of it, right? 
Let's go experiment with automated learning methods for attacks and, and obviously, in the end, for defense, so that, by implementing attack methods that learn, we then can implement defensive methods that will, that will preempt what the real adversaries are doing, eventually, against our customers.
Joshua Neil: So, I think that's, sort of, a philosophical thing. And then, uh, I love the OpenAI Hide-and-Seek example because, you know, the analogy is: imagine that instead of being in a room with, um, walls and, and stuff, they're on a computer network, and the computer network has machines, it has applications, it has email accounts, it has users, it's got cloud applications. But, in the end, you know, an attacker is moving through an environment, getting blocked in various ways by defenses, learning about those blockings and detections and things, and finding gaps that they can move through in, in very similar ways. So, I'm just, sort of, drawing that analogy back: Hide-and-Seek is what we're trying to do in cyber defense, you know, is, is Hide-and-Seek. And so the, I think the analogy is very strong.
Nic Fillingham: Josh, I just wanna quickly clarify something that, that you said there. So, it sounds like what you're saying is that, while, sort of, automated AI-based attackers or attacking agents maybe aren't quite prevalent yet, they're, they're coming, and so, a big part of this work is about prepping for that and getting ahead of it. Is, is, is that correct?
Joshua Neil: That's correct. I, I'm not aware of sophisticated attack machinery that's being launched against our enter-, our customers yet. I haven't seen it, maybe others have. I think it's a natural thing, it's coming, and we'd better be ready.
Christian Seifert: I mean, we, we see some of it already, uh, in terms of adversarial machine learning, where, uh, our machine learning systems are getting attacked, where, maybe the input is manipulated in a way that leads to a misclassification. 
Most of that is, is currently being explored more in the research community.
Natalia Godyla: How did you apply reinforcement learning? How did you build CyberBattleSim? In the blog you described mapping some of the core concepts of reinforcement learning to CyberBattleSim, such as the environment, the action space, the observation space, and the reward. Can you talk us through how you translated that to security?
Christian Seifert: Yeah. So, so first let, let me talk about reinforcement learning to make sure, uh, listeners understand, kinda, how that works. So, as I mentioned, uh, earlier, in the supervised case, we feed a labeled data set to a learner, uh, and then it's able to generalize. Reinforcement learning works very differently: you have an agent that sits within an environment, and the agent is, essentially, able to generate the data itself by exploring that environment.
Christian Seifert: So, think of an agent in a computer network. That agent could, first of all, scan the network to, maybe, uncover nodes, and then there are, maybe, uh, actions around interacting with the nodes that it uncovers. And based on those interactions, the agent will, uh, receive a reward. That reward actually may be delayed; like, there could be many, many steps that the agent has to take before the reward, uh, manifests itself. And so, that's, kinda, how the agent learns: it's able to interact in that environment and then able to receive a reward. And so that's, kinda, what, uh, stands, uh, at the core of the CyberBattleSim, because William Blum, who is the, the brains behind the simulation, has created an environment that is compatible with, uh, common, uh, reinforcement learning tool sets, namely the OpenAI Gym, that allows you to train agents in that environment.
Christian Seifert: And so, the CyberBattleSim represents a simple computer network. So, think of a set of computer nodes. Uh, the, the nodes represent a computer, um... 
Windows, macOS, SQL Server, and then every node exposes a set of vulnerabilities that the agent could potentially exploit. And so, then, as, kind of, the agent is dropped into that environment, the agent needs to first uncover those nodes, so there's a set of actions that allows it to explore the state space. Overall, the environment has limited observability: as the agent gets dropped into the environment, you're not necessarily, uh, giving that agent the entire network topology. Uh, the agent first needs to uncover that by exploring the network, exploiting nodes, from those nodes further exploring the network and, essentially, laterally moving across the network to achieve a goal that we give it, to receive that final reward that allows the agent to learn.
Natalia Godyla: And, if I understand correctly, many of the variables were predetermined, such as the network topology and the vulnerabilities, and, in addition, you tested different environments with different set variables. So how did you determine the different environments that you would test and, within that particular environment, what factors were predetermined, and what those predetermined factors would be?
Christian Seifert: So we, we determined that based on the domain expertise that exists within the team. So we have, uh, security researchers that are on a Red Team that kind of do that on a day-to-day basis, to penetration test environments. And so, those folks provided input on how to structure that environment, what nodes should be represented, what vulnerabilities should be exposed, what actions the agent is able to take in- in terms of interacting and exploring that, uh, network. So our Red Team experts provided that information.
Nic Fillingham: I wonder, Christian, if you could confirm for me. So there are elements here in CyberBattleSim that are fixed and predetermined. What elements are not? 
And so, I guess my question here is, if I am someone interfacing with the CyberBattleSim, what changes every time? How would you sorta define the game component in terms of what am I gonna have to try and do differently every time?
Christian Seifert: So the- the CyberBattleSim is parametrized, where you can start it up in a way that the network essentially stays constant over time, so you're able to train an agent. And so, the network size is- is something that is dynamic, that you can, uh, specify upon startup. And then also kinda the node composition, as well as... So whether... how many Windows 10 machines you have versus [inaudible 00:19:15] servers, as well as the type of vulnerabilities that are associated with each of those nodes.
Nic Fillingham: Got it. So every time you- you establish the simulation, it creates those parameters and sort of locks them for the duration of the simulation. But you don't know... the agent doesn't know in advance what they will be. The agent has to go through those processes of discovery and reinforcement learning.
Christian Seifert: Absolutely. And- and one- one tricky part within reinforcement learning is- is generalizability, right? When you train an agent on Network A, it may be able to learn how to outperform a Red Team member. But if you then change the network topology, the agent may completely flail and not be able to perform very well at all, and needs to kind of re- retrain again. And that- that's a common problem within the- the reinforcement learning research community.
Natalia Godyla: In the blog you also noted a few opportunities for improvement, such as building a more realistic model of the simulation. The simplistic model served its purpose, but as you're opening the project to the broader community, it seems that you're endeavoring to partner with other researchers to create a more realistic environment. 
Have you given some early thought as to how to potentially make the simulation more real over time?
Christian Seifert: Absolutely. There is a long list of- of things that we, uh, need to think about. I mean, uh, network size is- is one component. Being able to simulate a- a regular user in that network environment; dynamic aspects of the network environment, where a node essentially is added to the network and then disappears from the network. Uh, all those components are currently not captured in the simulation as it stands today. And the regular user component is an important one, because what you can imagine is, if we have an attacker that is able to exploit the network, and then you have a defender agent within that network as well, if there is no user component, you can very easily secure that network by essentially turning off all the nodes.
Christian Seifert: So a defender agent needs to also optimize, uh, to keep the productivity of the users that are existing on the network high, which is currently not- not incorporated in- in the simulation.
Nic Fillingham: Oh, that's amazing. So there could be, you know, sort of a future iteration, sort of a network or environment productivity, like, score or- or even a dial, and you have to sort of keep it above a particular threshold while you are also thwarting the advances of the- of the agent.
Christian Seifert: Absolutely. And I mean, that is, I think, a common trade-off in the security space, right? There are certain security measures that- that make a network much more secure. Think of, like, two-factor authentication. But it does add some user friction, right? And so, today we're- we're walking that balance, but I'm hoping that there may be new strategies, not just on the attacker's side, but also on the defender's side, that we can uncover, that are able to provide a higher level of security while keeping productivity high. 
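Christian's description of reinforcement learning (an agent that explores an environment, takes actions, and eventually receives a possibly delayed reward), together with the partially observable network whose size and composition are fixed at startup, can be sketched in plain Python. This is not CyberBattleSim's API: the real project exposes an OpenAI Gym-compatible environment, whereas everything below (`ToyNetworkEnv`, the `VULNS` labels, one exploitable vulnerability per node, the 50-step cap) is a hypothetical simplification. Tabular Q-learning is swapped in as one simple learning algorithm, not necessarily the one the team used, and there is no defender or user-productivity term; Christian notes productivity is not modeled in the real simulation yet either.

```python
import random
from collections import defaultdict

# Hypothetical vulnerability labels; CyberBattleSim models its own, richer set.
VULNS = ("smb", "rdp", "sql")


class ToyNetworkEnv:
    """HYPOTHETICAL stand-in for a CyberBattleSim-style environment.

    Network size and seed are fixed at startup, so the topology and each
    node's vulnerability stay constant across training episodes.
    """

    def __init__(self, num_nodes=6, seed=0):
        assert num_nodes >= 2, "need an entry node and a separate target"
        rng = random.Random(seed)
        # Random tree: node i links to an earlier node, so all nodes are reachable from 0.
        self.links = {i: set() for i in range(num_nodes)}
        for i in range(1, num_nodes):
            j = rng.randrange(i)
            self.links[i].add(j)
            self.links[j].add(i)
        self.vuln = {i: rng.choice(VULNS) for i in range(num_nodes)}
        self.target = num_nodes - 1

    def reset(self):
        # Partial observability: the agent owns node 0 and only sees its neighbours.
        self.owned = {0}
        self.discovered = {0} | self.links[0]
        self.steps = 0
        return self._obs()

    def _obs(self):
        return (frozenset(self.owned), frozenset(self.discovered))

    def step(self, action):
        node, vuln = action
        self.steps += 1
        if node in self.discovered and node not in self.owned and self.vuln[node] == vuln:
            self.owned.add(node)
            self.discovered |= self.links[node]  # lateral movement reveals neighbours
        reached = self.target in self.owned
        reward = 1.0 if reached else 0.0  # delayed: nothing until the goal is owned
        done = reached or self.steps >= 50
        return self._obs(), reward, done


def valid_actions(obs):
    """The agent may only target nodes it has discovered but not yet owns."""
    owned, discovered = obs
    return [(n, v) for n in sorted(discovered - owned) for v in VULNS]


def q_learning(env, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning for an attacker (Red Team) agent."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            acts = valid_actions(obs)
            if rng.random() < epsilon:
                action = rng.choice(acts)  # explore
            else:
                action = max(acts, key=lambda a: q[(obs, a)])  # exploit
            nxt, reward, done = env.step(action)
            future = 0.0 if done else max(q[(nxt, a)] for a in valid_actions(nxt))
            q[(obs, action)] += alpha * (reward + gamma * future - q[(obs, action)])
            obs = nxt
    return q


def greedy_rollout(env, q):
    """Run one episode using only the learned Q-values; return (reward, steps)."""
    obs, done, reward = env.reset(), False, 0.0
    while not done:
        obs, reward, done = env.step(max(valid_actions(obs), key=lambda a: q[(obs, a)]))
    return reward, env.steps


if __name__ == "__main__":
    env = ToyNetworkEnv(num_nodes=6, seed=0)
    q = q_learning(env)
    print(greedy_rollout(env, q))  # (reward, steps); reward 1.0 means the target was owned
```

Because the topology is pinned by `seed`, a Q-table learned here is tied to that one network; changing `num_nodes` or `seed` gives the agent states it has never seen, which is a small-scale version of the generalizability problem Christian describes.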
Nic Fillingham: I think you- you- you have covered this, but I- I'd like to ask it again, just to sort of be crystal clear for our audience. So who is the CyberBattleSim for? Is it for Red Teams? Is it for Blue Teams? Is it for students that are, you know, learning about this space? Could you walk us through some of the types of, you know, people and- and roles that are gonna use CyberBattleSim?
Christian Seifert: I mean, I think that the CyberBattleSim today is- is quite simplistic. It is a simulated environment. It is not... it's- it's modeled after a real world network, but it is far from being a real world network. So it's, uh, simplistic. It's simulated, which gives us some advantages in terms of, uh, scalability and that learning environment. And so at this point in time, I would say, uh, the simulation is really geared towards, uh, the research community. There's a lot of research being done in reinforcement learning. A lot of research is focused on games, because if you think about a game, that is just another simulated environment. And what we're intending to do here with- with some of the open source releases is really put the spotlight on the security problem. And we're hoping that the- the reinforcement learning researchers and the research community at large will pay more attention to this problem in the security domain.
Nic Fillingham: It's currently sort of more targeted, as you say, at- at researchers, as sort of a research tool. For it to be something that Red Teams and Blue Teams might want to look at adopting, is that somewhere on a road map? For example, if- if you had the ability to move it out of the simulation and into sort of a- a- a VM space or virtual space, or perhaps add the ability for users to recreate their own network topology, is that somewhere on your- your wishlist?
Christian Seifert: Absolutely. I think there's certainly the goal to eventually have these, uh, autonomous defensive agents deployed in real world environments. 
And so in order to get to that, the simulation needs to become more and more realistic.
Joshua Neil: There's a lot of work to be done there, 'cause reinforcement learning on graphs, big networks, i- is computationally e- expensive. And there's just a lot of raw research, mathematics, and computing that needs to be done to get to that real- real world setting. And security research: incorporating the knowledge of these constraints and goals and rewards and things... t- that takes a lot of domain research, and getting- getting the- the security situation realistic. So it's hard.
Christian Seifert: The simulation today provides the environment and the ability for us to train a Red Team agent, so an agent that attacks the environment. Today, the defender is very simplistic, modeled probabilistically around cleaning up machines that have been exploited. So kinda the next point on the wishlist is really getting to a point where we have the Red Team agent play against a Blue Team agent, and kinda play back and forth and see kinda how that influences the dynamic of the game.
Natalia Godyla: So Christian, you noted one of the advantages of the abstraction was that it wasn't directly applicable to the real world, and because of that it served as a safeguard against nefarious actors who might use CyberBattleSim for the wrong reasons. As you're thinking about the future of the project, how do you plan to mitigate this challenge as you drive towards more realism in the simulation?
Christian Seifert: That is certainly a- a- a risk of this sort of research. I think we are still at the early stages, so I think that risk is- is really nonexistent as it stands right now. But I think it can become a risk as the simulation becomes more sophisticated and realistic. 
Now, we at Microsoft have the responsible AI effort, led at the corporate level, that looks at, you know, safety, reliability, transparency, accountability, e- et cetera, as kind of principles that we need to incorporate into our AI systems. And we, early on, engaged the proper committees to help us shape the- the solution in a responsible fashion. And so at this point in time, there weren't really any concerns, but, uh, as the simulation evolves and becomes more realistic, I very much expect that we'll, uh, need to employ particular safeguards to prevent abuse.
Nic Fillingham: And so without giving away the battle plan here, wh- what are some other avenues that are being, uh, explored as part of trying to get ahead of this eventual point in the future, where there are automated agents out there in the wild?
Joshua Neil: This is the- the core effort that we're making, and it's hard enough. I'll also say I think it's important for security folks like us, especially Microsoft, to try hard things and to try to break new ground in innovation to protect our customers and really the world. If we only focus on short-term product enhancements, the adversaries will continue to take advantage of our customers' enterprises, and we really do need to be taking these kinds of risks. It may not work. It's ... It's really, really hard. And- and purposefully endeavoring to- to- to tackle really hard problems is- is necessary to get to the next level of innovation that we have to get to.
Christian Seifert: And let me add to that. Like, we have a lot of capabilities and expertise at Microsoft, but in the security space, there are many, many challenges, and so I don't think we can do it alone. Um, and so we also need to kinda put a spotlight on the problem and encourage the broader community to help solve these problems with us.
And so there's a variety of efforts that we have pursued over the last, uh, couple of years to do exactly that. About two years ago we published a [inaudible 00:28:52] data science competition, where we provided a dataset to the broader community with a problem around, uh, malware classification and machine risk identification, and basically asked the community, "Hey, solve this problem." And there was, you know, prize money associated with it. But I really liked that approach because we have ... Again, we have a lot of expertise on the team, but we're also a little bit biased, right, in- in terms of kinda the type of people that we have, uh, and the expertise that we have.
Christian Seifert: If you present a problem to the broader research community, you'll get very different approaches to how people solve the problems, most likely from com- kind of domains that are not security-related. Another example is an RFP. So we funded, uh, several research projects last year. I think it was, uh, $450,000 worth of research projects where, again, we kind of laid out, "Here are some problems that are of interest that we wanna put the spotlight on," and then supported the- the research community to pursue research in that area.
Nic Fillingham: So what kind of ... You know, you talk about it being, uh, an area that we all sort of collectively have to contribute to and sort of get b- behind. Folks listening to the podcast right now, going and reading the blog: would you like everyone to go and- and- and spin up CyberBattleSim and- and give it a shot? And then once they have ... Tell us about the- the types of work or feedback you'd like to see. So it's up on GitHub. What kind of contributions or- or feedback are you looking for from- from the community?
Christian Seifert: I mean, I'd really love to have, uh, reinforcement learning researchers that have done research in this space work with the CyberBattleSim.
Kinda going back to the problem that I mentioned earlier, of how we can build agents that are generalizable in a way that they're able to operate on different network topologies and different network configurations: I think that's an- an- an exciting area, uh, that I'd love to see, uh, the research community tackle. The second portion is- is really enhancing the simulation. I mentioned a whole slew of features that I think would be beneficial to make it more realistic, and then also kinda tackle the problem of- of potentially negatively impacting the productivity of- of users that operate on that network. So enhancing the- the simulation itself is another aspect.
Nic Fillingham: Josh, anything you wanted to add to that?
Joshua Neil: Yeah, I mean, I- I think those were the- the major audiences we're hoping for feedback from. But a- al- also, like Christian said, if a psychologist comes and looks at this and has an idea, send us an email or something. You know, that multidisciplinary advantage we get from putting this out in the open means we're anticipating surprises. And we want those. We want that diversity of thought and approach. A physicist: "You know, this looks like a black hole and here's the m- ..." Who knows? You know, but that's- that's the kind of-
Nic Fillingham: Everything's a black hole to a physicist-
Joshua Neil: (laughs) Yeah.
Nic Fillingham: ... so that's, uh ...
Joshua Neil: So, you know, I think that diversity of thinking is what we really solicit. Just take a look, yeah. Anybody listening: download it, play with it, send us an email. We're doing this so that we get your- your ideas and thinking, for us and for the whole community. Because I think we- we also believe that good security, uh, next generation security, is developed by everybody, not just Microsoft. And there is a- there is a good reason to uplift all of humanity's capability to protect themselves, not just for Microsoft but for everybody, you know?
Natalia Godyla: So Christian, what are the baseline results?
How long does it take an agent to get to the desired outcome?
Christian Seifert: So the s- simulation is designed in a way that also allows humans to play the game. So we had one of our Red Teamers actually play the game, and it took that person about 50 operations to compromise the entire network. Now when we take a- a random agent that, kinda uninformed, takes random actions on the network, it takes about 500 steps. So that's kind of the- the lower baseline for an agent. And then we trained, uh, a Deep Q, uh, reinforcement learning agent, and it was able to reach, uh, the human baseline after about 50, uh, training iterations. Again, the network is quite simple. I wouldn't expect that to hold, uh, as kinda the- the simulation scales and becomes more complex, but that was, uh, certainly an encouraging first result.
Joshua Neil: And I think the- the significant thing there is, even if the computer takes more steps than the human, well, we can make computers run fast, right? We can do millions of iterations way faster than a- than a human, and they're cheaper than humans, et cetera. It's automation.
Nic Fillingham: Is there a point at which the automated agent gets too good, or- or is there sort of a ... What would actually be the definition of almost a failure in this experiment, to some degree?
Joshua Neil: I think one- one way to- to sort of interpret your question is that it could be overfit. That is, if it's too good, it's too specific and not generalized, and as soon as you throw some different set of constraints or network at it, it fails. So I think that's a- that's a real metric of the performance: okay, it- it learned on this situation, but how well does it do on the next one?
Nic Fillingham: Is there anything else, uh, either of you would like to add before we wrap up here? I feel like we've covered a lot of ground. I'm gonna go download CyberBattleSim and- and try and work out how to execute it. But a- anything you'd like to add, Christian?
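Christian's baselines (about 50 operations for a human Red Teamer versus about 500 steps for a random agent) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the six-machine network, the three exploit names, and the success rule are hypothetical, this is not CyberBattleSim's API, and the step counts it produces are not the results quoted in the episode. It only shows why an agent that picks actions uniformly at random needs far more steps than one that only tries actions that can succeed.

```python
import random

# Hypothetical toy network (adjacency list of machine IDs). NOT CyberBattleSim.
NETWORK = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3, 4],
    3: [1, 2, 5],
    4: [2, 5],
    5: [3, 4],
}
# Hypothetical exploits; each machine is vulnerable to exactly one of them.
EXPLOITS = ["rdp_bruteforce", "smb_vuln", "stolen_creds"]
VULNERABLE_TO = {0: "rdp_bruteforce", 1: "smb_vuln", 2: "stolen_creds",
                 3: "smb_vuln", 4: "rdp_bruteforce", 5: "stolen_creds"}


def attempt(owned, target, exploit):
    """An attack succeeds only if the target is adjacent to an owned machine
    and the exploit matches the target's vulnerability."""
    reachable = any(target in NETWORK[m] for m in owned)
    return reachable and VULNERABLE_TO[target] == exploit


def random_agent(rng, max_steps=100_000):
    """Pick a (target, exploit) pair uniformly at random each step and count
    the steps until every machine is owned."""
    owned = {0}  # initial foothold
    steps = 0
    while len(owned) < len(NETWORK) and steps < max_steps:
        target = rng.choice(list(NETWORK))
        exploit = rng.choice(EXPLOITS)
        steps += 1
        if target not in owned and attempt(owned, target, exploit):
            owned.add(target)
    return steps


def informed_agent():
    """Only tries the matching exploit on a reachable, not-yet-owned machine,
    so it needs exactly one step per machine it has to take over."""
    owned = {0}
    steps = 0
    while len(owned) < len(NETWORK):
        frontier = sorted({n for m in owned for n in NETWORK[m]} - owned)
        target = frontier[0]
        steps += 1
        if attempt(owned, target, VULNERABLE_TO[target]):
            owned.add(target)
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    runs = [random_agent(rng) for _ in range(200)]
    print("informed agent steps:", informed_agent())  # 5
    print("random agent mean steps:", sum(runs) / len(runs))
```

A trained policy, such as the Deep Q-learning agent mentioned in the episode, would replace the uniform choice in `random_agent` with learned action values; that part is not sketched here.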
Christian Seifert: No, not from me. It was, uh, great talking to you.
Natalia Godyla: Well, thank you, Josh and Christian, for joining us on the show today. It was a pleasure.
Christian Seifert: Oh, thanks so much.
Joshua Neil: Yeah, thanks so much. Lots of fun.
Natalia Godyla: Well, we had a great time unlocking insights into security, from research to artificial intelligence. Keep an eye out for our next episode.
Nic Fillingham: And don't forget to tweet us at MSFTSecurity, or email us at, with topics you'd like to hear on a future episode. Until then, stay safe.
Natalia Godyla: Stay secure.

Knowing Your Enemy: Anticipating Attackers’ Next Moves

Ep. 25
Anyone who’s ever watched boxing knows that great reflexes can be the difference between a championship belt and a black eye. The flexing of an opponent’s shoulder, the pivot of their hip: a good boxer will know enough not only to predict and avoid the incoming uppercut, but also how to turn the attack back on their opponent. Microsoft’s newest capabilities in Defender put cyber attackers in the ring and predict their next attacks as the fight is happening. On today’s episode, hosts Nic Fillingham and Natalia Godyla speak with Cole Sodja, Melissa Turcotte, and Justin Carroll (and maybe even a secret, fourth guest!) about their blog post on Microsoft’s Security blog about the new capabilities of using an see the attacker’s next move.
In This Episode, You Will Learn:
• What kind of data is needed for this level of threat detection and prevention?
• The crucial nature of probabilistic graphical modeling in this process
• The synergistic relationship between the automated capabilities and the human analyst
Some Questions We Ask:
• What kind of modeling is used and why?
• What does the feedback loop between program and analyst look like?
• What are the steps taken to identify these attacks?
Resources:
Justin’s, Melissa’s, and Cole’s blog post:
Justin Carroll’s LinkedIn:
Melissa Turcotte’s LinkedIn:
’s LinkedIn:
Josh Neil’s LinkedIn:
’s LinkedIn:
’s LinkedIn:
Security Unlocked: CISO Series with Bret Arsenault
https://SecurityUnlockedCISOSeries.com
Transcript
[Full transcript at]
Nic Fillingham: Hello, and welcome to Security Unlocked, a new podcast from Microsoft, where we unlock insights from the latest in news and research from across Microsoft Security engineering and operations teams. I'm Nic Fillingham.
Natalia Godyla: And I'm Natalia Godyla. In each episode, we'll discuss the latest stories from Microsoft Security, deep dive into the newest threat intel, research and data science.
Nic Fillingham: And profile some of the fascinating people working on artificial intelligence in Microsoft Security.
Natalia Godyla: And now, let's unlock the pod.
Welcome, everyone, to another episode of Security Unlocked, and hello, Nic, how's it going?
Nic Fillingham: It's going well, good to see you on the other side of this Teams call. Although, you and I were in person not 24 hours ago. You were here in Seattle; we were filming some more episodes of the Security Show. I don't think we've really given listeners of the podcast a full, meaty introduction to the Security Show, have we? Do you wanna let listeners know what they can find?
Natalia Godyla: We play games and hang out with experts in the industry, and we've done everything from building robots with folks, to building blocks, to painting our nails. You can find the Security Show on our YouTube channel, so, or you can go to We talk with Chris Wysopal, the CTO and co-founder of Veracode, on modern secure software development, and Dave Kennedy, who comes to talk to us about SecOps and everything you need for a survival kit in SecOps, so come check them out.
Nic Fillingham: Bad news is you, you have to deal with, uh, Natalia and I on another, uh, media format. But before you go there, make sure you listen to today's episode of Security Unlocked. We have a couple of returning guests. We have Cole and Justin, who have been on before, as well as Josh Neil, who comes on in the, in the last few minutes. And a new guest, Melissa. They're all from the Microsoft 365 Defender research team, and they all co-authored a blog from April 1st called Automating Threat Actor Tracking, Understanding Attacker Behavior for Intelligence and Contextual Alerting, which is exactly what it is, but I think it buries the lede. Natalia, you had a great TL;DR. What did they do?
Natalia Godyla: The team used statistics to predict the threat actor group and the next stage in the attack, and really early in the attack, so that we could identify the attack and inform customers so that they could stop it.
I think what's really incredible here is not only the ability to predict that information, but to just do it so early in the kill chain.
Nic Fillingham: Within two minutes after an attack began, using this model, Microsoft Threat Experts were able to send a notification to the customer to let them know an attack was underway. The customer was able to do, you know, the necessary things to get that attack shut down. We'd love, as always, your feedback. Send us emails, hit us up on the Twitters. On with the pod.
Natalia Godyla: On with the pod.
Nic Fillingham: Well, welcome back to the Security Unlocked podcast, Cole and Justin, and welcome to the Security Unlocked podcast, Melissa. Thanks for joining us today. We have three wonderful guests, with maybe a, a fourth special guest appearing at the end. And today we're gonna be talking about a blog post appearing on the Security blog from April the 1st, called Automating Threat Actor Tracking, Understanding Attacker Behavior for Intelligence and Contextual Alerting. All of the authors from that blog are here with us. Cole, if I could start with you, if you could sort of reintroduce yourself to the audience, give us a little bit, uh, about your role, what you do at Microsoft, and then perhaps hand off to one of your colleagues for the next intro.
Cole Sodja: Sure. Will do, thank you. So, hi, I'm Cole. I work in the Microsoft 365 Defender group. I'm a statistician. Primarily my responsibilities are driving, kind of, research and innovation in general, with supporting threat analytics, threat hunting, threat research in general. Yeah, been doing that for about three years now, and I love it, and that's a little bit about myself. I'll hand it over to Melissa.
Melissa Turcotte: All right. My name's Melissa, I work with Cole, so in the same group, Microsoft 365 Defender. I'm also a statistician by background. I've been in the cyber domain for about probably seven years now.
I was working for a Department of Energy research laboratory in their cyber research group for five years, and I joined Microsoft a year ago. I like all sorts of problems related to cyber. My expertise probably would be in anomaly detection, but anything related to cyber where there's data in the problem, I like to be involved.
Nic Fillingham: And Justin.
Justin Carroll: Hey. I also work in the Microsoft 365 Defender team, doing threat intelligence. My main focus is uncovering new threats and actor groups and understanding what they're doing, different modifications to how they're conducting their attacks, and the outcomes of those attacks, and then figuring out the most effective ways to either communicate that out to customers or action on detection capabilities to stop them from succeeding.
Nic Fillingham: Listeners of the podcast will note that you have a super sweet Ninja Turtles tattoo, is that correct?
Justin Carroll: This is accurate, this is definitely accurate.
Nic Fillingham: And, and we may or may not have a super secret fourth guest on this episode, who may join us towards the end, who you would, you would know from a very early episode of the podcast, but perhaps we'll keep them secret until the very end. Thank you all for joining us, thank you for your time. Again, we're referring to a, a blog post that, that all of you authored from April 1st. This is a, quite a complex, and, and sort of technical blog post, which I know a lot of our audience will love.
Nic Fillingham: I got a little lost in the math, but I, I absolutely was enthralled by what you all have undertaken here. Cole, if I could start with you, can you give us, give us an overview of what's covered in this blog post, and sort of what this project was, how you tackled it, and what we're gonna talk about, uh, on this episode today?
Cole Sodja: Yeah.
So if I step back, being someone kind of still fairly new in learning, uh, cyber security, uh, I approached things pretty much with just using data, right? Doing data-driven inference, as I'd say. And through my research, what I started to, um, kinda ask myself is, can we kinda get ahead of cyber security attacks, you know, from a post-breach perspective? Once we see an adversary in a network, can we start to make some predictions, basically, on what they're likely gonna do? Who is the adversary? Is it human operated, is it an automated script, for example? And then if we recognize the adversary, kinda recognize their tactics, their techniques, their procedures, can we say, okay, we're, we're likely gonna see they're gonna ransom this enterprise, for example.
Cole Sodja: So I tried to look at it as more of a data mining exercise initially. It's like, can I recognize these types of patterns, and then how predictive are these patterns that we're seeing in terms of what likely is gonna occur? Or, put it another way, what type of threat is this, essentially, to the enterprise? So, so that's kinda the background, the motivation. Now, when I started this project, back with Justin and then with Melissa, it started really as: let's look for particular, uh, threat actors that we're aware of, that we recognize, that we know about, and see, like, can we start, from a data perspective, classifying, okay, is it this group, is it that group, and what does this group tend to do?
Cole Sodja: And one of the challenges in that is, is sparsity. Basically, we don't have a lot of labels sitting around out there saying, it's threat actor group A, B, C, D, and so on. We have handfuls of those. Some of these actors, they don't tend to do attacks very frequently, right? They're extremely sparse.
So, so one challenge of this, and one motivation, is how can we actually partner with threat intelligence, for example, and our threat hunters, to try and essentially encode or extract some of their information to help us build models, to help us reason over the uncertainty, essentially.
Cole Sodja: And when we say probabilistic modeling, that's what we mean. It's how do we actually quantify this uncertainty, both in what we believe about the actors, or the adversaries in general, as well as what they're gonna do, right, once they've breached your network. So that's kinda how it started, and what this blog's really about is kinda giving a walk-through, essentially, of what we did initially with this research. It started with, and Justin will talk about this in a moment, it started with looking at a few select threat actors that are very serious.
Cole Sodja: We started to understand their behaviors more and more, and we thought it was a good opportunity, initially, to try and build a model to, again, understand what they're doing, track what they're doing, because they do change their tactics over time, as well as just see if we could get ahead of them. Can we actually notify a customer in advance, before, uh, for example, their organization's ransomed? So, so that's one part of the blog that we'll discuss, and I'll hand it over to my good friend Justin to take it from here.
Justin Carroll: So, like, one of the, the main challenges that we kinda face in the intelligence sphere is understanding the particulars of an actor and when they are present in an environment. A lot of times, you'll see the intelligence is really focused on a very particular indicator, such as, like, a known IP address that's malicious, or a single behavior. But it's kinda difficult to frequently pivot off those to understand when a suspected attacker is in an environment. A lot of that is because they don't always do the exact same behaviors when they are compromising an organization or device. There will be some variation, and it basically requires manual enrichment of devices a lot of the time to try and understand the specifics of the attacks and what the final outcomes of that attack were. So this presented an opportunity to work with data scientists to, like, really supercharge our efforts so that we could kinda come in understanding a much bigger picture and knowing, essentially, what behaviors we saw occur and then which ones we might suspect. A lot of times with these human operated ransomware ones, the time to alert, to notify of the expected outcome, is often fairly short, in particular with, uh, one of the ones that we worked on to kinda test this method out. We had seen very short instances from time to compromise to ransom, so, um, this was to try and see if we could have a, a highly confident method of enriching that intelligence, um, and then working with other teams to get those alerts out.
Natalia Godyla: If I could jump in here for a moment. So, at the beginning of your description, you noted that typically you'd use manual enrichment. Can you talk a little bit about that?
So prior to this probabilistic model, how did you go through that manual enrichment process to try to, uh, predict which threat actors they were or determine what stage of an attack it was?
Justin Carroll: It would be something along the lines of, let's say, you had intelligence from either a partner team or open source intelligence that says, you know, "These threat actors are using this IP address as part of their attack," and then looking for the presence of that and then finding out what actually occurred on those devices to understand the entirety of the attack. Or looking more generically and saying, like, "Okay, we know these attackers like to use a particular behavior as part of their credential theft," and then looking for all sorts of instances of that credential theft and then kinda continuing to pivot down into one that is leading to the behavior that y- you're looking for. One of the difficulties that you'll see in particular with this and other actors is, like, they will use multiple shared open source tools and payloads. Um, many of them aren't even malware; they're clean tools with legitimate purposes, so it can make it difficult to try and suss out malicious use from administrative use, so you have to look for that combination of different behaviors to indicate something malicious is afoot.
Nic Fillingham: Justin, if I look at the blog, I think it might be the first chapter here, there's a MITRE ATT&CK framework diagram, Figure One, and it, uh, outlines sort of the steps taken here for how this model was able to, with high confidence, identify the, the actor and, uh, send an alert to the customer, who was able to shut it down. I wonder if you could sort of, could you walk us through these sort of six steps as an example of, of how this worked in, in sort of real life?
Justin Carroll: Yeah. I can walk through basically from a model's perspective, essentially, how it works.
Timing, that's more a function of, like, how the attack, uh, typically progresses with this actor. Technically speaking, what the model's really doing is it's encoding each behavior we have, in this case each MITRE technique in particular, in terms of what's the confidence once we see, for example, initial access via, let's say, RDP brute force, followed by lateral tool transfer with a subset of tools recognized. That particular sequence right there, that's where the model would be like, "Okay, the probability that it's this particular threat actor group, conditional on those two things occurring in sequence, will be X," and that sequence could occur in a matter of minutes or even days and weeks, dependent on the actor, of course, we're talking about.
Justin Carroll: With the, the actor we're showing in this graph, this actor typically will penetrate a network through RDP brute force, but then sometimes the, they won't immediately transfer their tools. They might wait a day or two, or sometimes they'll, they'll do it very fast. Like, once they basically compromise a log-in, then, uh, they'll, they'll go to that machine. There might be some, um, discovery-related commands before they transfer, or they might just transfer their tools, and then that will be the attack box, basically, in which they stage their attack, and then they'll do some additional things.
Justin Carroll: So at each step, basically, or each stage of the attack, as we like to call it, the model is basically gonna then update its probabilities and say, "Okay, based on all the information I've seen up to this stage, the probability that it's this actor is P, and now, conditional that it's this actor with probability P, the probability that we'll now see, for example, defense evasion in this attack will be Q." Or, or we could even go further in the attack stage to say, "Now, given all this, what's the probability that we'll see, for example, ransomware or inhibit system recovery in the coming hour? Or in the coming, you know, X time?"
Justin Carroll: So the model's able to do that, but it's primarily conditional on the stages it's observed up to a point in time, not so much on the time it takes for the actors to do X.
Natalia Godyla: So, in this blog and in our discussion today, we're gearing up to talk about probabilistic graphical modeling as a way to address the challenge that, Cole and Justin, you've set up for us today, and, and for any of our listeners who'd like to follow along in the blog, the blog is titled "Automating threat actor tracking: Understanding attacker behavior for intelligence and contextual alerting" and you can find it on the Microsoft Security blog. I'd love to dive into the probabilistic graphical modeling and perhaps start with a definition of what that means. So, M- Melissa, could you give us an overview of this approach?
Melissa Turcotte: Yeah. We have this problem, which, what they are essentially saying is, we have a collection of things which... I'm a statistician, so I often call them variables, but, you know, features, if you will, if that's m- more easy for you to understand. But we, th- these TTPs, th- right. The sets of things that the actors are doing, and we have a collection of them. And given some collection of these, we wanna make a statement about whether or not it's ransomware or whether or not it's a specific threat actor, or a group of actors. Right?
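The stage-by-stage updating Justin describes (the probability that it's this actor is P; given that, the probability of seeing a particular technique next is Q) is Bayes' rule applied repeatedly. A minimal sketch, assuming two hypothetical actor groups plus an "other" bucket, made-up probabilities, and observations treated as independent given the actor. That independence is a naive Bayes simplification chosen for brevity; the team's model is a Bayesian network, which can also capture dependencies between techniques.

```python
# Hypothetical prior: which group is behind an intrusion, before any evidence.
PRIOR = {"actor_A": 0.05, "actor_B": 0.05, "other": 0.90}

# Hypothetical P(technique is observed | actor). Illustrative numbers only.
LIKELIHOOD = {
    "rdp_brute_force":         {"actor_A": 0.80, "actor_B": 0.30, "other": 0.05},
    "lateral_tool_transfer":   {"actor_A": 0.70, "actor_B": 0.60, "other": 0.10},
    "inhibit_system_recovery": {"actor_A": 0.90, "actor_B": 0.20, "other": 0.01},
}


def update(belief, technique):
    """One Bayes-rule step: posterior(actor) is proportional to
    belief(actor) * P(technique | actor), renormalized to sum to 1."""
    unnormalized = {a: p * LIKELIHOOD[technique][a] for a, p in belief.items()}
    total = sum(unnormalized.values())
    return {a: v / total for a, v in unnormalized.items()}


def predict(belief, technique):
    """P(technique is seen next) = sum over actors of
    P(actor | evidence so far) * P(technique | actor)."""
    return sum(p * LIKELIHOOD[technique][a] for a, p in belief.items())


if __name__ == "__main__":
    belief = PRIOR
    print(f"P(inhibit recovery), no evidence: {predict(belief, 'inhibit_system_recovery'):.2f}")
    for stage in ["rdp_brute_force", "lateral_tool_transfer"]:
        belief = update(belief, stage)  # condition on each observed stage in turn
        print(stage, {a: round(p, 2) for a, p in belief.items()})
    print(f"P(inhibit recovery) after two stages: {predict(belief, 'inhibit_system_recovery'):.2f}")
```

With these invented numbers, the probability of seeing inhibit system recovery rises from about 0.06 with no evidence to about 0.65 after RDP brute force and lateral tool transfer are observed, and the posterior for `actor_A` goes from 0.05 to about 0.67. The update depends only on which stages were seen, not on how long they took, which mirrors Justin's remark about conditioning on stages rather than time.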
And this is, this is, like, a perfect, um, example of where probability can help you make these decisions, and one thing I'd like to stress is that no one of these features gives you enough information about whether or not it's this actor or this, this group of actors, or it's ransomware, you know, whatever your variable of interest is.
Melissa Turcotte: It really is the collection of these together that, you know, kind of in Justin's mind, as an analyst, he's, he's making these connections in his head, and I wanna be able to replicate that in some sense. I wanna take into account his knowledge and kind of his decision-making process, combined with the data that I have, to make these probabilistic statements about what I think is happening. And graphical models are really great here, probabilistic graphical models in particular, as they kind of provide this joint probability distribution over all these features and the variable of interest. In this case, that's kind of, maybe, is it this actor, but not necessarily. I may wanna know something about any one of these other features. I may already know it's this actor, and I may wanna be like, "Wh- what are the common things I see this actor do?"
Melissa Turcotte: So, so graphical models really shine in this case, where you have this collection of things that you are observing and you kind of want to ask questions about any subset of them, given some observations of others. And so th- this is a really great tool to use in this setting, and it's also quite interpretable. So if you kind of look, if you're looking at the blog and you see this Figure Two, which is a toy example, y- you kind of, as a human, you can look at that and you can kind of understand that, "Okay, so I'm seeing transfer tools and lateral movement are related." Um, and you can kind of understand sort of wh- what relationships the model is making.
Um, and so that kind of provides this extra, you know, benefit, in that, yeah, I can talk an analyst through what this kind of is showing, and i- it's quite interpretable for them even if they don't understand the underlying maths, and that's kind of something we really wanna strive for. Um, you shouldn't have to understand the underlying maths to kind of understand the decisions that are being made.
Melissa Turcotte: It's really attractive in this sense. And then the Bayesian networks, why I really like it is kind of the Bayesian paradigm. So you, you have, you know, statistics, generally, or data science: you have some data and you're kind of, you know, making inference given the set of data to make statements about things of interest. So the data tells you something about your beliefs and the state of the world, but you have your own subjective beliefs about wh- what you think could and could not happen. The, the Bayesian paradigm kind of combines those two things, so it's, you have your beliefs and then you have what the data is telling you, a- and your ultimate kind of predictions are based on the combination of those things. And generally, the, the way it works is the more data you have, the data will always win through.
Melissa Turcotte: So this problem, bringing it back to attacker prediction, is a case where we don't have a lot of data, right? We don't... Companies get attacked... Or we say companies get attacked all the time, but not at the scale at which we collect the underlying data. So, like, you know, you as a user are performing actions, logging into computers you use... You know, this shows up in the data thousands of times a day, whereas an attack happens kind of, like, on a monthly scale, so c- the scale of attacks relative to the data we're getting is just really small. And then when you go into attacks that kind of we've labeled as being attributed to a threat actor, I mean, that's even way smaller.
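Melissa's point that the Bayesian paradigm combines your beliefs with what the data tells you, and that with enough data "the data will always win through," can be seen in the textbook Beta-Binomial case. This is a standard illustration, not the team's model. It assumes an analyst's belief that a hypothetical actor uses some technique in about 80% of intrusions is encoded as a Beta(8, 2) prior (worth ten pseudo-observations); the observed counts are invented.

```python
def beta_binomial_posterior_mean(alpha, beta, successes, trials):
    """Posterior mean of a Beta(alpha, beta) prior after observing
    `successes` out of `trials` yes/no outcomes. Conjugate update: the
    posterior is Beta(alpha + successes, beta + trials - successes)."""
    return (alpha + successes) / (alpha + beta + trials)


# Hypothetical analyst prior: "this actor uses the technique ~80% of the time".
ALPHA, BETA = 8, 2

if __name__ == "__main__":
    print("prior mean:", ALPHA / (ALPHA + BETA))  # 0.8
    # Few labeled attacks (the small-data case): the analyst's belief dominates.
    print("after 1 of 4:", round(beta_binomial_posterior_mean(ALPHA, BETA, 1, 4), 3))
    # Many observations: the estimate moves toward the observed 25% rate.
    print("after 250 of 1000:", round(beta_binomial_posterior_mean(ALPHA, BETA, 250, 1000), 3))
```

After 1 success in 4 trials the estimate is about 0.64, still close to the analyst's 0.8; after 250 in 1,000 it is about 0.26, close to the observed rate. The prior's weight (here, ten pseudo-observations) is the knob that decides how much evidence it takes for the data to win through.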
So it's, it's kind of a small data problem, uh, in terms of the number of labels you have.
Melissa Turcotte: But what we do have is these analysts who have spent years tracking these people and have their kind of, you know, beliefs about what they do and how they've changed over time. And so we wanna capture that. We definitely want to include the evidence we see and the data, but we wanna capture that really rich knowledge that we get from the analysts. And so kind of that's where the Bayesian network part becomes attractive, because it, it provides a very principled way to, to capture the analysts' expertise and combine that information with the data we're seeing to make these ultimate predictions.
Natalia Godyla: For our audience, could you really quickly describe a Bayesian network?
Melissa Turcotte: So, a Bayesian network is a way of building a model for a collection of variables, whereby the idea is that you have different variables which are related to each other. It, it, it kind of helps draw out or show what those relationships are. So, like, in the graph, you know, if there's an arrow from impact... Or from transfer tools to impact, that's saying if I see transfer tools, that has a direct impact... I'm gonna use the word impact twice here. Has a direct impact on whether or not I'm going to see impact. So, so it's kind of the way the variables relate to each other and the way the probabilities change according to those relationships. And so a Bayesian network encodes all this information.
Nic Fillingham: If I can take another swing at that one... Thank you, Melissa. I'm wondering what were some of the other, uh, techniques that you considered for this approach? Like, did you experiment with other methods and then ultimately choose Bayesian?
Melissa Turcotte: Yes, um, in fact, uh, so the initial kind of... The perhaps most obvious thing to do is to c- to think of decision trees, right?
You s- you're, you're, you're seeing, you know, these things over time. Okay, I saw, um, what was the first one? Initial access with this... You don't go as broad as initial access, but I saw initial access using this, you know, MITRE technique. And so you can kind of think, like, you, you, you have a tree that's kind of... Okay, I saw this, I didn't see this, but I saw this and I didn't see this, so now I think it's this actor. But kind of where this is preferable is the fact that, as Paul says, we don't want to see the whole attack happen before we make a statement about what we think it is. And Bayesian networks work really well in, in the absence of some observed variables.
Cole Sodja: Yeah, I'll just quickly chime in. I agree with Melissa. So, I did experiments, for example, with several models including decision trees. Even, um, different forms of Bayesian decision trees like BART, for example. And in addition to what Melissa is saying where, for example, predicting the probability that it's a threat actor conditioned on certain variables we saw, uh, we might also, as Melissa pointed out, want to say, okay, let's predict, for example, that this threat actor is going to do impact or a certain form of impact. And with decision trees, that means basically you're building multiple decision trees to do that. You can't just build one decision tree... Well, let's put it this way. You can't easily build one decision tree to have multiple target variables. That's something you get for free with the Bayesian network. Another thing I'll say, in addition to, um... to marginalization, is the Bayesian network is more general. So, it could actually handle kind of a broader graphical structure. The decision tree is a specific graph.
Cole Sodja: So, it kind of already restricts you, if you will, to learning a certain structure over the data. Whereas the Bayesian nets, they could give you a little more general structure.
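To make the structure Melissa and Cole describe a little more concrete, here is a minimal sketch of a toy Bayesian network in Python. The variables (actor, network scanning, transfer tools, impact), the arrows between them, and every probability are invented for illustration; this is not the team's model. Inference is brute-force enumeration: variables that haven't been observed yet are summed out, and the same joint distribution answers both "which actor?" and "will there be impact?" without a separate model per target.

```python
from itertools import product

# Hypothetical toy network; all numbers are invented for illustration.
#   Actor -> NetworkScan, Actor -> TransferTools, TransferTools -> Impact
P_ACTOR = {"X": 0.3, "other": 0.7}     # prior over the actor
P_SCAN = {"X": 0.9, "other": 0.4}      # P(NetworkScan=True | Actor)
P_TRANSFER = {"X": 0.8, "other": 0.3}  # P(TransferTools=True | Actor)
P_IMPACT = {True: 0.7, False: 0.1}     # P(Impact=True | TransferTools)

DOMAINS = {
    "actor": list(P_ACTOR),
    "scan": [True, False],
    "transfer": [True, False],
    "impact": [True, False],
}


def bernoulli(p_true: float, value: bool) -> float:
    """Probability of a boolean outcome given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(actor: str, scan: bool, transfer: bool, impact: bool) -> float:
    """Joint probability, factored along the arrows of the graph."""
    return (P_ACTOR[actor]
            * bernoulli(P_SCAN[actor], scan)
            * bernoulli(P_TRANSFER[actor], transfer)
            * bernoulli(P_IMPACT[transfer], impact))


def query(target: str, evidence: dict) -> dict:
    """P(target | evidence) by enumeration; unobserved variables are summed out."""
    names = list(DOMAINS)
    totals = {}
    for values in product(*(DOMAINS[n] for n in names)):
        assignment = dict(zip(names, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # inconsistent with what has been observed so far
        key = assignment[target]
        totals[key] = totals.get(key, 0.0) + joint(**assignment)
    norm = sum(totals.values())
    return {k: v / norm for k, v in totals.items()}


# Only network scanning has been observed; transfer tools and impact are missing.
print(query("actor", {"scan": True}))
print(query("impact", {"scan": True}))
```

With these invented numbers, observing only network scanning moves P(actor = X) from 0.30 to about 0.49, and the same network gives P(impact) of about 0.43. Enumeration grows exponentially with the number of variables, so it only suits toy graphs like this one.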
Cole Sodja: We could also build these models that are time dependent, what are called dynamic Bayesian networks. That's something much harder to do with tree models. So, it's just a more flexible model as well, I would say. In my experiments, the Bayesian network did perform better on average than the set of decision trees I considered.
Nic Fillingham: I'd like to better understand the relationship between this model and folks like Justin. So, is Justin, as a very experienced threat analyst, is Justin helping you define labels and helping you sort of build some of the initial... I'm gonna get the taxonomy wrong here, so please correct me. But the initial sort of properties of the model? Or is, is Justin, as an analyst, interpreting what you sort of think you have in the model? How, how do I understand the relationship between the analyst and, and how they're providing their expertise into, into this model?
Melissa Turcotte: All three.
Nic Fillingham: Oh, great. (laughs)
Melissa Turcotte: All three things you said are actually correct. So, so hopefully we, we've explained it somewhat well. So, yes. The first stage, right Justin? The analysts are providing us our label data. So, yes. That's the first thing. And then they also help us kind of, you know, you have the raw data, but that's kind of... There's so much data processing that goes... That, that happens before it's kind of... This data's kind of in this tabular form that's like, yes, we... You know, these are the features we are tracking, so think of your TTPs, the different nodes in your graph. Getting the data into that, kind of that schema, the threat analysts help with. So, you know, help define what, what these tactics, techniques, and procedures are that we should track. Like you said, you, you can't be super broad. Lateral movement doesn't really have a lot of meaning, um, compared to kind of like the different ways in which someone can do lateral movement, and how granular w- you want to go.
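Melissa's point about getting raw data into a tabular schema, one column per tracked TTP at an analyst-chosen granularity, can be sketched roughly like this. The detection names, the mapping, and the `Alert` record are all hypothetical; the idea shown is only the step of collapsing many raw events into one binary row per incident, whose columns a Bayesian network can then treat as observed nodes.

```python
from dataclasses import dataclass

# Hypothetical analyst-defined ontology: raw detection name -> tracked technique.
# Two lateral-movement methods get separate columns rather than one broad
# "lateral movement" feature, reflecting the granularity choice.
ONTOLOGY = {
    "port_sweep_detected": "network_scanning",
    "new_service_installed": "modify_system_process",
    "admin_share_tool_copy": "transfer_tools",
    "rdp_logon_from_internal_host": "lateral_movement_rdp",
    "remote_wmi_process_create": "lateral_movement_wmi",
    "mass_file_encryption": "impact",
}
FEATURES = sorted(set(ONTOLOGY.values()))


@dataclass
class Alert:
    incident_id: str
    detection: str


def to_rows(alerts):
    """Collapse raw alerts into one binary feature row per incident."""
    rows = {}
    for alert in alerts:
        row = rows.setdefault(alert.incident_id, dict.fromkeys(FEATURES, 0))
        feature = ONTOLOGY.get(alert.detection)
        if feature is not None:  # detections outside the ontology are ignored
            row[feature] = 1
    return rows


alerts = [
    Alert("inc-1", "port_sweep_detected"),
    Alert("inc-1", "admin_share_tool_copy"),
    Alert("inc-1", "unrelated_noise"),
    Alert("inc-2", "rdp_logon_from_internal_host"),
]
for incident, row in to_rows(alerts).items():
    print(incident, row)
```

One design choice to flag: a 0 here only means "not seen so far". A Bayesian network can instead leave such nodes unobserved, which is what lets it make a prediction before the whole attack has played out.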
Melissa Turcotte: So, we discuss with the analysts all the time to kind of build up, you know, the ontology, if you will. And then, you know, as a first stage, like I said, it's a small data sample, so we're like... Justin helps inform what the model thinks about in a probabilistic sense. So, you... One thing I might ask him, I, I would be like... If I saw net... you know I'm borrowing from our toy example, but if I saw network scanning, modify system process, transfer tools, but didn't see any of the others, do you think it would be this actor X? Or do you think it would be ransomware? And he would be like, hmm, I'd probably be 60% certain. I can take that information and encode that directly so that, in the absence of any data, the model would return 60%. It would... If I didn't see any data, it would return what Justin believed was the probability in the presence of a certain number of variables.
Melissa Turcotte: And then we kind of see data and we update our beliefs over time based on that. And then, also, after we've kind of trained these things, I go back to Justin and say, does this make sense to you? So, he, he's kind of involved in all three, the whole process.
Nic Fillingham: Melissa, I think you're telling me you've built a virtual Justin.
Melissa Turcotte: We... That, that is what we are literally trying to do. And back it up... And, you know, and back it up with data as well. I'd, I'd like to like... You know, I'm a firm believer that everyone has their subjective beliefs, Justin has beliefs as well. Oftentimes, I can prove analysts wrong. Be like, they think something, I'm like, well, the data is telling me something else. So, we need to figure out, you know, that discrepancy. But, yes. We are essentially trying to build virtual Jus- uh, Justins. Although, like, th- there... I don't think there's any stage at which we won't need the analysts to constantly feed back in with the new information they have.
Nic Fillingham: Got it.
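One standard way to encode a belief like Justin's "60% certain" so that the model returns it when there is no data, and then lets data take over, is a Beta prior on that single conditional probability. This is a minimal sketch under that assumption: the `strength` pseudo-count (how many labeled cases the analyst's judgement is treated as worth) and the example counts are hypothetical, and the conversation doesn't say how the team actually parameterizes its priors.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief about one probability, e.g. P(actor X | observed TTPs)."""
    alpha: float
    beta: float

    @classmethod
    def from_analyst(cls, mean: float, strength: float) -> "BetaBelief":
        # 'strength' is a hypothetical pseudo-count: how many labeled cases
        # the analyst's judgement is treated as being worth.
        return cls(alpha=mean * strength, beta=(1.0 - mean) * strength)

    def update(self, confirmed: int, total: int) -> "BetaBelief":
        # Conjugate update: add observed successes and failures to the pseudo-counts.
        return BetaBelief(self.alpha + confirmed, self.beta + (total - confirmed))

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


prior = BetaBelief.from_analyst(mean=0.6, strength=10)
print(prior.mean)                                  # no data yet: the analyst's 60%
print(prior.update(confirmed=2, total=5).mean)     # a little data
print(prior.update(confirmed=20, total=100).mean)  # a lot more data
```

With `strength=10`, 2 confirmed cases out of 5 move the estimate from 0.60 to about 0.53, and 20 out of 100 move it to about 0.24. The posterior mean (alpha + k) / (alpha + beta + n) is a weighted average of the analyst's 60% and the observed rate k / n, with the analyst's weight shrinking as n grows, which is the "data will always win through" behavior Melissa describes.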
Nic Fillingham: And then can it come full circle? Justin, how do you as an analyst, how do you get smarter and better at what you do by what this model is, is telling you? What's the feedback loop look like here for you?
Justin Carroll: It's one of those where, basically, using the model kind of super-charged my abilities where, instead of having to look at this very granular kind of like ad hoc, oh, this may be interesting, now I have the instances already surfaced to me, and I have a good understanding of how far through the kill chain the attacker was able to get. And maybe figure out which ones I needed to enrich more, to understand, was there data that we can add into the model because they've done something different that we need to capture, and then look for opportunities in that way. So, really, it's basically... It went from, give or take, sometimes anywhere from 10 to 20 minutes to try and figure out, like, is this who I think it is? And like, what have they done? What are their goals? To just looking at the result from the model. And within usually seconds, being like, yeah, that looks exactly right. That's... It's confirmed, I think that's spot on.
Natalia Godyla: So, Justin, was there something that was the most surprising in working with this model? Something that the model taught you either about threat actors or any details about the features?
Justin Carroll: One of the things was kind of reexamining my confidence levels on different parts of the attack. Um, where Melissa was stating, for instance, you know, the data's suggesting this and the model's coming to this conclusion, uh, you know, thinking that it's this probability, and there would be times where I'd have to kind of reevaluate and think, like, hmm, I might've been missing something or overestimating the prevalence of a particular thing and saying it's related to such.
Like, uh, I can tend to get very biased based on my narrow scope of the attacks that I'm looking at and think that it's related to this thing, but the model was able to provide a lot of clarity on some of the behaviors that maybe I didn't think were as confident a signal, or were an extremely confident signal, and I wasn't giving them the appropriate weight. That's one of the advantages of using it to understand what the attacker's doing, is I let it do much of the legwork once everything's kind of coded in. And then occasionally, like if we found opportunities where it was like, hmm, this still isn't quite right, then it could be tuned as a c- um, as necessary.
Justin Carroll: I think that was probably one of the biggest ones, kind of trying to work through and actually spell out, like, my own thinking processes when I'm evaluating the data. It was something that you just kind of do without thinking, where you're constantly, as an intelligence analyst, looking at data and making conclusions on that data. But you're not usually saying, like, okay, I saw this so I'm gonna give it a 60% probability that it's this. And like, you're, you're just kind of... sometimes it's just gut intuition, or working on it that way. But actually having the model encode and return back what it was understanding made a, a pretty big impact in trying to understand how my own decision processes work and basically how best to kind of think about this wide array of different attacks that we're constantly investigating.
Nic Fillingham: The types of indicators that you're building this model on, again please correct me on my taxonomy here, but you're not looking for, you know, NFO files or like ASCII art or, you know, the actual threat actor's name being sort of hidden somewhere in the JPEG that they drop as a, as a... for the LOLs, like, they're...
You're not looking for a sort of a literal signature of these threat actor groups, you're, you're, what you're, what you're doing is you're, you're seeing the actions that have been taken and without any other way of attributing them to an individual group, you're piecing them together.
Nic Fillingham: And as you, as you get more actions and you piece them together based on the, the labels that you get from people like Justin, you're able to, to ultimately have a high probability that it's this threat actor group and they're doing this thing and they're likely to do this thing next. Have I got that right? You're, they're... In no way, shape or form are you actually finding a secret text file that has the name, you know, the, the, the handles for all the hackers who are doing it for the LOLs.
Cole Sodja: So let me just quickly jump in, you pretty much nailed it. I'll say this, so, we wanted to do both actually, right, because we don't want to restrain the model if it's gonna add predictive power, so like you said, we're not actually searching, grepping for example, for a threat actor name in some file or image, certainly not that level. But, for example, some of the actors, maybe they have common infrastructure, maybe they use particular types of tools in their attack typically, right? Like, maybe there's a SHA-1 out there they've used a lot in their attack, or, or recurring IP addresses they use as part of brute forcing.
Cole Sodja: Those are there, but those are very specific and if you just relied on those, like Melissa was saying, either one or a few of those, you're not gonna generalize. You'll probably miss that attacker, right? But we certainly don't want to exclude it from the model because, um, if we happen to see that, the model will, uh, come back with a different type of probability, right? It'd be like, okay. Now the model might be more confident early, rather than waiting to see how the rest of the kill chain progresses.
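Cole's point, that a reused specific indicator isn't required but lets the model become confident earlier when it does show up, can be illustrated with Bayes' rule in odds form. In this sketch the likelihood ratios and the 5% prior are invented, and treating each piece of evidence as conditionally independent given the actor is a simplifying (naive-Bayes-style) assumption, not necessarily how the team's network is structured.

```python
import math


def posterior(prior: float, likelihood_ratios: list) -> float:
    """Posterior P(actor X) from a prior and per-evidence likelihood ratios.

    Each ratio is P(evidence | actor X) / P(evidence | not X). Adding their logs
    assumes the evidence is conditionally independent given the actor, which is
    a simplifying assumption made for this sketch.
    """
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical likelihood ratios.
GENERAL_TECHNIQUE = 2.0  # a sub-technique many actors use: weak evidence
KNOWN_TOOL_HASH = 200.0  # a file hash this actor has reused: strong evidence

prior = 0.05
print(round(posterior(prior, [GENERAL_TECHNIQUE]), 3))
print(round(posterior(prior, [GENERAL_TECHNIQUE, GENERAL_TECHNIQUE]), 3))
print(round(posterior(prior, [KNOWN_TOOL_HASH]), 3))
```

With these numbers, one or two generic techniques only move a 5% prior to about 10% and 17%, while a single reused tool hash moves it to about 91%. Without the hash the model can still get there; it just needs more general evidence as the kill chain progresses.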
Cole Sodja: On the more general side, we probably won't go to the MITRE categories, 'cause they're a little too general, right? But if we go to some of the sub-techniques, we don't actually have to look at the particular types of executables, or tools, or IPs used.
Cole Sodja: Sometimes just the timing and sequencing is enough actually, to narrow down to, maybe not a particular threat actor, but a group of actors or, more generally, we can say with high confidence, you know, this is a human adversary. They're taking this amount of time to do discovery commands, they're, they're doing lateral movement in these types of ways. And the model could recognize that, even without knowing the particular commands, it's just seeing the more general techniques involved, right? So we do a bit of both, actually. We tend to want to rely more on, kind of, the general attacks or indicators as you're saying, that's right. But, we certainly don't want to throw away specifics that are reused because we could get ahead of the attack much earlier too. So it's a bit of both at the end of the day.
Melissa Turcotte: So yes, Nic, if, if, if you have an evil bit, look for the evil bit. You don't need data science for that.
Nic Fillingham: (laughs)
Natalia Godyla: And how is this model being used today? Meaning, is this a model that's being used by our internal security team to protect Microsoft and its customers, is it being used by the Microsoft Threat Experts group, or is this actually embedded in some of our solutions today, and our customers are feeling that benefit? And what is the future intent of the model?
Justin Carroll: One of those... So, there are multiple uses that are in place for the model. So one of the big things for me, so in my own selfish interest, it's intelligence, it's one of the easiest ways that I can keep tabs on the attacker and continually build new profiles and understand and, basically, report out: this is what they're doing, this is how they're doing it, this is how active they are.
Like, are we seeing, you know, large volumes of their attacks, are they taking a break, that kinda stuff. Then, Microsoft Threat Experts are using it as a signal to help understand attacks early on in the kill chain so that they can get those notifications out, ideally before the ransom, which can be quite difficult a lot of the time depending on the adversary and how quickly they seek to ransom. A lot of times there isn't a great deal of time.
Cole Sodja: Yeah, there's other products, for example, M365D. So, um, there are plans, uh, it requires some engineering, ultimately, because this is a big product, um, huge customer base and so on. But there are already plans in motion to take what we've built already, as part of this framework, and integrate that into that product. There's other products as well, both from a threat intelligence perspective, and possibly kind of from a SOC alerting perspective as well, where I'm in active discussions with other product teams across Microsoft to do the POC, make sure it works with their data, make sure they're comfortable and then work with their engineering team to at least get that in the plan. Those are ongoing discussions, but M365D does have it, kinda, I'll say, in their planning cycle, to get this in the product.
Nic Fillingham: I wonder if this might be a good time to bring our secret special guest on microphone. Josh, if you're there, I think I might ask, uh, might wonder if you could jump in on this one. I think you've understated the power of what you've built here. From everything that you've just explained, you know, within a couple of minutes of a threat actor getting initial access, to have a high probability index, to be able to contact the customer and say, here's who we think is inside your network, here's what we think they're gonna do next, so they can shut it down. This is the next level, right? And, and Josh, when we interviewed you on episode three, you were hinting at this, if I'm not mistaken.
Is this, is this sort of what you guys have been working on?
Joshua Neil: Yeah, I'm so proud that we, that we took it from concept to realized value for the customers, and at this point we've had that impact with your customers in stopping human operations. And, and so it's really exciting and, and it's, it's on the journey but, you know, if I extract an overall theme from this, it's consistent with that podcast that we had before because I was sort of complaining about AI. And I was sort of complaining about what we see in some of the, in some of the branding and marketing that, that folks do in, in cybersecurity. And I think this team and, and the work they've done exemplifies the right application of data-driven methods.
Joshua Neil: There is no magical artificial intelligence today. What there is, and this is a, an experience that all of us on the data science team have had over the, over the past few years, and really for me about 20 years, is we can use data and some mathematics and some computing to begin to automate and accelerate what the humans are doing. And so, by sitting very closely with, and working very hard with, the human experts like Justin, we're explicitly encoding their knowledge into models. So that's one thing, is that the data science we're doing is to automate some of the stuff they're doing today. But the intention is not to solve the world, not to give our customers a license to solve security; we're, we're not gonna be able to do that. What we are able to do is uplift the sophistication of our customers' operations.
Joshua Neil: So, you know, what Justin sort of reflected on, uh, he's able to do a more interesting job, a more sophisticated job, because we're taking the data and his knowledge and encoding it and accelerating and automating some of the stuff that he's having to do manually now.
And that's where the real nuts and bolts, you know, and the real rubber meets the road here, is that there's no magic gun that's gonna blow away all the adversaries with, with AI. What there is is hard work between data scientists and threat experts to uplift their capabilities and accelerate their effectiveness in the face of the adversary. And that's what I would like to get across to the, to the listeners, is that by hard work and careful and close collaboration between data science and threat expertise, that's how we really make progress in this space.
Nic Fillingham: Thank you so much, Josh. And I just wanted to quickly clarify, from a previous comment from Cole, so this model is in use now, correct? Folks like Justin, Microsoft threat analysts, they are using this model now to make the model better, and to be able to get that additional information and those confidence levels in, in, in doing their analyst work. And so Microsoft Threat Experts customers are directly benefiting from this work, as of today. Is that correct?
Joshua Neil: That's correct. We've sent targeted attack notifications to customers based on this model.
Nic Fillingham: You've all been very, very generous.
Natalia Godyla: Thank you for that. And, and thank you to the whole team here for joining us on the show today.
Melissa Turcotte: Absolutely.
Cole Sodja: My pleasure.
Joshua Neil: It was a lot of fun as always. And, and thank you, Nic and Natalia, for this.
Natalia Godyla: Well, we had a great time unlocking insights into security, from research to artificial intelligence. Keep an eye out for our next episode.
Nic Fillingham: And don't forget to tweet us at MSFTSecurity or email us at with topics you'd like to hear on future episodes. Until then, stay safe...
Natalia Godyla: Stay secure.