
Most of us have heard the term “AI agent,” but only a minority of banks and fintechs have done any kind of implementation. This is partly because it is not a trivial task, and there are real risks to getting it wrong. What is needed is a framework that helps reduce these risks and provides best practices for implementing AI agents in financial services.
My next guest on the Fintech One-on-One podcast is Simon Taylor, the Head of Strategy and Content for Sardine and the author of the Fintech Brainfood newsletter. Today, Sardine has released a white paper titled The Agentic Oversight Framework – Procedures, Accountability, and Best Practices for Agentic AI Use in Regulated Financial Services. It is a how-to document for implementing AI agents at your bank, credit union or fintech. We unpack the white paper in this podcast, with as little jargon as possible, making it approachable for any risk or compliance executive.
In this podcast you will learn:
- Simon’s background and why he joined Sardine in 2022.
- How banks are approaching BSA/AML compliance today.
- What Sardine is trying to achieve with this new white paper.
- What concerns people in financial services have about hiring an AI agent.
- How AI agents interact with humans in compliance departments.
- How much more effective AI agents can be.
- The six different processes in the Agentic Oversight Framework (AOF).
- How you manage the risk of hallucination in your LLMs.
- How you can scale the AOF beyond BSA/AML into other areas.
- Some examples of the AOF in action.
- If you are starting at zero, how you start working with AI agents.
- Why you should jump in now even though the models will continue to get better.
- Why data security is such a critical component of the implementation of AI agents.
Read a transcription of our conversation below.
FINTECH ONE-ON-ONE PODCAST NO. 529 – SIMON TAYLOR
Peter Renton: Today’s episode is brought to you by Sardine. Sardine is the leading AI platform for fraud prevention, compliance and credit risk. They use device intelligence, behavior biometrics and AI to stop fraud in real time, to streamline compliance and stop scams before they happen. Banks, retailers, fintechs and payments companies in over 70 countries trust Sardine. Sardine is redefining risk management for the digital economy. Learn more at sardine.ai.
Simon Taylor: So the financial institution will hire an AI agent and this AI agent would be given the standard operating procedures and trained. What would you do if you hired a human agent? You would give them the standard operating procedures and you would train them. Turns out LLMs hallucinate an awful lot less if you give them a lot of context and a lot of boundaries. Standard operating procedures are context and boundaries. This is the aha moment. This is the breakthrough. It’s like, yes, you have spent forever writing down your standard operating procedures and most humans don’t understand them most of the time. To an LLM, that’s like gold dust. Like, yes, context, I needed that. That’s really, really helpful.
Peter Renton: This is the Fintech One-on-One Podcast, the show for fintech enthusiasts looking to better understand the leaders shaping fintech and banking today. My name is Peter Renton and since 2013, I’ve been conducting in-depth interviews with fintech founders and banking executives. Today on the show, I am delighted to welcome Simon Taylor. He is the Head of Strategy and Content for Sardine, but most people will know him through his Fintech Brainfood newsletter. Today we talk about agentic AI and the new white paper that has just been released by Sardine. We introduce a new concept called the agentic oversight framework, which is really just the best practices for agentic AI in financial services. And don’t worry, Simon makes this content all very approachable as we do a deep dive into how to implement AI agents for compliance. Now let’s get on with the show.
PR: Welcome to the podcast, Simon.
ST: Thanks for having me Peter, good to be back.
PR: Let’s kick it off. We are recording this on April 9th. And this is actually a big day for you, Simon, because I just saw in my inbox literally 15 minutes ago that NerdCon is out in the wild. So before we get started on all the AI agent stuff and the Sardine stuff, I do want to at least acknowledge that, you know, I know you’ve been putting in a lot of effort. We’ve talked about it multiple times over the last several months. Congrats on getting NerdCon out in the wild, and also Sardine has your own event now. You’re becoming like me. You’re becoming an events guy.
ST: Who knew, right? SardineCon is on the 20th of August in the Bay Area in San Francisco. We are trying to solve scams and the scandemic, and I love anything that solves problems. And Fintech NerdCon is a thousand of the nerdiest people in fintech in Miami on the 19th and 20th of November, https://www.fintechnerdcon.com/ is where you find out more. And https://www.sardine.ai/sardine-con if you want to know more about either of those, yeah, I’m doing events now. Kind of scary, but people seem to like them.
PR: Yes, yes. Well, let me just give you a little word of advice. I mean, events are very stressful, particularly that last month. But after it’s all over, it is the greatest feeling to have brought a whole bunch of important people together and to have a really great couple of days in person. I don’t think there’s any better activity; it’s very, very fulfilling. But anyway, let’s get stuck into it. So, I mean, I think most of the listeners here will know about you from Fintech Brainfood. But why don’t you give us a little bit of the background. I think we met when you were either still at 11:FS or had just moved on from there. But why don’t you tell us a little bit of the arc of your career to date.
ST: Very briefly. I actually started as a software engineer for many years, then moved into payments doing product management, project management, then change inside of a bank. So I worked at Barclays for many years, helped them with some of the VC stuff they were doing, then became a consultant. So many folks will remember 11:FS and the Fintech Insider podcast; I worked with everybody there from Stripe and Circle to HSBC and Grab and many interesting clients. And then in 2022 I joined Sardine, the world’s best fraud and compliance AI that you hire as an API, working with Soups [Ranjan] and the guys to really root out scams. The story goes that my dad actually got scammed out of quite a bit of money. And I promise you, the very next conversation I had was with Soups showing me how he would detect that particular type of scam. And I thought, well, that’s the universe telling me that I have to go and join this company and be involved. And during all of that time, you know, sort of in the pandemic, I started something called fintechbrainfood.com, a weekly newsletter with 45,000 subscribers, everybody from the great and the good, from bank CEOs to people just getting started in fintech, where I just rant at the world about fintech really, and people seem to like reading it. I cover four new fintech companies every week. And yeah, that’s kind of me. I’ve done a lot in the interim, as you can imagine, bit of a career journeyman, but it’s been a fun old journey, and I get to meet people like you and have conversations like this.
PR: Indeed. And just a quick plug, I mean, Fintech Brainfood is must-read content if you’re in the fintech space. I do enjoy it. It comes into my inbox at around 7am on a Sunday morning. So I often sit having breakfast and catching up with Fintech Brainfood. You and Jason Mikula hit my inbox within five minutes of each other every week. It is great, great reading. So what do you do exactly at Sardine?
ST: Great question. I ask myself that every single day. Practically, look, I am doing everything from tactical stuff like writing some of the blog posts, writing a lot of the social posts, helping prep some of the presentations and keynotes that we’re giving, and sort of the content management job, I suppose, is the closest thing to a day job. But I also act in an ambassadorial role. So I get lots of speaking gigs. I get to operate here in London, going and meeting clients and prospects. I’ve done some sales in my past. I know how to manage that sort of thing. So I spend a lot of time doing the ambassador type of conversations. And then, under the hood as it goes, I’m also pulling together all of our white papers, all of our long-form influencer strategy, how we can help the rest of the company grow its presence with regulators, and many, many others. I guess as an individual contributor, I do a mix of things. But in any one day I can be writing a white paper, having a conversation like this, speaking on stage, meeting with clients or prospects and everything in between.
PR: All right. So you have just recently launched a white paper. If all the stars align, it will actually be released the day this is published. But before we get into that, let’s just take a step back and talk about banks specifically, and your conversations and Sardine’s conversations with banks. How are they approaching, you know, BSA/AML compliance today?
ST: Ultimately, BSA/AML compliance is one of those things that you have to do and you cannot afford to get wrong. And the problem with that is often it’s a case of how many more bodies can I hire to make sure that I can hit this mark and get my policies, procedures and controls in place and ensure that my training’s in place and ensure that I am trying to get some straight-through processing and more automation. But, historically, the cost of doing something wrong could be a multi-billion dollar fine. So incentive-wise you’re almost better off sticking with a less-than-perfect process and potentially hiring more people to deal with it than you are trying to do anything fancy with automation that might break, because the consequences could be really, really massive. But that’s not to say that financial institutions are doing nothing. They are. There are some cutting-edge machine learning models. There are some really cutting-edge bits of tech inside some of the large financial institutions that are trying to detect some of the most sophisticated financial crime networks around the world. There are very large teams poring over the data.

But it would typically go wrong in one of two or three places. Obviously, at KYC and onboarding, a crucial control. When somebody is trying to sign up for an account, how do you know that this identity is real, and how do you know that that person is real? Then at the transaction monitoring stage, somebody is making a payment or moving money into their account. What happens is, inside the bank, there’s usually a transaction monitoring system, a sanctions screening system, or a KYC system. And what it’s doing is flagging alerts. And that alert, that risk, could be: this is obviously a bad guy, let’s just straight decline this transaction, let’s straight not onboard this customer. That is somebody in the North Korean military. They probably shouldn’t be opening an account in Idaho. This is something we should probably avoid. There are obvious noes. And then there are things that look like fairly comfortable yeses: this is Peter Renton, he’s been dealing with this every day. He made a $2 transaction for, you know, a stick of gum. We’re not going to block that transaction. We’re not going to do a large investigation into whether this could be the beginning of money muling linked to state-sponsored terrorism. Right. So there are obvious noes and obvious yeses. And for the most part, you don’t notice the middle.

But the middle is: this is an alert flag; a human needs to look at this because we can’t auto-decision on it. And the problem with these alerts, the problem with the flag for manual review, is that 95-plus percent of these are false positives. So, Peter is signing up for a bank account, but because he used a nickname, or because somebody out there has a slightly similar name or a similar date of birth, he gets flagged as a false positive. Now what happens there is somebody has to come along and do a manual review, because there are so many reasons why you could have got flagged. Maybe your name is in Aramaic. Maybe it’s in Hebrew. Maybe you’ve used a nickname. Maybe your date of birth is slightly wrong. Maybe there was a lot of glare when you took the photo and the machine couldn’t quite figure it out. It’s almost impossible to build if-then-else rules for all the things that could go wrong. So what do we do? We flag it to a human, and the human goes, yeah, that’s what it is, and deals with it quite quickly.
So that’s kind of the reality: the volume of these is going up and up, faster and faster, because more names are being added to sanctions lists.
There’s more pressure on BSA/AML. The fines are still there. There’s more volume coming through. But every time Peter gets stuck in an onboarding queue for two, three, four days, Peter’s not becoming a transacting customer. Peter’s getting frustrated. Just let me get into my account. And 95% of the time, you would have been a good customer.
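To make that triage picture concrete, here is a toy sketch of the three buckets Simon describes: obvious declines, obvious approvals, and the messy middle that gets flagged for manual review. The thresholds and field names are invented for illustration and are not from Sardine or the white paper.

```python
def triage_alert(risk_score: float, sanctions_hit: bool) -> str:
    """Toy illustration of the three buckets described above.

    Obvious noes and obvious yeses can be auto-decisioned; everything in
    the middle is flagged for a human, and 95-plus percent of that middle
    bucket turns out to be false positives. Thresholds here are made up.
    """
    if sanctions_hit or risk_score > 0.95:
        return "decline"        # obvious no: block the transaction or onboarding
    if risk_score < 0.10:
        return "approve"        # obvious yes: the $2 stick-of-gum transaction
    return "manual_review"      # the middle: a human has to look at it
```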
PR: Okay. Let’s talk about the white paper that you just released. I had a read. It’s not a simple read because there are a lot of new concepts in there, things that I hadn’t really heard much about before. Maybe talk about what you were trying to achieve with this white paper.
ST: Sure, so we need to introduce a new character to the stage, our friend the AI agent. And AI agents, or agentic AI, may be the most annoying buzzword in the world right now, but if you imagine it a little bit like hiring a person, it starts to make a bit more sense. But unlike hiring a person, there’s a problem before you can hire these AI agents, because when you hire an AI agent, everybody immediately worries about one of a few things. They worry about, well, will the regulator ever allow it? Maybe in this administration, it’s a little bit more allowable. But what happens in four years’ time when the administration changes? So there’s a real worry about, can I even do this? And then, people worry about hallucination risk. If this thing works a little bit like that ChatGPT thing that I’ve used that sometimes makes stuff up, I really don’t want that if the answer is going to be a $3 billion fine. So maybe I should avoid putting that tech in there. How do I kind of avoid that? And all of those are perfectly reasonable concerns.

But if you stay with my metaphor here of hiring a person, then you can kind of see what we’re proposing. We call it the agentic oversight framework because it needed a name. But honestly, the metaphor is probably more instructive. So if I’m going to hire a person, I’m going to tell them what their job is: your job is to work this queue of sanctions alerts or adverse media alerts or transaction monitoring alerts. And you will sit there, and when this alert comes up, one of two things will happen. You’ll either flag it as a false positive and just let the person go, or you’ll do a detailed investigation because this really looks like a crime. Now you need to start putting evidence together, and you need to start building a case, and you run it through a case management system. Then you work with others, and then eventually, you file the suspicious activity report with FinCEN. So those are the two paths you can take. Now you’ve got this new day job, you need to know what that process looks like. So somebody comes along and defines that process. They define that set of procedures; Standard Operating Procedures is typically the term given to them. And once you’ve done that, then you’re given access to all of the systems and the data that you need to be able to make a decision according to that process. And then what you do is you’ll get a case, and you’ll look at it and then you’ll make a decision. Is this, in fact, Peter? Yes or no? Does this look like an approve or a decline? You’ll put all of your findings together, all of your evidence together, and then hit either the approve button or the decline button. And that’s what we’re saying for the framework here with an AI agent. Instead of hiring a human and training them on a process and then giving them access to a system where they have to put their evidence together and approve or decline, you train an AI agent on your Standard Operating Procedures.
PR: So the “you” that you’re talking to could just as easily be an AI agent and not a human.
ST: Correct. So, the financial institution will hire an AI agent, and this AI agent will be given the Standard Operating Procedures and trained. What would you do if you hired a human agent? You would give them the Standard Operating Procedures and you would train them on them. So it turns out LLMs hallucinate an awful lot less if you give them a lot of context and a lot of boundaries. Standard Operating Procedures are context and boundaries. This is the aha moment. This is the breakthrough. It’s like, yes, train them on yours. You have spent forever writing down your Standard Operating Procedures and most humans don’t understand them most of the time. To an LLM, that’s like gold dust. Like, yes, context, I needed that. That’s really, really helpful.

But then the second thing you have to do is, okay, I understand what my process is. Now I need access to the data. And what we’re saying is that there’s a good way to do that and a bad way. The bad way to do it is give them access to laptops and let them just start running around inside of your computer systems and using all of your legacy software. And the good way to do it is to say, right, the AI agent operates inside this box; for Sardine, it operates inside the Sardine platform. And we can firewall it off from seeing anything it shouldn’t see. It can only see what you define it should be able to see. So train it, give it secure access to the data, then let it present its findings. Now when it presents its findings, unlike the human agent that would just make the decision, the agent goes to a human: hey, I’ve done the 45 minutes of legwork that it took to figure out that this was a false positive. Here’s why I think it’s a false positive, da da da da. Here’s my evidence. Do you agree? And what happens is you can compress five to 30 minutes per investigation down to two seconds. That time saving becomes enormous when you have a 40-day backlog of these things, which we see inside many, many institutions.

So that is the real example here of how AI agents can move you away from just dealing with false positives all day. Because here’s the thing: if you imagine your job was dealing with false positives 95% of the time, you get burned out, you get bored, the staff churn is unbelievable. Whereas the real stuff, investigating the criminal networks, is actually really hard, really creative work where people should be spending their time. But they’re not, because what are they doing? They’re pushing the, yep, that was a false positive, that took seven minutes. Oh, that was a false positive, that took a whole nine minutes. That just eats your day up, and it’s 95% of the work.
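As an illustration of the pattern Simon is describing (SOPs injected as context, narrowly constrained outputs, and a human making the final call), here is a minimal, hypothetical sketch. The `call_llm` function, field names, and prompt wording are placeholders, not Sardine’s actual implementation.

```python
from dataclasses import dataclass
from typing import Callable
import json

# Placeholder for whichever LLM client the institution actually uses:
# it takes a prompt string and returns the model's text response.
LLMCall = Callable[[str], str]

@dataclass
class AgentFinding:
    """What the agent hands to the human reviewer; never a final decision."""
    alert_id: str
    recommendation: str          # "likely_false_positive" or "escalate_for_investigation"
    reasoning: str               # plain-language explanation of the evidence
    evidence_fields: list[str]   # which data elements the agent relied on

def review_alert(alert: dict, sop_text: str, call_llm: LLMCall) -> AgentFinding:
    """Run one alert through an SOP-grounded review step.

    The Standard Operating Procedures are injected verbatim as context, and
    the prompt constrains the model to a narrow JSON output that a human
    reviewer must still approve or decline.
    """
    prompt = (
        "You are a compliance alert reviewer. Follow ONLY these Standard "
        f"Operating Procedures:\n{sop_text}\n\n"
        f"Alert data (the only data you may use):\n{json.dumps(alert)}\n\n"
        "Respond as JSON with keys: recommendation "
        "('likely_false_positive' or 'escalate_for_investigation'), "
        "reasoning, and evidence_fields (a list of field names you relied on)."
    )
    raw = call_llm(prompt)
    parsed = json.loads(raw)  # in practice, validate against a schema and retry on failure
    return AgentFinding(
        alert_id=alert["alert_id"],
        recommendation=parsed["recommendation"],
        reasoning=parsed["reasoning"],
        evidence_fields=parsed["evidence_fields"],
    )

# A human reviewer then sees the AgentFinding and hits approve or decline;
# the agent never auto-decisions the alert.
```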
PR: Right, right. So say you’re a human agent and you’re getting these 95% false positives. I totally get the time saving here. But how much more effective are they? Of all the 95% of false positives, how many are they capturing?
ST: So, it depends on the use case. The lowest performance we’ve seen is around 80% capture rate and the best is around 97%. But this was with one month of training runs and we’d barely, barely begun to tweak the evals and retrain them. So, what you’re aiming to do is have the human reinforcement of the approves and declines gradually, over time, build a feedback loop so that you can continually train and retrain those agents to get better and more performant. So we’re in the 80 to 97% window, depending on the use case. For step-up KYC, for instance, we got upwards of 97% precision on true false positives. So a promising start, but a long way to go.
PR: Right, it certainly is promising. So let’s dig into this Agentic Oversight Framework, or AOF. There’s a new acronym for you, everybody. In the white paper, you talk about six different processes. Maybe you could just give us a little summary of what’s involved there.
ST: So I’ve actually talked about the first three already. Number one, we say, is define the agentic pathways. So essentially, what you’re saying with number one is: here are the things you can do, AI agent. You can do this and then this and then this and then this. And you’re really constraining it to its particular use case. It might be step-up KYC. It might be sanctions alerts. It might be adverse media alerts. Whatever it is that it’s doing, define that pathway and train it on the Standard Operating Procedures of that pathway. That’s number one: the training and the definition of the use case. Number two is, what data is the AI agent allowed to access? And this is a crucial one because everybody’s going to worry about privacy. They’re going to worry about data leakage. So, how is it accessing that data? And this is where we say there’s a good way to do it, which is, constrain it inside of a platform. Don’t give it unfettered access to your data. So that is number two, defining the agent data access. Then number three is, give it a consistent way of presenting its findings and recommendations to a human, so there’s a human in the loop for every single transaction. What we’re not saying is, let the AI agent also make the decision. I think we will get there one day, but it’s still early in this technology. If I had to be reporting out to regulators, if my career was at risk from a multi-billion dollar fine, I’d certainly prefer to go this route. And the other thing with having a human in the loop is it really, really helps you with oversight. And I think accountability and oversight is the key question in compliance. What happens when something goes wrong, who’s accountable, and what are you going to do about it? Well, this answers that question, because really what happened is, there was a bit of software that helped the human agent go a little bit faster at their job. So your accountability doesn’t actually change. That answers one of those big questions.

So that’s one, two and three. Catching up: one, define the agentic pathways and train the agent; two, define what data access it has; and three, it presents its findings and you approve or decline. Number four is make sure you’ve got an audit trail of everything it did. So every click, every data element, every decision it made, everything should be fully logged and available in an audit trail. That then feeds number five, which is reporting your success rates, your failure rates, your model drift, your feature explainability, all of that, to your group risk and control functions. And that leads into number six, which is having a framework for that explainability, which is just grounded in data science best practices. We forget that AI agents really are a form of data science. And in data science, we have best practices for explainability, which is understanding all of the inputs and then really deeply understanding every step and following that audit trail all the way through the process, and then being able to show the features that were involved in making the decision and then presenting that back as findings and letting the human make the decision. So if you are able to do those six things, that gives you a really solid grounding. It’s not everything. But a really solid grounding in how you can use AI agents. And the paper goes into an unbelievable amount more detail than that, but I was trying to keep it high level.
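One way to picture the six processes together is as a single explicit configuration that every agent deployment fills in before it goes live. The sketch below is an editorial illustration based on the summary above, not the white paper’s own schema; all names and defaults are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class AgenticOversightConfig:
    """Illustrative checklist of the six AOF processes for one agent deployment."""

    # 1. Define the agentic pathway: the one narrow use case the agent may work,
    #    and the Standard Operating Procedures it is trained on.
    pathway: str                      # e.g. "step_up_kyc" or "sanctions_alert_review"
    sop_document: str

    # 2. Define data access: the agent only ever sees fields listed here.
    allowed_data_fields: list[str] = field(default_factory=list)

    # 3. Human in the loop: findings are presented; a person approves or declines.
    requires_human_approval: bool = True

    # 4. Audit trail: every step, data element, and recommendation is logged.
    audit_log_destination: str = "append_only_case_log"

    # 5. Reporting: success/failure rates and drift go to risk and control functions.
    reporting_metrics: list[str] = field(
        default_factory=lambda: ["capture_rate", "false_positive_rate", "model_drift"]
    )

    # 6. Explainability: the features behind each recommendation must be attributable.
    explainability_method: str = "per_case_feature_attribution"

# Example: a step-up KYC agent constrained to a handful of identity fields.
kyc_agent = AgenticOversightConfig(
    pathway="step_up_kyc",
    sop_document="cip_standard_operating_procedures_v3.md",
    allowed_data_fields=["name", "date_of_birth", "address", "document_image_text"],
)
```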
PR: Yes, yes, indeed. So then let’s talk about hallucination, because that is obviously going to be a big concern for any bank taking on an AI project like this. Do you minimize the amount of hallucination purely because you’re only training it on certain things? I mean, how is it that you can manage that risk?
ST: You separate out what the LLMs are doing, right? So some LLMs happen to be really good at optical character recognition, OCR, and the smaller the task, the harder it is for them to hallucinate. So that’s one risk mitigation. Two is, use different LLMs for different feature sets. Three is, have them cross-check each other for consistency. And so you’ve got all of these fail-safes throughout. Another fail-safe is: yes, train it and tightly constrain its outputs; ensure your prompt is checking those outputs, and ensure you have statistical models underneath it that are checking the quality of those outputs where you can. So the classic data science, the classic machine-learned models, the classic rules, the classic software is all still there, and it’s all still checking things. And then, later down the process, it’s a bit like in machine learning: you would check for model drift, and you would have feature attribution. Here you’re looking for agent drift and agent feature attribution. Asking the agent to explain how it got to its decision turns out to be a really useful way to check whether it has hallucinated. Asking it which data elements it used, of those it had available, to draw that conclusion serves two purposes. One, it improves your accuracy because you go and look at the data itself. And two, because it is looking at the data, it’s more likely to respond based on the data. And that’s why we’re getting the results we are. There’s a lot more to it than that, but those are some examples.
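To show what “tightly constrain the outputs and check them with deterministic code underneath” could look like in practice, here is a small, hypothetical validation step; the field names and allowed values are invented for illustration and are not from the white paper.

```python
import json

ALLOWED_RECOMMENDATIONS = {"likely_false_positive", "escalate_for_investigation"}

def validate_agent_output(raw_output: str, allowed_fields: set[str]) -> dict:
    """Deterministic checks that sit underneath the LLM.

    Raises ValueError if the output is not valid JSON, recommends something
    outside the permitted set, or cites evidence fields the agent was never
    given; that last check is a cheap red flag for hallucinated evidence.
    """
    parsed = json.loads(raw_output)

    if parsed.get("recommendation") not in ALLOWED_RECOMMENDATIONS:
        raise ValueError(f"Recommendation outside allowed set: {parsed.get('recommendation')!r}")

    unknown = [f for f in parsed.get("evidence_fields", []) if f not in allowed_fields]
    if unknown:
        raise ValueError(f"Agent cited data elements it was never given: {unknown}")

    return parsed
```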
PR: So something you said there I thought was really interesting. You talked about how an agent can check another agent’s work. So these agents are interacting with each other, it sounds like. How does that work, and does that entail an even deeper level of oversight?
ST: No, so you’re not dealing with agent swarms. Really what you’re doing is, an agent here is a piece of software that is calling several LLMs inside of an individual workflow. Right? If we really unpack what’s happening, you might have Gemini doing your optical character recognition. And then you might have GPT-4 checking the output of what Gemini said to ensure that it believes it’s still the same. And then you might have a smaller model, and da, da, da, da, da. So it’s not multiple agents that are working together like a team. It’s one single agent character. But the problem with the term AI agent is, there’s not really such a thing as an agent. There’s just software that leverages LLMs, packaged as workflows.
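Here is a rough sketch of that “one agent, several models” idea: one model extracts, a second independent model checks the extraction, and a disagreement routes the case to a human. The generic `LLMCall` interface stands in for whichever model clients are actually used, and for simplicity the sketch passes already-transcribed document text rather than an image.

```python
from typing import Callable

# Placeholder for a real model client: prompt string in, text response out.
LLMCall = Callable[[str], str]

def extract_and_verify_name(document_text: str,
                            ocr_model: LLMCall,
                            checker_model: LLMCall) -> tuple[str, bool]:
    """One 'agent' step that quietly calls two different models.

    The first model extracts the name from an ID document; a second,
    independent model is asked whether that extraction is supported by the
    same source text. A disagreement routes the case straight to a human.
    """
    extracted = ocr_model(
        f"Extract the full legal name from this ID document text:\n{document_text}"
    )
    verdict = checker_model(
        f"Document text:\n{document_text}\n\n"
        f"Another model extracted the name: {extracted!r}. "
        "Answer strictly 'agree' or 'disagree'."
    )
    return extracted, verdict.strip().lower().startswith("agree")
```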
PR: The agent is just sort of a term we’re using, but it’s code basically, right?
ST: It’s code, but it can do things that code historically couldn’t, that we relied on humans to do. So the definition to me of something that’s agentic is where you couldn’t possibly have written software historically, either a machine learning model or if-then-else statements, that solved it, because there was just a Cambrian explosion of edge cases and possibilities. And there are many, many of these in financial services. Wherever you see humans doing basic stuff like that, that’s typically a great place for an LLM.
PR: Right, right. So the agentic oversight framework that we’re talking about here is really focused on, you know, KYC, sanctions screening, BSA/AML. Can you scale it beyond those functions into other compliance or even non-compliance functions?
ST: Absolutely. And we’re proposing that people consider it for that. We just happen to have done it for a lot of compliance functions because there’s a lot of manual work there. Once you move out towards credit, you start getting into ECOA and fair lending, and the space gets significantly more complex, as I’m sure you know well. But that would be another area that eventually you could start to look at. Wherever there is a workflow with lots of human bodies working it, it’s absolutely plausible to start considering this framework if you want to demonstrate you have control. This is a framework we believe will do just that.
PR: Right. And in the white paper, you also provide some examples of the AOF in action. Maybe you could just run us through a couple of those.
ST: Yeah, sure. So there’s a couple of case studies as well. I’ve talked a couple of times about KYC alerts. So when you’re onboarding, name, address, date of birth and SSN all have to be collected and verified. But when there are mismatches, it could mean that the consumer entered something incorrectly in a form, or it could be a synthetic or stolen identity. So the best practice generally is to ask the customer for additional material: a passport, a national ID card, a driver’s license, and a selfie with a liveness check. And then just check that all of that matches. And this is where you get into thousands of edge cases. It could be across different cultures, date formats, all kinds of different languages. So how do you deal with all of that? You train an agent on the CIP process that’s followed at the bank. And then the AI agent will take the steps it sees in that CIP set of procedures and evaluate matches using context from the passport versus what the user inputted versus what it can find online versus da da da da da da. And it comes to a decision: is the date of birth mismatched, but we still think this is a true match? Or was an underlying database wrong about something? Or is this a synthetic identity? It will actually make that recommendation to the human agent. We sampled this against 100 cases for one client and got a 97% precision on accept and a 92% precision on decline, for an overall accuracy of around 90%. So that’s preliminary, but we think we can go a lot further. We’ve also been doing this in sanctions, PEPs, and adverse media alerts. And we have a little bit of it going on now with some merchant risk screening as well. So that’s one use case. I think we’ve run this now with a couple of clients. One is a digital asset platform, which is using it for sanctions and PEP alerts. And there’s another one, a fintech cards program, which is using it for the step-up KYC use case. So this is in production and live.
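For readers who want to run that kind of sampling exercise on their own alert data, here is a tiny evaluation sketch. The field names are hypothetical, and the example numbers in the comment simply echo the figures quoted above; this is not Sardine’s evaluation code.

```python
def precision_by_recommendation(cases: list[dict]) -> dict[str, float]:
    """Precision of the agent's accept and decline recommendations.

    Each case records the agent's recommendation and the human reviewer's
    ground-truth label, e.g. {"recommended": "accept", "ground_truth": "accept"}.
    Precision here means: of everything the agent recommended as X, how often
    the human reviewer agreed it really was X.
    """
    results: dict[str, float] = {}
    for label in ("accept", "decline"):
        recommended = [c for c in cases if c["recommended"] == label]
        if recommended:
            correct = sum(1 for c in recommended if c["ground_truth"] == label)
            results[label] = correct / len(recommended)
    return results

# On a 100-case sample this might come out around {"accept": 0.97, "decline": 0.92},
# in line with the figures quoted above.
```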
PR: Okay. Okay. So say someone’s listening to this and they’re thinking, okay, this sounds interesting. I have nothing like this at my bank or fintech or whatever. How do I get started? What’s involved? What should I expect as sort of a rollout into production of this?
ST: The next phase of this white paper is a “how do you get to production.” This white paper sort of gives you the clues without the checklist. We pulled that back; I’ll be honest, we ran out of time. Getting anything live in production depends a lot on your financial institution’s internal tech estate, your data availability, your data readiness, your existing legacy providers, and your organizational willingness. The thing I would say is, go get some demos. If you’re starting at zero, come get a Sardine demo. And even if all you do is use that and the white paper to start a conversation internally, I think that’s going to unblock you. And I’m sure there will be many Sardine competitors that will do something similar one day. And we’ll end up with a better industry as a result. That would be step number one: get yourself a demo and send around this white paper. The real idea of it was to get people thinking differently. And step number two, if you did want to be so bold, would be to look at who are the people doing things like this. Sardine is one; there are others. Could I do a bit of a bake-off, and, if I take my CIP procedures or my Standard Operating Procedures, how could I get confidence using this framework that this is something that I could control and roll out into production?
PR: Okay, so let’s close with looking at the future. And mind you, it’s hard, because I don’t think there’s ever been an area of technology that’s moving as fast as AI, and generative AI specifically. It’s just staggering, the announcements that come up daily. But I think it’s fair to say we’re still in the 1.0, maybe 1.1 phase of agentic AI. So how much better will these tools be, let’s just say, 18 to 24 months out?
ST: It’s impossible to say, because how much will the base models change? The thing that won’t change is probably what you can anchor in. It’s very unlikely that BSA/AML goes away. It’s very unlikely that your customers stop using mobile apps. It’s very unlikely that people… So you’re almost better off anchoring in that stuff. And then, if you assume that the models are going to get better, the question becomes, how do I use those models, with the risks that we can already see, and intercept that 18-month, 24-month timeframe where they are a lot better? So my argument would be, if the precision and the accuracy are going to get better, if we’re going to learn more and more how to deal with hallucination risk, this then becomes a skill you have to learn. Yes, the models might get better. The models might be able to do more stuff themselves, but will they be able to do it in a way that’s regulated, private and secure? I think the biggest shift in the past 18 months has been from, “The foundation models will eat everything; why would you ever build at the application layer?” to, “Wow, that looks like a game that burns capex and is kind of really easy to replicate. This application layer seems to be doing really, really well.” And if you think about that second thing, then you start thinking about your critical third parties and you start thinking about policies, procedures, controls. How do I change all of that now from a standing start, knowing that I’m going into a September, October budget cycle for next year, to be able to meaningfully adopt some of these technologies? That’s how I’d be thinking if I was a CXO at a bank.
PR: You know, it’s almost like you want to dive in now, because even though these things might get better over the next 18 to 24 months, it’s also going to get better if your particular application has been trained on your particular data. You’ve got results; you’re tweaking it yourself. It really makes sense to dive in, because you can train it, and your model can get better based on your own application, right?
ST: In the age of AI, context is king. The problem with context is it’s private, confidential. And that context, if it leaked, is potentially leaking all of your customers’ data. Do you want to send that to an LLM that’s in a cloud somewhere? Or do you want that inside of a secure platform that you own, control, can manage, have complete audit and governance acceptance over from a critical third-party standpoint? And then you can bring secure bits of LLMs plumbed into that in a defined agentic pathway. So that’s kind of our argument, which is like, just put this thing into a space where you can manage the risk. And the benefit of that is you also improve its performance because you bring it much closer to your context. And that’s the hardest thing to do in a secure way.
PR: Right. Well, it’s a good place to leave it. Simon, it’s always great to chat with you, my friend. I appreciate you coming on the show, and best of luck with the rollout of your AOF and, of course, best of luck with NerdCon and SardineCon.
ST: Thank you, sir. Doing all the things. You wait forever for Simon to launch something and he launches three things in the space of a week.
PR: Exactly. Okay. See you later.
I hope you have a little more understanding now about AI agents and how they can benefit your organization. If you’re not doing so already, the time to start experimenting with this is now. I think we are quickly moving away from the “let’s throw more bodies at this problem” way of doing business. If you need to scale your compliance operation, you can do that with AI agents today. This is no longer something that is happening in the future.
Anyway, that’s it for today’s show. If you enjoy these episodes, please go ahead and subscribe, tell a friend or leave a review. And thanks so much for listening.