Transcript:
I’m going to be sharing some of the work that we’ve been doing at Microsoft in reinforcement learning. We’ve been looking at a lot of applications of AI, and you’re all building systems and apps and services that take advantage of the capabilities that are coming to light these days. If you look at the sort of things we can do, it’s amazing how, with speech and language and understanding what’s in our data, we can expand the power of our applications. These services help us interact with the physical world and with users better; they give us a kind of x-ray vision into what’s going on in our data. But there’s something that caps whatever effort we put into making our systems better with AI. No matter how much better we get at understanding our data, sitting in the middle of all this there are the business rules. Here’s the dirty little secret of our apps: for all the machine learning on one side and all the fantastic user interaction on the other, there’s a bunch of nested ifs sitting in the middle, a pile of hard-coded values, thresholds, and assumptions about how the world works, and in the end they cap how much you can get out of your applications.

So the question is: how can we bring AI to that area, to the business rules and how the business operates? The type of AI that can do that is called reinforcement learning. In reinforcement learning, you take the notion of an agent operating in an environment, a world that includes both your users and your data. The agent tries things, decides to take certain actions on the world, and learns by observing the outcomes of those actions and seeing how well they track to something you’re trying to achieve. Through this cycle it learns how to decide; reinforcement learning is the type of AI that deals with finding these good decisions. (The first sketch below gives a minimal flavor of that decide-observe-learn cycle.)

Now, what happens if we want to do this in the real world? You read a lot about reinforcement learning: how it won board games, how you can get superhuman performance in video games. A lot of that happens in simulation, and simulation is great. But it turns out a lot of things become really complicated if you want to train outside simulation.

First of all, reality hits you with its full complexity. You can’t take baby steps in reality, and we really don’t understand the curriculum. So as we build services that help you make these business rules with AI, we need to learn very fast, adapting to a world that hits you in the face on day one with all its messiness and richness.

Then, time is inexorable. There’s no rewind. We can’t stop the world and show this other thing to this user to see what they would have done, so we cannot compare directly. Our services have to make the most of every data point, extracting as much information and harvesting as many conclusions as possible from each one. (The second sketch below shows one standard way to do that: counterfactual evaluation of logged decisions.)

Then, baselines are shifting. The world is changing; you don’t know whether what you got last week is what you should expect next week. It’s better not even to assume that.
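[Editor's note: to make the decide-observe-learn cycle concrete, here is a minimal sketch in Python of a contextual-bandit-style agent. All names here (EpsilonGreedyAgent, decide, learn, the example headlines) are illustrative inventions for this transcript, not an actual Microsoft API.]

```python
import random
from collections import defaultdict

class EpsilonGreedyAgent:
    """Minimal decide-observe-learn loop: try actions, observe outcomes,
    and shift toward whatever tracks the goal (the reward)."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon           # how often we explore a random action
        self.value = defaultdict(float)  # running reward estimate per (context, action)
        self.count = defaultdict(int)

    def decide(self, context):
        # Explore occasionally; otherwise exploit the best-known action.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[(context, a)])

    def learn(self, context, action, reward):
        # Incremental mean update: every observed outcome refines the estimate.
        key = (context, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]

# One turn of the cycle; the "environment" is your users and data.
agent = EpsilonGreedyAgent(actions=["headline_a", "headline_b"])
context = "sports_fan"   # whatever you know about the situation
action = agent.decide(context)
reward = 1.0             # e.g. the user clicked; measured, not simulated
agent.learn(context, action, reward)
```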
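[Editor's note: the "no rewind" constraint is commonly addressed by logging not just which action was shown but the probability with which it was chosen, so a different policy can be evaluated offline against the same data. The sketch below is the standard inverse propensity scoring estimator as one illustration; the talk does not specify which estimator these services actually use.]

```python
def ips_estimate(logs, new_policy):
    """Inverse propensity scoring: estimate the average reward a new policy
    *would* have earned on logged traffic, without rewinding the real world.

    logs: iterable of (context, action, reward, prob) tuples, where prob is
          the probability the logging policy assigned to the chosen action.
    new_policy: maps context -> action (deterministic, for simplicity).
    """
    total, n = 0.0, 0
    for context, action, reward, prob in logs:
        n += 1
        if new_policy(context) == action:
            # Reweight matching events by how unlikely they were to be logged.
            total += reward / prob
    return total / n if n else 0.0

# Example: events logged by an exploring policy...
logs = [
    ("sports_fan", "headline_a", 1.0, 0.9),
    ("sports_fan", "headline_b", 0.0, 0.1),
    ("foodie",     "headline_b", 1.0, 0.5),
]
# ...evaluated against a candidate policy we never actually ran.
candidate = lambda ctx: "headline_a" if ctx == "sports_fan" else "headline_b"
print(ips_estimate(logs, candidate))  # unbiased estimate of candidate's reward
```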
Then there’s a real constraint on data. In simulation you can just, you know, insert another quarter in the cloud and it gives you a billion data points, and that’s fine. In the real world you have these few users coming in, and you have to respond in milliseconds and train on every single event in real time. You can’t wait a week to push out a new model: in MSN News we push out a new model every 15 minutes, trained on every single event that has just happened. (A sketch of that kind of online loop appears at the end of this transcript.) And finally, as we mentioned, user data is valuable and scarce.

And importantly, there’s no undo; things have true consequences. You can’t just try things and say, "oh look, it found this crazy conclusion, wasn’t that funny to watch in an animation." When you’re talking about things that operate in the real world, you have to understand that your exploration has consequences, and that ethics has to be designed into these systems from the ground up: from the team culture, all the way up through guidelines for users and the monitoring tools we use.

So with all these considerations, what can we provide? We’re building a series of services that allow you to do real-time personalization, contextual optimization, real-time adaptation of autonomous systems to their context, and even learning from each other. Imagine a system that teaches your support staff how to close cases based on the performance of your best support workers. These are all applications of reinforcement learning in the real world that we’re working on.

To wrap it up: when we talk about the agent and the world, we can’t forget that the agents, and we, the people who make them, are part of this world. As we bring out AI that has the ability to achieve any result we ask of it, we hope that you leave a place for the world, and the people in it, in the goals you give it. Thank you very much.
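[Editor's note: to give a flavor of "respond in milliseconds, train on every event, ship frequently", here is a hedged sketch of an online serving loop. It assumes an agent with the decide/learn interface from the first sketch; event_stream and deploy are placeholders for whatever event source and rollout mechanism a real pipeline would use, not the actual MSN News system.]

```python
import time

MODEL_REFRESH_SECONDS = 15 * 60  # the talk's "every 15 minutes" cadence

def serve_and_train(agent, event_stream, deploy):
    """Online loop: serve each request fast, learn from every single event,
    and push a refreshed model on a fixed cadence.

    event_stream yields (context, reward_fn) pairs; reward_fn(action)
    returns the observed outcome. deploy(agent) is a placeholder for
    snapshotting and rolling out the current model.
    """
    last_deploy = time.monotonic()
    for context, reward_fn in event_stream:
        action = agent.decide(context)        # must be fast: users are waiting
        reward = reward_fn(action)            # observed outcome, e.g. a click
        agent.learn(context, action, reward)  # train on this one event, now
        if time.monotonic() - last_deploy >= MODEL_REFRESH_SECONDS:
            deploy(agent)                     # ship the refreshed model
            last_deploy = time.monotonic()
```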