Transcript:
Hi there! My name is Manasi Vartak. I recently graduated from MIT, where ModelDB was part of my PhD thesis, and we've now started a company called Verta to continue working on ModelDB and related tools. So today I'll give a quick overview of where we are with ModelDB, show a demo of v2, and point out some areas where we're looking for help. Let me preface this by saying it's super exciting to see so much interest around metadata. Two years ago, when we started thinking about this, no one really cared, so this is a good step. All right, so why model management? Everyone here hopefully knows that model development is iterative. You're going to try hundreds of models before finding one that works. On Kaggle, you'll routinely find that the top Kagglers submit a hundred to almost 400 models before they win a competition. Currently there isn't a good way to keep track of these models, and this causes problems that have been brought up before: lack of reproducibility, and questions like: How do I share my work with teammates? How do I look at history and search through models for the ones I care about? And how do I learn from previous experiments? So our solution there is a model management system. It's a central repository that tracks models throughout their lifecycle, so not just training but also deployment and monitoring as we go along. ModelDB was one of the first open-source systems for model management. We're still closely tied to the MIT database group, so there's research going on. We're open source; version 2 is going to be under the Apache 2.0 license, so just putting that out there. And we've been super lucky in that large companies, particularly financial and insurance companies, have adopted this as a way to track their models and provide audit logs around ML. Since December of last year, ModelDB has been available as a plugin for Kubeflow, and that has actually introduced us to really awesome members of this community.
Here's a very quick overview of how ModelDB works. If you're in a Jupyter notebook or a Python script, you import our library and then make logging calls, so that data as well as artifacts can go sit in the ModelDB back end. The back end talks to the database and an artifact store, and then everything is visualized through our web app. Here's the status of ModelDB; a little bit of a repeat, maybe. v1 was integrated into Kubeflow in December '18, and v2 should be available in about one or two weeks. The key thing we did there was to rewrite the system from the ground up. v1 was very focused on scikit-learn and Spark ML; with the rebuild, we have now extended it to support any Python-based ML framework. We also have a more robust artifact store and the ability to plug in multiple storage back ends, which is important for this community as well. We have an ongoing integration with Katib, where we're trying to make the Katib back end integrable not only with MySQL, as it stands right now, but also with ModelDB. Cool, so I'm going to run a short video of our v2 demo. It'll be in the words of my colleague Michael, sitting over there, so feel free to find him later. Can you guys see? Okay. Say that in your workflow you're performing grid search with a fully connected neural network on an MNIST multi-class classification task. Anywhere you'd like, you can make calls to the ModelDB client and log hyperparameters, the datasets you're using, recurring observations such as loss over time, metrics such as accuracy, and artifacts produced by your workflow, such as images and even the model itself. Afterward, with the same library, you can recover your experiment, query runs by a hyperparameter, sort by a metric, and recover all associated information. After logging data to ModelDB, you can use our web app to see your projects, complete with descriptions and tags.
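To make the log-then-query workflow from the demo concrete, here is a minimal sketch in Python. It is not ModelDB's client API: the names `InMemoryTracker`, `Run`, `log_hyperparameter`, `log_observation`, `log_metric`, `log_artifact`, `find`, and `sort_by_metric` are hypothetical, everything lives in memory rather than in a back end with a database and artifact store, and `fake_train` is a placeholder whose loss and accuracy values are arithmetic stand-ins, not real MNIST results.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import product
from typing import Any, Callable


@dataclass
class Run:
    """One training run (hypothetical schema, not ModelDB's)."""
    run_id: int
    hyperparameters: dict[str, Any] = field(default_factory=dict)
    metrics: dict[str, float] = field(default_factory=dict)
    observations: dict[str, list[float]] = field(default_factory=dict)
    artifacts: dict[str, bytes] = field(default_factory=dict)

    def log_hyperparameter(self, key: str, value: Any) -> None:
        self.hyperparameters[key] = value

    def log_metric(self, key: str, value: float) -> None:
        # Single-valued summary, e.g. final accuracy.
        self.metrics[key] = value

    def log_observation(self, key: str, value: float) -> None:
        # Recurring values, e.g. loss per epoch, are appended in order.
        self.observations.setdefault(key, []).append(value)

    def log_artifact(self, key: str, payload: bytes) -> None:
        # A real system would write this to an artifact store instead.
        self.artifacts[key] = payload


class InMemoryTracker:
    """Hypothetical in-memory stand-in for a model-management back end."""

    def __init__(self) -> None:
        self.runs: list[Run] = []

    def new_run(self) -> Run:
        run = Run(run_id=len(self.runs))
        self.runs.append(run)
        return run

    def find(self, predicate: Callable[[Run], bool]) -> list[Run]:
        # Query runs, e.g. by a hyperparameter value.
        return [r for r in self.runs if predicate(r)]

    @staticmethod
    def sort_by_metric(runs: list[Run], metric: str) -> list[Run]:
        # Highest metric value first.
        return sorted(runs, key=lambda r: r.metrics[metric], reverse=True)


def fake_train(hidden_size: int, lr: float) -> tuple[list[float], float]:
    # Placeholder for training a fully connected net on MNIST: these
    # numbers are arithmetic stand-ins, NOT real training results.
    losses = [1.0 / (epoch + 1) for epoch in range(3)]
    accuracy = hidden_size / 1000 + lr
    return losses, accuracy


tracker = InMemoryTracker()
# Grid search over two hypothetical hyperparameters.
for hidden_size, lr in product([64, 128], [0.01, 0.1]):
    run = tracker.new_run()
    run.log_hyperparameter("hidden_size", hidden_size)
    run.log_hyperparameter("lr", lr)
    losses, accuracy = fake_train(hidden_size, lr)
    for loss in losses:
        run.log_observation("loss", loss)
    run.log_metric("accuracy", accuracy)
    run.log_artifact("model", f"weights-{run.run_id}".encode())

# Recover the experiment: query runs by a hyperparameter, sort by a metric.
candidates = tracker.find(lambda r: r.hyperparameters["lr"] == 0.1)
best = tracker.sort_by_metric(candidates, "accuracy")[0]
print(best.run_id, best.hyperparameters, best.observations["loss"])
```

Keeping recurring observations as appended lists, separate from single-valued metrics, is one way to support both "loss over time" views and "sort by accuracy" queries; a real system would persist runs in a database and keep artifacts in a separate artifact store, as in the architecture described above.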
You can view an organized dashboard containing a summary of runs within the project, with data logged from the client. You can even set filters using drag and drop, and you can add and remove columns of information as needed. Finally, you can access more detailed views of runs you're interested in. Okay, awesome. I hope that gives a good sense of what we're up to. Let me jump back. Okay, all right, so that was Michael; feel free to bug him later if something looked interesting. All right, so that's what that is. Along with ModelDB v2, we will also be releasing a ModelDB managed service for folks who are not on Kubeflow and would still like to use model management. The things on our roadmap are deeper integrations with TF, PyTorch, and scikit-learn, and there are research projects going on. There are two open questions I wanted to highlight, and these have already come up. One is how to drive community standards: a lot of people are realizing that metadata is important, and we're super excited to see that interest, so how do we get to some sort of best practices, and how can we build something that can be used by multiple people at multiple companies? The second one is more on the education front. Model metadata, or model management, is not necessarily the first thing a data scientist thinks about, so part of what I think we need to do as a community is also educate data scientists that this is a need. This is where we're looking for use cases, to understand how model management helped, or the lack of it hurt, your company in the past. So definitely come chat with us if that sounds relevant. We are an open-source project, so contributions are very welcome. I'm listing a few of them here: mainly back-end performance optimizations, deeper integrations with other Kubeflow modules, a lot of feedback, and co-authoring blog posts, too. We are based in Palo Alto.
We just moved here from Boston, so we're holding a rather informal ModelDB hackathon. It'll be a chance to just hang out with people and build stuff, so please feel free to find me or sign up there. So that's the ModelDB plug for Verta. Come find us later. Thank you.