Transcript:
Welcome. This eight-minute video will give you a brief overview of Dask, a parallel computing library for Python. Dask parallelizes many libraries in the Python ecosystem like NumPy, pandas, scikit-learn and many others, allowing them to scale either on your laptop, with multi-core and larger-than-memory parallelism, or on large distributed clusters in the cloud or otherwise, all while providing a consistent user experience that stays true to the existing Python community of projects.

Libraries like NumPy, pandas and scikit-learn are popular today because they combine high-level, usable and flexible APIs with high-performance implementations. However, these libraries were not originally designed to scale beyond a single CPU or to data that doesn't fit in memory. This often results in memory errors, or in switching libraries when you run into larger datasets. This is what Dask can help fix. If you replace the NumPy library with Dask Array, which uses NumPy under the hood, then everything works well: Dask Array can use NumPy and scale it out to multi-core machines or large distributed clusters.

By integrating with existing libraries, Dask enables developers and data scientists to easily transition from traditional single-machine workflows to parallel and distributed computing without learning new frameworks or rewriting much code. This can be done anywhere we write Python, including other libraries, automated scripts, or Jupyter notebooks like this one, where we read in some CSV data from the cloud into a distributed pandas DataFrame.

The Dask framework provides everything we need to scale out onto distributed systems in an effective way. As we've seen, it provides APIs compatible with the ones we're already familiar with. It also provides access to cloud data stores and HDFS, and it provides ways to launch workers on common deployment systems like Kubernetes, YARN, Slurm, or PBS. This helps us manage many workers in the cloud or elsewhere. These workers carry out the distributed computation, which we can see here: Dask exposes it to us with visible and interactive dashboards, helping us understand what our computation is doing. All of these pieces combine to create a native and cohesive environment that enables Python users to scale their computation comfortably onto distributed hardware without needing significant expertise in distributed systems.

As a result of this cohesive experience, Dask has become a platform for building scalable computing into a broad variety of applications. Dask is flexible, and so it parallelizes many different kinds of workflows, both in the traditional big-data business computing space and also in many novel applications outside it. Let's see a few examples.

We've already seen Dask DataFrame, which organizes many pandas DataFrames in parallel for traditional tabular computing, and also Dask Array, which organizes many NumPy arrays into a grid to parallelize multi-dimensional computing, such as we often see in biomedical applications, simulation output, energy, and the geosciences. Dask also tightly integrates with scikit-learn through its parallel computing library, joblib. Many scikit-learn models are already designed for parallelism on a single machine and can hand off this control to Dask if it is installed. In all of these cases, Dask is tightly integrated with the core library, both in terms of the user API and the actual code that is run.
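To make this drop-in relationship concrete, here is a minimal sketch of the kind of code being described, assuming Dask, dask.distributed, an fsspec cloud driver, and scikit-learn are installed; the CSV path and column names are placeholders rather than the dataset shown in the video, and the scikit-learn hand-off uses the documented joblib "dask" backend.

    import dask.array as da
    import dask.dataframe as dd

    # Dask Array: a grid of NumPy arrays driven through a NumPy-style API.
    x = da.random.random((100_000, 1_000), chunks=(10_000, 1_000))
    column_means = x.mean(axis=0)        # lazy: this only builds a task graph
    print(column_means.compute()[:5])    # compute() runs it in parallel and returns NumPy

    # Dask DataFrame: many pandas DataFrames driven through a pandas-style API.
    # The S3 glob and column names are hypothetical; cloud paths need the
    # matching fsspec driver (s3fs, gcsfs, ...) installed.
    df = dd.read_csv("s3://my-bucket/logs-2024-*.csv")
    print(df.groupby("date")["value"].mean().compute())

    # scikit-learn hand-off: its internal parallelism runs on Dask via joblib.
    from dask.distributed import Client
    import joblib
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    client = Client()                                 # local multi-core cluster
    X, y = make_classification(n_samples=10_000, n_features=20)
    model = RandomForestClassifier(n_estimators=200, n_jobs=-1)
    with joblib.parallel_backend("dask"):             # model training now uses Dask workers
        model.fit(X, y)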
Dask is also integrated in terms of the communities behind the projects: core maintainers of NumPy, pandas, and scikit-learn are also core maintainers of Dask.

Dask is used in other situations as well. For example, the Xarray project, which is commonly used in Earth-system science, uses Dask for scalability but defines its own API that closely matches the needs of that community. They were able to plug into Dask internally to handle scalability while making different choices that made sense for them and their users. Similarly, the Airflow developers used Dask when they made a successor project, Prefect, for dataflow automation; again, they wanted to add parallel execution to a framework they were building, and Dask was a good fit. The NVIDIA RAPIDS project for GPU-accelerated data science uses Dask under the hood for distributed execution, networking, and load balancing, while relying on its own GPU code for on-node computing; again, Dask was able to fit some of their needs while giving them the space to build in their own systems. Finally, many people use just the internals of Dask, a sophisticated dynamic task scheduler, to build completely custom computations. This is very common in research, or in algorithmic fields like finance.

In all of these cases, Dask makes it easy to deploy on distributed hardware, connecting data science users to compute resources. Technically, Dask is essentially a managed distributed service, with distributed execution and storage on the workers and peer-to-peer communication. It is easy to deploy on any major resource manager today, including Kubernetes and other common cloud APIs, YARN for older Hadoop and Spark systems, or any of the common HPC job-queuing systems like PBS, LSF, Slurm, or SGE. Dask is also trivial to install and use on your local machine or laptop: it requires no setup to use and is easy to install with conda or pip. Dask even comes pre-installed with Anaconda by default, so it's quite likely that you have it already. You can start with your laptop and then, when you're ready, easily transition to your institution's hardware using one of the resource managers above, alongside other services that you might be using in your data science experience.

Finally, it's worth mentioning that Dask, like most of the Python data science ecosystem, is an open-source and community-governed project, fiscally sponsored by NumFOCUS, with an active developer and maintainer community. We encourage you to get involved. If you would like to learn more, you should visit our website at dask.org, view examples at examples.dask.org, or see additional videos about Dask on our YouTube channel. Thank you for your time. On behalf of the entire Dask community, we sincerely hope that this project helps you in your work.
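As a small illustration of the custom task scheduling and the laptop-to-cluster path mentioned above, here is a minimal sketch assuming a default local cluster created by Client(); the load, process, and combine functions are illustrative placeholders, and swapping the Client's cluster for one from dask-jobqueue, dask-kubernetes, or dask-yarn is how the same code would move to the resource managers named in the video.

    from dask.distributed import Client
    import dask

    # Start a local cluster of workers on this machine; attaching to Kubernetes,
    # YARN, or HPC queues (PBS, Slurm, LSF, SGE) instead only changes this line.
    client = Client()
    print(client.dashboard_link)   # the interactive dashboard described earlier

    # Build a completely custom computation as a graph of delayed function calls.
    @dask.delayed
    def load(i):                   # placeholder: stands in for reading a data chunk
        return list(range(i))

    @dask.delayed
    def process(chunk):            # placeholder: stands in for per-chunk work
        return sum(chunk)

    @dask.delayed
    def combine(results):          # placeholder: stands in for a final reduction
        return sum(results)

    pieces = [process(load(i)) for i in range(10)]
    total = combine(pieces)
    print(total.compute())         # the scheduler runs the graph on the available workers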