Friday 18 December 2020
This workshop on Spark is organized by and for Most Data, the Yacht community for data professionals. For this workshop we team up with Pipple's CTO Ruud Mullers, who will lead the session. It is part of a series of three online data science workshops: 1) Coding Standards and Logging in Python, 2) Machine Learning Exploration in Python and 3) Spark.
Every company generates more and more data, which means we increasingly run up against the limits of our servers and computers. Fortunately, when adding CPU, GPU or RAM becomes too expensive or is no longer sufficient, there are now techniques to scale horizontally, i.e. to spread the work across multiple computers (nodes). During this workshop you will get an introduction to distributed computing with Hadoop and Spark, with a focus on preprocessing data and training models using PySpark.
After the theory part has covered the basics, you will work in a group on a large dataset and a corresponding case on a Spark cluster. The exercise also shows the difference between working on a single node and working in a distributed way. At the end of the workshop, the results of the teams are compared and one team is declared the winner.
During the workshop we will use Google Colab, so make sure you have a laptop on which Google Colab runs properly. This workshop requires at least basic experience with Python. The workshop will be held via a Google Meet/Hangouts video call. When you sign up via the form below, you will receive the link to join.
Women in Data Science (WiDS) Amsterdam is an independent conference organized by Pipple on Thursday, September 24th, under the banner of Stanford University. The event is followed by a series of online lectures in September and October. Yacht supports diversity and inclusion and is proud to be a partner of Women in Data Science Amsterdam.
Friday December 18th, 14:00 - 17:00 hrs.