London Kaggle hackathon: how to make friends and influence data

The Kaggle hackathon is a meetup that gives data scientists the opportunity to make friends in the industry and solve challenging, exciting data problems.

The day is a sprint of tackling cool data projects and sharing interesting ideas, as well as a chance to meet new people and build relationships across the sector. It ends with drinks, which is also a big part of the fun!

The events are organised by the London Data Science Workshop, and the projects that data scientists are invited to work on come from Kaggle, an online community where data scientists are challenged with real-world machine learning problems.

If you would like more information about these events, please get in touch with the London Data Science Workshop.

The Kaggle Hackathon at ZPG

On 27 January 2018, about 60 data hackers, scientists and analysts met in ZPG’s trendy London office to take on several competitions available on the Kaggle platform. At the end of the day, people had the chance to present their findings to the whole group.

To give an idea of what types of machine learning problems you can collaborate on, here’s a brief summary of the three most popular ones from the January workshop.

The toxic comment classification

This challenge is about building a model capable of detecting different types of toxicity, such as threats, obscenity, insults, and identity-based hate. The dataset is a corpus of Wikipedia comments that human raters have labelled for toxicity.

The corpus contains 63 million comments from discussions relating to user pages and articles, dating from 2004 to 2015. About ten people worked on this project, sharing ideas and early results.
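To make the task concrete: a single comment can carry several toxicity labels at once, so one common approach is to train a separate yes/no classifier for each label. Below is a minimal sketch of that idea, using a tiny Naive Bayes model written in plain Python. The example comments, the two-label subset and all function names are invented for illustration; this is not necessarily what anyone at the workshop built.

```python
import math
import re
from collections import Counter

# Illustrative subset of labels; the real challenge has more.
LABELS = ["threat", "insult"]

# Invented toy data: each comment is paired with a (possibly empty) set of labels.
COMMENTS = [
    "I will hurt you",
    "you are an idiot",
    "I will hurt you, idiot",
    "thanks for fixing the article",
    "nice edit, thanks",
]
TAGS = [{"threat"}, {"insult"}, {"threat", "insult"}, set(), set()]


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(comments, tags):
    """Fit one binary Naive Bayes classifier per label (one-vs-rest)."""
    vocab = {tok for text in comments for tok in tokenize(text)}
    per_label = {}
    for label in LABELS:
        counts = {True: Counter(), False: Counter()}  # word counts per class
        docs = {True: 0, False: 0}                    # comment counts per class
        for text, labels in zip(comments, tags):
            flagged = label in labels
            docs[flagged] += 1
            counts[flagged].update(tokenize(text))
        per_label[label] = (counts, docs)
    return {"vocab": vocab, "labels": per_label}


def predict(model, text):
    """Return the set of labels whose 'yes' score beats their 'no' score."""
    vocab = model["vocab"]
    tokens = [tok for tok in tokenize(text) if tok in vocab]
    predicted = set()
    for label, (counts, docs) in model["labels"].items():
        total_docs = docs[True] + docs[False]
        scores = {}
        for flagged in (True, False):
            # Log prior and log likelihoods, both with add-one smoothing.
            score = math.log((docs[flagged] + 1) / (total_docs + 2))
            denom = sum(counts[flagged].values()) + len(vocab)
            for tok in tokens:
                score += math.log((counts[flagged][tok] + 1) / denom)
            scores[flagged] = score
        if scores[True] > scores[False]:
            predicted.add(label)
    return predicted


model = train(COMMENTS, TAGS)
print(predict(model, "thanks for the edit"))
```

With only five training comments, the predictions mean very little beyond showing the mechanics. On the real dataset you would swap in a stronger classifier and evaluate each label separately.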

Shannon, Head of Analytics at uSwitch ZPG, said of the challenge:

“I hadn’t used Python much before to transform free text fields so I was introduced for the first time to the nltk package.

“At the end of the morning I had a few new data transformation techniques under my belt and my faith in humanity had dropped considerably; there were some truly vile comments in the dataset, which hit home the need to accurately flag such comments in an automated way.”

All the information about this challenge can be found here.

The 2018 Data Science Bowl

The idea behind this challenge is that spotting cell nuclei faster can speed up the search for cures for diseases like cancer, heart disease, chronic obstructive pulmonary disease, Alzheimer’s, and diabetes. The 2018 Data Science Bowl proposes the following mission: create an algorithm to automate nucleus detection.
Participants are asked to create a computer model that can identify a range of nuclei across varied conditions. By observing patterns, asking questions, and building a model, participants will have a chance to push state-of-the-art technology further.
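To give a feel for what "nucleus detection" means in code, here is a minimal sketch of a classical baseline: treat every pixel above a brightness threshold as nucleus, then group touching pixels into blobs. The toy image, the threshold of 128 and the function name are assumptions made for illustration. Real microscopy images vary in staining and lighting, and nuclei often touch each other, which is why a fixed threshold like this is only a starting point.

```python
from collections import deque


def find_nuclei(image, threshold=128):
    """Group bright, 4-connected pixels into blobs.

    `image` is a list of rows of grey-scale intensities (0-255).
    Returns a list of blobs, each a list of (row, col) pixel positions.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Breadth-first flood fill starting from this bright pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and image[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(pixels)
    return blobs


# Invented 5x6 image containing two bright blobs on a dark background.
TOY_IMAGE = [
    [0,   0,   0, 0,   0, 0],
    [0, 200, 210, 0,   0, 0],
    [0, 190, 205, 0, 180, 0],
    [0,   0,   0, 0, 220, 0],
    [0,   0,   0, 0,   0, 0],
]

blobs = find_nuclei(TOY_IMAGE)
print(len(blobs), [len(blob) for blob in blobs])
```

A learned segmentation model would replace the fixed threshold, but the output has the same shape: one group of pixels per detected nucleus.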

Mercari price suggestion challenge

The challenge is about building an algorithm that automatically suggests the right product prices.
Participants are provided with user-inputted text descriptions of the products, including details like product category name, brand name, and item condition.
More details about this challenge can be found here.
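A simple way to get started on a pricing problem like this is a lookup-table baseline: suggest the median price of similar items already seen. The sketch below does this by category and brand, falling back to the category median and then to the overall median. The field names ("category", "brand", "price") and the listings are hypothetical, chosen for readability rather than copied from the competition data.

```python
from collections import defaultdict
from statistics import median


def fit_price_table(listings):
    """Collect median prices per (category, brand), per category, and overall."""
    by_pair = defaultdict(list)
    by_category = defaultdict(list)
    all_prices = []
    for item in listings:
        by_pair[(item["category"], item["brand"])].append(item["price"])
        by_category[item["category"]].append(item["price"])
        all_prices.append(item["price"])
    return {
        "pair": {key: median(prices) for key, prices in by_pair.items()},
        "category": {key: median(prices) for key, prices in by_category.items()},
        "overall": median(all_prices),
    }


def suggest_price(table, category, brand):
    """Use the most specific median available for this item."""
    if (category, brand) in table["pair"]:
        return table["pair"][(category, brand)]
    if category in table["category"]:
        return table["category"][category]
    return table["overall"]


# Invented listings, for illustration only.
LISTINGS = [
    {"category": "shoes", "brand": "Acme", "price": 40.0},
    {"category": "shoes", "brand": "Acme", "price": 60.0},
    {"category": "shoes", "brand": "Other", "price": 20.0},
    {"category": "books", "brand": "", "price": 5.0},
]

table = fit_price_table(LISTINGS)
print(suggest_price(table, "shoes", "Acme"))     # 50.0
print(suggest_price(table, "shoes", "Unknown"))  # 40.0
print(suggest_price(table, "toys", "Unknown"))   # 30.0
```

A baseline like this ignores the free-text description and item condition entirely, so it mainly serves as a yardstick that a proper model should beat.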

Outside of these challenges, many people focussed on the introductory Titanic dataset competition, found here. In this challenge, participants were asked to apply the tools of machine learning to predict which passengers survived the sinking of the Titanic. Useful skills for this challenge include binary classification, logistic regression, and the basics of Python and R.
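For readers new to logistic regression, here is a minimal sketch of the technique in plain Python, trained with batch gradient descent. The eight passengers and the two yes/no features are invented for illustration and are not rows from the real Titanic dataset; the learning rate and number of epochs are arbitrary choices too.

```python
import math


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability of the positive class (here: survived)."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(rows, targets, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, target in zip(rows, targets):
            error = predict_proba(weights, bias, features) - target
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Invented toy passengers: [is_female, is_first_class] -> survived (1) or not (0).
ROWS = [[1, 1], [1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0], [0, 0]]
SURVIVED = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_logistic(ROWS, SURVIVED)
print(predict_proba(weights, bias, [1, 1]))  # high for this toy data
print(predict_proba(weights, bias, [0, 0]))  # low for this toy data
```

In practice you would use a library implementation and far more features, but the model underneath is the same: a weighted sum of the inputs passed through a sigmoid.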