
The Art of Analyzing Big Data - The Data Scientist’s Toolbox Course

Ben-Gurion University of the Negev

The course covers principles, methodologies, and techniques for mining massive datasets. In it, we learned how to perform common data mining tasks, such as classification, clustering, and recommendation, on large datasets using parallel and distributed processing paradigms such as MapReduce. During the course, I had the opportunity to use state-of-the-art technologies for massive data mining. Each assignment tackles a different machine learning or big data problem. Course web site.

The analysis includes a visualization of the daily COVID-19 cases in Ohio over time, which gives a high-level perspective on the infection rate and the so-called "waves". Further in-depth analyses cover hospitalization, vaccination, unemployment, and school data.

Test Results Reported Positive Map - County-level

Identifying teams of three members who competed together in more than 10 Kaggle competitions.
For each community, multiple centrality measures were calculated:

1. degree_centrality
2. pagerank
3. closeness_centrality
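The three measures listed above can be sketched in plain Python as follows. This is a minimal, dependency-free illustration, not the code used in the project: the toy community `community`, the member names, and the parameters `damping` and `iterations` are all assumptions made for the example, and the graph is assumed to be undirected and connected.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def closeness_centrality(adj):
    """Reciprocal of the mean shortest-path distance to every reachable node."""
    result = {}
    for source in adj:
        # Breadth-first search gives shortest distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in adj[current]:
                if neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank; each undirected edge counts in both directions."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in adj}
        for node, neighbors in adj.items():
            if neighbors:
                share = damping * rank[node] / len(neighbors)
                for neighbor in neighbors:
                    new_rank[neighbor] += share
            else:
                # A node with no edges spreads its rank evenly over all nodes.
                for other in adj:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical community: a team of three (a, b, c) plus one extra teammate of a.
community = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

degree = degree_centrality(community)
closeness = closeness_centrality(community)
ranks = pagerank(community)

for member in sorted(community):
    print(member, round(degree[member], 3), round(closeness[member], 3), round(ranks[member], 3))
```

Graph libraries such as networkx expose functions with the same names as the measures listed above, which is usually the more practical choice on real competition graphs.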

Predicting Success and Extraordinary Achievement

In this section, a model is trained to predict an athlete's achievement based on physical features, sport type, and the athlete's country. If a sufficiently trained model predicts that an athlete will lose, and yet the athlete wins a gold medal, this is an extraordinary achievement. The same holds in the other direction: if the model predicts that the athlete will win a gold medal, and the athlete wins no medal at all, this is a disappointing loss.
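The comparison between prediction and outcome described above can be sketched as a small labeling rule. This is only an illustration under stated assumptions: the `AthleteResult` record, its field names, and the sample athletes are hypothetical, and the model's output is assumed to be reduced to a single "gold predicted or not" flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AthleteResult:
    name: str
    predicted_gold: bool   # hypothetical model output: True if a gold medal is predicted
    medal: Optional[str]   # actual outcome: "gold", "silver", "bronze", or None


def label_outcome(result: AthleteResult) -> str:
    """Flag results where the actual outcome contradicts the model's prediction."""
    if not result.predicted_gold and result.medal == "gold":
        return "extraordinary achievement"
    if result.predicted_gold and result.medal is None:
        return "disappointing loss"
    return "not flagged"


# Made-up records for illustration only.
results = [
    AthleteResult("athlete_1", predicted_gold=False, medal="gold"),
    AthleteResult("athlete_2", predicted_gold=True, medal=None),
    AthleteResult("athlete_3", predicted_gold=True, medal="gold"),
    AthleteResult("athlete_4", predicted_gold=False, medal=None),
]

labels = {r.name: label_outcome(r) for r in results}
for name, label in labels.items():
    print(name, label)
```

Cases where a predicted gold medalist takes silver or bronze are left unflagged here, since the text only names the two extremes.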

Assignments

Big Data Analysis

Assignment 1 - DB, SQL, various datasets, sqlite3 package

Collecting, Analyzing, and Visualizing Data with Python

Assignment 2 - Scraping with Beautiful Soup, working with APIs, pandas, and networkx.
Assignment 3 - Data visualization using turicreate, pandas and seaborn.

Analyzing Massive Graphs

Assignment 4 - Working with graphs.
Assignment 5 - Link predictions and graph analysis.

Big Data Visualization

Assignment 10 - PySpark, heatmap visualizations, and folium.


From Unstructured Text to Structured Data

Assignment 6 - NLP, sentiment analysis, and classification.
Assignment 7 - From Unstructured Text to Structured Data - NLP, entity extraction, networks and visualization.

Working with Geolocation Data

Assignment 8 - GeoPandas, Plotly Express, and folium.
Assignment 9 - Extracting Data from Images and Sounds - working with PySpark, classifiers, map visualization, and more.