
The course covers principles, methodologies, and techniques for mining massive datasets. We learned how to perform common data mining tasks, such as classification, clustering, and recommendation, on large datasets using parallel and distributed processing approaches such as MapReduce. During the course, I had the opportunity to use state-of-the-art technologies for massive data mining. Each task tackles a different machine learning or big data problem. Course web site.
It includes a visualization of the daily Covid-19 cases in Ohio over time, which gives a high-level perspective of the infection rate and the so-called "waves". There are also in-depth analyses of hospitalization, vaccination, unemployment, and school data.





Identifying teams of 3 members that competed together in more than 10 Kaggle competitions.
For each community, multiple centrality measures were calculated:
1. degree_centrality
2. pagerank
3. closeness_centrality
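
The three measures listed above can be illustrated with a minimal plain-Python sketch. This is not the project's actual code, which may well have relied on a graph library. The adjacency-dict representation, the damping factor of 0.85, the fixed iteration count, and the toy `community` graph are all assumptions made for illustration.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable nodes divided by the sum of shortest-path distances (BFS).

    Assumes the community is connected and edges are unweighted.
    """
    result = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[src] = (len(dist) - 1) / total if total else 0.0
    return result


def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank; each undirected edge counts in both directions."""
    n = len(adj)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in adj}
        for u, nbrs in adj.items():
            if nbrs:
                share = damping * rank[u] / len(nbrs)
                for w in nbrs:
                    new_rank[w] += share
            else:
                # A node without neighbors spreads its rank evenly.
                for w in adj:
                    new_rank[w] += damping * rank[u] / n
        rank = new_rank
    return rank


# Hypothetical community: member "a" has teamed up with everyone else.
community = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}

degree = degree_centrality(community)
closeness = closeness_centrality(community)
rank = pagerank(community)

for node in sorted(community):
    print(node, round(degree[node], 3), round(rank[node], 3), round(closeness[node], 3))
```

In a star-shaped community like this one, all three measures agree that the hub is the most central member. On less regular graphs they can rank members differently, which is one reason to compute several of them.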


Predicting Success and Extraordinary Achievement
In this section, a model is trained to predict an athlete's achievement based on physical features, sport type, and the athlete's country. If a sufficiently trained model predicts that an athlete will lose, and yet the athlete wins a gold medal, this is an extraordinary achievement. The same holds in the other direction: if a trained model predicts that the athlete will win a gold medal, and the athlete wins no medal at all, this is a disappointing loss.
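
The labeling rule described above can be sketched as a comparison between the predicted and the actual outcome. This is only an illustration under stated assumptions: the outcome labels (`"none"`, `"gold"`, and so on), the function name `classify_surprise`, and the sample records are hypothetical and do not come from the project.

```python
def classify_surprise(predicted, actual):
    """Compare a model's predicted outcome with the athlete's actual outcome.

    Both arguments are assumed to be one of: "none", "bronze", "silver", "gold".
    """
    if predicted == "none" and actual == "gold":
        return "extraordinary achievement"
    if predicted == "gold" and actual == "none":
        return "disappointing loss"
    return "neither"


# Hypothetical (athlete, predicted, actual) records for illustration only.
records = [
    ("athlete_1", "none", "gold"),
    ("athlete_2", "gold", "none"),
    ("athlete_3", "gold", "gold"),
    ("athlete_4", "none", "silver"),
]

labels = {name: classify_surprise(pred, actual) for name, pred, actual in records}

for name, label in labels.items():
    print(name, label)
```

The rule only means something if the model is accurate enough overall. With a weak model, most mismatches reflect model error rather than a surprising athlete.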

Assignments
Big Data Analysis
Assignment 1 - DB, SQL, various datasets, sqlite3 package

Collecting, Analyzing, and Visualizing Data with Python
Assignment 2 - Scraping with Beautiful Soup, working with APIs, pandas, and networkx.
Assignment 3 - Data visualization using turicreate, pandas and seaborn.

Analyzing Massive Graphs
Assignment 4 - Working with graphs.
Assignment 5 - Link predictions and graph analysis.



From Unstructured Text to Structured Data
Assignment 6 - NLP, sentiment analysis, and classification.
Assignment 7 - From Unstructured Text to Structured Data - NLP, entity extraction, networks and visualization.


Working with Geolocation Data
Assignment 8 - GeoPandas, Plotly Express, and Folium.
Assignment 9 - Extracting Data from Images and Sounds - working with PySpark, classifiers, map visualization, and more.

