Paul Berg
An analysis on US flights and cascading failures using PySpark
09/09/2020 · 7 minutes
Python
Spark
Introduction
In this blog post, we are going to study a dataset of US-only flights from the year 2007. The dataset was released by the American Statistical Association as part of its bi-annual Data Exposition. During the competition, participants were asked to focus on a single question and try to answer it by investigating the dataset. The question we are going to try to answer is: Can you detect cascading failures as delays in one airport create delays in others?