Data collection and analysis


The main purpose of this course is to learn tools for automatic data collection, primarily in R and Python. Maksym will teach the first half of the course (focusing on R), starting with business, economic, and financial applications of web scraping together with the related ethical considerations. We will then study simple tools for data collection and analysis that do not require programming skills: Excel, Power BI, and OpenRefine. Finally, we will consider automated data collection in R with numerous applications.


The Python part will introduce handy tools for working with complicated data structures that come in different formats. We will focus on the advantages of Beautiful Soup, a Python library for parsing HTML and XML. Another key feature that makes Python popular among scholars and practitioners is its simple toolkit for natural language processing.


Throughout the course, students will work in groups on their assignments. The course will expand on the knowledge of R and Python obtained in the first two courses in the sequence. Students are expected to bring a laptop to class with all necessary programs installed, as much of the class time will be spent in a “lab” style.