- Functional and self-contained notebook
- Happy to see GitHub repos (which you can use as your portfolio in the job market)
- Project report (around 30 pages, max. 45)
- Some relation to your study programme (debatable, and not strictly required)
- The report is (semi- or non-)technical documentation. Think of a corporate censor whom you are trying to inform
Content
- Problem formulation with some practical and theoretical motivation (no huge literature discussion)
- Methodology (not a critical realist vs. positivist discussion, but some reflection on what can potentially be concluded)
- Data sourcing and pre-processing strategy
- Overall architecture of the model(s)
- Modelling (incl. fine-tuning)
- Results
- Discussion / Conclusion
Scope
- Uses different methods from the course (at least 2 modules) in a creative way
- Downloading data from Kaggle/GitHub and running an ML model is probably not enough to perform well
- Creative combinations of methodologies, please. For example:
  - combine financial data with social media data to look at equity development
  - extract information from text data and create networks, then use network indicators to supplement company data
- Evaluation will focus on correct application and communication of DS methods
- The level of “technicality” is as in the course with emphasis on application and intuition, not on ML engineering / mathematics
- However, you will need to demonstrate enough insight into statistics to discuss your assignment, e.g. to interpret and discuss performance indicators and to outline strategies for improvement such as under/oversampling
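
To make the last point concrete, here is a minimal sketch of what "interpreting performance indicators" and "oversampling" can look like on an imbalanced binary problem. It is only an illustration under stated assumptions: the toy labels are made up, the function names `random_oversample` and `precision_recall_f1` are hypothetical helpers written for this example, and plain random duplication stands in for whatever resampling strategy you actually choose.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class
    is as large as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (assumption): 8 negatives, 2 positives.
y_true = [0] * 8 + [1] * 2

# A "model" that always predicts the majority class looks fine on accuracy...
y_majority = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)

# ...but never finds a single positive case.
precision, recall, f1 = precision_recall_f1(y_true, y_majority)

# Balance the classes by duplicating minority rows.
rows = [[i] for i in range(len(y_true))]
balanced_rows, balanced_labels = random_oversample(rows, y_true)
balanced_counts = Counter(balanced_labels)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
print("class counts after oversampling:", dict(balanced_counts))
```

In a real project, resampling should be applied to the training split only, so that the test set still reflects the original class balance. Being able to explain why accuracy looks good here while recall is zero is the kind of discussion the report should contain.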