Format

  • Functional and self-contained notebook
  • GitHub repositories are welcome (they can double as your portfolio on the job market)
  • Project report (roughly 30 pages, max. 45)
  • Some relation to your field of study (open to discussion and not strictly required)
  • The report is semi-technical or non-technical documentation. Write it as if you were informing an external examiner (censor) with a corporate background

Content

  • Problem formulation with some practical and theoretical motivation (no extensive literature review)
  • Methodology (not a critical-realist vs. positivist debate, but some reflection on what can potentially be concluded)
  • Data sourcing and pre-processing strategy
  • Overall architecture of the model(s)
  • Modelling (incl. finetuning)
  • Results
  • Discussion / Conclusion

Scope

  • Use methods from at least two different course modules, combined in a creative way
  • Simply downloading a dataset from Kaggle/GitHub and running an ML model on it is probably not enough for a good grade
  • Creative combinations of methodologies are encouraged, for example:
    • combining financial data with social media data to study equity price development
    • extracting information from text data to build networks, and using network indicators to supplement company data
  • Evaluation will focus on correct application and communication of DS methods
  • The expected level of “technicality” matches the course: the emphasis is on application and intuition, not on ML engineering or mathematics
  • However, you must demonstrate enough statistical insight to discuss your assignment, e.g. interpret and discuss performance indicators and outline strategies for improvement such as under- or oversampling
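As a minimal sketch of the text-to-network idea from the scope bullets above, the snippet below counts how often companies are mentioned in the same text, builds a co-occurrence network from those pairs, and computes each company's degree centrality as a candidate extra feature for company-level data. The company names, text snippets and helper functions are hypothetical and purely illustrative; in a real project, proper entity extraction (rather than substring matching) and a network library such as networkx would likely replace the hand-rolled parts.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical inputs: company names to look for and a few text snippets.
COMPANIES = ["Novo", "Maersk", "Orsted", "Vestas"]
TEXTS = [
    "Maersk and Vestas sign logistics deal",
    "Orsted orders turbines from Vestas",
    "Novo shares rise; Maersk shipping costs fall",
]

def cooccurrence_edges(texts, companies):
    """Count how often each pair of companies is mentioned in the same text."""
    edges = defaultdict(int)
    for text in texts:
        # Naive substring matching stands in for proper entity extraction.
        mentioned = sorted(c for c in companies if c in text)
        for a, b in combinations(mentioned, 2):
            edges[(a, b)] += 1
    return dict(edges)

def degree_centrality(edges, companies):
    """Share of the other companies each company is directly connected to (0..1)."""
    neighbours = {c: set() for c in companies}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(companies)
    return {c: len(neighbours[c]) / (n - 1) for c in companies}

edges = cooccurrence_edges(TEXTS, COMPANIES)
centrality = degree_centrality(edges, COMPANIES)

# These scores could be merged into a company-level table as extra features.
for company, score in sorted(centrality.items()):
    print(f"{company}: {score:.2f}")
```

Degree centrality is only one possible indicator; edge weights (how often a pair co-occurs) or other centrality measures could be used in the same way.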
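To illustrate the kind of statistical discussion the last scope bullet asks for, here is a minimal sketch on a hypothetical imbalanced dataset (8 negatives, 2 positives). A model that always predicts the majority class reaches 80% accuracy but zero recall, which is why precision, recall and F1 are worth reporting alongside accuracy. The snippet also shows plain random oversampling of the minority class. The data and function names are invented for illustration; libraries such as imbalanced-learn offer ready-made samplers.

```python
import random
from collections import Counter

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

# Hypothetical imbalanced example: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10  # a "model" that always predicts the majority class
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, precision_recall_f1(y_true, y_pred))  # high accuracy, zero recall

X = [[i] for i in range(10)]  # one dummy feature per row
X_bal, y_bal = random_oversample(X, y_true)
print(Counter(y_bal))
```

Oversampling should only be applied to the training split, after separating the test data; otherwise duplicated rows can leak into the evaluation and inflate the reported performance.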