Group Assignment 3

Portfolio Exercise 3: GPT Models

Note: M3 - Group Assignment 3 Deadline: Wednesday 28th of February at 12:00 PM

LangChain Cheat Sheet

Introduction

This assignment focuses on retrieval-augmented generation (RAG), in particular on extracting and synthesizing information from one or more documents. You will use LangChain to implement these concepts and build a system that retrieves relevant information from a database and uses it to generate responses.

Objective

Task Description

Your task is to create a system that uses RAG to extract information from a single document or a set of documents, each of which can be a scientific paper or a report. This involves integrating a database that stores vector representations of the document content and designing customized prompts so that GPT models can generate responses from the retrieved content. Here are some project ideas:

  1. Build a QA system that retrieves information from a given document or set of documents to answer complex queries.
  2. Develop a tool for summarizing research papers, where the system extracts key points from a database of paper vectors.
  3. Create a recommendation engine that suggests content based on user queries and retrieved document data.
  4. Explore other innovative applications of RAG, such as automated content generation, data analysis, or any other creative use case you can envision.
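
All of these ideas share the same retrieval step: embed the document chunks, store the vectors, and return the chunks closest to the query. The sketch below illustrates that step using only the Python standard library. It is not LangChain code: the word-count "embedding", the ToyVectorStore class, and the sample sentences are all invented for illustration, and a real submission would presumably swap in an embedding model and a vector database.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumed stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        # Store the chunk text together with its vector.
        self._items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Rank stored chunks by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


# Made-up sample chunks, used only to exercise the store.
store = ToyVectorStore()
for chunk in [
    "The study reports a drop in emissions after the policy change.",
    "Participants were recruited from three universities.",
    "Emissions were measured monthly using roadside sensors.",
]:
    store.add(chunk)

top_chunks = store.search("How were emissions measured?", k=2)
print(top_chunks[0])
```

Word overlap is a crude similarity signal, but the add/search interface mirrors what a vector database provides, so the toy store can be replaced later without changing the surrounding logic.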

Key Components

  • Database Integration: Set up a database to store and retrieve vectors representing document information.
  • Customized Prompts: Design and implement prompts that effectively utilize GPT models for generation based on retrieved data.
  • RAG Implementation: Use LangChain to integrate retrieval-augmented generation in your system.
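
To make the "customized prompts" component concrete, here is a minimal sketch of how retrieved chunks might be combined with a user question before the text is sent to a GPT model. The template wording, the build_prompt helper, and the sample chunk are assumptions made for illustration. LangChain offers its own prompt template classes, which this plain-Python version only imitates.

```python
# Assumed template wording; adapt the instructions to your own use case.
PROMPT_TEMPLATE = """You are an assistant answering questions about a document.
Use only the context below. If the answer is not in the context, say you do not know.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, chunks: list[str]) -> str:
    """Number the retrieved chunks and insert them into the template."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Made-up retrieved chunk, used only to show the resulting prompt.
prompt = build_prompt(
    "How were emissions measured?",
    ["Emissions were measured monthly using roadside sensors."],
)
print(prompt)
```

Numbering the chunks lets the model refer back to its sources, and the explicit "say you do not know" instruction is one common way to discourage answers that go beyond the retrieved context.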

Data

  • Utilize open-source datasets or create your own corpus of documents for retrieval.
  • Ensure the chosen datasets are suitable for demonstrating the capabilities of your RAG system.
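
Whichever corpus you choose, long documents are usually split into smaller chunks before they are embedded and stored. The sketch below shows one simple way to do this, assuming word-based chunks with a small overlap. The split_into_chunks helper and its default sizes are hypothetical, and LangChain ships its own text splitters that you would more likely use in practice.

```python
def split_into_chunks(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Synthetic 120-word text, used only to show the chunk boundaries.
sample = " ".join(f"w{i}" for i in range(120))
chunks = split_into_chunks(sample, chunk_size=50, overlap=10)
print(len(chunks))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. Suitable chunk sizes depend on the documents and the embedding model, so treat the defaults here as placeholders.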

Delivery

  • Create a dedicated GitHub repository for this assignment.
  • Store all relevant materials, including the Colab notebook, in the repository.
  • Provide a README.md file with a concise description of the assignment and its components.
  • You may work individually or in groups of up to three members.
  • Submit your work by emailing a link to the repository to Hamid (hamidb@business.aau.dk).