📄️ Individual Tool Evaluation
📄️ Retrieval Accuracy Evaluation
Overview
📄️ Comparison Between Different Methodologies
Fact Generation Methodologies
📄️ Evaluation of Retrieval Accuracy Using Different Prompts and Models
This evaluation assesses retrieval performance on a discrete annotated synthetic dataset (0-1 scoring), using Queryloop retrieval prompts and G-Eval scoring prompts across multiple GPT-4 model versions.
📄️ Evaluation on Annotated Test Dataset
Generation with Continuous Scoring for Retrieval Evaluation
📄️ Faithfulness Evaluation Methodology
Overview
📄️ Evaluation of MSRpar Dataset and Improvements to Queryloop
This document presents a detailed overview of the evaluation methods and prompts used to assess chatbot response faithfulness. We have implemented two evaluation methods, Average Absolute Difference and Score Bracket Accuracy, and introduced an improved Queryloop prompt that provides structured evaluation feedback.
📄️ Analysis of RAGAS Prompt, Fact Generation, and Evaluation Methodology
This document summarizes the evaluation methodology, fact generation prompts, and result analysis for RAGAS. The goal is to establish a reliable, structured system for assessing semantic and factual alignment between generated and ground-truth answers.
📄️ Source Return Evaluation Process
This document outlines the methodology, dataset preparation, and evaluation procedures for a Source Return Evaluation task. The task focuses on combining question-answer pairs, assessing the factual correctness of sourced information, and evaluating whether generated answers adhere to ideal response lengths.