Skip to main content

Introduction

Lighthouz is a multiagent AI platform where you can create bespoke AI agents to build AI applications. AI agents are meticulously designed to evaluate the accuracy, and to create fine tuning and synthetic testing datasets. This enhances the accuracy of AI applications.


Key Features

Lighthouz offers the following features:

1. EvalAgents: Launch bespoke AI agents that conduct semantic evaluations

  • Customize AI agents: You can customize and launch multi AI agents that are specialized to semantically evaluate AI applications.
  • Launch agents as API: Host agents on API endpoints with one-click to easily integrate with your code. Test & integrate quickly.
  • Pre-defined templates: Use one of the many pre-defined templates to launch your AI agents quickly.
  • Provide feedback to improve agents: Provide feedback to the agents when they make mistakes, so they dont repeat the mistakes again. Human-in-the-loop feedback improves agents over time.

2. Evaluation Studio: Evaluate LLM Applications with AI agents

  • Comprehensive agentic evaluations: Use AI agents to comprehensively benchmark your LLM application.
  • Detailed metrics: Couple semantic evaluations of AI agents with pre-defined syntactic evaluation metrics.
  • Insightful Feedback: Gain precise insights to refine your application.
  • Comparative Analysis: Seamlessly compare app versions.
  • Rapid iterations: Quickly test the impact on performance of prompts, LLMs, hyperparameters, etc. on your app's performance.

3. SynthBench: Create custom benchmarks of high-quality synthetic tests

  • Augment existing benchmarks with synthetic tests: Multiply your existing benchmarks with custom high-quality synthetic tests, based on your data, to do comprehensive evaluations for accuracy and hallucinations.
  • Create new synthetic benchmarks: Create custom application-specific synthetic test cases to assess critical aspects including hallucinations, toxicity, out-of-context responses, consistency to variations, PII data leaks, and prompt injections.
  • Full control and customizability: Edit the synthetic tests as per your requirements.
  • Provide feedback to improve synthetic test creation: Provide feedback to the generation agents, so they learn how to create tests better for your requirements.

Supported Tasks and Applications

  • RAG application: Specialized support for Retrieval-Augmented Generation models.
  • Summarization app: Evaluate summarization capability of your LLM app.
  • Classification apps: Evaluate classification capabilities of your LLM app.
  • Knowledge Extraction app: Evaluate knowledge extraction capabilities of your LLM app.
  • Complex LLM Apps: Broad support for end-to-end evaluation of LLM apps.