lollms-webui/endpoints/docs/lollms_anything_to_markdown
2024-08-30 00:57:14 +02:00
..
DOC.md enhanced lollms client 2024-08-30 00:57:14 +02:00
README.md added new libraries 2024-08-19 23:12:58 +02:00

LoLLMs Anything to Markdown Library

Overview

The LoLLMs Anything to Markdown Library is a versatile JavaScript tool designed to convert various file types into Markdown format. This library simplifies the process of extracting text content from different file formats and presenting it in a universally readable Markdown structure.

Features

  • Supports multiple file formats including plain text, DOCX, PDF, and PPTX
  • Asynchronous file processing
  • Object-oriented design for easy extensibility
  • Basic Markdown conversion with room for customization
  • Error handling for unsupported file types and processing errors

Installation

To use the LoLLMs Anything to Markdown Library, include the following script in your HTML file:

<script src="path/to/lollms-anything-to-markdown"></script>

Make sure to also include the necessary dependencies for processing specific file types:

  • For DOCX: Mammoth.js
  • For PDF: PDF.js
  • For PPTX: A custom PptxTextExtractor library (not provided in this documentation)

Usage

Basic Usage

const lollmsFileLoader = new LollmsFileLoader();

async function handleFileUpload(event) {
    const file = event.target.files[0];
    if (!file) return;

    try {
        const markdown = await lollmsFileLoader.loadFile(file);
        console.log(markdown);
        // Use the markdown content as needed
    } catch (error) {
        console.error('Error processing file:', error);
        alert('Error processing file: ' + error.message);
    }
}

Supported File Types

The library supports the following file extensions:

  • Plain text: txt, md, markdown, rtf, log, csv, json, xml, html, htm, css, js, py, java, c, cpp
  • Microsoft Word: docx
  • PDF: pdf
  • Microsoft PowerPoint: pptx

API Reference

LollmsFileLoader Class

The main class of the library, responsible for loading and converting files.

Methods

constructor()

Initializes a new LollmsFileLoader instance.

async loadFile(file)

Loads and converts a file to Markdown.

  • Parameters:
    • file: File object to be processed
  • Returns: A Promise that resolves with the Markdown content of the file
  • Throws: Error if the file type is unsupported or if processing fails
readTextFile(file)

Reads the content of a text-based file.

  • Parameters:
    • file: File object to be read
  • Returns: A Promise that resolves with the text content of the file
readDocxFile(file)

Extracts text content from a DOCX file.

  • Parameters:
    • file: DOCX File object to be processed
  • Returns: A Promise that resolves with the extracted text content
readPdfFile(file)

Extracts text content from a PDF file.

  • Parameters:
    • file: PDF File object to be processed
  • Returns: A Promise that resolves with the extracted text content
readPptxFile(file)

Extracts text content from a PPTX file.

  • Parameters:
    • file: PPTX File object to be processed
  • Returns: A Promise that resolves with the extracted text content
convertToMarkdown(content, fileExtension)

Converts the extracted content to Markdown format.

  • Parameters:
    • content: String containing the extracted text content
    • fileExtension: String representing the original file's extension
  • Returns: A string containing the Markdown-formatted content

Extending the Library

The LoLLMs Anything to Markdown Library is designed to be easily extensible. Here are some ways you can extend its functionality:

  1. Add support for new file types by creating new read methods and adding the file extension to the supportedExtensions array.
  2. Enhance the Markdown conversion logic in the convertToMarkdown method to handle more complex document structures.
  3. Implement additional post-processing steps for specific file types.
  4. Add a caching mechanism to store processed files for quicker access.

Error Handling

The library uses a Promise-based approach for error handling. Errors are thrown in the following scenarios:

  • Unsupported file type
  • Failure to read or process a file
  • Missing dependencies (e.g., PptxTextExtractor for PPTX files)

It's recommended to wrap the loadFile method call in a try-catch block to handle these errors gracefully in your application.

Limitations

  • The current implementation provides basic Markdown conversion and may not capture all formatting details from complex documents.
  • Processing large files, especially PDFs with many pages, may be time-consuming.
  • The library relies on external dependencies for processing DOCX, PDF, and PPTX files, which need to be included separately.

Contributing

Contributions to the LoLLMs Anything to Markdown Library are welcome. Please ensure that your code adheres to the existing style and includes appropriate test coverage for new features or bug fixes.

License

[Specify the license under which the library is released, e.g., MIT, Apache 2.0, etc.]


This documentation provides a comprehensive overview of the LoLLMs Anything to Markdown Library, including its features, usage instructions, API reference, and guidelines for extension. You can further customize this documentation to fit the specific needs and policies of your project.