.. | ||
DOC.md | ||
README.md |
LoLLMs Anything to Markdown Library
Overview
The LoLLMs Anything to Markdown Library is a versatile JavaScript tool designed to convert various file types into Markdown format. This library simplifies the process of extracting text content from different file formats and presenting it in a universally readable Markdown structure.
Features
- Supports multiple file formats including plain text, DOCX, PDF, and PPTX
- Asynchronous file processing
- Object-oriented design for easy extensibility
- Basic Markdown conversion with room for customization
- Error handling for unsupported file types and processing errors
Installation
To use the LoLLMs Anything to Markdown Library, include the following script in your HTML file:
<script src="path/to/lollms-anything-to-markdown"></script>
Make sure to also include the necessary dependencies for processing specific file types:
- For DOCX: Mammoth.js
- For PDF: PDF.js
- For PPTX: A custom PptxTextExtractor library (not provided in this documentation)
Usage
Basic Usage
const lollmsFileLoader = new LollmsFileLoader();
async function handleFileUpload(event) {
const file = event.target.files[0];
if (!file) return;
try {
const markdown = await lollmsFileLoader.loadFile(file);
console.log(markdown);
// Use the markdown content as needed
} catch (error) {
console.error('Error processing file:', error);
alert('Error processing file: ' + error.message);
}
}
Supported File Types
The library supports the following file extensions:
- Plain text: txt, md, markdown, rtf, log, csv, json, xml, html, htm, css, js, py, java, c, cpp
- Microsoft Word: docx
- PDF: pdf
- Microsoft PowerPoint: pptx
API Reference
LollmsFileLoader Class
The main class of the library, responsible for loading and converting files.
Methods
constructor()
Initializes a new LollmsFileLoader instance.
async loadFile(file)
Loads and converts a file to Markdown.
- Parameters:
file
: File object to be processed
- Returns: A Promise that resolves with the Markdown content of the file
- Throws: Error if the file type is unsupported or if processing fails
readTextFile(file)
Reads the content of a text-based file.
- Parameters:
file
: File object to be read
- Returns: A Promise that resolves with the text content of the file
readDocxFile(file)
Extracts text content from a DOCX file.
- Parameters:
file
: DOCX File object to be processed
- Returns: A Promise that resolves with the extracted text content
readPdfFile(file)
Extracts text content from a PDF file.
- Parameters:
file
: PDF File object to be processed
- Returns: A Promise that resolves with the extracted text content
readPptxFile(file)
Extracts text content from a PPTX file.
- Parameters:
file
: PPTX File object to be processed
- Returns: A Promise that resolves with the extracted text content
convertToMarkdown(content, fileExtension)
Converts the extracted content to Markdown format.
- Parameters:
content
: String containing the extracted text contentfileExtension
: String representing the original file's extension
- Returns: A string containing the Markdown-formatted content
Extending the Library
The LoLLMs Anything to Markdown Library is designed to be easily extensible. Here are some ways you can extend its functionality:
- Add support for new file types by creating new read methods and adding the file extension to the
supportedExtensions
array. - Enhance the Markdown conversion logic in the
convertToMarkdown
method to handle more complex document structures. - Implement additional post-processing steps for specific file types.
- Add a caching mechanism to store processed files for quicker access.
Error Handling
The library uses a Promise-based approach for error handling. Errors are thrown in the following scenarios:
- Unsupported file type
- Failure to read or process a file
- Missing dependencies (e.g., PptxTextExtractor for PPTX files)
It's recommended to wrap the loadFile
method call in a try-catch block to handle these errors gracefully in your application.
Limitations
- The current implementation provides basic Markdown conversion and may not capture all formatting details from complex documents.
- Processing large files, especially PDFs with many pages, may be time-consuming.
- The library relies on external dependencies for processing DOCX, PDF, and PPTX files, which need to be included separately.
Contributing
Contributions to the LoLLMs Anything to Markdown Library are welcome. Please ensure that your code adheres to the existing style and includes appropriate test coverage for new features or bug fixes.
License
[Specify the license under which the library is released, e.g., MIT, Apache 2.0, etc.]
This documentation provides a comprehensive overview of the LoLLMs Anything to Markdown Library, including its features, usage instructions, API reference, and guidelines for extension. You can further customize this documentation to fit the specific needs and policies of your project.