mirror of
https://github.com/ParisNeo/lollms-webui.git
synced 2025-01-28 15:14:49 +00:00
158 lines
5.2 KiB
Markdown
158 lines
5.2 KiB
Markdown
|
# LoLLMs Anything to Markdown Library
|
||
|
|
||
|
## Overview
|
||
|
|
||
|
The LoLLMs Anything to Markdown Library is a versatile JavaScript tool designed to convert various file types into Markdown format. This library simplifies the process of extracting text content from different file formats and presenting it in a universally readable Markdown structure.
|
||
|
|
||
|
## Features
|
||
|
|
||
|
- Supports multiple file formats including plain text, DOCX, PDF, and PPTX
|
||
|
- Asynchronous file processing
|
||
|
- Object-oriented design for easy extensibility
|
||
|
- Basic Markdown conversion with room for customization
|
||
|
- Error handling for unsupported file types and processing errors
|
||
|
|
||
|
## Installation
|
||
|
|
||
|
To use the LoLLMs Anything to Markdown Library, include the following script in your HTML file:
|
||
|
|
||
|
```html
|
||
|
<script src="path/to/lollms-anything-to-markdown"></script>
|
||
|
```
|
||
|
|
||
|
Make sure to also include the necessary dependencies for processing specific file types:
|
||
|
|
||
|
- For DOCX: [Mammoth.js](https://github.com/mwilliamson/mammoth.js)
|
||
|
- For PDF: [PDF.js](https://mozilla.github.io/pdf.js/)
|
||
|
- For PPTX: A custom PptxTextExtractor library (not provided in this documentation)
|
||
|
|
||
|
## Usage
|
||
|
|
||
|
### Basic Usage
|
||
|
|
||
|
```javascript
|
||
|
const lollmsFileLoader = new LollmsFileLoader();
|
||
|
|
||
|
async function handleFileUpload(event) {
|
||
|
const file = event.target.files[0];
|
||
|
if (!file) return;
|
||
|
|
||
|
try {
|
||
|
const markdown = await lollmsFileLoader.loadFile(file);
|
||
|
console.log(markdown);
|
||
|
// Use the markdown content as needed
|
||
|
} catch (error) {
|
||
|
console.error('Error processing file:', error);
|
||
|
alert('Error processing file: ' + error.message);
|
||
|
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
### Supported File Types
|
||
|
|
||
|
The library supports the following file extensions:
|
||
|
|
||
|
- Plain text: txt, md, markdown, rtf, log, csv, json, xml, html, htm, css, js, py, java, c, cpp
|
||
|
- Microsoft Word: docx
|
||
|
- PDF: pdf
|
||
|
- Microsoft PowerPoint: pptx
|
||
|
|
||
|
## API Reference
|
||
|
|
||
|
### LollmsFileLoader Class
|
||
|
|
||
|
The main class of the library, responsible for loading and converting files.
|
||
|
|
||
|
#### Methods
|
||
|
|
||
|
##### `constructor()`
|
||
|
|
||
|
Initializes a new LollmsFileLoader instance.
|
||
|
|
||
|
##### `async loadFile(file)`
|
||
|
|
||
|
Loads and converts a file to Markdown.
|
||
|
|
||
|
- Parameters:
|
||
|
- `file`: File object to be processed
|
||
|
- Returns: A Promise that resolves with the Markdown content of the file
|
||
|
- Throws: Error if the file type is unsupported or if processing fails
|
||
|
|
||
|
##### `readTextFile(file)`
|
||
|
|
||
|
Reads the content of a text-based file.
|
||
|
|
||
|
- Parameters:
|
||
|
- `file`: File object to be read
|
||
|
- Returns: A Promise that resolves with the text content of the file
|
||
|
|
||
|
##### `readDocxFile(file)`
|
||
|
|
||
|
Extracts text content from a DOCX file.
|
||
|
|
||
|
- Parameters:
|
||
|
- `file`: DOCX File object to be processed
|
||
|
- Returns: A Promise that resolves with the extracted text content
|
||
|
|
||
|
##### `readPdfFile(file)`
|
||
|
|
||
|
Extracts text content from a PDF file.
|
||
|
|
||
|
- Parameters:
|
||
|
- `file`: PDF File object to be processed
|
||
|
- Returns: A Promise that resolves with the extracted text content
|
||
|
|
||
|
##### `readPptxFile(file)`
|
||
|
|
||
|
Extracts text content from a PPTX file.
|
||
|
|
||
|
- Parameters:
|
||
|
- `file`: PPTX File object to be processed
|
||
|
- Returns: A Promise that resolves with the extracted text content
|
||
|
|
||
|
##### `convertToMarkdown(content, fileExtension)`
|
||
|
|
||
|
Converts the extracted content to Markdown format.
|
||
|
|
||
|
- Parameters:
|
||
|
- `content`: String containing the extracted text content
|
||
|
- `fileExtension`: String representing the original file's extension
|
||
|
- Returns: A string containing the Markdown-formatted content
|
||
|
|
||
|
## Extending the Library
|
||
|
|
||
|
The LoLLMs Anything to Markdown Library is designed to be easily extensible. Here are some ways you can extend its functionality:
|
||
|
|
||
|
1. Add support for new file types by creating new read methods and adding the file extension to the `supportedExtensions` array.
|
||
|
2. Enhance the Markdown conversion logic in the `convertToMarkdown` method to handle more complex document structures.
|
||
|
3. Implement additional post-processing steps for specific file types.
|
||
|
4. Add a caching mechanism to store processed files for quicker access.
|
||
|
|
||
|
## Error Handling
|
||
|
|
||
|
The library uses a Promise-based approach for error handling. Errors are thrown in the following scenarios:
|
||
|
|
||
|
- Unsupported file type
|
||
|
- Failure to read or process a file
|
||
|
- Missing dependencies (e.g., PptxTextExtractor for PPTX files)
|
||
|
|
||
|
It's recommended to wrap the `loadFile` method call in a try-catch block to handle these errors gracefully in your application.
|
||
|
|
||
|
## Limitations
|
||
|
|
||
|
- The current implementation provides basic Markdown conversion and may not capture all formatting details from complex documents.
|
||
|
- Processing large files, especially PDFs with many pages, may be time-consuming.
|
||
|
- The library relies on external dependencies for processing DOCX, PDF, and PPTX files, which need to be included separately.
|
||
|
|
||
|
## Contributing
|
||
|
|
||
|
Contributions to the LoLLMs Anything to Markdown Library are welcome. Please ensure that your code adheres to the existing style and includes appropriate test coverage for new features or bug fixes.
|
||
|
|
||
|
## License
|
||
|
|
||
|
[Specify the license under which the library is released, e.g., MIT, Apache 2.0, etc.]
|
||
|
|
||
|
---
|
||
|
|
||
|
This documentation provides a comprehensive overview of the LoLLMs Anything to Markdown Library, including its features, usage instructions, API reference, and guidelines for extension. You can further customize this documentation to fit the specific needs and policies of your project.
|