Step 2: Collections and documents
To give our assistant knowledge, we first need to load the documentation it will learn from. Documents are organized into collections, so this step covers creating a collection and loading documents into it.
For our example, we'll load documents containing museum information. The first step is to create a collection for these documents.
Create collection
Let's navigate to the Collections section from the sidebar menu and, once there, click the button to create a new collection.
A window will open for us to fill in the information for our collection. We'll enter the following information:
Field | Description | Value |
---|---|---|
Title | Name of our collection | Museums |
Description | Brief description of the collection. | Documents containing information about museums |
Tags | List of tags related to the collection. | |
Access mode | Select the access mode for the collection's documents: public access or members only. | Public access |
Members | List of members who will have access to the documents and their associated privilege. | |
Once all the necessary information is provided, click on the Create button.
Now we'll have our empty collection ready to upload our documents about museums.
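As a rough illustration, the form fields above can be pictured as a small record. This is only a sketch for reasoning about the values: the `Collection` class, the `AccessMode` enum, and the validation rules (a required title, at least one member for members-only access) are assumptions made for this example, not part of Ainhoa's interface or API.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessMode(Enum):
    # The two access modes named in the form (labels are illustrative).
    PUBLIC = "Public access"
    MEMBERS_ONLY = "Members only"


@dataclass
class Collection:
    """Hypothetical record mirroring the fields of the collection form."""
    title: str
    description: str = ""
    tags: list[str] = field(default_factory=list)
    access_mode: AccessMode = AccessMode.PUBLIC
    # Maps a member name to a privilege label (assumed shape).
    members: dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        # Assumed rules, for illustration only.
        if not self.title.strip():
            raise ValueError("A collection needs a title")
        if self.access_mode is AccessMode.MEMBERS_ONLY and not self.members:
            raise ValueError("Members-only access needs at least one member")


# The values used in this tutorial: tags and members are left empty.
museums = Collection(
    title="Museums",
    description="Documents containing information about museums",
)
museums.validate()
```

With public access selected, the members list can stay empty, which matches the values entered in the form above.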
Uploading Documents
In our new museums collection, click on the button to add our documents containing all the information we want our future assistant to learn.
In this example, we've used documents about Spanish museums that you can find at the following link: museum documents.
A window opens where we can provide all the information about the document we are uploading. You can upload an external file (PDF, Word, presentation, spreadsheet) or write the text directly in the Markdown editor.
Let's now explore all the information we can provide about our document and the values we'll set for our example:
Field | Description | Value |
---|---|---|
Collection | Associated collection for the document. | Museums |
Title | Document name. | Guggenheim |
Description | Brief document description. | Information about the Guggenheim museum |
Document Source | Choose the document's source. Markdown editor or existing file. | Existing file |
Audiences | Target audiences for the document. | Global |
Language | Document language. | Spanish |
Do Not Advertise | Indicates if the document should be advertised or not. | Disabled |
Tags | List of document-related tags. | |
We'll follow this procedure for all the documents we want to upload into our museum collection. Once all the documents are uploaded in our collection, we'll be able to access them in both the collections section and the documents section of the sidebar menu.
[Screenshots: Museum collection | Museum documents]
Split method
Once the document is uploaded, its detail page shows all its related information. You can also edit various fields there, such as the title, description, and audiences.
An important field to consider for your documents is the split method. This field determines how Ainhoa will extract information from the document based on its structure.
We have the following options available:
Option | Description |
---|---|
Auto | Ainhoa selects the most appropriate method based on the document. |
None | No splitting is done in the document text. Useful when the document is simple and doesn't require separation into defined sections or structures. |
Paragraph | This method is used when the document is organized into clear sections with distinctive titles. Ainhoa will extract information in the form of paragraphs, aiding comprehension and allowing for a more coherent reading of information. |
Block | When a document lacks a clear structure or is a scanned document, this method is useful. Ainhoa will extract the text in blocks. |
Selecting the appropriate splitting method is crucial for Ainhoa to understand and process the document's information optimally, adapting to the specific layout and format to provide accurate and coherent results.
In our example, we'll choose the Auto option for all documents.
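To make the difference between the split options more concrete, here is a minimal sketch of what each one could look like on plain text. It is not Ainhoa's implementation: the function names, the blank-line rule for paragraphs, the fixed-size word windows for blocks, and the heuristic used for Auto are all assumptions chosen for illustration.

```python
import re


def split_none(text: str) -> list[str]:
    """'None': keep the whole text as a single passage."""
    stripped = text.strip()
    return [stripped] if stripped else []


def split_paragraph(text: str) -> list[str]:
    """'Paragraph': split on blank lines (assumed paragraph boundary)."""
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]


def split_block(text: str, max_words: int = 50) -> list[str]:
    """'Block': cut unstructured text into fixed-size word windows (assumed size)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def split_auto(text: str, max_words: int = 50) -> list[str]:
    """'Auto': hypothetical heuristic that picks one of the methods above."""
    paragraphs = split_paragraph(text)
    if len(paragraphs) > 1:
        return paragraphs  # visible structure: use paragraphs
    if len(text.split()) > max_words:
        return split_block(text, max_words)  # long and unstructured: use blocks
    return split_none(text)  # short and simple: keep as one passage


sample = (
    "Guggenheim Bilbao\n\n"
    "The museum opened in 1997.\n\n"
    "It was designed by Frank Gehry."
)

for passage in split_auto(sample):
    print(passage)
```

In this sketch the sample has blank lines between its parts, so the Auto heuristic falls through to the paragraph method; a scanned document with no such breaks would end up in blocks instead.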
Train collection
After uploading our documents, we need to train Ainhoa on them. This stage is crucial: during training, Ainhoa processes the content of each document, identifying patterns, concepts, and relationships within the information. This is what later enables her to respond accurately and in context to user queries based on those documents.
We can carry out the training process document by document or, as in our case, train all the documents in the collection.
On the museums collection page, we click the Train button, and shortly afterward, all the collection's documents will be ready for use.
We can go to the detail of one of the documents and check in the Passages tab how Ainhoa has extracted the information from the document.
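The two ways of training described above (one document at a time, or the whole collection at once) can be sketched as follows. This is a toy model under stated assumptions: the `Document` class, `train_document`, and `train_collection` are hypothetical names, and "training" is reduced here to splitting the text into passages on blank lines, which stands in for whatever processing Ainhoa actually performs.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document record with the passages produced by training."""
    title: str
    text: str
    passages: list[str] = field(default_factory=list)
    trained: bool = False


def train_document(doc: Document) -> None:
    # Stand-in for training: split the text into passages on blank lines.
    doc.passages = [p.strip() for p in doc.text.split("\n\n") if p.strip()]
    doc.trained = True


def train_collection(docs: list[Document]) -> int:
    """Train every document that is not trained yet; return how many were trained."""
    count = 0
    for doc in docs:
        if not doc.trained:
            train_document(doc)
            count += 1
    return count


# Placeholder texts; the real documents would come from the uploaded files.
docs = [
    Document("Guggenheim", "First passage.\n\nSecond passage."),
    Document("Another museum", "A single passage."),
]

trained_now = train_collection(docs)
for doc in docs:
    print(doc.title, len(doc.passages))
```

Training at the collection level is simply the per-document step applied to every pending document, which is why a second call on the same collection has nothing left to do in this sketch.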
Great, now our museum information is ready. Next, let's explore how to perform searches on this information.