Knowledge Bases for Amazon Bedrock
With Knowledge Bases for Amazon Bedrock, you can provide foundation models (FMs) and agents with contextual information from your company’s private data sources, enabling Retrieval Augmented Generation (RAG) to deliver more relevant, accurate, and customized responses.
Knowledge Bases for Amazon Bedrock now supports additional data connectors.
With Knowledge Bases for Amazon Bedrock, FMs and agents can retrieve contextual information for RAG from your company’s private data sources. With RAG, FMs can respond with greater relevance, accuracy, and customization.
Over the last few months, AWS has been steadily adding choices of embedding models, vector stores, and FMs to Knowledge Bases.
Today, AWS is announcing that, in addition to Amazon Simple Storage Service (Amazon S3), you can now connect web domains, Confluence, Salesforce, and SharePoint as data sources for your RAG applications.
New data source connectors for web domains, Confluence, Salesforce, and SharePoint
By including your web domains, you can give your RAG applications access to your public data, such as your company’s social media feeds, improving the relevance, timeliness, and comprehensiveness of responses to user inputs. Using the new connectors, you can also incorporate your existing company data sources in Confluence, Salesforce, and SharePoint into your RAG applications.
Let me show you how this works. In the following examples, I’ll use the web crawler to add a web domain as a data source and then connect Confluence to a knowledge base. Connecting SharePoint and Salesforce as data sources follows a similar pattern.
Add a web domain as a data source
To give it a try, go to the Amazon Bedrock console and create a knowledge base. Enter a name and description for the knowledge base, and for the required AWS Identity and Access Management (IAM) permissions, either create a new service role or use an existing one.
Next, choose the data source to use. I select Web Crawler.
In the next step, I configure the web crawler. I enter a name and description for the web crawler data source and then define the source URLs.
For this demo, I add the URL of my author page on the AWS News Blog, which lists all my posts. You can add up to ten seed URLs, the starting points of the websites you want to crawl.
Optionally, you can configure custom encryption settings and a data deletion policy that defines whether the vector store data is retained or deleted when the data source is deleted.
I keep the default advanced settings.
In the sync scope section, you can set the maximum number of URLs to crawl per minute, the scope of domains to sync, and regular expression patterns to include or exclude specific URLs.
Once the web crawler data source configuration is complete, I finish the knowledge base setup by choosing my preferred vector store and selecting an embeddings model. After creation, you can check the knowledge base details to monitor the data source sync status. Once the sync is complete, you can test the knowledge base and see FM responses with web URLs as citations.
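For example, here’s a minimal sketch of testing a synced knowledge base with the AWS SDK for Python (boto3); the knowledge base ID and model ARN are placeholders you would replace with values from your own account and Region:

    import boto3

    agent_runtime = boto3.client("bedrock-agent-runtime")

    # Ask a question against the knowledge base; the FM grounds its answer
    # in the crawled web pages and returns the sources as citations.
    response = agent_runtime.retrieve_and_generate(
        input={"text": "What topics does this author write about?"},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": "KB12345678",  # placeholder ID
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            },
        },
    )

    print(response["output"]["text"])
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            print(ref["location"])  # includes the web URL for web data sources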
You can also create data sources programmatically by using the AWS Command Line Interface (AWS CLI) or the AWS SDKs.
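As a sketch of what that looks like in boto3, the following call creates a web crawler data source on an existing knowledge base. The IDs, seed URL, and filter patterns are placeholders, and the configuration shape follows my reading of the CreateDataSource API, so verify it against the current API reference:

    import boto3

    bedrock_agent = boto3.client("bedrock-agent")

    response = bedrock_agent.create_data_source(
        knowledgeBaseId="KB12345678",   # placeholder knowledge base ID
        name="web-crawler-data-source",
        description="Crawls my blog author page",
        dataDeletionPolicy="DELETE",    # or "RETAIN" to keep vector store data
        dataSourceConfiguration={
            "type": "WEB",
            "webConfiguration": {
                "sourceConfiguration": {
                    "urlConfiguration": {
                        # Up to ten seed URLs; this one is a placeholder.
                        "seedUrls": [{"url": "https://example.com/blog/"}]
                    }
                },
                "crawlerConfiguration": {
                    "crawlerLimits": {"rateLimit": 300},    # max URLs crawled per minute
                    "scope": "HOST_ONLY",                   # limit the sync scope to the seed host
                    "inclusionFilters": [r".*/blog/.*"],    # regex patterns to include...
                    "exclusionFilters": [r".*\.pdf$"],      # ...or exclude specific URLs
                },
            },
        },
    )
    print(response["dataSource"]["dataSourceId"])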
Add Confluence as a data source
This time, let’s choose Confluence as the data source in the knowledge base configuration.
To configure Confluence as a data source, I enter a name and description for the data source, choose the hosting method, and enter the Confluence URL.
You can connect to Confluence using either basic authentication or OAuth 2.0. For this demo, I choose Basic authentication, which requires a user name (the email address associated with your Confluence user account) and a password (a Confluence API token). I store the credentials in AWS Secrets Manager and select the secret.
Note: Make sure the secret name starts with “AmazonBedrock” and that your IAM service role for Knowledge Bases has permission to access the secret in Secrets Manager.
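As a sketch, you could create such a secret with boto3 as follows. The secret name and values are placeholders, and the JSON key names are my assumption for basic authentication, so check the Knowledge Bases documentation for the exact format the connector expects:

    import json

    import boto3

    secrets = boto3.client("secretsmanager")

    secret = secrets.create_secret(
        Name="AmazonBedrock-confluence-credentials",  # must start with "AmazonBedrock"
        SecretString=json.dumps({
            # Assumed key names for basic authentication:
            "username": "me@example.com",             # email of your Confluence account
            "password": "your-confluence-api-token",  # Confluence API token
        }),
    )
    print(secret["ARN"])  # referenced by the data source configuration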
In the metadata settings, you can control the scope of content you want to crawl using regular expression include and exclude patterns, and you can adjust the content chunking and parsing strategy. Once the Confluence data source configuration is complete, I finish the knowledge base setup by choosing my preferred vector store and selecting an embeddings model.
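Programmatically, the equivalent CreateDataSource call might look like this sketch. All values are placeholders, and I’m assuming the SaaS hosting option with basic authentication:

    import boto3

    bedrock_agent = boto3.client("bedrock-agent")

    response = bedrock_agent.create_data_source(
        knowledgeBaseId="KB12345678",  # placeholder knowledge base ID
        name="confluence-data-source",
        dataSourceConfiguration={
            "type": "CONFLUENCE",
            "confluenceConfiguration": {
                "sourceConfiguration": {
                    "hostUrl": "https://example.atlassian.net",  # your Confluence URL
                    "hostType": "SAAS",
                    "authType": "BASIC",
                    # ARN of the Secrets Manager secret created above (placeholder)
                    "credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:AmazonBedrock-confluence-credentials-AbCdEf",
                },
            },
        },
    )
    print(response["dataSource"]["dataSourceId"])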
After creation, you can check the knowledge base details to monitor the data source sync status. Once the sync is complete, you can test the knowledge base. For this demo, I’ve added some fictitious meeting notes to my Confluence space.
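If you prefer to script the sync, here’s a sketch of starting an ingestion job and polling its status with boto3; the IDs are placeholders:

    import time

    import boto3

    bedrock_agent = boto3.client("bedrock-agent")

    # Start a sync (ingestion job) for the data source and wait for it to finish.
    job = bedrock_agent.start_ingestion_job(
        knowledgeBaseId="KB12345678",  # placeholder IDs
        dataSourceId="DS12345678",
    )["ingestionJob"]

    while job["status"] not in ("COMPLETE", "FAILED"):
        time.sleep(30)
        job = bedrock_agent.get_ingestion_job(
            knowledgeBaseId="KB12345678",
            dataSourceId="DS12345678",
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]

    print(job["status"], job.get("statistics"))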
Important information
Inclusion and exclusion filters – All data sources support inclusion and exclusion filters, giving you precise control over the data retrieved from a given source.
Web crawler – Keep in mind that the web crawler may only be used on websites that you own or are authorized to crawl.
Availability and pricing
The new data source connectors are available today in all AWS Regions where Knowledge Bases for Amazon Bedrock is available. Check the Region list for details and future updates. To learn more about Knowledge Bases for Amazon Bedrock, visit the Amazon Bedrock product page. For detailed pricing information, see the Amazon Bedrock pricing page.