Biologists and chemists often find themselves trapped in a manual slog, staring at thousands of peptide sequences while trying to identify candidates with structural similarities. This process typically demands an exhausting amount of domain expertise and repetitive manual searching, creating a significant bottleneck in the early stages of drug discovery and protein engineering. The cognitive load required to interpret these sequences and map them to functional outcomes is immense, often slowing the pace of research to a crawl.
The Architecture of an AI Research Copilot
To break this bottleneck, a research assistant was built by combining the Strands Agents SDK with Amazon Bedrock AgentCore. AgentCore is a managed runtime for hosting AI agents, which removes the burden of managing server infrastructure from the developer. The Strands Agents SDK lets the system orchestrate a series of specialized tools that complete complex research workflows, which would otherwise require multiple manual steps across different software packages.
The data layer relies on Amazon Aurora PostgreSQL-Compatible Edition, specifically utilizing the pgvector extension. This allows the system to store peptide structural characteristics as high-dimensional embeddings and perform high-speed similarity searches. To validate the system, researchers used a dataset of 1,000 linear peptide samples extracted from the IEDB virus epitope dataset. When a researcher enters a query in natural language, the system interprets the intent, extracts the necessary parameters, generates a corresponding embedding, and retrieves the most similar sequences from the database. The final output is not just a list of sequences but a scientifically contextualized summary, all delivered through a single conversational interface.
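The retrieval step ranks stored embeddings by their cosine distance to the query embedding. Below is a minimal in-memory sketch of that ranking, assuming embeddings are plain Python lists of floats. The function names `cosine_distance` and `top_k_similar` are hypothetical. In the described system, pgvector performs the equivalent computation inside the database with its `<=>` (cosine distance) operator.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, the quantity pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    stored: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank stored peptides by cosine distance to the query, closest first."""
    scored = [(seq, cosine_distance(query, emb)) for seq, emb in stored.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

The system keeps this ranking in the database. That avoids shipping every stored embedding to the application just to sort it.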
The Agents-as-Tools Pattern and Vector Logic
What makes this system different from a standard RAG pipeline is the implementation of an agents-as-tools pattern. A single orchestrator agent manages the entire workflow by calling three specialized tools: a parser, a searcher, and a summarizer. The orchestrator analyzes the user query to determine the optimal sequence of operations. The parser tool first converts the natural language input into structured parameters. To ensure the LLM understands how to use these functions, the developers used the @tool decorator along with precise docstrings and type hints.
```python
from strands import tool

@tool
def parser_tool(query: str) -> str:
    """Extracts structured parameters from a natural language query."""
    # parser_agent: a separate Strands Agent with its own parsing prompt (defined elsewhere).
    # Calling an agent runs it; str() returns its final text response.
    return str(parser_agent(query))
```

The searcher tool then bridges the gap between the LLM and the biological data. It combines a protein embedding model hosted on SageMaker AI with pgvector similarity search. The system converts the requested sequence into a vector, then computes the distance between that vector and the embeddings stored in the database to find the closest matches.
```python
from typing import Optional

@tool
def searcher_tool(peptide_sequence: str, species: Optional[str] = None):
    """Searches for similar peptides using embeddings and pgvector."""
    # Embed the query sequence on the SageMaker AI endpoint (client wrapper defined elsewhere).
    embedding = sagemaker_endpoint.predict(peptide_sequence)
    # Rank stored embeddings by distance to the query, optionally filtered by species.
    return db.similarity_search(embedding, species)
```

Finally, the summarizer tool takes the raw search results and adds scientific context to produce a final analysis report. Wrapping the parser and summarizer as independent agents within the orchestrator's toolkit keeps concerns separated: each agent has its own specialized prompt and expertise, while the orchestrator ensures that data flows logically from one stage to the next.
```python
@tool
def summarizer_tool(search_results: list) -> str:
    """Summarizes search results with scientific context."""
    # summarizer_agent: another Strands Agent with a summarization prompt (defined elsewhere).
    # Results are serialized to text because the agent expects a prompt string.
    return str(summarizer_agent(str(search_results)))
```

The searcher's embeddings come from the ESM-C 300M model developed by EvolutionaryScale. This model transforms amino acid sequences into 960-dimensional vectors that capture structural and functional properties. Because peptides with similar biological functions tend to lie close to one another in this vector space, the system can perform similarity searches without computationally expensive sequence alignment algorithms.
Optimizing for Serverless Inference and Deployment
To manage operational costs, the ESM-C 300M model is deployed on an Amazon SageMaker AI serverless endpoint. The configuration uses 6144 MB of memory, a maximum concurrency of 5, and a PyTorch 2.6.0 CPU inference container. The serverless nature of this setup ensures that costs are only incurred during active requests. To mitigate the inherent latency of serverless environments, the model weights are bundled directly into the deployment artifact rather than being downloaded from HuggingFace at runtime, which significantly reduces cold start delays.
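The serverless settings above map onto the `ServerlessConfig` block of a SageMaker endpoint configuration. This sketch only builds the request arguments for boto3's `create_endpoint_config`, so it runs without AWS credentials. The endpoint config name, model name, and variant name are hypothetical.

```python
from typing import Any, Dict

# Memory sizes SageMaker Serverless Inference accepts, in MB (1 GB steps up to 6 GB).
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_endpoint_config(
    config_name: str,
    model_name: str,
    memory_mb: int = 6144,
    max_concurrency: int = 5,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's sagemaker create_endpoint_config."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"unsupported serverless memory size: {memory_mb}")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # Serverless variants take a ServerlessConfig instead of instance settings.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }
```

Passing the result to `boto3.client("sagemaker").create_endpoint_config(**cfg)` would register the configuration. 6144 MB is the largest memory size serverless endpoints offer, so this setup already sits at that ceiling.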
The packaging process downloads the weights via from_pretrained, saves the state dict, and bundles it into a model.tar.gz file that follows the SageMaker AI code directory structure. The inference handler, predict_fn, encodes the input protein sequence and returns a mean-pooled embedding. While this setup is cost-effective, it introduces a cold start delay of 2 to 3 minutes after periods of inactivity. Once the container is warm, subsequent calls complete within seconds. For workloads that need higher structural accuracy, ESM-C 600M or ESM2 models are viable alternatives; the right choice depends on whether the priority is real-time response or depth of analysis.
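The SageMaker PyTorch serving container looks for inference code under a `code/` directory inside model.tar.gz, with model artifacts at the archive root. Here is a minimal packaging sketch using the standard-library `tarfile` module. It assumes the state dict was already saved to a file such as `model.pth` (for example with `torch.save`). The file and function names are illustrative assumptions.

```python
import tarfile
from pathlib import Path


def build_model_archive(weights_file: Path, code_dir: Path, out_path: Path) -> Path:
    """Bundle saved weights and inference code into a SageMaker-style model.tar.gz.

    Weights go at the archive root; every entry in code_dir goes under code/,
    where the PyTorch serving container expects inference.py (with predict_fn).
    """
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(weights_file, arcname=weights_file.name)
        for path in sorted(code_dir.iterdir()):
            tar.add(path, arcname=f"code/{path.name}")
    return out_path
```

Shipping the weights inside the archive removes the HuggingFace download from the cold-start path.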
Infrastructure automation is handled via AWS CloudFormation, which provisions the VPC, NAT Gateway, and VPC endpoints. By creating private subnets and dedicated endpoints for Amazon Bedrock, Amazon RDS Data API, and AWS Secrets Manager, the agent runtime operates entirely within a secure internal network. The use of the Amazon RDS Data API is particularly critical, as it allows the runtime to communicate with the database via HTTPS, eliminating the need for direct network connections or complex VPN and routing configurations.
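With the Data API, a similarity query is a single HTTPS call that carries SQL and typed parameters. The sketch below builds the arguments for boto3's `rds-data` `execute_statement`. The table and column names (`peptides`, `sequence`, `species`, `embedding`) are hypothetical. The embedding is sent in pgvector's text form and cast to `vector` inside the SQL.

```python
import json
from typing import Any, Dict, List, Optional, Sequence


def similarity_query_request(
    resource_arn: str,
    secret_arn: str,
    database: str,
    embedding: Sequence[float],
    species: Optional[str] = None,
    top_k: int = 10,
) -> Dict[str, Any]:
    """Build kwargs for boto3's rds-data execute_statement running a pgvector search."""
    # pgvector parses the text form '[x1,x2,...]'; the SQL casts it to vector.
    vector_text = json.dumps([float(x) for x in embedding], separators=(",", ":"))
    params: List[Dict[str, Any]] = [
        {"name": "embedding", "value": {"stringValue": vector_text}},
        {"name": "top_k", "value": {"longValue": top_k}},
    ]
    where = ""
    if species is not None:
        # Only add the filter when a species is given, avoiding untyped NULL parameters.
        where = "WHERE species = :species "
        params.append({"name": "species", "value": {"stringValue": species}})
    sql = (
        "SELECT sequence, species, embedding <=> CAST(:embedding AS vector) AS distance "
        f"FROM peptides {where}"
        "ORDER BY embedding <=> CAST(:embedding AS vector) LIMIT :top_k"
    )
    return {
        "resourceArn": resource_arn,
        "secretArn": secret_arn,
        "database": database,
        "sql": sql,
        "parameters": params,
    }
```

The request would be sent with `boto3.client("rds-data").execute_statement(**req)`. The ORDER BY repeats the distance expression, which is the form pgvector's documentation uses for index-backed nearest-neighbor queries.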
The deployment pipeline uses AWS CodeBuild to build and deploy the containerized agent code, while the frontend runs on AWS Fargate. To maintain a lightweight footprint, the Fargate image contains only streamlit, pandas, and boto3, intentionally excluding heavy ML libraries to improve container start times and memory efficiency. The Streamlit UI communicates with the AgentCore runtime via the bedrock-agentcore boto3 client, providing researchers with a three-part interface: a section for parsed query parameters, a sortable table of similar peptides based on cosine distance, and a scientific summary section. This allows researchers to verify similarity scores and export results as CSV files for further downstream analysis.
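On the frontend, each query becomes one `invoke_agent_runtime` call on the `bedrock-agentcore` boto3 client. This sketch only builds the call arguments. The `{"prompt": ...}` payload shape is an assumption and has to match whatever the agent's entrypoint reads. The 33-character minimum for session IDs reflects our reading of the API documentation.

```python
import json
import uuid
from typing import Any, Dict, Optional


def agentcore_invoke_request(
    runtime_arn: str,
    prompt: str,
    session_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build kwargs for the bedrock-agentcore client's invoke_agent_runtime call."""
    # A UUID4 string is 36 characters, satisfying the 33-character minimum.
    session = session_id or str(uuid.uuid4())
    if len(session) < 33:
        raise ValueError("runtimeSessionId must be at least 33 characters")
    return {
        "agentRuntimeArn": runtime_arn,
        "runtimeSessionId": session,
        # Assumed payload shape; it must match the agent entrypoint's expectations.
        "payload": json.dumps({"prompt": prompt}).encode("utf-8"),
    }
```

A Streamlit handler could send this with `boto3.client("bedrock-agentcore").invoke_agent_runtime(**req)` and read the reply from the streaming body in `response["response"]`. The parsed parameters, peptide table, and summary would then be rendered from that reply.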
Balancing Performance and Cost in Production
Scaling this system for production requires careful weighing of the trade-off between memory and performance. For the database, Amazon Aurora Serverless v2 is configured to auto-scale between 0.5 and 4 ACUs (Aurora Capacity Units), which corresponds to roughly 1 GiB to 8 GiB of memory. As the volume of peptide embeddings grows, so does the memory required for vector operations. Setting the ACU ceiling too low can cause query failures or sharp spikes in response time due to memory exhaustion.
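A rough sizing check helps choose the ACU ceiling. Aurora Serverless v2 provides about 2 GiB of memory per ACU. pgvector stores each vector as 4 bytes per dimension plus 8 bytes of overhead. The arithmetic below covers raw vector data only; indexes, other columns, and PostgreSQL's own buffers all add to it. The helper names are hypothetical.

```python
GIB_PER_ACU = 2  # Aurora Serverless v2: each ACU provides roughly 2 GiB of memory


def acu_memory_gib(acu: float) -> float:
    """Approximate memory available at a given Aurora Serverless v2 capacity."""
    return acu * GIB_PER_ACU


def raw_vector_bytes(num_vectors: int, dims: int = 960) -> int:
    """Raw pgvector storage: 4 bytes per float4 dimension plus an 8-byte header per vector."""
    return num_vectors * (4 * dims + 8)
```

At 960 dimensions, 1,000 peptides need about 3.8 MB of raw vector storage. One million peptides need about 3.8 GB (roughly 3.6 GiB), close to half of the 8 GiB available at 4 ACUs before any index is built.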
While the initial validation used 1,000 samples, expanding to large-scale datasets introduces new cost variables. Embedding tens of thousands of sequences requires repeated calls to the SageMaker AI endpoint, so inference costs accumulate. In addition, the time needed to build pgvector indexes grows with the data size, which calls for a deliberate strategy for data migration and index builds.
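For larger collections, an approximate index keeps queries fast at the cost of build time and memory. The sketch below generates pgvector HNSW statements for cosine-distance queries, using pgvector's default `m` and `ef_construction` values. The table, column, and index names and the 2GB build-memory setting are assumptions. Identifiers are interpolated without quoting, so they must come from trusted configuration.

```python
from typing import List


def hnsw_index_statements(
    table: str = "peptides",
    column: str = "embedding",
    m: int = 16,
    ef_construction: int = 64,
    maintenance_work_mem: str = "2GB",
) -> List[str]:
    """Return SQL that builds a pgvector HNSW index for cosine-distance (<=>) queries."""
    index_name = f"{table}_{column}_hnsw_idx"
    return [
        # A larger build budget lets pgvector keep the HNSW graph in memory while building.
        f"SET maintenance_work_mem = '{maintenance_work_mem}'",
        # CONCURRENTLY avoids blocking writes but cannot run inside a transaction block.
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name} "
        f"ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction})",
    ]
```

SET only affects the current session, so both statements should run on the same connection, for example in psql. Consecutive Data API calls may not share a session. The build memory also has to fit within the capacity the ACU ceiling allows.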
The final architectural choice depends on the research cadence. In environments where analysis is intermittent, the serverless endpoint is the most economical choice. However, as the scale of data increases, the focus shifts from simple search accuracy to the cost-efficiency of the embedding generation phase and the speed of index updates. The friction of manually comparing thousands of sequences is replaced by a single natural language query, provided the user is willing to trade a few minutes of cold-start latency for drastic cost reductions.
Conversational AI is shifting the biological research paradigm from manual sequence alignment to intuitive, vector-driven discovery.



