We are looking for a Senior Backend Engineer specializing in Generative AI to design agent workflows, optimize interactions with models (OpenAI, AWS Bedrock), and ensure the reliability of non-deterministic systems in production.
Tech Stack: Python (asyncio), FastAPI, LangChain, LangGraph, Pydantic, Elasticsearch, AWS Bedrock / OpenAI API, LangSmith.
What You'll Do
Agent Architecture: Design and implement complex agent orchestration logic using LangGraph. You will define state management, conditional routing, and error handling within the agent graph.
Tool Engineering: Build and optimize the tool layer (function calling) that allows LLMs to interact with internal financial APIs and databases accurately.
Performance Optimization:
- Reduce end-to-end latency through asynchronous processing and streaming (SSE).
- Implement semantic caching strategies to minimize API costs and response time.
- Optimize token usage without sacrificing answer quality.
Observability & Evaluation: Implement automated evaluation pipelines using LangSmith. You will be responsible for setting up regression testing for prompts and agents to measure quality (correctness, faithfulness) before deployment.
Advanced RAG: Refine retrieval strategies. Implement hybrid search (keyword + vector), re-ranking, and query expansion to feed the most relevant context to the model.
Requirements:
Python Expert: Strong proficiency in modern Python. Deep understanding of asynchronous programming (asyncio) patterns is mandatory, as our entire I/O pipeline (Network, DB, LLM) is non-blocking. Experience with FastAPI and Pydantic (v2).
Agentic Frameworks: Production experience with LangChain. Hands-on experience with, or deep conceptual understanding of, LangGraph (or similar state-machine-based agent frameworks).
Deep LLM Expertise (What we mean by "Deep"):
- Non-determinism Management: Strategies for handling LLM hallucinations and ensuring reliable outputs (e.g., self-correction loops, specific prompting techniques like CoT/ReAct).
- Structured Outputs: Experience forcing LLMs to adhere to strict schemas (Pydantic/JSON mode) for reliable downstream processing.
- Context Optimization: Advanced strategies for managing limited context windows (summarization chains, sliding windows, selective context injection) beyond simple truncation.
- Inference Economics: Understanding the trade-offs between model size, latency, and cost (e.g., when to route to GPT-4 vs. a smaller/faster model).
Nice to Have:
Experience with Elasticsearch (DSL queries, analyzers).
Knowledge of vector databases and embedding models.
Background in FinTech or familiarity with financial data structures.
This position is open to all candidates.