Regular Flyer Buddy (RFB) Project

Introduction

Regular Flyer Buddy (RFB) is a flight search and analysis tool designed to help frequent travelers make optimal air ticket purchase decisions. The project originated from the need to manage, and ideally reduce, the significant cost of frequent long-distance travel between fixed origins and destinations. Recognizing the limitations of existing approaches, such as manually tracking flights on platforms like Google Flights or in airline apps, RFB was conceived to provide more customized, timely, and actionable price insights. This document summarizes the project's journey from idea conception to the deployment of its Minimum Viable Product (MVP).

Problem and Solution

Many individuals travel regularly between specific locations for personal or professional reasons, often bearing the cost themselves. These travelers are typically price-sensitive but may have flexibility regarding exact travel dates. Current methods for tracking flight prices, such as setting up alerts on general platforms, lack the granularity, flexibility, and proactive analysis needed for true optimization. RFB addresses this gap by systematically collecting flight data for user-defined routes and dates, storing this information, and providing analytical tools to identify price trends and historical lows/highs, thereby enabling users to purchase tickets more strategically. The core challenge was accessing reliable flight data, as many official APIs are restricted. The project leveraged an available open-source flight data scraping tool (fast_flights) to overcome this hurdle.

Key Features

The RFB application provides several key functionalities accessible through both a command-line interface (CLI) and an interactive web dashboard:

  • Single Flight Search: Users can perform immediate searches for specific routes, dates, seat classes, and maximum stops.
  • Batch Processing: The system can automatically monitor multiple flight configurations (routes, date ranges, specific days of the week) defined by the user. This is achieved by generating configuration files (JSON format) that specify the search parameters. The batch processor then systematically queries flight information based on these configurations, storing the results for analysis.
  • Data Storage: All flight search results are persistently stored in a relational database (PostgreSQL), including details like query time, flight specifics (airline, departure/arrival times, duration, stops), and price.
  • Price Analysis: The stored historical data is processed to provide insights through various analytical views. These views track price trends over time, compare latest prices against historical highs and lows for specific routes and days of the week, and analyze price variations based on advance purchase timing.
  • Interactive Dashboard: A web-based dashboard (built with Streamlit) allows users to easily interact with the system, perform single searches, configure and initiate batch processing, and visualize the results of the price analysis through various charts and tables.
  • Automated Scheduling: A scheduler component automates the batch processing workflow, generating new configurations for future dates, running the batch searches, and refreshing the analysis views on a regular basis (e.g., daily).
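The batch configurations described above could be expressed in a JSON file along the following lines. The field names here are illustrative assumptions, not the project's actual schema:

```json
{
  "origin": "JFK",
  "destination": "LHR",
  "date_range": { "start": "2024-06-01", "end": "2024-08-31" },
  "days_of_week": ["Fri", "Sun"],
  "seat_class": "economy",
  "max_stops": 1
}
```

The batch processor would expand a file like this into one search per matching date, storing each result for later analysis.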

Technical Architecture and Stack

The project evolved from a local setup to a containerized and cloud-deployed architecture.

Local Development: Initially developed and tested locally, the system comprised a Python backend interacting with a PostgreSQL database. The backend housed distinct service modules for flight searching (interfacing with fast_flights), configuration management, batch processing, and database operations (including defining table schemas and materialized views for analytics). A CLI provided direct access to backend functions, while Streamlit served as the frontend for user interaction and visualization.
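The kind of aggregation the analytical materialized views perform can be illustrated with a small, self-contained sketch. The record layout and function name are hypothetical; in the real system this logic lives in SQL views over the PostgreSQL tables:

```python
from collections import defaultdict
from datetime import date

# Hypothetical flight-price records: (route, departure weekday, query date, price).
records = [
    ("JFK-LHR", "Fri", date(2024, 5, 1), 612.0),
    ("JFK-LHR", "Fri", date(2024, 5, 8), 589.0),
    ("JFK-LHR", "Fri", date(2024, 5, 15), 645.0),
    ("JFK-LHR", "Fri", date(2024, 5, 22), 570.0),
]

def price_summary(rows):
    """Group records by (route, weekday) and compare the latest observed
    price against the historical low and high for that group."""
    groups = defaultdict(list)
    for route, weekday, queried, price in rows:
        groups[(route, weekday)].append((queried, price))
    summary = {}
    for key, series in groups.items():
        series.sort()  # chronological by query date
        prices = [p for _, p in series]
        summary[key] = {
            "latest": prices[-1],
            "low": min(prices),
            "high": max(prices),
            "latest_is_low": prices[-1] == min(prices),
        }
    return summary

print(price_summary(records))
```

A dashboard chart then only needs to render this per-group summary rather than recompute it on every page load.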

Containerization (Local): To ensure consistency and ease of deployment, the application and database were containerized using Docker and managed locally via docker-compose.yml. This setup defined two main services: app (running the Python backend and Streamlit frontend) and db (running a PostgreSQL container). The app service depended on the db service, and database credentials were passed as environment variables. Initialization scripts for the database schema were mounted into the db container.
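A docker-compose.yml along these lines would express the setup described. Service names, ports, and variable names are assumptions for illustration:

```yaml
services:
  app:
    build: .
    ports:
      - "8501:8501"            # Streamlit's default port
    environment:
      DB_HOST: db              # the database service name, resolvable inside the compose network
      DB_USER: rfb
      DB_PASSWORD: ${DB_PASSWORD}
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: rfb
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - ./init-scripts:/docker-entrypoint-initdb.d   # schema init scripts
```

Mounting the init scripts into `/docker-entrypoint-initdb.d` makes the official PostgreSQL image run them on first startup, which is the local equivalent of the one-off initialization later needed on RDS.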

Cloud Deployment (AWS): The application was deployed to Amazon Web Services (AWS) leveraging a serverless and managed services approach for scalability and maintainability. The key components include:

  • Amazon ECR (Elastic Container Registry): Stores the application's Docker image built from the Dockerfile.
  • Amazon RDS (Relational Database Service): Hosts the PostgreSQL database, managing the underlying infrastructure. The database instance is configured within a VPC with no public access for security.
  • AWS Secrets Manager: Securely stores the database password, preventing exposure in code or environment variables.
  • Amazon ECS (Elastic Container Service) with AWS Fargate: Orchestrates and runs the application containers without requiring server management. A cluster (rfb-cluster) hosts the services.
    • Task Definitions: Define the container specifications (image, CPU/memory, environment variables, secrets, logging) for both the main application (rfb-app) and the scheduled tasks (rfb-scheduler). The task execution role is granted permissions to access ECR and Secrets Manager.
    • Service: Manages the running instances of the rfb-app task definition, ensuring the desired number of tasks are running and integrated with the load balancer.
  • AWS EventBridge Scheduler: Triggers the scheduler.py script periodically (e.g., daily) by running it as an ECS task based on a defined cron schedule. It passes necessary parameters like airport codes via command overrides.
  • Application Load Balancer (ALB): Distributes incoming web traffic to the running Streamlit application containers, providing a stable DNS endpoint.
  • Networking (VPC, Security Groups): A VPC provides network isolation. Security Groups act as firewalls, controlling traffic between the ALB, ECS tasks, and the RDS database (e.g., allowing the ALB to talk to ECS on port 8501, ECS to talk to RDS on port 5432).
  • IAM (Identity and Access Management): Defines roles and permissions for secure interaction between AWS services.
  • CloudWatch Logs: Collects logs from the running application and scheduler tasks for monitoring and debugging.
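The secrets and logging wiring described above corresponds to a task-definition fragment roughly like the following. Names, ARNs, and placeholders in angle brackets are illustrative:

```json
{
  "containerDefinitions": [{
    "name": "rfb-app",
    "image": "<account>.dkr.ecr.<region>.amazonaws.com/rfb-app:latest",
    "secrets": [{
      "name": "DB_PASSWORD",
      "valueFrom": "arn:aws:secretsmanager:<region>:<account>:secret:rfb-db-password"
    }],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/rfb-app",
        "awslogs-region": "<region>",
        "awslogs-stream-prefix": "ecs"
      }
    }
  }]
}
```

With a `secrets` entry like this, the password never appears in the task definition or the console; ECS injects it into the container at startup, which is why the task execution role needs permission to read the secret.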

The following table summarizes the core technologies used:

Component                 Technology/Service                Purpose
Frontend                  Streamlit                         Interactive web dashboard
Backend Logic             Python                            Core application logic, services, CLI
Flight Data Source        fast_flights (Python lib)         Interface for scraping flight data
Database                  PostgreSQL                        Storage of flight search results
Containerization          Docker                            Packaging application and dependencies
Cloud Hosting             AWS                               Infrastructure provider
Container Registry        Amazon ECR                        Storing Docker images
Managed Database          Amazon RDS (PostgreSQL)           Hosting the PostgreSQL database
Container Orchestration   Amazon ECS + AWS Fargate          Running application containers serverlessly
Load Balancing            Application Load Balancer (ALB)   Distributing web traffic
Scheduled Tasks           AWS EventBridge Scheduler         Automating batch jobs and refreshes
Secrets Management        AWS Secrets Manager               Securely storing database credentials
Networking                Amazon VPC, Security Groups       Network isolation and traffic control
Logging                   Amazon CloudWatch Logs            Monitoring application logs
CLI Framework             Click (Python lib)                Building the command-line interface
Data Analysis Libs        Pandas, Plotly (Python libs)      Data manipulation and visualization

Deployment Process and Challenges

The deployment involved setting up the AWS infrastructure described above. Key steps included creating an ECR repository, building and pushing the Docker image, provisioning an RDS database, configuring Secrets Manager, setting up VPC networking and security groups, defining ECS task definitions and services, creating an ALB, and configuring EventBridge for scheduled tasks.

Database initialization required running the init-db command from the CLI as a one-off ECS task, because the init scripts used in the local docker-compose setup are not directly applicable to RDS.

Several challenges were encountered during development and deployment:

  • Initial attempts to use AWS CloudFormation for infrastructure provisioning proved complex and time-consuming, so the infrastructure was instead provisioned manually, step by step, through the AWS console.
  • Cross-platform Docker builds (e.g., docker buildx build --platform linux/amd64) were necessary when building on macOS for the Linux deployment environment.
  • Database connection parameters (specifically the host) differ between local direct connections, local containerized connections, and RDS, requiring careful configuration management.
  • Ensuring correct IAM permissions and Security Group rules between ECS, RDS, and EventBridge was crucial and required troubleshooting.
  • Materialized views in the database required manual setup or initialization via the init-db script after the main table was created.
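The connection-parameter issue noted above is typically handled by resolving settings from the environment, so the same code works locally (host localhost), under docker-compose (the db service name), and on AWS (the RDS endpoint). A minimal sketch, with illustrative variable names:

```python
import os

def db_settings():
    """Resolve database connection parameters from environment variables,
    falling back to local-development defaults."""
    return {
        "host": os.environ.get("DB_HOST", "localhost"),
        "port": int(os.environ.get("DB_PORT", "5432")),
        "dbname": os.environ.get("DB_NAME", "rfb"),
        "user": os.environ.get("DB_USER", "rfb"),
        # On AWS the password is injected by ECS from Secrets Manager;
        # locally it is a plain environment variable.
        "password": os.environ.get("DB_PASSWORD", ""),
    }

print(db_settings()["host"])
```

Each deployment target then differs only in the environment it provides, not in the code.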

Conclusion

The Regular Flyer Buddy project successfully transitioned from an idea addressing a personal need to a deployed MVP application on AWS. It provides a practical solution for frequent travelers to monitor and analyze flight prices for specific routes, aiming to facilitate more cost-effective travel planning. The project demonstrates a complete development lifecycle, incorporating data sourcing, backend logic, database management, frontend visualization, containerization, and cloud deployment using managed services. While challenges were faced, particularly around cloud infrastructure setup and data initialization, the resulting application provides a solid foundation for future enhancements. It is important to note that this MVP serves as a proof of concept; its reliance on a scraping-based pseudo-API (fast_flights) for data sourcing means it may lack the reliability required for a production environment, as scrapers can break if the underlying website structure changes.