Customer: Researchers conducting systematic literature reviews (35 ICPs at UofC, 50 stakeholders engaged).
What we did: Automated web app using scraping and text analysis for data collection.
Status Quo: Traditional manual approach (12-18 months, $30,000-50,000) vs. automated approach (1 week, hundreds of dollars).
Outcomes: 98% timeline reduction and 99% cost reduction.
Context: Evidence-based research requiring efficient literature review processes. Conducted customer discovery to identify beachhead market and develop quantified value proposition.
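The screening step of such an automated review pipeline can be sketched as a keyword scorer over abstracts. This is a minimal sketch only; the inclusion/exclusion terms and threshold below are hypothetical, not the project's actual criteria:

```python
import re
from collections import Counter

# Hypothetical inclusion/exclusion terms; a real pipeline's criteria differ.
INCLUDE = {"randomized", "trial", "systematic", "cohort"}
EXCLUDE = {"editorial", "protocol"}

def screen_abstract(text: str, threshold: int = 2) -> bool:
    """Return True if the abstract scores enough inclusion hits and no exclusion hits."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    if any(tokens[w] for w in EXCLUDE):
        return False
    return sum(tokens[w] for w in INCLUDE) >= threshold

abstracts = [
    "A randomized controlled trial in a prospective cohort ...",
    "Editorial comment on recent findings ...",
]
decisions = [screen_abstract(a) for a in abstracts]
print(decisions)  # [True, False]
```

Scoring abstracts this way is what collapses months of manual title/abstract triage into minutes; borderline scores would still go to a human reviewer.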
Customer: Teams or individuals needing automated AI-generated recommendations delivered daily across multiple regions.
What we did: Built a serverless, event-driven pipeline that processes AI prompts through Claude, fans results out to email and a queryable data lake, and runs fully automated on a daily cron — no manual intervention required.
Status Quo: Manually running prompts, copy-pasting outputs, emailing results, and managing spreadsheets vs. a fully automated pipeline that ingests, processes, stores, and delivers AI results across any number of countries daily.
Outcomes: 100% elimination of manual prompt-to-delivery workflow. Horizontally scalable to any number of jobs with zero additional infrastructure changes.
Context: Event-driven architecture with decoupled producers and consumers — FastAPI (entry point with rate limiting) queues jobs to SQS instantly; Lambda workers pull jobs and call Claude API; SNS fans out results in parallel to an email queue (Lambda → SES) and an S3 queue (Lambda → Bronze JSON layer); EventBridge triggers a scheduler Lambda daily to auto-submit one job per country and kicks off a Glue Workflow that transforms Bronze JSON → Silver Parquet; Athena queries the Silver layer; Retool surfaces results in a dashboard. All infrastructure provisioned as code via AWS CDK.
| Service | Where & Why It Was Used |
|---|---|
| FastAPI | Entry point — exposes POST /submit, validates requests, enforces rate limiting (10 req/min per IP via slowapi), pushes jobs to SQS, returns 202 Accepted immediately |
| SQS (job_queue) | Decouples the API from processing — holds jobs until a Lambda worker picks them up; FIFO ensures order; DLQ captures failures after 3 attempts |
| Lambda (claude_worker) | Pulls jobs from job_queue, calls the Claude API with the prompt, publishes the structured result to SNS |
| Claude API | The AI brain — takes the prompt and returns the generated recommendation/result |
| SNS | Fan-out hub — receives the Claude result and simultaneously delivers it to both the email queue and the S3 queue in parallel |
| SQS (email_queue) | Buffers results destined for email delivery; decouples SNS fan-out from the email worker |
| Lambda (email_worker) | Triggered by email_queue — formats the result and sends it via SES |
| SES | Delivers the final AI-generated result to the recipient's inbox |
| SQS (s3_queue) | Buffers results destined for storage; decouples SNS fan-out from the S3 writer |
| Lambda (s3_worker) | Triggered by s3_queue — writes the result as a JSON file to S3 Bronze layer under recommendations/{date}/ |
| EventBridge | Two roles: (1) triggers scheduler_worker Lambda daily at midnight UTC to auto-submit one job per country; (2) triggers the Glue Workflow daily to run the ETL |
| Lambda (scheduler_worker) | Loops through configured countries and pushes one job per country into job_queue — fully automated, no human trigger needed |
| S3 | Three-zone storage — Bronze (raw JSON results), Silver (Parquet after ETL), Athena query results |
| Glue Crawler (Bronze) | Scans Bronze S3 prefix, infers schema, registers the table in the Data Catalog |
| Glue ETL Job | Transforms Bronze JSON → Silver Parquet for efficient querying |
| Glue Crawler (Silver) | Updates the Data Catalog after ETL writes new Silver partitions |
| Glue Data Catalog | Central metadata store — schema, location, and partitions for both Bronze and Silver tables |
| Athena | SQL engine over S3 — queries the Silver Parquet layer; results written back to S3 |
| Retool | User-facing dashboard — connects to Athena, lets users query, filter, and visualize AI results without writing SQL |
| CDK | Infrastructure as code — provisions every resource above with a single cdk deploy; outputs queue URLs, SNS ARN, and S3 bucket name |
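The SNS fan-out at the heart of the pipeline can be modeled with in-memory queues. This is a stdlib sketch of the pattern only; the real system uses the SQS queues, SNS topic, and Lambda workers listed above:

```python
import json
import queue

# In-memory stand-ins for the AWS resources (sketch only).
job_queue = queue.Queue()      # SQS job_queue
email_queue = queue.Queue()    # SQS email_queue
s3_queue = queue.Queue()       # SQS s3_queue

def claude_worker(prompt: str) -> dict:
    """Stand-in for the Lambda that calls the Claude API."""
    return {"prompt": prompt, "result": f"recommendation for {prompt}"}

def sns_fan_out(message: dict) -> None:
    """Deliver one message to every subscriber queue, as SNS does."""
    payload = json.dumps(message)
    email_queue.put(payload)
    s3_queue.put(payload)

# Scheduler: one job per country, as the daily EventBridge rule submits.
for country in ["CA", "US", "JP"]:
    job_queue.put(country)

while not job_queue.empty():
    sns_fan_out(claude_worker(job_queue.get()))

print(email_queue.qsize(), s3_queue.qsize())  # 3 3 — both sinks get every result
```

The design point the sketch illustrates: producers never talk to consumers directly, so adding a third sink (say, a Slack notifier) means subscribing one more queue to the topic, with no change to the worker.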
Customer: Students in Himalayan regions of India lacking access to quality education.
What we did: Built web platform delivering evidence-based educational content in Maths, Life Sciences, and Computer Science.
Status Quo: Limited access to quality education resources in remote regions.
Outcomes: Bridged accessibility gap through scalable digital platform.
Context: Non-profit tech initiative addressing educational inequity in underserved communities.
Customer: Construction teams, facility managers, and AR collaborators requiring spatial understanding, in support of TELUS Communications customer support workers.
What we did: Built real-time pipeline capturing spatial data via iPad LiDAR and rendering as 3D digital twins using Gaussian Splatting.
Status Quo: Manual site documentation and remote collaboration limited by 2D representations.
Outcomes: Real-time spatial capture enabling immersive digital twins for analysis and collaboration.
Context: Applications in smart construction, facility management, and distributed AR collaboration.
Customer: Remote support teams and field technicians at TELUS Communications requiring hands-free, context-aware assistance.
What we did: Built end-to-end pipeline scanning physical environments with iPad LiDAR, reconstructing as digital twins, streaming to Meta Quest with LLM-powered voice interactions.
Status Quo: Traditional remote support lacks spatial context and requires manual reference materials.
Outcomes: Hands-free, context-aware VR experiences enabling remote experts to assist with full environmental understanding.
Context: VR-based remote support for field operations requiring spatial reasoning and collaborative problem-solving.
Customer: Organizations in mission-critical industries (Healthcare, Defense, Law) managing internal documentation.
What we did: Built rapid prototype converting internal docs to structured HTML guides using AI.
Status Quo: Manual documentation conversion is time-consuming and error-prone.
Outcomes: Prototype developed within 6 hours; streamlines multi-guide conversion to HTML.
Context: Early-stage prototype; future improvements include semantic search for document clustering and comprehensive guide generation.
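The conversion step can be sketched with the stdlib alone. The input convention here (blank-line-separated sections whose first line is a heading) is hypothetical; the actual prototype used AI to drive the conversion:

```python
import html

def doc_to_html(text: str) -> str:
    """Convert plain-text sections (heading + body, blank-line separated) to HTML."""
    parts = []
    for section in text.strip().split("\n\n"):
        heading, _, body = section.partition("\n")
        parts.append(f"<h2>{html.escape(heading)}</h2>")
        if body:
            parts.append(f"<p>{html.escape(body)}</p>")
    return "\n".join(parts)

guide = "Setup\nInstall the agent & restart.\n\nUsage\nOpen the dashboard."
print(doc_to_html(guide))
```

Escaping via `html.escape` matters in mission-critical docs: source text frequently contains `&`, `<`, and `>` that would otherwise corrupt the generated markup.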
Customer: Utility companies and residential homeowners in smart home ecosystems.
What we did: Developed advanced time-series forecasting models for energy consumption prediction.
Status Quo: Traditional forecasting methods lack accuracy for demand-response optimization.
Outcomes: Improved energy usage predictions enabling optimized power generation, smarter demand-response systems, and reduced carbon footprints.
Context: Data integration in smart home energy management, contributing to sustainability and grid efficiency.
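A minimal baseline for the forecasting idea is simple exponential smoothing in pure Python; the project's actual models were more advanced time-series architectures, so this only illustrates the one-step-ahead framing:

```python
def exp_smooth_forecast(series, alpha=0.5):
    """One-step-ahead forecast via simple exponential smoothing:
    level = alpha * latest_observation + (1 - alpha) * previous_level."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Hypothetical hourly household consumption readings (kWh).
hourly_kwh = [1.2, 1.4, 1.1, 1.5, 1.6]
print(round(exp_smooth_forecast(hourly_kwh), 3))  # 1.475
```

Any candidate model for demand-response should at least beat this smoother on held-out data before its extra complexity is justified.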
Customer: Online content consumers and content platforms combating misinformation.
What we did: Built NLP-based detection system using LSTM neural networks for clickbait identification.
Status Quo: Manual content moderation insufficient for scale; poor-quality headlines erode trust.
Outcomes: LSTM model achieved 95.03% accuracy, outperforming Random Forest (93.89%) and Naive Bayes (93.32%).
Context: Research addressing information quality and user trust in digital media.
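The Naive Bayes baseline the LSTM outperformed can be sketched in pure Python on toy headlines (the headlines and vocabulary here are illustrative only; the LSTM itself requires a deep-learning framework):

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label). Returns per-class word counts and label counts."""
    counts = {"clickbait": Counter(), "news": Counter()}
    labels = Counter()
    for tokens, label in docs:
        counts[label].update(tokens)
        labels[label] += 1
    return counts, labels

def classify(tokens, counts, labels):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    vocab = set().union(*counts.values())
    best, best_lp = None, -math.inf
    for label in counts:
        lp = math.log(labels[label] / sum(labels.values()))
        denom = sum(counts[label].values()) + len(vocab)
        for t in tokens:
            lp += math.log((counts[label][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

train = [
    (["you", "wont", "believe", "this", "trick"], "clickbait"),
    (["ten", "secrets", "doctors", "wont", "tell"], "clickbait"),
    (["council", "approves", "city", "budget"], "news"),
    (["storm", "closes", "mountain", "highway"], "news"),
]
counts, labels = train_nb(train)
print(classify(["secrets", "you", "wont", "believe"], counts, labels))  # clickbait
```

The gap the research measured (95.03% vs 93.32% for Naive Bayes) comes from the LSTM modeling word order, which this bag-of-words baseline discards.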
Customer: Power grid operators and critical infrastructure providers protecting against cyber threats.
What we did: Developed machine learning-based intrusion detection system using feature selection and ensemble methods.
Status Quo: Traditional rule-based systems miss novel attack patterns and have high false-positive rates.
Outcomes: Achieved an 11.874% accuracy improvement using Random Forest feature selection with an SVM classifier, requiring only 30 features.
Context: Critical infrastructure cybersecurity data integration enhancing power system resilience.
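The keep-only-the-top-k-features idea can be sketched with a crude score (absolute difference in class means) standing in for Random Forest importances; the scoring rule and toy data are illustrative, not the project's method:

```python
def rank_features(X, y):
    """Score each feature by |mean(attack rows) - mean(normal rows)|, a crude
    stand-in for Random Forest importance; return indices sorted best-first."""
    n_feat = len(X[0])
    scores = []
    for j in range(n_feat):
        pos = [row[j] for row, lbl in zip(X, y) if lbl == 1]
        neg = [row[j] for row, lbl in zip(X, y) if lbl == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(n_feat), key=lambda j: -scores[j])

# Toy data: feature 0 separates the classes; features 1 and 2 are noise.
X = [[0.1, 5.0, 1.0], [0.2, 5.1, 0.9], [0.9, 5.0, 1.1], [1.0, 4.9, 1.0]]
y = [0, 0, 1, 1]
top = rank_features(X, y)[:2]   # keep only the k most informative features
print(top)
```

Pruning to a small feature set (30 in the research) both speeds up the downstream SVM and removes noisy dimensions that hurt its decision boundary.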
Customer: Job seekers and recruitment platforms combating fraudulent listings.
What we did: Designed a linguistic feature extraction pipeline from employer-authored text to classify spam job postings.
Status Quo: Manual review of job posts at scale is infeasible; fraudulent listings erode platform trust.
Outcomes: Effective classification using domain-specific linguistic cues derived from employer language patterns.
Context: NLP and feature engineering research at the intersection of labor market integrity and text classification.
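A feature extractor of this kind can be sketched with the stdlib; the specific cues below (exclamations, all-caps words, money mentions, urgency terms) are a hypothetical feature set illustrating the approach, not the paper's exact features:

```python
import re

def job_post_features(text: str) -> dict:
    """Extract simple linguistic cues from employer-authored posting text."""
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "exclamations": text.count("!"),
        "all_caps_words": sum(w.isupper() and len(w) > 1 for w in words),
        "money_mentions": len(re.findall(r"[$£€]\s?\d+", text)),
        "urgency_terms": sum(w.lower() in {"urgent", "immediately", "now"} for w in words),
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
    }

post = "URGENT!! Earn $5000 weekly from home. Apply NOW!"
feats = job_post_features(post)
print(feats["exclamations"], feats["all_caps_words"], feats["urgency_terms"])  # 3 2 2
```

Feature vectors like this then feed a standard classifier; the point of the research is that fraud signals live in *how* employers write, not just in blacklisted keywords.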
Customer: Cybersecurity teams and endpoint protection platforms.
What we did: Evaluated feature selection and scaling strategies to optimize a Random Forest malware detection pipeline.
Status Quo: High-dimensional malware feature spaces degrade classifier performance and increase inference cost.
Outcomes: Targeted feature selection and proper scaling significantly improve detection accuracy and reduce model complexity.
Context: Security-focused data science research establishing best practices for ML-based malware classification.
Customer: Public health agencies and misinformation researchers monitoring vaccine sentiment.
What we did: Built a sentiment classification pipeline to identify negative vaccine-related tweets, with techniques to address severe class imbalance.
Status Quo: Negative vaccine content is rare but high-impact; class imbalance causes most models to underperform on the minority class.
Outcomes: Improved recall and F1 on negative tweet detection by applying SMOTE and re-weighting strategies.
Context: Applied NLP research supporting real-time public health surveillance during the COVID-19 pandemic.
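The SMOTE idea (synthesize minority samples by interpolating between a minority point and one of its nearest minority neighbours) can be sketched in pure Python; real pipelines would use a library implementation on actual feature vectors:

```python
import random

def smote_like(minority, k=2, n_new=4, seed=0):
    """Generate synthetic minority samples by interpolating between a point
    and one of its k nearest neighbours (a pure-Python sketch of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # k nearest neighbours of a (excluding a itself), by squared distance
        neigh = sorted((p for p in minority if p is not a),
                       key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)))[:k]
        b = rng.choice(neigh)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_like(minority)
print(len(new_points))  # 4 synthetic minority samples
```

Because new points lie on segments between real minority samples, the classifier sees a denser minority region instead of duplicated rows, which is what lifts recall on the rare negative-tweet class.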
Customer: End users and email/browser security platforms targeted by phishing attacks.
What we did: Built and benchmarked Naive Bayes and KNN classifiers on extracted URL and page-level features for phishing site detection.
Status Quo: Rule-based blocklists lag behind rapidly evolving phishing campaigns.
Outcomes: Demonstrated lightweight ML models as viable real-time phishing detectors with competitive accuracy and low inference overhead.
Context: Security data science research evaluating interpretable, low-latency classifiers suitable for edge deployment.
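Lexical URL features of the kind fed to such lightweight classifiers can be extracted with the stdlib; the feature set below is a hypothetical illustration, not the benchmarked paper's exact list:

```python
import re
from urllib.parse import urlparse

def url_features(url: str) -> dict:
    """Cheap lexical features for phishing detection (illustrative set)."""
    host = urlparse(url).netloc
    return {
        "length": len(url),
        "num_dots": host.count("."),
        "has_at": "@" in url,
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "num_hyphens": host.count("-"),
        "is_https": url.startswith("https://"),
    }

feats = url_features("http://192.168.0.1/secure-login@verify")
print(feats["has_ip_host"], feats["has_at"], feats["is_https"])  # True True False
```

These features need no page fetch or external lookup, which is exactly what makes Naive Bayes and KNN on top of them viable for low-latency, edge-deployed filtering.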
Customer: Browser security vendors and enterprise network defenders.
What we did: Conducted comparative analysis of feature selection techniques on SMOTE-balanced datasets for malicious website detection.
Status Quo: Imbalanced security datasets and noisy features reduce classifier reliability in real-world deployments.
Outcomes: Identified optimal feature selection approaches that maximize detection precision after data balancing.
Context: Data engineering and ML research establishing reproducible preprocessing workflows for web security classification.
Customer: Social network operators and users requiring trust and privacy guarantees.
What we did: Designed a machine learning framework addressing security threats and privacy leakage in social network data.
Status Quo: Social platforms face multi-vector threats — fake accounts, data harvesting, malicious content — with no unified ML defense layer.
Outcomes: Proposed and validated a cohesive framework integrating threat detection, anomaly scoring, and privacy-aware data handling.
Context: Systems-level research published in Cluster Computing, combining ML pipeline design with privacy engineering.
Customer: IoT system architects and edge infrastructure engineers.
What we did: Surveyed and analyzed ML deployment strategies for edge computing environments, identifying key architectural trade-offs.
Status Quo: Centralized ML inference creates latency and bandwidth bottlenecks unsuitable for real-time edge applications.
Outcomes: Mapped opportunities for lightweight model deployment at the edge, informing architecture decisions for constrained environments.
Context: Systems research bridging ML and edge computing infrastructure design.
Customer: Network security operations teams monitoring firewall activity.
What we did: Applied tree-based classifiers to structured firewall log data to detect and classify intrusion attempts.
Status Quo: Manual log analysis is slow and misses patterns across high-volume network traffic.
Outcomes: Demonstrated that tree-based models can accurately classify intrusion events from raw firewall logs with low overhead.
Context: Data pipeline and ML research applied to network security operations.
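The shape of the task can be sketched as parsing structured log rows and applying tree-style threshold splits; the log schema, values, and the hand-written rule below are hypothetical stand-ins for a trained tree-based model:

```python
import csv
import io

# Toy firewall log in CSV form (hypothetical schema, illustrative only).
LOG = """src_port,dst_port,bytes,action
51512,443,1200,allow
4444,23,90,deny
52811,80,64000,allow
4444,22,80,deny
"""

def classify_row(row):
    """A tiny hand-written rule mirroring the kind of splits a learned
    tree-based model finds on firewall features (not a trained model)."""
    if int(row["dst_port"]) in {22, 23} and int(row["bytes"]) < 100:
        return "intrusion"
    return "benign"

rows = list(csv.DictReader(io.StringIO(LOG)))
labels = [classify_row(r) for r in rows]
print(labels)
```

Trees suit this data well because firewall fields are mixed-type and threshold-structured (ports, byte counts, allow/deny flags), needing no scaling or encoding tricks.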
Customer: Power grid security engineers evaluating ML-based anomaly detection.
What we did: Benchmarked ensemble, distance-based, and tree-based classifiers for detecting anomalies in power system data.
Status Quo: No consensus on which ML family performs best for critical infrastructure anomaly detection.
Outcomes: Provided empirical guidance on model selection for power system security, identifying performance trade-offs across families.
Context: Comparative data science research supporting robust ML adoption in critical infrastructure.
Customer: Healthcare providers and clinical decision support systems.
What we did: Explored NLP-based approaches to extract and classify medical conditions from natural language text.
Status Quo: Unstructured clinical text contains high-value diagnostic signals inaccessible to traditional rule-based systems.
Outcomes: Developed intelligent system capable of identifying potential medical conditions from patient-authored and clinical text.
Context: Applied NLP research toward automating clinical triage and decision support.
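A first rung of such a system is lexicon-based surface matching, sketched below; the condition lexicon is hypothetical, and a real system would map to a clinical ontology and handle negation:

```python
import re

# Hypothetical condition lexicon; a real system would use a clinical ontology.
CONDITIONS = {
    "hypertension": [r"high blood pressure", r"hypertension"],
    "diabetes": [r"diabet\w*", r"high blood sugar"],
    "migraine": [r"migraines?"],
}

def extract_conditions(text: str) -> set:
    """Return condition labels whose surface patterns appear in the text."""
    low = text.lower()
    return {label for label, patterns in CONDITIONS.items()
            if any(re.search(p, low) for p in patterns)}

note = "Patient reports migraines and a history of high blood pressure."
print(sorted(extract_conditions(note)))  # ['hypertension', 'migraine']
```

The lexicon approach is transparent and auditable, which matters in clinical triage; the research direction is replacing it with learned models that also catch paraphrases the lexicon misses.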
Customer: Endpoint security vendors and enterprise IT defense teams.
What we did: Built and evaluated a robust ML pipeline for malware detection using static and behavioral features.
Status Quo: Signature-based antivirus fails against novel and obfuscated malware variants.
Outcomes: ML-based pipeline demonstrated strong generalization against unseen malware families with high detection rates.
Context: Security data science research establishing ML as a reliable layer in multi-stage malware defense.