Gazer
Developed an autonomous tracking system leveraging YOLO11n object detection to identify and localize human faces in real-time video streams. The system calculates precise error vectors between detected face coordinates and frame center, feeding these measurements into a custom-tuned PID control algorithm that generates smooth, cinematic camera movements.
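The error-to-command step described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the names `pixel_error` and `PID`, the 960x720 frame size, and the output limit of 100 are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def pixel_error(box: Tuple[float, float, float, float],
                frame_w: int, frame_h: int) -> Tuple[float, float]:
    """Offset of the box center from the frame center, in pixels (x right, y down)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2 - frame_w / 2, (y1 + y2) / 2 - frame_h / 2)


@dataclass
class PID:
    """Textbook PID controller for one axis (hypothetical helper)."""
    kp: float
    ki: float
    kd: float
    out_limit: float = 100.0          # assumed command range: [-100, 100]
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # No derivative term on the first sample, since there is no history yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp so a large error cannot produce an out-of-range command.
        return max(-self.out_limit, min(self.out_limit, out))
```

In use, one controller per axis would be fed the corresponding component of `pixel_error(...)` each frame, with `dt` set to the time since the previous frame.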
Engineered a multi-threaded Python backend running 9 concurrent threads for real-time vision processing, UDP command transmission to the DJI Tello drone, and state management. Implemented lock-protected shared state and dead-zone tolerance logic that suppresses micro-adjustments and jitter when the face is already close to the frame center, keeping tracking stable.
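A minimal sketch of the two mechanisms named above, assuming a vision thread that writes the latest error and a control thread that reads it. The names `SharedError` and `apply_dead_zone` and the tolerance values are hypothetical.

```python
import threading
from typing import Tuple


def apply_dead_zone(error: float, tolerance: float) -> float:
    """Treat errors inside the tolerance band as zero so the drone holds still."""
    return 0.0 if abs(error) <= tolerance else error


class SharedError:
    """Latest tracking error, guarded by a lock so readers never see a half-written pair."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._error = (0.0, 0.0)

    def set(self, ex: float, ey: float) -> None:
        with self._lock:
            self._error = (ex, ey)

    def get(self) -> Tuple[float, float]:
        with self._lock:
            return self._error
```

The control loop would call `get()`, pass each component through `apply_dead_zone`, and only then hand the result to the controller, so small detection noise around the center never reaches the drone.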
Integrated a voice-controlled LLM interface combining OpenAI Whisper for speech recognition and Google Gemini for natural language understanding. Users can adjust PID coefficients (Kp, Ki, Kd) and distance thresholds (50-250px) through conversational commands, with ElevenLabs TTS providing real-time audio feedback.
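One way to apply such a spoken adjustment safely is to validate whatever the language model returns before it touches the controller. The sketch below assumes the model has been prompted to reply with a small JSON object. The key names (`kp`, `ki`, `kd`, `distance_px`) and the function `apply_command` are hypothetical, and only the 50-250 px range comes from the description above.

```python
import json
from typing import Dict

ALLOWED_GAINS = ("kp", "ki", "kd")
DISTANCE_RANGE_PX = (50.0, 250.0)   # range stated in the project description


def apply_command(settings: Dict[str, float], llm_json: str) -> Dict[str, float]:
    """Return a copy of settings with validated updates from the model's JSON reply."""
    update = json.loads(llm_json)
    new_settings = dict(settings)
    for key, value in update.items():
        if key in ALLOWED_GAINS:
            # Assumed policy: gains may not go negative.
            new_settings[key] = max(0.0, float(value))
        elif key == "distance_px":
            low, high = DISTANCE_RANGE_PX
            new_settings[key] = min(high, max(low, float(value)))
        # Any other key is ignored instead of being trusted.
    return new_settings
```

Returning a new dictionary instead of mutating in place lets the caller swap the settings under a lock in a single step.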
Sensor Fusion
Implemented an Error-State Kalman Filter that represents attitude as a unit quaternion, avoiding the gimbal-lock singularity inherent in Euler-angle parameterizations. The filter maintains a 16-dimensional nominal state and a separate 15-dimensional error state, fusing measurements from 6-DOF IMU sensors and GPS receivers to estimate orientation and position.
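The one-dimension gap between the two states comes from the attitude: the nominal state stores a 4-component quaternion, while the error state stores a 3-component rotation vector. The sketch below assumes the common layout of position, velocity, quaternion, accelerometer bias, and gyroscope bias, a scalar-first Hamilton quaternion, and a local (body-frame) error convention. None of these choices are confirmed by the description, and all names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence

ERROR_STATE_DIM = 15  # dp(3) + dv(3) + dtheta(3) + d_accel_bias(3) + d_gyro_bias(3)


@dataclass
class NominalState:
    p: List[float] = field(default_factory=lambda: [0.0] * 3)           # position
    v: List[float] = field(default_factory=lambda: [0.0] * 3)           # velocity
    q: List[float] = field(default_factory=lambda: [1.0, 0.0, 0.0, 0.0])  # attitude (w, x, y, z)
    accel_bias: List[float] = field(default_factory=lambda: [0.0] * 3)
    gyro_bias: List[float] = field(default_factory=lambda: [0.0] * 3)

    def size(self) -> int:
        return len(self.p) + len(self.v) + len(self.q) + len(self.accel_bias) + len(self.gyro_bias)


def quat_mul(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hamilton product a * b, scalar-first."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return [aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw]


def rotvec_to_quat(dtheta: Sequence[float]) -> List[float]:
    """Unit quaternion for a rotation vector (axis times angle, radians)."""
    angle = math.sqrt(sum(c * c for c in dtheta))
    if angle < 1e-12:
        q = [1.0] + [c / 2 for c in dtheta]        # small-angle approximation
    else:
        s = math.sin(angle / 2) / angle
        q = [math.cos(angle / 2)] + [c * s for c in dtheta]
    n = math.sqrt(sum(c * c for c in q))
    return [c / n for c in q]


def inject_rotation(q: Sequence[float], dtheta: Sequence[float]) -> List[float]:
    """Compose the 3-vector attitude error onto the quaternion, then renormalize."""
    out = quat_mul(q, rotvec_to_quat(dtheta))
    n = math.sqrt(sum(c * c for c in out))
    return [c / n for c in out]
```

Because the correction is applied by quaternion multiplication instead of addition, the result stays on the unit sphere up to round-off, which the final normalization removes.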
Derived the Jacobian matrices for both the prediction and update steps, enabling first-order covariance propagation through the nonlinear system dynamics. Applied the Joseph form for covariance updates to preserve symmetry and positive semi-definiteness under floating-point round-off, and implemented multiplicative quaternion error injection to maintain the unit-norm constraint throughout the estimation cycle.
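The Joseph-form update, P+ = (I - KH) P (I - KH)^T + K R K^T, can be sketched in a few lines. To stay dependency-free, the example below assumes a single scalar measurement, so the innovation covariance is a number and no matrix inverse is needed. The helpers `matmul`, `transpose`, and `joseph_update` are hypothetical, and a real filter would more likely use a linear-algebra library.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def joseph_update(P: Matrix, H: Matrix, R: float) -> Tuple[Matrix, Matrix]:
    """Kalman gain and Joseph-form covariance for one scalar measurement.

    P is n x n, H is a 1 x n measurement Jacobian, R is the measurement variance.
    """
    n = len(P)
    PHt = matmul(P, transpose(H))                  # n x 1
    S = matmul(H, PHt)[0][0] + R                   # scalar innovation variance
    K = [[row[0] / S] for row in PHt]              # n x 1 gain
    KH = matmul(K, H)                              # n x n
    A = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(n)] for i in range(n)]
    APAt = matmul(matmul(A, P), transpose(A))      # (I - KH) P (I - KH)^T
    KRKt = [[K[i][0] * R * K[j][0] for j in range(n)] for i in range(n)]
    P_new = [[APAt[i][j] + KRKt[i][j] for j in range(n)] for i in range(n)]
    return K, P_new
```

Both terms of the sum are symmetric and positive semi-definite by construction, which is why this form tolerates round-off better than the shorter P+ = (I - KH) P expression. With the optimal gain the two expressions agree in exact arithmetic.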
Architected a sensor fusion pipeline capable of processing asynchronous IMU measurements at 100Hz alongside lower-rate GPS updates. Noise parameters were characterized using Allan variance analysis, while coordinate transformations between body and navigation frames ensure geometrically consistent state estimates across all sensor modalities.
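Handling two sensor rates usually comes down to processing measurements in timestamp order: predict on every IMU sample, correct whenever a GPS fix arrives. The sketch below shows that scheduling only. The 5 Hz GPS rate, the event tuples, and the function names are assumptions, and the predict and update steps are left as counters.

```python
import heapq
from typing import Iterator, Tuple

Event = Tuple[float, int, str]  # (timestamp_s, tie-break priority, sensor kind)


def imu_events(duration_s: float, rate_hz: int = 100) -> Iterator[Event]:
    for k in range(int(duration_s * rate_hz)):
        yield (k / rate_hz, 0, "imu")      # priority 0: IMU first on equal timestamps


def gps_events(duration_s: float, rate_hz: int = 5) -> Iterator[Event]:
    for k in range(int(duration_s * rate_hz)):
        yield (k / rate_hz, 1, "gps")      # assumed GPS rate, not from the source


def run_filter(duration_s: float) -> Tuple[int, int]:
    """Walk both streams in time order and count predict / update calls."""
    predicts = updates = 0
    last_t = 0.0
    for t, _, kind in heapq.merge(imu_events(duration_s), gps_events(duration_s)):
        dt = t - last_t
        assert dt >= 0.0                   # merged stream must be time-ordered
        last_t = t
        if kind == "imu":
            predicts += 1                  # placeholder for the prediction step over dt
        else:
            updates += 1                   # placeholder for the GPS correction step
    return predicts, updates
```

`heapq.merge` consumes the two generators lazily, so the same loop shape works for live queues as long as each individual stream is already sorted by time.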
LSTM
Constructed a binary classification model using stacked LSTM layers with 40-dimensional word embeddings to process variable-length news articles. The architecture applies dropout between recurrent layers to limit overfitting on the training distribution, and pads sequences to a common length so that texts of different lengths can be batched together.
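The padding step can be illustrated without any deep-learning framework. The sketch below pads and truncates at the end of each sequence and uses 0 as the padding id. The description does not say which side is padded or which id is reserved, so both are assumptions, and `pad_sequences` here is a hypothetical stand-in rather than a library call.

```python
from typing import List, Sequence


def pad_sequences(seqs: Sequence[Sequence[int]], max_len: int, pad_id: int = 0) -> List[List[int]]:
    """Clip or right-pad each token-id sequence to exactly max_len entries."""
    padded = []
    for seq in seqs:
        clipped = list(seq[:max_len])                       # long articles keep their opening tokens
        padded.append(clipped + [pad_id] * (max_len - len(clipped)))
    return padded
```

The resulting rectangular batch is what an embedding layer expects. Reserving the padding id for padding alone lets the model mask those positions out.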
Developed a custom threshold tuning algorithm that systematically sweeps the classification cutoff across the predicted-probability range. This approach selects cutoff points other than the default 0.5 threshold, improving classification performance by accounting for class imbalance and asymmetric misclassification costs.
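A threshold sweep of this kind can be sketched as a grid search over cutoffs that minimizes a weighted error count. The function name `tune_threshold`, the 101-point grid, and the cost weights are assumptions for illustration, and the original algorithm may use a different objective.

```python
from typing import Sequence, Tuple


def tune_threshold(probs: Sequence[float], labels: Sequence[int],
                   cost_fp: float = 1.0, cost_fn: float = 1.0,
                   steps: int = 101) -> Tuple[float, float]:
    """Return (threshold, cost) minimizing cost_fp * FP + cost_fn * FN on held-out data."""
    best_t, best_cost = 0.5, float("inf")
    for i in range(steps):
        t = i / (steps - 1)
        # Predict positive when the probability reaches the threshold.
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:               # strict: ties keep the lowest threshold
            best_t, best_cost = t, cost
    return best_t, best_cost
```

With equal costs this maximizes accuracy, and raising `cost_fn` relative to `cost_fp` pushes the cutoff lower. The sweep should run on a validation split, not the test set, so the reported metric is not tuned on the data it is measured on.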

