Projects & Case Studies

My work spans model and algorithm design, training at scale, and production deployment—from custom architectures and censored-regression methods to distributed GPU/TPU training and inference at enterprise scale.

That includes open-source censored-regression tools for scientific metadata; large-scale vision pipelines with distributed training and low-latency inference; enterprise ransomware detection built on algorithms from filed patent applications; and branching reinforcement learning for high-throughput sequential decisions under live production load.

For questions about any of the projects below, feel free to contact me on LinkedIn or by email.

AI & Deep Learning Research

Cloud decision platform DRL architecture with parallel macro/micro encoders, BDQ dual-head outputs, and GPU replay buffer

High-Throughput Deep Reinforcement Learning (DRL) Decision Engine (Stealth Pilot)

Designed and developed a high-throughput sequential decision engine for a cloud computing decision platform, making real-time resource provisioning and scheduling decisions under high-frequency runtime workloads. This ongoing stealth pilot applies deep reinforcement learning at systems scale, bridging algorithm design, distributed GPU training, and safety-aware policy alignment.

Key Technologies: PyTorch · Branching Dueling Q-Network (BDQ) · HF Accelerate · DeepSpeed · Ray Job/Serve · DDP · Preference Alignment

System Overview & Sequential Decision Loop

The engine formulates provisioning and scheduling on the platform as a Markov Decision Process (MDP). A macro information encoder and a micro information encoder process heterogeneous platform observability data in parallel: the former aggregates platform-level signals, the latter captures fast local observations. Their outputs are fused before entering the Actor-Critic backbone. The primary head emits discrete control actions, while a parallel preference-alignment auxiliary network regularizes the shared policy representation to enforce safety and preference constraints. Decisions run at millisecond-scale intervals, balancing throughput, tail latency, and resource utilization. ...
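The branching idea behind a BDQ head can be illustrated without any framework. In the original BDQ formulation, a shared state value V(s) is combined with a separate, mean-centred advantage vector for each action dimension ("branch"), and the greedy joint action is the per-branch argmax. The minimal sketch below assumes the network has already produced V(s) and the per-branch advantages as plain numbers; the function names and the two example branches are hypothetical and do not describe the pilot's actual implementation.

```python
from typing import List, Sequence


def bdq_q_values(state_value: float,
                 advantages: Sequence[Sequence[float]]) -> List[List[float]]:
    """Dueling aggregation per action branch, as in BDQ.

    For every branch d and sub-action a:
        Q_d(s, a) = V(s) + A_d(s, a) - mean_a' A_d(s, a')
    `state_value` is the shared V(s); `advantages[d]` holds A_d(s, .).
    """
    q_values = []
    for branch in advantages:
        mean_adv = sum(branch) / len(branch)  # centre advantages within the branch
        q_values.append([state_value + a - mean_adv for a in branch])
    return q_values


def greedy_joint_action(q_values: Sequence[Sequence[float]]) -> List[int]:
    """Select the argmax sub-action independently in every branch."""
    return [max(range(len(branch)), key=branch.__getitem__) for branch in q_values]


# Hypothetical example: two branches (e.g. a 3-way scaling choice, a 2-way queue choice).
q = bdq_q_values(10.0, [[1.0, 3.0, 2.0], [0.5, -0.5]])
print(q)                       # [[9.0, 11.0, 10.0], [10.5, 9.5]]
print(greedy_joint_action(q))  # [1, 0]
```

The design point is that the number of outputs grows with the sum of sub-action counts across branches rather than their product, which keeps large discrete action spaces tractable.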

Autonomous Ransomware Protection System: Large-Scale Storage Security Engine with Deep Learning and Vector/Graph Analytics

As a key member of NetApp’s core AI & ML R&D team, I co-designed and filed multiple core patent applications that form the algorithmic foundation of NetApp ONTAP’s real-time Autonomous Ransomware Protection (ARP) engine. Our approach brings state-of-the-art deep learning and semantic analytics to real-time storage security, running reliably under stringent enterprise latency and compute budgets.

Core Technical Architecture & Patent Portfolio

The ARP engine comprises three cooperating advanced detection and monitoring layers: ...

Tobit censored regression prediction chart

Academic Publication & Open-Source Algorithms: Predicting Shared Digital Resource Lifecycles in Scientific Literature (Nature Portfolio: HSS Communications)

Academic Publication: Humanities and Social Sciences Communications (Nature Portfolio, 2025)

Open-Source Repository: https://github.com/UKGANG/tobit

This project spans a complete cycle of scientific inquiry, from statistical modeling, to publication in a Nature Portfolio sub-journal, to releasing a production-grade Python library to the open-source community. Funded by the U.S. Office of Research Integrity (Project Grants: ORIIR190049 and ORIIR180041), the work models and predicts the failure lifecycles of digital resources shared in scientific literature (e.g., databases, code repositories, and online tools). ...
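A censored (Tobit) model fits lifecycle data because a resource that is still online when the data are collected reveals only a lower bound on its lifespan, i.e. it is right-censored. As a minimal, stdlib-only sketch, assuming a Gaussian latent variable, the Tobit log-likelihood adds a density term for fully observed values and a tail-probability term for censored ones. This is not the API of the UKGANG/tobit package; the function names and the censoring-flag convention are hypothetical.

```python
import math
from typing import Sequence


def norm_logpdf(z: float) -> float:
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z: float) -> float:
    """Log CDF of the standard normal; erfc avoids cancellation in the lower tail."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def tobit_loglik(y: Sequence[float], mu: Sequence[float], sigma: float,
                 censor: Sequence[int]) -> float:
    """Gaussian Tobit log-likelihood with per-observation censoring flags.

    mu[i] is the linear predictor x_i . beta for observation i.
    censor[i] ==  0: y[i] is fully observed
    censor[i] == -1: left-censored, latent value <= y[i]
    censor[i] == +1: right-censored, latent value >= y[i]
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for yi, mi, ci in zip(y, mu, censor):
        z = (yi - mi) / sigma
        if ci == 0:
            total += norm_logpdf(z) - math.log(sigma)  # density of the observed value
        elif ci == -1:
            total += norm_logcdf(z)                    # P(Y* <= y)
        elif ci == 1:
            total += norm_logcdf(-z)                   # P(Y* >= y)
        else:
            raise ValueError(f"unknown censoring flag: {ci}")
    return total
```

To fit the model, one would minimize the negative of this function over β (through μ_i = x_iᵀβ) and σ, for example with `scipy.optimize.minimize`; optimizing log σ keeps the scale positive, and `scipy.special.log_ndtr` is a more robust drop-in for `norm_logcdf` far in the tails.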

Cell morphology dataset visualization interface

AI Platform for Single-Cell Morphology & Biological Classification

At Deepcell, I designed, developed, and scaled a high-performance platform for cell image classification and intelligent screening. The platform enables biologists and pharmaceutical researchers to quickly analyze, classify, and physically sort cell populations based on raw morphological surface features, bridging laboratory instruments and cloud deep learning clusters in a single end-to-end pipeline.

Core Engineering Optimizations & Compute Acceleration

Our core R&D focus was maximizing model throughput, minimizing cloud compute costs, and building robust, ultra-low-latency inference pipelines. ...

Enterprise Systems & Legacy Projects

Supply–Demand Modeling for the Sharing-Economy Short-Term Rental Market

In collaboration with AirDNA, this project applied statistical and machine learning methods to Airbnb guest review text to model supply–demand dynamics and macro-level impacts in the sharing-economy short-term rental market, characterizing behavioral patterns across user segments.

Supply–demand analysis via structural equation modeling

On the text side, we used NLP with a pretrained BERT model to quantify guest preferences and trust toward hosts. On the structural modeling side, we combined SEM, PCA, exploratory and confirmatory factor analysis, generalized linear regression, and LDA to support host decision-making under post-pandemic market uncertainty. The findings informed AirDNA’s industry-facing data products and analytics for the short-term rental market.

2022-06

Biotech Patent Knowledge Graph with Sequence Retrieval

As team lead, I spearheaded the development of a knowledge graph for biotech patent literature, defining the research direction, aligning delivery milestones, and overseeing data pipeline and schema architectures. The project’s core goal was to transform unstructured patent text into structured genetic sequence records to power downstream IP analytics and interactive exploration.

POS tagging and domain rules to locate sequence-bearing passages in biotech patent documents

The team built an automated ETL pipeline to extract gene names, DNA/protein sequences, and taxonomic mentions from patent documents. These extracted entities were integrated into a unified knowledge graph schema and validated against BLAST+ databases for taxonomic classification, filtering out noise caused by OCR errors and inconsistent biological nomenclature. ...
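The domain-rule half of locating sequence-bearing passages can be sketched with regular expressions. The rules below are hypothetical stand-ins, not the team’s actual rule set: one matches the "SEQ ID NO: n" citations that patent claims commonly use, and the other matches runs of at least 15 upper-case nucleotide letters, allowing single spaces because sequences are often printed in blocks of ten. The POS-tagging step is omitted here.

```python
import re
from typing import List, Tuple

# Hypothetical domain rules, for illustration only.
# Patent claims usually cite sequences as "SEQ ID NO: <n>".
SEQ_ID_RE = re.compile(r"SEQ\s+ID\s+NO[:.]?\s*(\d+)", re.IGNORECASE)
# A run of at least 15 upper-case nucleotide letters, optionally split by single
# whitespace characters (sequences are often printed in blocks of ten).
DNA_RE = re.compile(r"\b[ACGTU](?:\s?[ACGTU]){14,}\b")


def find_sequence_mentions(text: str) -> List[Tuple[str, str]]:
    """Return (kind, value) pairs for sequence identifiers and inline DNA/RNA runs."""
    hits: List[Tuple[str, str]] = []
    for m in SEQ_ID_RE.finditer(text):
        hits.append(("seq_id", m.group(1)))
    for m in DNA_RE.finditer(text):
        hits.append(("nucleotides", re.sub(r"\s+", "", m.group(0))))
    return hits


sample = "The primer of SEQ ID NO: 3 has the sequence ATGCATGCAT GCATGCATGC and binds."
print(find_sequence_mentions(sample))
```

Rules like these are cheap, high-recall filters; in a pipeline of the kind described, their hits would still need downstream validation (for example against BLAST+ databases) to discard OCR noise.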

2021-02

ImageAnnotatorJS: Research Integrity Figure Review Tool

GitHub: sciosci/ImageAnnotatorJS

ImageAnnotatorJS is a modular JavaScript library (AMD module format) for building web-based image annotation and review systems. Users draw and manage vector annotation shapes on an HTML5 canvas, supporting academic figure inspection, analysis, and research data collection. Development was partially funded by the U.S. Department of Health and Human Services (HHS) Office of Research Integrity (ORI) under grants ORIIR190049 and ORIIR180041.

...

2020-08

Tampered Image Tagging System

Screenshots of the tagging system: dashboard and annotation modal

The SOS+CD Laboratory’s image analysis application provides a web interface and web service for researchers and other applications. Users can analyze and store images from PDF files and annotate them with the application’s annotation feature, while the backend exposes multiple APIs for more advanced image processing and analysis. Machine learning models support image recognition and annotation, allowing researchers to process large quantities of image data efficiently. With an interface designed for ease of use and flexible backend APIs, the application serves researchers and developers who work with image data.

2020-05

COVID-19 Chatbot

This chatbot project involved developing a natural language processing (NLP) model to give the general public the latest updates on COVID-19. As team leader, I hosted weekly proof-of-concept (POC) demonstrations, guided the team in breaking down tasks, tracked project progress, and oversaw code implementation. Because we had no data on hand, we first built a web scraper to collect data from the Quora community, then applied transfer learning with the pretrained BERT model to train our NLP model on the scraped dataset. After comparing several loss functions for the task, we chose cosine similarity: an ETL process prepared the scraped data, and the similarity-comparison logic matched each user query to its nearest neighbors using the k-nearest neighbors (KNN) algorithm. ...
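The retrieval step can be sketched as a k-nearest-neighbor lookup ranked by cosine similarity. This minimal stdlib sketch assumes the query and the stored questions have already been encoded into vectors (for example by the fine-tuned BERT model); the toy three-dimensional vectors and answer labels are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_answers(query: Sequence[float],
                corpus: Sequence[Tuple[Sequence[float], str]],
                k: int = 1) -> List[Tuple[float, str]]:
    """Return the k (score, answer) pairs whose embeddings are most similar to `query`."""
    scored = [(cosine_similarity(query, emb), answer) for emb, answer in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-d "embeddings"; real BERT vectors have hundreds of dimensions.
corpus = [([1.0, 0.0, 0.0], "answer about masks"),
          ([0.0, 1.0, 0.0], "answer about vaccines"),
          ([0.7, 0.7, 0.0], "answer about both")]
print([ans for _, ans in knn_answers([1.0, 0.1, 0.0], corpus, k=2)])
# ['answer about masks', 'answer about both']
```

A linear scan is fine for a few thousand stored questions; a larger corpus would usually move to an approximate nearest-neighbor index.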

2020-03

Data Augmentation for X-Ray Screening Images

The objective of this experiment was to increase the robustness of an AI model through data augmentation of X-ray screening images. As project lead, I designed the architecture of the generative adversarial network (GAN), defined the experimental steps, and analyzed the model’s performance.

The adversarial system

The evolution of the image quality

I first experimented with two different network architectures, tuning hyperparameters extensively to optimize the GAN’s performance. To capture edge features of the X-ray images, I then extended the network by appending a Sobel filter. Finally, I ran a non-parametric null hypothesis significance test (NHST) on the network’s results, which demonstrated the effectiveness of our approach. ...
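A Sobel filter is a fixed 3×3 operator that approximates horizontal and vertical intensity gradients, so its output highlights edges. The plain-Python sketch below computes the Sobel gradient magnitude of a small grayscale image stored as a list of rows; it illustrates the operator only and is not the layer used in the experiment. Border pixels are left at zero for simplicity.

```python
import math
from typing import List

Image = List[List[float]]

# Standard Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img: Image) -> Image:
    """Gradient magnitude sqrt(gx^2 + gy^2); border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    p = img[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            out[r][c] = math.hypot(gx, gy)
    return out


# A 4x4 image with a vertical edge between columns 1 and 2.
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
for row in sobel_magnitude(edge):
    print(row)
```

Inside a GAN, the same two kernels would typically be installed as a non-trainable convolution whose output is fed alongside the image features; how the original network wired the filter in is not specified here.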

2020-01

WeChat Micro App

This POC project was developed for China Huaneng Shanxi Qingling Power Generation to predict electrical output. As principal architect, I oversaw development of the application and worked closely with the power plant’s engineering director to understand their requirements and make sure the design met their needs. The frontend, built on the WeChat Micro App platform, called a REST service so that users could query and visualize predicted electrical output from their mobile devices. To make the application robust and reliable, we conducted extensive testing and validation throughout development. ...

2018-07

eCommerce Service Platform

This e-commerce project integrates the merchant API to provide quotation and ordering services to other applications. As senior Java engineer, I was responsible for developing new features requested by the business team and supporting the project’s rapid growth. To improve the user experience, I redesigned the API to offer a more convenient payment service, and I led a team of two in implementing the tracking log module, which greatly simplified troubleshooting. We also introduced a subsystem to help the business analytics team run A/B tests. ...
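How the A/B testing subsystem assigned users to variants is not described here. One common approach, shown only as an illustrative sketch, is deterministic hash-based bucketing: hashing the experiment name together with the user ID gives every user a stable assignment without storing any state. The function name and the 50/50 default split are assumptions, and this Python code is not the original (Java-era) subsystem.

```python
import hashlib


def ab_bucket(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to 'control' or 'treatment' for one experiment."""
    if not 0.0 <= treatment_share <= 1.0:
        raise ValueError("treatment_share must be in [0, 1]")
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep 53 bits so the division is exact and u lies in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "treatment" if u < treatment_share else "control"


print(ab_bucket("user-42", "new-checkout"))
```

Including the experiment name in the hash key keeps assignments independent across experiments, so the same user is not always placed in the treatment group.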

2016-01

Foundation Email & File Storage Service

This project serves as the backbone of the enterprise, handling file storage and email delivery for other applications. As senior Java software engineer, I played a pivotal role in selecting candidate technologies, designing the project’s architecture, developing the code, and overseeing the work of junior colleagues. Before the refactoring, the legacy system was outdated and difficult to maintain, forcing other applications to work around its limitations. I identified and addressed bottlenecks such as direct MQ invocations and large amounts of unnecessary data stored in the NFS space. ...

2016-01

Mount Alvernia Hospital Information System

This project was developed for Mount Alvernia Hospital in Singapore, providing patient admission and medical information management services. As senior Java software engineer, I mentored junior team members, performed code reviews, and oversaw project development. On the Wacom pad platform, I developed the e-signature module, which converted invoice images into Base64-encoded strings for HTML5 display and captured signature strokes with an electronic pen, giving patients a smooth signing experience. ...
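Displaying an image in an HTML5 page from a Base64 string usually means wrapping it in a data URI. A minimal Python sketch of that conversion follows, assuming PNG input; the function name is hypothetical and the original module’s implementation is not shown here.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a Base64 data URI usable in an <img> src or on a canvas."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


uri = to_data_uri(b"abc")
print(uri)  # data:image/png;base64,YWJj
```

The resulting string can be placed directly in `<img src="...">` or loaded into an `Image` object before drawing it on a canvas, with no separate file request.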

2014-05

SATA Community Health Integrated Medical Centre System

This system organized patient information, managed in-patient processes and medicine storage, generated branch revenue reports, and calculated citizen subsidy deductions. As senior Java software engineer, I developed extensions using EJB 2, Struts 1, and iBATIS. One of my main contributions was optimizing database pagination with clustered indexing in Oracle, which significantly improved system performance for medical staff and patients alike. Overall, my work helped the system deliver reliable patient care services.

2012-05

British Virgin Islands (BVI) Offshore Financing System

This application serves as a platform for receiving company registration requests from agencies around the world. As a developer on the project, I worked on extension development and assessed the changes required for a browser-compatibility task. Using Java, I analyzed the codebase, scanning JSP and JavaScript files and flagging problematic code chunks. This analysis helped the team estimate the scope and risk of the task and contributed to its successful completion, keeping the application available to agencies worldwide.
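The original scan was written in Java and its rules are not described. As a rough Python sketch of the same idea, a scanner walks the source tree, matches every line of `.jsp` and `.js` files against a list of patterns, and reports file, line, and rule. The pattern list is hypothetical, using a few well-known Internet Explorer-specific APIs (`document.all`, `attachEvent`, `ActiveXObject`) as examples.

```python
import re
from pathlib import Path
from typing import Dict, List, Pattern, Tuple

# Hypothetical rule set: legacy IE-specific APIs that commonly break in other browsers.
LEGACY_PATTERNS: Dict[str, Pattern[str]] = {
    "document.all": re.compile(r"\bdocument\.all\b"),
    "attachEvent": re.compile(r"\.attachEvent\s*\("),
    "ActiveXObject": re.compile(r"\bnew\s+ActiveXObject\s*\("),
}

Finding = Tuple[str, int, str]  # (file name, 1-based line number, rule name)


def scan_source(name: str, text: str) -> List[Finding]:
    """Flag every line of `text` that matches one of the legacy patterns."""
    findings: List[Finding] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in LEGACY_PATTERNS.items():
            if pattern.search(line):
                findings.append((name, lineno, rule))
    return findings


def scan_tree(root: str, suffixes: Tuple[str, ...] = (".jsp", ".js")) -> List[Finding]:
    """Scan all files under `root` whose extension is in `suffixes`."""
    findings: List[Finding] = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            findings.extend(scan_source(str(path), text))
    return findings
```

Counting findings per file or per rule then gives a quick, if coarse, estimate of the scope and risk of a compatibility change.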

2011-03

NEC Cloud Heartbeat Surveillance Toolkit

This in-house Java project monitored the liveness of nodes in the cluster. The system inspected each node’s heartbeat signal and notified the maintenance team by email whenever a node stopped responding. As lead developer, I worked closely with a team of five developers and contributed across the development cycle: design, implementation, code review, testing, and release to the target environment. ...
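A heartbeat monitor of this kind reduces to tracking each node’s last heartbeat time and flagging nodes whose silence exceeds a timeout. A minimal Python sketch follows; the class name, the timeout value, and the `notify` callback (a stand-in for the email alert) are assumptions, and timestamps are passed in explicitly so the logic is easy to test.

```python
from typing import Callable, Dict, List, Set


class HeartbeatMonitor:
    """Flags nodes whose last heartbeat is older than `timeout_s` seconds."""

    def __init__(self, timeout_s: float, notify: Callable[[List[str]], None]) -> None:
        self.timeout_s = timeout_s
        self.notify = notify  # stand-in for the email alert
        self.last_seen: Dict[str, float] = {}
        self.alerted: Set[str] = set()

    def record(self, node: str, now: float) -> None:
        """Register a heartbeat; a recovered node becomes eligible for alerts again."""
        self.last_seen[node] = now
        self.alerted.discard(node)

    def check(self, now: float) -> List[str]:
        """Return newly silent nodes and notify once per outage."""
        silent = sorted(node for node, seen in self.last_seen.items()
                        if now - seen > self.timeout_s and node not in self.alerted)
        if silent:
            self.alerted.update(silent)
            self.notify(silent)
        return silent


alerts: List[List[str]] = []
monitor = HeartbeatMonitor(timeout_s=10.0, notify=alerts.append)
monitor.record("node-a", now=0.0)
monitor.record("node-b", now=5.0)
print(monitor.check(now=12.0))  # ['node-a']
```

Each silent node is reported once until it sends another heartbeat, which avoids repeated emails for the same outage.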

2010-05

NEC Outsourcing Service Platform

As team lead, I handled both people management and technical mentorship for on-site team members, guiding a team of 30 colleagues in Beijing and 10 in Shanghai focused on testing customer applications. Our testing detected 1.79 bugs per kilobyte in each release, contributing to the overall quality and reliability of the applications we were responsible for testing. ...

2010-04

Social Security & Hospital Information System Integration

As a maintenance engineer, I provided production support to the company’s clients, keeping their systems and applications fully functional and running smoothly. I also helped the R&D team troubleshoot complex data issues in real time, drawing on my experience in database integration and management. ...

2009-11

Insurance Contract & OA Back-Office System

As an on-site support specialist, I maintained the business office automation (OA) system for an insurance company. I troubleshot and fixed frontend bugs in JavaScript, deployed new server stations for the helpdesk, and set up the hardware and network for conferences and other events.

2007-11