This project aimed to develop a knowledge graph system that extracts genetic sequence information from unstructured biotech patent documents. As team leader, I was primarily responsible for project planning and research. I also tracked my team members' progress and reviewed their designs.

The team developed a data pipeline to extract gene names, sequences, and organism names from biotech patent documents. We integrated this ETL pipeline with BLAST+ to assign a taxonomic category to the extracted genetic information. This process helped us combine acquired intellectual property data with other information and create additional business value.
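
To illustrate the extraction step described above, here is a minimal Python sketch. It is not the team's actual implementation: the regular expressions, the organism lexicon, the predicate names, and the `extract_triples` function are all hypothetical stand-ins chosen for illustration, and a production pipeline would more likely rely on a trained entity recognizer. The BLAST+ taxonomy step is left out.

```python
import re
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One knowledge-graph edge: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical patterns and lexicon, chosen only for this illustration.
# A run of 20 or more unambiguous nucleotide letters.
SEQUENCE_RE = re.compile(r"\b[ACGT]{20,}\b")
# Bacterial-style gene symbols such as "lacZ" followed by the word "gene".
GENE_RE = re.compile(r"\b([a-z]{2,4}[A-Z0-9][A-Za-z0-9]*) gene\b")
# A tiny stand-in for a real organism dictionary.
KNOWN_ORGANISMS = {"Escherichia coli", "Saccharomyces cerevisiae"}


def extract_triples(doc_id: str, text: str) -> List[Triple]:
    """Return the gene, organism, and sequence mentions found in text."""
    triples: List[Triple] = []
    for match in GENE_RE.finditer(text):
        triples.append(Triple(doc_id, "mentions_gene", match.group(1)))
    for name in sorted(KNOWN_ORGANISMS):
        if name in text:
            triples.append(Triple(doc_id, "mentions_organism", name))
    for match in SEQUENCE_RE.finditer(text):
        triples.append(Triple(doc_id, "discloses_sequence", match.group(0)))
    return triples


if __name__ == "__main__":
    sample = (
        "SEQ ID NO: 1 encodes the lacZ gene from Escherichia coli: "
        "ATGACCATGATTACGGATTCACTGGCCGTC"
    )
    # "PATENT-001" is a placeholder identifier, not a real patent number.
    for triple in extract_triples("PATENT-001", sample):
        print(triple)
```

Each triple links a document to one extracted entity, so the output can be loaded into a graph store as edges. In a fuller pipeline, the sequence nodes would be the ones passed to the taxonomy classification step.
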
The proof-of-concept (PoC) project showed promising results, demonstrating that knowledge graph systems can extract valuable information from unstructured data sources. As project leader, I was pleased to have contributed to the development of this solution.
