Cheng Jiahao

AI/NLP Algorithm Engineer
Shanghai, CN

About

AI/NLP Algorithm Engineer with a Master's-level focus on Natural Language Processing, specializing in large language model optimization, algorithm development, and data-driven solutions. Experience spans advertising creative generation and search relevance, with measured gains of +14.44% in offline quality metrics and +0.56% in online CTR.

Work

Xiaohongshu

Advertising Creative Algorithm Intern

Shanghai, China

Summary

Led the development and optimization of AIGC-driven advertising creative algorithms, improving material generation efficiency and online advertising performance.

Highlights

Developed and deployed multiple AIGC models for advertising creative generation, achieving an average +14.44% improvement in offline quality metrics, a 2.53x QPS increase, and online gains of +0.56% CTR, +0.27% CPM1, and +1.10% cost efficiency.

Designed and implemented a multi-source LLM distillation strategy, generating over 800,000 high-quality training samples through multi-stage filtering and online data feedback.

Optimized model training with SFT cold-start and DPO alignment, increasing model output diversity by +2.78% and raising rejection recall to 96.18% for online service safety.

Spearheaded research into black-box reward modeling and RL optimization for LLMs, achieving a new Pareto front with +2.36% average quality improvement in offline evaluations.

Optimized fine-grained embedding models for search ads, improving clustering advantage by +10.37% and NER F1 by +9.56%, expanding long-tail PV coverage by +3.98%, and reaching an 82.7% topic adoption rate.

4Paradigm

NLP Algorithm Intern

Beijing, China

Summary

Optimized full-chain document retrieval systems and developed context-aware sentence extraction methods, significantly improving retrieval accuracy and abstract quality.

Highlights

Optimized full-chain document retrieval systems, achieving a +4.2% gain in ROUGE-L F1 score.

Improved embedding representation quality by generating and integrating summaries and keywords into document chunks, enhancing contextual understanding.

Developed a novel context-aware sentence-by-sentence abstract extraction method for Chinese conference abstracts, outperforming direct extraction by +12.0% in ROUGE-L F1 score.

Tencent

Technical Research Intern

Shenzhen, Guangdong, China

Summary

Developed high-dimensional data visualization systems for MOBA game analytics and abnormal player detection, enhancing data analysis and system performance.

Highlights

Developed a high-dimensional data visualization system for MOBA game battlefield graphics and abnormal player detection.

Implemented distributed DBSCAN using Spark SQL, reducing distance calculation complexity and leveraging GraphX for efficient partition clustering.

Designed and developed hybrid Bezier spline-vector field visualization components using React and Fabric.js, enhancing interactive data representation.

Education

East China Normal University
Shanghai, China

Master's Degree

Information Resource Management (NLP Direction)

Wuhan University of Technology
Wuhan, Hubei, China

Bachelor's Degree

Information Management and Information System

Awards

Top 50 Content Creator, CSDN Python Domain

Awarded By

CSDN

Recognized as a leading contributor of high-quality content in the Python programming community.

National Second Prize, China University Computer Design Competition

Awarded By

China University Computer Design Competition

Recognized for outstanding computer design and innovative project development.

National Third Prize, 2023 China Collegiate Computer Competition Mobile Application Innovation Contest

Awarded By

China Collegiate Computer Competition

Recognized for outstanding mobile application innovation.

Finalist, COMAP Mathematical Contest in Modeling

Awarded By

COMAP

Achieved finalist status in a prestigious international mathematical modeling competition.

Second Prize, 2022 Asia-Pacific Mathematical Contest in Modeling

Awarded By

Asia-Pacific Mathematical Contest in Modeling

Awarded for excellent performance in mathematical modeling.

Languages

Chinese
English

Certificates

National Computer Rank Examination Level 4 - Database Engineer

Issued By

National Computer Rank Examination Committee

IBM Data Science Practitioner

Issued By

IBM

Skills

Programming Languages

Python, SQL, JavaScript, TypeScript.

Machine Learning & NLP

Large Language Models (LLMs), Natural Language Processing (NLP), Machine Learning, Deep Learning, Algorithm Optimization, AIGC, DPO, SFT, RAG, CoT Prompting, Hallucination Detection, Embedding Models, Topic Modeling, NER, LLM Distillation.

Frameworks & Libraries

PyTorch, DeepSpeed, vLLM, ms-swift, FastAPI, Flask, React, Spark SQL, GraphX.

Databases & Data Management

Elasticsearch, Qdrant, MongoDB, Distributed Systems.

Development Tools & Methodologies

Linux, Git, Docker, Full-stack Development, Backend Development, Frontend Development, System Optimization.

Data Science & Analytics

Data Visualization, High-dimensional Data Analysis, Statistical Modeling, Bayesian Inference, KPI Tracking, Data-driven Solutions.

Projects

Intelligent Topic Selection Platform for Scientific Research Projects

Summary

Developed an intelligent platform for scientific research topic selection, integrating hot topic analysis, value assessment, and AI-driven generation, recognized with a National Second Prize in the China University Computer Design Competition.

CoT Prompting Obscures Hallucination Cues in LLM

Summary

Led comprehensive research on LLM hallucination detection, systematically evaluating 8 methods and analyzing how CoT prompting complicates the identification of hallucination cues.

2023 COMAP Mathematical Contest in Modeling

Summary

Modeled and analyzed Wordle win rates, proposing a Letter2Vec solution for sparse letter representation and applying Bayesian inference with posterior probability correction to optimize game step decomposition.