I am a Senior Data Scientist with 8+ years of experience (both industry and academic) in developing innovative solutions to business problems using quantitative and machine learning models.
I obtained my Ph.D. from Texas A&M University, majoring in Applied Statistics and Geo-computation with a focus on signal processing, Bayesian methods, and machine learning.
I specialize in writing for publication, algorithm development, and applying machine learning models to develop analytical solutions using various kinds of data, such as time series, text, geospatial, and remote sensing data.
I am passionate about applying advanced statistical methods, such as machine learning and deep learning-based approaches, NLP, and time series analysis, to perform deep-dive analyses that identify emerging trends, pain points, and opportunity areas in data. The goal is to turn data into actionable insights that support decision making and business optimization.
Technical expertise in statistics, machine learning, NLP, time series analysis, big data processing, remote sensing, computer vision, and geospatial data engineering, based on years of experience in university research, non-governmental organizations, start-ups, and the private sector.
To predict the sentiment (positive, neutral, negative) of customer feedback using tweets about different airline companies, and to compare the performance of different models on text classification.
To develop a generalized model for prediction on large, imbalanced data that is suitable for real-time fraud detection in the PySpark framework.
To exemplify the use of ensemble models in PySpark, following the ensemble models in a [previous project using sklearn and keras](https://github.com/tankwin08/ensemble-models-ML-DL-), and to predict whether a client will subscribe (yes/no) to a term deposit (variable y) using marketing campaign data.
To investigate the trends and patterns of time series data (MODIS data) using Long Short-Term Memory (LSTM) networks and to quantify the uncertainty of the time series predictions of the target variables.
To investigate the trends and patterns of time series data (MODIS data) using Autoregressive Integrated Moving Average (ARIMA) models and Long Short-Term Memory (LSTM) networks, and to check whether the current model can be used to predict future values of the target variables.
To retrain a pretrained model, the Submatrix-wise Vector Embedding Learner (SWIVEL), using a small collected dataset of reviews, and to classify customer feedback reviews as either positive or negative.
To develop a robust approach for classifying data (whether or not a person is wearing glasses) using an ensemble of models, which includes machine learning models (Random Forest, Gradient Boosting, and Extra Trees) and a deep learning model (a neural network tuned with Bayesian optimization).
To construct the architecture of a Neural Network (NN) and conduct parameter optimization of the NN.
See all Creations for more examples!
Developed an open-source R package for the community to process waveform lidar data and exemplify its uses.
A brief introduction of articles, presentations or talks.
Make business count on data and statistics.
Award: