In this version I have: * Fixed some bugs in time_transfer
.
In this version I have: * Fixed some bugs in split_bins
,min_max_norm
,digits_num
. * Add new function of sql_hive_text_parse
for automatic production of hive SQL * Add new function of sum_table
includes both univariate and bivariate analysis and ranges from univariate statistics and frequency distributions, to correlations, cross-tabulation and characteristic analysis.
In this version I have: * Fixed some bugs in split_bins
,time_transfer
,cohort_analysis
,xgb_filter
,feature_selector
. * Rewrite the functions of plot_bar
,plot_density
,plot_line
, plot_box
,plot_relative_freq_histogram
,love_color
. * Add new functions of plot_colors
In this version I have: * Fixed some bugs in cohort_analysis
,time_transfer
,get_ctree_rules
.
In this version I have: * Fixed some bugs in plot_bar
,missing_proc
,char_to_num
. * Rewrite the logic of time_variable
. * New function plot_line
is for generating line plots.
In this version I have: * Fixed some bugs in data_cleansing
,plot_table
,check_rules
.
In this version I have: * Fixed some bugs in check_rules
, time_transfer
.
#creditmodel-1.2.2
In this version I have: * Fixed some bugs in get_ctree_rules
, ks_plot
,cross_table
.
#creditmodel-1.2.1
In this version I have: * Enhanced strategy analysis capabilities. * New function rule_value_replace
is for generating new variables by rules. * Fixed some potential bugs in ks_plot
, perf_table
,training_model
,process_nas
.
In this version I have: * Enhanced strategy analysis capabilities. * New function replace_value
is for replacing values of some variables. * Fixed some potential bugs in check_rules
, get_ctree_rules
,rules_filter
,%alike%
.
In this version I have: * New function plot_distribution
,plot_relative_freq_histogram
, plot_box
,plot_density
, plot_bar
are for data visualization. * New function swap_analysis
is for swap out/swap in analysis. * New function rules_filter
is used to filter or select samples by rules * Fixed some potential bugs in char_to_num
, merge_category
,check_rules
,get_ctree_rules
.
In this version I have: * New function cross_table
is for cross table analysis. * Fixed some potential bugs in data_cleansing
, low_variance_filter
,time_variable
,plot_vars
.
In this version I have: * New function entropy_weight
for is for calculating Entropy Weight. * New function term_tfidf
for computing tf-idf of documents. * New function plot_oot_perf
for plotting performance of over time samples in the future. * Fixed some potential bugs in get_breaks
, lift_plot
,perf_table
,model_result_plot
. * Add a parameter cut_bin to get_breaks
for cutting breaks equal depth or equal width.
In this version I have:
split_bins
, woe_transfer
time_series_proc
for time series data processing.ranking_percent_proc
,ranking_percent_dict
are for processing ranking percent variables and generating ranking percent dictionary.read_dt
to read_data
and add and parameter pattern for matching files.traing_xgb
,‘xgb_params’save_dt
to save_data
and save_data
also supports multiple data frames.In this version I have:
pred_xgb
for using xgboost model to predict new data.get_psi_plots
, psi_plot
to plot PSI of your data..p_to_score
for transforming probability to score.multi_left_jion
for left jion a list of datasets fast.read_data
for loading csv or txt data fast.In this version I have:
xgb_filter
, feature_selector
, split_bins
, ks_table_plot
, ks_psi_plot
, ks_value
.pred_score
for predicting new data using scorecard.lr_params_search
, xgb_params_search
for searching the optimal parameters. “random_search”,“grid_search”,“local_search” are available.partial_dependence_plot
, get_partial_dependence_plots
for generating partial dependence plot.cohort_analysis
, cohort_table
, cohort_plot
for cohort (vintage) analysis and visualization.perf_table
, roc_plot
, ks_plot
, lift_plot
, psi_plot
for model validation drawings.In this version I have: * Fixed some potential bugs in get_names
, digits_num
In this version I have:
data_exploration
for data exploration.missing_proc
, outliers_proc
,get_names
lasso_filter
, AUC
&K-S
is added to select the best lambda. In this way, not only can the set of variables that makes the AUC or K-S maximized be selected, but also the multicollinearity (which is difficult to eliminate by AIC in stepwise regression), can be minimized. That means instead of stepwise regression, the optimal combination of variables can be selected by lasso to solve the regression problem.K-S
or AUC
values corresponding to different lambda.auc_value
ks_value
, which can calculate Kolmogorov-Smirnov (K-S) & AUC of multiple model results quickly.