Predicting Purchasing Probability of E-Commerce Customers

Purpose: Understanding consumer behavior and anticipating purchase likelihood is crucial for businesses to flourish in today’s competitive e-commerce market. Methodology: This paper describes a data-driven technique for predicting the probability that e-commerce customers complete a purchase. The research begins with an assessment of the current literature, emphasizing the importance of consumer behavior prediction in e-commerce and explaining the difficulties of accurately anticipating purchase probability. Findings: The outcomes of this study contribute to the e-commerce business by giving concrete ideas for increasing customer interaction, optimizing marketing efforts, and tailoring personalized experiences. Unique contribution to theory, policy and practice: Based on this data analysis, people prefer mobile applications over websites for their online purchase needs due to searchability, accessibility, and other aspects.


Introduction
Identifying and forecasting consumer behavior have become critical initiatives for organizations wanting to maximize sales, improve marketing tactics, and create tailored experiences in the changing environment of e-commerce. Anticipating a consumer's likelihood of making a purchase is critical for personalizing marketing efforts, managing inventory, and boosting customer interaction.
Each year, the number of Internet users grows, resulting in an increase in the volume of online purchases (e-commerce) by 15-20% [1], greatly surpassing the expansion of traditional brick-and-mortar commerce. The primary incentives for purchasing online are said to be low pricing, time savings, and convenience [2]. Accordingly, a significant portion of today's e-commerce research investigates strategies to boost online shopping revenue, notably through a better understanding of consumer online behavior.
Research in online commerce is heavily focused on identifying and understanding the factors that drive user behavior, since such findings can have an immediate and direct impact on sales volumes. One of the long-established principles in the field has been that the more time a prospective customer spends in a visit, the more likely a purchase becomes.

Related Work
Visit duration has long been considered a crucial performance metric in e-commerce, boosting conversion rate and indicating commitment to the e-tailer. To quantify the financial return on website visit duration, data from 94 online retailers were statistically analyzed and numerous hypotheses were tested. Overall, a plausible model of user purchasing decision-making was developed; however, the partial least squares (PLS) simulations obtained did not indicate a relationship between website ranking and visit duration. A type II tobit model was put forth by the authors of an even more reputable study [3] to model the visitors' decisions to stay on the website or leave, as well as the amount of time spent viewing each page. The findings revealed that the likelihood of visitors staying and browsing varied dynamically according to the length of their site visit and the frequency of their return visits. In [4], the authors investigated several groups of variables affecting how long a website visit lasts. To estimate the impact of these characteristics on the user's visit time and number of pages viewed, a random effects model was used. The findings revealed that older persons and women spend more time on websites, while most of the sites that featured many advertisements were visited less frequently.
The above study's findings, along with those from related research projects, have been deemed useful in optimizing e-commerce websites to increase average visit durations and, presumably, sales. For the reasons listed below, the "time on site" measure has been significant in internet business for more than 20 years. The more time a person spends browsing the website's store, the more interested they become in its products [5], and this interest should result in a purchase. Furthermore, during the past few years it has been highlighted repeatedly that mobile commerce is expanding even more quickly than e-commerce, with a reported annual rate of 20-35% [6]. More than half of online store visits currently take place on mobile devices, and sales are anticipated to catch up [7]. However, in the literature known to us, it remains underexplored whether the factors influencing the purchasing behavior of potential online customers who use mobile applications are the same as for websites.

Data Source
The dataset used in this study is the E-commerce Dataset, openly accessible through the Kaggle repository. This dataset offers information on customers who prefer to purchase online. It consists of eight attributes and contains records for 500 customers. The time spent on the website and the time spent on the mobile application are our independent variables. The store's annual income (purchases made on the online store) is the dependent variable.

Method of Analysis
The method of analysis starts with the assumptions of the model. The assumptions are checked using the pairwise scatter plot (Figure 3). From the pairwise scatter plot we see that the data points appear approximately normally distributed and that there are no obvious outliers, which is consistent with constant variance. Some of the variables also appear to be strongly correlated. The null hypothesis for this problem is that the Yearly Amount Spent is affected by none of the variables, and the alternative hypothesis is that at least one of the four variables affects the Yearly Amount Spent. Computing the correlation matrix shows which variables are highly correlated. Figure 2 displays the frequency of yearly amount spent as a histogram.
Figure 2: Histogram - Yearly Amount Spent vs. Frequency
From the correlation table (Figure 4) we can see that the Yearly Amount Spent and Length of Membership are highly correlated. With that information, the regression equation for Yearly Amount Spent with all four independent variables is given in Figure 5.
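The correlation check described above can be sketched as follows. This is a minimal illustration and not the authors' code: the data are synthetic (generated to loosely resemble the variables named in the text), the column name "Time on Website" is an assumption, and the coefficients used to generate the data are illustrative only.

```python
import numpy as np

# Column names: "Time on Website" is an assumed name; the others follow the text.
COLUMNS = ["Average Session Length", "Time on App", "Time on Website",
           "Length of Membership", "Yearly Amount Spent"]

def make_synthetic_customers(n=500, seed=0):
    """Generate illustrative data; this is NOT the Kaggle dataset used in the study."""
    rng = np.random.default_rng(seed)
    session = rng.normal(33.0, 1.0, n)
    app = rng.normal(12.0, 1.0, n)
    web = rng.normal(37.0, 1.0, n)
    member = rng.normal(3.5, 1.0, n)
    # Illustrative generating coefficients, chosen only for this sketch.
    spent = (25.0 * session + 38.0 * app + 0.5 * web + 61.0 * member
             + rng.normal(0.0, 10.0, n))
    return np.column_stack([session, app, web, member, spent])

def correlation_with_response(data, columns=COLUMNS):
    """Return {predictor: Pearson r with the last column}, sorted by |r| descending."""
    corr = np.corrcoef(data, rowvar=False)   # full correlation matrix
    response_row = corr[-1, :-1]             # correlations with the response
    pairs = [(name, float(r)) for name, r in zip(columns[:-1], response_row)]
    return dict(sorted(pairs, key=lambda kv: -abs(kv[1])))

data = make_synthetic_customers()
ranking = correlation_with_response(data)
for name, r in ranking.items():
    print(f"{name:>24s}: r = {r:+.3f}")
```

On real data, the same ranking would indicate which predictors are most strongly associated with Yearly Amount Spent before any regression is fitted.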
The ANOVA table for the regression equation in Figure 5 is given in Figure 6.
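As a rough illustration of how a regression equation and its ANOVA quantities can be obtained, the sketch below fits an ordinary least squares model with four predictors and computes the sums of squares, the overall F statistic, and the adjusted R-squared. It assumes synthetic data with illustrative coefficients; the function name `fit_ols_with_anova` is hypothetical and the numbers it produces are unrelated to Figures 5 and 6.

```python
import numpy as np

def fit_ols_with_anova(X, y):
    """Fit y = b0 + X b by least squares and return coefficients plus ANOVA terms."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])        # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    sse = float(np.sum((y - fitted) ** 2))           # error (residual) sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))         # total sum of squares
    ssr = sst - sse                                  # regression sum of squares
    f_stat = (ssr / k) / (sse / (n - k - 1))         # overall F test statistic
    r2 = ssr / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return {"beta": beta, "SSR": ssr, "SSE": sse, "SST": sst,
            "F": f_stat, "R2": r2, "adj_R2": adj_r2}

# Synthetic stand-in for the four predictors (illustrative values only).
rng = np.random.default_rng(1)
n = 500
X = rng.normal([33.0, 12.0, 37.0, 3.5], 1.0, size=(n, 4))
true_beta = np.array([25.0, 38.0, 0.5, 61.0])
y = -1000.0 + X @ true_beta + rng.normal(0.0, 10.0, n)

result = fit_ols_with_anova(X, y)
print("slopes:", np.round(result["beta"][1:], 3))
print("F statistic:", round(result["F"], 1))
print("adjusted R^2:", round(result["adj_R2"], 4))
```

A large F statistic relative to the F distribution with (k, n - k - 1) degrees of freedom is what leads to rejecting the null hypothesis that none of the predictors affects the response.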
Lastly, we performed influence analysis using leverage points, Cook's distance, and DFFITS, and removed a few data points flagged by Cook's distance and DFFITS. This made the final model more robust. There are several methods for performing variable selection; one is to incrementally fit models with higher and higher order terms until the t test for the highest order term is nonsignificant. This method is known as forward selection. With the three independent variables average session length, time spent using the app, and membership length, we are able to select the optimal model for predicting the annual amount spent.
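The influence diagnostics mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the data are synthetic with one deliberately injected unusual point, and the cutoffs (2p/n for leverage, 4/n for Cook's distance, 2*sqrt(p/n) for DFFITS) are common rules of thumb that the text does not specify.

```python
import numpy as np

def influence_diagnostics(X, y):
    """Return leverage, Cook's distance, and DFFITS for an OLS fit with intercept."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]                               # number of fitted parameters
    xtx_inv = np.linalg.inv(design.T @ design)
    # Diagonal of the hat matrix: h_ii = x_i^T (X^T X)^{-1} x_i
    leverage = np.einsum("ij,jk,ik->i", design, xtx_inv, design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    mse = float(resid @ resid) / (n - p)
    cooks_d = (resid ** 2 / (p * mse)) * leverage / (1.0 - leverage) ** 2
    # Leave-one-out variance estimate, used for externally studentized residuals
    s2_loo = ((n - p) * mse - resid ** 2 / (1.0 - leverage)) / (n - p - 1)
    t_ext = resid / np.sqrt(s2_loo * (1.0 - leverage))
    dffits = t_ext * np.sqrt(leverage / (1.0 - leverage))
    return leverage, cooks_d, dffits

# Synthetic data with three predictors (session length, app time, membership).
rng = np.random.default_rng(2)
n = 500
X = rng.normal([33.0, 12.0, 3.5], 1.0, size=(n, 3))
y = X @ np.array([25.0, 38.0, 61.0]) + rng.normal(0.0, 10.0, n)
X[0] = [36.0, 15.0, 6.5]      # injected unusual customer (assumption for the demo)
y[0] = X[0] @ np.array([25.0, 38.0, 61.0]) + 200.0

leverage, cooks_d, dffits = influence_diagnostics(X, y)
p = X.shape[1] + 1
flagged = np.flatnonzero((cooks_d > 4.0 / n) & (np.abs(dffits) > 2.0 * np.sqrt(p / n)))
X_clean, y_clean = np.delete(X, flagged, axis=0), np.delete(y, flagged)
print("flagged rows:", flagged[:10], "remaining:", len(y_clean))
```

Refitting the model on `X_clean` and `y_clean` corresponds to the step of removing a few influential points before arriving at the final model.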

Results
Based on our analysis from the previous section, the final regression equation is given in Figure 7.
From the coefficient table in Figure 8, we observe that the P-value is less than the significance level of 0.05. Thus, we reject the null hypothesis in favor of the alternative hypothesis that the response variable depends on at least one of the predictor variables. In our case, the response variable (yearly amount spent) depends on 'Length of Membership', 'Average Session Length' and 'Time on App'. From the residual plot (Figure 9), we see that the points lie closely along a straight line, which is consistent with normally distributed residuals. In addition, the predictors can be treated as independent, since no multicollinearity is observed in the correlation matrix or the VIF values.
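For readers unfamiliar with the VIF check, a minimal sketch is given below. Each predictor is regressed on the remaining predictors and VIF_j = 1 / (1 - R_j^2) is reported. The data are synthetic, and the cutoff of 10 mentioned in the comments is a common rule of thumb, not a value taken from the study.

```python
import numpy as np

def variance_inflation_factors(X):
    """Compute VIF_j = 1 / (1 - R_j^2) for each column of X."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(3)
n = 500
# Three independent synthetic predictors: VIFs should be close to 1.
X_indep = rng.normal([33.0, 12.0, 3.5], 1.0, size=(n, 3))
vif_indep = variance_inflation_factors(X_indep)

# Adding a near-copy of the first column creates multicollinearity on purpose.
near_copy = X_indep[:, 0] + rng.normal(0.0, 0.1, n)
X_collinear = np.column_stack([X_indep, near_copy])
vif_collinear = variance_inflation_factors(X_collinear)

print("independent predictors:", np.round(vif_indep, 3))
print("with near-duplicate column:", np.round(vif_collinear, 1))  # values above ~10 are a warning sign
```

VIF values near 1, as in the first case, are what supports the statement that no multicollinearity is present.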
After performing all the analysis steps, which include hypothesis testing, checks of variable assumptions and independence, multicollinearity verification, and variable selection, we arrive at the best model for this problem and dataset. According to the adjusted R-squared value in the model summary table, the model explains 98.42% of the variability in 'Yearly Amount Spent'. The final coefficients table (Figure 8) shows that, when all other factors are held constant, a one-unit increase in Average Session Length corresponds to an increase of 25.721 dollars in total spending. Holding all other factors constant, a one-unit increase in Time on App corresponds to an increase of 38.746 dollars in total spending. Likewise, a one-unit increase in Length of Membership corresponds to an increase of 61.556 dollars in total spending. The confidence and prediction bands for time spent on the app versus annual amount spent are shown in Figure 11.
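The "holding all else constant" interpretation can be made concrete with a small calculation. The sketch below uses the three slopes quoted in the text; the intercept is not quoted there, so only changes in predicted spending are computed, not absolute predictions. The helper name `predicted_change` is hypothetical.

```python
# Slopes as reported in the text (Figure 8). The intercept is not quoted,
# so this sketch computes only *changes* in predicted yearly spending.
REPORTED_SLOPES = {
    "Average Session Length": 25.721,
    "Time on App": 38.746,
    "Length of Membership": 61.556,
}

def predicted_change(deltas, slopes=REPORTED_SLOPES):
    """Change in predicted yearly spending (dollars) for given predictor changes."""
    unknown = set(deltas) - set(slopes)
    if unknown:
        raise KeyError(f"no reported slope for: {sorted(unknown)}")
    # In a linear model the effects of separate predictors simply add up.
    return sum(slopes[name] * delta for name, delta in deltas.items())

one_unit_app = predicted_change({"Time on App": 1.0})
combined = predicted_change({"Time on App": 0.5, "Length of Membership": 2.0})
print(f"+1 unit Time on App: {one_unit_app:.3f} dollars")
print(f"+0.5 Time on App and +2 Length of Membership: {combined:.3f} dollars")
```

Because the model is linear and has no interaction terms, the combined effect is the sum of the individual effects, which is why each coefficient can be read in isolation.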

Conclusion
Based on the findings of the linear regression, we determine whether an e-commerce company should concentrate its efforts on its mobile app experience or on its website. The data description suggests that a user might gather all the information required to make a buying decision during one visit, and then make the actual purchase during a subsequent, shorter visit.
In several related studies, for instance [8], the authors contrasted purchases made using a mobile application with purchases made on a website. According to the findings of the data analysis, users choose mobile applications over websites for their online purchasing needs because of searchability, accessibility, and other factors. This may indicate a general movement in consumer purchasing behavior toward mobile commerce, whilst websites (browsed on desktop computers) are frequently used for different purposes. In any event, we want to emphasize that, despite being a somewhat indirect measure, the amount of time customers spend in online stores is nevertheless significant. First, it helps to keep customers interested in the store, which may result in future purchases [9]. Second, taking into account how long a visitor stays on a website is crucial when placing advertisements, which can increase revenue [10]. On the other hand, it is possible that excessive web advertising is what drives consumers to mobile applications by reducing the usefulness and aesthetic appeal of websites. One of our future research priorities is to take these mixed effects into account when modeling consumer behavior for online stores. Therefore, the conclusion is that e-commerce businesses should concentrate on the mobile app experience rather than the website experience.
As e-commerce rules change, it becomes increasingly important to incorporate predictive analytics into decision-making processes, and our research sets the framework for such strategic deployments. This study sheds light on the relationships between consumer traits, browsing patterns, and purchase probability by using multiple linear regression.
Finally, this research work not only increases theoretical knowledge but also provides concrete implications for policy formation and practical implementations in the e-commerce industry. The insights obtained pave the way for better informed decision-making, individualized consumer encounters, and improved business strategies, all of which contribute to the development and long-term viability of e-commerce operations.

Figure 11: Confidence and Prediction Bands