Wherever the response is binary, one of the most often utilized regression approaches in the business, which is widely employed across fraud detection, credit card scoring, and clinical trials, has a significant benefit. One of the key advantages of this popular technique is that it allows for the inclusion of many dependent variables, which can be continuous or dichotomous. Another significant feature of this supervised machine learning approach is that it offers a quantifiable value to quantify the strength of correlation in relation to the remaining variables. Despite its popularity, experts have pointed out its shortcomings, including a lack of rigorous approach as well as a high model reliance.

Today, organizations utilize Logistic Regression to estimate home prices in the real estate industry, customer lifetime value in the insurance industry, and to provide a continuous result such as whether a client can purchase/will buy scenario.


  • from sklearn.linear_model import LogisticRegression

Full Code Exercise of Logistic Regression

The dataset used in this study is from Kaggle and is freely available as the Fake and true news dataset. There are two CSV files in this data collection, one for factual news and one for fraudulent news. Each has the following attributes: title, text, subject, and date. The genuine and false CSV files include 21417 true news data and 23481 fraudulent news data, respectively. To train the model for classification, we will add a column labeled "real" or "fake" to both datasets. You can donwload here.