Develop skills - Data Scientists Competitions - Future Articles - Building a Team - Stock trading recommendations - Digital Store - AI Tools

Joined February 2026
Education & Skills
Are Traditional Interviews Truly Fair?

For decades, traditional job interviews have been the primary tool for hiring, yet they are often criticized as a source of unfairness. The reason is clear: interviews don’t always reflect candidates’ true abilities or potential. Instead, they tend to focus on confidence, appearance, or conversational skills during a short meeting.

Where Traditional Interviews Fall Short
Stress and anxiety: Many talented candidates fail to show their best due to nervousness or pressure.
Language and communication barriers: Applicants who aren’t fluent in the interview language may be overlooked despite strong practical skills.
Quick impressions: Interviews often evaluate charm or presentation more than actual competence.
Interview experience: Some candidates master “perfect answers” but lack strong practical abilities.

How Companies Are Correcting This
Recognizing these challenges, many organizations now adopt fairer evaluation methods, such as:
Practical skill tests: Writing code, preparing reports, or solving technical problems.
Case studies: Presenting real-world scenarios and asking candidates to propose solutions.
Portfolio reviews: Assessing previous work to validate expertise.
Pilot projects: Assigning short-term tasks to measure actual performance.
Missing Values in Data and How to Handle Them

Missing values are among the most common problems encountered in data analysis. They occur when a value for a particular variable is unavailable in one or more records. Examples include failing to record a customer’s age, a participant not answering a survey question, or a measuring device malfunctioning during an experiment.

An empty cell may appear to be a simple problem, but handling it incorrectly can lead to inaccurate results, the loss of a large amount of data, or the identification of false relationships between variables.

What Are Missing Values?
A missing value is a value that should have been recorded but is not available in the dataset. Missing values may appear in different forms in software, such as:
An empty cell.
The symbol NA.
The symbol NULL.
A question mark.
Special numbers such as 999 or -1, when these numbers are used by the person responsible for collecting the data to indicate that no response was provided.
It is important to confirm that numbers such as 999 are not genuine values before treating them as missing data.

Causes of Missing Values
Missing values may occur for several reasons, including:
A person refuses to answer a particular question.
Some information is accidentally not entered.
An error occurs during data transfer.
A measuring device malfunctions.
Part of a file is lost.
A question does not apply to certain participants.
Some participants withdraw before the study is completed.
Therefore, the first correct step is not to delete the missing values immediately, but to try to understand why they are missing.

Types of Missing Data

First: Missing Completely at Random (MCAR)
This occurs when the probability that a value is missing is unrelated to any variable in the dataset, whether observed or unobserved.
Example: Some questionnaires are lost because of random damage to a storage device.
In this case, deleting a small number of records may be acceptable because the records with missing values do not systematically differ from the remaining records.

Second: Missing at Random (MAR)
This occurs when the missingness can be explained using other information that is available in the dataset.
Example: Older people may be less likely to respond to an online survey. In this case, the missing responses are related to age, which is an observed variable.
Methods such as multiple imputation or statistical models that use the available variables can be applied in this situation.

Third: Missing Not at Random (MNAR)
This occurs when the probability that a value is missing is related to the missing value itself or to a factor that has not been recorded in the dataset.
Example: People with very low or very high incomes may refuse to report their income. Therefore, the probability that income is missing depends on the income value itself.
This is the most difficult type of missing data because the available information alone may not be sufficient to estimate the missing values reliably. The researcher must investigate the reason for the missingness, make different assumptions, and conduct a sensitivity analysis.

How Is the Percentage of Missing Values Calculated?
The percentage of missing values for each variable can be calculated using the following formula:
Percentage of missing values = (Number of missing values ÷ Total number of records) × 100
For example, suppose a dataset contains 1,000 records, and the age value is missing in 80 records. The percentage of missing values in the age variable is therefore 8%.
It is preferable to calculate the percentage of missing values for every column and every row rather than calculating only one percentage for the entire dataset. One column may contain a very high percentage of missing values, while the other columns may be complete.
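The formula above can be sketched in a few lines of Python. This is only an illustration: the toy records, the column names, and the use of None to mark a missing value are assumptions, not part of the original post.

```python
# Hypothetical toy dataset: each record is a dict, and None marks a missing value.
records = [
    {"age": 34, "income": 52000, "city": "Cairo"},
    {"age": None, "income": 61000, "city": "Amman"},
    {"age": 29, "income": None, "city": None},
    {"age": 41, "income": 48000, "city": "Dubai"},
]


def missing_pct_by_column(rows):
    """Return {column: percentage of rows in which that column is None}."""
    columns = rows[0].keys()
    return {
        col: 100 * sum(row[col] is None for row in rows) / len(rows)
        for col in columns
    }


def missing_pct_by_row(rows):
    """Return one percentage per row: the share of its fields that are None."""
    return [
        100 * sum(value is None for value in row.values()) / len(row)
        for row in rows
    ]


column_pct = missing_pct_by_column(records)
row_pct = missing_pct_by_row(records)

# The example from the text: 80 missing ages out of 1,000 records.
article_example = 100 * 80 / 1000

print(column_pct)
print(row_pct)
print(article_example)
```

Looking at both views matters: in this toy data every column is 25% missing, yet the third record alone is missing two of its three fields.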
The Appropriate Decision Based on the Percentage of Missing Values
There is no single scientific percentage that is suitable for every project. However, the following guidelines may be used as a starting point.

Less Than 5%
This is generally considered a low percentage. If the data are missing completely at random and the remaining number of records is sufficient, the records containing missing values may be deleted. However, records should not be deleted automatically when the variable is highly important or when the missing records belong to a particular group.

From 5% to 20%
This is considered a moderate percentage. Deleting all incomplete records is generally not recommended because it may result in the loss of a significant part of the sample. The following methods may be used:
The median for skewed numerical variables in simple analyses.
The mode for categorical variables.
Imputation using regression or the k-nearest neighbours method.
Multiple imputation in rigorous statistical studies.
Replacing missing values with the mean without careful consideration is not recommended because it reduces variability and may alter the relationships between variables.

From 20% to 40%
This is considered a high percentage. Deleting records becomes more risky because it may result in a small or biased sample. Multiple imputation or maximum-likelihood methods are generally preferable. Auxiliary variables related to the missing variable or to the reason for its missingness should also be included. The results should be compared using more than one method, and a sensitivity analysis should be conducted.

From 40% to 60%
This is considered a very high percentage. The importance of the variable must be evaluated:
If the variable is not essential and suitable alternative variables are available, deleting it may be the most appropriate decision.
If the variable is necessary, the researcher should search for another source of data or attempt to collect the data again.
If this is not possible, advanced models may be used, but the results should clearly indicate that there is a high level of uncertainty.

More Than 60%
This situation is generally considered critical. In many projects, deleting the variable may be safer when it is not essential because most of its information is unavailable. However, if the variable is the main subject or outcome of the study, deleting it is not an appropriate solution. It may be necessary to collect the data again or redesign the study.
This does not mean that a variable with more than 60% missing values can never be used. The remaining information may still be valuable, but the decision must be supported by a clear scientific justification.

Why Is the Percentage Alone Not Enough to Make a Decision?
A missing-data rate of 10% may be more dangerous than a rate of 40% in another situation. For example, if 10% of income values are missing only among people with high incomes, deleting these records will produce an artificially low average income. In contrast, losing 40% of the values of a secondary variable may be less serious if the missingness is random and other variables are available to help predict its values.
Therefore, the decision depends on five main factors:
The percentage of missing values.
The cause and type of missingness.
The importance of the variable.
The amount of data that remains available.
The purpose of the analysis, whether descriptive, predictive, or inferential.

Main Methods for Handling Missing Values

Deleting Rows
This method is suitable when the percentage of missing values is low, the data are missing completely at random, and the number of remaining records is sufficiently large. Its disadvantage is that it reduces the sample size and may introduce bias when the missingness is not completely random.
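The bias from deleting rows can be seen with a small made-up example. The ten income values below are hypothetical, and the assumption that only the highest earner withholds an answer is chosen purely to mirror the 10% high-income scenario described earlier.

```python
from statistics import mean

# Hypothetical "true" incomes for ten people (normally unknown to the analyst).
true_incomes = [20, 25, 30, 35, 40, 45, 50, 55, 60, 200]

# Missing not at random: assume the single highest earner refuses to answer,
# so 10% of the income values are missing.
observed = [x if x < 200 else None for x in true_incomes]

# Row deletion: keep only the records that have a reported income.
complete_cases = [x for x in observed if x is not None]

true_mean = mean(true_incomes)       # 56
deleted_mean = mean(complete_cases)  # 40

print(true_mean, deleted_mean)
```

Only one record in ten was dropped, yet the estimated average falls from 56 to 40, because the deleted record was not a random one.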
Deleting the Variable
This method may be used when the percentage of missing values is extremely high, the variable is not essential, and suitable alternative variables are available.

Imputation Using the Mean, Median, or Mode
This is a simple and quick method, but it does not represent the uncertainty associated with the missing values. The median is usually more appropriate than the mean when the data contain outliers or are highly skewed. The mode is suitable for categorical variables. However, these methods are generally not considered the best options for advanced statistical studies.

Model-Based Imputation
The missing value is predicted using other variables through methods such as regression, decision trees, or the k-nearest neighbours algorithm. This approach may be useful in machine-learning projects, but the quality of the imputed values depends on the strength of the relationships between the variables.

Multiple Imputation
Multiple imputation creates several versions of the dataset. In each version, different plausible values are inserted in place of the missing values. All versions are then analysed, and the results are combined. The main advantage of this method is that it takes uncertainty into account instead of treating a single predicted value as if it were certainly correct.

Sensitivity Analysis
Sensitivity analysis is particularly useful when the data are suspected to be missing not at random. It involves testing different assumptions about the missing values and examining whether the final results change significantly. If the results change substantially when the assumptions are changed, the findings should be interpreted with caution.
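As a minimal sketch of the simplest method above, the snippet below fills a numeric column with its median and a categorical column with its mode. The column names and values are invented for illustration, and None is assumed to mark a missing value. It covers single-value imputation only, not model-based or multiple imputation.

```python
from statistics import median, mode, variance

# Hypothetical columns: None marks a missing value.
ages = [22, 25, None, 31, 38, None, 90]           # numeric, with one outlier
plans = ["basic", "pro", "basic", None, "basic"]  # categorical


def impute(values, fill):
    """Replace every None in values with the given fill value."""
    return [fill if v is None else v for v in values]


observed_ages = [v for v in ages if v is not None]
observed_plans = [v for v in plans if v is not None]

age_fill = median(observed_ages)   # the median is robust to the outlier 90
plan_fill = mode(observed_plans)   # most frequent category

imputed_ages = impute(ages, age_fill)
imputed_plans = impute(plans, plan_fill)

print(imputed_ages)
print(imputed_plans)
# Filling every gap with one constant shrinks the spread of the column.
print(variance(observed_ages), variance(imputed_ages))
```

The last line illustrates the limitation noted in the text: every gap receives the same value, so the imputed column has a smaller variance than the observed values and carries no record of the uncertainty.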
IFAI Technology
Can the Human Brain Be Electronically Hacked in the Future?

With the rapid pace of technological advancement, the discussion about hacking is no longer limited to electronic devices. It now extends to the most complex part of the human body: the brain. With the emergence of Brain-Computer Interface (BCI) technology, direct communication with neural signals has become possible, opening the door to possibilities that are both fascinating and frightening.

Between Science Fiction and Reality
What was considered science fiction a few decades ago is now edging closer to reality. Modern research aims to develop neural chips capable of reading thoughts or controlling prosthetic limbs through brain signals. Yet this direct connection between the brain and machines also creates a new attack surface for cyberattacks.

The Concept of “Brain Hacking”
Experts refer to this potential threat as “neural hacking”: the possibility that an external entity could manipulate neural signals or steal brain data such as memories or behavioral patterns. Although this scenario remains largely theoretical, the existence of implanted or internet-connected devices makes it technically conceivable.

Potential Threats
Neural data theft: Extracting preferences or emotional responses.
Behavioral manipulation: Sending signals that influence decisions or emotions.
Unauthorized control of neural implants: Such as prosthetic limbs or neural stimulators.

Security and Ethics
To prevent such risks, scientists propose developing neural security systems similar to computer firewalls, alongside strict ethical oversight of brain technologies. Just as we protect our digital data, we must also safeguard our neural privacy.
DATA VISUALIZATION
IFAI Future Money & Investment
The Most Effective Strategy for Building Wealth with a Small Amount of Money

HAGO members agree that the best strategy for someone who has a small amount of money and wants to gradually build wealth over the coming years is to follow the Dollar-Cost Averaging (DCA) approach by investing in low-cost, diversified index funds or exchange-traded funds (ETFs).

Consistent investing: Contributing a fixed and reasonable amount regularly, whether the market is up or down, buys more shares when prices are low and fewer when they are high, which keeps the average cost per share at or below the average price over the period.
Freedom from “market timing” stress: You don’t need to worry about picking the perfect moment to buy or sell.
Fractional shares: Using platforms that allow fractional share purchases ensures every dollar is put to work immediately.
Dividend Reinvestment Plans (DRIP): Dividends are automatically reinvested to buy more shares, boosting the power of compounding.

The Power of Compounding
Over the years, small monthly contributions can grow into significant long-term capital thanks to accumulated dividends and reinvestments. This approach may seem “boring,” but it is disciplined, reliable, and anchored in broad market indices like the S&P 500, rather than speculative individual stocks.
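The arithmetic behind dollar-cost averaging can be checked with a short sketch. The prices and the monthly amount below are invented for illustration and are not market data. The example shows only how the average cost is computed and says nothing about future returns.

```python
# Hypothetical share prices over five months; a fixed 100 is invested each month.
prices = [10.0, 8.0, 5.0, 8.0, 10.0]
monthly_amount = 100.0

# Fractional shares are assumed to be allowed, so the full amount is always invested.
shares_bought = [monthly_amount / price for price in prices]

total_invested = monthly_amount * len(prices)
total_shares = sum(shares_bought)

average_cost = total_invested / total_shares  # what was actually paid per share
average_price = sum(prices) / len(prices)     # simple average of the five prices

print(total_shares)
print(round(average_cost, 2), average_price)
```

With these numbers the investor ends up with 65 shares at an average cost of about 7.69, below the 8.20 average price, because the fixed amount bought the most shares in the month the price was lowest.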
IFAI Future Money & Investment members at HAGO foresee that the most promising long-term investment sectors in the stock market will be those driven by technological innovation, digital transformation, and environmental sustainability. As the world moves toward artificial intelligence and green energy, these industries are expected to lead global economic growth.

Top Future Investment Sectors

Information Technology (IT)
This includes Artificial Intelligence (AI), Cloud Computing, and Cybersecurity. The growing need for smart solutions to protect and analyze data makes these areas among the most stable and high-growth investments.

Healthcare and Pharmaceuticals
With aging populations and the rise of digital health and telemedicine, this sector offers steady returns and long-term potential, especially for companies integrating technology into medical services.

Energy and Renewable Energy
The global shift toward clean energy supports companies producing batteries, lithium, and solar power. The increasing demand for electric vehicles and eco-friendly fuels makes this one of the most attractive sectors for future investors.

Infrastructure and Industry
Smart infrastructure, sustainable construction, and Industry 4.0 technologies (automation and digital transformation) are propelling this sector forward with strong growth prospects.

Financial / FinTech
The modernization of digital payment systems and banking technologies creates lucrative opportunities for investors in innovative and secure financial platforms.

Key Steps Before Investing
Check Management Quality: Ensure company leaders are ethical and competent.
Review Financial Health: Choose firms with consistent dividends and low debt.
Think Long-Term: Focus on sustainable growth rather than quick profits.
What are the different tasks of a Data Engineer, a Data Scientist, and a Machine Learning Engineer?
What is the difference between supervised learning and unsupervised learning?
Machine Learning What is the difference between Machine Learning and Deep Learning?
IFAI Future Money & Investment If someone wants to start an online technology company with a small budget, what are the best fields or projects that can begin with low costs and generate huge profits in the future? The answer is here skool.com/hago-8156 Share the question and earn money
Data Processing
You have a column named City that contains a very large number of different city names, and you want to use it in a machine learning model. What potential problem might arise?

Answer: Using a categorical column like City directly in a machine learning model can lead to several major problems. The main ones are listed below:
High Cardinality: The column can contain hundreds or thousands of different city names. One-Hot Encoding creates a new column for each city, which greatly increases the number of input features and can cause memory issues.
Slowness and Overfitting: The number of parameters in the model increases significantly because of the extra columns. The model takes longer to train and may memorize small details or noise in the training data, so it does not perform well on new data.
Label Encoding Limitations: If the cities are numbered alphabetically as 1, 2, 3, and so on, the model may assume a mathematical relationship or hierarchy between the cities that does not actually exist.
Out-of-Vocabulary (OOV): If a city appears at prediction time that the model never saw during training, the encoder may raise an error or the model may handle the value incorrectly.
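Two of the problems above, the growing column count and the out-of-vocabulary failure, can be reproduced with a hand-written one-hot encoder. The city list and both helper functions are hypothetical. The extra "unknown" slot is one common workaround, shown here as an assumption rather than as the answer the post has in mind.

```python
# Hypothetical training data for the City column.
train_cities = ["Cairo", "Amman", "Dubai", "Cairo", "Riyadh", "Amman"]

# One-hot encoding needs one column per distinct city seen in training.
vocab = {city: index for index, city in enumerate(sorted(set(train_cities)))}


def one_hot(city):
    """Strict encoder: raises KeyError for a city not seen in training."""
    vector = [0] * len(vocab)
    vector[vocab[city]] = 1
    return vector


def one_hot_with_unknown(city):
    """Lenient encoder: one extra slot is shared by all unseen cities."""
    vector = [0] * (len(vocab) + 1)
    vector[vocab.get(city, len(vocab))] = 1
    return vector


width = len(one_hot("Cairo"))  # equals the number of distinct cities

try:
    one_hot("Tunis")           # "Tunis" never appeared in the training data
    strict_failed = False
except KeyError:
    strict_failed = True

unknown_vector = one_hot_with_unknown("Tunis")

print(width, strict_failed, unknown_vector)
```

With four distinct cities the vector has four columns. With thousands of cities it would have thousands, which is the high-cardinality problem in miniature.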
Deep Neural Network
You are building a very deep neural network with 15 layers for image classification, and you use the Sigmoid activation function in all the hidden layers. During training, you notice that the weights in the early layers (close to the inputs) hardly change, and the loss decreases very slowly, as if the model is not learning. What is the mathematical reason for this problem, and how can you solve it?
The answer is here: skool.com/hago-8156
Share the question and earn money
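As a hint toward the mathematics, the sketch below looks only at the derivative of the Sigmoid function. It assumes, for illustration, that the backpropagated gradient is multiplied by one Sigmoid derivative per layer across 15 layers and ignores the weight matrices. It is not the linked answer.

```python
import math


def sigmoid(x):
    """Logistic sigmoid function."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative peaks at x = 0, where it equals 0.25.
max_grad = sigmoid_grad(0.0)

# Best case after 15 sigmoid activations: 0.25 multiplied by itself 15 times.
best_case_factor = max_grad ** 15

# Away from zero the unit saturates and the derivative is smaller still.
saturated_grad = sigmoid_grad(5.0)

print(max_grad)
print(best_case_factor)
print(saturated_grad)
```

Even in the best case the factor is below one in a billion, which is consistent with early-layer weights that barely move. An activation whose derivative is 1 over part of its range, such as ReLU for positive inputs, does not shrink the gradient in this way.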