Every day, we answer questions about data analyst skills. Some people tell me, “Since I started, I have only used Excel and made analysis charts. To me, a data analyst seems to be just an analyst of business data, and I don’t know how to improve myself.”
This usually happens because they have not fully tapped the value of big data analysis. Data analysis means optimizing products, marketing strategies, and operation strategies by exploring the data. Knowing the business alone is not enough; it is even more important to master the various skills of data analysis. Based on my years of experience, I have summarized ten skills that a qualified senior data analyst must master.
Statistics: law of large numbers, rank sum test, regression, forecasting
Data Visualization Tools: Excel, professional software, Python
Big Data Processing Framework: Hadoop, Storm, Spark
Databases: SQL, MySQL, DB
Data Mining Skills: Matlab, R, Python
Programming Language: Java, Python
1. Statistics
As we all know, statistics is the cornerstone of data analysis and, of course, a core skill for data analysts. After you study statistics, you will find that many everyday analyses are not as accurate as they seem. For example, many people like to summarize results with the average, but the average is often a rough measure because a few extreme values can distort it. Statistics helps us look at data more scientifically and gradually get closer to the “truth” behind it.
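To illustrate why the average can be rough, here is a minimal Python sketch. The order values are hypothetical and chosen so that one outlier dominates; the sketch only compares the mean with the median.

```python
import statistics

# Hypothetical order values; the last entry is a single extreme outlier.
order_values = [20, 22, 19, 21, 23, 20, 500]

mean_value = statistics.mean(order_values)      # pulled upward by the outlier
median_value = statistics.median(order_values)  # middle value, robust to the outlier

print(f"mean:   {mean_value:.2f}")
print(f"median: {median_value:.2f}")
```

Six of the seven orders are close to 20, yet the mean lands far above them, while the median stays with the typical order.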
Systematic study is the key to learning statistics well. Pure machine learning emphasizes predictive ability and the implementation of algorithms, whereas statistics has always emphasized interpretability. You need to understand the principles behind the data. The following topics are all worth learning.
Statistical methods related to data mining: multivariate logistic regression analysis, nonlinear regression analysis, discriminant analysis, etc.
Quantitative methods: time series analysis, probability models, optimization
Decision analysis: multi-objective decision analysis, decision trees, influence diagrams, sensitivity analysis
Analysis of competitive advantages: learning basic analytical concepts through projects and success cases
Database principles: data models, database design
Predictive analysis: time series analysis, principal component analysis, nonparametric regression, statistical process control
Data management: ETL (Extract, Transform, Load), data governance, management responsibility, metadata
Optimization and heuristics: integer programming, nonlinear programming, local search, metaheuristics (simulated annealing, genetic algorithms)
Big data analysis: concepts of unstructured data, MapReduce technology, big data analysis methods
Data mining: clustering (k-means method, segmentation method), association rules, factor analysis, survival analysis
Risk analysis and operational analysis using computer simulation
Software-level analytics: analytical topics at the organizational level, IT and business users, change management, data topics, presentation, and communication
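As one small example from the list above, regression and forecasting can be sketched in a few lines of Python. This is a minimal ordinary least squares fit for a single predictor. The function name `fit_line` and the ad spend and sales figures are hypothetical, used only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: monthly ad spend vs. monthly sales (both in thousands).
ad_spend = [1, 2, 3, 4, 5]
sales = [12.1, 13.9, 16.2, 17.8, 20.0]

slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 6 + intercept  # forecast sales for an ad spend of 6

print(f"sales = {slope:.2f} * ad_spend + {intercept:.2f}")
print(f"forecast for ad_spend = 6: {forecast:.2f}")
```

Unlike a black-box prediction, the slope can be read directly as the expected change in sales per unit of ad spend, which is the interpretability that statistics emphasizes.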
2. Data Visualization Tools
Data visualization is mainly done with two types of tools: those that require programming and those that do not. For data analysts in most industries, it is not necessary to master programming-based visualization tools. I recommend the following tools:
2.1 Excel
Excel is a common data display tool. For data analysts, the skills to master in Excel include presenting data proficiently in Excel charts and knowing how to format the charts that are generated.
Learning Excel is a gradual process. You start with basics such as beautifying the series format, enhancing the 3D format, and setting the axes and gridlines.
Charts can then be combined with functions or macros to produce simulated charts or advanced charts with interactive effects, such as the population distribution of cities and counties on a map, for better data analysis and viewing.
The data analysis features in Excel can handle much of what professional statistical software (R, SPSS, SAS, Matlab) does, including descriptive statistics, correlation coefficients, probability distributions, mean estimation, linear and nonlinear regression, multiple regression analysis, time series, and more.
Familiarity with the various features of Excel is essential for a good data analyst.
2.2 Professional software
Data visualization tools like D3.js, Highcharts, Tableau, and Power BI all have their advantages. You must be proficient in at least one visualization tool. To find out which tools are worth learning, you can read the articles “9 Data Visualization Tools That You Cannot Miss in 2019” or “Compare 6 Types and 14 Data Visualization Tools”.
In my case, I often use FineReport at work. As reporting and data visualization software, FineReport has two core functions: data entry and data display. What pleasantly surprised me is its large number of built-in charts and dynamic visual effects. It can produce dashboards in many formats, including large-screen dashboards for TV displays.
2.3 Python
Those who have learned Python data analysis know that Python has many excellent third-party visualization libraries, such as matplotlib, seaborn, plotly, Bokeh, pyecharts, etc. Each of these libraries has its own advantages and is widely used in practice. Mastering Python is a sound choice when building data analyst skills.
Python Weekly: Updated every week, including blogs, tutorials, lectures, books, careers, and so on
Python Challenge: A game in which each level can be solved with a bit of (Python) programming
Python official website: Has rich documentation
10 Python blogs worth following: 10 blogs to learn about Python
3. Big Data Processing Framework
If you want to move beyond routine business work and become a big data analyst, understanding the basics of big data frameworks is also a required skill.
A big data processing framework is responsible for the computation in a big data system. The data may be read from persistent storage or enter the system through message queues, and computation is the process of extracting information from that data.
These systems can be classified as batch processing, stream processing, or hybrid systems, according to the form of the data and the timeliness of the results. A typical batch processing system is Apache Hadoop; typical stream processing systems include Apache Storm and Apache Samza; Apache Spark and Apache Flink are hybrid processing systems.
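The difference between batch and stream processing can be sketched in plain Python. This is only a conceptual illustration and does not use the Hadoop, Storm, or Spark APIs. The event list and the function names `batch_count` and `stream_count` are hypothetical.

```python
from collections import Counter

# Hypothetical click-stream events.
events = ["view", "click", "view", "buy", "view", "click"]


def batch_count(stored_events):
    """Batch style: the whole bounded data set is available before computing."""
    return Counter(stored_events)


def stream_count(event_source):
    """Stream style: update state per event and emit a result after each one."""
    running = Counter()
    for event in event_source:
        running[event] += 1
        yield dict(running)


batch_result = batch_count(events)
stream_snapshots = list(stream_count(iter(events)))

print("batch result:   ", dict(batch_result))
print("first snapshot: ", stream_snapshots[0])
print("final snapshot: ", stream_snapshots[-1])
```

The batch job returns one answer after reading everything, while the stream job produces an up-to-date answer after every event; both agree once the stream has consumed the same data.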
4. Databases
Some data analysts are responsible for data cleaning, which is relatively simple. Others do modeling, but mastering the commonly used machine learning algorithms is not enough if you want to be outstanding. To be first-class, you need to learn the essence of each algorithm and also master the fundamentals of databases.
SQL is the core technology of databases, so pay close attention to it when learning data analysis.
Currently, MySQL, SQL Server, and Oracle are still among the most widely used databases. Data analysts must know their common statements and functions.
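As a minimal sketch of the kind of SQL an analyst writes every day, the example below uses Python's built-in sqlite3 module. SQLite is an assumption made here so the example runs without a server; it stands in for MySQL, SQL Server, or Oracle, whose dialects differ in details. The tables and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database with two small hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 80.0), (4, 3, 30.0);
""")

# JOIN the tables, aggregate per region, filter groups, and sort the result.
rows = conn.execute("""
    SELECT c.region, COUNT(o.id) AS order_count, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    HAVING SUM(o.amount) > 50
    ORDER BY revenue DESC
""").fetchall()
conn.close()

for region, order_count, revenue in rows:
    print(region, order_count, revenue)
```

SELECT, JOIN, GROUP BY, HAVING, ORDER BY, and aggregate functions such as COUNT and SUM cover a large share of routine analysis queries.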
5. Data Warehouse
When analyzing data, we often come across terms such as data warehouse. The data warehouse is vital to data analysis. It is a subject-oriented, integrated, relatively stable collection of data that reflects historical changes.
The most important part of data analysis work is data processing. In my experience as a data analyst, data processing often takes up more than 70% of the time in the whole analysis process. The data warehouse offers integration, stability, and high quality, so doing data analysis on top of a data warehouse helps ensure data quality and data integrity. That is why understanding data warehouses is one of the data analyst’s skills.
6. Artificial Intelligence (for advanced data analysts)
Strictly speaking, artificial intelligence and data analysis have a clear boundary and do not belong to the same field. Therefore, this skill is mainly required of data scientists who work on big data analysis. You can skip this section if you are a newbie.
The knowledge covered by machine learning and artificial intelligence is very broad and deep, so problem-based learning works best. First, select a problem. Then find resources to solve it, and look further into the terms and concepts you encounter along the way.
7. Machine learning
Machine learning is a branch of artificial intelligence. Machine learning algorithms are a class of algorithms that automatically analyze data to obtain rules and then use those rules to make predictions on unknown data. They have been widely used in data mining, computer vision, natural language processing, search engines, medical diagnosis, securities market analysis, and other fields.
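To make "learn from known data, predict unknown data" concrete, here is a minimal k-nearest-neighbors classifier in plain Python. It is only one simple algorithm chosen for illustration, and the customer data, the segment labels, and the function name `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: (monthly visits, monthly spend) -> customer segment.
training_data = [
    ((2, 10), "casual"), ((3, 15), "casual"), ((1, 5), "casual"),
    ((20, 200), "loyal"), ((25, 260), "loyal"), ((22, 240), "loyal"),
]

prediction_a = knn_predict(training_data, (4, 20))
prediction_b = knn_predict(training_data, (21, 230))

print(prediction_a, prediction_b)
```

In practice the features would usually be scaled first, because a feature with a large range (spend here) otherwise dominates the distance.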
8. Data Mining Skills
Operating data mining software is one of the necessary skills for data analysts. Data mining is the core application of most business intelligence initiatives, and data mining software can help you discover insights from large amounts of data.
Combining development skills with advanced algorithms rounds out a data analyst’s data mining skills. Many people think this is hard. In fact, the algorithms are not complicated. They become much easier when you combine them with an actual business background and take a problem-solving approach.
The basic data mining algorithms that must be mastered mainly include classification algorithms, clustering algorithms, association analysis, link analysis, etc.
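Association analysis is a good example of an algorithm that becomes simple once it is tied to a business question such as "do customers who buy bread also buy milk?" The sketch below computes the two basic measures, support and confidence, in plain Python. The baskets and the function names are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


support_bread_milk = support({"bread", "milk"}, transactions)
confidence_bread_to_milk = confidence({"bread"}, {"milk"}, transactions)

print(f"support(bread, milk)     = {support_bread_milk:.2f}")
print(f"confidence(bread -> milk) = {confidence_bread_to_milk:.2f}")
```

Algorithms such as Apriori build on exactly these two measures; they mainly add an efficient way to search through many candidate itemsets.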
Three books you can’t miss:
Pattern Recognition and Machine Learning
The Elements of Statistical Learning
9. Programming Language
Proficiency in some programming languages can make data analysis work more flexible. Programming languages are suitable for all types of data. Most new and impressive dashboards can be built with code or with drawing software.
Which language should you choose? It depends on the situation.
If you do heavy statistical analysis with obscure methods, R will be your best partner. If you do NLP or intensive neural network processing on GPUs, Python is better. If you want a robust, production-oriented data flow solution with all the important operational tools, Java or Scala is a good choice.
Take R as an example. R is a favorite analysis tool among statisticians. It is open source and free, and its graphics capabilities are very powerful.
R is designed for data analysis and was initially intended for statisticians and data scientists. However, with the increasing popularity of data analysis, the use of R is no longer limited to them.
The workflow in R is straightforward, and many packages are available for it. Just load the data into R and write one or two lines of code to create a chart. For example, the Portfolio package can quickly create a hierarchy diagram.
10. Write Reports
The ability to write reports is also one of the critical analyst skills.
Writing a data analysis report means summarizing and presenting the whole data analysis process. The report lays out the causes, process, results, and recommendations of the analysis for decision-makers to consider.
So, what is a good data analysis report?
An excellent analysis report must have a sound analytical framework, a definite conclusion, and suggestions or solutions.
In addition to the hard skills above, soft skills such as data sensitivity, logical thinking, inductive reasoning, critical thinking, communication, and a sense of responsibility are also essential for excellent analysts. Besides, analysts who can think from a higher angle or from a manager’s standpoint can stand out among numerous analysts.
Some of these skills are acquired before entering the workforce, while others need to be built up after entering the industry. Growing into an excellent data analyst requires strong professional qualities and technical ability, which cannot be achieved in a short period. It requires continuous growth in practice.
Assistant Professor, IT Department