Monday, May 31, 2021

Data analysis with MS Excel

This article explains how Excel features are used in the data analysis process:

1. Sort: You can sort a range of cells by one or more columns to arrange the data in the order you need, either A-to-Z or Z-to-A. This is a very simple and easy method for data analysis. Custom Sort has advanced features such as sorting by cell colour, font colour, and cell icon.
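The same idea can be sketched outside Excel. The minimal Python example below assumes a small hypothetical list of sales records (names and amounts are made up). The two-key sort mimics adding two levels in the Custom Sort dialog: region A-to-Z, then amount Z-to-A.

```python
# Hypothetical sample rows: (region, salesperson, amount)
rows = [
    ("West", "Priya", 420),
    ("East", "Ravi", 310),
    ("West", "Anil", 560),
    ("East", "Meena", 310),
]

# Single-level sort: region A-to-Z.
# Python's sort is stable, so rows with the same region keep their original order.
by_region = sorted(rows, key=lambda r: r[0])

# Two-level sort: region A-to-Z, then amount Z-to-A (largest first)
by_region_then_amount = sorted(rows, key=lambda r: (r[0], -r[2]))

for row in by_region_then_amount:
    print(row)
```

Negating the amount is a simple trick for mixing ascending and descending keys in one sort.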

2. Conditional Formatting: Conditional formatting is a very rich feature for data analysis in Excel that lets users highlight cells in different colours depending on each cell's value or percentage. By applying conditional formatting to your data, you can identify variances in a range of values at a glance. The following example gives a quick answer about high temperature values.
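As a rough analogue, the Python sketch below mimics an "Above Average" highlight rule (one of Excel's built-in Top/Bottom rules) on a hypothetical set of daily high temperatures. Excel would colour the matching cells; here the matching days are simply flagged.

```python
from statistics import mean

# Hypothetical daily high temperatures (°C)
temps = {"Mon": 31, "Tue": 35, "Wed": 29, "Thu": 38, "Fri": 33}

avg = mean(temps.values())

# Mimic an "Above Average" rule: flag every cell whose value exceeds the mean
highlight = {day: ("HIGHLIGHT" if t > avg else "") for day, t in temps.items()}

for day, t in temps.items():
    print(f"{day}: {t:>3} {highlight[day]}")
```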

3. Filter: Filter your data in Excel by specific criteria to get the output you need. Filtered data displays only the rows that meet the criteria you specify and hides the rows you don't want displayed. You can apply filters to multiple columns at a time. After you filter data, you can copy, find, edit, format, chart, and print the subset of filtered data without rearranging or moving it.
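A filter on several columns amounts to a set of conditions combined with AND. The sketch below assumes hypothetical order rows stored as Python dictionaries and keeps only the rows where region is "East" and product is "Pen". Unlike Excel, which hides the other rows, it builds a new list and leaves the original data unchanged.

```python
# Hypothetical order rows (one dictionary per spreadsheet row)
orders = [
    {"region": "East", "product": "Pen", "qty": 120},
    {"region": "West", "product": "Pen", "qty": 40},
    {"region": "East", "product": "Book", "qty": 15},
    {"region": "East", "product": "Pen", "qty": 30},
]

# Filter on two columns at once: region == "East" AND product == "Pen"
filtered = [o for o in orders if o["region"] == "East" and o["product"] == "Pen"]

for row in filtered:
    print(row)
```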

4. Charts: A picture is worth a thousand words, so a chart generated from a sheet full of numbers is easy to understand. Charts are a very useful and easy tool for data analysis and for creating visualizations. There are many different types of charts you can create for data analysis.

5. Tables: Tables allow you to analyse data easily and quickly.

6. Pivot Tables: A pivot table is a very powerful and easy-to-use data analysis tool in Excel. By inserting a pivot table on your data, you can summarize the information in its rows and columns.
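Conceptually, a pivot table groups rows by one field, spreads another field across columns, and aggregates a value. The minimal Python sketch below reproduces a "Sum of amount" pivot for hypothetical sales rows (region as rows, product as columns) with a row total. It only illustrates the idea and is not how Excel implements pivot tables.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount)
sales = [
    ("East", "Pen", 100),
    ("East", "Book", 250),
    ("West", "Pen", 80),
    ("East", "Pen", 40),
    ("West", "Book", 300),
]

# Rows = region, columns = product, values = sum of amount
pivot = defaultdict(lambda: defaultdict(int))
for region, product, amount in sales:
    pivot[region][product] += amount

products = sorted({p for _, p, _ in sales})

# Print a small text version of the pivot table with a total column
print("Region", *products, "Total", sep="\t")
for region in sorted(pivot):
    cells = [pivot[region][p] for p in products]
    print(region, *cells, sum(cells), sep="\t")
```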

7. Analysis ToolPak: This is an Excel add-in program that provides tools for statistical and engineering data analysis.
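One of the ToolPak's tools is Descriptive Statistics. The sketch below computes a subset of the figures that tool reports, using Python's standard `statistics` module on a hypothetical list of scores. The standard deviation and variance are sample statistics (n - 1 denominator), which matches Excel's STDEV.S and VAR.S functions.

```python
import statistics

# Hypothetical sample of test scores
scores = [72, 85, 90, 66, 85, 78, 94]

# A subset of the figures in the ToolPak's Descriptive Statistics output
summary = {
    "Mean": statistics.mean(scores),
    "Median": statistics.median(scores),
    "Mode": statistics.mode(scores),
    "Standard Deviation": statistics.stdev(scores),   # sample SD
    "Sample Variance": statistics.variance(scores),   # sample variance
    "Range": max(scores) - min(scores),
    "Minimum": min(scores),
    "Maximum": max(scores),
    "Count": len(scores),
}

for name, value in summary.items():
    # Show floats with two decimals, integers as-is
    text = f"{value:.2f}" if isinstance(value, float) else str(value)
    print(f"{name:<20}{text}")
```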

8. Solver: Excel has one more tool, called Solver. It uses techniques from operations research to find optimal solutions for many kinds of decision problems.
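Solver changes a set of variable cells to maximize or minimize an objective cell, subject to constraints. The hypothetical product-mix problem below (all numbers are made up) is solved by brute-force search over whole-number quantities. This only shows the idea of an objective plus constraints. It is not the algorithm Solver itself uses; Solver offers methods such as Simplex LP, GRG Nonlinear, and Evolutionary.

```python
# Hypothetical product-mix problem:
#   maximize   profit = 20*x + 30*y
#   subject to 2*x + 4*y <= 100   (machine hours)
#                x +   y <= 40    (labour hours)
#              x, y >= 0 and whole numbers

def profit(x, y):
    return 20 * x + 30 * y

def feasible(x, y):
    return 2 * x + 4 * y <= 100 and x + y <= 40

best = None
for x in range(0, 41):        # x cannot exceed 40 because x + y <= 40
    for y in range(0, 26):    # y cannot exceed 25 because 4*y <= 100
        if feasible(x, y) and (best is None or profit(x, y) > profit(*best)):
            best = (x, y)

print("Best mix:", best, "profit:", profit(*best))
```

Brute force only works for tiny problems like this one; Solver's methods are designed to handle far larger ones.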

What is Statistics?

Statistics is defined as the study of the collection, analysis, interpretation, presentation, and organization of data. Statistics has many definitions when it is applied to a scientific, industrial, or social problem.

Some of the popular definitions are:

“Classified facts representing the conditions of a people in a state, especially the facts that can be stated in numbers or any other tabular or classified arrangement” (Merriam-Webster dictionary).

“Numerical statements of facts in any department of inquiry placed in relation to each other” (statistician Sir Arthur Lyon Bowley).

Statistics as a branch of mathematics pertains to the collection, analysis, interpretation, and presentation of data. Some scientists consider statistics to be a distinct mathematical science rather than a branch of mathematics, and some consider statistics to be concerned with the use of data. Mathematical statistics is the application of mathematics to statistics. The mathematical techniques used to collect and analyse data include mathematical analysis, linear algebra, and differential equations.

Sunday, May 30, 2021

What is data mining?

Here are some important Data Mining definitions: 

Mining means extracting something valuable, e.g., mining the earth to extract diamonds, gold, or coal. Data mining is an interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets. Data mining tools predict future trends and customer behaviours.

Data mining (sometimes called data or knowledge discovery) is the process of analysing raw data from different perspectives and transforming it into useful information. This information can then be used to increase revenue, cut costs, or both. It helps companies focus on the most important information and allows businesses to make proactive, knowledge-driven decisions.

The goal of data mining is the extraction of patterns and knowledge from large amounts of data, not the extraction of the data itself. The term is also applied to information processing in general, including the collection, extraction, warehousing, analysis, and statistics of data.
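To make "discovering patterns" concrete, the sketch below counts how often pairs of items are bought together in a few hypothetical shopping baskets and keeps the pairs whose support (the share of baskets containing the pair) reaches a chosen threshold. This is the support-counting step used by frequent-itemset methods such as Apriori, shown here only for item pairs; a real analysis would typically use a dedicated library and much larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (one set of items per transaction)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

# Count how often each pair of items appears in the same basket.
# Sorting first makes ("bread", "milk") and ("milk", "bread") the same key.
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

# Keep pairs that appear in at least 40% of baskets
min_support = 0.4
frequent = {
    pair: count / len(baskets)
    for pair, count in pair_counts.items()
    if count / len(baskets) >= min_support
}

for pair, support in sorted(frequent.items()):
    print(pair, f"support={support:.0%}")
```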

Uses of data mining:

Data mining has been used in many applications. Notable areas where data mining is used include business, medicine, science, and surveillance.

Data mining is mostly used in business to analyse historical business activities stored as static data in data warehouse databases. The main goal is to reveal hidden patterns and trends and to discover unknown strategic business information that can increase business and revenue.

Data mining has been used widely in science and engineering fields such as bioinformatics, genetics, medicine, education, and electrical power engineering.

Spatial data mining is the application of data mining methods to spatial data. Data mining offers great potential benefits for GIS-based applied decision-making, and there are many more examples of where data mining is used nowadays.

Sources of data for mining:

As the world goes digital, data is growing in size day by day, and it is becoming difficult to store it all. Below are some of the sources that keep adding data in our daily lives.

1. Real-time data captured in electronic systems (e.g., CCTV, cameras, and satellites)

2. Government reports

3. Historical data and information

4. Mass media products

5. Web information

6. Social media sites

7. Official government statistics

8. Observation

9. Medical or scientific research data

Saturday, May 29, 2021

Skill set required to be a successful Data Analyst

In this blog post I will explain which skill sets are needed to be a successful data analyst. This will help you prepare for interviews and, once you have joined as a data analyst, help you become successful.

1. Microsoft Excel:

The first skill is Microsoft Excel, which is widely used for data entry, data management, financial analysis, accounting, task management, and making charts and graphs. It is well suited to small data sets, so a data analyst must know its formulas, functions, features, macros, and pivot tables, as well as how to create the charts and graphs most often used to visualize data.

2. SQL (Structured Query Language):

SQL is used to communicate with a database, for example to update data in a database or retrieve data from it. Three of the most popular databases where SQL is used to update or retrieve data are Oracle Database, Microsoft SQL Server, and MySQL. So you should have hands-on experience with SQL for data extraction and data collection.
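A simple way to practise both operations is Python's built-in `sqlite3` module. SQLite is not one of the three databases listed above; it is used here only because it needs no server. The statements are standard SQL, though details can differ between the Oracle, SQL Server, and MySQL dialects. The table and values are hypothetical.

```python
import sqlite3

# In-memory SQLite database (nothing is written to disk)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table of customer orders
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Asha", 250.0), ("Ben", 120.0), ("Asha", 80.0)],
)

# Update data: correct the amount of order 2
cur.execute("UPDATE orders SET amount = ? WHERE id = ?", (100.0, 2))

# Retrieve data: total amount per customer, largest first
cur.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY total DESC"
)
rows = cur.fetchall()
print(rows)

conn.close()
```

The `?` placeholders pass values separately from the SQL text, which is safer than building query strings by hand.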

3. Data Visualization Tools:

You should be familiar with at least one data visualization tool. There are many data visualization tools on the market, such as Microsoft Power BI, Tableau, Qlik Sense, Looker, IBM Cognos Analytics, and Sisense, and they can be used by data analysts, scientists, and statisticians. Microsoft Power BI and Tableau are the most widely used for connecting many data sources and creating interactive dashboards and reports. Such dashboards and reports provide actionable insights from raw data.

4. Industry Domain Knowledge:

You should have good knowledge of your market or industry. There are multiple industry domains, such as banking, financial services and insurance, e-commerce, IT, retail, health care, industrial manufacturing, automotive and transportation, online services and marketing, and telecommunications. Computer skills alone are not enough to be a successful data analyst; you should also have domain knowledge of the business you are working for. You need to understand the raw data and create charts and graphs that help management decide on future actions.

5. Presentation and communication skills:

As data analysts, we need to work with different departments, stakeholders, and engineering teams. We need to communicate with managers to gather data and provide them with data insights that help predict business movement. So a good command of communication and presentation skills will help us become successful data analysts.

6. Statistical programming language:

Experience in any statistical programming language will help you a lot as a data analyst. It is not strictly necessary, and most companies do not require this skill set, but it will give you an advantage. Python is now the most widely used data science programming language in most industries. JavaScript, Scala, R, and Julia are other popular programming languages used by data scientists and data analysts.

7. Statistics and Mathematics:

As a data analyst, you need to be good with numbers. Statistics is the science of data collection, analysis, interpretation, and presentation. Knowledge of statistics helps you make decisions based on data and make predictions. Essential mathematical topics for a data analyst include calculus, discrete mathematics, and algebra (basic and linear).
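As an example of statistics and basic algebra used for prediction, the sketch below fits a straight line by ordinary least squares to hypothetical monthly sales figures and uses it to forecast the next month. Excel's SLOPE and INTERCEPT functions compute the same two coefficients for a simple linear fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style sum and variance-style sum around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical data: month number (x) and units sold (y)
months = [1, 2, 3, 4, 5]
units = [2, 4, 5, 4, 5]

a, b = fit_line(months, units)
forecast = a + b * 6   # predict month 6
print(f"y = {a:.2f} + {b:.2f}x, forecast for month 6: {forecast:.2f}")
```

For this data the fitted line is y = 2.20 + 0.60x, giving a forecast of 5.80 units for month 6.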