In this Data Science Tutorial, we will understand data science and its inter-disciplinary fields: data manipulation, data visualization, machine learning, and many more.
This is the age of data! As soon as you open your Facebook account, you are inundated with a huge amount of data. You get to see posts from your friends, which could be text, pictures, or videos. Now, just imagine if you could tap into this data and use it to gain insights. That would be wonderful, wouldn’t it? This is exactly where data science comes in. So, in this Data Science tutorial, we are going to dive into this magical field.
In this Data Science tutorial for beginners, we will start off by understanding what exactly data is! This entity called data is present all around us; it’s omnipresent like God! Simply put, data is just a collection of facts.
A bunch of numbers like -0.879 and 348 is data. When we say statements like ‘My name is Sam’ or ‘I love Pizza’, this again is data. A mathematical formula such as ‘A = ’ is nothing but data, and well, when it comes to computers, data is nothing but the binary code, i.e., 0s and 1s.
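To make the "0s and 1s" point concrete, here is a minimal sketch in base R that shows how the name from the example above could look as bits. The variable names (`name_text`, `codes`, `binary`) and the helper `to_bits` are made up for illustration, and the sketch assumes plain ASCII letters stored one byte per character.

```r
# Convert the text "Sam" into the numeric codes and bits a computer works with.
name_text <- "Sam"

# Each character maps to a numeric code (identical in ASCII and UTF-8 for these letters).
codes <- utf8ToInt(name_text)

# Hypothetical helper: turn one code into an 8-bit string, most significant bit first.
to_bits <- function(code) {
  bits <- as.integer(intToBits(code))[1:8]  # intToBits() returns the least significant bit first
  paste(rev(bits), collapse = "")
}

binary <- vapply(codes, to_bits, character(1))
print(codes)
print(binary)
```

Each letter becomes one number, and each number becomes a fixed-width pattern of 0s and 1s.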
Now, why is data science necessary?
Because data has gone from scarce to super-abundant in the past two decades and is expected to keep growing rapidly. Two or three decades back, the data we had was small, structured, and mostly of a single format, so the analytics performed on it was quite simple.
But with the advent of technology, this data started to explode; multiple sources started to generate huge amounts of unstructured data in different formats. Data that earlier amounted to just a few kilobytes or megabytes started blowing up exponentially and, today, by a commonly cited estimate, we generate around 2.5 quintillion bytes (about 2.5 exabytes) of data every single day!
A huge amount of data was being generated every second from every corner of the world, but we did not know what to do with it. In other words, we had a lot of data with us, but we were not trying to draw any insights from it. This need to understand and analyze data to make better decisions is what gave birth to Data Science.
Now that we know why data science is needed, let’s move ahead in this data science tutorial and look at its sub-fields, starting with data manipulation.
Data Manipulation
Let’s say you are working with an employee dataset that comprises 1,000 columns and 1 million rows. Just looking at the dataset, you would be overwhelmed. To make matters worse, your boss asks you to find all the male employees whose salary is exactly $100,000. This definitely is a daunting task, isn’t it? So, how would you go about finding the solution? Would you manually go through each of these 1 million records and check the gender and salary of the employee? Well, that would be a time-consuming and impractical idea.
So, what is the solution to this? Well, this is where data manipulation comes in. With the help of data manipulation techniques, you can find interesting insights from the raw data with minimal effort. Let’s take a census dataset, stored in a data frame named census, to understand this better.
Now, from this dataset, I want to extract only those records where the age of the person is 50. So, let’s see how we can do this with the R language and its dplyr package:
library(dplyr)  # provides %>% and filter()
census %>% filter(age == 50)
So, all it took was one line of filtering code to extract all the records where the age of the person is exactly 50. Now, just imagine if you had to manually go through each of the 32,561 records to check the age of the person! Thank god that we can manipulate data with just a single line of code.
Similarly, let’s say I want to extract all those records where the education of the person is “Bachelors” and the marital status is “Divorced”:
# the leading spaces inside the quoted values are kept as given; they must match the stored values exactly
census %>% filter(education == " Bachelors" & marital.status == " Divorced")
Again, just a single line of code, and we were able to get our desired result. So, with these examples, you can see that data manipulation helps you find insights from the data with the smallest amount of effort.
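Coming back to the employee question from the start of this section, the same idea can be sketched on a tiny, made-up table. Everything in this data frame (the `employees` name, the columns, and the values) is hypothetical and only stands in for the real 1-million-row dataset. The sketch uses base R's `subset()` so that it runs without extra packages; the dplyr form is shown in a comment.

```r
# Hypothetical employee data; names, columns, and values are invented for illustration.
employees <- data.frame(
  name   = c("Asha", "Ben", "Carl", "Dina", "Evan"),
  gender = c("Female", "Male", "Male", "Female", "Male"),
  salary = c(100000, 100000, 85000, 120000, 100000),
  stringsAsFactors = FALSE
)

# Keep only the rows where both conditions hold.
# dplyr equivalent: employees %>% filter(gender == "Male" & salary == 100000)
matches <- subset(employees, gender == "Male" & salary == 100000)

print(matches)
print(nrow(matches))  # number of matching employees
```

The condition is written once and applied to every row, so the same line would work unchanged whether the table has five rows or a million.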
Now, let’s head on to the next sub-field in data science, which is data visualization.
Data Visualization
Data Scientists are sometimes called artists, not because of their skills with the paint-brush but because they can represent data in the form of aesthetic graphs. As they say, a picture is worth a thousand words, and obviously, you wouldn’t want to deal with Excel sheet after Excel sheet of data when you can visualize it with beautiful graphs.
Let’s take the iris dataset as an example. It comprises different species of the iris flower: ‘setosa’, ‘versicolor’ & ‘virginica’, along with their ‘sepal length’, ‘sepal width’, ‘petal length’ & ‘petal width’. Now, I want to understand the relationship between the ‘sepal length’ & ‘petal length’ of the different species. Just by looking at the dataset, we don’t really get to see any patterns. So, this is where we can visualize the data.
Now, let’s go ahead and build a scatter-plot between ‘Sepal.Length’ & ‘Petal.Length’:
library(ggplot2)  # provides ggplot(), aes(), and geom_point()
ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, col = Species)) + geom_point()
Now, isn’t this just a beautiful depiction of the underlying data? This scatter-plot tells us that as the sepal length of the flower increases, its petal length tends to increase as well. Not just this, we also see that ‘setosa’ has the lowest values of petal length and sepal length and ‘virginica’ has the highest values.
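The pattern read off the scatter-plot can also be checked numerically. Here is a minimal sketch using only base R and the built-in iris data; the variable names `species_means` and `r` are just illustrative choices.

```r
# iris ships with R, so no download or extra package is needed.
data(iris)

# Average sepal and petal length for each species.
species_means <- aggregate(cbind(Sepal.Length, Petal.Length) ~ Species,
                           data = iris, FUN = mean)
print(species_means)

# Pearson correlation between the two measurements across all flowers;
# a value close to 1 means they tend to rise together.
r <- cor(iris$Sepal.Length, iris$Petal.Length)
print(round(r, 2))
```

The per-species averages should put ‘setosa’ at the bottom and ‘virginica’ at the top on both measurements, and a strongly positive correlation matches the upward trend in the plot.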
Applications of Data Science
Data Science has a lot of real-world applications. Let’s have a look at some of those:
Chatbots
Chatbots are basically automated bots, which respond to all our queries. I believe all of you must have heard of Siri and Cortana! They are examples of chatbots. These chatbots are perfect applications of Data Science and are used across different sectors like hospitality, banking, retail, and publishing.
Self-driving Car
Another very interesting application of Data Science is the self-driving car. This self-driving car is the future of the automotive industry.
Image Tagging
I believe all of you have Facebook accounts! Whenever you hover over a person’s picture, Facebook automatically tags a name to that person, and this again is possible with the help of Data Science.
Types of Data Science Jobs
In this Data Science tutorial, you will not only learn Data Science but will also find out about the various job roles in the domain of Data Science, which are listed below:
Data Analyst
A Data Analyst is entrusted with the responsibility of mining huge amounts of data, looking for patterns, relationships, trends, and so on, and producing compelling visualizations and reports that help the business make decisions.
Data Engineer
A Data Engineer is entrusted with the responsibility of working with large amounts of data. He/she should be able to carry out data cleansing, data extraction, and data preparation so that businesses can work with large amounts of data.
Machine Learning Expert
A Machine Learning expert is someone who works with various Machine Learning algorithms like regression, clustering, classification, decision trees, random forests, and so on.
Data Scientist
A Data Scientist is someone who works with huge amounts of data to come up with compelling business insights by applying various tools, techniques, methodologies, algorithms, and so on.
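To make one of the algorithms named above a little more concrete, here is a minimal sketch of a simple linear regression in base R. It is only an illustration on the built-in iris data, not a workflow prescribed for any of these roles; the names `model`, `new_flower`, and `predicted`, and the 6.0 cm sepal length used for the prediction, are hypothetical choices.

```r
# Fit a straight line: Petal.Length = intercept + slope * Sepal.Length
model <- lm(Petal.Length ~ Sepal.Length, data = iris)
print(coef(model))

# Predict the petal length of a hypothetical flower with a 6.0 cm sepal.
new_flower <- data.frame(Sepal.Length = 6.0)
predicted <- predict(model, newdata = new_flower)
print(predicted)
```

A positive slope here says the same thing as the scatter-plot in the visualization section: longer sepals tend to go with longer petals.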