Why on Earth is Data Science?

Data science is a field that extracts insights and knowledge from data for use in a broad range of contexts. It draws on interdisciplinary methods, processes, and algorithms, and serves as an umbrella category for disciplines that include statistics, data analysis, machine learning, and closely related methods. Its purpose is to understand a subject through the use of data. Data science has been a booming field in recent years, and many companies seek out data science professionals to solve complex business problems. The job role is sometimes summed up as “A data scientist is a statistician who can code.” Data science is the study of how data can be used and analyzed in order to make new discoveries; it can also be defined simply as “the discipline of making the data useful.”


Data science encompasses a broad range of data-related techniques and skills, from advanced analytics to ethics. In order to be an effective data scientist, one must also have concern for infrastructure development and pragmatics. The field requires high levels of scientific knowledge and skill in order to work with data sets in the real world. Data science involves principles, processes, and techniques for understanding phenomena via the (automated) analysis of data. From the perspective of this article, the ultimate goal of data science is improving decision-making, as this generally is of paramount interest to business. Data-driven decision-making (DDD) refers to the practice of basing decisions on the analysis of data rather than purely on intuition. For example, a marketer could select advertisements based purely on her long experience in the field and her eye for what will work. Or, she could base her selection on the analysis of data regarding how consumers react to different ads. She could also use a combination of these approaches. DDD is not an all-or-nothing practice, and different firms engage in DDD to greater or lesser degrees.
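
The marketer example can be made concrete with a small sketch. It is only an illustration under stated assumptions: the ad names, the `(ad_id, clicked)` record format, and the `min_impressions` threshold are all hypothetical, and click-through rate stands in for "how consumers react to different ads." The function falls back on the marketer's own pick when there is too little data, which mirrors the point that DDD is not all-or-nothing.

```python
from collections import Counter


def click_through_rates(events):
    """Return {ad_id: clicks / impressions} from (ad_id, clicked) records."""
    impressions = Counter(ad_id for ad_id, _ in events)
    clicks = Counter(ad_id for ad_id, clicked in events if clicked)
    return {ad_id: clicks[ad_id] / impressions[ad_id] for ad_id in impressions}


def choose_ad(events, intuition_pick, min_impressions=3):
    """Pick the ad with the best observed click-through rate.

    Ads shown fewer than `min_impressions` times (a hypothetical threshold)
    are ignored. If no ad has enough data, fall back on the marketer's
    intuition, combining both approaches.
    """
    impressions = Counter(ad_id for ad_id, _ in events)
    rates = click_through_rates(events)
    eligible = {
        ad_id: rate
        for ad_id, rate in rates.items()
        if impressions[ad_id] >= min_impressions
    }
    if not eligible:
        return intuition_pick
    return max(eligible, key=eligible.get)


# Hypothetical observations: each record is (ad shown, whether it was clicked).
events = [
    ("banner_a", True), ("banner_a", False), ("banner_a", False), ("banner_a", False),
    ("banner_b", True), ("banner_b", True), ("banner_b", False), ("banner_b", False),
]

print(click_through_rates(events))
print(choose_ad(events, intuition_pick="banner_a"))
print(choose_ad([], intuition_pick="banner_a"))
```

With these made-up records the data favors `banner_b`, even though the marketer's intuition was `banner_a`; with no records at all, the intuition pick is used unchanged.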


“Big data” technologies such as Hadoop, HBase, CouchDB, and others have received considerable media attention recently. As with traditional technologies, big data technologies are used for many tasks, including data engineering. Occasionally they are used to implement data-mining techniques directly, but more often the well-known big data technologies handle data processing in support of data-mining techniques and other data-science activities. One way to think about the state of big data technologies is to draw an analogy with the business adoption of internet technologies. In Web 1.0, businesses busied themselves with getting the basic internet technologies in place so that they could establish a web presence, build electronic commerce capability, and improve operating efficiency. We can think of ourselves as being in the era of Big Data 1.0, with firms engaged in building capabilities to process large data.


Managers in enterprises without substantial data-science resources should still understand the basic principles in order to engage consultants on an informed basis. Investors in data-science ventures need to understand the fundamental principles in order to assess investment opportunities accurately. More generally, businesses are increasingly driven by data analytics, and there is great professional advantage in being able to interact competently with and within such businesses. Understanding the fundamental concepts, and having frameworks for organizing data-analytic thinking, not only allows one to interact competently but also helps one to envision opportunities for improving data-driven decision making and to see data-oriented competitive threats.


Underlying the extensive collection of techniques for mining data is a much smaller set of fundamental concepts comprising data science. In order for data science to flourish as a field, rather than to drown in the flood of popular attention, we must think beyond the algorithms, techniques, and tools in common use. We must think about the core principles and concepts that underlie the techniques, and also the systematic thinking that fosters success in data-driven decision making. There is strong evidence that business performance can be improved substantially via data-driven decision making, big data technologies, and data-science techniques based on big data. Data science supports data-driven decision making—and sometimes allows making decisions automatically at massive scale—and depends upon technologies for “big data” storage and engineering. However, the principles of data science are its own and should be considered and discussed explicitly in order for data science to realize its potential.


~Chirag Ferwani
