Why Data Structures and Algorithms Are Important to Learn
I’ll be honest: the look on my face when I first heard about algorithms and data structures was anything but surprise. I already had computers at school and had dabbled in LOGO and BASIC for close to 5 years. I knew the basic definition of an algorithm, but that was it. Anything beyond that was uncharted territory. Even in college, I didn’t really think algorithms had such an important role to play in building the technology infrastructure of the world. Marc Andreessen wasn’t wrong when he wrote “Why Software Is Eating the World” in his cult blog post. The world after that post was never the same. Today the entire world is being run by algorithms, and the fun fact is that we rarely realize it. Don Norman, in his bestseller ‘The Design of Everyday Things’, made a bold claim by saying, “Good design is invisible”. To draw an analogy, it wouldn’t be a far cry from reality to say that “Good algorithms are invisible”. For the first time in human history, machines are running the show, though we are still some distance from the ‘Superintelligence’ that Nick Bostrom talks about in his book. But before we even go there, let’s first delve into deciphering the invisible world of algorithms.
If you ask a high school kid what an algorithm is, chances are you’d get the standard textbook definition: a set of sequential instructions that accomplishes a task or solves a problem, preferably on a computer. That definition is spot on, but it helps to add a few details. Algorithms solve a task, but their efficiency is measured in time and space complexity, that is, how the running time and the memory they need grow as the input grows. If an algorithm takes very little time to execute and occupies little memory, it’s hailed as a good algorithm, and vice versa.
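To make the idea of time complexity concrete, here is a minimal sketch that counts how many comparisons two search strategies need to find the same value in a sorted list. The function names and the sample data are assumptions chosen for illustration, not part of any particular library.

```python
def linear_search_steps(items, target):
    """Count comparisons when scanning the list from the start."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count comparisons when halving the search range each time."""
    steps = 0
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps


data = list(range(1_000_000))  # illustrative sorted data
print(linear_search_steps(data, 999_999))  # one comparison per element
print(binary_search_steps(data, 999_999))  # at most about 20 comparisons
```

Both functions solve the same task, but the second needs far fewer steps as the list grows, which is the kind of difference time complexity is meant to capture.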
So besides the functional objective that an algorithm is supposed to achieve, time and space complexity are two important constraints to keep in mind while designing algorithms. Today it’s a rarity to find folks in computer science who have read all the volumes of Donald Knuth’s ‘The Art of Computer Programming’. The books explain in depth how computer algorithms work and how their application could help solve some of the biggest problems mankind is grappling with. Let me make this easier to grasp with a simple example. Let’s say you want to calculate the first 100 terms of the Fibonacci series. First, let’s understand what a Fibonacci series is. In a Fibonacci series, every term is the sum of the two preceding terms, and the series starts with 0 and 1. The following is an example of a Fibonacci series:
0, 1, 1, 2, 3, 5, 8, 13, 21, ..., Xn (Xn being the last term)
If we were to compute such a series, then we would have to think of a formal logic that can help us generate the above series. The following is a piece of pseudocode that does the job
Step 1: Start
Step 2: Declare variable a,b,c,n,i
Step 3: Initialize variable a=0, b=1, i=2
Step 4: Read n from user
Step 5: Print a and b
Step 6: Repeat while i<n
6.1 c=a+b
6.2 print c
6.3 a=b, b=c
6.4 i=i+1
Step 7: Stop
The above algorithm accepts ‘n’ as user input and then runs a loop until ‘n’ terms of the series have been printed. Two pieces of logic do the job: the first generates the next number ‘c’ by adding ‘a’ and ‘b’, and the second reassigns the value of ‘b’ to ‘a’ and ‘c’ to ‘b’. The loop keeps running until the counter ‘i’ reaches ‘n’.
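The steps above can be sketched in Python as follows. This is one possible translation, assuming the series starts with 0 and 1 as described earlier; the function name `fibonacci_terms` is chosen here for illustration, and the function returns a list instead of printing each term.

```python
def fibonacci_terms(n):
    """Return the first n terms of the Fibonacci series, starting 0, 1."""
    terms = []
    a, b = 0, 1
    for _ in range(n):
        terms.append(a)
        # Same idea as steps 6.1 and 6.3: compute the next term, then shift.
        a, b = b, a + b
    return terms


print(fibonacci_terms(9))  # [0, 1, 1, 2, 3, 5, 8, 13, 21]
```

Calling `fibonacci_terms(100)` gives the first 100 terms mentioned earlier. Python integers grow as needed, so the large later terms do not overflow.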
The aforementioned example demonstrates how a set of instructions can solve a problem effectively. Having said that, algorithms running in real-world computational scenarios aren’t as simple as this one. They tend to be quite complex. For example, the well-known ‘Traveling Salesman’ problem, which asks for the shortest route that visits a set of locations, shows up in various industries from food tech to supply chain to taxi aggregation and many others. Similarly, Greedy Algorithms, which pick the best-looking option at every step, find their application in multiple industries. Take a look around and you’d find that beyond the layers of Governance and Capitalism, it’s algorithms that are ruling the world. Brian Christian, in the book ‘Algorithms to Live By: The Computer Science of Human Decisions’, argues that algorithms are at play all around us. They are silently doing their jobs and solving some of the biggest problems for mankind.
What’s even more enthralling is the idea that the entire universe is mathematically intertwined, with algorithms running everything at play. Take the example of ‘Natural Selection’. In its evolutionary process, nature has been using natural selection to bring in new biological species and to weed out those that can’t cope with change. It’s surprising to find that some species survive through the ages while others perish with the advent of change. Natural selection, seen as an algorithm, helped many of today’s species survive change. Likewise, every nook and corner of the Newtonian universe seems to have some algorithm running, which many astrophysicists are trying to unravel.
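To tie the Traveling Salesman and greedy ideas mentioned above together, here is a minimal sketch of a nearest-neighbour heuristic: a greedy approach that always visits the closest unvisited stop next. It is a simple approximation, not an optimal solver, and the coordinates and function name are made up for illustration.

```python
import math


def nearest_neighbour_route(points, start=0):
    """Greedy route: from each stop, go to the closest unvisited stop."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    route = [start]
    while unvisited:
        current = points[route[-1]]
        # Greedy choice: pick the unvisited stop with the smallest distance.
        nearest = min(unvisited, key=lambda i: math.dist(current, points[i]))
        unvisited.remove(nearest)
        route.append(nearest)
    return route


# Hypothetical delivery stops as (x, y) coordinates.
stops = [(0, 0), (5, 5), (1, 0), (1, 1), (5, 4)]
print(nearest_neighbour_route(stops))  # [0, 2, 3, 4, 1]
```

The route it returns is usually reasonable but not guaranteed to be the shortest, which is part of why real routing systems tend to be far more complex.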
Now with the advent of AI, algorithms are learning and improving themselves. But wait, is it even remotely possible for algorithms to learn? Let’s discuss how that might shape up. Before we jump to that, though, it is important to understand what data structures are. A data structure can be defined as a particular way of storing and organizing data. Data structures make it easier to work with data in order to extract meaningful insight out of it. Let’s look at the following example:
Let’s say a company wants to save the names of all its employees. In Python this can be done using a list, where the list ‘L’ would look like the following:
L = ["emp 1", "emp 2", "emp 3", "emp 4", "emp 5", ..., "emp n"]
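As a minimal runnable sketch of the list above (the employee names are made up for illustration), this shows how a Python list keeps its items in insertion order and lets us add, index, and search them:

```python
# Hypothetical employee names, used only for illustration.
employees = ["Asha", "Vikram", "Meera"]

# A list preserves insertion order, so a new hire goes to the end.
employees.append("Rahul")

first_employee = employees[0]       # access by position
has_meera = "Meera" in employees    # membership check scans the list
alphabetical = sorted(employees)    # new sorted list; original is unchanged

print(first_employee, has_meera, alphabetical)
```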
This is just a simple example of an ordered data structure, but there are many other data structures to choose from. The central idea is to keep data in a format that makes it convenient for an algorithm to consume it and learn from it. We live in a digital world which never stops talking. As more and more people use the internet, a data explosion is happening every single day.
Now that cloud computing has dramatically reduced the price of storage, a lot more data can be stored at a phenomenally lower cost than was possible two decades back. That means that for algorithms to work on these big datasets, we have to ensure the data is in an acceptable format. That in turn means the data structures employed should be flexible enough to allow the necessary changes to the data before it is fed to machine learning algorithms. Today much of the work in data science happens around data structures, because the data wrangling process, which involves data cleaning, manipulation, enriching and validation, makes extensive use of them.
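As a rough illustration of the wrangling steps mentioned above, the sketch below cleans, validates, de-duplicates and enriches a few records using only built-in Python data structures (lists, dictionaries and a set). The records, field names and the `wrangle` function are all hypothetical; real pipelines typically rely on dedicated libraries.

```python
# Hypothetical raw records with messy spacing, casing and a duplicate.
raw_records = [
    {"name": "  Asha Rao ", "dept_code": "ENG"},
    {"name": "Vikram Shah", "dept_code": "ops"},
    {"name": "asha rao", "dept_code": "ENG"},  # duplicate once cleaned
    {"name": "", "dept_code": "ENG"},          # fails validation
]

# Lookup table used to enrich each record with a readable department name.
dept_names = {"ENG": "Engineering", "OPS": "Operations"}


def wrangle(records, dept_lookup):
    """Clean, validate, de-duplicate and enrich a list of records."""
    seen = set()
    cleaned = []
    for record in records:
        name = record["name"].strip().title()       # cleaning
        code = record["dept_code"].strip().upper()  # cleaning
        if not name or code not in dept_lookup:     # validation
            continue
        if name in seen:                            # de-duplication
            continue
        seen.add(name)
        cleaned.append({"name": name, "department": dept_lookup[code]})  # enriching
    return cleaned


print(wrangle(raw_records, dept_names))
```

The set gives fast duplicate checks, the dictionary gives fast lookups, and the list preserves the order of the surviving records.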
In a way, data structures and algorithms can be seen as Siamese twins. One can’t function without the other. Algorithms need data to apply their internal logic to solve a given problem, and data structures need algorithms to turn the stored data into something useful.
When Clive Humby, the British mathematician, coined the famous phrase, “Data is The New Oil”, he was spot on. He had identified a data-centric world where data would be used to make the most optimal decisions. Today almost every facet of human behavior can be explained using data and psychology. Sophisticated algorithms are using our data to make predictions about our subsequent behavioral cycles. In fact, it’s almost astonishing to see how data science and behavioral psychology are being used to alter human behavior by marketers for large corporations and by campaign managers of political parties.
Shivam Shankar Singh, in his book “How to Win an Indian Election”, explains in detail how political parties use data to deduce voter behavior and sentiment, and how they use micro-targeting to reach various socio-cultural, religious and ethnic groups to alter their behavior.
Neel Mehta, one of the co-authors of the bestseller “Swipe to Unlock”, talks about how a mix of data science and political acumen is being used by political parties to win elections. In fact, if Sasha Issenberg, who wrote the bestseller ‘The Victory Lab’, is to be believed, never before in the history of elections have politicians had as reliable a way to gauge the mood of the electorate and take relevant action as they do today. Easy access to data through social media and various new-generation tools, combined with sophisticated ML algorithms, paves the way for some really accurate predictions through which voter sentiment can be discovered and altered. Donald Trump’s historic win against Hillary Clinton in 2016 is a classic example of how powerful it is when one combines algorithms with big data and behavioral psychology.
Now that we are clear why learning algorithms and data structures is not an option but a requirement, it’s important to understand how one can learn them effectively. There are some amazing books for this, like Cormen’s ‘Introduction to Algorithms’ or ‘Structure and Interpretation of Computer Programs’ by Harold Abelson and Gerald Jay Sussman, but one also has to have a clear idea of the problem statement and of how to build an algorithm that solves it in the most optimal way. An easy alternative would be to go to NASSCOM’s FutureSkills Prime platform and take some courses on algorithms and data structures to get some familiarity. Having said that, it takes years of dedication, or the proverbial 10,000 hours, to master algorithms and the wide range of data structures they work with. As Lao Tzu once said, “The journey of a thousand miles begins with one step”. To traverse the thousand miles of computer science, one needs to take that first step through algorithms and data structures. Happy learning!
About FutureSkills Prime: FutureSkills Prime started as a platform with a vision to upskill/reskill every Indian citizen in emerging technologies. A joint initiative of the Ministry of Electronics and IT (MeitY) and the National Association of Software and Services Companies (NASSCOM), it brings together the Government, Industry and Academia towards the eventual goal of making India a digital talent nation. A novel skilling program, it offers incentives on the cost of eligible course(s) and provides authentic, accredited certifications that are accepted in the industry.
Written By Saurabh Saha, Product Leader