3 Tools that I Used the Most in My First Year as a Data Scientist
I think it will be the same for the upcoming years.

The data science ecosystem has a ton of software tools and packages, which is a good thing because they expedite and simplify our workflows.
I’m sure we are all glad to have these tools in our lives. However, such a rich selection can become a disadvantage if not used wisely.
Based on what I have experienced and observed in the last 3 years, I can say that we tend to learn more tools than necessary. Instead of spreading your time and energy across a large number of tools, I recommend focusing on a small subset of them.
This is why I wanted to write this article and explain the 3 tools that I used the most in my first year as a data scientist. Improving your skills with these tools will considerably increase your chances of landing a job.
Let’s start with the obvious one.
Pandas
Python dominates the field of data science and so do Python libraries. Pandas is a data analysis and manipulation library for Python. Considering a substantial amount of time in a project is spent on cleaning and preprocessing the raw data, Pandas might be the most frequently used Python library.
Pandas is a highly efficient library for working with tabular data. I don’t recall a problem for which Pandas could not provide a solution.
Another advantage of Pandas is its clean syntax, which is intuitive and easy to read, just like that of most Python libraries.
Pandas makes it quite easy to perform the most common operations on tabular data, which are as follows:
- Reading data from an external file (e.g. CSV or parquet)
- Checking the size of data
- Changing the data types if necessary (e.g. numbers should not be stored as strings)
- Finding and handling missing values
- Filtering based on a condition or a set of conditions
- Exploratory data analysis
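To make the list above concrete, here is a minimal sketch that walks through each operation on a tiny dataset. The data, the column names, and the variable names are all made up for illustration, and the CSV is read from an in-memory string instead of a real file so that the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for an external file.
raw_csv = io.StringIO(
    "order_id,customer,amount,order_date\n"
    "1,alice,10.5,2021-01-03\n"
    "2,bob,,2021-01-04\n"
    "3,alice,7.25,2021-01-04\n"
    "4,carol,30,2021-01-05\n"
)

# Reading data: pd.read_csv accepts a file path or a file-like object.
# dtype=str keeps every column as text so the conversion step below is visible.
df = pd.read_csv(raw_csv, dtype=str)

# Checking the size of the data as (rows, columns).
n_rows, n_cols = df.shape

# Changing data types: numbers and dates should not stay as strings.
df["order_id"] = df["order_id"].astype(int)
df["amount"] = pd.to_numeric(df["amount"])
df["order_date"] = pd.to_datetime(df["order_date"])

# Finding and handling missing values (here, filling them with 0).
missing_per_column = df.isna().sum()
df["amount"] = df["amount"].fillna(0)

# Filtering based on a set of conditions.
alice_large = df[(df["customer"] == "alice") & (df["amount"] > 8)]

# A small piece of exploratory analysis: total amount per customer.
total_per_customer = df.groupby("customer")["amount"].sum()
```

Filling missing amounts with 0 is only one possible choice. Depending on the data, dropping the rows or filling with a statistic such as the median might be more appropriate.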
Although Pandas has numerous functions and methods, there is a small subset of them that you will use the most. Here are the 8 Pandas functions I used the most.
SQL
SQL is used for managing data stored in relational databases. A relational database consists of several tables that are related by means of shared columns.
Most companies store their data in relational databases, so SQL is definitely a must-have skill for data scientists, engineers, and analysts. I used SQL almost every day in my first year as a data scientist, and I think I will keep using it just as frequently.
Although SQL stands for Structured Query Language, it is capable of doing much more than just querying a database.
SQL is also a data analysis tool. It is capable of doing most of what Pandas can do. If the data is stored in a relational database, it is more practical to do the analysis using SQL instead of exporting all the data and then using another tool for analysis.
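As a rough illustration of doing the analysis where the data lives, the sketch below runs an aggregation inside the database and only pulls back the small summary. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a real relational database, and the sales table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a company database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("south", 40.0), ("north", 60.0), ("south", 20.0)],
)

# The aggregation runs inside the database; only one row per region comes back.
summary = conn.execute(
    """
    SELECT region,
           COUNT(*)    AS n_orders,
           SUM(amount) AS total,
           AVG(amount) AS average
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()
conn.close()
```

The same result could be computed in Pandas with a groupby, but that would require exporting every row of the table first.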
You can also automate routine operations by writing SQL scripts as stored procedures. Here is an example of what a stored procedure can do:
- Read data from a few tables and filter if necessary
- Transform and reformat if necessary
- Combine data from multiple tables based on the requirements
- Write the transformed data into a new table
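The steps above can be sketched as follows. Note the assumptions: SQLite, which ships with Python, does not support stored procedures, so a plain Python function that runs a SQL script plays that role here. In MySQL, PostgreSQL, or SQL Server the same SQL would live inside a stored procedure defined in the database itself. The table names, column names, and the function name are hypothetical.

```python
import sqlite3


def build_customer_totals(conn):
    """Hypothetical routine that mirrors the four steps listed above."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS customer_totals;

        -- Write the result into a new table.
        CREATE TABLE customer_totals AS
        SELECT UPPER(c.name) AS customer_name,   -- transform and reformat
               SUM(o.amount) AS total_amount
        FROM orders AS o                          -- read from the source tables
        JOIN customers AS c                       -- combine on the shared column
          ON c.customer_id = o.customer_id
        WHERE o.status = 'paid'                   -- filter
        GROUP BY c.name;
        """
    )


# Made-up source tables so the sketch can run on its own.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL,
        status TEXT
    );
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES
        (1, 1, 10.0, 'paid'),
        (2, 1, 5.0, 'cancelled'),
        (3, 2, 7.5, 'paid'),
        (4, 1, 2.5, 'paid');
    """
)

build_customer_totals(conn)
totals = conn.execute(
    "SELECT customer_name, total_amount FROM customer_totals ORDER BY customer_name"
).fetchall()
```

Because the routine drops and recreates the target table, it can be rerun on a schedule without leaving stale rows behind.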
There are many relational database management systems, such as MySQL, PostgreSQL, and SQL Server. Although they mostly use the same SQL syntax, there are some minor differences. For instance, MySQL uses the LIMIT keyword to limit the number of rows returned, whereas SQL Server uses the TOP keyword.
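Here is a small sketch of that difference. SQLite follows the same LIMIT style as MySQL, so the first query can be run with Python's built-in sqlite3 module. The SQL Server version is kept as a string for comparison only, since SQLite cannot execute it. The products table is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("pen", 1.5), ("book", 12.0), ("lamp", 30.0), ("desk", 150.0)],
)

# LIMIT style (MySQL, PostgreSQL, SQLite): the keyword goes at the end.
limit_query = "SELECT name FROM products ORDER BY price DESC LIMIT 2"

# TOP style (SQL Server): the keyword follows SELECT. Not executed here.
top_query = "SELECT TOP 2 name FROM products ORDER BY price DESC"

# The two most expensive products, fetched with the LIMIT version.
most_expensive = [row[0] for row in conn.execute(limit_query)]
conn.close()
```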
Tableau
Tableau is a visual analytics platform. It makes it easy to create informative dashboards that can be used for understanding the data, evaluating results, and delivering results to customers.
If you work at a SaaS or consulting company, dashboards are of crucial importance. This is how you explain your solution and results to the customers. You cannot just send them CSV files with plain numbers.
A lot of companies use Tableau for creating business intelligence dashboards as well. You can combine data from a variety of sources and make a summary of how your company is doing.
Tableau seems to be the market leader in this domain. The other popular tool is Power BI, and there are also some open source alternatives such as Grafana. They all aim to do the same thing.
In my first year as a data scientist, I created dashboards and updated existing ones based on customer demand. Whether you work as a data scientist or a data analyst, you will most probably use Tableau or a similar tool. From what I observe in the community and in job postings, Tableau is the one most in demand.
Tableau supports many different types of data visualizations. It can also connect directly to your data source so that you do not have to export the data manually.
Pandas, SQL, and Tableau are the 3 tools that I used the most in my first year as a data scientist. I think this trio will keep its place for the upcoming years.
Thank you for reading. Please let me know if you have any feedback.





