I used to read a lot of doom and gloom from the data science community about working with dirty databases instead of clean CSV files. I generally agree with the criticism of how academia handles programming content, but this isn’t a hill to die on. On the surface the two look significantly different: a CSV file just needs an input stream, whereas a database requires you to learn SQL. That said, SQL can be pretty easy to learn, and it’s actually super convenient. Working with both Node.js and PHP, I was able to sort the data with SQL before I started sticking it onto a page or into an array. So what is the difference between a stream and SQL?
The biggest difference between the two is the amount of work required to read the data. Even so, it is generally not the incredible difference that people on LinkedIn love to advertise. A SQL statement can have many different clauses and cases, and that can trip people up. However, SQL is a high-level language, which means it is meant to be human-readable. With a good cheat sheet, writing a query should not be all that different from calling a “read_csv” function to process a CSV. The main change is that instead of dropping columns after the CSV is read, you can leave them out of the SQL query altogether. The steps are the same either way, but the way they’re handled is a little different.
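To make the comparison concrete, here is a minimal sketch in Python using only the standard library. The sample data, the "players" table, and the column names are all made up for illustration, and an in-memory SQLite database stands in for whatever database you actually use. The first function reads every column from the stream and then drops and sorts afterwards; the second asks for only the wanted columns, already sorted, in the query itself.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: names, teams, and scores are invented for this sketch.
CSV_TEXT = """name,team,score
Ada,red,91
Grace,blue,88
Linus,red,75
"""


def rows_from_csv(text):
    """Stream approach: read every column, then drop and sort in code."""
    reader = csv.DictReader(io.StringIO(text))
    # The "team" column is read from the stream and then discarded here.
    rows = [(row["name"], int(row["score"])) for row in reader]
    return sorted(rows, key=lambda r: r[1], reverse=True)


def rows_from_sql(text):
    """SQL approach: the query names only the wanted columns and the order."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real database
    conn.execute("CREATE TABLE players (name TEXT, team TEXT, score INTEGER)")
    reader = csv.DictReader(io.StringIO(text))
    conn.executemany(
        "INSERT INTO players (name, team, score) VALUES (?, ?, ?)",
        [(r["name"], r["team"], int(r["score"])) for r in reader],
    )
    # "team" is never selected, so there is nothing to drop afterwards.
    rows = conn.execute(
        "SELECT name, score FROM players ORDER BY score DESC"
    ).fetchall()
    conn.close()
    return rows


csv_rows = rows_from_csv(CSV_TEXT)
sql_rows = rows_from_sql(CSV_TEXT)
```

Both functions return the same list of (name, score) pairs. The only real difference is where the filtering and sorting happen: in your own code after the read, or inside the query before the data ever reaches you.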