David Gasquez
@davidgasquez.com
Data ∪ Engineering ∪ Science. Walks taker. Music enjoyer. davidgasquez.com
211 followers · 279 following · 129 posts

The end result is around 5,000 `zstd`-compressed Parquet files, taking less than 500 MB on disk. 💫 The original CSVs take around 200 GB, and it took some tinkering to download them all efficiently in only 3 to 4 hours.
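A quick back-of-the-envelope check of those figures, assuming decimal units (1 GB = 1,000 MB) and treating the quoted 500 MB as an upper bound, puts the compression at roughly 400x or better and the average Parquet file at about 100 KB or less. This is only arithmetic on the numbers in the post, not a measurement:

```python
# Figures quoted in the post; decimal units are an assumption (1 GB = 1,000 MB).
csv_total_mb = 200 * 1_000   # ~200 GB of original CSVs
parquet_total_mb = 500       # upper bound: "less than 500MB" on disk
parquet_files = 5_000        # "around 5 thousand" Parquet files

# Because the Parquet total is an upper bound, the ratio is a lower bound.
compression_ratio = csv_total_mb / parquet_total_mb
avg_file_kb = parquet_total_mb * 1_000 / parquet_files

print(f"compression ratio: at least ~{compression_ratio:.0f}x")
print(f"average Parquet file: under ~{avg_file_kb:.0f} KB")
```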

I remember watching this talk and being blown away. It's probably the one with the biggest impact on my career, alongside the DDIA book!

Thanks! Lots of this is inspired by all your awesome and public work on Provider Quest! 🙌

We're now also publishing a database file to IPFS with all the tables. All the datasets, one curl command away! Check out this notebook on how to use it. 👇 colab.research.google.com/drive/1wbuhO...
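Fetching a file from IPFS over HTTP usually means requesting `/ipfs/<CID>` from a gateway, which is what a one-line `curl` does. A rough Python equivalent is sketched below; the gateway (`https://ipfs.io`), the `<CID>` placeholder, and the helper names are hypothetical stand-ins, since the actual CID and file name live in the linked notebook, not here.

```python
import urllib.request

# Hypothetical default: any public or local IPFS HTTP gateway should work.
DEFAULT_GATEWAY = "https://ipfs.io"


def gateway_url(cid: str, gateway: str = DEFAULT_GATEWAY) -> str:
    """Build a path-style IPFS gateway URL: <gateway>/ipfs/<cid>."""
    return f"{gateway.rstrip('/')}/ipfs/{cid}"


def download(cid: str, dest: str, gateway: str = DEFAULT_GATEWAY) -> str:
    """Save the content behind `cid` to `dest` (like `curl -L -o dest URL`).

    urllib follows HTTP redirects by default, as `curl -L` does.
    """
    path, _headers = urllib.request.urlretrieve(gateway_url(cid, gateway), dest)
    return path


if __name__ == "__main__":
    # "<CID>" is a placeholder, not a real content identifier.
    print(gateway_url("<CID>"))
```

With a real CID, calling `download(cid, "<file name>")` would save the database file locally without needing any IPFS tooling installed.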

If you're not using them yet and you work with VS Code, check them out! Try adding one of the templates to your repository and take it from there.

This approach showcases the wonders of a few things I'm excited about! 🤩
- Open Infrastructure (Parquet, IPFS, Dagster, ...)
- Open Source (ETL, dbt models, IaC, ...)
- Open Data (raw and curated tables)
- Niche stuff like CIDs and compute over content-addressed datasets.
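The core idea behind CIDs is content addressing: a dataset's identifier is derived from a hash of its bytes, so identical content always gets the same address and any change produces a new one. The toy sketch below illustrates only that principle with a plain SHA-256 hex digest and a hypothetical in-memory store; real IPFS CIDs also wrap the hash with version, codec, multihash, and multibase information (and files are usually chunked first), so these strings are not valid CIDs.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Toy content address: the SHA-256 hex digest of the raw bytes.

    Not a real CID; it only demonstrates the content-addressing idea.
    """
    return "sha256-" + hashlib.sha256(data).hexdigest()


# Hypothetical in-memory store keyed by content address.
store: dict[str, bytes] = {}


def put(data: bytes) -> str:
    addr = content_address(data)
    store[addr] = data  # storing identical bytes twice is a no-op
    return addr


def get(addr: str) -> bytes:
    data = store[addr]
    # The address doubles as an integrity check on the retrieved bytes.
    if content_address(data) != addr:
        raise ValueError("content does not match its address")
    return data


if __name__ == "__main__":
    a = put(b"dataset v1")
    b = put(b"dataset v1")
    c = put(b"dataset v2")
    print(a == b, a == c)  # True False
    print(len(store))      # 2
```

Because the address depends only on the bytes, results computed over a dataset can be cached or shared under the input's address and reused by anyone holding the same content.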

This makes it even simpler to access and query the data. Remember, you can query the tables even from your browser! For example, let's get all the providers by the amount of data they've onboarded to the network. csvfiddle.io#JTdCJTIyaXNU...
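The browser link runs a SQL query over the published tables. A local query of the same shape is sketched below with Python's built-in `sqlite3`; the table name `deals` and the columns `provider_id` and `piece_size_bytes` are hypothetical placeholders rather than the real schema, and the rows are made-up toy values used only to show the result's structure.

```python
import sqlite3

# Hypothetical schema: one row per deal, with the provider and the data size.
QUERY = """
SELECT provider_id, SUM(piece_size_bytes) AS data_onboarded_bytes
FROM deals
GROUP BY provider_id
ORDER BY data_onboarded_bytes DESC
"""


def providers_by_data_onboarded(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (provider_id, total bytes onboarded) pairs, largest first."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE deals (provider_id TEXT, piece_size_bytes INTEGER)")
    # Toy rows for illustration only; not real network data.
    conn.executemany(
        "INSERT INTO deals VALUES (?, ?)",
        [("provider_a", 10), ("provider_b", 30), ("provider_a", 5)],
    )
    for provider, total in providers_by_data_onboarded(conn):
        print(provider, total)
```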

Sweet! Thanks for sharing.

Ohhh! What is the duckdbt secret 3rd thing? 🤓

Aaaarg! Now I want to fiddle with Nix again!