I refuse to be polite to a bag of weights
As a consequence, data versioning becomes code versioning. Since the raw data files are the same for everyone, tracking changes to the SQL queries via git ensures reproducibility.* *in a research environment where "raw" data is in fact curated and immutable
At Nationalbanken our best practice for FS analyses in DST was:
- raw (SAS) files transformed into parquet as-is
- data flows that load data into RAM ready for analysis, packed in a series of SQL queries referring to the parquet files
Thanks to DuckDB, that's seconds.
Lack of training, imho. It's the sign of someone thinking in terms of a linear storyline, narrating a story. Makes sense in theory, but doesn't work in practice.
Super link! Another killer feature of DuckDB not mentioned there, which still feels like black magic, is that you can join an existing pd.DataFrame in memory with e.g. a parquet file:

df = get_data()
duckdb.sql("select a.x, b.y from 'data.parquet' as a inner join df as b on a.k = b.k")
Embrace SQL
Our workshops were attended by a total of 3693 people:
- 2410 people registered directly
- 282 through the special registration form for Ukrainians
- 518 through the waiting list, as their registration fee was sponsored by someone else
- 483 people purchased recordings
8/n
More details in an updated report. As of 05/10 we raised a total of 81 643 euros. In particular, we raised the following amounts for the following organizations:
- Leleka Foundation: 48 984 euro
- Special account for the Ukrainian armed forces by the National Bank of Ukraine: 22 707 euro
- Other organizations: 9 952 euro
7/n
Well, Rabois is a very prolific poster
He must have a very big n to be so confident