planetpatman.bsky.social

That was an AMAZING @EuropaClipper launch! I'm so happy for all my friends and colleagues who will be working with the incredible high-resolution datasets that will be returned when the craft gets to Europa!

A rocket lifts off from an oceanside launch pad, with extensive lateral steam plume and bright column of flame beneath the rocket.

🔼 That's why I say this debate about AI and ethics shouldn't be shaped by the opinions of companies or the bourgeoisie, but by the opinion of the working class itself. Foreigners are behind on this whole fight, thinking the biggest problem with generative AI is some people not getting paid for datasets made of stolen data.

gmcd.bsky.social

The standard tactic is to use "Hive-style" partitioning (e.g., folders look like `year=2021/country=us/<data>`). It works best with Parquet, although it can also work with CSV. Partitioned datasets are easy to read and write with Arrow, DuckDB, Polars... (dplyr has tight integration with these.) 2/2
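Below is a minimal sketch of the Hive-style layout. It assumes plain CSV files and uses only the Python standard library. Arrow, DuckDB and Polars handle this natively, and this is not their API. The helper names `write_hive_partitioned` and `read_hive_partitioned` and the file name `part-0.csv` are hypothetical. Rows are grouped by the partition columns, and each group is written into its own `key=value/...` directory. On read, the partition columns are recovered from the path rather than from the file contents.

```python
import csv
import tempfile
from pathlib import Path


def write_hive_partitioned(rows, base_dir, partition_cols):
    """Write dict rows as CSV files under key=value/... directories (Hive-style).

    Hypothetical helper for illustration; one file (part-0.csv) per partition.
    """
    groups = {}
    for row in rows:
        key = tuple((col, str(row[col])) for col in partition_cols)
        groups.setdefault(key, []).append(row)
    for key, group in groups.items():
        # e.g. base_dir/year=2021/country=us/part-0.csv
        part_dir = Path(base_dir).joinpath(*(f"{c}={v}" for c, v in key))
        part_dir.mkdir(parents=True, exist_ok=True)
        # Partition columns are encoded in the path, so they are dropped from the file body.
        fields = [c for c in group[0] if c not in partition_cols]
        with open(part_dir / "part-0.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()
            for row in group:
                writer.writerow({c: row[c] for c in fields})


def read_hive_partitioned(base_dir):
    """Read every part file, restoring partition columns from key=value path segments."""
    base = Path(base_dir)
    out = []
    for path in sorted(base.rglob("*.csv")):
        parts = dict(seg.split("=", 1) for seg in path.relative_to(base).parent.parts)
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                out.append({**row, **parts})
    return out


if __name__ == "__main__":
    rows = [
        {"year": 2021, "country": "us", "value": 1},
        {"year": 2021, "country": "fr", "value": 2},
        {"year": 2022, "country": "us", "value": 3},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        write_hive_partitioned(rows, tmp, ["year", "country"])
        for p in sorted(Path(tmp).rglob("*.csv")):
            print(p.relative_to(tmp).as_posix())
        print(read_hive_partitioned(tmp))
```

In this sketch, partition values come back as strings because they are parsed from directory names. Real engines infer the types or let you declare them.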

gmcd.bsky.social

As a general rule for big datasets, avoid writing large single files. It's much better to write multiple files, partitioned by group, so that you can take advantage of predicate pushdown and materialisation. (Basically: skip data and work you don't need.) 1/2 duckdb.org/2021/06/25/q...
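To illustrate the "skip data you don't need" point, here is a simplified sketch of partition pruning over a Hive-style directory tree. It again assumes CSV files and uses only the Python standard library. `scan_with_pruning` and `partition_values` are hypothetical names. The filter is evaluated on the `key=value` directory names, so files in non-matching partitions are never opened. This covers only partition-level skipping. Engines like DuckDB can also use Parquet metadata (such as row-group min/max statistics) to skip work inside files, and this sketch does not model that.

```python
import csv
import tempfile
from pathlib import Path


def partition_values(path, base):
    """Parse Hive-style key=value directory names (e.g. year=2021/country=us) into a dict."""
    return dict(seg.split("=", 1) for seg in Path(path).relative_to(base).parent.parts)


def scan_with_pruning(base_dir, predicate):
    """Return (rows, files_opened), reading only files whose partition values pass `predicate`.

    `predicate` receives the partition columns as strings, so it is checked
    against directory names before any file is opened.
    """
    base = Path(base_dir)
    rows, files_opened = [], 0
    for path in sorted(base.rglob("*.csv")):
        parts = partition_values(path, base)
        if not predicate(parts):
            continue  # pruned: this file is never opened
        files_opened += 1
        with open(path, newline="") as f:
            rows.extend({**row, **parts} for row in csv.DictReader(f))
    return rows, files_opened


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny partitioned dataset: one file per (year, country) group.
        for year, country, value in [("2021", "us", "1"), ("2021", "fr", "2"), ("2022", "us", "3")]:
            part_dir = Path(tmp) / f"year={year}" / f"country={country}"
            part_dir.mkdir(parents=True)
            (part_dir / "part-0.csv").write_text(f"value\n{value}\n")

        rows, opened = scan_with_pruning(tmp, lambda p: p["year"] == "2021" and p["country"] == "us")
        print(opened, rows)  # prints: 1 [{'value': '1', 'year': '2021', 'country': 'us'}]
```

With one large file, the same filter would require reading every row. With partitioned files, two of the three files are skipped without being opened.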

📣 We are pleased to announce that the much awaited #ERA5 forum.ecmwf.int/t/new-datase... Direct links 👇

leiracal.bsky.social

The rapid advancement in learning models required a very specific set of circumstances where their datasets could be built by ripping off the entire sum of human digital output without being poisoned by the feedback loop of training on and amplifying their own mistakes. Those conditions are gone.

timtrent.bsky.social

The ESA Water Vapour Climate Change Initiative's 2nd user workshop kicks off in Jülich. Looking forward to the interesting talks and discussions over the next few days.

Presentation on project datasets from Marc Schroeder
arxiv-cs-cl.bsky.social

Elaf Alhazmi, Quan Z. Sheng, Wei Emma Zhang, Munazza Zaib, Ahoud Alhazmi Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation https://arxiv.org/abs/2402.01512

uaa.bsky.social

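On language resources: I know that besides the Japanese Web Corpus 2010 there's also cc100-ja on Hugging Face, but I didn't know about it when I wrote that piece… Revising it is a pain, so I'm leaving it as is. huggingface.co/datasets/ran...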