KOSMOS-G: Generating Images in Context with Multimodal Large Language Models

Even_Adder@lemmy.dbzer0.com · edit-2 9 months ago

KOSMOS-G: Generating Images in Context with Multimodal Large Language Models

SubArcticTundra@lemmy.ml · 9 months ago

Wow, that is impressive. It essentially does Photoshop for you

GraniteM@lemmy.world · 9 months ago

Am I exposing my lack of knowledge about this tech when I say that this seems to me like an early step along the way to a Star Trek style universal translator? Like, literally translating foreign languages on the fly from one to another?

Even_Adder@lemmy.dbzer0.com · 9 months ago

You’re kinda right. I saw this video a little while ago. I’ve linked it at the relevant part.

PipedLinkBot@feddit.rocks · 9 months ago

Here is an alternative Piped link(s):

this video

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source; check me out at GitHub.

rewarp@slrpnk.net · 9 months ago

Thank you for sharing this talk! I was literally sweating as the ramifications started to hit me while watching it. It's probably the most profound video I have watched in many years.

Even_Adder@lemmy.dbzer0.com · edit-2 9 months ago

PBS did a short video on this too. You might have seen it already.

PipedLinkBot@feddit.rocks · 9 months ago

Here is an alternative Piped link(s):

PBS
