Ah, the difficulty of learning a second language as an adult.
Reminds me of the ways that my mediocre grasp of the German language still haunts me => Resurrects myself out of specific tracks in which my half-hearted clutch on the Alsatian tongue still plagues me.
Last year I inadvertently asked to borrow a fork-lift truck when I meant stapler. Still, at least my pronunciation is no longer Crabtree from 'Allo 'Allo: https://www.youtube.com/watch?v=fYNXMWRdCx0
Somebody should write commentaries/explanations for all of these, especially for non-native speakers and technical jargon. Hell, ChatGPT can probably write it up quickly. Even as a native speaker, there were quite a few that took a moment to understand. Have Reddit-style voting for the "best ones" and you could make a whole social network out of this.
Hacker News / Intruder Alert
Underdog/subwoofer is my favorite. A lesser known rapper needs to make a bar out of that xD
Some poet must have had a lot of fun slipping things like these past the censors. It'd be in the same league as the Underhanded C Contest: what is the most innocent short text that can be written while slipping in some subtle but likely rather dirty double meaning?
This was a common thing to do in the late 90s and early 2000s when websites thought you could make 'safe chat' with a limited dictionary and it turns out they were mostly wrong.
This is great, had quite a few chuckles.
You could take it one step further and merge in Cockney Slang as an extra stage.
An interesting ML exercise (possible class project!) would be to try to automate this. A bigram corpus combined with word embeddings and an NSFW text classifier, maybe? Your usual word embedding might not work because the point is the multiple meanings, so maybe the twist would be that you need a polysemous word embedding with multiple vectors or something like that, so it's not just an off-the-shelf word2vec...
You don’t even need embeddings or ML. A simple search across dictionaries, thesauruses and the Wikipedia entry list (with disambiguations) should be enough.
- Find all 2 word phrases and compound words
- search across all pairwise combinations of mutual synonyms, and determine whether the compound synonym is itself a word or phrase
https://chatgpt.com/share/6818d11d-f444-800a-96b0-7a932e9213...
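For what it's worth, the two steps above can be sketched in a few lines of Python. This is only a rough sketch under loud assumptions: `THESAURUS` and `PHRASES` are tiny hand-written stand-ins for a real thesaurus and a real phrase list (dictionary entries, Wikipedia titles), and `synonym_phrase_pairs` is a made-up helper name, not anything from the original list's tooling.

```python
from itertools import product

# Toy thesaurus (hypothetical data): word -> words that are synonyms in at
# least one sense. Entries are mutual, as the comment above suggests.
THESAURUS = {
    "home": {"house"},
    "house": {"home"},
    "schooled": {"trained", "taught"},
    "trained": {"schooled"},
    "cold": {"cool", "chilly"},
    "fusion": {"merger"},
}

# Toy list of known two-word phrases (hypothetical; a real run would load
# these from a dictionary or a list of encyclopedia entry titles).
PHRASES = {
    ("home", "schooled"),
    ("house", "trained"),
    ("cold", "fusion"),
}


def synonym_phrase_pairs(phrases, thesaurus):
    """Find pairs of known phrases that differ by a word-for-word synonym swap."""
    found = set()
    for first, second in phrases:
        # Every combination of a synonym for word 1 with a synonym for word 2.
        candidates = product(thesaurus.get(first, ()), thesaurus.get(second, ()))
        for candidate in candidates:
            if candidate in phrases and candidate != (first, second):
                # Sort so each pair is stored once, whichever side we started from.
                found.add(tuple(sorted([(first, second), candidate])))
    return sorted(found)


if __name__ == "__main__":
    for left, right in synonym_phrase_pairs(PHRASES, THESAURUS):
        print(" ".join(left), "/", " ".join(right))
```

With the toy data this only finds the home schooled / house trained pair; "cold fusion" gets no match because no swapped candidate is in the phrase list. How many matches a real thesaurus would produce is exactly the open question in the replies.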
That would be a good baseline. Maybe the ML part would be for ranking them? Because I expect you would drown in matches, and the hard part becomes finding the good ones.
I expect the opposite. I would expect an ML/embedding approach to find lots of false positives, because lots of words have close embeddings but are not synonyms. A strict thesaurus lookup should produce fewer matches. As for ranking the good ones, an embedder might help with that, but then we need a definition of "good". I would argue the most conceptually "unrelated" matches are the "best" ones, so yes, an embedder could quickly determine the farthest vector distance.
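A minimal sketch of that ranking idea in Python, assuming you already have one embedding vector per phrase. The vectors in `EMBEDDINGS` are invented numbers for illustration, "fast repair" is a made-up entry that is not on the list, and `rank_pairs` is a hypothetical helper; a real run would take its vectors from whatever embedding model you pick.

```python
import math

# Hypothetical phrase embeddings (made-up numbers, not from any real model).
EMBEDDINGS = {
    "home schooled": [0.9, 0.1, 0.0],
    "house trained": [0.1, 0.9, 0.2],
    "quick fix": [0.5, 0.5, 0.1],
    "fast repair": [0.5, 0.4, 0.1],
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity; larger means the vectors point further apart."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def rank_pairs(pairs, embeddings):
    """Sort phrase pairs so the most conceptually distant pair comes first."""
    return sorted(
        pairs,
        key=lambda pair: cosine_distance(embeddings[pair[0]], embeddings[pair[1]]),
        reverse=True,
    )


if __name__ == "__main__":
    pairs = [("quick fix", "fast repair"), ("home schooled", "house trained")]
    for left, right in rank_pairs(pairs, EMBEDDINGS):
        print(left, "/", right)
```

Whether "farthest apart" really matches what people find funny is still the undefined part; this only shows the mechanics of sorting by distance.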
Lovely. This is a fun, devious list (constable?), and though it says "since 2018", it's clearly actively maintained, as it contains entries like "Twitter ban / exterminate" and "quick jab / prompt injection".
"since 2018" means "starting 2018, through the present", no?
> Home-schooled / house trained
The gifted kid to burnt-out transgender puppygirl pipeline
Explanation for anyone who needs it: "Home" and "house" are both nouns that mean a place you live in. "School" and "train" here are both used as verbs that mean "to teach something to someone". But "Home schooling" is the practice of parents teaching their own children at home and keeping them out of public school, whereas "House training" is the practice of teaching dogs not to pee inside the house.
home schooled is about the location of the teaching
house trained is about the subject of the teaching
Some of these are very cool and fit their definition exactly. But some others are very forced. Who would think that “fashion” and “taylor” are synonyms? And they are both further synonyms of “fix”? Or in the “Cold Fusion // Cool Cat” pair how would “fusion” be a synonym of “cat”?
A tailor makes clothes.
To fashion something is to make something. To tailor something is to customise it.
`cat` is a Unix command for concatenating files, which you could argue is fusing them together?
I think you are onto something with ‘tailor’. If we take it as the verb ‘to tailor’, that is a synonym of ‘to fashion’. Tailor as a profession is not a synonym of the noun ‘fashion’. But it is enough for them to be synonyms in one sense. So that works.
I hear you on ‘cat’ but I wouldn’t call that relationship synonyms.
when you make a list like this then it's always a challenge to deal with edge cases. fashion and tailor fit together as well as many other terms on that list. a swift taylor can be seen as producing fast fashion, even if not true in reality. fixing clothes is also the work of many tailors. with getting custom tailored clothes falling out of fashion (pun intended ;-) the bulk of the work of most tailors is making changes and fixing things. in germany it is even a recognized profession with a two year training (you can become a full tailor with an extra year). most of my visits to a tailor recently have literally been for a quick fix. stuff that i could probably have done myself if i had a sewing machine and time.
i am with you on the cat fusion however. i don't get that one either.
> fixing clothes is also the work of many tailors. with getting custom tailored clothes falling out of fashion
These are all good reasons for semantic closeness, but semantic closeness and being synonyms are not the same thing.
That being said, while tailor (profession) is not a synonym of fashion (concept), tailor (verb) is a synonym of fashion (verb) and can also be a synonym of fix (verb).
Hypothesis = Understatement makes sense, right?
That's a very loose definition of "synonyms". Most are more like "conceptually adjacent".
Indeed. 'beach' and 'wave' are definitely not synonyms, and while a shelf is a platform, not all platforms are shelves. Too many of these are essentially really forced puns. But there are some absolute crackers in there nonetheless.
A related idea I've thought of is finding (comparatively) long words that have different definitions in different languages. E.g. "fetter" (English/German). In theory you only need a wordlist for each language, but in practice, most common words across languages also share definitions (e.g. "unintelligent" in English/German).
These are called False Friends - https://en.wikipedia.org/wiki/False_friend
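The wordlist part of that idea is easy to sketch in Python. Assumptions up front: `ENGLISH` and `GERMAN` are tiny hand-written stand-ins for real wordlists, and `shared_long_words` is a made-up helper. It only finds candidates with identical spelling; as noted above, checking that the definitions actually differ still needs a dictionary lookup or a human.

```python
# Tiny stand-in wordlists (hypothetical; real ones would be loaded from files).
ENGLISH = {"fetter", "unintelligent", "gift", "house"}
GERMAN = {"fetter", "unintelligent", "gift", "haus"}


def shared_long_words(words_a, words_b, min_length=6):
    """Return words spelled identically in both lists, keeping only longer ones."""
    return sorted(word for word in words_a & words_b if len(word) >= min_length)


if __name__ == "__main__":
    # Candidates only: "fetter" differs in meaning between the two languages,
    # while "unintelligent" means the same thing in both and would be filtered
    # out in a later definition-comparison step.
    print(shared_long_words(ENGLISH, GERMAN))
```

The `min_length` cutoff is arbitrary; it is just a knob for "comparatively long".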
Hacker News / Tweaker Buzz?
Just noted to my partner last night, a screw and a nail are similar things, but "you screwed it" and "you nailed it" are opposites. But if you take them as sexual innuendo they mean the same thing again.
I have to admit some of these took a moment to register.
Brilliant, and hilarious. Thank you for sharing!
this is absolutely genius
Someone please vibe code a quick website to submit, verify, rank, filter, sort and explain/comment on all of these. Should be an hour with Cursor.