Wikidata talk:Lexicographical data
Place used to discuss any and all aspects of lexicographical data: the project itself, policy and proposals, individual lexicographical items, technical issues, etc.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2025/03.
I cannot create a reference for a usage example.
Sorry if I'm asking in the wrong place. I am a newbie. I have created a usage example (P5831) for pr-ꜥꜣ/𓉐𓉻 (L7922). Then I wanted to add a reference: reference URL (P854) with the value https://oraec.github.io/corpus/oraec51-301.html . But then I get the following error message: "Could not save due to an error. The save has failed." What am I doing wrong? Esther82090 (talk) 21:37, 23 November 2024 (UTC)
- @Esther82090: that's strange, I tried and succeeded on sandbox (L123). Could you try other ways, for instance with other browsers, on other items (like لَيْلَة/捌/ともだち/カタ/sandbox 2/لَيْلَة (L1234); the sandbox lexemes are here for testing purposes), with other URLs, etc.? (I'm trying to see where it could come from.) Cheers, VIGNERON (talk) 11:11, 3 December 2024 (UTC)
- @VIGNERON: Thank you for testing and for your tip about the sandbox! I tried it again and succeeded. Yay! I have no idea why. Maybe there is an anti-spam mechanism that prevents completely new users (like me) from posting links directly. Now I have more than 1000 edits, so maybe I've moved up in the hierarchy. Anyway: thanks again! Esther82090 (talk) 12:37, 3 December 2024 (UTC)
Fetch lexicographical data for language script converter
Hi, I would like to know whether it is possible to use lexemes from Wikidata as root-word dictionaries for the upcoming language script converter (ms to ms-arab) that I am going to develop. I ask because I recently developed a dictionary-based language script converter that loads dictionary data from an external URL, and I found that the conversion is slow.
Let's say an article on Malay Wikipedia contains the Malay word "kuda" (ms). If "kuda" could be converted into "کودا" (ms-arab) through some kind of entry point or API that fetches the existing lexicographical data here (for example Lexeme:L480587), the conversion would hopefully be faster. It would also be great if affix conversion could be coupled with ms-arab text loaded from Wikidata lexemes. Hakimi97 (talk) 10:36, 26 November 2024 (UTC)
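A minimal sketch of the lookup side, assuming the converter reads each lexeme's JSON from Special:EntityData (for example https://www.wikidata.org/wiki/Special:EntityData/L480587.json) and that the Jawi spelling, where present, is stored as an ms-arab spelling variant next to the ms one in the lemma or form representations. The function names and the User-Agent value are hypothetical, and the sketch does not assume that any particular lexeme already has an ms-arab spelling.

```python
import json
import urllib.request

# Special:EntityData serves the JSON of any Wikidata entity, lexemes included.
ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{lid}.json"


def fetch_lexeme(lid: str) -> dict:
    """Download the JSON of one lexeme, e.g. "L480587" (needs network access)."""
    request = urllib.request.Request(
        ENTITY_DATA_URL.format(lid=lid),
        # Wikimedia asks clients for a descriptive User-Agent; placeholder value.
        headers={"User-Agent": "ms-arab-converter-sketch/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["entities"][lid]


def spelling_pairs(lexeme: dict, src: str = "ms", dst: str = "ms-arab") -> dict:
    """Map each src spelling of the lemma and forms to its dst spelling."""
    # Lemmas and form representations share the shape
    # {language_code: {"language": language_code, "value": text}}.
    spellings = [lexeme.get("lemmas", {})]
    spellings += [form.get("representations", {}) for form in lexeme.get("forms", [])]
    return {
        reps[src]["value"]: reps[dst]["value"]
        for reps in spellings
        if src in reps and dst in reps
    }


def build_table(lexemes) -> dict:
    """Merge the pairs from many lexemes into one in-memory lookup table."""
    table = {}
    for lexeme in lexemes:
        table.update(spelling_pairs(lexeme))
    return table


def convert(text: str, table: dict) -> str:
    """Naive word-by-word conversion; words missing from the table pass through.

    No affix handling: an affixed word converts only if it is stored as a form.
    """
    return " ".join(table.get(word, word) for word in text.split())
```

The design choice here is to build the table once (from cached API responses or a Lexeme dump) and keep it in memory, so that converting an article makes no network calls at all; per-word requests at conversion time would likely reproduce the slowness described above.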
Removal of redundant P5972
Hi,
translation (P5972) is redundant when a sense already points to an item via item for this sense (P5137).
My understanding (shared by many people, IIRC) is that it should be avoided. Meanwhile, however, the lexemes are inconsistent: some values are entered with translation (P5972), which wrongly suggests that people should add more values. To me, this sets a bad example and is a waste of time and resources; that's why I propose removing all these values to have cleaner lexemes (@Ainali, Alexander-Mart-Earth, Mahir256: I think this was done at least once in the past). FYI, around 36,143 senses are concerned: https://w.wiki/CPZC.
Cheers, VIGNERON (talk) 13:42, 12 December 2024 (UTC)
- Seems OK; however, I'm not certain how good our item for this sense (P5137) values are. Maybe only do it for nouns to start with (most other parts of speech wouldn't have suitable item for this sense (P5137) values anyway)? ArthurPSmith (talk) 02:17, 13 December 2024 (UTC)
- Thanks for moving on this! I agree, and I also think it may make sense to do it by lexical category to minimize errors. Ainali (talk) 07:08, 13 December 2024 (UTC)
- @ArthurPSmith: true, item for this sense (P5137) is not applicable everywhere and is still often missing; that's why I'm focusing on redundancy, i.e. on senses where P5137 is already present (assuming the data is correct). I had a quick look with SPARQL queries: almost all of them are indeed noun (Q1084), plus a few proper noun (Q147276), which are nouns lato sensu, and even fewer in other categories, including some strange possible mistakes that I'll check.
- @Jon Harald Søby, Situxx, 0xDeadbeef, Jsamwrites, عُثمان, Vis M: as the top users of this property, what do you think?
- If there is no objection in the coming weeks, I'll propose removing the redundancy.
- Cheers, VIGNERON (talk) 08:19, 18 December 2024 (UTC)
- @VIGNERON If the suggestion is to do this automatically, I would not support that. There are some existing lexeme senses that are linked to both an item and a translation, where the translation is more accurate. See this change I made earlier today for example: https://www.wikidata.org/w/index.php?title=Lexeme:L1379061&diff=prev&oldid=2289004058&diffmode=source (This happens to be a verb, but I am sure there are nouns like this too.)
- If someone is adding inappropriate or redundant values, this can be communicated to them directly. عُثمان (talk) 08:28, 18 December 2024 (UTC)
- I usually use this property to link Bokmål and Nynorsk lexemes together. My reason for doing that is that there isn't actually any good bilingual Bokmål–Nynorsk dictionary online anywhere, so Wikidata lexemes could help provide the data to create one. While in most cases it is indeed redundant to item for this sense (P5137), I am afraid we might lose some of the nuance this property provides.
- To provide an example: sanger (L613515-S1), songar (L3517-S1), sangerinne (L718602-S1) and songarinne (L1400922-S1) all have item for this sense (P5137) set to singer (Q177220), but the first and second are linked together as translations, as are the third and fourth, whereas linking the first and fourth together as translations would be incorrect. Now, I'm sure Mahir256 or others have good ways of solving this specific challenge (where one set of senses has a biological/social gender attached while the other set doesn't; probably using hyperonym (P6593)?), but there are probably other analogous cases where such a solution is less obvious. For those, I think we might lose some nuance by removing this en masse, though in the vast majority of cases it should be totally fine, so I don't disagree with the rationale or reasoning. Jon Harald Søby (talk) 09:42, 18 December 2024 (UTC)
- Personally, I like this symmetric property: it makes it easy to see all the other available translations. I am also thinking about the Wikifunctions project, where it would be easy to get the available translations in a single call. Possibly, this could help make translations faster and more efficient. However, I do understand your reasoning: since all the senses connected via P5137 can be easily retrieved using SPARQL queries, this property may seem redundant. John Samuel (talk) 10:30, 18 December 2024 (UTC)
- About that, the latest Wikifunctions weekly newsletters made clear that Wikifunctions wants to build the software necessary to use "item for this sense" efficiently, even if it is not done yet. author TomT0m / talk page 10:35, 18 December 2024 (UTC)
- I would like to keep both the translation property and item for this sense, because to me they are not necessarily redundant, for the following reasons:
- - There may be differing opinions about the translation of different lexemes by sense, so, especially for the cuneiform languages, we record which resource claims that something is a translation of something else, and that is quite valuable for many researchers. It would not be the same if you were to add this reference to the "item for this sense" statement.
- - Nuances in translation: senses are not always linked to the same item via "item for this sense"; they may point to different but related QIDs while still being connected through the translation statement, which in my opinion is useful to preserve. Situxx (talk) 14:32, 18 December 2024 (UTC)
- In the context of this thread, may I refer you to an idea I touched upon in a discussion a few years ago: using qualifiers on the item for this sense (P5137) property to represent nuances of a lexeme, such as level of understanding and socio-linguistic and stylistic context (slang, poetic, etc.). I have written some related proposals at [1]--SM5POR (talk) 09:02, 7 January 2025 (UTC)
- For verbs we now have predicate for (P9970), which I don't see in your proposal. author TomT0m / talk page 09:24, 7 January 2025 (UTC)
- @TomT0m: Thank you for pointing that out; I'll try to include it. I have been unable to work on Wikidata and those proposals for the past few years for health reasons, as shown by my trouble editing another comment below, partly due to tremors hampering my keyboard work.--SM5POR (talk) 11:44, 7 January 2025 (UTC)
- Also, it remains to be seen whether translations are transitive enough for us to assume this is fully equivalent to a 1-to-1 mapping.
- There may be cases where it's perfectly reasonable to say:
- "Lexeme 1 sense 2 has for element Q345, and for translation Lexeme 2 sense 4 (but it's unclear whether Q345 is the right element)"
- "Lexeme 2 sense 4 has for translation Lexeme 3 sense 5, but it's clear that Q345 is not the right element and Q567 is the right one."
- In short, does the rule "if a sense has an element and could be translated into another sense, then the element is right for the second sense" make sense, and is it actually something we want to enforce, insofar as translation is a use case?
- I think these kinds of questions are a big headache that symbolic machine translation can stumble over, so this might require a bit of thought. The fact that "element for that sense" is a catch-all property (not precisely defined) suggests we wanted to use it in a variety of use cases, so these kinds of assumptions may fall apart quite quickly. author TomT0m / talk page 09:04, 18 December 2024 (UTC)
- Thanks everyone, those are very interesting remarks.
- What I gather here:
- something fundamental: sometimes the redundancy is needed (though from what I understand, in these cases there is formally no redundancy).
- something more aesthetic: it's easier/more direct/looks nicer; it's a bit subjective but duly noted.
- At first I was thinking of some mass removal, at least for the subset where there is actual redundancy. But now I'm not sure anymore: where to draw the line? What could/should be kept or not? The lack of rules (or even rough guidelines) is really problematic. Maybe we could look at specific lexemes to determine when to use translation (P5972) or not?
- Cheers, VIGNERON (talk) 11:19, 21 December 2024 (UTC)
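As a rough illustration of the trade-off in this thread, the sketch below takes the Bokmål/Nynorsk senses quoted by Jon Harald Søby, treats any two senses in different languages that share an item for this sense (P5137) value as an implied translation, and compares that with the explicit translation (P5972) links. The data is hard-coded from the thread, and the "nb"/"nn" codes stand in for the lexemes' language items; both are simplifications for illustration, not how the data is actually stored or queried.

```python
from itertools import combinations

# Sense ID -> (language, item for this sense (P5137) value), as quoted above.
SENSES = {
    "L613515-S1": ("nb", "Q177220"),   # sanger
    "L3517-S1": ("nn", "Q177220"),     # songar
    "L718602-S1": ("nb", "Q177220"),   # sangerinne
    "L1400922-S1": ("nn", "Q177220"),  # songarinne
}

# Explicit translation (P5972) links, stored as unordered pairs.
TRANSLATIONS = {
    frozenset({"L613515-S1", "L3517-S1"}),      # sanger <-> songar
    frozenset({"L718602-S1", "L1400922-S1"}),   # sangerinne <-> songarinne
}


def implied_by_item(senses: dict) -> set:
    """Cross-language sense pairs that share the same P5137 item."""
    return {
        frozenset({a, b})
        for a, b in combinations(senses, 2)
        if senses[a][0] != senses[b][0] and senses[a][1] == senses[b][1]
    }


def audit(senses: dict, translations: set):
    """Split links into 'redundant' (implied by P5137) and 'spurious' (P5137-only)."""
    implied = implied_by_item(senses)
    redundant = translations & implied   # removable if P5137 were treated as enough
    spurious = implied - translations    # pairs P5137 alone would wrongly suggest
    return redundant, spurious


redundant, spurious = audit(SENSES, TRANSLATIONS)
print(len(redundant), "explicit links look redundant")     # prints 2 ...
print(len(spurious), "implied pairs are not translations")  # prints 2 ...
```

Both explicit links count as redundant under a pure P5137 criterion, yet P5137 alone also implies the two gendered cross-pairings (sanger with songarinne, sangerinne with songar) that were described above as incorrect; that is the nuance a mass removal would lose.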
More languages
What's the current consensus (if any) on extending the Lexeme database to cover additional languages, in particular sign languages (using {q|1497335}), and Native American ones such as Lenape ({q|2665671}, "del")? I'm currently seeking contact with Lenape groups to inquire about ongoing work to revive their language, first documented by Swedish clergyman {q|5601467} while working in the {q|322187} colony in the 1640s. His translation of Luther's Little Catechism into the Lenape language was republished in a facsimile edition in the 1930s, from which I learned two words, "sixi" meaning "quick" and "nitáto" meaning "knowledge", making me think of "Sixi Nitáto" as an apt name for a Lenape Wikipedia if I ever saw one. I'm not proposing the establishment of a del.wikipedia.org edition at this moment, but I may suggest that possibility to the Lenape groups I talk to as a means of developing their language for modern use, if they have any use for it. Right now, however, I'm only concerned with how much support we may expect for these languages in Wikidata.--SM5POR (talk) 12:31, 6 January 2025 (UTC)
- I'm sorry for my botched item links. I forgot the template syntax and edited my comment offline on my laptop, not noticing the error before posting the text. I still don't seem to have made the templates work correctly.--SM5POR (talk) 12:54, 6 January 2025 (UTC)
- What's the current consensus (if any) on extending the Lexeme database to cover additional languages, in particular sign languages (using SignWriting (Q1497335)), and Native American ones such as Lenape (Leonardo de Matos Cruz (Q2665671), "del")? I'm currently seeking contact with Lenape groups to inquire about ongoing work to revive their language, first documented by Swedish clergyman John Campanius (Q5601467) while working in the New Sweden (Q322187) colony in the 1640s. His translation of Luther's Little Catechism into the Lenape language was republished in a facsimile edition in the 1930s, from which I learned two words, "sixi" meaning "quick" and "nitáto" meaning "knowledge", making me think of "Sixi Nitáto" as an apt name for a Lenape Wikipedia if I ever saw one. I'm not proposing the establishment of a del.wikipedia.org edition at this moment, but I may suggest that possibility to the Lenape groups I talk to as a means of developing their language for modern use, if they have any use for it. Right now, however, I'm only concerned with how much support we may expect for these languages in Wikidata.--SM5POR (talk) 12:54, 6 January 2025 (UTC)
- @SM5POR: Templates require two '{' characters, not 1, on each side; I fixed your most recent copy of this request just above to do this. As to your question, I believe lexemes can be created for any language that has a Q item, so this should certainly be possible for the ones you mention. ArthurPSmith (talk) 22:15, 6 January 2025 (UTC)
- @ArthurPSmith: Thank you for fixing my templates, as well as for the answer. With respect to sign languages in particular, there is the issue of text representation, as Sutton SignWriting characters may not be widely used. I believe the SignWriting code points in Unicode can be used, but the spatial placement of characters in relation to each other needs to be specified. I'm investigating this right now, but I could use feedback from sign language experts, as I don't know sign language myself (besides a few signs in Swedish Sign Language). I'm also told that SignPuddle may provide a few dictionaries that could be used as source vocabularies for Wikidata, provided their licenses allow it. Should I create a WikiProject Sign Languages to this end? I may follow up here whenever I hear from the Lenape cultural center regarding the potential for a Lenape edition of Wikipedia. Maybe this category of languages deserves a WikiProject as well?--SM5POR (talk) 08:26, 7 January 2025 (UTC)
- A Wikiproject sounds like a good idea. I believe there are also already lexemes here for many words in at least British Sign Language - for example Lexeme:L15039, but I don't know if the representation used there is something that can be used across other sign languages. ArthurPSmith (talk) 18:23, 7 January 2025 (UTC)
- @ArthurPSmith: Yes, I noticed the same, and will explore the contents once I have refreshed my SPARQL skills.--SM5POR (talk) 10:03, 8 January 2025 (UTC)
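On the question of spatial placement: as far as we understand it, SignPuddle works with Formal SignWriting (FSW) strings, an ASCII notation in which each symbol key (such as S14c20) is followed by its x/y position inside a signbox. The parser below is a minimal sketch based on our reading of that notation; the exact pattern (symbol-key ranges, box markers B/L/M/R, the optional "A" reading-order prefix) is an assumption to check against the FSW specification before reuse, and the Unicode-based form is not handled at all.

```python
import re
from typing import NamedTuple

# Symbol key: "S" + base (3 hex digits, 100-38b) + fill (0-5) + rotation (0-f).
SYMBOL = r"S[123][0-9a-f]{2}[0-5][0-9a-f]"
# A coordinate is two 3-digit numbers joined by "x".
COORD = r"(\d{3})x(\d{3})"

# Optional "A" prefix listing symbols in reading order, then a box marker with
# its maximum coordinate, then zero or more symbols, each followed by a position.
SIGN_RE = re.compile(
    rf"^(?:A((?:{SYMBOL})+))?([BLMR]){COORD}((?:{SYMBOL}\d{{3}}x\d{{3}})*)$"
)
PLACED_RE = re.compile(rf"({SYMBOL}){COORD}")


class PlacedSymbol(NamedTuple):
    key: str  # FSW symbol key
    x: int    # horizontal position inside the signbox
    y: int    # vertical position inside the signbox


def parse_fsw(sign: str) -> dict:
    """Split one FSW sign into box marker, maximum, reading order and placements."""
    match = SIGN_RE.match(sign)
    if match is None:
        raise ValueError(f"not a recognised FSW sign: {sign!r}")
    prefix, box, max_x, max_y, body = match.groups()
    return {
        "box": box,
        "max": (int(max_x), int(max_y)),
        "order": re.findall(SYMBOL, prefix or ""),
        "symbols": [PlacedSymbol(k, int(x), int(y)) for k, x, y in PLACED_RE.findall(body)],
    }
```

A parsed sign keeps the two pieces of information a lexeme representation would need besides the bare code points: which symbols occur, and where each one sits relative to the others.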
Is L1136073 really a lexeme?
Hi,
Everything is in the title: is Colorless green ideas sleep furiously/Colourless green ideas sleep furiously (L1136073) really a lexeme? And if so, how should we deal with it?
For context, please see Colorless green ideas sleep furiously, which states that this sentence was composed by Noam Chomsky with the specific intent of being "semantically nonsensical" (which causes problems for adding a sense, I guess).
PS: I found it while doing some cleanup on the lexemes; it's one of the few lexemes with sentence (Q41796) as the lexical category.
Cheers, VIGNERON (talk) 15:24, 19 January 2025 (UTC)
- IMHO it’s a quote and not a lexeme. It doesn’t really have a sense, and it especially can’t have any meaningful forms (how do you inflect a whole sentence?). Lucas Werkmeister (talk) 15:28, 19 January 2025 (UTC)
- We do have stable phrases as lexemes, why not this. --Infovarius (talk) 20:50, 20 January 2025 (UTC)
- @Infovarius: but is this a "stable phrase"? Indeed, we do have phrases, like proverbs or locutions, but in almost every case there is a meaning and reference(s). Cheers, VIGNERON (talk) 14:21, 6 March 2025 (UTC)
- Is it perhaps conveying the meaning that words can be combined in ways that make no sense (i.e. the meaning is in the example of its existence)? But I wouldn't object to it being deleted from lexeme space. ArthurPSmith (talk) 15:26, 6 March 2025 (UTC)
- It is stable for sure (it can't be changed). But maybe it's better to move it to the item namespace? Infovarius (talk) 20:44, 29 March 2025 (UTC)
- An item already exists, Colorless green ideas sleep furiously (Q1227715). Lucas Werkmeister (talk) 21:16, 29 March 2025 (UTC)
Verbs with multiple conjugations
Some languages require polypersonal agreement (Q2401947) for verb (Q24905); one such case is transitive verb (Q1774805) in Greenlandic (Q25355). How are, or how should, these be handled? I see examples using items under possessive (Q2105891), such as third-person possessive (Q71470909), as grammatical features, e.g. nerivaa (L1328642), and more can be found with https://w.wiki/DcEU. I have been trying to understand this approach; I do not recall seeing possessiveness used for verbs before. — Finn Årup Nielsen (fnielsen) (talk) 09:53, 28 March 2025 (UTC)
- I now see that there are also items such as second person subject (Q117795140) and plural subject (Q117795217), which have been used a bit for Akkadian (Q35518). — Finn Årup Nielsen (fnielsen) (talk) 10:10, 28 March 2025 (UTC)