Commit Graph

95 Commits

Author SHA1 Message Date
Evgeniy A. Dushistov
931fc98478 check file size before mapping on linux 2022-06-24 21:25:55 +03:00
Evgeniy A. Dushistov
6f30be7815 clang-format for mapfile 2022-06-24 21:24:03 +03:00
Evgeniy A. Dushistov
e89cfa18b1 Revert "replace deprecated g_pattern_match_string function"
This reverts commit 452a4e07fb.
2022-06-24 20:57:57 +03:00
Evgeniy A. Dushistov
12d9ea5b97 more robust parsing of ifo file
fixes #79 fixes #81
2022-06-24 20:54:30 +03:00
Evgeniy A. Dushistov
920c2bafb9 stardict_lib.hpp: remove unused headers plus clang-format 2022-06-24 20:53:53 +03:00
Evgeniy A. Dushistov
452a4e07fb replace deprecated g_pattern_match_string function 2022-06-24 20:06:54 +03:00
Evgeniy A. Dushistov
59ef936288 clang-format for stardict_lib.cpp 2022-06-24 20:03:45 +03:00
Aleksa Sarai
4a9b1dae3d stardict_lib: remove dead poGet{Current,Next,Pre}Word iterators
They aren't used at all by sdcv, and thus aren't tested (meaning that
adaptions to the core lookup algorithms can be complicated because these
methods use them but aren't tested so there's no real way of knowing if
a change has broken the methods or not).

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-11-14 22:38:26 +03:00
Aleksa Sarai
6d385221d0 lookup: return all matching entries found during lookup
Previously, we would just return the first entry we found that matched
the requested word. This causes issues with dictionaries that have lots
of entries which can be found using the same search string. In these
cases, the user got a completely arbitrary word returned to them rather
than the full set.

While this may seem strange, this is incredibly commonplace in Japanese
and likely several other languages. In Japanese:

 * When written using kanji, the same string of characters could refer
   to more than one word which may have a completely different meaning.
   Examples include 潜る (くぐる、もぐる) and 辛い (からい、つらい).

 * When written in kana, the same string of characters can also refer to
   more than one word which is written using completely different kanji,
   and has a completely different meaning. Examples include きく
   (聞く、効く、菊) and たつ (立つ、建つ、絶つ).

In both cases, these are different words in every sense of the word, and
have separate headwords for each in the dictionary. Thus in order to be
completely useful for such dictionaries, sdcv needs to be able to return
every matching word in the dictionary.

The solution is conceptually simple -- return a set containing the
indices rather than just a single index. Since every list we search is
sorted (to allow binary searching), once we find one match we can just
walk backwards and forwards from the match point to find the entire
block of matching terms and add them to the set in linear time. A
std::set is used so that we don't return duplicate results needlessly.

This solution was in practice a bit more complicated because .otf cache
files require a bit more fiddling, and also the ->lookup methods are
used by some callers to find the next entry if no entry was found. But
on the whole it's not too drastic of a change from the previous setup.
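
The approach described above (binary search for one hit, then walk backwards and forwards to collect the whole block of equal entries) might be sketched as follows. This is only an illustration: `lookup_all` is a hypothetical helper rather than sdcv's actual API, and it compares strings byte-wise, whereas the real index may use a different ordering.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not sdcv's real interface): return the indices of
// every entry equal to `word` in a word list sorted with std::string::compare.
std::set<std::size_t> lookup_all(const std::vector<std::string> &sorted_words,
                                 const std::string &word)
{
    std::set<std::size_t> matches;

    // Plain binary search that stops at the first hit it lands on,
    // which may be anywhere inside the block of equal entries.
    std::size_t lo = 0, hi = sorted_words.size();
    std::size_t hit = 0;
    bool found = false;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        const int cmp = sorted_words[mid].compare(word);
        if (cmp == 0) {
            hit = mid;
            found = true;
            break;
        }
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (!found)
        return matches; // empty set: nothing matched

    // Walk backwards to the start of the block of equal entries.
    std::size_t first = hit;
    while (first > 0 && sorted_words[first - 1] == word)
        --first;

    // Walk forwards to the end of the block.
    std::size_t last = hit;
    while (last + 1 < sorted_words.size() && sorted_words[last + 1] == word)
        ++last;

    // Linear in the number of matches; the set keeps the indices unique.
    for (std::size_t i = first; i <= last; ++i)
        matches.insert(i);
    return matches;
}
```

For a list such as `{"a", "b", "b", "b", "c"}`, looking up `"b"` yields the indices 1, 2 and 3, and looking up a missing word yields an empty set.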

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-11-14 22:38:26 +03:00
Aleksa Sarai
51338ac5bb lookup: do not bail on first failed lookup with a word list
Due to the lack of deinflection support in StarDict, users might want to
be able to create a list of possible deinflections and search each one
to see if there is a dictionary entry for that deinflection.

Being able to do this in one sdcv invocation is far more preferable to
calling sdcv once for each candidate due to the performance cost of
doing so. The most obvious language that would benefit from this is
Japanese, but I'm sure other folks would prefer this.

In order to make this use-case better supported -- try to look up every
word in the provided list of words before exiting with an error if any
one of the words failed to be looked up.
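
A minimal sketch of that control flow, under stated assumptions: `lookup_every_word` and the `lookup` callback are hypothetical names standing in for whatever performs a single search, and `EXIT_FAILURE` is used only as a placeholder for the real exit status.

```cpp
#include <cstdlib>
#include <functional>
#include <string>
#include <vector>

// Hypothetical driver: try every word, remember whether any lookup failed,
// and only report the failure once the whole list has been processed.
int lookup_every_word(const std::vector<std::string> &words,
                      const std::function<bool(const std::string &)> &lookup)
{
    bool all_found = true;
    for (const auto &word : words) {
        if (!lookup(word))
            all_found = false; // note the failure, but keep going
    }
    return all_found ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

The point of the design is that one missing candidate no longer hides the results for the candidates that follow it in the list.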

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-09-29 03:28:44 +10:00
258204
c7d9944f7d Added --json (same as --json-output) to match man 2021-06-19 19:19:31 -06:00
NiLuJe
3b26731b02 Making glib think it's a filename instead of a string prevents the
initial UTF-8 conversion

At least on POSIX.

Windows is another kettle of fish. But then it was probably already
broken there.
2021-01-14 19:26:06 +01:00
NiLuJe
070a9fb0bd Oh, well, dirty hackery it is, then.
The previous approach only works as long as locales are actually sane
(i.e., the test only passes if you *actually* have the ru_RU.KOI8-R
locale built, which the CI doesn't).
2021-01-12 04:37:07 +01:00
NiLuJe
8f096629ec Unbreak tests
glib already runs the argument through g_locale_to_utf8 with
G_OPTION_REMAINING
2021-01-12 04:16:03 +01:00
NiLuJe
25768c6b80 Handle "rest" arguments the glib way
Ensures the "stop parsing" token (--) is handled properly.
2021-01-12 03:35:55 +01:00
Jeff Doozan
994c1c7ae6 Use mapfile directly instead of buffer 2020-12-21 17:10:37 -05:00
Jeff Doozan
d38f8f13c9 Synonyms: Use MapFile 2020-12-21 08:53:29 -05:00
Jeff Doozan
cc7bcb8b73 Fix crash if dictionary has no synonyms 2020-12-19 18:37:15 -05:00
Jeff Doozan
8e9f72ae57 Synonyms lookup: return correct offset 2020-12-19 18:01:21 -05:00
Jeff Doozan
88af1a077c Use binary search for synonyms, fixes #31 2020-12-19 15:10:39 -05:00
Evgeniy A. Dushistov
824764ab50 handle possibly invalid data: origin_data == nullptr 2020-08-14 12:46:42 +03:00
Evgeniy A. Dushistov
431a5774ba fix warning 2020-08-14 12:37:21 +03:00
Evgeniy A. Dushistov
7facbe215e refactoring: run clang-format against code 2020-08-14 12:36:02 +03:00
Guido Cella
2fd47ba0d0 Keep searching in $HOME 2020-05-10 12:48:32 +02:00
Guido Cella
3413d847c5 Comply with the XDG Base Directory Specification 2020-05-10 07:01:31 +02:00
alcah
021e467b37 return exit code 2 if search term not found 2020-03-17 22:15:16 +10:30
nickeb96
7341675088 Moved history file path code to helper function 2018-05-07 20:08:47 -04:00
nickeb96
7719111c57 Added support for 2018-05-07 17:45:07 -04:00
Michal Čihař
0f83f0aa0b Store integer magic in cache file
This allows detecting a different machine endianness, to avoid
loading caches created with a different endianness.
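
One way such a check could work is sketched below. Everything here is an assumption made for illustration: the magic value, the function names, and the idea of holding the header in a byte vector are not taken from the actual cache format.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical magic value; it must not read the same when byte-swapped.
constexpr std::uint32_t kCacheMagic = 0x51a4d1c1;

// Writer side: store the magic in the machine's native byte order.
std::vector<unsigned char> make_cache_header()
{
    std::vector<unsigned char> bytes(sizeof(kCacheMagic));
    std::memcpy(bytes.data(), &kCacheMagic, sizeof(kCacheMagic));
    return bytes;
}

// Reader side: reinterpret the bytes natively and compare. A cache written
// on a machine with the opposite byte order yields a swapped value here.
bool cache_header_matches(const std::vector<unsigned char> &bytes)
{
    if (bytes.size() < sizeof(kCacheMagic))
        return false;
    std::uint32_t value = 0;
    std::memcpy(&value, bytes.data(), sizeof(value));
    return value == kCacheMagic;
}

// Demonstration helper: simulate a header produced with the other byte order.
std::vector<unsigned char> byte_swapped(std::vector<unsigned char> bytes)
{
    std::reverse(bytes.begin(), bytes.end());
    return bytes;
}
```

A header produced on the same machine passes the check, while `byte_swapped(make_cache_header())` fails it, so a foreign cache would be rejected and could be rebuilt instead of being loaded.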

Fixes #36

Signed-off-by: Michal Čihař <michal@cihar.com>
2017-11-14 16:39:57 +01:00
Evgeniy A. Dushistov
0cd29823cf ready for 0.5.2 release 2017-08-16 10:14:23 +03:00
Evgeniy A. Dushistov
8f16ceae59 refactoring: apply clang-format rules 2017-08-09 07:46:27 +03:00
Evgeniy A. Dushistov
d0c0a0837f fix: do not give interactive menu via pager
fixes #28
2017-08-09 07:41:33 +03:00
Peter
e85927e562 Add -e for exact searches (no fuzzy matches).
Only exact matches (or synonyms) are returned for simple searches.
2017-07-28 11:39:34 +02:00
Peter
835dffcaf8 Add additional type identifiers h,w,k
Like for xdxf, no processing is done, the raw content is shown.
2017-07-27 08:15:45 +02:00
Evgeniy Dushistov
af6362f5df Merge pull request #23 from sleep-walker/master
fix FSF address in LICENSE
2017-07-27 00:30:30 +03:00
Evgeniy Dushistov
25dd4c8264 Merge pull request #22 from ecraven/json
Add json output, fix #6
2017-07-26 23:55:59 +03:00
Tomáš Čech
98e98d0746 fix FSF address 2017-07-26 22:39:28 +02:00
Peter
3105823e8b Add option --json-output (-j)
If given -j, format the output of -l and of searches as JSON.
2017-07-26 22:07:23 +02:00
Peter
5f0f6e036f Add option --only-data-dir (-x)
Only use the dictionaries in data-dir, do not search in user and system directories

This makes testing much easier
2017-07-07 08:39:26 +02:00
Evgeniy A. Dushistov
214fbbf91e fix portability issue in PR #20 , plus simplify code 2017-07-07 00:19:50 +03:00
Evgeniy Dushistov
f510300f59 Merge pull request #20 from ecraven/master
Add support for .syn synonym files.
2017-07-06 23:57:19 +03:00
Peter
4b52181898 Add support for .syn synonym files.
Fixes #8.
2017-07-06 19:46:15 +02:00
Evgeniy A. Dushistov
72a15b70a7 simplify parsing of integers in ifo file 2017-07-06 13:11:02 +03:00
Evgeniy A. Dushistov
4c367fc12c fix build with clang 3.4.1 #19 2017-07-06 11:41:58 +03:00
Evgeniy A. Dushistov
b2ced870ab fix potential undefined behaviour
fix #19
2017-07-04 22:33:14 +03:00
Evgeniy A. Dushistov
5c1357840c Merge branch 'order_dist_list' 2017-04-22 21:25:58 +03:00
Evgeniy A. Dushistov
1667de0650 cleanups for specify "dictionary order by" 2017-04-22 21:23:00 +03:00
Evgeniy A. Dushistov
97b13e6702 remove not used code 2017-04-22 20:52:18 +03:00
Anton Yuzhaninov
84367a5744 Fix using SDCV_PAGER
Stream opened with popen() should be closed with pclose() as documented
in popen(3) man.
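
The pairing rule can be made hard to get wrong with a small RAII wrapper. This is a POSIX-only sketch under stated assumptions: `PipeCloser`, `PipePtr` and `write_through_pager` are hypothetical names and not the code touched by this commit.

```cpp
#include <stdio.h> // popen() and pclose() are POSIX, declared in <stdio.h>

#include <memory>
#include <string>

// Deleter that closes a popen() stream with pclose(), never fclose().
struct PipeCloser {
    void operator()(FILE *stream) const
    {
        if (stream)
            pclose(stream);
    }
};
using PipePtr = std::unique_ptr<FILE, PipeCloser>;

// Hypothetical helper: send `text` to the standard input of `pager_cmd`.
bool write_through_pager(const std::string &pager_cmd, const std::string &text)
{
    PipePtr pipe(popen(pager_cmd.c_str(), "w"));
    if (!pipe)
        return false; // the command could not be started

    if (fputs(text.c_str(), pipe.get()) == EOF)
        return false; // PipeCloser still calls pclose() on this path

    // Close explicitly so the result of pclose() can be checked.
    return pclose(pipe.release()) != -1;
}
```

A call such as `write_through_pager("cat > /dev/null", "hello\n")` exercises the same path a pager command would take.
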
2017-03-07 18:37:52 -05:00
Evgeniy A. Dushistov
7df514e117 fix build without readline 2017-02-17 17:38:57 +03:00