base
Contains a base class for implementing any incremental method in RiverText.
IWVBase
Bases: Transformer, VectorizerMixin
Base class for implementing any incremental method in RiverText.
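The two members marked abstractmethod below (learn_many and vocab2dict) define what a concrete model has to provide. The following is a minimal sketch of that contract using Python's standard abc module. SketchIWVBase, Incomplete and ZeroVectors are hypothetical names, not RiverText classes, and the sketch omits the real base classes (Transformer, VectorizerMixin) and most constructor parameters. In this stand-in, a subclass that leaves either method unimplemented cannot be instantiated.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SketchIWVBase(ABC):
    """Hypothetical stand-in for the documented IWVBase contract (not RiverText code)."""

    def __init__(self, vocab_size: int, vector_size: int, window_size: int):
        # Hyperparameters shared by every concrete model.
        self.vocab_size = vocab_size
        self.vector_size = vector_size
        self.window_size = window_size

    @abstractmethod
    def learn_many(self, X: List[str], y=None, **kwargs) -> None:
        """Update the model with a mini-batch of sentences."""

    @abstractmethod
    def vocab2dict(self) -> Dict[str, List[float]]:
        """Return a word -> vector mapping."""


class Incomplete(SketchIWVBase):
    # Overrides learn_many only, so vocab2dict is still abstract.
    def learn_many(self, X, y=None, **kwargs):
        pass


class ZeroVectors(SketchIWVBase):
    """Toy model: every word seen so far maps to a zero vector."""

    def __init__(self, vocab_size: int, vector_size: int, window_size: int):
        super().__init__(vocab_size, vector_size, window_size)
        self.vocab: List[str] = []

    def learn_many(self, X, y=None, **kwargs):
        for sentence in X:
            for token in sentence.split():
                # Stop admitting new words once vocab_size is reached.
                if token not in self.vocab and len(self.vocab) < self.vocab_size:
                    self.vocab.append(token)

    def vocab2dict(self):
        return {word: [0.0] * self.vector_size for word in self.vocab}


try:
    Incomplete(vocab_size=10, vector_size=4, window_size=2)
except TypeError as err:
    print("cannot instantiate:", err)

model = ZeroVectors(vocab_size=10, vector_size=4, window_size=2)
model.learn_many(["the cat sat", "the dog ran"])
print(sorted(model.vocab2dict()))  # ['cat', 'dog', 'ran', 'sat', 'the']
```

Keeping both methods abstract lets shared code (for example, an evaluation routine) call learn_many and vocab2dict on any model without knowing which training algorithm sits behind them.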
Source code in rivertext/models/base/iwv.py
__init__(vocab_size, vector_size, window_size, on=None, strip_accents=True, lowercase=True, preprocessor=None, tokenizer=None, ngram_range=(1, 1))
Base constructor for common hyperparameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vocab_size | int | The size of the vocabulary. | required |
vector_size | int | The dimension of the embedding vectors. | required |
window_size | int | The size of the context window. | required |
on | str | The name of the feature that contains the text to vectorize. If … | None |
strip_accents | bool | Whether or not to strip accent characters. | True |
lowercase | bool | Whether or not to convert all characters to lowercase. | True |
preprocessor | | An optional preprocessing function which overrides the … | None |
tokenizer | Callable[[str], List[str]] | A function used to convert preprocessed text into a … | None |
ngram_range | Tuple[int, int] | The lower and upper boundary of the range of n-grams to be extracted. All values of n such that ngram_range[0] <= n <= ngram_range[1] are extracted. | (1, 1) |
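To illustrate how strip_accents, lowercase, tokenizer and ngram_range might interact, here is a minimal sketch. It assumes the steps run in that order and that the fallback tokenizer splits on runs of word characters; preprocess, default_tokenizer, extract_ngrams and vectorize_text are hypothetical helpers, not RiverText functions, and n-grams are joined with a single space for readability. The preprocessor parameter is left out because its documented behaviour is truncated above.

```python
import re
import unicodedata
from typing import Callable, List, Optional, Tuple


def preprocess(text: str, strip_accents: bool = True, lowercase: bool = True) -> str:
    if strip_accents:
        # Decompose characters ("é" -> "e" + combining accent), then drop the accents.
        text = "".join(
            c for c in unicodedata.normalize("NFKD", text) if not unicodedata.combining(c)
        )
    if lowercase:
        text = text.lower()
    return text


def default_tokenizer(text: str) -> List[str]:
    # Hypothetical fallback: each run of word characters becomes one token.
    return re.findall(r"\w+", text)


def extract_ngrams(tokens: List[str], ngram_range: Tuple[int, int] = (1, 1)) -> List[str]:
    low, high = ngram_range
    ngrams = []
    # Keep every n with low <= n <= high, shortest n-grams first.
    for n in range(low, high + 1):
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i : i + n]))
    return ngrams


def vectorize_text(
    text: str,
    strip_accents: bool = True,
    lowercase: bool = True,
    tokenizer: Optional[Callable[[str], List[str]]] = None,
    ngram_range: Tuple[int, int] = (1, 1),
) -> List[str]:
    text = preprocess(text, strip_accents, lowercase)
    tokens = (tokenizer or default_tokenizer)(text)
    return extract_ngrams(tokens, ngram_range)


print(vectorize_text("Café au Lait", ngram_range=(1, 2)))
# ['cafe', 'au', 'lait', 'cafe au', 'au lait']
```

With the default ngram_range of (1, 1), only single tokens are produced; (2, 2) would keep only the bigrams.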
learn_many(X, y=None, **kwargs)
abstractmethod
Train the model on a mini-batch of text features.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | List[str] | A list of sentence features. | required |
y | | A series of target values. | None |
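As one illustration of what a concrete learn_many could do, the sketch below implements a hypothetical count-based model: for each sentence in the mini-batch it counts how often every word co-occurs with the words up to window_size positions away. CooccurrenceSketch is not a RiverText class and does not subclass IWVBase here so that it runs on its own; it also tokenizes with a plain lowercase split, drops out-of-vocabulary tokens before windowing, and returns raw count rows as plain lists instead of trained np.ndarray embeddings.

```python
from collections import defaultdict
from typing import Dict, List


class CooccurrenceSketch:
    """Hypothetical count-based model; a real one would subclass IWVBase."""

    def __init__(self, vocab_size: int, window_size: int):
        self.vocab_size = vocab_size
        self.window_size = window_size
        self.index: Dict[str, int] = {}  # word -> column index
        # counts[w][c] = times context word c appeared within the window of w
        self.counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def _in_vocab(self, word: str) -> bool:
        # Admit new words until the vocabulary is full; later new words are ignored.
        if word not in self.index and len(self.index) < self.vocab_size:
            self.index[word] = len(self.index)
        return word in self.index

    def learn_many(self, X: List[str], y=None, **kwargs) -> None:
        for sentence in X:
            # Out-of-vocabulary tokens are dropped before windowing.
            tokens = [t for t in sentence.lower().split() if self._in_vocab(t)]
            for i, target in enumerate(tokens):
                lo = max(0, i - self.window_size)
                hi = min(len(tokens), i + self.window_size + 1)
                for j in range(lo, hi):
                    if j != i:
                        self.counts[target][tokens[j]] += 1

    def vocab2dict(self) -> Dict[str, List[float]]:
        # One row per word, one column per vocabulary slot.
        result = {}
        for word in self.index:
            row = [0.0] * self.vocab_size
            for ctx, n in self.counts[word].items():
                row[self.index[ctx]] = float(n)
            result[word] = row
        return result


model = CooccurrenceSketch(vocab_size=5, window_size=1)
model.learn_many(["the cat sat", "the cat ran"])
print(model.vocab2dict()["cat"])  # [2.0, 0.0, 1.0, 1.0, 0.0]
```

Because the counts persist between calls, each further learn_many call refines the same statistics instead of retraining from scratch, which is the behaviour an incremental model needs.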
vocab2dict()
abstractmethod
Abstract method for transforming the vocabulary into a dictionary whose keys are the words of the vocabulary and whose values are their trained embedding vectors.
Returns:
Type | Description |
---|---|
Dict[str, np.ndarray] | A dictionary of embeddings. |
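Since vocab2dict returns a plain word-to-vector mapping, downstream code can use it without touching the model. The sketch below ranks the nearest neighbours of a word by cosine similarity; cosine and most_similar are hypothetical helpers, and the small hand-written dictionary (plain lists rather than np.ndarray) only stands in for real vocab2dict() output.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Treat a zero vector as dissimilar to everything.
    return dot / norm if norm else 0.0


def most_similar(
    embeddings: Dict[str, Sequence[float]], word: str, k: int = 3
) -> List[Tuple[str, float]]:
    query = embeddings[word]
    scores = [(other, cosine(query, vec)) for other, vec in embeddings.items() if other != word]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


embeddings = {  # stand-in for model.vocab2dict()
    "cat": [1.0, 0.9, 0.0],
    "dog": [0.9, 1.0, 0.1],
    "car": [0.0, 0.1, 1.0],
}
print(most_similar(embeddings, "cat", k=2))  # 'dog' ranks above 'car'
```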