What is spaCy?

Many of you may find this word strange when you first hear it, and some may assume it has something to do with space. What comes to mind depends on each reader's imagination.

What spaCy actually is: spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python.

Natural Language Processing (NLP) is an area of computer science and artificial intelligence concerned with the interaction between computers and human (natural) language. With NLP, computers can analyze large amounts of natural language data.

spaCy is one of the best frameworks for working with large amounts of natural language text and processing that data.

With spaCy you can build your own model. After creating it, you train it on examples, test it on new data it has not seen, and measure its accuracy.
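The build, train, test cycle described above can be sketched with a small text classifier. This is a minimal example assuming spaCy v3 (`nlp.initialize` and `spacy.training.Example` do not exist in v2). The POSITIVE/NEGATIVE labels and the training sentences are hypothetical toy data, far too little for a useful model, so the printed accuracy only illustrates the workflow.

```python
import random

import spacy
from spacy.training import Example

# Fix random seeds so repeated runs behave the same way
spacy.util.fix_random_seed(0)

# Hypothetical toy training data: (text, annotations) pairs
TRAIN_DATA = [
    ("I love this phone", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("What a great movie", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("This is wonderful", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("I hate this phone", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
    ("What a terrible movie", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
    ("This is awful", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]
# Hypothetical new sentences the model has not seen, with the expected label
TEST_DATA = [
    ("I love this movie", "POSITIVE"),
    ("This phone is terrible", "NEGATIVE"),
]

# Build: an empty English pipeline with a text classifier component
nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")

examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]
optimizer = nlp.initialize(lambda: examples)

# Train: several passes over the shuffled examples
for epoch in range(20):
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

# Test: predict the highest-scoring label and compute accuracy
correct = 0
for text, gold in TEST_DATA:
    doc = nlp(text)
    predicted = max(doc.cats, key=doc.cats.get)
    correct += predicted == gold
accuracy = correct / len(TEST_DATA)
print(f"accuracy: {accuracy:.2f}")
```

On real data you would use a much larger held-out test set; spaCy v3 also provides a config-driven `python -m spacy train` command for full training runs.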

HOW TO INSTALL SPACY ON YOUR MACHINE?

spaCy is open source and can be installed on various operating systems.

Before installing spaCy, make sure Python is properly installed on your machine. To check whether Python is installed, see the link below.

https://stackoverflow.com/questions/8917885/which-version-of-python-do-i-have-installed
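If you prefer to check from Python itself, this short sketch prints the interpreter version and reports whether a `spacy` package is already importable. It uses only the standard library (`sys` and `importlib.util.find_spec`), so it runs even before spaCy is installed.

```python
import importlib.util
import sys

# Full version string of the interpreter running this script
print("Python version:", sys.version)

# Look for the spacy package without actually importing it
spacy_found = importlib.util.find_spec("spacy") is not None
print("spaCy installed:", spacy_found)
```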

1)LINUX OR UBUNTU

spaCy can be installed with pip; current releases ship prebuilt wheels for common platforms.

TERMINAL COMMANDS

Step 1) pip install -U spacy

When using pip, it is generally recommended to install packages in a virtual environment to avoid modifying system state:

Step 2)
python -m venv .env
source .env/bin/activate
pip install -U spacy

2)CONDA

spaCy can also be installed with conda, from the conda-forge channel:

Command => conda install -c conda-forge spacy
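After installing with either pip or conda, a quick sanity check can be sketched as follows, assuming a recent spaCy release. `spacy.blank("en")` creates an empty English pipeline that needs no downloaded model, and `spacy.util.is_package` reports whether the `en_core_web_sm` model used later in this post is installed as a package.

```python
import spacy

# Installed spaCy version
print("spaCy version:", spacy.__version__)

# A blank English pipeline needs no downloaded model,
# so tokenizing a sentence confirms the library itself works
nlp = spacy.blank("en")
print([token.text for token in nlp("spaCy is installed.")])

# Is the small English model used later in this post installed?
has_small_model = spacy.util.is_package("en_core_web_sm")
print("en_core_web_sm installed:", has_small_model)
```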

FEATURES PRESENT IN SPACY

spaCy provides many features as built-in functions. Using them replaces many lines of hand-written code and makes processing more efficient. Some of spaCy's features are:

Tokenization
Part-of-speech (POS) Tagging
Dependency Parsing
Lemmatization
Sentence Boundary Detection (SBD)
Named Entity Recognition (NER)
Similarity
Text Classification
Rule-based Matching
Training
Serialization
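Two of the listed features, Sentence Boundary Detection and Rule-based Matching, work without any trained model. The sketch below assumes the spaCy v3 API (`nlp.add_pipe("sentencizer")` and `matcher.add(name, [pattern])`; v2 used different signatures). The example text and the `HelloWorld` pattern name are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline plus the rule-based sentencizer component
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("Hello, world! Hello world! This is spaCy.")

# Sentence Boundary Detection: split the Doc into sentences
sentences = [sent.text for sent in doc.sents]
print(sentences)

# Rule-based Matching: "hello", then a punctuation token, then "world"
matcher = Matcher(nlp.vocab)
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
matcher.add("HelloWorld", [pattern])

# Each match is (match_id, start, end); slice the Doc to get the text
matched_spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched_spans)
```

The sentencizer only looks at punctuation; with a trained pipeline such as `en_core_web_sm`, sentence boundaries come from the dependency parser instead.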

I will explain some of these concepts in detail so that you can understand them.

1)Tokenization

Tokenization segments the given text into words, punctuation marks and so on. It matters because text can only be analyzed after it has been split into these units. Abbreviations need special care during tokenization, since they must be kept or split correctly, and spaCy is good at recognizing them.

For example, punctuation at the end of a sentence should be split off, whereas "U.K." should remain one token.
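Abbreviations and contractions are handled by tokenizer exceptions. This sketch, assuming a recent spaCy version, uses `spacy.blank("en")` (the English tokenizer without a trained model) to show a built-in exception splitting "don't", and `nlp.tokenizer.add_special_case` to register a custom rule; the "gimme" rule is only an illustration.

```python
import spacy
from spacy.symbols import ORTH

# Blank English pipeline: only the rule-based English tokenizer
nlp = spacy.blank("en")

# Built-in exceptions split contractions into meaningful parts
contraction = [t.text for t in nlp("I don't know")]
print(contraction)

# Illustrative special case: always split "gimme" into two tokens
nlp.tokenizer.add_special_case("gimme", [{ORTH: "gim"}, {ORTH: "me"}])
custom = [t.text for t in nlp("gimme that")]
print(custom)
```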

CODE

Make sure the model 'en_core_web_sm' is installed on your machine. You can download it with: python -m spacy download en_core_web_sm

import spacy

# Load the small English pipeline (it must be downloaded first)
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# Print each token on its own line
for token in doc:
    print(token.text)

The output is one token per line (shown here on a single line, separated by bars):

Apple | is | looking | at | buying | U.K. | startup | for | $ | 1 | billion
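Because tokenization is rule-based, this step can be reproduced without downloading a model. The sketch below assumes `spacy.blank("en")` (available in current spaCy releases) and adds a full stop to the example sentence, to check that sentence-final punctuation is split off while "U.K." stays whole. It also checks that the tokens, together with their trailing whitespace, rebuild the original text exactly.

```python
import spacy

# English tokenizer only; no trained model is needed
nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion.")

tokens = [token.text for token in doc]
print(" | ".join(tokens))

# Tokenization is non-destructive: each token stores its trailing
# whitespace, so joining text_with_ws gives back the input string
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == doc.text)
```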

2)Part-of-speech tags and dependencies

spaCy can also parse and tag a Doc. This is where the statistical model comes in: it enables spaCy to predict which tag or label most likely applies in a given context.

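Like many NLP libraries, spaCy encodes all strings to hash values to reduce memory usage and improve efficiency. That is why the attributes in the code below end in an underscore (lemma_, pos_, tag_ and so on): the underscore versions return readable strings, while the versions without it return the integer IDs.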
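The hashing can be seen directly through the vocabulary's `StringStore`. A minimal sketch, using a blank English pipeline and a made-up sentence:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("I love coffee")

# The StringStore maps strings to hash values and back again
coffee_hash = nlp.vocab.strings["coffee"]
coffee_text = nlp.vocab.strings[coffee_hash]
print(coffee_hash, coffee_text)

# Attribute without underscore: the hash; with underscore: the string
token = doc[2]
print(token.orth, token.orth_)
```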

CODE

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# Print text, lemma, coarse POS, detailed tag, dependency label,
# shape, and the is_alpha / is_stop flags for each token
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)

The print call above outputs the following token attributes:

Text: The original word text.
Lemma: The base form of the word. For example, the lemma of "is" is "be".
POS: The simple part-of-speech tag. For example, older models tag "is" in this sentence as VERB, while newer ones tag it AUX (auxiliary).
Tag: The detailed part-of-speech tag.
Dep: Syntactic dependency, i.e. the relation between tokens.
Shape: The word shape: capitalisation, punctuation, digits.
is alpha: Does the token consist of alphabetic characters only?
is stop: Is the token part of a stop list, i.e. the most common words of the language?
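Some of these attributes (shape, is_alpha, is_stop) are lexical, so they can be checked with a blank English pipeline, while POS, tag and dependency labels need a trained model such as `en_core_web_sm`. This sketch prints the lexical attributes for the example sentence and uses `spacy.explain` to expand a few label abbreviations; the choice of labels is arbitrary.

```python
import spacy

# Lexical attributes come from the tokenizer and vocabulary,
# so a blank English pipeline is enough here
nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

for token in doc:
    print(token.text, token.shape_, token.is_alpha, token.is_stop)

# spacy.explain turns tag and label abbreviations into descriptions
for label in ["PROPN", "VBZ", "nsubj"]:
    print(label, "->", spacy.explain(label))
```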

THE REMAINING FEATURES WILL BE EXPLAINED IN THE NEXT POST.
