Posts

OPEN SOURCE & Companies providing Services

OPEN SOURCE TO START A NEW ERA
http://help.monkeylearn.com/en/articles/2174206-what-is-monkeylearn
http://help.monkeylearn.com/en/?q=Is+monkeylearn+an+open+source
Let's have a look at these companies and the services they provide through the platforms they are developing, for example NER by using the API.

Natural Language Processing

Step-by-step instructions for NLP projects.

Applications of NLP:
===============
Sentiment Analysis
Topic Modelling
Named Entity Recognition
Machine Translation
Speech Recognition
Question Answering
Automatic Summarization
Chatbots
Market Intelligence
Text Classification
Character Recognition
Spell Checking
Autocomplete
Voice text messaging
Spam filters
Related keywords on search engines
Search Autocomplete
Search Autocorrect
Virtual Assistant
Knowledgebase Support
Customer Service Automation
Survey Analytics
Social Media Monitoring
Descriptive Analysis
Automatic Insights
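Many of these applications reduce to text classification. As a rough illustration, here is a minimal sentiment-classification sketch; it assumes scikit-learn is available (the post does not name a library) and the tiny training set is made up:

```python
# Minimal text-classification sketch (illustrative only; the toy data is made up).
# Assumes scikit-learn is installed: pip install scikit-learn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training set: reviews labelled positive (1) or negative (0)
texts = ["great product, works well", "terrible, broke after a day",
         "really happy with this", "waste of money"]
labels = [1, 0, 1, 0]

# Bag-of-words features feeding a Naive Bayes classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["happy with the product", "broke immediately"]))
```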

Predictive Maintenance

Condition-based maintenance (CBM)

Definition
Condition-based maintenance (CBM) is a maintenance strategy that monitors the actual condition of an asset to decide what maintenance needs to be done. CBM dictates that maintenance should only be performed when certain indicators show signs of decreasing performance or upcoming failure.

Types of condition-based maintenance
Vibration analysis: Rotating equipment such as compressors, pumps and motors all exhibit a certain degree of vibration. As they degrade, or fall out of alignment, the amount of vibration increases. Vibration sensors can be used to detect when this becomes excessive.
Infrared: IR cameras can be used to detect high-temperature conditions in energized equipment.
Ultrasonic: Detection of deep subsurface defects such as boat hull corrosion.
A...
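As a rough illustration of the vibration-analysis idea above, here is a minimal sketch that flags an asset for maintenance when its recent vibration readings exceed an alert threshold; the threshold and readings are hypothetical, not values from the post:

```python
# Illustrative condition-based maintenance check: flag an asset when its
# recent vibration level exceeds an assumed alert threshold.
from statistics import mean

VIBRATION_LIMIT_MM_S = 7.1  # hypothetical alert level (RMS velocity, mm/s); tune per asset class

def needs_maintenance(readings_mm_s):
    """Return True if the average of recent vibration readings exceeds the alert level."""
    return mean(readings_mm_s) > VIBRATION_LIMIT_MM_S

recent_readings = [5.8, 6.9, 7.6, 8.2]  # made-up sensor samples
if needs_maintenance(recent_readings):
    print("Schedule maintenance: vibration above alert level")
```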

ACCURACY, RECALL, PRECISION, F1 SCORE

ACCURACY, RECALL, PRECISION, F1 SCORE
Classifying a single point can result in a true positive (truth = 1, guess = 1), a true negative (truth = 0, guess = 0), a false positive (truth = 0, guess = 1), or a false negative (truth = 1, guess = 0). Accuracy measures how many classifications your algorithm got correct out of every classification it made. Recall measures the percentage of the relevant items your classifier was able to successfully find. Precision measures the percentage of items your classifier found that were actually relevant. Precision and recall are tied to each other: as one goes up, the other will go down. F1 score is a combination of precision and recall. F1 score will be low if either precision or recall is low. The decision to use precision, recall, or F1 score ultimately comes down to the context of your classification. Maybe you don’t care if your classifier has a lot of false positives. If that’s the case, precision doesn’t matter as m...
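A minimal sketch of how these four metrics follow from the true/false positive/negative counts described above (the example counts are made up):

```python
# Compute accuracy, recall, precision and F1 from the four confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # correct predictions / all predictions
    recall = tp / (tp + fn)                             # share of relevant items actually found
    precision = tp / (tp + fp)                          # share of found items that were relevant
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return accuracy, recall, precision, f1

print(metrics(tp=40, tn=45, fp=5, fn=10))  # hypothetical counts
```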

Naive Bayes

NAIVE BAYES
Two events are independent if the occurrence of one event does not affect the probability of the second event. If two events are independent, then:

P(A ∩ B) = P(A) × P(B)

A prior is an additional piece of information that tells us how likely an event is. A frequentist approach to statistics does not incorporate a prior. A Bayesian approach to statistics incorporates prior knowledge. Bayes’ Theorem is the following:

P(A|B) = P(B|A) · P(A) / P(B)
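A small worked example of Bayes’ Theorem in code, using hypothetical spam-filter numbers (not from the post):

```python
# Worked example of Bayes' Theorem with made-up numbers:
# P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_spam = 0.2             # prior: 20% of all emails are spam
p_free_given_spam = 0.5  # likelihood: half of spam emails contain "free"
p_free_given_ham = 0.05  # 5% of legitimate emails contain "free"

# Total probability of seeing "free" in any email
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Posterior probability that an email containing "free" is spam
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))  # ~0.714
```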

Random Forest

RANDOM FOREST
A random forest is an ensemble machine learning model. It makes a classification by aggregating the classifications of many decision trees. Random forests are used to avoid overfitting: by aggregating the classifications of multiple trees, having overfitted trees in a random forest is less impactful. Every decision tree in a random forest is created using a different subset of data points from the training set. Those data points are chosen at random with replacement, which means a single data point can be chosen more than once. This process is known as bagging. When creating a tree in a random forest, a randomly selected subset of features is considered as candidates for the best splitting feature. If your dataset has n features, it is common practice to randomly select the square root of n features.
Boosting steps: Draw a random subset of training samples d1 without replacement from the training set D to train a weak learner C1 ...
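A minimal random forest sketch, assuming scikit-learn and its bundled Iris toy dataset (the post does not specify a library or data):

```python
# Minimal random forest sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample (bagging); max_features="sqrt"
# considers roughly sqrt(n_features) candidate features at every split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```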