Twitter vs Thomson Reuters: Delek Holdings Results

Delek Holdings recently announced their results for the first nine months of 2013 (including those for their listed US unit).

The first tweet that fswire ingested was from @InvestorEnergy at 08:50 UTC. It linked to http://www.otcmarkets.com (http://www.otcmarkets.com/stock/DGRLY/news?id=72226), which provided a PDF of the full results.

The news (press release) was originally distributed via PR Newswire and did appear on a number of sites (such as Bloomberg & MarketWatch) earlier than 08:50, but we couldn’t see any other tweets.

It is also worth noting the results from our search engine (http://www.ftsea.com/?q=delek). Where the volume of Tweets is not high, the search results include indexing of related sites and can often return better content.

delek-results-fs2

In comparison, Thomson Reuters published the news at 09:38 UTC and while they provided a summary of the results, the full results were not attached or linked to.

delek-results-tr

Twitter vs Thomson Reuters: Nigerian beer

A slightly less serious example, although an important topic, of how news travels from emerging markets – unless of course you are a beer drinker in Nigeria.

The following appeared in fswire at 20:29 UTC on Wednesday evening.

nigerian-beer-fswire

The first example of this in Thomson Reuters was at 19:57 UTC on Thursday evening.

nigerian-beer-reuters

Maybe the time difference could be explained because this isn’t that important to world markets. Possibly, but then again, everything is relative. Cheers.

Update: this story (on Yahoo News) just made it (Fri 29th) onto digg.com so it’s officially a big story now.

ftsea.com launched – a new financial search engine

ftsea.com is a specialised search engine for financial services.

The content that is returned has been extracted from those links that people are actively sharing on Twitter. The messages and links that are selected go through an extensive filtering process to eliminate as much poor quality and non-financial content as possible.

The best way to show the value of the search engine is to look at an example.

Here is a search for ‘fracking’. The first image shows a standard search on Google. The results are all relevant to fracking but they are very general in nature.

google-search-fracking

A Google News search returns better results, but the articles at the top are typically from higher profile sites that presumably have a higher PageRank score. While there is logic to that, it does mean that it is more difficult to find articles that could lead to any sort of edge. It is relatively easy to find articles from the Guardian or Bloomberg but it is much more difficult to find relevant but niche content that could make a difference. Also note that the top article from the Guardian is already three hours old.

google-news-fracking

Finally, the results from ftsea.com. There is significantly more content and, more importantly, more recent content. Note that all the content showing on this page falls well within the hour.

ftsea-search

Twitter vs Thomson Reuters: Apple buys Primesense

The rumours had been around for a while, but now it was official: Apple confirmed the acquisition of 3-D sensor startup PrimeSense.

On Thomson Reuters (Eikon) we looked for the news in both the Apple feed (AAPL.O) and in mergers and acquisitions. In both cases, the first article was the one highlighted below which appeared at 06:43:14 (London).

Twitter was significantly faster. The earliest message we ingested into our system was from @allthingsd. Note the second image, which shows the timestamp at 23:22 (London) on Sunday night.

It’s quite a big time difference. If we missed an earlier article on Eikon, please do let us know.

aapl-primesense

fswire-aapl

fswire-aapl2

Eating AAPL pie

Entity extraction, or determining which assets a message relates to, can be a difficult task.

With hundreds of millions of tweets a day, it is difficult to achieve 100% accuracy, and services such as FSwire are often criticised for any messages that have been incorrectly allocated.

We couldn’t therefore help but feel a little smug when this popped up on the AAPL.O feed in Thomson Reuters (Eikon) earlier today.

It’s good to see we all get it wrong occasionally.

cranberries-find-sweet-tart home in a festive curd v2

cranberries-find-sweet-tart home in a festive curd
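This kind of misallocation is easy to reproduce. The sketch below is purely illustrative: we do not know how Eikon tagged the article, and the alias table, function names and sample messages are all hypothetical. It contrasts a naive whole-word keyword matcher with a stricter matcher that only accepts explicit cashtags.

```python
import re

# Hypothetical alias table: lower-case keywords mapped to a Reuters-style code.
ALIASES = {
    "apple": "AAPL.O",
    "aapl": "AAPL.O",
}

def naive_entities(text):
    """Tag a message with every asset whose alias appears as a whole word."""
    words = re.findall(r"[a-z]+", text.lower())
    return {ALIASES[w] for w in words if w in ALIASES}

def cashtag_entities(text):
    """Stricter variant: only accept explicit cashtags such as $AAPL."""
    tags = re.findall(r"\$([A-Za-z]{1,5})\b", text)
    return {ALIASES[t.lower()] for t in tags if t.lower() in ALIASES}

# Invented sample messages, not real feed content.
recipe = "Cranberries and apple make a sweet-tart festive curd"
news = "Apple confirms PrimeSense acquisition $AAPL"

print(naive_entities(recipe))    # the recipe is wrongly tagged as AAPL.O
print(cashtag_entities(recipe))  # the stricter matcher finds nothing
print(cashtag_entities(news))    # the cashtag is still picked up
```

The stricter matcher avoids the false positive but would also miss genuine Apple news that carries no cashtag, which is why entity extraction usually needs more context than a single keyword.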

Breaking news examples from Twitter

Over the coming weeks, we’ll highlight some examples of where using Twitter would have given an advantage over using a traditional financial news service such as Thomson Reuters or Bloomberg.

We do know from experience that sometimes Twitter is faster and sometimes it is slower. Often it is news from emerging markets which breaks first on Twitter as Bloomberg and Thomson Reuters have less of a footprint.

Here is one example from Friday 22nd – an oil pipeline explosion in China, a clear example of the news appearing first on Twitter.

The first tweet we were aware of was by the Xinhua News Agency in China (see https://twitter.com/XHNews/statuses/403734889331056641).

FSwire picked it up a little later, but still before it appeared on Eikon.

Screengrab from fswire:
chinese-oil-explosion-fs

Screengrab from Thomson Reuters (Eikon):
chinese-oil-explosion

If you noted the announcement any earlier, please do let us know.

Our Unique Selling Point?

The Problem

Personal and business interactions have now moved to the Internet, leading to an explosion of data that can be mined to generate operational insight for SMBs and enterprises alike. A 2013 IDC report estimated that the world generates 1 quintillion bytes of data per day, yet we are only able to analyse less than 1% of this information, regardless of vertical.

To utilise even a fraction of this data, businesses must go through a series of steps to identify and sanitise the information they are interested in.

This requires the following expertise:

  1. Domain, for the vertical(s) the business operates within, allowing said businesses to build rules that identify pertinent data.
  2. Data, to define patterns and models that can be used to match/find/process future pertinent data, as well as models that exclude irrelevant data.
  3. Engineering, to implement the necessary machine logic and processes to collect, sanitise and categorise content so that it can be used by the organisation to drive business decisions.
  4. Business, to create the business intelligence that will utilise said data and drive business decisions.

Even with this in place, the storage and compute power required to process such vast amounts of data in real time is significant. Furthermore, the steps an organisation must implement are not static in nature: content structure changes daily, so the continued data analysis needed to determine optimal strategies, and their implementation, adds to the running costs of any solution.

At FSWIRE we solve pain points 1 through 3 for every area in financial services.

How FSWIRE solves this problem for customers

(The Twitter Example)

450M tweets per day (an average of roughly 310K tweets/minute or 5.2K tweets/second), carrying any type of message content, are pushed through the Twitter ecosystem. Each tweet consists of up to 140 characters of unstructured text, which may include embedded links to other information sources such as web pages and media. This equates to around 1 TB of message data, and each message is augmented with a slew of additional content such as information about the author, location, tags etc., increasing the data size to around 2 TB per day (roughly 21-23 MB/sec), delivered as a stream of JSON-structured data.

To process this, one needs an effective and efficient mechanism for filtering the useful content (generally using text processing methodologies, which are compute intensive). Subsequently, content needs to be structured into a taxonomy fit for its business purpose.
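The rate conversions are simple to check. The snippet below is a minimal sketch that takes the daily volumes quoted in the text as inputs and assumes decimal terabytes (1 TB = 10^12 bytes); the variable names are our own.

```python
TWEETS_PER_DAY = 450_000_000     # daily tweet volume quoted in the text
AUGMENTED_BYTES_PER_DAY = 2e12   # "around 2 TB", assuming decimal terabytes

MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60

tweets_per_minute = TWEETS_PER_DAY / MINUTES_PER_DAY
tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY
mb_per_second = AUGMENTED_BYTES_PER_DAY / SECONDS_PER_DAY / 1e6
bytes_per_tweet = AUGMENTED_BYTES_PER_DAY / TWEETS_PER_DAY

print(f"{tweets_per_minute:,.0f} tweets/minute")
print(f"{tweets_per_second:,.0f} tweets/second")
print(f"{mb_per_second:.1f} MB/second")
print(f"{bytes_per_tweet:,.0f} bytes per augmented tweet")
```

These are averages over a full day; peak rates during market-moving events would be higher.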

Funnel Filter

At FSWIRE we do this by implementing a specialised multi-stage funnel filter, which removes noise at various stages and results in a highly pertinent stream of data structured around a multi-dimensional taxonomy (see figure: funnel filter).
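The funnel idea can be sketched in a few lines. This is a toy illustration and not the FSWIRE implementation: the stage functions, keyword set, block list and sample tweets are all hypothetical placeholders. The point it shows is the ordering, where cheap checks run first so that the expensive stages only see the tweets that survive.

```python
from collections import Counter

# All names, keywords and lists below are illustrative placeholders.
KEYWORDS = {"oil", "crude", "pipeline"}
BLACKLISTED_AUTHORS = {"blocked_user"}

def pre_filter(tweet):
    """Cheap keyword gate standing in for the pre-filtering rules."""
    return any(word in KEYWORDS for word in tweet["text"].lower().split())

def junk_filter(tweet):
    """Placeholder for a learned junk model: reject an obvious spam phrase."""
    return "follow me" not in tweet["text"].lower()

def blacklist_filter(tweet):
    """Reject tweets from authors on a block list."""
    return tweet["author"] not in BLACKLISTED_AUTHORS

STAGES = [
    ("pre_filter", pre_filter),
    ("junk", junk_filter),
    ("blacklist", blacklist_filter),
]

def run_funnel(tweets, stages=STAGES):
    """Drop each tweet at the first stage that rejects it; count drops per stage."""
    survivors, dropped = [], Counter()
    for tweet in tweets:
        for name, keep in stages:
            if not keep(tweet):
                dropped[name] += 1
                break
        else:
            survivors.append(tweet)
    return survivors, dropped

tweets = [
    {"author": "newsdesk", "text": "Oil pipeline explosion reported in China"},
    {"author": "spambot", "text": "Oil tips!!! Follow me for free signals"},
    {"author": "foodie", "text": "Best curd recipe ever"},
    {"author": "blocked_user", "text": "Crude oil prices rise"},
]

survivors, dropped = run_funnel(tweets)
print([t["author"] for t in survivors])
print(dict(dropped))
```

Because a tweet stops at the first stage that rejects it, the per-stage drop counts also show where most of the noise is being removed.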

To provide some clarity on processing requirements, note that some tweets are discarded at most stages, with the pre-filtering stage alone reducing content from around 450 million to 50 million messages per day.

  1. Pre-filtering is an efficient phase but still requires each tweet to be matched against each of our 130K pre-filtering rules, which equates to around 59 trillion rule applications per day.
  2. Junk filtering follows, which applies a Bayesian machine learning model to each of the 50M resulting tweets.
  3. The black/white list filters comprise around 100K rules that are also applied to each of the resulting 50M tweets (around 5 trillion rule matches).
  4. Our language analysis includes the use of Bloom filters, which allow us to identify and discard non-supported languages.
  5. Context and sentiment analysis utilise a cross-reference machine learning solution consisting of SVM (support vector machine) and naive Bayes models.
  6. The taxonomy engine currently applies each of its 200K rules/queries to each tweet, identifying which asset/market segment/jurisdiction a tweet belongs to (10 trillion query matches a day). This is a processing-intensive task, as the taxonomy rules range from simple string matches through to multiple boolean-like queries.
  7. The final stages identify the authority of a tweet based on its author and url domain as well as analysing any linked web-page for context/sentiment and further entity information.
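To illustrate the range of taxonomy rules mentioned in stage 6, from simple string matches through to boolean-like queries, here is a minimal sketch. The rule format (nested tuples), the labels and the rules themselves are assumptions made for illustration and are not the FSWIRE taxonomy.

```python
import re

def tokens(text):
    """Lower-case word tokens; '$' is kept so cashtags survive as one token."""
    return set(re.findall(r"[a-z0-9$]+", text.lower()))

def matches(rule, toks):
    """Evaluate a nested (operator, *operands) rule against a token set."""
    op, *args = rule
    if op == "term":
        return args[0] in toks
    if op == "and":
        return all(matches(r, toks) for r in args)
    if op == "or":
        return any(matches(r, toks) for r in args)
    if op == "not":
        return not matches(args[0], toks)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical taxonomy: one label per asset, market segment or jurisdiction.
TAXONOMY = {
    "asset:AAPL.O": ("or",
                     ("term", "$aapl"),
                     ("and", ("term", "apple"), ("not", ("term", "pie")))),
    "segment:energy": ("and",
                       ("term", "oil"),
                       ("or", ("term", "pipeline"), ("term", "crude"))),
    "jurisdiction:CN": ("term", "china"),
}

def classify(text):
    """Return every taxonomy label whose rule matches the text."""
    toks = tokens(text)
    return sorted(label for label, rule in TAXONOMY.items() if matches(rule, toks))

print(classify("Oil pipeline explosion in China"))
print(classify("Apple confirms acquisition of PrimeSense"))
print(classify("Grandma's apple pie recipe"))
```

Every rule is evaluated against every tweet here, which is exactly why the rule count multiplied by the tweet count dominates the processing cost.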

As outlined above, each stage is CPU intensive, particularly those that require extensive text processing. To achieve this, FSWIRE operates a cloud-scalable solution that currently employs 300 processing nodes solely for the stages described above (excluding any data storage infrastructure), across which the following processing is spread:

64 Trillion rule matches/day + 10 Trillion data queries/day
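These totals follow directly from the tweet and rule counts quoted for each stage. The short calculation below reproduces them; it assumes, as the quoted figures imply, that the taxonomy engine sees the same 50M tweets per day that survive pre-filtering, which makes the taxonomy figure an upper bound.

```python
TRILLION = 10**12

tweets_in = 450_000_000        # tweets entering the funnel per day
tweets_after_pre = 50_000_000  # tweets surviving pre-filtering per day

pre_filter_rules = 130_000
list_rules = 100_000           # black/white list rules
taxonomy_rules = 200_000

pre_filter_matches = tweets_in * pre_filter_rules
list_matches = tweets_after_pre * list_rules
taxonomy_queries = tweets_after_pre * taxonomy_rules  # assumes no further drops

rule_matches = pre_filter_matches + list_matches

print(f"pre-filter: {pre_filter_matches / TRILLION:.1f} trillion rule applications/day")
print(f"lists:      {list_matches / TRILLION:.1f} trillion rule matches/day")
print(f"total:      {rule_matches / TRILLION:.1f} trillion rule matches/day")
print(f"taxonomy:   {taxonomy_queries / TRILLION:.1f} trillion queries/day")
```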

Unique Selling Point

Flexible: Whether you want to enable your users to follow people, tickers, commodities, currency pairs, key words… the FSwire platform delivers relevant content in real time via our streaming API. We can customise our platform so that your users are able to decide “how much filtering” they want (i.e. removal of noise, irrelevant tweets or tweeters).

Speed to Market: The FSWIRE platform has been under development for 2 years. Our platform has been designed from the ground up specifically for financial professionals. Via a streaming API, customers are able to access this platform quickly and efficiently.

Cost Savings: FSWIRE is built on a fully resilient and highly scalable platform that is capable of analysing huge quantities of real-time data. We have invested heavily in the development of proprietary machine learning algorithms for processing unstructured data. As the quantity and velocity of daily information increase, so does its ability to influence the financial markets, and a solution that can extract the relevant information from the “noise” becomes paramount.

Domain Knowledge: FSWIRE has been built by financial markets experts for financial markets professionals. We have used our domain knowledge to help us identify the pertinent information from the irrelevant information.

Compliant: Whether you are a bank, a fund or a general investment broker, our solution is compliant with all financial regulations across all business borders.

We do this so you do not have to!