You probably don't need AI/ML. You can make do with well-written SQL scripts (threadreaderapp.com)
select * from photos p inner join tags t on t.photo_id = p.id where t.tag in ('you','your wife','boat')  -- assuming tags.photo_id references photos.id
The point is "crap in, crap out". If you properly tag/label your data, you can accomplish anything with SQL or machine learning.
Just read and stop hating. https://www.thestreet.com/investing/nasdaq-all-in-on-blockch...
One time at work I wrote a simple web app with a search box (just doing an sql query, nothing fancy). One of the "higher ups" was impressed and decided to flex their knowledge, pointing to the search box saying "and this uses nlp". It was a damn sql query on a full text field.
Artificial Intelligence is about statistical analysis.
Such as: Is this picture of a man and his dog, actually a dog? Or is it a cat? Or is it a 4 legged creature? Or is it a turtle?
The AI is supposed to identify that the animal in the picture, is a dog with a 99.8% probability. And since it exceeded the 98% threshold, then it becomes accepted as a dog, until otherwise disproven.
Basically, it is a pattern matching mechanism, on a massive statistical scale.
And from this, then further actions can be taken.
Such as, the owner of the dog, can be mailed advertising and coupons that are related to dogs.
And then, the AI can go even further. What specific kind of dog is it? Is it a German Shepherd? Is it a beagle? Is it a poodle?
The AI can determine the specific type of dog, and conclude that it is a German Shepherd with a 99.7% probability. This exceeds the threshold, so the computer system might mail out an advertisement to the owner about deals related to German Shepherds.
For something like this, then this is where social media can really shine. When you upload your pictures to Facebook, or Gmail, or Instagram, then Facebook or Google, can use an AI to analyze your picture. As well as reading your caption on it. And they can determine the context of your picture, such as whether you have a dog in it. Are you holding the dog? Are you walking the dog? Are you smiling in the picture? If the scenarios check out, then the AI can select you as a candidate, and send advertisements related to your dog.
In fact, I think our brain operates the same way, by using statistical analysis.
When we see a dog, in a picture or in real life, our brain is actually using a statistical analysis to determine that it is a dog. Our brain follows a neural network pathway to match that picture of a dog, to a similar variation of a dog that we have in our memory. It is thus statistically true, until otherwise disproven.
This of course, happens in the deep recesses of our brain, so it's currently impossible to know what really is happening there, until we have a better scientific understanding of how our neurons work in our brain.
On the flip side, SQL scripts have no mechanism to view the picture and determine whether the animal in it is a dog, or a cat, or even a human.
See the point? Welcome to 2018.
Thinking back to successful ERP implementations, little was more useful during go-live or an ongoing basis than a script that ran every hour/day/week/month to look for a condition and report it.
In one case, over a three-year period in which the organization grew from 0 to 60 million per year, every data issue was logged as a ticket and investigated. Where needed, a SQL script was written to monitor for other occurrences. Ultimately, if action was needed, the result was forwarded to the right destination with a link to instructions on how to resolve it, or how to investigate if a decision could not be made programmatically.
The power of this was that users received direct and immediate feedback, anytime they wanted, on whether their work was good and compliant with the system and process.
How did the list of scripts to build get made? Every time the system behaved correctly or incorrectly, and needed attention, whether due to data being incomplete, mis-entered, or correct and ready for the next step, the technology was busy working for the users.
Scripts reduced concerns that issues were being missed. Once something had happened and it was important enough, a custom insight could be built. It helped build a data driven culture instead of hoping the computer picked the right thing.
SQL scripts could one day feed into or fit with AI/ML. I don't see that day here in the short term.
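A minimal sketch of the kind of scheduled check described above, using Python's built-in sqlite3 with an in-memory database. The table, the columns, and the "shipped without an invoice" condition are all hypothetical, picked only for illustration; a real check would run against the ERP's own schema from a scheduler such as cron.

```python
import sqlite3

# Hypothetical schema: orders that may or may not have an invoice number yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, invoice_no TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status, invoice_no) VALUES (?, ?, ?)",
    [(1, "shipped", "INV-1"), (2, "shipped", None), (3, "open", None), (4, "shipped", None)],
)


def find_shipped_without_invoice(connection):
    """Return the ids of orders that shipped but were never invoiced."""
    rows = connection.execute(
        "SELECT id FROM orders "
        "WHERE status = 'shipped' AND invoice_no IS NULL "
        "ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


problem_ids = find_shipped_without_invoice(conn)
for order_id in problem_ids:
    # In practice this would be an email or ticket with a link to instructions.
    print(f"Order {order_id} shipped without an invoice")
```

Each logged data issue would get its own small query like this one, so the list of checks grows with the tickets.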
It's interesting because this article shows the overlap of what a non-tech thinks is AI and what is common fodder for any decent programmer. So many things get lost in buzzword to English translation, it's easy to forget that most people correlate the plastic box sitting in front of them with an intelligent Magic 8 Ball.
>> If a person tries to checkout with 3 different cards at the same time and they all bounced, something funny is happening. Block their account temporary for a while.
That assumes you know that 3 different cards were used and they bounced. Sure, the SQL can answer the question, but you have to know the question first.
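Once the question is known, the quoted rule does fit in one query. A rough sketch using Python's sqlite3; the payment_attempts table, its columns, and the 10-minute window are assumptions made up for the example, not anything from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical log of checkout attempts; attempted_at is seconds since some epoch.
conn.execute(
    "CREATE TABLE payment_attempts "
    "(account_id INTEGER, card_id TEXT, approved INTEGER, attempted_at INTEGER)"
)
conn.executemany(
    "INSERT INTO payment_attempts VALUES (?, ?, ?, ?)",
    [
        (1, "card-a", 0, 1000), (1, "card-b", 0, 1030), (1, "card-c", 0, 1060),  # three cards, all declined
        (2, "card-d", 0, 1000), (2, "card-d", 0, 1020), (2, "card-d", 0, 1040),  # one card retried
        (3, "card-e", 0, 1000), (3, "card-f", 0, 1010), (3, "card-g", 1, 1020),  # third card approved
    ],
)


def accounts_to_block(connection, now, window_seconds=600, min_cards=3):
    """Accounts with at least min_cards distinct cards, all declined, inside the window."""
    cutoff = now - window_seconds
    rows = connection.execute(
        """
        SELECT account_id
        FROM payment_attempts
        WHERE attempted_at > ?
        GROUP BY account_id
        HAVING COUNT(DISTINCT card_id) >= ? AND SUM(approved) = 0
        ORDER BY account_id
        """,
        (cutoff, min_cards),
    ).fetchall()
    return [row[0] for row in rows]


flagged = accounts_to_block(conn, now=1100)
print(flagged)
```

The hard part is still the one raised above: someone has to decide that "three distinct cards, all declined" is the pattern worth querying for.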
I'm happy to be corrected here.
However, it would be good to have a bit more in the article to say what AI/ML* is in this context, and a couple of scenarios where it beats SQL; otherwise it just sounds like the rantings of an old man: "in my day we only had turnips; you needed a snack: turnip; you needed a pillow: turnip". Showing a few good use cases lets you better contrast the products and get an understanding of where the boundaries between the technologies are.
*NB: When I first read this I assumed the author was talking about AIML (Artificial Intelligence Markup Language) rather than AI/ML (artificial intelligence / machine learning): although the slash was included, the full terms were never used.
There was a feature of Oracle called SOUNDEX which was magical. Here's an example from their docs page [1]:
SELECT last_name, first_name
FROM hr.employees
WHERE SOUNDEX(last_name)= SOUNDEX('SMYTHE');
This query will return all people with a last name that sounds like 'Smythe', including 'Smith' and 'Smithe'.
[1] https://docs.oracle.com/cd/B19306_01/server.102/b14200/funct...
Soundex is very simple but works well; calculating a string's Jaro–Winkler distance also helped.
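To show how simple it is, here is a sketch of the classic American Soundex rules in plain Python. This is not Oracle's code, and Oracle's SOUNDEX may differ in edge cases; it is only meant to show why 'Smythe', 'Smith' and 'Smithe' collide.

```python
# Consonant groups of the classic Soundex scheme; vowels, Y, H and W have no code.
CODES = {}
for digit, letters in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                       ("4", "L"), ("5", "MN"), ("6", "R")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code of name (empty string if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]


for name in ("SMYTHE", "Smith", "Smithe"):
    print(name, soundex(name))
```

All three names reduce to the same code, which is exactly what the WHERE clause in the Oracle example compares.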
Does the same pattern apply also in more complex scenarios?
I think the issue the article hints at is there are way too many contractors willing to burn your cash on AI/ML.
Contracting has a serious principal agent problem; there was a discussion I recall over how to implement a quick search feature in a system we maintained. I floated the idea of sampling the data to get approximate results, but that was instantly shot down in favor of buying a ton more hardware. There are serious arguments against sampling, it's very tricky to get right, but if we had been spending our own money I think it would have gotten a more careful hearing.
Want to get a job done? Use a tool that gets a job done. Want to talk about getting a job done and be "listened" to - use ML to beat around the bush.
This is no different from all these companies talking about Big Data[tm] a few years ago, hiring people to build large processing clusters when their entire dataset would fit into the memory of a $700 server obtained from eBay.
Nor is it different from companies mumbling about availability challenges when the entire stack gets fewer than 100 hits per second.
Late orders, biggest orders etc. etc. sure, those are all SQL queries.
However, if you want to make statistical predictions or look for the non-obvious, these simple types of queries aren't going to do it. So it's an apples to oranges comparison.
There are a lot of cases where people don't know what they are after. And also lots of cases where orgs don't have a grasp on the simple things, but somehow think more complex things (especially buzzword things) are magically going to solve a lack of organization and insight.
Iterating to the author's given examples, we have probably been doing:
What would be the net effect in terms of sales and profit if we reduced our price by 5 cents, but increased our sales 25x? Those are already models that encompass predictive modeling, where we provide inputs and determine an output from them, based on general assumptions backed by data.
Sounds like you definitely need some ML there, in the form of statistics. Was there a difference in probability of repeat customers between sending and not sending the voucher? Was there a difference between basket sizes and probability of repeat customers? Is there an interaction between the two?
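The first of those questions can be sketched as a two-proportion z-test with nothing but the standard library. The counts below are invented purely for illustration, and a real analysis would also have to check how customers were assigned to the voucher group.

```python
import math


def two_proportion_z_test(repeat_a, total_a, repeat_b, total_b):
    """Two-sided z-test for a difference between two repeat-customer rates."""
    rate_a = repeat_a / total_a
    rate_b = repeat_b / total_b
    # Pooled rate under the null hypothesis that the voucher makes no difference.
    pooled = (repeat_a + repeat_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 120 of 1000 voucher recipients came back, 90 of 1000 others did.
z, p_value = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The counts themselves would come straight out of a GROUP BY, so the SQL and the statistics sit side by side rather than competing.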
If your work scans your web browsing for certain words etc., don't click the link.
They are of course for companies that make so much money that they can afford to spend 6-digit pays for a Marketing Manager who doesn't know sh*t about his job who in turn is spending millions on randomizing-diagram generators so it seems like he is working hard.
I have also used Sparklines in Python for quick and dirty trends
Wow. I could never imagine so many people actually read marketing e-mails
Not sure if its going to be efficient for clustering and entity extraction at scale tho
I indicated that it was actually quite effective without ML, and that it was easier to explain to users this way. They kept prodding around on the ML stuff, and how we might be able to use ML to accomplish roughly the same thing.
A week later they said that they were no longer interested because, although they liked what our tech was able to accomplish, it didn't fit with their investment thesis — which was all about ML.
My wife asked me why I didn't just make some stuff up and say we could do v2 using ML. Perhaps she was right.
1: http://www.beelinereader.com/individual
update: in response to feedback below, I edited the link to point to a page with relevant content instead of our generic landing page. Lesson learned!
They probably decided they wanted to be part of that future market about AI and therefore want to invest in AI/ML startups. Someone sold your company internally as being a nice fit for the AI portfolio, but when you answered the question about ML with "No" the door was shut.
From someone who makes large trade deals as his daily business, I learned "Never say 'No'" (during a negotiation). In most cases, it is better to be diplomatic. So might have been better off with something like:
'So far we had decent results without ML, but we are constantly evaluating options to improve our technology, which includes optimizations based on machine learning.'
That way, some might hear '[...] includes optimizations based on machine learning.', while you just said 'we do not use machine learning' ;-)
We have already seen this with things like offshoring.
A few big name companies did the numbers and found they could save some on moving their production somewhere else.
Then came a rush of companies that made a big deal of having offshored who knows what, simply because that was what the stock market expected of a would-be forward-looking company.
The basic problem is that these decisions are not done based on what is good for the company long term, but what is good for the stock price short term. And in large part because CEOs are paid in stock options, and can be ousted by activist stock owners.
Why not put "BeeLine Reader makes reading on-screen easier, faster, and more enjoyable. We use a simple cognitive trick — an eye-guiding color gradient — to pull your eyes from one line to the next." on your front page?
Maybe you guys have already done some kind of testing and found out that the current layout is optimal?
One lesson learned: I should have linked to the /individual page instead of the generic landing page in my comment. Updating link now...
Also, if you have ever considered two code variants where one is a bit better in situation A and the other in situation B, you can use machine learning to combine the two methods, e.g. using a random forest (https://en.wikipedia.org/wiki/Random_forest).
Hypotheses:
> corporate VC that wanted to make a strategic investment
They want to put money in your company to make sure they don't have to compete with you or compete with someone else who might buy you down the line.
> They kept prodding around on the ML stuff, and how we might be able to use ML to accomplish roughly the same thing.
They're looking for secret sauce, barriers, friction. Something patentable. Something to keep others out and keep them ahead. Perhaps something that they don't think they can do themselves or maybe rip off when they realize it's useful. Remember Flux?[0]
Their investment thesis makes sense from this perspective: they're thinking like classic, amoral businesses.
Also note that if your product is simple and popular enough, people will likely make free clones. You'll need to be a good steward to keep your value.
Best of luck!
----
A bit of feedback: you need to be more transparent and up front about what, how, and why you're using analytics, and let people opt in. I was a bit disconcerted to see you using Google Analytics in your extension without informing me.
That's a new one to me. Is this Facebook changing the landscape right now, or have you been expecting this for a while? Do you have sites in mind that warn about GA?
Pretty much all websites use GA or something like it, and in my experience it's extremely rare to be warned about it. It always goes in the privacy policy, which you should be able to find. But I'm not sure I've ever seen an advance warning that analytics was taking place. I suppose it's assumed, but in any case appears to be normal and acceptable to not warn people that logging and searching of those logs exists.
Cookies are a different story, since the EU passed legislation requiring notice of their presence.
While that may be true, I'd suggest the VC's are just looking for a later exit when a greater fool buys them out. They're betting on the ML investment market getting hotter, and providing you meet the criteria of being a growing startup (by some metric) nominally in the ML field and that field attracts more investment dollars later - even whether or not you succeed as a bottom line business, or whether your business success comes from ML or not - doesn't necessarily matter.
More importantly, we don't tie GA or any other usage analytics to individual users. We basically just use it to see where the extension is used and where people are blacklisting sites. We use this data to make the extension run better on more sites. We don't monetize user data in any way. You're right that we should make this point more salient. In our Privacy Policy, we do describe how to opt out by blocking GA, but we should put this somewhere more prominent.
Thanks for the feedback!
I found an example [1] with the same content so I guess you just haven't updated it.
1: https://gist.github.com/blairanderson/85cc961295fd03a6c4b3
The system doesn't actually use ML or AI in realtime. ML just validates the correct conditionals and decisions are used in the software.
Most ML models can be distilled down using the Pareto principle anyway, and the 80/20 rule written into the code.
A politically correct and high-integrity story to tell investors might be "We use machine learning regression analysis to validate our expert systems are operating on optimal statistical models."
Just a quick bug report - it does the wrong thing on right-to-left text. e.g. on this article [1], it highlights the beginning (right side) of one line and the end (left side) of the next line in the same color, which doesn't actually reflect the flow of reading. There's also a weird effect on LTR text embedded in the RTL (on my setup, the "a" in "axios" on the third line of the body is bright red, while the rest is complete black).
Do they just jump on every bandwagon in the hope that one of them might pan out?
VCs raise money from fund managers (sovereign wealth funds for example) who also present a thesis to their stakeholders for how the money will be managed.
Depends on the timing, but I think VC focus on ML is a good indicator of where smart money thinks the money is going to be made. There are a lot of people who catch a trend at the tail end, but I don't think we've gotten anywhere close to that for ML investments.
(I'm working on a startup right now that's in a similar position - non-hyped B2B segment. VCs are all "hey, look, we don't understand X market very well, maybe go to Y VC?" And Y VC is all "hmm, you're only tangentially related to Z market that we actually understand, so you'll need to get to Series A-sized revenues before we'll put in a seed-size investment.")
But their manager told them ML, ML, so they need to invest in ML. This is not a joke.
But it's so much damn data, so loosely coupled to a company's actual product, that trying to get any actionable intelligence out of it is basically impossible. Enter ML. "Just" chuck your data at a NN, use A/B tests to train, and hope the company ends up with higher revenues. If they do, claim the 'data scientists' are definitely a profit center; if they don't, claim you need more data.
Until ExxonMobil stops storing metrics on how many goldfish I own in a given month, cranking out ML-related companies seems like a good bet for VCs, because it's not going to get any easier to turn progressively more esoteric data points into money.
Disqualify the easy "no", then everything else is a "maybe". Invest and move on.
Whether it is true or not in this case they may have perceived that if it was based on a ML approach it might be harder for the competition to replicate if it was successful.
You may have pitched a perfectly fine product to somebody in the market for a perfectly fine acquihire.
Interesting idea overall. I've been using spritz a lot recently and I'm liking it more and more though it doesn't seem to have progressed from when I discovered it a few years ago.
Re the investors blindly wanting ML - we have this problem too (and it's definitely a problem - I can see this in spite of having a masters in ML). On the bright side, some investors - often the better ones - have said things like "I was interested the moment you didn't use ML or blockchain in your pitch", so stay true to your vision :)
I read the updated /individual page and found it very... difficult.
The colours led me to speed up and slow down at an uncomfortable rate, to the extent I had to re-read it three times. I typically receive 200-1000 emails per day which all need to be read (yay!) plus at least 30k words per day out of email (business stuff but excluding newspapers, books, etc).
I speed-read about 6-10 words per flick of the eyes (for a short document), which is about 50 words per second. I do slow down to ponder careful phrasing, a necessarily precise written document, or some graphics, and this is not to my detriment.
I found the colour-coding very difficult, however.
Also, there are issues with red-screen sleep promoting systems like f.lux.
Up front, many investors only seek buzzwords before they shift focus to numbers. Be a sales(wo)man, and feed them the ones they want. Later on, once you end up making them money, they're not going to care much about what technologies are involved.
I wish you the best of luck with your startup in the future.
(P.S. I had a FOUC on your webpage. The icons in the "want more" section were missing, and haphazardly popping in. Might want to fix that.)
I read a lot of blogs. Thought you could add a wp plugin if it doesn't exist.
Also, you must have an ML/AI roadmap for tracking user interactions, etc., for the possibility of further personalization. Leaving aside the buzz around AI/ML, I think every business has an opportunity to apply it. It doesn't happen often, but when it does, it's gold for any business.
Right now, it's easy to say it's a color gradient that wraps from one line to the next, guiding your eyes. I don't know how I insert "machine learning" into that sentence somewhere and not sound like I'm trying to trend-surf.
You should probably link to the add-on page (https://addons.mozilla.org/en-US/firefox/addon/beelinereader...) instead of directly to the .xpi file. I would never accept to install a plug-in from a random link even if it hadn't been for the dialog's scary language.
Another explanation: they didn't actually know how they wanted us to use ML; they just knew they wanted us to use it.
• CNET found that readers were 35% more likely to finish reading an article if they were reading with BeeLine turned on. See more on the CNET study in this article in The Atlantic [1].
• An optometric study using eye-tracking glasses found that BeeLine reduced the number of regressions and/or skipped lines for the vast majority of participants.
• Various educator-led studies (some more rigorous than others, and none as informal as the "case studies" that are unfortunately common in edtech) have found strong gains in reading fluency and/or comprehension.
I totally appreciate your skepticism, which is especially reasonable for someone who does not personally see a benefit from the tech.
The response from doctors who work in vision has been very positive, and our tools are recommended by doctors at Stanford Med School and UC Berkeley School of Optometry.
We even got featured by the American Optometric Association. The AOA committee that evaluated our technology said that this was the first time they'd ever had a unanimous vote in favor of anything. Apparently to them, it was quite obvious that this has beneficial effects (without having run any formal studies on it themselves).
To your point about colorblindness, I would note that I regularly come across people with red/green colorblindness who love/use our product. You can change the colors to be whatever you want, or use it in grayscale.
We are talking with publishers, but mostly on the digital side. Thanks for the input and questions!
1: http://www.theatlantic.com/technology/archive/2016/05/a-bett...
Just call what you do ML.
IBM, for one, never shied from calling any old BS "AI" and branding it Watson.
Basically, you try to capture the instinct of a great salesman by formalizing it into computer logic.
Often that's done with rules like in the article.
It works well, but has its limits. The finer points of human judgement are often not expressible: people don't know why they made a decision, which makes it hard to capture. And humans also have their limits. Too many variables, too much noise, too much data and they won't make the best prediction/decision.
That's when ML shines. Instead of trying to encode an expert's intuition, you let the machine develop its own intuition, itself becoming an expert through training.
The downside is it now similarly becomes challenging to formalize the machine's intuition. Why it made a given choice is no longer easily apparent.
I do think expert systems still have value. Especially when you lack the dataset to train a machine expert.
When companies actually hire data scientists, what they typically do is clean data for a few months to a year. Then they interpret the data, probably by performing linear regression. At that point the data is in a state where it can be easily understood by the stakeholders, and so they have created value. The linear regression, or whatever model has been learned, may or may not mean something. But at the end of the day you need to tell stakeholders how they can create value, and guess what: SQL and Bash will do 90% of the job.
I think AI is extremely overhyped and underperforming. In fact, I think a major strength of AI is founded in the technical ignorance of certain project managers or decision makers. The type of person who doesn't appreciate the simple elegance of sql+bash/cron for simple tasks is the person who will bite on a pitch for an AI customer retention strategy. Customers are people. Business is people. You don't need a rack of GPUs to understand why sending an email to someone who has a saved cart is a good idea. It's common sense. It doesn't matter if we can force machines through trillions of operations to vaguely capture a customer pattern if a guy at a console can write it by hand in five minutes.
(not always, I know, I work in finance so a lot of my business IS machines and not people, but you catch my drift)
I'm pro-AI research, and anti-AI hype train. They're computers. They're objects. They're not us yet. Consider the magnitude of the AI research market, which is tens of billions, and compare that to what they are actually capable of doing relative to human performance.
/rant
Maybe HN skews my perception on what the public tech enthusiast's perception on AI is...
When I looked at it, every feature shown, could be more reliably delivered and have a better customer experience through deterministic behaviour.
So I agree, I think AI has made certain technologies way better, but I see it as a tool, and like any tool, it sometimes applies to the situation and sometimes doesn't.
After talking to a number of their engineers, it became quite clear to me that instead of a data scientist, they just badly needed a DBA / someone with ownership and a complete vision of the data structure.
They had no foreign keys, poorly 'designed' indexes, and tons of redundant tables with no rhyme or reason to them.
They'd organically grown their database with hardly any review. They did not have big data, they just had a big mess. And wanted someone else to clean it up.
> I mean, why send a letter with breast pumps to a man that just bought a pair of sneakers? It doesn't even make sense. Typical open rate for most marketing emails is anywhere between 7 - 10%. But when we do our work well, we saw close to 25 - 30%.
How do you know what items are compatible to each other? Why only recommend sneakers to somebody with sneakers, instead of also recommending sport clothing?
Oh, I guess you could build some type of topology of all your shopping items. But what about recommending soccer balls to people that bought soccer shoes? You could also add that to your database, but now you also need a heuristic to score item similarity: `category_matches * 10 + subcategory_matches * 5 + color_matches * 2 + ...`
This is the whole point of ML. People have been building rule-based systems built on "domain expertise" for ages, only to find that they are limited and cannot compete with simple algorithms fed with enough data.
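For concreteness, the hand-tuned scoring heuristic described above might look like the sketch below. Only the three weights named in the formula are used; the Item fields and the sample products are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    category: str
    subcategory: str
    color: str


# Hand-picked weights from the formula above; every new attribute needs another guess.
WEIGHTS = {"category": 10, "subcategory": 5, "color": 2}


def similarity(a, b):
    """Sum the weights of all attributes on which the two items agree."""
    return sum(
        weight
        for attribute, weight in WEIGHTS.items()
        if getattr(a, attribute) == getattr(b, attribute)
    )


purchased = Item("soccer shoes", "soccer", "shoes", "black")
soccer_ball = Item("soccer ball", "soccer", "balls", "white")
sneakers = Item("sneakers", "running", "shoes", "black")

ranked = sorted([sneakers, soccer_ball],
                key=lambda item: similarity(purchased, item), reverse=True)
print([item.name for item in ranked])
```

The ranking depends entirely on the weights someone chose by hand, which is the limitation the comment is pointing at: a learned model would fit them from purchase data instead.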
You know, counting items and dividing by total number is a kind of machine learned model, too. In technical terms it is "Computing the maximum likelihood estimate (MLE) for the PMF of a random variable taking finitely many values."
But it's a poor man's model. That's why in order to solve complex problems we use stuff like neural nets and gradient boosting, and in unsupervised learning, matrix factorisation.
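The "counting and dividing" model mentioned above really is only a few lines. A small sketch with made-up purchase data; exact fractions are used so the probabilities visibly sum to one.

```python
from collections import Counter
from fractions import Fraction


def mle_pmf(observations):
    """Maximum likelihood estimate of a PMF: relative frequency of each value."""
    counts = Counter(observations)
    total = len(observations)
    return {value: Fraction(count, total) for value, count in counts.items()}


# Hypothetical purchase log.
purchases = ["sneakers", "sneakers", "ball", "socks"]
pmf = mle_pmf(purchases)
for value, probability in pmf.items():
    print(value, probability)
```

In SQL the same estimate is a COUNT(*) per group divided by the overall COUNT(*).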
I've been in the thick of this previously, facing a complex rules-based engine that did most of its incredible feats in the fraud detection domain using a number of really complicated SQL queries. At the same time, I've used the results of such queries combined together with machine learning and predictive analytics, giving you the best of both worlds. Both have strengths and weaknesses.
These are tools in the toolbox, and I think the adage "try to use the best tool for the job" still applies. Sometimes, you use the tool you have and you know, and all the more power to you if you can get the job done using that tool. If you are a master of that tool (i.e. SQL in this case), you can often push its capabilities very, very far.
That said, I think the best thing to do right now is try to separate the signal from the noise regarding AI/ML and find what really works and what does not. Then find how these new tools can either complement or replace previous approaches. I think they work together quite nicely - and we see that sometimes, for example, with AI/ML tools integrated close to SQL engines.
AI/ML has a place, and so does SQL. I will say, though, that I for one don't want to be caught on the side of the discussion where I don't learn enough about what is possible with AI/ML, and then get left behind. I think many of my colleagues and professionals in the field and here on YC feel similarly.
Actually, I think even non-technical people feel the same way - the fear of being replaced by AI/ML is higher than ever.
So, keep applying SQL and get that low-hanging fruit. But make sure to learn the new stuff too, and add it to your toolbox.
When politicians say they improved the economy by something like 30%, nobody buys into that. It's exaggerated, misleading political talk. But when some tech gurus talk about how AI improved their profit by 30% or so, everyone seems to hop on. It's effective marketing, for sure, but this is a worrisome trend. The root cause of this is, I think, the lack of proper understanding of fundamentals (and intellectual sloppiness). AI will continue to plague us on this front, and I'm still not sure if the net gain is going to beat all the distractions it created.
But once you start having to account for noise or seasonality or autoregression or dynamic weights or non linear kernel spaces, pure SQL really starts to fall down on the job.
His point is highlighted in the first tweet, in which the author appears to be specifically annoyed by potential founders and investors who can't understand that ML isn't a good solution for every problem.
He then goes on to give an example of such a problem by describing a shopping cart that doesn't actually need ML, just some old-fashioned SQL. He doesn't claim that SQL is a solution to all ML problems, just this one.
Taking the shopping cart example: "In a former life, I used to write SQL to extract customer of the week. Basically, select from orders table where basket size is the biggest."
The author decided that 'customer of the week' will be selected by 'biggest basket size'. Not by 'biggest $ amount spent', 'fastest time from add-to-cart to checkout' (and numerous other attributes or combinations of them). This decision (the "best attribute") was taken by a human, leaving a field open where a combination of attributes could've resulted in an overall better business outcome (how much did 99% of these retained customers shop for, in $ value over lifetime?, etc).
This is possibly what the parent commenter is hinting at - this human decision leaves a lot of optimization scope, where ML could have helped.
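A small sketch of that point, using sqlite3 and a made-up orders table (the columns and rows are hypothetical, and the filter for the current week is left out for brevity). The query is trivial either way; which column goes into ORDER BY is the human decision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, basket_size INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", 12, 60.0), ("bob", 3, 450.0), ("carol", 7, 120.0)],
)

ALLOWED_ATTRIBUTES = {"basket_size", "amount"}


def customer_of_the_week(connection, attribute):
    """Pick the customer whose order ranks highest on the chosen attribute."""
    # Column names cannot be bound as parameters, so check against a whitelist.
    if attribute not in ALLOWED_ATTRIBUTES:
        raise ValueError(f"unsupported attribute: {attribute}")
    row = connection.execute(
        f"SELECT customer FROM orders ORDER BY {attribute} DESC LIMIT 1"
    ).fetchone()
    return row[0]


by_basket = customer_of_the_week(conn, "basket_size")
by_amount = customer_of_the_week(conn, "amount")
print(by_basket, by_amount)
```

With this data the two attributes pick different customers, so the "best attribute" choice changes the outcome before any SQL is written.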
Data Warehouses changed the way people and companies do data. They expose all kinds of things that were never available before. It was magic!
No. It wasn't. Not that Data Warehouses are bad or ineffective. But it's a lot like the problem where observing something changes it. The work you have to go through to build a real data warehouse is getting disparate parts of an organization to codify their processes. Data warehouses don't model data. They model processes.
The mere fact of forcing the company to pin the process is often more beneficial than the warehouse itself.
The same thing goes for ML and AI. The only way to extract features is for them to actually exist. And that means the data needs to exist in a certain form, and there's a human process that leads to that. Absent that, it's pretty useless.
I cut my teeth on SQL, and it's a big part of my professional career. I think it's great. It's one of my favorite languages, and it does a lot that maybe a lot of people don't know about.
But this title and the content are really pretty garbage. Anyone who thinks that good SQL can do what good AI/ML can do is really misunderstanding both.
Set up a bunch of rules created by the sales team. Tweaked it over months. Made money
Then used real sales data tied back to search history and built a machine learning model. It found new patterns that the sales team hadn't thought of, and performs much better
Hah I wondered why I got so many notifications in the middle of the night. Now I know that it's from people who think they're helping - not realising that it actually sours my opinion on their company/product.
If their data is already clean enough for SQL queries to work reliably and they are familiar with the SQL syntax, why not look into things such as DMX in MSSQL to make predictions about what these customers are likely to want to buy? This solves the whole "marketing breast pumps to a man who bought sneakers" scenario, while also providing more personalized recommendations.
If your current technique is to send an email about sneakers to recent sneaker purchasers, do you really think they are in the market for another pair?
Sure, it might not make sense to implement a deep learning neural network just to send something like a semi-personal marketing email, but there are so many varying levels of AI/ML that seem to get ignored in favor of the flavor of the month: Tensorflow/IBM Watson/whatever else. Quite frankly, the whole thing just comes across as a very closed-minded rant from someone who isn't interested in exploring what new technologies are capable of.
SQL is annoying to debug, ML is impossible to debug.
On the other hand, comments are talking about hiring data scientists for months if not a year or more (yikes!) to clean data and, wait for it, perform linear regression. To me this sounds like a great application of machine learning. Couldn't someone train some models to clean the data, then do one of the things ML does best, linear regression, in a fraction of the time the human data scientists could do it in?
Yeah I think the article is garbage too, makes me wonder why it gained so much traction? The argument/ topic are not developed at all.
I guess the point they were going for is there are people who want to use ML because it's 'trendy' or something, and simpler solutions would suffice. I could see that being true, but this article is BAD. I hate seeing low quality articles get rewarded.
I've never heard of anyone hiring expensive Data Scientists, spinning up Spark/H2O clusters, building a data lake, doing a database offload to S3/HDFS all for a "select from orders table where basket size is the biggest" query.
AI/ML doesn't even work like this. It's simply not designed for giving 100% accurate answers to highly structured queries.
It can be summarized as "don't overengineer", but quite frankly these days ML/DL is so easy to apply from a technical point of view (taking care of the data or fully grasping the things you apply is another issue) that I don't see why one wouldn't at least try to use it. I don't see why an ML algorithm couldn't grab the first name, for example. I mean, if your argument is "just use SQL", my counterargument is "I agree, but I can just try ML as SQL on steroids". If you already have well curated data that you run the SQL on, you might as well play around with it in an ML setting. "Customer with largest basket" might work fine, but why not try to prod the data to check for other interesting things? Same for the POD example. Why not at least try to see if a combination of variables might yield more interesting results than the simple stuff that might work? Occam's razor should not cut out all curiosity :D
I like the overall idea of "try the simple stuff first" but quite frankly these days you can run very good ML with pretty much all it takes to do SQL queries (assuming you train your models on a separate machine).
Costs include technical debt, increased maintenance, general opacity, and the risk that the complex model runs amok and does something stupid (which is more common than you might think).
Sometimes those costs are justified, sometimes they aren't.
It is saying that if you're looking for ML/AI solutions for marketing and you AREN'T doing the basics already, then you're throwing good money after bad. You should START with some SQL and targeted emails before you dive into a large and potentially expensive project.
Things have changed, and ML nowadays can do far better things. If a competitor is using ML and making gains, then one should catch up as soon as possible.
SQL analytics was past, predictive analytics is the future. ML can do more than predictive analytics for you :)
> SQL analytics was past, predictive analytics is the future
You're over-simplifying things. SQL is here to stay, regardless of how big ML, which I'm very bullish on, becomes. Start with the simplest approach and try alternatives when/if it doesn't work. Simply jumping to "predictive analytics" is silly. Use the right tool for the task.
One issue this runs into: the machine becomes an expert at something that matches what it was exposed to during training. The research on adversarial examples shows that the thing being learned != what the expert learns, frequently to an extreme degree.
Still, with these methods, I think we're very far from real artificial intelligence, where you can really learn something new on the basis of something old. I believe until we have a profound understanding of what it means "to understand", such an advancement in AI won't be possible.
There isn't as much mystery as there seems. When a neuroscientist asks "how does an animal perceive the world?" the answer is reasonably methodologically obvious.
When a computer scientist asks, "how does a machine perceive the world?" the mystery arises only because it doesn't.
Tooting my own horn here: I'm taking up this challenge professionally. There are new methods being discovered to extract interpretations for humans out of a model[1][2][3]; and I'll gladly work and provide advice and implementations on this, as an independent contractor.
Having an interpretable/explicable ML pipeline helps in:
* Knowing what drives your customers' buying patterns
* Knowing where to look for signals closer to the root, helping go from correlation to causation
* Orienting your marketing messages. You don't engage users the same way if they're a repeat buyer or a "hot now" lead
* Concentrating on "would buy our product but won't because of X", which is where the market share battle lies; as opposed to "definitely would buy" and "definitely wouldn't buy", which are cases over which you have little influence.
* Listing buying patterns exhaustively, enabling you to discover low noise niches
* Providing auditable explanations of the decisions the model has taken, if it has been required legally
Here is my email: pro@benoit.paris
[1] https://distill.pub/2018/building-blocks/
[2] https://www.darpa.mil/program/explainable-artificial-intelli...
1) You train a complex ML model (say, deep neural net) to get the job done, solve your problem with high accuracy.
2) When you want to explain one of your model's results, you train a simpler more explainable model (say, linear regression) in the neighborhood of that point, such that it is locally similar to the more complex model. The task is too complex to be explained by a linear regression model over all possible inputs, but it often is simple enough around a specific point.
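The two steps above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not any particular library's method: `black_box` is a hypothetical one-dimensional stand-in for the complex model, and the surrogate is an ordinary least-squares line fitted to samples drawn near the point being explained.

```python
import random


def black_box(x):
    # Hypothetical stand-in for a complex model, chosen only for illustration.
    return x ** 3 - 2 * x


def local_linear_explanation(model, x0, radius=0.05, n_samples=200, seed=0):
    """Fit y = intercept + slope * x to the model's outputs near x0."""
    rng = random.Random(seed)
    xs = [x0 + rng.uniform(-radius, radius) for _ in range(n_samples)]
    ys = [model(x) for x in xs]
    mean_x = sum(xs) / n_samples
    mean_y = sum(ys) / n_samples
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The same model gets a different simple explanation at different points.
slope_at_2, intercept_at_2 = local_linear_explanation(black_box, 2.0)
slope_at_0, intercept_at_0 = local_linear_explanation(black_box, 0.0)
print(round(slope_at_2, 2), round(slope_at_0, 2))
```

No single line fits this function everywhere, but near x = 2 the output rises steeply with the input, while near x = 0 it falls. That is the sense in which the simple model is only "locally similar" to the complex one.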
One worry is that a machine might learn to do something we can't represent efficiently. I mean, people write whole libraries of books on ethics, so how can you expect a machine to punch out one or two lines on its own?
There's no upper limit on how long a proof to a given problem must be in any fixed language. And if you have to invent new language, people won't be able to read it anymore.
The first issue is that, in general, explaining a decision is a harder task than making a decision, just as reviewing and understanding someone's decision takes more effort than it took to make it. This means that adding explainability and interpretability to a system that achieves some result will take more effort, time and money than building that system itself - you'll triple your budget and get no better actual results, just a better feeling about them. This also limits the effort invested in tools and methods for this; there's large demand and funding for making stuff work or making stuff a bit better, so many people are getting paid to work on that, and not nearly so much on interpretability.
The second issue is that best performing systems take "everything" into account and thus become too complex for a human to understand. A system can be explainable, interpretable and accessible, but if making a correct decision inherently relies on complicated interrelations between many factors, then it will be difficult (or at least time-consuming) for humans to understand anyway. If you limit the complexity of the system to something that can be easily explained, then you're severely limiting its power, and you get poor results.
For your query planner, the analogy is the scenario where we could build a query planner that is capable of optimizing beyond the commonly used basic blocks in query plans, instead outputting optimized machine code to perform a particular query better than the standard blocks, achieving improved performance by merging and interleaving operations. The resulting output would be much harder to interpret, since no "block-level" query plan can be equivalent; and requiring the operations to be neatly separable in a limited number of blocks each with separate human-understandable meaning would mean restricting the system and inevitably getting a worse result.
The same applies to humans, by the way - being able to explain all the factors why you believe your decision is likely to be correct takes more knowledge, insight and effort than simply making that decision. It's not like humans can explain why exactly they see (or don't see) an animal in a picture like http://www.naute.com/images/cow.gif ; consciously understanding that in a structured way is very, very hard. If you would require a human to explain the criteria they'll use before they get to make a decision (as opposed to post-decision rationalization that often has no relation to why the decision was made, and has more relation to making up a socially acceptable plausible justification why you did what you did), then you get poor, simplified criteria that can't reflect the complexity of the problem; you could get their "rules of thumb" or their major factors, but explaining why exactly it is so is generally a mentoring task that takes months (if not more) of effort both from the teacher and the student.
The AlphaZero system plays Go and Chess better than any human. It could be modified to describe in detail why it chose this move over the other, but would you (or any other human) have the mental capacity to truly understand it, no matter how well it's explained? The distill.pub article illustrates lots of ways how an image classifier can explain why it made one choice or another - but is it actually useful for tasks other than for R&D to debug or improve the model? It seems like a nice-to-have feature that would be liked if it's not too expensive, but I'd bet that for most models the company wouldn't actually use the explainability feature (again, outside of its builders debugging or improving it) because the whole point of ML is that people get removed from the loop and don't look at the decisions.
There is an increasing number of ML-first companies out there. They are solving problems like object detection, portfolio strategy optimization, medical diagnosis, etc. Things that are 100% not possible to do with just SQL and Bash, and much less expensive than to have humans run those tasks. In those fields, these companies will outperform their competition on average.
I think it's smart for investors to look for knowledge and talent in the companies they look into. If there's AI/ML talent in the team, their value for acquisition (acqui-hire) is higher.
If you want to solve problems for your existing company, you certainly don't need AI/ML. You can tackle many, many low-hanging fruits with just SQL and Bash.
If you want to build the kind of company that will be around in 30 years, you need to have an ML-first mindset, and you probably need to start now.
Few things were more enjoyable for me than getting a new client, imagining what analysis I'd like to do, figuring out what data would be necessary, and then implementing the system to make it reality.
according to your schedule they’d ask what I’ve been doing and if I tell them I’ve just been “collecting data” that translates to them as “I haven’t done anything in 2 months”. If I say “keep paying me and you’ll see results in a year” that translates to “I haven’t done anything in 2 months but I want you to continue to pay me for another 10”.
What do you do to qualify a client? How do you know your engagement with them won’t just waste everyone’s time when they quit halfway through and then damage your reputation?
So while it may be tempting to simply hand data over to the business because “that’s not my job”, that can totally come back to bite you in the ass when your product roadmap is being dictated by ill conceived Excel formulas.
I'd summarize that the "big three" of ML/DL are all nontechnical
1) Finding the data/cleaning the data
2) Is there actually some reference you can benchmark against
3) It's not really all that hard and you might need less data than you think to show enough business value (broadly speaking a pretrained net + some usecase layers is good enough to get started for most business cases)
#3 is interesting. I tend to be quite honest but I've seen a lot of mumbo-jumbo hand waving "oh it's very tough" which of course makes sense if you want to sell shiny new stuff. Reminds me a lot of ERP sales where I've seen basically SQL+a little webfrontend pitched as rocket science :D
I understand that feeling. On the other hand, aren't there a lot of trains you can miss? Do you try to guess which trains are the critical ones (and can you, really?) or do you try to keep up with all of them?
AI/machine learning, quantum, blockchain...the list goes on. It seems like the best you can do is not be blindingly ignorant of any of the technologies and their potential...I think few people can really "keep up" unless they have the luck (or misfortune?) to have a job like "Quantum Crypto AI Researcher".
OP gave a few examples for Ecommerce where SQL will do fine. Can you give a few where ML will do something otherwise impossible or harder with SQL?
You can't recreate the Netflix Prize in SQL, you can't really do NLP, you can't do pretty much any clustering or unsupervised learning. Again, "can't" meaning "can't do it easily or in a sustainable manner (i.e. shouldn't.)"
And if we add in AI, I don't need to explain that SQL has no abilities to generate new content - it can't write a new song or headline, understand context and provide feedback, or drive a car.
Now what OP is arguing is that you can get a lot of lift with SQL and basic exploratory data analysis and I totally agree. But ML can take you further out into the speculative realm and can work on a much less explicit and prescriptive model than SQL.
Basically in SQL if you don't write it (whatever it is, a pattern, an anomaly, a correlation, a context) in your query you will never learn a single thing about it.
I give you a problem e.g. "tell me what factors are influencing a customer's likelihood to leave a bad review". And with ML you can actually produce a list of sorted factors with weighted percentages. You simply can't do that with SQL.
Also remember that SQL is for databases, so you often can't do anything algorithmic, e.g. K-means clustering, linear regression, random forest, decision trees etc. Whereas ML encompasses non-databases as well, e.g. from a Kafka stream of numbers predict the next number.
And the choice to do this properly would depend on your volumes. If you're a small shop, you just pick one time - definitely not 2 AM, but, say, 11:30 AM (so office workers can do their thing in a lunchbreak) or 8 PM when they're likely to be home; it depends on your target audience. If you have a distributed client base, you'd want to take time zones into account. And if you're large enough so that small changes in this result make enough money to worry about it seriously, you might even do some ML to pick the optimal reminder time for each customer; e.g. training a predictor on what factors will influence the 'desired action' chance in a "multi-armed bandit" approach to explore the options initially and then start using the ones that work best. That's obviously overkill for most companies, but for large online retailers that would be a natural choice, it all depends on scale.
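As a rough sketch of the "multi-armed bandit" idea, here is an epsilon-greedy simulation in plain Python. Epsilon-greedy is just one simple bandit strategy picked for illustration, and the send times and response rates in `TRUE_RATES` are hypothetical numbers. This version also learns one best time for the whole customer base; a per-customer predictor, as the comment suggests, would need customer features on top of this.

```python
import random

# Hypothetical true response rates per send time; unknown to the algorithm.
TRUE_RATES = {"2 AM": 0.02, "11:30 AM": 0.10, "8 PM": 0.30}


def run_bandit(rounds=10000, epsilon=0.1, seed=42):
    rng = random.Random(seed)
    arms = list(TRUE_RATES)
    pulls = {arm: 0 for arm in arms}
    successes = {arm: 0 for arm in arms}

    def estimate(arm):
        # Observed response rate so far (0.0 before the first send).
        return successes[arm] / pulls[arm] if pulls[arm] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.choice(arms)         # explore: try a random send time
        else:
            arm = max(arms, key=estimate)  # exploit: use the best time so far
        pulls[arm] += 1
        if rng.random() < TRUE_RATES[arm]:  # simulated customer response
            successes[arm] += 1
    return pulls, {arm: estimate(arm) for arm in arms}


pulls, estimates = run_bandit()
best_time = max(estimates, key=estimates.get)
print(best_time, pulls)
```

Most of the sends end up going to the time slot that works best, while a small fraction keeps exploring in case the picture changes.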
What ends up happening is effectively the Data Science engagement becomes 90% data cleaning, a handful of SQL statements that should have existed beforehand but never did because the data infrastructure wasn't there, and possibly a veneer of ML/AI just to say it was used. Clients come out happy (sometimes), despite overpaying for what was a much more basic engagement than they think it was, and they go on preaching to their business exec friends the virtue of ML/AI and the cycle continues.
[1] I built up a Business Intelligence/Analytics team at my last job, and currently work for a marketing agency managing digital analytics for Fortune 50 clients. Lots of exposure to analytics in lots of varying environments, and I've seen firsthand how ML engagements get pitched and results get presented.
I also own a consultancy that's the anti-version of this phenomenon, offering digital analytics management and support services. 50% of my work involves being a knowledgeable resource for marketing and business execs to lean on to cut through the bullshit. With most of the rest being basic Google Analytics/Google Tag Manager management, CrazyEgg, and drip marketing campaigns. All of which seems like AI-level magic for clients when done correctly.
I was asked about my thoughts on AI/ML at work, and I said it didn't really apply to us. I was told "but with ML we can figure out when deliveries are happening and scale the machines before the deliveries happen based on the peak traffic times". I tried to explain that we could do all that with SQL and by looking at our data. We have all the data; we just need to shape it into something that shows which times of day and days of week, for each region, have more traffic, then use that to pre-scale. I was shot down with "you clearly do not understand ML and should go read up on it".
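For what it's worth, the SQL side of that argument is roughly one GROUP BY. The sketch below runs it through Python's built-in sqlite3 module; the `deliveries` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per delivery event.
conn.execute("CREATE TABLE deliveries (region TEXT, delivered_at TEXT)")
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?)",
    [
        ("eu", "2018-05-07 09:15:00"),  # Monday
        ("eu", "2018-05-07 09:40:00"),  # Monday
        ("eu", "2018-05-14 09:05:00"),  # Monday
        ("eu", "2018-05-08 14:00:00"),  # Tuesday
        ("us", "2018-05-07 18:30:00"),  # Monday
    ],
)

# SQLite's strftime('%w') gives the day of week (0 = Sunday), '%H' the hour.
PEAKS_SQL = """
SELECT region,
       CAST(strftime('%w', delivered_at) AS INTEGER) AS day_of_week,
       CAST(strftime('%H', delivered_at) AS INTEGER) AS hour_of_day,
       COUNT(*) AS n_deliveries
FROM deliveries
GROUP BY region, day_of_week, hour_of_day
ORDER BY n_deliveries DESC, region, day_of_week, hour_of_day
"""
peaks = conn.execute(PEAKS_SQL).fetchall()
busiest = peaks[0]
print(busiest)
```

The top rows of `peaks` are the region/day/hour slots to pre-scale for. Whether that is enough, or whether a forecast model is worth adding on top, depends on how stable those peaks are.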
In your example, you should absolutely start by cleaning your data up, running some basic SQL aggregations, and plotting volume over time. So you look at that and notice (1) volume is increasing over time, (2) some holidays bump volume a few days ahead but drive very low volume on the day itself, (3) weekends are higher, but the effect isn’t pronounced the whole year, and (4) summer is better for you than winter except for the Christmas season. Now: it’s two days before Halloween, what’s our anticipated sales volume?
If you baked all those observations into an ARIMA model, it’s trivial to crank out a forecast with quantifiable accuracy. If you just have lines on a graph, it’s hard to pin down all the independent effects and recombine them for arbitrary scenarios.
But then.. you would be doing ML ;)
I've done tons of projects, for tons of companies, and this sort of refrain from people is pretty common when they don't have experience in the field. They think of it like some fad that doesn't make a lick of sense outside of a C-level discussion.
But the reality is, predictive analytics is extremely powerful. One of the last projects I did was to keep trains from derailing. Another was to improve the crop yield of a farming company by using satellite imagery to determine when a field most needed to be harvested. Tons of other examples.
To even explain the particular use cases would take quite awhile because they are domain specific issues. Cron isn't solving these problems.
What the person is really saying is, I don't have experience in these topics, what can be so hard about them?
The same goes for people who armchair sports, or politics, or programming, or any other topic. It all seems easy when you don't know the details.
I think it's fair to claim that companies who skipped that process might want to consider it first, as a cheaper way to start with predictive analytics.
But I'm not sure. I am actually intrigued: what were the techniques used before ML in that field otherwise?
Nobody, repeat nobody is spending tens of millions on Data Science programs for answers to problems that a Data Analyst could already do.
If you know names please be specific but what you are saying is a bit ridiculous.
Although to be fair, this outcome was still an improvement. At least with using machine learning unnecessarily the data actually meant something and wasn’t just arbitrary Excel numerology.
Even Data Scientists couldn't tell you if it just means neural networks or if it includes other ML techniques. There are technologies like AutoML which automate feature engineering, but is that ML or AI? Not sure.
So I am not concerned whether people know AI/ML or not. What I have an issue with is people thinking that you can do 90% of AI/ML using SQL. Which makes no sense.
Sending a screenshot by e-mail now.
My first company provided a speed reading training course for the best part of a day.
It wasn't the best targeted course for most people there. We mainly read legal and policy documents which by their nature are terse, and the method of speed reading taught traded off retention against speed against accuracy. It was more of a concept, which you can go off and develop yourself. So here's the idea -
1. Start with a text that is quite verbose but not overly so. A code specification or Bret Easton Ellis novel is a no. A light news article or non-technical book should be fine.
2. Recognise that when you read, you don't read a word from letter start to finish, you basically look in the middle of it and the word is recognised. If you have dyslexia, this technically probably doesn't work; but if you have dyslexia, you've probably invested countless painful hours to something a lot of us take for granted and hats off to that.
You might not notice, but you probably read multiple-words at a time already. Commonly paired words are often read together, like 'a few'. YMMV. Harness that.
3. Mentally split a page of text into narrow columns, perhaps the width of 2 medium length words. Flick your eyes across a line of text. Only focus on the middle of the mental column and absorb the words, like a snapshot (no need to blink).
This is where accuracy starts to degrade. If the text consists of obscure words or lots of figures, accuracy can hurt at the start. Grammar patterns are important too - unless the text is written by a native or near-native speaker, anticipation and auto-ordering in this mini-snapshot can be hard or impossible; likewise with unfamiliar or complex sentence structures from a native speaker.
4. Now, start to speed up over a paragraph. This is where retention starts to hurt. You certainly don't want to be sounding out words, as this will slow you down and defeat the purpose. But when you've read a block of text, have you been able to follow the meaning at the end?
Again, using well written, non-overly terse training text is important.
5. You should be tired by now, having done this for several hours. Our trainer then suggested changing the 'snapshot' from one line, to two lines, so you're essentially reading the start of the second line before finishing the end of the first line. I can do this but your text needs to be super non-terse for this to work at the start, and even then it's weird. If you can cover a single line pretty fast in less than a second certainly, then this starts to become feasible.
tl;dr
Went on a training course. Was pessimistic as were my colleagues at the start. Persisted afterwards by trying a few minutes per day. Became a habit, especially when seeing articles or documents that I needed to read but didn't need to scrutinise. Important to choose the right tool for the right job. I'd not use this technique to read a document where I'm not familiar with the content-scope; I wouldn't use this technique to read a contract I'm signing, or where specifications/requirements are being very clearly outlined. I do use it to read things like bloomberg.com articles, or PMBOK Guide :)
Hope a little useful, and the above can set you on some kind of direction to learn more. I highly recommend a half-day instructor-led course to kick you off. I imagine there's a college near you which has something.
I think people are too often too quick to assume that machine learning necessarily implies deep learning and neural networks, when often it can reasonably be reduced to simple statistical approaches.
But in this particular case, it was a corporate VC, investing their own money.
App notifications are just programming you to be psychologically dependent on them. You can break free of the programming.
It's really, really hard to get executive budgetary approval for a foundational data audit/cleaning project (comprehensive data cataloging, data cleaning, source auditing and validation, etc). Doing so implicitly admits that you weren't doing that before, and now you have to pay gobs of money to fix it. The larger the company, the more infeasible it is to push this through because of the breadth of technical/analytical debt that has accrued and the price tag associated with the project, combined with the perception of incompetency (i.e. it's an expensive project that's fixing a problem you as the executive shouldn't have let happen to begin with).
Whereas an ML/AI project/initiative/push is a net new capability that you're spearheading, and it's easier to get the political traction to spend money on net new things, especially buzzwordy net new things that the firm can use to be viewed as cutting edge. The fact that you're rolling up the cost of a complete data management audit to be able to even do the ML/AI project is a minor bullet-point that doesn't matter. Executive expectation is that anything ML/AI/new-age-techy is going to be astronomically expensive anyway, so it doesn't get noticed that they're paying a premium on labor to do it as a combined project rather than as two separate projects.
Effectively, the foundational work that's needed to support ML work is also the work that's needed to do basic SQL-based analytic work, but it's way easier to get that budgetary line approved in a flashy ML project, even if you're paying a premium by having the ML/AI firm do the foundational work instead of the specialist work.
Plus, spearheading ML/AI initiatives makes for better resume points than "data management initiatives". So there's little reason for anyone in this process to attempt to change anything, unless you happen to materially benefit from a firm's profit. For this reason, my main consulting clients are bootstrapped firms that actually care about being pragmatic over being trendsetters.
Note: This is a huge generalization, and doesn't apply universally. But it's far more common than you would expect, especially as you veer away from the type of companies that pop up on HN towards more traditional industries.
What has AI really given us so far? Amazon / Youtube recommendations and some self driving cars that have crashed in unexpected ways. (Have they managed to debug that yet?)
Bar 1: Last year's deliveries.
Bar 2: Predicted this year's deliveries (based on % of increase from other markets if no previous years, or percentage of growth from last 2 years)
Bar 3: Actual deliveries.
Then another graph:
Bar 1: Average processing time from previous year.
Bar 2: Average processing time for current year.
I'm terrible with graphs and such tho, I can get all the data, I really suck at displaying that data tho.
(and also show the holidays in that region, India has a lot!)
> "by day of week"
This means you have a time series with daily resolution (one observation per day) and you expect Weekly Seasonality to matter. Model this as 7-period lag in your daily series.
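A minimal way to see what a 7-period lag buys you, in plain Python. The "seasonal-naive" forecast below (each future day repeats the value from 7 days earlier) is only a baseline, not a fitted model, and the `week` counts are hypothetical. The lag-7 autocorrelation is a quick check of whether weekly seasonality is present at all.

```python
def lag7_forecast(series, horizon):
    """Seasonal-naive forecast: each future day repeats the value 7 days earlier."""
    if len(series) < 7:
        raise ValueError("need at least one full week of history")
    history = list(series)
    for _ in range(horizon):
        history.append(history[-7])
    return history[len(series):]


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return cov / var


# Hypothetical daily counts, Monday first, with a weekend bump.
week = [10, 11, 10, 12, 13, 20, 22]
daily = week * 4  # four identical weeks

forecast = lag7_forecast(daily, 3)
weekly_acf = autocorrelation(daily, 7)
print(forecast, weekly_acf)
```

A strong lag-7 autocorrelation on real data is the signal that a model with a weekly seasonal term is worth fitting.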
> based on % of increase from other markets if no previous years
There are multiple markets, each with their own time series? Congratulations, you have Panel Data [0]. Do some regions have similar trends? Need to account for that correlation, maybe try a Mixed Model [1]
> percentage of growth from last 2 years
So there's a trend component (constant growth over time). Easy enough to fit the I term of an ARIMA model for this. You'll need to do some custom work to integrate this with your cross-regional correlations though.
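To illustrate the "I" (integrated) part in isolation: differencing the series removes a steady trend, and the simplest forecast built on that is a random walk with drift, i.e. the last value plus the average step. This is a plain-Python sketch with hypothetical yearly totals, not a full ARIMA fit, and it ignores the cross-regional part entirely.

```python
def difference(series):
    """First differences: the change from each observation to the next."""
    return [b - a for a, b in zip(series, series[1:])]


def drift_forecast(series, horizon):
    """Random walk with drift: extend the last value by the mean first difference."""
    diffs = difference(series)
    drift = sum(diffs) / len(diffs)
    return [series[-1] + drift * step for step in range(1, horizon + 1)]


# Hypothetical yearly delivery totals with roughly constant growth.
yearly = [100, 110, 121, 130]

steps = difference(yearly)
next_two = drift_forecast(yearly, 2)
print(steps, next_two)
```

Once the differenced series looks trend-free, the AR and MA terms of an ARIMA model are fitted on it rather than on the raw series.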
> Average processing time
You'll want to model this at least as well as you're modelling the demand. That means seasonal effects, correlations between factories / warehouses, etc. PS any time you're dealing with a two-day weekend you'd better use at least 3 years of data in case major holidays happened to fall on a Saturday and Sunday, blindsiding you when it shows up on Monday this year.
> show the holidays
You'll definitely need to put together a calendar of major holidays for each region. These models will calculate the effect for each holiday. Specifically, the effect of the holiday AFTER accounting for the day of week, overall growth, time of year (season), and region. You might even get nifty charts like [2]
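A deliberately simplified sketch of "the effect of the holiday AFTER accounting for the day of week": build a day-of-week baseline from non-holiday days, then measure how far each holiday observation sits from its baseline. A real model would estimate growth, season and region jointly; this plain-Python version only controls for day of week, and the observations are hypothetical.

```python
from collections import defaultdict


def holiday_effects(observations):
    """observations: iterable of (day_of_week, holiday_name_or_None, volume).

    Returns the mean deviation from the non-holiday day-of-week baseline
    for each holiday. Assumes every holiday falls on a day of week that
    also has at least one non-holiday observation.
    """
    by_dow = defaultdict(list)
    for dow, holiday, volume in observations:
        if holiday is None:
            by_dow[dow].append(volume)
    baseline = {dow: sum(v) / len(v) for dow, v in by_dow.items()}

    deviations = defaultdict(list)
    for dow, holiday, volume in observations:
        if holiday is not None:
            deviations[holiday].append(volume - baseline[dow])
    return {name: sum(d) / len(d) for name, d in deviations.items()}


# Hypothetical daily volumes.
observations = [
    ("Mon", None, 100),
    ("Mon", None, 110),
    ("Sat", None, 200),
    ("Sat", None, 220),
    ("Mon", "Diwali", 60),
    ("Sat", "Diwali", 150),
]
effects = holiday_effects(observations)
print(effects)
```

Without the baseline, a holiday that lands on a Saturday would look like a good day simply because Saturdays are busy.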
===============
Anyway that's the basics. You can take graduate math courses in just this kind of modelling. Easier - you can contract a decent statistician for a couple weeks to build the model for you. They'll be delighted that you can produce SQL queries with the relevant data, and their models will help you get a lot more value out of those queries.
===============
Resources 4 U
Generic time series reference:
[0] https://en.wikipedia.org/wiki/Panel_data
[1] https://en.wikipedia.org/wiki/Mixed_model
[2] https://assets.digitalocean.com/articles/eng_python/prophet/...
[3] https://en.wikipedia.org/wiki/Autoregressive_integrated_movi...
R tutorials:
[4] https://www.datascience.com/blog/introduction-to-forecasting...
[5] https://cran.r-project.org/web/views/TimeSeries.html (Particularly the "Forecasting and Univariate Modeling" section for your problem)
Python tutorials:
[6] https://machinelearningmastery.com/arima-for-time-series-for...
[7] http://www.seanabu.com/2016/03/22/time-series-seasonal-ARIMA...
[8] https://www.digitalocean.com/community/tutorials/a-guide-to-...
You may not identify individual users but Google probably do...
> we should put this somewhere more prominent
You should make it opt in by default. Or stop doing it!
What? Can you post some examples of sites that already do this? Everybody has analytics on their landing pages, and I don't think I've ever personally seen an opt-in.
What you can find are thousands of examples of site policy pages that say something like: "by choosing to visit our site, you are sharing information with us that we log for the purposes of improving our customer experience. By using our service, you agree to this logging. If you don't want to be logged, then please don't use our service."
Google Analytics is already GDPR compliant, as long as:
* you are not pushing any unencrypted customer data to them in the clear
* you're careful about who has access to your Google Analytics data
You can (and most businesses will) argue that you need to know how your website is used. It's a completely standard practice that doesn't put user privacy at risk. Moreover, Google has already offered an opt-out to Google Analytics globally, and you would be covered by that. As such, you don't need them to opt-in.
I wouldn't like it if someone followed me around meatspace with a clipboard all day. I don't like it online either.
I really appreciate all the candid advice in this thread and hope that we — and others in similar situations — can use it to improve our products.
- You're already not getting data from people with ad-blockers; consider also respecting Do Not Track[0]
- Be transparent about what you track and how it's used
- The data collected from opt-in users might end up being enough to build convincing enough reports for impact questionnaires?
- Your privacy policy seems confused: "We think the best way to ensure that your personal information stays safe is to never collect any in the first place" then "we only use Google Analytics, which gathers information about our site users and generates aggregate reports to help us figure out who is using our site and extensions, and in what ways"
[0]: https://developer.mozilla.org/en-US/docs/Web/API/Navigator/d...
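For what it's worth, respecting Do Not Track on the server side can be a few lines. This is only a sketch, assuming your backend decides whether to emit the analytics snippet: browsers with the setting on send a `DNT: 1` request header, while the function name `should_track` and the plain-dict headers are made up for illustration.

```python
def should_track(headers):
    """Return False when the request carries a Do Not Track signal."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    # "1" means the user has asked not to be tracked; anything else
    # (missing header, "0") is treated here as no stated preference.
    return normalised.get("dnt") != "1"
```

You would call this once per request and simply skip the analytics tag when it returns False.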
http://news.itu.int/reality-check-not-nearly-close-strong-ai...
> Marcus also observes “a bias in the field which is to assume that everything is learnt.” Marcus makes the argument that human beings do not learn everything by ‘trial-and-error’ – that part of our knowledge is innate, learnt over evolution – leading him to suggest that “we need more innateness if we are going to build intelligent agents … It’s not learning trial-by-trial in the way that our contemporary machines are.”
Perception is structured by concepts which are learnt. Perception is the prototypical mechanism of understanding.
Insofar as we are talking about "abstractions" we are just using language to model the conceptual structure of perception; and then higher order concepts to model the structure of these "perceptual" concepts.
We tend to think of understanding as being equivalent to the linguistic act of accounting for a higher-order concept, when this is really only something humans can do -- and then only with tremendous difficulty needing quite significant methodological assistance from others, etc.
This is the mistake Socrates makes when he asks for "definitions", as if the dog doesn't understand the geography of where he lives only because he has no linguistic model of the concepts he's using. Or as if the general has no knowledge of justice because he has no precise linguistic model of it.
When Potter Stewart said in the pornography case, "I know it when I see it", he was literally correct. The concept "pornography" is part of seeing pornography; it is knowledge which structures your perception. The lack of a perfect linguistic model for that is neither here nor there, and a bit of a blind alley for AI (and Socrates).
Animals, including humans, understand a great deal more than any one person can model linguistically. It is this which structures our perception and allows even trivial engagements with our environment.
Here you refer to categorization, not understanding. From the article I refer to:
> “Deep learning is good at certain aspects of perception, particularly categorization, but perception is more than categorization and cognition or intelligence is more than just perception. There are many things that go into intelligence … And what we have made real progress on is perception, but the rest of it, we still haven’t made that much progress collectively in the field.”
Understanding, real understanding, the "aha moment", is the very basis of new discoveries and real evolution. What we see now are systems quite good at categorizing, pattern recognition in a given set once prepared correctly and so on. There is no relation whatsoever with real intelligence, as in being able to make a new discovery based on analytical and synthetic skills.
There are quite a few voices in the community saying the opposite and maintaining that mimicking intelligent action is practically the same as the intelligent action itself, but there are quite obvious limits to what can be achieved in this way.
Bigger clients' budgeting can also be conducive to longer term planning. In year one they can write it off as an R&D investment; in year two it can start to make returns.
Clients who have been burned by less capable companies are also comforted by very well defined plans, so they are happy to hear you have actually accounted for the issues that likely "came out of nowhere" last time.
Also, you need a proven track record - if you don't have previous clients, have a working proof of concept for something similar; which can lead you into signing them up for a 3-6 month pilot project, followed by a real contract or two and maintenance.
This also leads nicely to the pricing: this cannot be done cheaply. If you sell, say, 20 man months of engineering to a customer, it's not enough to get paid for the 20 man months of engineering you do. The contract also needs to pay for the 5 man months of sales (including engineer time) it took to arrange their contract, the 15 man months of sales you spent on other, failed leads to get this one contract (I mean, most sales will fail), and for a part of the 20 man months of R&D it took to develop the various proofs of concept and demonstrations that were absolutely necessary to start getting contracts but not paid for by anyone.
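To put rough numbers on that, here is a back-of-the-envelope sketch. The 20/5/15/20 man-month figures are the ones from the comment; the share of R&D charged to this one contract (`rnd_share=0.25`) and the normalised cost of 1.0 per man month are assumptions picked purely for illustration.

```python
def required_monthly_rate(cost_per_month, billed_months, won_sales_months,
                          failed_sales_months, rnd_months, rnd_share):
    """Rate per billed man month needed just to break even."""
    # Everything that was worked but cannot be invoiced still has to be
    # recovered through the man months that can be.
    total_months = (billed_months + won_sales_months + failed_sales_months
                    + rnd_months * rnd_share)
    return cost_per_month * total_months / billed_months


# rnd_share=0.25 is an assumed allocation, not a figure from the thread.
rate = required_monthly_rate(
    cost_per_month=1.0, billed_months=20, won_sales_months=5,
    failed_sales_months=15, rnd_months=20, rnd_share=0.25)
print(rate)  # 2.25
```

Under those assumptions each billed man month has to be priced at 2.25 times what it costs you, before any profit.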
The first few months aren't just us working silently. It was always a very intensive process; I'd be on the phone with the client a few times a day to work through issues. They also get valuable tidbits of data early on. Most clients had ideas about how they expected data to look (70% of my revenue comes from product x) but oftentimes the reality is different and they see immediate value.
Clients sign up for a contract that auto-renews every three months and they have to give 30 days notice to terminate. At a small agency you also have the luxury of turning down clients you don't want to work with.
I think you are underestimating how useful NLP could be in your application.
I tried out one of your test cases, and I did find it useful. However, I think that blending colors based on entity recognition along with your line based system could focus attention on important parts.
Have a look at [1] (sorry for the long URL) and imagine the colors blending with yours, so entities were slightly brighter than the rest of the text.
[1] https://explosion.ai/demos/displacy-ent?text=John%20Fitzgera...
But honestly the biggest barrier we face in adoption among licensees (platforms that would integrate our tech) is that they are simply uncomfortable with text that has colors in it. It's not about how much it helps any metrics (the things that NLP could improve) — it's just that most folks don't want to be early-adopters of crazy-looking tech.
Though I should note we do have some great licensees, especially in the education, impact, and accessibility markets.
Regardless, I'd love to chat with you further — please shoot me an email (contact@[domain]) if you're up for chatting about how we could deploy NLP as we develop.
It sounds to me like you should talk to those investors again ;)
I'll drop you an email.
So I guess test it and variations?
No, fuzzy logic won't make your washing machine better. Yes, you can implement some stuff in fuzzy logic.
Ideally, people that do nothing but tick off boxes in their mental buzzword checklists should not even be allowed to attend the pitches, and thereby waste other people's time by forcing them to hack around the unrepentant ignorance. But they that have the gold, make the rules.
There's really not, but it might make you feel better to think there is.
"Managing big data with MySQL" - the syllabus mentions nothing of clustering or sharding. Ten years ago that was just "using a database". I am getting too old for the faddish nature of this industry.
I'm not sure how I'd do this with SQL. That certainly doesn't mean you couldn't take a SQL approach to the problem of "how do I make recommendations based on this training set and this particular individual's attributes" and do something creative and useful. But I do think you can, in this situation, get something out of ML that would be difficult (ok, I'll say impossible) to get with a pure SQL approach.
To be clear, I love SQL. I still get a little flash of anger when I remember my arguments with "architects" who announced to a room full of non-technical people that I was pretty much a dinosaur unwilling to move away from an obsolete technology when they advocated no-sql approaches to data that was deeply relational. And I'm sure that there are plenty of people who think "need answer from data, must use ML" and end up somehow trying to train a neural net to perform a WHERE clause. Having seen the hype cycle several times, I am absolutely certain this has occurred in boardrooms/open office cubicle farms all across the you ess aye and beyond.
But keep in mind, even though reports of SQL's death were remarkably exaggerated, "nosql" approaches often do make the most sense. Plenty of good ideas and technologies came from nosql, and many of the people who created them and advocated them continued to use (and argue for) SQL and relational databases depending on the problem. A graph database is a vastly superior approach to standard SQL for some types of network problems, and not helpful for others. When the hype cycle around ML dies down (and some people start writing articles that ML is "dead" or whatever), ML will continue to be used effectively in all kinds of places, as it is now.
I looked up a bit on Elasticsearch's text classification, and it's interesting. There's an overview of "traditional" ML (similar to the song review classification I mentioned) and how Elasticsearch differs.
https://www.elastic.co/blog/text-classification-made-easy-wi...
"The MLT query is a very important query for text mining. How does it work? It can process arbitrary text, extract the top n keywords relative to the actual "model" and run a boolean match query with those keywords. This query is often used to gather similar documents."
Yeah, looks like aggregation and scoring. It'd be interesting to see if/where this outperformed various ML algorithms (accuracy as well as performance).
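Here is roughly how I read that description, as a toy sketch in plain Python. This is not Elasticsearch's implementation: the scoring is a simplified tf-idf, the "boolean match" is a plain any-keyword test ranked by number of keywords hit, and the names `tokenize`, `top_keywords` and `more_like_this` are mine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(text, corpus, n=3):
    """Pick the n terms of `text` with the highest tf-idf against `corpus`."""
    doc_tokens = [set(tokenize(doc)) for doc in corpus]
    scores = {}
    for term, count in Counter(tokenize(text)).items():
        df = sum(term in tokens for tokens in doc_tokens)
        if df == 0:
            continue  # term never occurs in the index, so it cannot match
        scores[term] = count * math.log(len(corpus) / df)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]


def more_like_this(text, corpus, n=3):
    """Return indices of docs containing any keyword, best match first."""
    keywords = top_keywords(text, corpus, n)
    hits = []
    for i, doc in enumerate(corpus):
        tokens = set(tokenize(doc))
        matched = sum(k in tokens for k in keywords)
        if matched:
            hits.append((matched, i))
    return [i for matched, i in sorted(hits, key=lambda h: (-h[0], h[1]))]
```

Feed it some text and a list of documents and you get back the documents sharing its most distinctive terms, which is the "gather similar documents" use the quote mentions.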
[1] https://www.unc.edu/courses/2009spring/plcy/240/001/Kant.pdf
“Act only according to that maxim whereby you can at the same time will that it should become a universal law.” — Immanuel Kant, Grounding for the Metaphysics of Morals
Machine Learning = statistics + linear algebra + computer science, mostly.
Naive Bayes and Graphical Models are pure statistics, but they are mostly used for toy problems. Machine Learning scales these approaches to high dimensionality problems, and tasks where data is abundant.
Just because Bayesian isn’t over-hyped doesn’t mean it doesn’t solve real problems. Not everything can be solved with ML, unless the problem is getting more investor money.
Now, the biggest difference that I can see is that, within machine learning contexts, people are more concerned with the quality of the predictions than the interpretation of the independent variables. That’s not hard and fast, but it seems to be a common thread.
And for some problems, making a good prediction is really the right thing. In other cases, understanding the mechanisms you might use to effect an outcome is more important. Both are valid uses of statistical methods, depending on the problem.
I think within the realm of classical statistics, Bayesian methods’ super power is in being able to generate results that are much easier to communicate to lay people. Also Bayesian methods are nice if you want to do sensitivity analysis in a principled up front kind of way.
But I could imagine using those methods in an ML context even if they aren’t the current darlings of the methodological pantheon.
Also, the argument that it's pure statistics doesn't really fly - support vector machines, random forests or deep learning multilayer perceptrons are just as much statistics as probabilistic graphical models.
I think they just feel "toy" because they've been used with great accuracy for so long.
Complex problems are rarely solved by a single approach. The harder the problem the more likely a suite will be used, and often naive bayes will be part of that in some capacity.
Naive Bayes is a common baseline solution because it's very simple to implement and very fast to run; however, pretty much any other ML method gives better accuracy than Naive Bayes. Sometimes the accuracy improvement is small and doesn't justify the highly increased computing cost, though, so Naive Bayes becomes the appropriate method to use, but it's not because of its accuracy.
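To show how little there is to it, here is a minimal multinomial Naive Bayes text classifier in plain Python. It's a sketch, assuming whitespace tokenisation and add-one (Laplace) smoothing; the `train` and `predict` names and the tuple used as the "model" are just for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (text, label). Returns the counts the model needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict(model, text):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior for the class.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Log likelihood with add-one smoothing.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Training is a single pass of counting and prediction is a handful of additions per word, which is why it is so cheap to run as a baseline.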
Bayesian models are also advantageous in that they are much easier to describe to stakeholders. Neural networks, on the other hand, are very difficult (if not impossible in many cases) to explain.
AI is about substituting for human decision makers or other conventional computer decision algorithms. AI is used for situational decision making. It needs to be quick (like the conventional algorithms) while also recognizing the nuance and multi-dimensional nature of a lot of decisions. Robustness and quickness are often valued more than precision.
IOW the GP is only using ML for everything because it is a buzzword.
What you use as description for AI tempts me to use the "use it as a buzzword" angle back at you here - it is a stereotypical description of some use cases of AI approaches; but there are others - [machine] learning, knowledge representation, planning and scheduling, reasoning (both formal logic reasoning and also reasoning under uncertainty e.g. Bayesian approaches), intelligent agent representation, etc, are all parts of the AI field.
Machine learning is the use of computers to make decisions (or classifications) based on data without human intervention.
Publishers, educators etc also need to understand what the value proposition is for their users.
I also think the page ought to be written in longer sentences. The current silo buttons aren't even making it clear that they explain more.
So, suggestions:
You just have to take the few sentences from your Individuals page "Reading on-screen can be tough on your eyes, especially if you have to do it all day long. We’re here to help. BeeLine Reader makes reading on-screen easier, faster, and more enjoyable." and put that on your landing page instead of "A New Way to Read."
Now change the "(choose)" subheading to "perspectives:", put it lower nearer the buttons instead of near the description, and perhaps left-align it.
Change the buttons from meaningless colours with labels to something descriptive like "for you the user :smile:", "for the publisher $" and "for the educator :graduation-hat-icon:".
Voilà! Good luck!
Maybe something generic like "improve focus while reading", which could appeal to individuals and publishing partners? Appreciate your taking the time to share feedback — this sort of stuff is not something we're skilled/experienced at.
I think it will be very helpful to have a simple description on your site. First impressions matter.
p.s. I like the idea of your product.
The silos are good, you just need to make it much more obvious that the user needs to select one of the silos in order to get at more information. It's not particularly obvious that your buttons are buttons. I would also recommend that the content be full height / width so that on desktop at least the user doesn't scroll at all. The stock photo is probably unnecessary for the main page. Also you don't really need left/right margins since there isn't really text on the page, and that's what large margins are good for (but I'm assuming you know that since you made this app!)
I should also note that reading ease is an important (but difficult to measure) piece of this as well. So if it felt better for you, then it might be worth trying out the browser plugin and then trying the reading challenge again. There could be a little bit of a learning curve, as your eyes adjust to how quickly you can jump from line to line.
Things get more complicated, as the browser plugins are subscription ($2/mo or $22/yr) and the iOS app is one-time IAPs (mostly a few bucks, but more for the Kindle feature). I'm not sure how to succinctly communicate this, but you and other commenters make clear that we need to have some info here.
Thanks for taking the time to share your thoughts!
It doesn't need to be a lot of detail, or precise. Even just a "from $..." or "plans starting at ..." would be enough.
* I know that I will forget I even clicked on that link. The page has 3.5 seconds to convince me to stay longer.
* I'm mildly intrigued and form, in my head, a price I'd be willing to pay for the product.
* I scan the navbar for a pricing link to see if my expectation matches reality. I find nothing.
* The site obviously doesn't value my time, so why should I stay?
It's disrespectful.
Second, the other saying is if you have to ask how much it costs, you can’t afford it. So by concealing the price, the company is telling me the cost is too high and they know it. They are telegraphing that part of their sales process requires a salesperson to coerce the money out of me.
CNET did a study [1] that showed BeeLine readers were much more likely to finish an article, but it would be great to take things to the next level and see how that translates into purchases or other conversions.
Shoot me an email! contact@[domain]
1: the CNET study is discussed at the bottom of this article http://www.theatlantic.com/technology/archive/2016/05/a-bett...
Also, note that we don't actually require (or even ask for) email addresses or anything personal when you install the browser plugin. You just click install, try it for 2 weeks, and then buy if you want. The standard pricing is $2/mo, with a discount for annual purchase.
I totally appreciate the feedback and will see how we can better communicate our pricing to alleviate concerns like this.
Having worked with Oracle's version, and seen entire systems built out of that, I dread it.