BI/Big Data Outsourcing - Blog
Thu, 17 Jan 2019 03:16:44 -0500

Devops for data warehouse project
Tue, 08 Jan 2019 10:19:46 GMT - http://digitecgroup.ca/blog/devops-for-data-warehouse-project

Target audience.
This article offers some ideas to people in the Business Intelligence department who are involved in delivering the data warehouse, reports, and ETL development.


DevOps has recently become one of the hottest topics in the IT market. The idea behind it is simple: make development and operations work together as one team. In the beginning we saw that in many organizations this could not happen, since Operations was a separate department with its own rules and processes to follow. But first, why do we want this to happen? The answer is to deliver the IT product fast, even if the product is not fully ready or complete. The first reaction to that statement is: 'What is the point of delivering an unfinished product?'

Indeed, everything started to move fast, in particular with the advent of the Internet, smart devices, and so on. The customer is not what he used to be: you and I buy and use different products over very short usage timelines, so we react quickly and constantly ask for improvements in product quality. Markets and industries are changing, competition is global, supply chains are changing, and product life cycles are changing too. Communication has become cheap and global, and the language barrier is shrinking with the widespread use of English, easier ways to learn other languages, and translation tools near at hand that did not exist before.

I know that you know all that and more, but I needed this preamble to put the term DevOps in its global context.

Now that BI staff and business users understand that we need to react quickly to different situations and challenges, how can this DevOps practice help us? This is exactly what we are going to discuss in the coming articles:

A. Can we use DevOps practices on a data warehouse project?
B. Are data modelers, analysts, and architects ready for DevOps?
C. Are BI specialists ready for DevOps?
D. Are ETL specialists ready for DevOps?
E. Finally, is the BI department ready for DevOps?

Data science role
Wed, 19 Dec 2018 03:02:11 GMT - http://digitecgroup.ca/blog/data-science-role

Since the nineteenth century, companies have used innovative technology in their business processes to increase profits, reduce costs, and for many other reasons. In recent years, companies have faced a new global context with the advent of the Internet, social media, and many other innovations that have led to the massive use of technology in business. Software and telecommunications have eliminated many gaps and barriers between Business to Business, Business to Customer, and finally Customer to Customer.
Today, organizations face a new challenge: capturing business data and using it wisely to improve decisions and enhance operations.
Around 1990, the term “data mining” emerged in the database community, but it did not create many job opportunities compared to the data warehouse job market. We should note here that, depending on the size of the company, this role may or may not exist. Algorithms, models, and advanced analytics were used mostly by researchers. When there is a very complex problem, we hire profiles with advanced math skills; often this type of problem is translated into an optimization problem, where math and computing are blended together to find an optimal solution.
Finance, and market finance in particular, was one of the early adopters and precursors in hiring highly math-oriented profiles. Everybody knows the quantitative analyst (Quant) profile after the diffusion of documentaries like “Quants: The Alchemists of Wall Street” and the movie about the famous trader, “The Wolf of Wall Street”.
The world financial crisis of 2008 pushed these profiles onto the global scene. Algorithmic trading, statistical arbitrage, and high-frequency trading are examples of systems built by skilled people, among whom we find these scientist roles.
This article does not try to encourage or discourage the use of math and algorithms in our lives, because any breakthrough in technology or science can be used in a good or a bad manner; it is here that deontology and rules are adopted to avoid misuse of these technologies.
We can say, with a grain of salt, that fundamental science paired with the software industry is disrupting many domains, and finance was one of them.
No organization or business can claim to be able to avoid the threat that new, agile start-ups bring to the market. These start-ups recognize the revolution that has already started. One of their new ways of doing business is to base decisions not only on domain experts but also on deep data analysis.
2005 is also the year Hadoop was created; adopted early by Yahoo!, it was built on the ideas of Google's MapReduce and the Google File System paper published in October 2003. Open-source Hadoop is used by many organizations to store huge amounts of data and to run distributed processing on that data.
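To make the MapReduce idea behind Hadoop concrete, here is a minimal sketch of the classic word-count job in plain Python. The function names and the toy documents are mine, not Hadoop's API; a real job would distribute these phases across a cluster.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group the emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each group, here by summing the counts."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big ideas", "data beats opinion"]
counts = reduce_phase(shuffle(map_phase(docs)))
# e.g. counts["big"] == 2 and counts["data"] == 2
```

The point of the pattern is that map and reduce are pure functions over key/value pairs, so the framework can run them in parallel on many machines and only the shuffle step needs coordination.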
Big data emerged to deal with the explosion of data coming from smart devices and the web. The three main dimensions of big data technology are:
  • Velocity
  • Volume
  • Variety
In this global digital world, companies have started to face these challenges, and IT is considered by many organizations to be the department that should lead the transformation and create an agile and innovative culture.
IT is very good at setting up systems for data collection and storage, and at providing the business with BI tools to visualize and analyze data. IT has started to hire the data scientist profile: a person able to apply science to data already collected or purchased by the company.
This job profile was famously called “the sexiest job of the 21st century” by the Harvard Business Review. Data science is now one of the most wanted profiles in the IT world.
Data science combines several disciplines, among them operations research, physics, statistics, data mining, neural networks, and computer science. Depending on its requirements, an organization will look for some skills over others.
In the job market today, we see the following profiles, whose areas of expertise overlap:
  • Data Analyst
  • Data Engineer
  • Machine Learning Engineer
  • Data Scientist
  • Predictive Modeler
Usually these profiles are incorporated into teams to deliver a product or, when they are less involved, play an on-demand adviser role.
A data scientist can also be dedicated to one project, where he provides the outcomes of his analysis to the project team. His deliverable can be as simple as a mathematical formula or as sophisticated as a full algorithm.
His involvement in a data-oriented project can sometimes be a source of issues. If he comes from an R&D background, he tends to work without a precise timeline, whereas a project has a timeline, and sometimes we need to deliver as fast as possible to avoid losing momentum and opportunities.
So how can we manage this contradiction and still get the benefit of these skilled resources while delivering value to the business?
In my opinion, part of the solution lies in the project scope. If the software's core feature is based on the outcome of a data scientist's work, we should ask these questions:
  • What level of confidence can we tolerate in the outcome?
  • Do we have an intermediate solution that the business is satisfied with?
  • What is the expected ROI of the project?
  • Do we need a perfect solution before going to production?
In the early stages of product development, we should define the scope of the data scientist's work in the project. The project manager needs to set clear and realistic expectations with him.
Sometimes it is difficult for the PM to handle this relationship, since the data scientist cannot guarantee the results of his analysis.
One of the solutions a company can use when the output of the data scientist is unpredictable is to manage the project as follows:
  • A data science team is created that works across company verticals; its main objective is to do R&D on different topics aligned with the enterprise strategy.
  • This team publishes its findings and communicates the results to the whole organization.
  • Any project can use these findings as it goes through its deliveries.
  • The project manager or the business creates stories (if an agile methodology is used) and assigns them to the team during product development.
  • If the team's input is critical, they should be involved from the beginning of the project.
In this scenario, the data scientist also develops business knowledge, focusing more on the industry and its specific problems. He builds his domain expertise and tries to solve business issues pragmatically, without pursuing perfection.
In this configuration, the data scientist can still face some challenges:
  • Dirty data that needs to be cleansed, filtered, and normalized
  • Lack of management support and direction
  • Results and insights not used by decision makers
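The first challenge, dirty data, is the most common one in practice. Here is a minimal Python sketch of what cleansing, filtering, and normalizing might look like; the field names, the sample rows, and the min-max normalization rule are all hypothetical illustrations, not a prescribed pipeline.

```python
# Toy raw extract with the usual defects: a missing value and a duplicate.
raw_rows = [
    {"customer": "a01", "monthly_spend": "120.5"},
    {"customer": "a02", "monthly_spend": ""},      # missing value
    {"customer": "a03", "monthly_spend": "80.0"},
    {"customer": "a03", "monthly_spend": "80.0"},  # exact duplicate
]

# 1. Filter: drop rows with a missing spend value.
rows = [r for r in raw_rows if r["monthly_spend"]]

# 2. Cleanse: drop exact duplicates while preserving order.
seen, deduped = set(), []
for r in rows:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# 3. Normalize: rescale the numeric column to the [0, 1] range (min-max).
values = [float(r["monthly_spend"]) for r in deduped]
lo, hi = min(values), max(values)
for r, v in zip(deduped, values):
    r["spend_norm"] = (v - lo) / (hi - lo) if hi > lo else 0.0
```

In a real project these steps would typically run on a dataframe or inside the ETL layer, but the logic, and the amount of a data scientist's time it consumes, is the same.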
We live in a new digital era dominated by huge volumes of collected data and a pressing need to change as fast as the competition and customer expectations.
In 2019 we will see increasing adoption of Big Data, Artificial Intelligence (AI), Machine Learning (ML), the Internet of Things (IoT), and cloud solutions.
Even if companies face enormous challenges and fail in many projects related to these technologies, they should continue their digital transformation, along with the adoption of an agile approach at scale.
That said, the data scientist profession will continue to be one of the hottest jobs in the IT market, helping companies integrate more intelligence into their day-to-day operations.