Saturday, May 16, 2015

Is Rust good for data mining?


Rust 1.0 has just been released. This is a great achievement for the Rust team! I have been watching Rust for some time. One phrase on its website summarizes the language rather well: “zero-cost abstraction”. I was attracted to Rust by Poss’s article, Rust for functional programmers. But the more I learn about the language, the more I find that its syntax-level similarity to Haskell and ML is only superficial. What Rust really wants to be is a systems programming language with safe, fine-grained memory management.

C is “zero-cost”, but it is hard to build high-level abstractions in it. To write concurrent code in C, we must explicitly use a threading library or API (e.g. OpenMP, Pthreads), or go lower and implement threads ourselves on top of more primitive OS-level functions.

At the other extreme, languages such as Haskell and Scala have “abstractions”. We can build beautiful APIs in them that others can use without caring how they are implemented. But abstraction has a cost. The memory usage of Haskell and Scala programs is harder to predict than that of C programs. When we chain functions such as map/filter, we don’t know exactly how many intermediate objects are created; it depends on the underlying library implementation and on the optimization ability of the compiler, and neither is common knowledge to the application programmer. Extra cost also comes from the constant hidden inside the big O: going through an iterator interface costs a .next() call per element, while iterating over a bare array in C is much faster.
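To make the per-element .next() call concrete, here is a minimal sketch in Rust, the language this post is about. The function names are made up for illustration. The first function drives the iterator protocol by hand; the second is the bare indexed loop one would write in C. Both compute the same sum. Whether they end up as the same machine code depends on the compiler and its optimization level, which is exactly the kind of knowledge the application programmer usually lacks.

```rust
/// Sums a slice by driving the iterator protocol by hand:
/// one `next()` call per element.
/// (Hypothetical helper, written only for this illustration.)
fn sum_with_next(data: &[i64]) -> i64 {
    let mut it = data.iter();
    let mut total = 0;
    while let Some(x) = it.next() {
        total += *x;
    }
    total
}

/// Sums the same slice with a plain indexed loop, as one would in C.
/// (Hypothetical helper, written only for this illustration.)
fn sum_with_index(data: &[i64]) -> i64 {
    let mut total = 0;
    let mut i = 0;
    while i < data.len() {
        total += data[i];
        i += 1;
    }
    total
}
```

Calling either function on the slice &[1, 2, 3, 4] returns 10.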

How can Rust offer zero-cost abstractions? Basically, it achieves this through its memory management system. This is the truly innovative part of Rust: although some of the theory was developed in academic papers and earlier small languages, Rust is the first to give it a solid engineering treatment. In F#, we can easily write

col |> Seq.map (fun e -> e*e) |> Seq.filter (fun e -> e%3=0) |> Seq.sum

and we don’t know how memory is managed. Will map and filter create many small objects to be garbage collected? The F# programmer can ignore these questions. He can of course dig into the Seq module’s source code and the compiled IL code to learn the details. But by default, the F# programmer need not care.

If the same program is written in Rust, the programmer has to control exactly how each object is created, and he expresses this by writing extra annotations in the code. This adds to the programmer’s mental burden!
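As a rough sketch of what this looks like, here is one way the pipeline above might be written in Rust. The function names and the i64 element type are my assumptions, not part of the F# original. The point to notice is that the signature itself has to say whether the collection is borrowed (&[i64]) or handed over (Vec<i64>).

```rust
/// Squares each element, keeps the squares divisible by 3, and sums them.
/// `col` is borrowed, so the caller keeps ownership of the data.
/// (Hypothetical name; element type `i64` is an assumption.)
fn sum_of_squares_div_by_3(col: &[i64]) -> i64 {
    col.iter()                  // yields `&i64` references into `col`
        .map(|e| e * e)         // lazy adapter: no intermediate collection is built
        .filter(|e| e % 3 == 0) // sees each squared value by reference
        .sum()
}

/// Same computation, but the vector is moved into the function
/// and freed when the function returns.
fn sum_of_squares_div_by_3_owned(col: Vec<i64>) -> i64 {
    col.into_iter()             // consumes `col`, yields `i64` values
        .map(|e| e * e)
        .filter(|e| e % 3 == 0)
        .sum()
}
```

For the input 1, 2, 3, 4, 5, 6 both versions return 45 (9 + 36). After calling the owned version the caller can no longer use the vector, and the compiler enforces that.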

In data mining, writing correct code for the numerics and the algorithmic logic is already hard. Why would a data miner want to keep these programming issues in mind as well? I would not. This is of course because I am not familiar with Rust’s borrow system. I believe that after enough training, I could reach a state of caring less and less about memory when programming data mining applications. But why should I in the first place? Fine-grained memory control is not the primary issue in data mining applications. If performance is not that critical, any statically typed language such as F# or Scala will perform well enough. Need more performance? Code in C++, allocate all the memory deterministically, and avoid copying big objects and dynamic heap allocation while the critical components are running!
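The “allocate up front, no copies in the hot path” discipline can be sketched as follows. The advice above is about C++; the sketch is in Rust only to stay consistent with the other examples in this post, and the function, its name, and the 2-D point layout are invented for illustration. The caller allocates the output buffer once and passes it in by mutable reference, so the routine itself does not allocate as long as the buffer is large enough.

```rust
/// Writes the squared distance from each point to `center` into `out`.
/// `out` is cleared and refilled, so its existing allocation is reused
/// whenever its capacity is already sufficient.
/// (Hypothetical helper; the 2-D layout is an assumption.)
fn squared_distances(points: &[[f64; 2]], center: [f64; 2], out: &mut Vec<f64>) {
    out.clear(); // drops old contents but keeps the allocated capacity
    for p in points {
        // `points` is borrowed, so no point data is copied wholesale
        let dx = p[0] - center[0];
        let dy = p[1] - center[1];
        out.push(dx * dx + dy * dy);
    }
}
```

A caller would create the buffer once, for example with Vec::with_capacity(points.len()), and then reuse it across all iterations of the outer algorithm.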


  6. Funny conclusion. Not because I am not familiar also
    >I am not familiar with Rust’s borrow system
    >Need more performance? Code in C++
